Yes Jeff, thanks again.
I was able to run a standalone TF training application with
TensorBoard in a Docker container. I will definitely take care of
passwordless (silent) ssh once I start with distributed TF.



On Tue, Feb 19, 2019 at 9:44 PM Jeff Hubbs <jhubbsl...@att.net> wrote:

> Great, Vinay - I'm glad that made a difference. When you get to the point
> where you are running a cluster, the same sort of thing will have to carry
> over to all nodes, with the added issue that ssh and keys must be
> configured such that each of those users can shell to other nodes without
> supplying a password.
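A minimal sketch of how the passwordless requirement could be checked from one node, assuming OpenSSH's `ssh` client is on the PATH. The helper names, the `yarn` user, and the host name `node2` are hypothetical and only for illustration. `BatchMode=yes` makes ssh fail instead of prompting, so a successful run means key-based login works.

```python
import subprocess


def passwordless_ssh_command(user, host, timeout_s=5):
    """Build an ssh command that fails rather than prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                  # never ask for a password or passphrase
        "-o", f"ConnectTimeout={timeout_s}",    # give up quickly on unreachable hosts
        f"{user}@{host}",
        "true",                                 # trivial remote command
    ]


def can_ssh_without_password(user, host, timeout_s=5):
    """Return True if `user` can run a command on `host` without any prompt."""
    cmd = passwordless_ssh_command(user, host, timeout_s)
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Running `can_ssh_without_password("yarn", "node2")` as each service user, against each other node, would show which user/host pairs still need keys set up.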
>
> On 2/18/19 11:41 PM, Vinay Kashyap wrote:
>
> Perfect Jeff, I clearly understand.
> After changing the setup to the appropriate users and folder permissions,
> I can see some progress.
>
> Cheers..
>
> On Fri, Feb 15, 2019 at 10:05 AM Jeff Hubbs <jhubbsl...@att.net> wrote:
>
>> On 2/14/19 11:09 PM, Vinay Kashyap wrote:
>>
>> I am running hadoop on my mac and all the folders have *myuser:staff* as
>> the owner. I have verified the permissions for the local dirs to be 755.
>>
>> This doesn't sound right. By-the-book, there are supposed to be separate
>> "users" for hdfs, yarn, and mapred to run their respective daemons. The
>> directories they read/write in are supposed to be permed and owned to
>> expect that. One possible approach for purposes of log-writing etc. is to
>> put those user accounts in a group (perhaps named "hadoop") so that
>> read/written areas in common are owned by that group and permed accordingly.
>>
>> If you're going to ad-lib that arrangement then you'll have to ad-lib a
>> lot of the rest of how worker nodes and edge nodes behave accordingly.
>>
>> I run all Hadoop services as myuser and I have configured
>> *yarn.nodemanager.linux-container-executor.group=staff* accordingly
>> in both *yarn-site.xml* and *container-executor.cfg*.
>>
>> 1. Is the container-executor binary certified to work as expected on
>> OS X?
>> 2. When the Linux container executor is configured, is there a hard
>> expectation that the users running the Hadoop services be among [*root,
>> hdfs, yarn...*] and that the group be *hadoop*, so that the directory
>> permissions fall in line accordingly?
>>
>> Can you please help me understand this? I could not find any write-up on
>> this.
>>
>> On Thu, Feb 14, 2019 at 11:13 PM Prabhu Josephraj <pjos...@cloudera.com>
>> wrote:
>>
>>> In the case of a Distributed Shell job, the ApplicationMaster runs in a
>>> normal Linux container and the subsequent shell command runs inside the
>>> Docker container. The job fails even before launching the AM, that is,
>>> before starting the Docker container. I think the Distributed Shell job
>>> will fail even without the Docker settings.
>>>
>>> Error code 20 is most likely related to accessing the NM local
>>> directory.
>>>
>>>
>>> https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_sg_yarn_container_exec_errors.html
>>>
>>> 20  INITIALIZE_USER_FAILED
>>>     Couldn't get, stat, or secure the per-user NodeManager directory.
>>>
>>> Can you try the steps below on every NodeManager machine?
>>>
>>> Remove all contents under /data/yarn and make sure that the /data and
>>> /data/yarn directories have permission 755 with owner root:root, and that
>>> the local directory is owned by yarn:hadoop.
>>>
>>> [root@tparimi-tarunhdp26-4 ~]# ls -lrt /
>>> drwxr-xr-x.   5 root root    44 Oct 24 11:47 data
>>>
>>> [root@tparimi-tarunhdp26-4 ~]# ls -lrt /data/
>>> drwxr-xr-x. 4 root      root   28 Oct 24 14:30 yarn
>>>
>>> [root@tparimi-tarunhdp26-4 ~]# ls -lrt /data/yarn/
>>> total 4
>>> drwxr-xr-x.  5 yarn hadoop   54 Feb 14 17:32 local
>>> drwxrwxr-x. 10 yarn hadoop 4096 Feb 14 17:32 log
>>>
>>> Also check whether Distributed Shell jobs run fine without the Docker
>>> settings.
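The ownership and mode checks described above can be scripted. A minimal sketch in Python, assuming the /data/yarn layout shown in the listing. The `EXPECTED` table and the helper names `describe` and `check` are made up for illustration and are not part of Hadoop.

```python
import grp
import os
import pwd
import stat


def describe(path):
    """Return (owner, group, permission bits) for path."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)   # uid with no passwd entry
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)   # gid with no group entry
    return owner, group, stat.S_IMODE(st.st_mode)


def check(path, owner, group, mode):
    """Return a list of human-readable mismatches (empty if all is well)."""
    try:
        actual_owner, actual_group, actual_mode = describe(path)
    except OSError as exc:
        return [f"{path}: {exc.strerror}"]
    problems = []
    if (actual_owner, actual_group) != (owner, group):
        problems.append(
            f"{path}: owner is {actual_owner}:{actual_group}, expected {owner}:{group}"
        )
    if actual_mode != mode:
        problems.append(f"{path}: mode is {actual_mode:o}, expected {mode:o}")
    return problems


# Assumed layout, taken from the ls output in this message.
EXPECTED = [
    ("/data", "root", "root", 0o755),
    ("/data/yarn", "root", "root", 0o755),
    ("/data/yarn/local", "yarn", "hadoop", 0o755),
    ("/data/yarn/log", "yarn", "hadoop", 0o775),
]

if __name__ == "__main__":
    for entry in EXPECTED:
        for problem in check(*entry):
            print(problem)
```

Run on each NodeManager host, it prints nothing when every directory matches the expected owner, group, and mode.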
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Feb 14, 2019 at 10:15 PM Vinay Kashyap <vinu.k...@gmail.com>
>>> wrote:
>>>
>>>> Hi Prabhu,
>>>>
>>>> Thanks for your reply.
>>>> I tried the configuration you suggested, but I get the
>>>> same error.
>>>> Is this related to container localization by any chance?
>>>> Also, is there any log or output that shows that the Docker
>>>> container runtime has been picked up?
>>>>
>>>>
>>>>
>>>> On Thu, Feb 14, 2019 at 9:38 PM Prabhu Josephraj <pjos...@cloudera.com>
>>>> wrote:
>>>>
>>>>> Hi Vinay,
>>>>>
>>>>> Can you try specifying the configs below under the Docker section of
>>>>> container-executor.cfg? They allow Docker containers to use the NM
>>>>> local dirs.
>>>>>
>>>>>
>>>>>     docker.allowed.ro-mounts=/data/yarn/local,/usr/jdk64/jdk1.8.0_112/bin
>>>>>     docker.allowed.rw-mounts=/data/yarn/local,/data/yarn/log
>>>>>
>>>>> Thanks,
>>>>> Prabhu Joseph
>>>>>
>>>>> On Thu, Feb 14, 2019 at 9:28 PM Vinay Kashyap <vinu.k...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> I am using Hadoop 3.2.0 and trying to run a simple application in a
>>>>>> Docker container. I have made the required configuration changes in
>>>>>> both *yarn-site.xml* and *container-executor.cfg* to select the
>>>>>> LinuxContainerExecutor and the Docker runtime.
>>>>>>
>>>>>> I am using the distributed shell example from one of the Hortonworks
>>>>>> blog posts:
>>>>>> https://hortonworks.com/blog/trying-containerized-applications-apache-hadoop-yarn-3-1/
>>>>>>
>>>>>> The problem is that when the application is submitted to YARN, it
>>>>>> fails with a directory creation error, shown below:
>>>>>>
>>>>>> 2019-02-14 20:51:16,450 INFO distributedshell.Client: Got application
>>>>>> report from ASM for, appId=2, clientToAMToken=null,
>>>>>> appDiagnostics=Application application_1550156488785_0002 failed 2 times
>>>>>> due to AM Container for appattempt_1550156488785_0002_000002 exited with
>>>>>> exitCode: -1000 Failing this attempt.Diagnostics: [2019-02-14
>>>>>> 20:51:16.282]Application application_1550156488785_0002 initialization
>>>>>> failed (exitCode=20) with output: main : command provided 0 main : user
>>>>>> is myuser main : requested yarn user is myuser Failed to create directory
>>>>>> /data/yarn/local/nmPrivate/container_1550156488785_0002_02_000001.tokens/usercache/myuser
>>>>>> - Not a directory
>>>>>>
>>>>>> I have configured *yarn.nodemanager.local-dirs* in yarn-site.xml and
>>>>>> I can see the same reflected in the YARN web UI at *localhost:8088/conf*:
>>>>>>
>>>>>> <property>
>>>>>>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>     <value>/data/yarn/local</value>
>>>>>>     <final>false</final>
>>>>>>     <source>yarn-site.xml</source>
>>>>>> </property>
>>>>>>
>>>>>> I do not understand why it is trying to create the usercache dir
>>>>>> inside the nmPrivate directory.
>>>>>>
>>>>>> Note: I have verified the permissions for myuser on the directories
>>>>>> and have also tried clearing the directories manually, as suggested in a
>>>>>> related post, but to no avail. I do not see any additional information
>>>>>> about the container launch failure in any other logs.
>>>>>>
>>>>>> How do I debug why the usercache dir is not resolved properly?
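One way to narrow this down: "Not a directory" is the ENOTDIR error, which directory creation returns when some component of the path already exists but is not a directory. A minimal sketch, assuming Python is available on the NodeManager host, that walks a path and reports the first such component. The function name `first_non_directory` is hypothetical; the path in the demo is the one from the error above.

```python
import os


def first_non_directory(path):
    """Return the first existing component of path that is not a directory.

    Returns None if every existing component is a directory (the remaining
    components simply do not exist yet).
    """
    current = os.sep if os.path.isabs(path) else ""
    for part in path.split(os.sep):
        if not part:
            continue
        current = os.path.join(current, part)
        if not os.path.lexists(current):
            return None          # nothing here yet, so nothing blocks creation
        if not os.path.isdir(current):
            return current       # a file (or similar) sits where a dir is needed
    return None


if __name__ == "__main__":
    failing = (
        "/data/yarn/local/nmPrivate/"
        "container_1550156488785_0002_02_000001.tokens/usercache/myuser"
    )
    print(first_non_directory(failing))
```

If the component it reports is the `.tokens` file, then the tokens file path is being used where a local directory was expected, which points at how the arguments reach container-executor rather than at directory permissions.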
>>>>>>
>>>>>> Really appreciate any help on this.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Vinay Kashyap
>>>>>>
>>>>>
>>>>
>>>> --
>>>> *Thanks and regards*
>>>> *Vinay Kashyap*
>>>>
>>>
>>
>> --
>> *Thanks and regards*
>> *Vinay Kashyap*
>>
>>
>>
>
> --
> *Thanks and regards*
> *Vinay Kashyap*
>
>
>

-- 
*Thanks and regards*
*Vinay Kashyap*
