Great, Vinay - I'm glad that made a difference. When you get to the point where you are running a cluster, the same sort of thing will have to carry over to all nodes, with the added issue that ssh and keys must be configured such that each of those users can shell to other nodes without supplying a password.
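
A rough sketch of that ssh setup, assuming service accounts named hdfs, yarn, and mapred and a worker host called node2 (substitute your own accounts and hostnames):

    # as the hdfs user (repeat for yarn and mapred)
    ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa
    ssh-copy-id hdfs@node2        # repeat for every other node in the cluster
    ssh hdfs@node2 hostname       # should return without a password prompt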

On 2/18/19 11:41 PM, Vinay Kashyap wrote:
Perfect, Jeff - I understand clearly now.
After changing the setup to the appropriate users and folder permissions, I can see some progress.

Cheers..

On Fri, Feb 15, 2019 at 10:05 AM Jeff Hubbs <jhubbsl...@att.net> wrote:

    On 2/14/19 11:09 PM, Vinay Kashyap wrote:
    I am running hadoop on my mac and all the folders have
    *myuser:staff* as the owner. I have verified the permissions for
    the local dirs to be 755.

    This doesn't sound right. By-the-book, there are supposed to be
    separate "users" for hdfs, yarn, and mapred to run their
    respective daemons. The directories they read/write in are
    supposed to be permed and owned to expect that. One possible
    approach for purposes of log-writing etc. is to put those user
    accounts in a group (perhaps named "hadoop") so that read/written
    areas in common are owned by that group and permed accordingly.
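
    A rough sketch of that on the Linux side (the user and group names here
    are just the conventional ones; adjust the paths to your own layout):

    groupadd hadoop
    useradd -g hadoop hdfs
    useradd -g hadoop yarn
    useradd -g hadoop mapred
    # example: a shared log area owned by the group and group-writable
    mkdir -p /var/log/hadoop
    chown root:hadoop /var/log/hadoop
    chmod 775 /var/log/hadoop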

    If you're going to ad-lib that arrangement then you'll have to
    ad-lib a lot of the rest of how worker nodes and edge nodes behave
    accordingly.

    I run all hadoop services with myuser and I have configured
    *yarn.nodemanager.linux-container-executor.group=staff*
    accordingly both in *yarn-site.xml* and *container-executor.cfg*.

    1. Is the container-executor binary certified to work as expected
    on OSX?
    2. When the Linux container executor is configured, is there any hard
    expectation that the users running the hadoop services be part of
    [*root, hdfs, yarn...*] and the group be *hadoop*, so that the
    directory permissions fall in line accordingly?

    Can you please help me understand this? I could not find any
    write-up on this.

    On Thu, Feb 14, 2019 at 11:13 PM Prabhu Josephraj
    <pjos...@cloudera.com> wrote:

        In the case of a Distributed Shell job, the ApplicationMaster runs
        in a normal Linux container and the subsequent shell command runs
        inside the Docker container. The job fails even before launching
        the AM, that is, before starting the Docker container. I think
        the Distributed Shell job would fail even without the Docker
        settings.

        As per error code 20, it is mostly related to accessing the
        NM local directory.

        
        https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_sg_yarn_container_exec_errors.html

        Exit code 20 = INITIALIZE_USER_FAILED: Couldn't get, stat, or
        secure the per-user NodeManager directory.


        Can we try the below steps on (all) NodeManager machines?

        Remove all contents under /data/yarn and make sure the /data
        and /data/yarn directories have permission 755 with owner
        root:root, and that the local and log directories are owned
        by yarn:hadoop.

        [root@tparimi-tarunhdp26-4 ~]# ls -lrt /
        drwxr-xr-x.   5 root root    44 Oct 24 11:47 data

        [root@tparimi-tarunhdp26-4 ~]# ls -lrt /data/
        drwxr-xr-x. 4 root      root   28 Oct 24 14:30 yarn

        [root@tparimi-tarunhdp26-4 ~]# ls -lrt /data/yarn/
        total 4
        drwxr-xr-x.  5 yarn hadoop   54 Feb 14 17:32 local
        drwxrwxr-x. 10 yarn hadoop 4096 Feb 14 17:32 log
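
        Roughly, something like the below should produce that layout
        (a sketch; adjust the paths if your local/log dirs differ):

        rm -rf /data/yarn/*
        mkdir -p /data/yarn/local /data/yarn/log
        chown root:root /data /data/yarn
        chmod 755 /data /data/yarn
        chown yarn:hadoop /data/yarn/local /data/yarn/log
        chmod 755 /data/yarn/local
        chmod 775 /data/yarn/log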

        Also check whether the Distributed Shell job runs fine without
        the Docker settings.





        On Thu, Feb 14, 2019 at 10:15 PM Vinay Kashyap
        <vinu.k...@gmail.com> wrote:

            Hi Prabhu,

            Thanks for your reply.
            I tried the configurations as per your suggestion, but I
            get the same error.
            Is this related to container localization by any chance?
            Also, is there anything in the logs or .out files which says
            that the Docker container runtime has been picked up?



            On Thu, Feb 14, 2019 at 9:38 PM Prabhu Josephraj
            <pjos...@cloudera.com> wrote:

                Hi Vinay,

                    Can you try specifying the below configs under the
                Docker section in container-executor.cfg? They allow
                the Docker containers to use the NM local dirs.

                
                docker.allowed.ro-mounts=/data/yarn/local,/usr/jdk64/jdk1.8.0_112/bin
                docker.allowed.rw-mounts=/data/yarn/local,/data/yarn/log
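
                For reference, a fuller [docker] section looks roughly
                like the below (a sketch; the binary path and trusted
                registry are only examples, keep your existing values):

                [docker]
                module.enabled=true
                docker.binary=/usr/bin/docker
                docker.allowed.ro-mounts=/data/yarn/local,/usr/jdk64/jdk1.8.0_112/bin
                docker.allowed.rw-mounts=/data/yarn/local,/data/yarn/log
                docker.trusted.registries=library
                docker.privileged-containers.enabled=false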

                Thanks,
                Prabhu Joseph

                On Thu, Feb 14, 2019 at 9:28 PM Vinay Kashyap
                <vinu.k...@gmail.com> wrote:


                    I am using Hadoop 3.2.0 and trying to run a
                    simple application in a Docker container. I have
                    made the required configuration changes both in
                    *yarn-site.xml* and *container-executor.cfg* to
                    choose the LinuxContainerExecutor and the Docker
                    runtime.
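
                    For reference, the relevant yarn-site.xml pieces are
                    roughly the following (a sketch; exact values in my
                    setup may differ slightly):

                    <property>
                      <name>yarn.nodemanager.container-executor.class</name>
                      <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
                    </property>
                    <property>
                      <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
                      <value>default,docker</value>
                    </property>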

                    I am using the distributed shell example from one
                    of the Hortonworks blogs:
                    https://hortonworks.com/blog/trying-containerized-applications-apache-hadoop-yarn-3-1/

                    The problem I face is that when the application is
                    submitted to YARN, it fails with a directory-creation
                    error, shown below:

                        2019-02-14 20:51:16,450 INFO
                        distributedshell.Client: Got application
                        report from ASM for, appId=2,
                        clientToAMToken=null,
                        appDiagnostics=Application
                        application_1550156488785_0002 failed 2 times
                        due to AM Container for
                        appattempt_1550156488785_0002_000002 exited
                        with exitCode: -1000 Failing this
                        attempt.Diagnostics: [2019-02-14
                        20:51:16.282]Application
                        application_1550156488785_0002 initialization
                        failed (exitCode=20) with output: main :
                        command provided 0 main : user is myuser main
                        : requested yarn user is myuser Failed to
                        create directory
                        /data/yarn/local/nmPrivate/container_1550156488785_0002_02_000001.tokens/usercache/myuser
                        - Not a directory

                    I have configured
                    *yarn.nodemanager.local-dirs* in yarn-site.xml and
                    I can see the same reflected in the YARN web UI at
                    *localhost:8088/conf*:

                    <property>
                      <name>yarn.nodemanager.local-dirs</name>
                      <value>/data/yarn/local</value>
                      <final>false</final>
                      <source>yarn-site.xml</source>
                    </property>

                    I do not understand why it is trying to create the
                    usercache dir inside the nmPrivate directory.

                    Note: I have verified the permissions for myuser on
                    the directories and have also tried clearing the
                    directories manually as suggested in a related post,
                    but with no luck. I do not see any additional
                    information about the container launch failure in
                    any other logs.

                    How do I debug why the usercache dir is not
                    resolved properly??

                    Really appreciate any help on this.

                    Thanks

                    Vinay Kashyap



            --
            Thanks and regards
            Vinay Kashyap



    --
    Thanks and regards
    Vinay Kashyap




--
Thanks and regards
Vinay Kashyap

