[
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604686#comment-16604686
]
Eric Yang commented on YARN-8569:
---------------------------------
The YARN localizer only supports tarballs, archives, and individual files. It
does not support a directory with files in it. This causes Docker to mount the
path to a specific file instead of a directory containing files. When patch 1
and patch 5 are merged, we get a conflict from double mounting the same
subdirectory:
{code}
{
"Type": "bind",
"Source":
"/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1536164978190_0001/container_1536164978190_0001_01_000005/sysfs",
"Destination": "/hadoop/yarn/sysfs",
"Mode": "ro",
"RW": false,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source":
"/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1536164978190_0001/filecache/10/service.json",
"Destination": "/hadoop/yarn/sysfs/service.json",
"Mode": "ro",
"RW": false,
"Propagation": "rprivate"
},
{code}
Docker would error out at this point. Therefore, I am going to generate an
archive or tarball with service.json in it, let the localizer decompress it
into a directory, and mount that directory. The follow-up update logic will
then locate the localized directory and replace the information in it. This
is why, in case anyone wonders, the next patch goes through the step of
compressing service.json into a tarball.
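For reference, here is a minimal sketch of that compress step, assuming Apache
Commons Compress (already a Hadoop dependency); the class name and the
sysfs/service.json entry path are illustrative only, not the actual patch:
{code}
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;

public class SysFsTarball {
  /**
   * Packs the serialized service spec into a tar.gz so the localizer can
   * expand it into a directory, and the directory (not a single file) gets
   * bind-mounted into the container.
   */
  public static void writeTarball(File dest, String serviceJson) throws IOException {
    byte[] payload = serviceJson.getBytes(StandardCharsets.UTF_8);
    try (TarArchiveOutputStream tar = new TarArchiveOutputStream(
        new GZIPOutputStream(new BufferedOutputStream(new FileOutputStream(dest))))) {
      TarArchiveEntry entry = new TarArchiveEntry("sysfs/service.json");
      entry.setSize(payload.length);
      tar.putArchiveEntry(entry);
      tar.write(payload);
      tar.closeArchiveEntry();
    }
  }
}
{code}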
> Create an interface to provide cluster information to application
> -----------------------------------------------------------------
>
> Key: YARN-8569
> URL: https://issues.apache.org/jira/browse/YARN-8569
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Yang
> Assignee: Eric Yang
> Priority: Major
> Labels: Docker
> Attachments: YARN-8569 YARN sysfs interface to provide cluster
> information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch,
> YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch
>
>
> Some programs require container hostnames to be known for the application to
> run. For example, distributed TensorFlow requires a launch_command that looks
> like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
> --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
> --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
> --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
> --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
> --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
> --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
> --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
> --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
> --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
> --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
> --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
> --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN
> services launch_command. In addition, the dynamic parameters do not work
> with the YARN flex command. This is the classic pain point for application
> developers attempting to automate system environment settings as parameters
> to the end user application.
> It would be great if the YARN Docker integration could provide a simple
> option to expose the hostnames of the YARN service via a mounted file. The
> file content gets updated when a flex command is performed. This allows
> application developers to consume system environment settings via a standard
> interface. It is like /proc/devices for Linux, but for Hadoop. This may
> involve updating a file in the distributed cache and allowing the file to be
> mounted via container-executor.
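> A hypothetical consumer illustrates the intent; the
> /hadoop/yarn/sysfs/service.json mount point and the assumption that the file
> is plain JSON are placeholders for the sketch, not a settled API:
> {code}
> import java.nio.charset.StandardCharsets;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.Paths;
>
> public class ClusterSpecReader {
>   public static void main(String[] args) throws Exception {
>     // Hypothetical mount point for the YARN sysfs interface.
>     Path spec = Paths.get("/hadoop/yarn/sysfs/service.json");
>     // Re-read after a flex command to pick up the new component hostnames
>     // instead of hard-coding ps/worker host lists in the launch_command.
>     String json = new String(Files.readAllBytes(spec), StandardCharsets.UTF_8);
>     System.out.println(json);
>   }
> }
> {code}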