[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595949#comment-16595949 ]
Wangda Tan commented on YARN-8569:
----------------------------------

[~eyang], As we discussed offline, the use case is not clear to me. For example, TF does not assume the task set will change during a job run, and TF_CONFIG (which includes the endpoints of every instance) can be calculated upfront, before the service is launched. There could be services that would benefit from the spec, but we could not find a strong real-world example to support this feature.

In spite of the above comment, to me it is always good to have new features added. However, there are two big concerns that have not been addressed:

1) Frequently syncing the service spec could cause cluster performance issues: a service with one thousand component instances can generate several thousand DFS calls for any service state change. Even if the feature is enabled on demand, a non-malicious user can still cause a big performance impact on the cluster.

2) The changes in this patch are very involved, reaching all the way from the native service down to C code. It seems that local_dir is passed in by the user's command; is it possible for a malicious user to use container-executor to update the service spec of other users, which could potentially cause security issues? I haven't checked the details yet, I just want to point out the possibility.

Instead of doing this, to me it would be very useful to write container-specific information, such as resources, to the spec or even to ENV. That requires far fewer changes, and 70-80% of the issues can be addressed.

> Create an interface to provide cluster information to application
> -----------------------------------------------------------------
>
>                 Key: YARN-8569
>                 URL: https://issues.apache.org/jira/browse/YARN-8569
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>              Labels: Docker
>         Attachments: YARN-8569.001.patch, YARN-8569.002.patch
>
>
> Some programs require container hostnames to be known for the application to run.
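To illustrate the point above about computing TF_CONFIG upfront: for a fixed task set, the value each instance needs can be derived before launch. A minimal sketch, assuming the hostnames from the example below and a helper name (make_tf_config) that is purely illustrative:

```python
import json
import os

# Fixed task set, known before the service is launched
# (hostnames taken from the example in the issue description).
cluster = {
    "ps": ["ps0.example.com:2222", "ps1.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
}

def make_tf_config(cluster, job_name, task_index):
    """Build the TF_CONFIG value for one task instance."""
    return json.dumps({
        "cluster": cluster,
        "task": {"type": job_name, "index": task_index},
    })

# e.g. what would be exported on worker1.example.com:
os.environ["TF_CONFIG"] = make_tf_config(cluster, "worker", 1)
```

Since every instance's value is a pure function of the cluster map plus its own role and index, nothing in TF itself requires the spec to be re-synced at runtime, which is the crux of the objection above.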
> For example, distributed TensorFlow requires a launch_command that looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>     --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>     --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>     --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>     --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>     --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>     --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>     --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>     --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN
> services launch_command. In addition, the dynamic parameters do not work
> with the YARN flex command. This is the classic pain point for application
> developers attempting to automate system environment settings as parameters
> to the end-user application.
> It would be great if YARN Docker integration could provide a simple option to
> expose the hostnames of the YARN service via a mounted file. The file content
> would be updated when a flex command is performed. This allows application
> developers to consume system environment settings via a standard interface.
> It is like /proc/devices for Linux, but for Hadoop. This may involve
> updating a file in the distributed cache and allowing the file to be mounted
> via container-executor.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
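To make the proposal in the description concrete: an application inside the container would read the mounted spec file and re-read it whenever a flex updates it. The sketch below is purely illustrative; the mount path, the JSON layout (components/containers/hostname), and the polling approach are all assumptions, not part of any patch:

```python
import json
import os
import time

# Hypothetical path where YARN would mount the service spec in the container.
SPEC_PATH = "/hadoop/yarn/service-spec.json"

def read_endpoints(spec_path):
    """Parse component instance hostnames out of a mounted service spec file."""
    with open(spec_path) as f:
        spec = json.load(f)
    hosts = []
    for component in spec.get("components", []):
        for instance in component.get("containers", []):
            hosts.append(instance["hostname"])
    return hosts

def watch(spec_path, interval=5.0):
    """Re-read the spec whenever its mtime changes (e.g. after a flex command)."""
    last_mtime = 0.0
    while True:
        mtime = os.stat(spec_path).st_mtime
        if mtime != last_mtime:
            last_mtime = mtime
            print(read_endpoints(spec_path))
        time.sleep(interval)
```

This also shows where the performance concern in the comment comes in: every state change that rewrites the file must be propagated to each container, so a service with many instances multiplies the DFS traffic per change.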