[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595949#comment-16595949 ]
Wangda Tan commented on YARN-8569:
----------------------------------

[~eyang], As we discussed offline, the use case is not clear to me. For example, TF does not assume the task set will change during a job run, and TF_CONFIG (which includes the endpoints of every instance) can be calculated upfront, before the service is launched. There could be services that would benefit from the spec, but we could not find a strong real-world example to support this feature.

In spite of the above comment, to me it is always good to have new features added. However, there are two big concerns that have not been addressed:

1) Frequently syncing the service spec could cause cluster performance issues: a service with one thousand component instances can generate several thousand DFS calls for any service state change. Even if the feature is enabled on demand, a non-malicious user can still cause a big performance impact on the cluster.

2) The changes in this patch are very involved, reaching all the way from the native service down to C code. It seems that local_dir is passed in by the user's command; is it possible for a malicious user to use container-executor to update the service spec of other users, which could potentially cause security issues? I haven't checked the details yet, I just want to point out the possibility.

Instead of doing this, to me it would be very useful to write container-specific information, such as resources, to the spec or even to ENV. That requires far fewer changes, and 70-80% of the issues can be addressed.

> Create an interface to provide cluster information to application
> -----------------------------------------------------------------
>
>                 Key: YARN-8569
>                 URL: https://issues.apache.org/jira/browse/YARN-8569
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>              Labels: Docker
>         Attachments: YARN-8569.001.patch, YARN-8569.002.patch
>
>
> Some programs require container hostnames to be known for the application to run.
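To illustrate the point above about computing TF_CONFIG upfront: for a fixed task set, the value each instance needs can be derived before launch. A minimal sketch, assuming the hostnames from the example below and a helper name (make_tf_config) that is purely illustrative:

```python
import json
import os

# Fixed task set, known before the service is launched
# (hostnames taken from the example in the issue description).
cluster = {
    "ps": ["ps0.example.com:2222", "ps1.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
}

def make_tf_config(cluster, job_name, task_index):
    """Build the TF_CONFIG value for one task instance."""
    return json.dumps({
        "cluster": cluster,
        "task": {"type": job_name, "index": task_index},
    })

# e.g. what would be exported on worker1.example.com:
os.environ["TF_CONFIG"] = make_tf_config(cluster, "worker", 1)
```

Since every instance's value is a pure function of the cluster map plus its own role and index, nothing in TF itself requires the spec to be re-synced at runtime, which is the crux of the objection above.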
> For example, distributed TensorFlow requires a launch_command that looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>     --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>     --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>     --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>     --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>     --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>     --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>     --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>     --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN
> services launch_command. In addition, the dynamic parameters do not work
> with the YARN flex command. This is the classic pain point for application
> developers attempting to automate system environment settings as parameters
> to the end-user application.
> It would be great if YARN Docker integration could provide a simple option to
> expose the hostnames of the YARN service via a mounted file. The file content
> would be updated when a flex command is performed. This allows application
> developers to consume system environment settings via a standard interface.
> It is like /proc/devices for Linux, but for Hadoop. This may involve
> updating a file in the distributed cache and allowing the file to be mounted
> via container-executor.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
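To make the proposal in the description concrete: an application inside the container would read the mounted spec file and re-read it whenever a flex updates it. The sketch below is purely illustrative; the mount path, the JSON layout (components/containers/hostname), and the polling approach are all assumptions, not part of any patch:

```python
import json
import os
import time

# Hypothetical path where YARN would mount the service spec in the container.
SPEC_PATH = "/hadoop/yarn/service-spec.json"

def read_endpoints(spec_path):
    """Parse component instance hostnames out of a mounted service spec file."""
    with open(spec_path) as f:
        spec = json.load(f)
    hosts = []
    for component in spec.get("components", []):
        for instance in component.get("containers", []):
            hosts.append(instance["hostname"])
    return hosts

def watch(spec_path, interval=5.0):
    """Re-read the spec whenever its mtime changes (e.g. after a flex command)."""
    last_mtime = 0.0
    while True:
        mtime = os.stat(spec_path).st_mtime
        if mtime != last_mtime:
            last_mtime = mtime
            print(read_endpoints(spec_path))
        time.sleep(interval)
```

This also shows where the performance concern in the comment comes in: every state change that rewrites the file must be propagated to each container, so a service with many instances multiplies the DFS traffic per change.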