[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602345#comment-16602345 ]
Wangda Tan commented on YARN-8569: ---------------------------------- [~eyang], I still think it is a bad idea to support "sys info" by using the API. As mentioned in the doc, the info appears with possible delay after container launch. So containers need to add additional logics to get what is the resources allocated to the container. However, such "sys info" for individual containers should be created when container got launched. (Do you think does it make sense which "/sys/fs/cgroups or /" created minutes after process launched by OS?). Second, given this is completely created/managed by AM, we should never call it "/hadoop/yarn/sysfs" or "cluster info", etc., as well as APIs, given it is not controlled by YARN. It should be renamed to "am-info" or "app-info". > Create an interface to provide cluster information to application > ----------------------------------------------------------------- > > Key: YARN-8569 > URL: https://issues.apache.org/jira/browse/YARN-8569 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Eric Yang > Assignee: Eric Yang > Priority: Major > Labels: Docker > Attachments: YARN-8569 YARN sysfs interface to provide cluster > information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, > YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch > > > Some program requires container hostnames to be known for application to run. > For example, distributed tensorflow requires launch_command that looks like: > {code} > # On ps0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \ > --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \ > --job_name=ps --task_index=0 > # On ps1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \ > --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \ > --job_name=ps --task_index=1 > # On worker0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \ > --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \ > --job_name=worker --task_index=0 > # On worker1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \ > --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \ > --job_name=worker --task_index=1 > {code} > This is a bit cumbersome to orchestrate via Distributed Shell, or YARN > services launch_command. In addition, the dynamic parameters do not work > with YARN flex command. This is the classic pain point for application > developer attempt to automate system environment settings as parameter to end > user application. > It would be great if YARN Docker integration can provide a simple option to > expose hostnames of the yarn service via a mounted file. The file content > gets updated when flex command is performed. This allows application > developer to consume system environment settings via a standard interface. > It is like /proc/devices for Linux, but for Hadoop. This may involve > updating a file in distributed cache, and allow mounting of the file via > container-executor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org