[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665887#comment-16665887 ]

Hudson commented on YARN-8569:
----------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15328 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15328/])
YARN-8569. Create an interface to provide cluster information to (billie: rev d07e873b7db6cb317eccb4768607c1afb505c99b)

* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/string-utils.h
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperation.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/client/ServiceClient.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceMaster.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/utils/ServiceApiUtil.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerRunCommand.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceScheduler.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitorResourceChange.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestServiceAM.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/string-utils.c
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMWebServices.java
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/utils/HttpUtil.java

> Create an interface to provide cluster information to application
> ------------------------------------------------------------------
>
>                 Key: YARN-8569
>                 URL: https://issues.apache.org/jira/browse/YARN-8569
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>              Labels: Docker
>             Fix For: 3.3.0
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665803#comment-16665803 ]

Hadoop QA commented on YARN-8569:
----------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 56s | Maven dependency ordering for branch |
| +1 | mvninstall | 21m 25s | trunk passed |
| +1 | compile | 8m 42s | trunk passed |
| +1 | checkstyle | 1m 33s | trunk passed |
| +1 | mvnsite | 2m 41s | trunk passed |
| +1 | shadedclient | 16m 27s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 20s | trunk passed |
| +1 | javadoc | 1m 52s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 1s | the patch passed |
| +1 | compile | 8m 14s | the patch passed |
| +1 | cc | 8m 14s | the patch passed |
| +1 | javac | 8m 14s | the patch passed |
| +1 | checkstyle | 1m 26s | hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 147 unchanged - 1 fixed = 147 total (was 148) |
| +1 | mvnsite | 2m 31s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 6s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 4m 38s | the patch passed |
| +1 | javadoc | 2m 9s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 56s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 19m 36s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | unit | 13m 37s | hadoop-yarn-services-core in the patch passed. |
| +1 | unit | 0m 22s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 41s | The patch does not generate ASF License warnings. |
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664254#comment-16664254 ]

Eric Yang commented on YARN-8569:
----------------------------------

The failed test cases are not related to this patch.

> Create an interface to provide cluster information to application
> ------------------------------------------------------------------
>
>                 Key: YARN-8569
>                 URL: https://issues.apache.org/jira/browse/YARN-8569
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>              Labels: Docker
>         Attachments: YARN-8569 YARN sysfs interface to provide cluster information to application.pdf,
>                      YARN-8569.001.patch, YARN-8569.002.patch, YARN-8569.003.patch, YARN-8569.004.patch,
>                      YARN-8569.005.patch, YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch,
>                      YARN-8569.009.patch, YARN-8569.010.patch, YARN-8569.011.patch, YARN-8569.012.patch,
>                      YARN-8569.013.patch, YARN-8569.014.patch, YARN-8569.015.patch, YARN-8569.016.patch
>
> Some programs require container hostnames to be known for the application to run. For example, distributed TensorFlow requires a launch_command that looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:,ps1.example.com: \
>      --worker_hosts=worker0.example.com:,worker1.example.com: \
>      --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:,ps1.example.com: \
>      --worker_hosts=worker0.example.com:,worker1.example.com: \
>      --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:,ps1.example.com: \
>      --worker_hosts=worker0.example.com:,worker1.example.com: \
>      --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:,ps1.example.com: \
>      --worker_hosts=worker0.example.com:,worker1.example.com: \
>      --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN services launch_command. In addition, the dynamic parameters do not work with the YARN flex command. This is a classic pain point for application developers attempting to automate system environment settings as parameters to the end user application.
> It would be great if the YARN Docker integration could provide a simple option to expose the hostnames of the YARN service via a mounted file. The file content gets updated when a flex command is performed. This allows application developers to consume system environment settings via a standard interface. It is like /proc/devices for Linux, but for Hadoop. This may involve updating a file in the distributed cache and allowing the file to be mounted via container-executor.
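As a rough sketch of how an application inside a container could consume such a mounted cluster-information file, consider the Python script below. The mount path /hadoop/yarn/sysfs/app.json, the JSON layout, and the port number are illustrative assumptions for this sketch, not the committed interface.

{code}
#!/usr/bin/env python
# Sketch: derive TensorFlow --ps_hosts/--worker_hosts arguments from a
# cluster-info file that YARN mounts into the container.
# Assumptions (hypothetical): the file is /hadoop/yarn/sysfs/app.json and
# holds the service spec as {"components": [{"name": ...,
# "containers": [{"hostname": ...}, ...]}, ...]}.
import json

SYSFS_PATH = "/hadoop/yarn/sysfs/app.json"  # assumed mount point
PORT = 2222                                 # assumed task port

def hosts_for(spec, component_name):
    """Return host:port strings for every container of one component."""
    for component in spec.get("components", []):
        if component.get("name") == component_name:
            return ["%s:%d" % (c["hostname"], PORT)
                    for c in component.get("containers", [])]
    return []

with open(SYSFS_PATH) as f:
    spec = json.load(f)

print("--ps_hosts=%s --worker_hosts=%s" %
      (",".join(hosts_for(spec, "ps")),
       ",".join(hosts_for(spec, "worker"))))
{code}

Since the file content is refreshed when the service is flexed, an application written this way can re-read the file to pick up new hosts instead of being relaunched with different parameters.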
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664143#comment-16664143 ]

Hadoop QA commented on YARN-8569:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 30s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 57s | Maven dependency ordering for branch |
| +1 | mvninstall | 24m 41s | trunk passed |
| +1 | compile | 10m 48s | trunk passed |
| +1 | checkstyle | 1m 33s | trunk passed |
| +1 | mvnsite | 2m 42s | trunk passed |
| +1 | shadedclient | 16m 5s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 22s | trunk passed |
| +1 | javadoc | 2m 2s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 14s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 57s | the patch passed |
| +1 | compile | 8m 7s | the patch passed |
| +1 | cc | 8m 7s | the patch passed |
| +1 | javac | 8m 7s | the patch passed |
| +1 | checkstyle | 1m 30s | hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 146 unchanged - 1 fixed = 146 total (was 147) |
| +1 | mvnsite | 2m 39s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 35s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 40s | the patch passed |
| +1 | javadoc | 1m 48s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 47s | hadoop-yarn-api in the patch passed. |
| -1 | unit | 19m 11s | hadoop-yarn-server-nodemanager in the patch failed. |
| -1 | unit | 13m 22s | hadoop-yarn-services-core in the patch failed. |
| +1 | unit | 0m 21s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 41s | The patch does not generate ASF License warnings. |
| | | 128m | |
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662884#comment-16662884 ]

Billie Rinaldi commented on YARN-8569:
----------------------------------

Thanks [~eyang]! I am +1 for patch 16, pending precommit build.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662826#comment-16662826 ]

Eric Yang commented on YARN-8569:
----------------------------------

[~billie.rinaldi] Thank you for the review. Patch 16 includes your suggestion.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662804#comment-16662804 ]

Billie Rinaldi commented on YARN-8569:
----------------------------------

Thanks, [~eyang]. I tested out patch 15 in an insecure environment and it worked.

One thing I noticed is that the AM log says "YARN sysfs synchronized" when no components have sysfs configured, so the sysfs has not actually been synchronized. It would be nice to skip this output when sysfs is not configured.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659797#comment-16659797 ]

Hadoop QA commented on YARN-8569:
----------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 29s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 18s | Maven dependency ordering for branch |
| +1 | mvninstall | 22m 36s | trunk passed |
| +1 | compile | 8m 39s | trunk passed |
| +1 | checkstyle | 1m 29s | trunk passed |
| +1 | mvnsite | 2m 39s | trunk passed |
| +1 | shadedclient | 16m 3s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 13s | trunk passed |
| +1 | javadoc | 1m 51s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 59s | the patch passed |
| +1 | compile | 8m 6s | the patch passed |
| +1 | cc | 8m 6s | the patch passed |
| +1 | javac | 8m 6s | the patch passed |
| +1 | checkstyle | 1m 27s | hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 147 unchanged - 1 fixed = 147 total (was 148) |
| +1 | mvnsite | 2m 28s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 6s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 4m 31s | the patch passed |
| +1 | javadoc | 2m 15s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 1m 4s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 20m 9s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | unit | 14m 45s | hadoop-yarn-services-core in the patch passed. |
| +1 | unit | 0m 26s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 44s | The patch does not generate ASF License warnings. |
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659618#comment-16659618 ]

Eric Yang commented on YARN-8569:
----------------------------------

Patch 015 rebased on YARN-8922.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657452#comment-16657452 ]

Eric Yang commented on YARN-8569:
----------------------------------

[~rkanter] Thank you for filing the problem as a separate JIRA. I will remove the hotfix from this JIRA and link the two JIRAs together.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657385#comment-16657385 ]

Robert Kanter commented on YARN-8569:
----------------------------------

[~eyang], I think I've come up with the proper fix for the issue. I've created YARN-8922 to take care of it; please take a look.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657311#comment-16657311 ]

Hadoop QA commented on YARN-8569:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for branch |
| +1 | mvninstall | 22m 23s | trunk passed |
| +1 | compile | 10m 39s | trunk passed |
| +1 | checkstyle | 1m 39s | trunk passed |
| +1 | mvnsite | 3m 17s | trunk passed |
| +1 | shadedclient | 17m 41s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 4m 14s | trunk passed |
| +1 | javadoc | 2m 3s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 15s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 38s | the patch passed |
| +1 | compile | 9m 39s | the patch passed |
| +1 | cc | 9m 39s | the patch passed |
| +1 | javac | 9m 39s | the patch passed |
| +1 | checkstyle | 1m 49s | hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 146 unchanged - 1 fixed = 146 total (was 147) |
| +1 | mvnsite | 3m 13s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 32s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 5m 0s | the patch passed |
| +1 | javadoc | 1m 47s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 46s | hadoop-yarn-api in the patch passed. |
| -1 | unit | 18m 28s | hadoop-yarn-server-nodemanager in the patch failed. |
| +1 | unit | 13m 46s | hadoop-yarn-services-core in the patch passed. |
| +1 | unit | 0m 20s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 37s | The patch does not generate ASF License warnings. |
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657208#comment-16657208 ]

Robert Kanter commented on YARN-8569:
----------------------------------

[~eyang], I tried it out and with the YARN-8569 014 patch, test-container-executor fails when run as root:
{noformat}
[root@rkanter-dev hadoop-yarn-server-nodemanager]# target/native/target/usr/local/bin/test-container-executor systest
Attempting to clean up from any previous runs
chmod: cannot access ‘/tmp/test-container-executor’: No such file or directory
Our executable is /root/hadoop-upstream/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/target/usr/local/bin/test-container-executor
Starting tests
test_is_empty()
Testing is_empty function
Directory is not empty /
Could not open directory /tmp/test-container-executor/noexist - No such file or directory
Could not open directory /tmp/test-container-executor/emptydir - No such file or directory
FAIL: /tmp/test-container-executor/emptydir should be empty
{noformat}
It looks like it can't create the {{/tmp/test-container-executor/emptydir}} directory. Here's the output when run on trunk:
{noformat}
[root@rkanter-dev hadoop-yarn-server-nodemanager]# target/native/target/usr/local/bin/test-container-executor systest
Attempting to clean up from any previous runs
chmod: cannot access ‘/tmp/test-container-executor’: No such file or directory
Our executable is /root/hadoop-upstream/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/target/usr/local/bin/test-container-executor
Starting tests
test_is_empty()
Testing is_empty function
Directory is not empty /
Could not open directory /tmp/test-container-executor/noexist - No such file or directory
Testing recursive_unlink_children()
Testing resolve_config_path()
Testing resolve_config_path
Testing get_user_directory()
Testing check_nm_local_dir()
Testing get_app_directory()
Testing get_container_work_directory()
Testing get_container_launcher_file()
Testing get_container_credentials_file()
Testing get_container_keystore_file()
Testing get_container_truststore_file()
Testing get_app_log_dir()
Testing check_configuration_permissions
File /tmp/test-container-executor must not be world or group writable, but is 777
Testing delete_container()
Testing delete_app()
Testing delete race
Testing is_feature_enabled()
Illegal value '1klajdflkajdsflk' for 'feature.name3.enabled' in configuration. Using default value: 0.
Illegal value 'asdkjfasdkljfklsdjf0' for 'feature.name4.enabled' in configuration. Using default value: 0.
Illegal value '-1' for 'feature.name5.enabled' in configuration. Using default value: 1.
Illegal value '2' for 'feature.name6.enabled' in configuration. Using default value: 0.
Testing test_check_user
Requested user lp is not whitelisted and has id 4,which is below the minimum allowed 500
Running as root is not allowed
Testing clean_docker_cgroups
clean_docker_cgroups: Invalid mount table
clean_docker_cgroups: Invalid yarn_hierarchy
clean_docker_cgroups: Invalid container_id: null
clean_docker_cgroups: Invalid container_id: not_a_container_123
Running test test_signal_container_group in child process
Testing group signal_container
Child container launched as 16189
Killing process group 16189 with 9
Testing init app
Testing launch container without HTTPS
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /tmp/test-container-executor/pid.txt.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Testing launch container with HTTPS
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /tmp/test-container-executor/pid.txt.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Testing delete_user
baseDir "/tmp/test-container-executor/local-1/usercache/systest/appcache/app_3/test.cfg" is a file and cannot contain subdir "file1".
0
Trying banned default user()
Testing test_check_user
Requested user bin is banned
Running as root is not allowed
Testing test_check_user
User sys not found
Running as root is not allowed
Testing trim function
Finished tests
{noformat}
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657096#comment-16657096 ]

Robert Kanter commented on YARN-8569:
----------------------------------

Sorry about that. I had to fix a number of problems with test-container-executor to get it to run under root, which also runs a number of additional tests (I think the pre-commit build must only run it as a normal user). I had changed the dir permissions as part of that, but perhaps it wasn't necessary. I'll double check that it still works as root, using your patch here.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657077#comment-16657077 ]

Eric Yang commented on YARN-8569:
----------------------------------

Patch 14 includes the test-container-executor hotfix to pass the pre-commit build, and a rebase onto the current trunk.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657073#comment-16657073 ] Eric Yang commented on YARN-8569: -

The unit test failure is caused by YARN-8448, where /tmp/test-container-executor was changed to 0777, but cetest expects it to be 0755. [~rkanter] After changing /tmp/test-container-executor back to 0755, it passes fine. Hence, I included the change in this JIRA instead of reopening YARN-8448. Let me know if this works for you.
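For illustration, a minimal sketch of the kind of permission assertion involved here, assuming the /tmp/test-container-executor path and the 0755 expectation from the comment above; this is not the actual cetest code:

{code}
#include <stdio.h>
#include <sys/stat.h>

/* Sketch: verify that a test work directory has the expected 0755
 * mode, the check that fails when the directory is left at 0777. */
int check_dir_mode(const char *path, mode_t expected) {
  struct stat sb;
  if (stat(path, &sb) != 0) {
    perror("stat");
    return -1;
  }
  if ((sb.st_mode & 07777) != expected) {
    fprintf(stderr, "%s: mode %04o, expected %04o\n",
            path, (unsigned) (sb.st_mode & 07777), (unsigned) expected);
    return -1;
  }
  return 0;
}

int main(void) {
  return check_dir_mode("/tmp/test-container-executor", 0755) == 0 ? 0 : 1;
}
{code}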
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657027#comment-16657027 ] Eric Yang commented on YARN-8569: -

The node manager unit test failure is caused by YARN-8449, where the tmp directory is not cleaned up after the test; it is not related to the changes in this patch.
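As a hedged illustration of the missing cleanup, a small sketch that removes a test's temporary directory tree with nftw(); the /tmp/nm-test-dir path is a hypothetical stand-in, not the directory YARN-8449 actually leaves behind:

{code}
#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch: depth-first removal of a temporary test directory, the
 * kind of post-test cleanup that YARN-8449 tracks. */
static int rm_entry(const char *path, const struct stat *sb,
                    int typeflag, struct FTW *ftwbuf) {
  (void) sb; (void) typeflag; (void) ftwbuf;
  return remove(path);  /* remove() handles files and empty dirs */
}

int remove_tree(const char *root) {
  /* FTW_DEPTH visits children before their parent directory */
  return nftw(root, rm_entry, 16, FTW_DEPTH | FTW_PHYS);
}

int main(void) {
  return remove_tree("/tmp/nm-test-dir") == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
{code}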
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656163#comment-16656163 ] Hadoop QA commented on YARN-8569: -

*-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 26s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 4s | Maven dependency ordering for branch |
| +1 | mvninstall | 17m 50s | trunk passed |
| +1 | compile | 11m 38s | trunk passed |
| +1 | checkstyle | 1m 22s | trunk passed |
| +1 | mvnsite | 2m 39s | trunk passed |
| +1 | shadedclient | 15m 9s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 5s | trunk passed |
| +1 | javadoc | 1m 58s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 14s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 57s | the patch passed |
| +1 | compile | 7m 21s | the patch passed |
| +1 | cc | 7m 21s | the patch passed |
| +1 | javac | 7m 21s | the patch passed |
| +1 | checkstyle | 1m 14s | hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 147 unchanged - 1 fixed = 147 total (was 148) |
| +1 | mvnsite | 2m 29s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 34s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 35s | the patch passed |
| +1 | javadoc | 1m 57s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 48s | hadoop-yarn-api in the patch passed. |
| -1 | unit | 19m 30s | hadoop-yarn-server-nodemanager in the patch failed. |
| +1 | unit | 13m 47s | hadoop-yarn-services-core in the patch passed. |
| +1 | unit | 0m 24s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 39s | The patch does not generate ASF License warnings. |
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656050#comment-16656050 ] Eric Yang commented on YARN-8569: -

Patch 13 addresses issues with running this patch in a non-secure cluster.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654432#comment-16654432 ] Billie Rinaldi commented on YARN-8569: --

Adding a file name parameter later wouldn't need to be an incompatible API change; we could have the file name default to app.json when not specified. I'm not sure it makes sense to support more than one sysfs file per application, so I don't have a problem with it having a single file name for the initial implementation. Since sysfs-aware apps are going to have to read this file, it makes sense for it to be in a consistent location. Another future consideration might be whether apps would want to have access to the same information in other file formats besides JSON.
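A tiny sketch of the defaulting behavior described above, under the assumption of a hypothetical helper name (this is not the patch's actual code):

{code}
#include <stdio.h>
#include <string.h>

/* Sketch: default the sysfs file name to app.json when the caller
 * does not specify one, keeping a later filename parameter
 * backward compatible. */
const char *resolve_sysfs_filename(const char *requested) {
  if (requested == NULL || strlen(requested) == 0) {
    return "app.json";
  }
  return requested;
}

int main(void) {
  printf("%s\n", resolve_sysfs_filename(NULL));        /* app.json */
  printf("%s\n", resolve_sysfs_filename("topo.json")); /* topo.json */
  return 0;
}
{code}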
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647222#comment-16647222 ] Hadoop QA commented on YARN-8569: -

*+1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 12s | Maven dependency ordering for branch |
| +1 | mvninstall | 20m 42s | trunk passed |
| +1 | compile | 9m 10s | trunk passed |
| +1 | checkstyle | 1m 34s | trunk passed |
| +1 | mvnsite | 2m 34s | trunk passed |
| +1 | shadedclient | 15m 44s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 13s | trunk passed |
| +1 | javadoc | 1m 46s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 14s | the patch passed |
| +1 | compile | 7m 35s | the patch passed |
| +1 | cc | 7m 35s | the patch passed |
| +1 | javac | 7m 35s | the patch passed |
| +1 | checkstyle | 1m 23s | hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 146 unchanged - 1 fixed = 146 total (was 147) |
| +1 | mvnsite | 2m 24s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 16s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 33s | the patch passed |
| +1 | javadoc | 1m 43s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 46s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 19m 5s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | unit | 14m 6s | hadoop-yarn-services-core in the patch passed. |
| +1 | unit | 0m 21s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 40s | The patch does not generate ASF License warnings. |
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647030#comment-16647030 ] Eric Yang commented on YARN-8569: -

[~leftnoteasy] Patch 12 includes the set_user fix, and adds the sysfs path name nmPrivate/[appId]/sysfs. In the current implementation, the sync makes n REST API calls to node managers, where n is less than the total number of bare-metal hosts hosting containers. This reduces the network traffic required to keep the information in sync. The sysfs prefix introduced in the nmPrivate directory paves the way to adding more than just app.json to the sysfs directory, and prevents path traversal attacks. When information is populated into multiple files, there is a higher chance of a race condition where state has changed in some files but not all of them. A multiple-file population mechanism will require more thought to keep the information transactional. The first version does not add container id or arbitrary filename support, in order to reduce transaction commits and ensure that information propagation is idempotent from the container's point of view.
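To illustrate the one-call-per-host sync described above, a sketch under the assumption that per-container hostnames are collapsed to unique node manager hosts before calling out; all names here are illustrative, not the patch's actual implementation:

{code}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: collapse per-container hostnames to unique NM hosts so the
 * sysfs sync issues one REST call per host rather than per container. */
static int cmp_str(const void *a, const void *b) {
  return strcmp(*(const char *const *) a, *(const char *const *) b);
}

void sync_unique_hosts(const char **hosts, size_t n) {
  const char **sorted = malloc(n * sizeof(*sorted));
  if (sorted == NULL) {
    return;
  }
  memcpy(sorted, hosts, n * sizeof(*sorted));
  qsort(sorted, n, sizeof(*sorted), cmp_str);
  for (size_t i = 0; i < n; i++) {
    if (i == 0 || strcmp(sorted[i], sorted[i - 1]) != 0) {
      /* a real implementation would PUT app.json to this NM here */
      printf("sync sysfs to %s\n", sorted[i]);
    }
  }
  free(sorted);
}

int main(void) {
  const char *hosts[] = { "host1", "host2", "host1", "host3", "host2" };
  sync_unique_hosts(hosts, 5);  /* prints host1, host2, host3 once each */
  return 0;
}
{code}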
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645324#comment-16645324 ] Eric Yang commented on YARN-8569: -

{quote}If you do #1, IIUC, #2 has to be done as well; otherwise, as a normal user we cannot read from the nmPrivate dir.{quote}
Lines 2632-2638 of container-executor do exactly that: they open the file as the node manager user, and the code already drops privileges while writing the file. I am not disagreeing with what you said, but it is already part of the patch.
{quote}We already limit reads to files under .../nmPrivate/app../sys/fs/, correct? How is it possible to read a token file from that directory?{quote}
app.json is written to nmPrivate/[appid]/app.json; there is no sub-directory in nmPrivate/[appid]/sysfs. The sysfs directory only exists in the distributed cache location. Good security practice is to give generated paths a hard-coded depth and a fixed filename to prevent path exploits. If we allow the API to specify an arbitrary filename, a user can easily specify "sysfs/../container_tokens" from the REST API to defeat the security measures that were put in place. Readlink can flatten the path, but there is no real indicator that the source can be copied. For now, I can add a source path check to make sure it is jailed in the nmPrivate/[appid]/sysfs path. However, the first version of the API will not accept arbitrary filenames, to minimize security concerns. It would be easy to extend the REST API method with support for arbitrary filenames once more thought has been given. I leave that responsibility to whoever wants to take on the problem.
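A hedged sketch of the source path check described above, assuming a jail prefix like .../nmPrivate/<appid>/sysfs; the helper name is hypothetical and this is not the actual container-executor code:

{code}
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: resolve symlinks and ".." segments with realpath(), then
 * require the result to stay under the nmPrivate/<appid>/sysfs jail,
 * so inputs like "sysfs/../container_tokens" are rejected. */
int is_jailed(const char *candidate, const char *jail) {
  char resolved[PATH_MAX];
  if (realpath(candidate, resolved) == NULL) {
    return 0;  /* nonexistent or unresolvable paths are rejected */
  }
  size_t jail_len = strlen(jail);
  return strncmp(resolved, jail, jail_len) == 0 &&
         (resolved[jail_len] == '/' || resolved[jail_len] == '\0');
}

int main(void) {
  const char *jail = "/hadoop/yarn/local/nmPrivate/app_01/sysfs";
  /* prints 1 only if the file exists and resolves inside the jail */
  printf("%d\n", is_jailed("/hadoop/yarn/local/nmPrivate/app_01/sysfs/app.json", jail));
  return 0;
}
{code}

The trailing-separator check matters: a plain prefix comparison would also accept a sibling directory such as .../sysfs-evil.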
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644427#comment-16644427 ] Wangda Tan commented on YARN-8569: --

[~eyang],
{quote}In sync_yarn_sysfs, check_nm_local_dir makes sure the nm_local_dir is owned by the yarn user.{quote}
If you do #1, IIUC, #2 has to be done as well; otherwise, as a normal user we cannot read from the nmPrivate dir.
{quote}I am concerned that tokens are localized in the same nmPrivate directory.{quote}
We already limit reads to files under .../nmPrivate/app../sys/fs/, correct? How is it possible to read a token file from that directory?
{quote}Point 4 is a bit concerning to me. I err on the side of caution...{quote}
Given that the file hierarchy is part of the API, we should make it correct in the first cut. Any change in the future will be an incompatible change, so I prefer to get the API correct in the first patch.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644277#comment-16644277 ] Eric Yang commented on YARN-8569: -

{quote}1) We should use set_user before the method to drop privileges.{quote}
Good catch; check_user will be applied in the next patch.
{quote}2) Root privilege should only be used by open_file_as_nm.{quote}
In sync_yarn_sysfs, check_nm_local_dir makes sure the nm_local_dir is owned by the yarn user. This ensures the copy-file operation is safeguarded against possible path exploits. This is equivalent to the next point, ensuring that the nm local dirs are indeed owned by yarn.
{quote}3) We should check that all in/out file paths are under the local dirs defined by container-executor.cfg.{quote}
Initialize container, launch container, and launch docker container all use the nm-local-dirs path passed from the node manager to container-executor. There is no definition of yarn.nodemanager.local-dirs in the container-executor.cfg file. To contain the scope of this JIRA, I opened YARN-8863 to track the proposed global pattern change.
{quote}4) Regarding the API, I think we should allow the filename to be specified.{quote}
I am concerned that tokens are localized in the same nmPrivate directory. If a filename can be specified, it may increase the attack surface for using the REST API to copy an HDFS token into the sysfs directory. I am unsure this apparently innocent action is safe.
{quote}5) For the mounted dir for the container, for app-level information I would prefer to put it under ...sys/fs/app/ instead of ...sys/fs/, since we want to support per-container information in the future.{quote}
Point 4 is a bit concerning to me. I err on the side of caution. Therefore, I do not plan to make the arbitrary filename change in this JIRA. Please open a separate one if you feel strongly about supporting custom filenames in the REST API.
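To make the open-then-drop pattern discussed here concrete, a sketch of opening a file under the privileged identity and then dropping to the node manager user before writing; open_file_as_nm is the real function name from the thread, but everything below is an illustrative stand-in, not the patch's code:

{code}
#include <fcntl.h>
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch: open the destination while privileged, then drop to the
 * node manager user's uid/gid so all subsequent writes happen
 * without root privileges. */
int open_then_drop(const char *path, const char *nm_user) {
  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0640);
  if (fd < 0) {
    perror("open");
    return -1;
  }
  struct passwd *pw = getpwnam(nm_user);
  if (pw == NULL) {
    close(fd);
    return -1;
  }
  /* order matters: drop supplementary groups and gid before uid */
  if (setgroups(0, NULL) != 0 ||
      setgid(pw->pw_gid) != 0 ||
      setuid(pw->pw_uid) != 0) {
    perror("drop privileges");
    close(fd);
    return -1;
  }
  return fd;  /* caller writes the sysfs content unprivileged */
}
{code}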
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644223#comment-16644223 ] Wangda Tan commented on YARN-8569: --

Discussed with [~eyang]/[~suma.shivaprasad] offline. Several comments (see the sketch after this list for point 3):
1) We should use set_user before the method to drop privileges.
2) Root privilege should only be used by open_file_as_nm.
3) We should check that all in/out file paths are under the local dirs defined by container-executor.cfg.
4) Regarding the API, I think we should allow the filename to be specified.
5) For the mounted dir for the container, for app-level information I would prefer to put it under ...sys/fs/app/ instead of ...sys/fs/, since we want to support per-container information in the future.
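A sketch of the check in point 3; the directory list and helper are hypothetical stand-ins for what container-executor would read from container-executor.cfg, and a real check would first canonicalize the path (as in the realpath sketch earlier in this thread):

{code}
#include <stdio.h>
#include <string.h>

/* Sketch: accept a path only if it falls under one of the configured
 * NM local dirs, the validation proposed for all in/out file paths. */
int under_local_dirs(const char *path, const char **local_dirs, size_t n) {
  for (size_t i = 0; i < n; i++) {
    size_t len = strlen(local_dirs[i]);
    if (strncmp(path, local_dirs[i], len) == 0 && path[len] == '/') {
      return 1;
    }
  }
  return 0;
}

int main(void) {
  const char *dirs[] = { "/hadoop/yarn/local", "/data/yarn/local" };
  printf("%d\n", under_local_dirs(
      "/hadoop/yarn/local/nmPrivate/app_1/sysfs/app.json", dirs, 2)); /* 1 */
  printf("%d\n", under_local_dirs("/etc/passwd", dirs, 2));           /* 0 */
  return 0;
}
{code}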
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644071#comment-16644071 ] Suma Shivaprasad commented on YARN-8569: 

[~eyang] Thanks for updating the patch. +1
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643972#comment-16643972 ] Hadoop QA commented on YARN-8569: -

*+1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 15s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 13s | Maven dependency ordering for branch |
| +1 | mvninstall | 20m 29s | trunk passed |
| +1 | compile | 7m 55s | trunk passed |
| +1 | checkstyle | 1m 29s | trunk passed |
| +1 | mvnsite | 2m 33s | trunk passed |
| +1 | shadedclient | 15m 42s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 18s | trunk passed |
| +1 | javadoc | 1m 47s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 58s | the patch passed |
| +1 | compile | 7m 33s | the patch passed |
| +1 | cc | 7m 33s | the patch passed |
| +1 | javac | 7m 33s | the patch passed |
| +1 | checkstyle | 1m 27s | hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 148 unchanged - 1 fixed = 148 total (was 149) |
| +1 | mvnsite | 2m 28s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 24s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 39s | the patch passed |
| +1 | javadoc | 1m 44s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 49s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 18m 45s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | unit | 13m 39s | hadoop-yarn-services-core in the patch passed. |
| +1 | unit | 1m 42s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 57s | The patch does not generate ASF License warnings. |
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642483#comment-16642483 ] Suma Shivaprasad commented on YARN-8569: 

[~eyang] Patch 10 looks good. A few minor checkstyle issues would need to be resolved.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642465#comment-16642465 ] Hadoop QA commented on YARN-8569: -

*+1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 25s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 25s | Maven dependency ordering for branch |
| +1 | mvninstall | 21m 42s | trunk passed |
| +1 | compile | 13m 19s | trunk passed |
| +1 | checkstyle | 1m 40s | trunk passed |
| +1 | mvnsite | 2m 47s | trunk passed |
| +1 | shadedclient | 16m 5s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 24s | trunk passed |
| +1 | javadoc | 1m 54s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 0s | the patch passed |
| +1 | compile | 8m 29s | the patch passed |
| +1 | cc | 8m 29s | the patch passed |
| +1 | javac | 8m 29s | the patch passed |
| -0 | checkstyle | 1m 25s | hadoop-yarn-project/hadoop-yarn: The patch generated 14 new + 146 unchanged - 1 fixed = 160 total (was 147) |
| +1 | mvnsite | 2m 35s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 21s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 51s | the patch passed |
| +1 | javadoc | 1m 49s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 49s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 18m 58s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | unit | 13m 17s | hadoop-yarn-services-core in the patch passed. |
| +1 | unit | 0m 22s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 37s | The patch does not generate ASF License warnings. |
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642226#comment-16642226 ] Eric Yang commented on YARN-8569: - Patch 10 renamed YARN_CONTAINER_RUNTIME_YARN_SYSFS to YARN_CONTAINER_RUNTIME_YARN_SYSFS_ENABLE.
> Create an interface to provide cluster information to application
> -
>
> Key: YARN-8569
> URL: https://issues.apache.org/jira/browse/YARN-8569
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Yang
> Assignee: Eric Yang
> Priority: Major
> Labels: Docker
> Attachments: YARN-8569 YARN sysfs interface to provide cluster information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch, YARN-8569.009.patch, YARN-8569.010.patch
>
> Some programs require container hostnames to be known for the application to run. For example, distributed TensorFlow requires a launch_command that looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
> --ps_hosts=ps0.example.com:,ps1.example.com: \
> --worker_hosts=worker0.example.com:,worker1.example.com: \
> --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
> --ps_hosts=ps0.example.com:,ps1.example.com: \
> --worker_hosts=worker0.example.com:,worker1.example.com: \
> --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
> --ps_hosts=ps0.example.com:,ps1.example.com: \
> --worker_hosts=worker0.example.com:,worker1.example.com: \
> --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
> --ps_hosts=ps0.example.com:,ps1.example.com: \
> --worker_hosts=worker0.example.com:,worker1.example.com: \
> --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN services launch_command. In addition, the dynamic parameters do not work with the YARN flex command. This is the classic pain point for application developers attempting to automate system environment settings as parameters to the end user application.
> It would be great if the YARN Docker integration could provide a simple option to expose the hostnames of the YARN service via a mounted file. The file content gets updated when a flex command is performed. This allows application developers to consume system environment settings via a standard interface. It is like /proc/devices for Linux, but for Hadoop. This may involve updating a file in the distributed cache and allowing mounting of the file via container-executor.
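With the renamed flag, enabling the sysfs mount is an opt-in per component through the service spec's env map. A minimal sketch of such a spec (the service name, image, resources, and launch command here are illustrative, not from the patch):
{code}
{
  "name": "sleeper-service",
  "version": "1.0",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 2,
      "launch_command": "sleep 90000",
      "artifact": {
        "id": "library/centos:latest",
        "type": "DOCKER"
      },
      "resource": {
        "cpus": 1,
        "memory": "256"
      },
      "configuration": {
        "env": {
          "YARN_CONTAINER_RUNTIME_YARN_SYSFS_ENABLE": "true"
        }
      }
    }
  ]
}
{code}
When the flag is set, the cluster spec is exposed read-only inside the container; the mount target in this patch is /hadoop/yarn/sysfs, as the mount listing later in this digest shows.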
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640484#comment-16640484 ] Suma Shivaprasad commented on YARN-8569: [~eyang] YARN_CONTAINER_RUNTIME_YARN_SYSFS_ENABLE sounds good. Thanks!
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640461#comment-16640461 ] Eric Yang commented on YARN-8569: - [~suma.shivaprasad] Good catch: the handleYarnSysFSUpdate method was outdated code; it will be removed in the next patch. YARN_CONTAINER_RUNTIME_YARN_SYSFS follows the existing nomenclature and generates a readable flag in the launcher script that looks like:
{code}
export YARN_CONTAINER_RUNTIME_YARN_SYSFS=true
{code}
This looks better than "export YARN_CONTAINER_RUNTIME_EXPORT_YARN_SYSFS=true" in my opinion. The existing nomenclature looks like this: YARN_CONTAINER_RUNTIME_[TYPE|DOCKER_*|YARN_SYSFS]. How about YARN_CONTAINER_RUNTIME_YARN_SYSFS_ENABLE=true?
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640432#comment-16640432 ] Suma Shivaprasad commented on YARN-8569: Thanks [~eyang] for updating the patch. Looks like DockerLinuxContainerRuntime.handleYarnSysFSUpdate is not used. If so, can we remove it? YARN_CONTAINER_RUNTIME_YARN_SYSFS - could we rename this to YARN_CONTAINER_RUNTIME_EXPORT_YARN_SYSFS for a better understanding of its use, and update the documentation to note that it should be set to true/false?
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640392#comment-16640392 ] Hadoop QA commented on YARN-8569: -
| (x) -1 overall |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 33s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for branch |
| +1 | mvninstall | 21m 56s | trunk passed |
| +1 | compile | 11m 31s | trunk passed |
| +1 | checkstyle | 1m 32s | trunk passed |
| +1 | mvnsite | 2m 44s | trunk passed |
| +1 | shadedclient | 16m 16s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 16s | trunk passed |
| +1 | javadoc | 1m 54s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 52s | the patch passed |
| +1 | compile | 7m 51s | the patch passed |
| +1 | cc | 7m 51s | the patch passed |
| +1 | javac | 7m 51s | the patch passed |
| -0 | checkstyle | 1m 24s | hadoop-yarn-project/hadoop-yarn: The patch generated 11 new + 149 unchanged - 1 fixed = 160 total (was 150) |
| +1 | mvnsite | 2m 29s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 17s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 4m 15s | the patch passed |
| +1 | javadoc | 2m 22s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 59s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 20m 15s | hadoop-yarn-server-nodemanager in the patch passed. |
| -1 | unit | 14m 10s | hadoop-yarn-services-core in the patch failed. |
| +1 | unit | 0m 31s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 50s | The patch does not generate ASF License warnings. |
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640262#comment-16640262 ] Eric Yang commented on YARN-8569: - [~suma.shivaprasad] Patch 009 includes the following fixes (a sketch of the first two follows the list):
- Synchronized updateYarnSysFS.
- Fixed file descriptor leaks.
- Renamed service.json to app.json.
- Removed PrivilegedOperationException from ContainerExecutor to prevent the implementation bleeding into the abstract class.
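The first two items boil down to serializing writers and closing streams deterministically. A minimal sketch of the pattern, assuming a hypothetical helper class (the class and method shown here are illustrative, not the actual patch code):
{code}
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SysFsWriter {
  private final Object lock = new Object();

  // Serialize writers so concurrent updates on the same host cannot
  // interleave, and let try-with-resources close the stream even when
  // the write fails, so no file descriptors leak.
  public void updateYarnSysFS(Path sysfsDir, String clusterSpecJson)
      throws IOException {
    synchronized (lock) {
      Path appJson = sysfsDir.resolve("app.json");
      try (Writer out = Files.newBufferedWriter(appJson,
          StandardCharsets.UTF_8)) {
        out.write(clusterSpecJson);
      }
    }
  }
}
{code}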
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640186#comment-16640186 ] Eric Yang commented on YARN-8569: - [~suma.shivaprasad] The list of hosts that require the update is normalized in the AM logic first, so one call is made per node instead of one per container. Multiple writers to the same file shouldn't happen, but I will synchronize the file write access just in case. Thanks
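In other words, the AM collapses the per-container hostnames into one entry per node before pushing the spec. A rough sketch of that normalization (the syncNode callback stands in for whatever transport the AM uses; all names here are illustrative):
{code}
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

public final class SysFsSync {
  private SysFsSync() {
  }

  // Collapse the per-container host list into one entry per node, then
  // issue a single sync call per node rather than one per container.
  public static void syncOncePerNode(List<String> containerHosts,
      Consumer<String> syncNode) {
    Set<String> nodes = new LinkedHashSet<>(containerHosts);
    for (String node : nodes) {
      syncNode.accept(node);
    }
  }
}
{code}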
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640094#comment-16640094 ] Suma Shivaprasad commented on YARN-8569: [~eyang] Missed this earlier. There could be potential race conditions between containers running on the same host while executing updateYarnSysFs since they are writing to the same file. Can we synchronize the file writes/deletes?
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639178#comment-16639178 ] Eric Yang commented on YARN-8569: - [~suma.shivaprasad] Thanks for the review. appinfo.json is probably a better choice than AMInfo.json because the API can be used by code other than the Application Master, even though that might be rare. However, the "info" in appinfo.json adds nothing to the file name. How about app.json for short, for easier nomenclature? Good catch on the file close; it will be updated in the next patch.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639205#comment-16639205 ] Suma Shivaprasad commented on YARN-8569: [~eyang] app.json sounds good. Thanks
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639012#comment-16639012 ] Suma Shivaprasad commented on YARN-8569: Thanks [~eyang]. I agree with [~leftnoteasy] that the local file name should be renamed to something like "appInfo.json" or "AMInfo.json" instead of service.json, since it could be used by custom AMs other than YARN Service, such as the Spark AM. The patch generally LGTM. A few minor comments: ServiceClient.addFilesToCompression and DockerLinuxContainerRuntime.handleYarnSysFSUpdate are opening *Stream classes which could be closed in a finally block, as in addYarnSysFs, to prevent resource leaks.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609755#comment-16609755 ] Hadoop QA commented on YARN-8569: -
| (/) +1 overall |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 25s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for branch |
| +1 | mvninstall | 17m 33s | trunk passed |
| +1 | compile | 9m 1s | trunk passed |
| +1 | checkstyle | 1m 9s | trunk passed |
| +1 | mvnsite | 2m 21s | trunk passed |
| +1 | shadedclient | 14m 12s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 1s | trunk passed |
| +1 | javadoc | 1m 43s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 47s | the patch passed |
| +1 | compile | 8m 52s | the patch passed |
| +1 | cc | 8m 52s | the patch passed |
| +1 | javac | 8m 52s | the patch passed |
| -0 | checkstyle | 1m 17s | hadoop-yarn-project/hadoop-yarn: The patch generated 4 new + 147 unchanged - 1 fixed = 151 total (was 148) |
| +1 | mvnsite | 2m 17s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 10s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 23s | the patch passed |
| +1 | javadoc | 1m 48s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 49s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 18m 58s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | unit | 12m 25s | hadoop-yarn-services-core in the patch passed. |
| +1 | unit | 0m 16s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 29s | The patch does not generate ASF License warnings. |
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609645#comment-16609645 ] Eric Yang commented on YARN-8569: - Patch 008 fixes more checkstyle issues and changes the sync sysfs API to be keyed on the application id instead of the combination of application id and container id. This reduces the number of network requests and the repetitive syncing of the same cluster spec information.
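For illustration, syncing once per application could look like a single PUT of the spec to each NodeManager. The route below is an assumption modeled on the NM web services layout and is not confirmed by this thread; the user and application id are taken from the mount listing later in this digest:
{code}
# Hypothetical sketch of the per-application sync call. The /ws/v1/node/yarn/sysfs
# path is an assumption -- check NMWebServices in the committed code for the
# route the patch actually registers.
curl -X PUT \
  -H "Content-Type: application/json" \
  --data @app.json \
  "http://nm-host.example.com:8042/ws/v1/node/yarn/sysfs/hbase/application_1536164978190_0001"
{code}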
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608313#comment-16608313 ] Hadoop QA commented on YARN-8569: -
| (/) +1 overall |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 15s | Maven dependency ordering for branch |
| +1 | mvninstall | 17m 23s | trunk passed |
| +1 | compile | 7m 57s | trunk passed |
| +1 | checkstyle | 1m 16s | trunk passed |
| +1 | mvnsite | 2m 42s | trunk passed |
| +1 | shadedclient | 15m 31s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 2m 52s | trunk passed |
| +1 | javadoc | 1m 30s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 43s | the patch passed |
| +1 | compile | 8m 1s | the patch passed |
| +1 | cc | 8m 1s | the patch passed |
| +1 | javac | 8m 1s | the patch passed |
| -0 | checkstyle | 1m 10s | hadoop-yarn-project/hadoop-yarn: The patch generated 8 new + 147 unchanged - 1 fixed = 155 total (was 148) |
| +1 | mvnsite | 2m 4s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 46s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 13s | the patch passed |
| +1 | javadoc | 1m 32s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 43s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 19m 1s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | unit | 12m 57s | hadoop-yarn-services-core in the patch passed. |
| +1 | unit | 0m 24s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 39s | The patch does not generate ASF License warnings. |
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607802#comment-16607802 ] Hadoop QA commented on YARN-8569: -
| (x) -1 overall |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 30s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 26s | Maven dependency ordering for branch |
| +1 | mvninstall | 23m 16s | trunk passed |
| +1 | compile | 12m 32s | trunk passed |
| +1 | checkstyle | 1m 47s | trunk passed |
| +1 | mvnsite | 3m 17s | trunk passed |
| +1 | shadedclient | 17m 44s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 16s | trunk passed |
| +1 | javadoc | 1m 56s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 56s | the patch passed |
| +1 | compile | 9m 50s | the patch passed |
| +1 | cc | 9m 50s | the patch passed |
| +1 | javac | 9m 50s | the patch passed |
| -0 | checkstyle | 1m 30s | hadoop-yarn-project/hadoop-yarn: The patch generated 43 new + 149 unchanged - 1 fixed = 192 total (was 150) |
| +1 | mvnsite | 2m 32s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 21s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| -1 | findbugs | 0m 57s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) |
| +1 | javadoc | 1m 48s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 45s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 18m 31s | hadoop-yarn-server-nodemanager in the patch passed. |
| -1 | unit | 17m 0s | hadoop-yarn-services-core in the patch failed. |
| +1 | unit | 0m 33s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607714#comment-16607714 ] Eric Yang commented on YARN-8569: - [~leftnoteasy] Patch 6 addressed the following:
- Use the localizer to distribute the initial copy of service.json in a tarball.
- Mount the expanded sysfs.tar to the container.
- Added logic to replace the local copy of sysfs.tar with the latest status after the service reaches a stable state.
- Added test cases at the Java and C level.
- On-off feature switch in container-executor.cfg to disable the feature.
I didn't make the sysfs REST API a generic tarball-replacement mechanism for the distributed cache, to prevent people from abusing the API for unintended purposes. If you still want that generic feature, please open a separate JIRA for it.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604686#comment-16604686 ] Eric Yang commented on YARN-8569: - The YARN localizer only supports tarballs, archives, and individual files; it does not support a directory containing files. This causes Docker to mount the path to a specific file instead of a directory of files. When patch 1 and patch 5 are merged, we get a conflict from mounting the same path twice, once as a directory and once as a file inside that directory:
{code}
{
    "Type": "bind",
    "Source": "/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1536164978190_0001/container_1536164978190_0001_01_05/sysfs",
    "Destination": "/hadoop/yarn/sysfs",
    "Mode": "ro",
    "RW": false,
    "Propagation": "rprivate"
},
{
    "Type": "bind",
    "Source": "/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1536164978190_0001/filecache/10/service.json",
    "Destination": "/hadoop/yarn/sysfs/service.json",
    "Mode": "ro",
    "RW": false,
    "Propagation": "rprivate"
},
{code}
Docker errors out at this point. Therefore, I am going to generate an archive (tarball) containing service.json, let the localizer decompress it into a directory, and mount that directory. The follow-up update logic will locate the localizer directory and replace the information in it. This is why the next patch goes through the step of compressing the service.json file into a tarball (a rough sketch of this packaging step follows this message).
> Create an interface to provide cluster information to application
> -----------------------------------------------------------------
>
> Key: YARN-8569
> URL: https://issues.apache.org/jira/browse/YARN-8569
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Yang
> Assignee: Eric Yang
> Priority: Major
> Labels: Docker
> Attachments: YARN-8569 YARN sysfs interface to provide cluster information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch
>
> Some programs require container hostnames to be known for the application to run. For example, distributed TensorFlow requires a launch_command that looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:,ps1.example.com: \
>     --worker_hosts=worker0.example.com:,worker1.example.com: \
>     --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:,ps1.example.com: \
>     --worker_hosts=worker0.example.com:,worker1.example.com: \
>     --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:,ps1.example.com: \
>     --worker_hosts=worker0.example.com:,worker1.example.com: \
>     --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:,ps1.example.com: \
>     --worker_hosts=worker0.example.com:,worker1.example.com: \
>     --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN services launch_command. In addition, the dynamic parameters do not work with the YARN flex command. This is the classic pain point for application developers attempting to automate system environment settings as parameters to the end user application.
> It would be great if the YARN Docker integration could provide a simple option to expose the hostnames of the YARN service via a mounted file. The file content gets updated when a flex command is performed. This allows application developers to consume system environment settings via a standard interface. It is like /proc/devices for Linux, but for Hadoop. This may involve updating a file in the distributed cache and allowing the file to be mounted via container-executor.
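A minimal sketch of the tarball step described above, assuming commons-compress (already a Hadoop dependency); the class and entry names are made up for illustration and are not the patch's actual code:
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;

public class SysFsArchiver {
  /**
   * Packs service.json under a sysfs/ directory entry so the localizer can
   * decompress it into a directory that Docker can bind-mount as a whole.
   */
  public static void pack(Path serviceJson, Path tarball) throws IOException {
    try (TarArchiveOutputStream tar =
        new TarArchiveOutputStream(Files.newOutputStream(tarball))) {
      TarArchiveEntry entry = new TarArchiveEntry("sysfs/service.json");
      entry.setSize(Files.size(serviceJson));
      tar.putArchiveEntry(entry);
      Files.copy(serviceJson, tar);  // stream the file bytes into the archive
      tar.closeArchiveEntry();
    }
  }
}
{code}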
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603707#comment-16603707 ] Wangda Tan commented on YARN-8569: -- Thanks [~eyang]. One thing I forgot to mention: if we're going to add new functionality to c-e, it would be helpful to add a flag to c-e.cfg and make it off by default, just like what we did for YARN-6623 and YARN-6033, to avoid exposing security holes.
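For illustration, such an opt-in flag in container-executor.cfg might look like the following; the flag name here is hypothetical at this point in the discussion:
{code}
[docker]
  module.enabled=true
  # disabled unless the operator explicitly opts in
  feature.yarn.sysfs.enabled=1
{code}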
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603686#comment-16603686 ] Eric Yang commented on YARN-8569: - [~leftnoteasy] {quote}How can we make it available prior to container launch? By localizing service.json for each launched container?{quote} Yes; the new file then replaces the old one by removing and recreating the symlink that points at it. I need to experiment with this.
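A minimal sketch of that symlink-swap idea (paths and names illustrative, not the patch's code): create the new link under a temporary name, then atomically rename it over the old one so readers never observe a missing file.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SymlinkSwap {
  /** Repoints 'link' at 'newTarget' without a window where the link is absent. */
  public static void swap(Path link, Path newTarget) throws IOException {
    Path tmp = link.resolveSibling(link.getFileName() + ".tmp");
    Files.deleteIfExists(tmp);
    Files.createSymbolicLink(tmp, newTarget);
    // Moving a symlink moves the link itself, not its target.
    Files.move(tmp, link, StandardCopyOption.ATOMIC_MOVE,
        StandardCopyOption.REPLACE_EXISTING);
  }
}
{code}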
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603680#comment-16603680 ] Wangda Tan commented on YARN-8569: -- [~eyang], How can we make it available prior to container launch? By localizing service.json for each launched container? {quote}We could also make a subdirectory in nmPrivate to ensure that the prefix path is not wide open for exploit{quote} This also sounds good :).
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602513#comment-16602513 ] Eric Yang commented on YARN-8569: - {quote}As mentioned in the doc, the info appears with a possible delay after container launch, so containers need additional logic to discover what resources were allocated to them. However, such "sys info" for individual containers should be created when the container is launched. (Does it make sense for "/sys/fs/cgroups" to be created minutes after the OS launches a process?){quote} I plan to do another round to merge patch 1 and patch 5 so that resource information is presented prior to container launch. However, IP addresses will not be available until Docker reports them; the delay for obtaining IPs is unavoidable. {quote}And in implementation, the AM should have the ability to write to an arbitrary file instead of being hard-coded to "service.json".{quote} This restriction is to prevent a hijacked yarn user from leaking more sensitive information, such as HDFS delegation tokens. We could also make a subdirectory in nmPrivate to ensure that the prefix path is not wide open for exploit.
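An illustrative layout of that nmPrivate idea, with directory and file names hypothetical: keeping the staging copy under a dedicated nmPrivate subdirectory means container-executor only ever copies out of an NM-owned prefix.
{code}
nm-local-dir/
  nmPrivate/
    application_1536164978190_0001/
      sysfs/               # NM-owned staging area for the spec
        app.json           # copied into the container's sysfs mount by c-e
{code}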
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602350#comment-16602350 ] Wangda Tan commented on YARN-8569: -- And in implementation, the AM should have the ability to write to an arbitrary file instead of being hard-coded to "service.json".
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602345#comment-16602345 ] Wangda Tan commented on YARN-8569: -- [~eyang], I still think it is a bad idea to support "sys info" by using the API. As mentioned in the doc, the info appears with a possible delay after container launch, so containers need additional logic to discover what resources were allocated to them. However, such "sys info" for individual containers should be created when the container is launched. (Does it make sense for "/sys/fs/cgroups" to be created minutes after the OS launches a process?) Second, given this is completely created/managed by the AM, we should never call it "/hadoop/yarn/sysfs" or "cluster info", etc., nor name the APIs that way, given it is not controlled by YARN. It should be renamed to "am-info" or "app-info".
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599318#comment-16599318 ] genericqa commented on YARN-8569: - *-1 overall*
|| Vote || Subsystem || Runtime || Comment ||
|| || Prechecks || || ||
| 0 | reexec | 0m 20s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || trunk Compile Tests || || ||
| 0 | mvndep | 1m 26s | Maven dependency ordering for branch |
| +1 | mvninstall | 18m 0s | trunk passed |
| +1 | compile | 8m 4s | trunk passed |
| +1 | checkstyle | 1m 21s | trunk passed |
| +1 | mvnsite | 2m 41s | trunk passed |
| +1 | shadedclient | 15m 19s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 3m 18s | trunk passed |
| +1 | javadoc | 1m 46s | trunk passed |
|| || Patch Compile Tests || || ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| -1 | mvninstall | 0m 20s | hadoop-yarn-server-nodemanager in the patch failed. |
| -1 | compile | 1m 11s | hadoop-yarn in the patch failed. |
| -1 | cc | 1m 11s | hadoop-yarn in the patch failed. |
| -1 | javac | 1m 11s | hadoop-yarn in the patch failed. |
| -0 | checkstyle | 1m 8s | hadoop-yarn-project/hadoop-yarn: The patch generated 28 new + 130 unchanged - 1 fixed = 158 total (was 131) |
| -1 | mvnsite | 0m 22s | hadoop-yarn-server-nodemanager in the patch failed. |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | shadedclient | 3m 50s | patch has errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| -1 | findbugs | 0m 23s | hadoop-yarn-server-nodemanager in the patch failed. |
| -1 | javadoc | 0m 22s | hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 1 new + 9 unchanged - 0 fixed = 10 total (was 9) |
|| || Other Tests || || ||
| +1 | unit | 0m 38s | hadoop-yarn-api in the patch passed. |
| -1 | unit | 0m 23s | hadoop-yarn-server-nodemanager in the patch failed. |
| -1 | unit | 16m 53s | hadoop-yarn-services-core in the patch failed. |
| +1 | unit | 0m 14s | hadoop-yarn-site in the patch passed. |
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599266#comment-16599266 ] Eric Yang commented on YARN-8569: - Patch 005 fixed the whitespace and findbugs issues.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598002#comment-16598002 ] Eric Yang commented on YARN-8569: - Patch 004: minor code cleanup, plus a unit test for the LinuxContainerExecutor changes.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597940#comment-16597940 ] Eric Yang commented on YARN-8569: - Patch 003 added the following improvements:
- unit tests for container-executor
- unit tests for ServiceScheduler in the YARN service AM
- site documentation to explain the YARN sysfs concept
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596578#comment-16596578 ] Eric Yang commented on YARN-8569: - {quote}There were tons of debates before about whether the yarn user should be treated as root. We have seen c-e issues that allowed the yarn user to manipulate other users' directories or directly escalate to root. All of these issues became CVEs.{quote} If I recall correctly, I reported and fixed container-executor security issues like YARN-7590 and YARN-8207. I have written proper security checks to make sure the caller over the network has a Kerberos TGT that matches the end user's container directory, and I validated that the source of the data comes from the node manager private directory. There is a permission validation when copying spec file information from the node manager private directory to the end user's container directory. This design is similar to how delegation tokens are transported to the container working directory. I believe there are enough security validations to ensure no security hole has been added by this work; let me know if you find one. {quote}From YARN's design standpoint, ideally all NM/RM logic should be as general as possible; all service-related functionality should be handled by a service framework such as the API server or ServiceMaster. I really don't like the idea of adding a service-specific API to the NM API.{quote} The new API is not specific to the YARN service framework. ContainerExecutor provides basic APIs for starting, stopping, and cleaning up containers, but it is missing more sophisticated APIs such as synchronizing configuration among containers. The new syncYarnSysFS API is proposed to allow a ContainerExecutor developer to write their own implementation for populating text information into /hadoop/yarn/sysfs. A custom AM can be written to use the new API to populate other text information. The newly added node manager API is generic and avoids the double serialization cost that exists in the container manager protobuf and RPC code. There is no extra serialization cost for the content during transport, so the new API is more efficient and lightweight. Nothing about it is specific to YARN service, although YARN service is the first consumer of this API. {quote}1) ServiceMaster ro mounts a local directory (under the container's local dir) when launching the Docker container (for example: ./service-info -> /service/sys/fs/); 2) ServiceMaster requests re-localization of the new service spec JSON file into the ./service-info folder.{quote} What is "ro mount" in the first sentence: remote mount or read-only mount? The ServiceMaster is not guaranteed to run on the same node as the other containers, so there is no practical way to mount the service master's directory into a container's local directory across nodes.
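For readers following along, a sketch of what such a hook on ContainerExecutor could look like; the method name comes from this comment, but the signature and default behavior are assumptions, not the committed API:
{code}
import java.io.IOException;

public abstract class ContainerExecutor {
  // ... existing lifecycle methods: init, launchContainer, signalContainer ...

  /**
   * Push application-provided text (for example, the service spec) into the
   * container's /hadoop/yarn/sysfs mount. The default implementation refuses,
   * so only executors that support yarn sysfs need to override it.
   */
  public void syncYarnSysFS(String user, String appId, String spec)
      throws IOException {
    throw new UnsupportedOperationException(
        "yarn sysfs is not supported by this ContainerExecutor");
  }
}
{code}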
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596014#comment-16596014 ] Wangda Tan commented on YARN-8569: -- [~eyang], {quote}Unless a malicious user has already hacked into the yarn user account and can populate data as the yarn user, there is no easy parameter manipulation of container-executor that triggers an exploit.{quote} There were tons of debates before about whether the yarn user should be treated as root. We have seen c-e issues that allowed the yarn user to manipulate other users' directories or directly escalate to root. All of these issues became CVEs. {quote}This is the reason this solution was invented: to lower the bar for writing clustering software for Hadoop.{quote} It would help if you could share some real-world examples. From YARN's design standpoint, ideally all NM/RM logic should be as general as possible; all service-related functionality should be handled by a service framework such as the API server or ServiceMaster. I really don't like the idea of adding a service-specific API to the NM API. If you do think updating the service spec JSON file is important, another approach could be: 1) ServiceMaster ro mounts a local directory (under the container's local dir) when launching the Docker container (for example: ./service-info -> /service/sys/fs/); 2) ServiceMaster requests re-localization of the new service spec JSON file into the ./service-info folder.
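If "ro" here means read-only, step 1 of this alternative would amount to a read-only bind mount at docker run time, along these lines (paths illustrative, reusing the application id from earlier in the thread):
{code}
docker run \
  -v /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1536164978190_0001/container_1536164978190_0001_01_05/service-info:/service/sys/fs:ro \
  ...
{code}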
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595986#comment-16595986 ] Eric Yang commented on YARN-8569: - {quote}1) Frequently syncing the service spec could cause cluster performance issues: a service with one thousand component instances can cause several thousand DFS calls for any service state change. Even if the feature is on-demand, a non-malicious user could still cause a big performance impact on a cluster.{quote} There are no DFS calls in patch 2. The spec information is transferred from the AM's memory to the node manager's local directory via lightweight HTTP REST calls. Ten years ago, I would have agreed that one thousand HTTP requests copying spec information to all node managers could choke an older generation of network equipment from 2005. However, networking has advanced a lot in the past 10 years; most 10GB Ethernet NICs can achieve [one million IOPS|https://phys.org/news/2010-01-intel-million-iops-gigabit-ethernet.html]. Doing 1000 REST calls with 4-kilobyte payloads can complete in 1-5 milliseconds and requires only 4MB of bandwidth. Based on my local testing, the synchronization happens in a fraction of a millisecond. The described performance impact is hard to achieve because launching Docker containers and reaching a stable state takes a fair amount of time. Spec distribution only happens once all information has been reported to the AM and the AM has reached a stable state. The spec distribution is so minuscule that the machine stays essentially idle throughout the operation. {quote}2) The changes in this patch are very involved, all the way from native service to C code. It seems that local_dir is passed in the user's command; is it possible for a malicious user to use c-e to update another user's service spec, which could potentially cause security issues? I haven't checked the details yet; I just want to point out possibilities.{quote} A malicious user cannot use container-executor to update another user's service spec, because container-executor checks that the source directory is owned by the node manager before it proceeds with the update. This means the source of truth all comes from the yarn user. Unless a malicious user has already hacked into the yarn user account and can populate data as the yarn user, there is no easy parameter manipulation of container-executor that triggers an exploit. The technique for passing local_dir is the same as for container launch, and c-e checks that local_dir is owned by the yarn user. This allows us to take local_dir from yarn-site.xml instead of hard-coding local directories in container-executor.cfg. Hence, this new container-executor patch adds no additional opportunity for exploits. {quote}Instead of doing this, to me it would be very useful to write container-specific information to the spec or even to ENV, such as resources, etc. It requires far fewer changes and addresses 70-80% of the issues.{quote} Container information is already updated in the spec file on HDFS by the YARN service application master; that is outside the scope of this JIRA. The limitation is that this information on HDFS remains inaccessible to a container that has no knowledge of the Hadoop API. This is why the information is exposed as a file mounted into the container, without requiring the container to carry any custom code to process cluster information. The ENV approach has been identified as a non-starter because there is no way to update environment variables once a Docker container has started running, for example after a flex command, or when a container restarts on another node. ENV variables would carry outdated information because they are immutable to cluster changes. This is the reason this solution was invented: to lower the bar for writing clustering software for Hadoop.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595949#comment-16595949 ] Wangda Tan commented on YARN-8569: -- [~eyang], As we discussed offline, the use case is not clear to me. For example, TF doesn't assume the task set will change during a job run, and TF_CONFIG (which includes the endpoints of all instances) can be calculated upfront, before launching the service. There could be some services that would benefit from the spec; however, we cannot find a strong real-world example to support this feature. In spite of the above comment, to me it is always good to have new features added. However, there are two big concerns that are not addressed: 1) Frequently syncing the service spec could cause cluster performance issues: a service with one thousand component instances can cause several thousand DFS calls for any service state change. Even if the feature is on-demand, a non-malicious user could still cause a big performance impact on a cluster. 2) The changes in this patch are very involved, all the way from native service to C code. It seems that local_dir is passed in the user's command; is it possible for a malicious user to use c-e to update another user's service spec, which could potentially cause security issues? I haven't checked the details yet; I just want to point out possibilities. Instead of doing this, to me it would be very useful to write container-specific information to the spec or even to ENV, such as resources, etc. It requires far fewer changes and addresses 70-80% of the issues.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595777#comment-16595777 ] Eric Yang commented on YARN-8569: - Patch 2 is a fully working implementation. When the YARN service becomes stable, the YARN service application master populates the container information into /hadoop/yarn/sysfs/service.json. I will add test cases and documentation in patch 3.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589430#comment-16589430 ] Eric Yang commented on YARN-8569: - From today's Docker meetup discussion: the distributed cache is not an ideal interface for replicating frequently changing cluster information. If the file checksum changes due to a cluster information update, the file may not get replicated to the distributed cache. This information is more akin to token generation and population than to jar file distribution. I will change the population mechanism to align with token population instead of the distributed cache.
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586769#comment-16586769 ] Eric Yang commented on YARN-8569: - First patch to demo what the interface looks like. Here is my test JSON for launching the app:
{code}
{
  "name": "sleeper-service",
  "version": "1.0",
  "components": [
    {
      "name": "ping",
      "number_of_containers": 2,
      "artifact": {
        "id": "hadoop/centos:latest",
        "type": "DOCKER"
      },
      "launch_command": "sleep,1",
      "resource": {
        "cpus": 1,
        "memory": "256"
      },
      "restart_policy": "NEVER",
      "configuration": {
        "env": {
          "YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL": "true",
          "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true",
          "YARN_CONTAINER_RUNTIME_YARN_SYSFS": "true"
        },
        "properties": {
          "docker.network": "host"
        }
      }
    }
  ]
}
{code}
The patch localizes a copy of service.json in HDFS and distributes it as a localized resource to the nodes that are going to start the container. The other copy, [appname].json, is not used because the content of that file changes too frequently, which triggers an IOException while localizing. Let me know if this is the direction we want to continue. If all looks good, I will provide live updates to this file when the application transitions between STARTED, FLEXING, STABLE, etc.
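For anyone who wants to try it, a spec like the one above is launched with the YARN services CLI:
{code}
yarn app -launch sleeper-service sleeper-service.json
{code}
It can then be flexed, for example with {{yarn app -flex sleeper-service -component ping 3}}, to exercise the update path.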
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576681#comment-16576681 ] Shane Kumpf commented on YARN-8569:
---

Thanks for filing this [~eyang]. I have a use case that could benefit from this as well. When running in containers, one challenging piece is determining how much CPU and memory were allocated to the container. Traditional OS tooling shows the totals from the host. This is especially problematic for tools like Ambari, which use OS tooling to dynamically set configuration. Exposing the resource request details via this mechanism could solve this problem.
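To illustrate the problem Shane describes, a short sketch comparing the host view from /proc/meminfo with the container's cgroup memory limit; this assumes a cgroup v1 layout and is only an illustration of why OS tooling misleads inside containers:

{code}
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class MemoryCheck {
  public static void main(String[] args) throws Exception {
    // First line of /proc/meminfo (MemTotal) reflects the *host*, not the container.
    String hostMem = Files.readAllLines(Paths.get("/proc/meminfo")).get(0);
    // The cgroup v1 limit reflects what was actually granted to this container.
    String cgroupLimit = new String(Files.readAllBytes(
        Paths.get("/sys/fs/cgroup/memory/memory.limit_in_bytes")),
        StandardCharsets.UTF_8).trim();
    System.out.println("host view    : " + hostMem);
    System.out.println("cgroup limit : " + cgroupLimit + " bytes");
  }
}
{code}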
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575445#comment-16575445 ] Eric Yang commented on YARN-8569:
-

Sysfs is a pseudo file system provided by the Linux kernel to expose system information to user space. YARN can mimic the same idea to export cluster information to containers. The proposal is to expose cluster information at:

{code}
/hadoop/yarn/fs/cluster.json
{code}

This file contains the runtime information about the deployed application and is updated when state changes happen. The file is replicated from HDFS to the host system in the application's appcache.
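A rough sketch of the replication step described above, using the Hadoop FileSystem API; the class name, the HDFS source path, and the appcache destination path are all illustrative assumptions, not what the eventual patch implements:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ClusterJsonReplicator {
  // Copies the AM-maintained cluster.json from HDFS into the node's appcache.
  public static void copyToAppCache(String user, String appId) throws Exception {
    FileSystem hdfs = FileSystem.get(new Configuration());
    // Hypothetical HDFS location of the AM-maintained spec.
    Path src = new Path("/user/" + user + "/.yarn/services/sleeper-service/cluster.json");
    // Hypothetical NodeManager appcache location on the host.
    Path dst = new Path("/tmp/nm-local-dir/usercache/" + user
        + "/appcache/" + appId + "/cluster.json");
    hdfs.copyToLocalFile(src, dst);
  }
}
{code}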
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556348#comment-16556348 ] Eric Yang commented on YARN-8569:
-

[~oliverhuh...@gmail.com] Your approach works fine for applications that carry a Hadoop client or ZooKeeper client. This proposed interface is meant to lower the barrier of entry to obtaining cluster information for non-Hadoop-native applications. That is the main reason to offer a file-based interface on the nodes. The high-level view of the design looks like this:
# The Application Master receives the YARN service JSON from the YARN CLI.
# The Application Master writes the hostname information to the YARN service JSON that resides in /user/${USER}/.yarn/services/[service]/[service].json.
# The file is added to the distributed cache and localized during container launch.
# The file is bind-mounted into the Docker container for consumption at a predefined location.
# A flex operation triggers an update of [service].json and repopulates the distributed cache when the nodes involved in the cluster have changed.

User applications can poll the file for changes from inside the Docker container to be notified of cluster information changes, as sketched below.
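A minimal polling sketch for step 5 above: the containerized application watches the bind-mounted [service].json for modification-time changes. The mount path and poll interval are assumptions; the design only says the file can be polled:

{code}
import java.io.File;

public class ServiceJsonWatcher {
  public static void main(String[] args) throws InterruptedException {
    // Assumed mount point of the localized service JSON inside the container.
    File spec = new File("/hadoop/yarn/sysfs/service.json");
    long lastSeen = spec.lastModified();
    while (true) {
      long now = spec.lastModified();
      if (now != lastSeen) {
        lastSeen = now;
        System.out.println("cluster membership changed; reloading spec");
        // Re-parse the JSON and reconfigure workers here.
      }
      Thread.sleep(5000); // poll every 5 seconds
    }
  }
}
{code}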
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556244#comment-16556244 ] Keqiu Hu commented on YARN-8569:

I can see the value of sharing some information between cluster nodes with this; I just want to share how we tackle the problem. We do this by storing the information in the Application Master; each worker node has a TaskExecutor that heartbeats with the AM to get the latest cluster information. How do you ensure the file update is atomic, for example, when multiple nodes can modify the mounted file at the same time?
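One common way to keep readers consistent in the face of this concern, offered purely as an illustration rather than as what the patch does: write to a temp file and atomically rename it over the target, so a reader never observes a partial write. Class and paths here are hypothetical:

{code}
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class AtomicJsonWriter {
  public static void publish(Path target, String json) throws Exception {
    Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
    Files.write(tmp, json.getBytes(StandardCharsets.UTF_8));
    // On POSIX file systems this maps to rename(2): readers see either the
    // old file or the new one, never a partially written file.
    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
  }

  public static void main(String[] args) throws Exception {
    publish(Paths.get("/tmp/cluster.json"), "{\"state\":\"STABLE\"}");
  }
}
{code}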