[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657208#comment-16657208 ]
Robert Kanter commented on YARN-8569:
-------------------------------------

[~eyang], I tried it out, and with the YARN-8569 014 patch, test-container-executor fails when run as root:
{noformat}
[root@rkanter-dev hadoop-yarn-server-nodemanager]# target/native/target/usr/local/bin/test-container-executor systest
Attempting to clean up from any previous runs
chmod: cannot access ‘/tmp/test-container-executor’: No such file or directory
Our executable is /root/hadoop-upstream/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/target/usr/local/bin/test-container-executor
Starting tests
test_is_empty()
Testing is_empty function
Directory is not empty /
Could not open directory /tmp/test-container-executor/noexist - No such file or directory
Could not open directory /tmp/test-container-executor/emptydir - No such file or directory
FAIL: /tmp/test-container-executor/emptydir should be empty
{noformat}
It looks like the test cannot create the {{/tmp/test-container-executor/emptydir}} directory.
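For reference, the behavior in the FAIL above is consistent with {{emptydir}} never being created before the emptiness check runs. A minimal Python sketch of that semantics (the {{is_empty}} helper here is an illustrative stand-in, not the actual C implementation in container-executor):

```python
import os
import tempfile

def is_empty(path):
    # Stand-in for the C-side check: a directory that cannot be opened
    # (e.g. it was never created) is an error, not "empty".
    try:
        entries = os.listdir(path)
    except FileNotFoundError:
        print("Could not open directory %s - No such file or directory" % path)
        return False
    return len(entries) == 0

base = tempfile.mkdtemp()
missing = os.path.join(base, "emptydir")

# Without a setup step that creates the directory, the check fails,
# matching the "Could not open directory ... emptydir" line in the log.
print(is_empty(missing))   # False

# Once setup creates the directory, the same check passes.
os.mkdir(missing)
print(is_empty(missing))   # True
```

In other words, the 014 patch appears to break whichever setup step created {{emptydir}} on trunk, rather than the check itself.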
Here's the output when run on trunk:
{noformat}
[root@rkanter-dev hadoop-yarn-server-nodemanager]# target/native/target/usr/local/bin/test-container-executor systest
Attempting to clean up from any previous runs
chmod: cannot access ‘/tmp/test-container-executor’: No such file or directory
Our executable is /root/hadoop-upstream/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/target/usr/local/bin/test-container-executor
Starting tests
test_is_empty()
Testing is_empty function
Directory is not empty /
Could not open directory /tmp/test-container-executor/noexist - No such file or directory
Testing recursive_unlink_children()
Testing resolve_config_path()
Testing resolve_config_path
Testing get_user_directory()
Testing check_nm_local_dir()
Testing get_app_directory()
Testing get_container_work_directory()
Testing get_container_launcher_file()
Testing get_container_credentials_file()
Testing get_container_keystore_file()
Testing get_container_truststore_file()
Testing get_app_log_dir()
Testing check_configuration_permissions
File /tmp/test-container-executor must not be world or group writable, but is 777
Testing delete_container()
Testing delete_app()
Testing delete race
Testing is_feature_enabled()
Illegal value '1klajdflkajdsflk' for 'feature.name3.enabled' in configuration. Using default value: 0.
Illegal value 'asdkjfasdkljfklsdjf0' for 'feature.name4.enabled' in configuration. Using default value: 0.
Illegal value '-1' for 'feature.name5.enabled' in configuration. Using default value: 1.
Illegal value '2' for 'feature.name6.enabled' in configuration. Using default value: 0.
Testing test_check_user
Requested user lp is not whitelisted and has id 4,which is below the minimum allowed 500
Running as root is not allowed
Testing clean_docker_cgroups
clean_docker_cgroups: Invalid mount table
clean_docker_cgroups: Invalid yarn_hierarchy
clean_docker_cgroups: Invalid container_id: null
clean_docker_cgroups: Invalid container_id: not_a_container_123
Running test test_signal_container_group in child process
Testing group signal_container
Child container launched as 16189
Killing process group 16189 with 9
Testing init app
Testing launch container without HTTPS
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /tmp/test-container-executor/pid.txt.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Testing launch container with HTTPS
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /tmp/test-container-executor/pid.txt.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Testing delete_user
baseDir "/tmp/test-container-executor/local-1/usercache/systest/appcache/app_3/test.cfg" is a file and cannot contain subdir "file1".
0
Trying banned default user()
Testing test_check_user
Requested user bin is banned
Running as root is not allowed
Testing test_check_user
User sys not found
Running as root is not allowed
Testing trim function
Finished tests
{noformat}

> Create an interface to provide cluster information to application
> -----------------------------------------------------------------
>
>                 Key: YARN-8569
>                 URL: https://issues.apache.org/jira/browse/YARN-8569
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>              Labels: Docker
>         Attachments: YARN-8569 YARN sysfs interface to provide cluster
> information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch,
> YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch,
> YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch,
> YARN-8569.009.patch, YARN-8569.010.patch, YARN-8569.011.patch,
> YARN-8569.012.patch, YARN-8569.013.patch, YARN-8569.014.patch
>
>
> Some programs require container hostnames to be known for the application to run.
> For example, distributed TensorFlow requires a launch_command that looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>     --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>     --job_name=ps --task_index=0
>
> # On ps1.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>     --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>     --job_name=ps --task_index=1
>
> # On worker0.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>     --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>     --job_name=worker --task_index=0
>
> # On worker1.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>     --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>     --job_name=worker --task_index=1
> {code}
> This is cumbersome to orchestrate via Distributed Shell or the YARN services
> launch_command. In addition, the dynamic parameters do not work with the YARN
> flex command. This is a classic pain point for application developers who
> attempt to automate system environment settings as parameters to the end-user
> application.
> It would be great if the YARN Docker integration could provide a simple
> option to expose the hostnames of the YARN service via a mounted file. The
> file content would be updated when a flex command is performed. This would
> allow application developers to consume system environment settings via a
> standard interface. It is like /proc/devices on Linux, but for Hadoop. This
> may involve updating a file in the distributed cache and allowing the file
> to be mounted via container-executor.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
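As an aside on the quoted proposal: the application side of such a mounted cluster-info file could look like the sketch below. The file path and JSON layout here are purely hypothetical assumptions for illustration; the actual location and format mounted by the YARN-8569 patches are not specified in this message.

```python
import json

# Hypothetical mount point for the cluster-info file described in the
# proposal; the real path used by the patch may differ.
SYSFS_FILE = "/hadoop/yarn/sysfs/app.json"

def build_tf_flags(cluster):
    # Turn a component -> host:port-list mapping into the TensorFlow
    # flags shown in the quoted launch_command.
    ps = ",".join(cluster.get("ps", []))
    workers = ",".join(cluster.get("worker", []))
    return ["--ps_hosts=" + ps, "--worker_hosts=" + workers]

# Stand-in for open(SYSFS_FILE).read(); under the proposal, this content
# would be refreshed whenever a flex command changes the membership.
content = ('{"ps": ["ps0.example.com:2222", "ps1.example.com:2222"],'
           ' "worker": ["worker0.example.com:2222", "worker1.example.com:2222"]}')
cluster = json.loads(content)
print(build_tf_flags(cluster))
```

Reading hosts at launch time from such a file would remove the need to bake per-instance host lists into each launch_command.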