[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657208#comment-16657208 ]

Robert Kanter commented on YARN-8569:
-------------------------------------

[~eyang], I tried it out, and with the YARN-8569.014 patch {{test-container-executor}} fails when run as root:
{noformat}
[root@rkanter-dev hadoop-yarn-server-nodemanager]# target/native/target/usr/local/bin/test-container-executor systest
Attempting to clean up from any previous runs
chmod: cannot access ‘/tmp/test-container-executor’: No such file or directory

Our executable is /root/hadoop-upstream/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/target/usr/local/bin/test-container-executor

Starting tests

test_is_empty()

Testing is_empty function
Directory is not empty /
Could not open directory /tmp/test-container-executor/noexist - No such file or directory
Could not open directory /tmp/test-container-executor/emptydir - No such file or directory
FAIL: /tmp/test-container-executor/emptydir should be empty
{noformat}
It looks like it can't create the {{/tmp/test-container-executor/emptydir}} directory: the extra "Could not open directory ... emptydir" line and the FAIL that follows it do not appear on trunk.
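For reference, {{test_is_empty()}} does roughly the following; this is a condensed, self-contained sketch of the test and its {{is_empty()}} helper, not the exact Hadoop source ({{TEST_ROOT}} is {{/tmp/test-container-executor}}):
{code}
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

#define TEST_ROOT "/tmp/test-container-executor"

/* Sketch of is_empty(): true only if the directory can be opened and
 * holds nothing besides "." and "..". An unopenable path counts as
 * "not empty", which is why the failing run prints both the
 * "Could not open directory" line and the FAIL line. */
static int is_empty(const char *name) {
  DIR *dir = opendir(name);
  if (dir == NULL) {
    fprintf(stderr, "Could not open directory %s - %s\n", name,
            strerror(errno));
    return 0;
  }
  int empty = 1;
  const struct dirent *entry;
  while ((entry = readdir(dir)) != NULL) {
    if (strcmp(entry->d_name, ".") != 0 && strcmp(entry->d_name, "..") != 0) {
      fprintf(stderr, "Directory is not empty %s\n", name);
      empty = 0;
      break;
    }
  }
  closedir(dir);
  return empty;
}

static void test_is_empty(void) {
  printf("\nTesting is_empty function\n");
  if (is_empty("/")) {
    printf("FAIL: / should not be empty\n");
    exit(1);
  }
  if (is_empty(TEST_ROOT "/noexist")) {
    printf("FAIL: " TEST_ROOT "/noexist should not exist\n");
    exit(1);
  }
  /* mkdir()'s return value is not checked: if TEST_ROOT itself was
   * never created during setup, this call fails with ENOENT and the
   * check below trips. */
  mkdir(TEST_ROOT "/emptydir", S_IRWXU);
  if (!is_empty(TEST_ROOT "/emptydir")) {
    printf("FAIL: " TEST_ROOT "/emptydir should be empty\n");
    exit(1);
  }
}

int main(void) {
  test_is_empty();
  return 0;
}
{code}
The pattern fits the failure above: nothing checks that {{TEST_ROOT}} exists before {{mkdir(TEST_ROOT "/emptydir")}}, so if the 014 patch changed how the test root is set up, the first visible symptom would be exactly this "Could not open directory ... emptydir" error.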

Here's the output when run on trunk:
{noformat}
[root@rkanter-dev hadoop-yarn-server-nodemanager]# target/native/target/usr/local/bin/test-container-executor systest
Attempting to clean up from any previous runs
chmod: cannot access ‘/tmp/test-container-executor’: No such file or directory

Our executable is /root/hadoop-upstream/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/target/usr/local/bin/test-container-executor

Starting tests

test_is_empty()

Testing is_empty function
Directory is not empty /
Could not open directory /tmp/test-container-executor/noexist - No such file or directory

Testing recursive_unlink_children()

Testing resolve_config_path()

Testing resolve_config_path

Testing get_user_directory()

Testing check_nm_local_dir()

Testing get_app_directory()

Testing get_container_work_directory()

Testing get_container_launcher_file()

Testing get_container_credentials_file()

Testing get_container_keystore_file()

Testing get_container_truststore_file()

Testing get_app_log_dir()

Testing check_configuration_permissions
File /tmp/test-container-executor must not be world or group writable, but is 777

Testing delete_container()

Testing delete_app()

Testing delete race

Testing is_feature_enabled()
Illegal value '1klajdflkajdsflk' for 'feature.name3.enabled' in configuration. Using default value: 0.
Illegal value 'asdkjfasdkljfklsdjf0' for 'feature.name4.enabled' in configuration. Using default value: 0.
Illegal value '-1' for 'feature.name5.enabled' in configuration. Using default value: 1.
Illegal value '2' for 'feature.name6.enabled' in configuration. Using default value: 0.

Testing test_check_user
Requested user lp is not whitelisted and has id 4,which is below the minimum allowed 500
Running as root is not allowed

Testing clean_docker_cgroups
clean_docker_cgroups: Invalid mount table
clean_docker_cgroups: Invalid yarn_hierarchy
clean_docker_cgroups: Invalid container_id: null
clean_docker_cgroups: Invalid container_id: not_a_container_123

Running test test_signal_container_group in child process

Testing group signal_container
Child container launched as 16189
Killing process group 16189 with 9

Testing init app

Testing launch container without HTTPS
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /tmp/test-container-executor/pid.txt.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...

Testing launch container with HTTPS
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /tmp/test-container-executor/pid.txt.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...

Testing delete_user
baseDir "/tmp/test-container-executor/local-1/usercache/systest/appcache/app_3/test.cfg" is a file and cannot contain subdir "file1".
0
Trying banned default user()

Testing test_check_user
Requested user bin is banned
Running as root is not allowed

Testing test_check_user
User sys not found
Running as root is not allowed

Testing trim function

Finished tests
{noformat}

> Create an interface to provide cluster information to application
> -----------------------------------------------------------------
>
>                 Key: YARN-8569
>                 URL: https://issues.apache.org/jira/browse/YARN-8569
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>              Labels: Docker
>         Attachments: YARN-8569 YARN sysfs interface to provide cluster 
> information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, 
> YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, 
> YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch, 
> YARN-8569.009.patch, YARN-8569.010.patch, YARN-8569.011.patch, 
> YARN-8569.012.patch, YARN-8569.013.patch, YARN-8569.014.patch
>
>
> Some programs require container hostnames to be known for the application to 
> run. For example, distributed TensorFlow requires a launch_command that 
> looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>      --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>      --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>      --job_name=worker --task_index=1
> {code}
> This is cumbersome to orchestrate via Distributed Shell or the YARN services 
> launch_command.  In addition, the dynamic parameters do not work with the 
> YARN flex command.  This is the classic pain point for application 
> developers attempting to automate system environment settings as parameters 
> to the end-user application.
> It would be great if the YARN Docker integration could provide a simple 
> option to expose the hostnames of the YARN service via a mounted file.  The 
> file content would be updated whenever a flex command is performed, allowing 
> application developers to consume system environment settings through a 
> standard interface.  It is like /proc/devices for Linux, but for Hadoop.  
> This may involve updating a file in the distributed cache and allowing the 
> file to be mounted via container-executor.
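> For illustration, a hypothetical in-container consumer of such a mounted 
> file might look like the sketch below.  The mount point 
> /hadoop/yarn/sysfs/app.json and the JSON payload are assumptions made up 
> for this example; the actual path and format are whatever the 
> implementation defines.
> {code}
> /* Hypothetical consumer of the proposed cluster-information file.
>  * The path below is an assumption for illustration only. */
> #include <stdio.h>
>
> int main(void) {
>   /* Re-open on every read: the file would be rewritten when a flex
>    * command changes the component instances, so cached contents go
>    * stale. */
>   FILE *f = fopen("/hadoop/yarn/sysfs/app.json", "r");
>   if (f == NULL) {
>     perror("cluster info not mounted");
>     return 1;
>   }
>   char buf[4096];
>   size_t n;
>   while ((n = fread(buf, 1, sizeof(buf), f)) > 0) {
>     fwrite(buf, 1, n, stdout);  /* e.g. hand off to a JSON parser */
>   }
>   fclose(f);
>   return 0;
> }
> {code}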


