[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime

2019-11-15 Thread Shane Kumpf (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975326#comment-16975326
 ] 

Shane Kumpf commented on YARN-9562:
---

I retested both without the private tmp mount and with my YARN local dirs moved out of 
/tmp; both resolve the issue I previously saw around cleanup. +1 from me.

> Add Java changes for the new RuncContainerRuntime
> -
>
> Key: YARN-9562
> URL: https://issues.apache.org/jira/browse/YARN-9562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9562.001.patch, YARN-9562.002.patch, 
> YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, 
> YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, 
> YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, 
> YARN-9562.012.patch, YARN-9562.013.patch, YARN-9562.014.patch, 
> YARN-9562.015.patch
>
>
> This JIRA will be used to add the Java changes for the new 
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the 
> existing DockerLinuxContainerRuntime code once it is moved up into an 
> abstract class that can be extended. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9561) Add C changes for the new RuncContainerRuntime

2019-11-15 Thread Shane Kumpf (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975061#comment-16975061
 ] 

Shane Kumpf commented on YARN-9561:
---

I retested both without the private tmp mount and with my YARN local dirs moved out of 
/tmp; both resolve the issue I previously saw around cleanup. +1 from me.

> Add C changes for the new RuncContainerRuntime
> --
>
> Key: YARN-9561
> URL: https://issues.apache.org/jira/browse/YARN-9561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9561.001.patch, YARN-9561.002.patch, 
> YARN-9561.003.patch, YARN-9561.004.patch, YARN-9561.005.patch, 
> YARN-9561.006.patch, YARN-9561.007.patch, YARN-9561.008.patch, 
> YARN-9561.009.patch, YARN-9561.010.patch, YARN-9561.011.patch, 
> YARN-9561.012.patch, YARN-9561.013.patch, YARN-9561.014.patch
>
>
> This JIRA will be used to add the C changes to the container-executor native 
> binary that are necessary for the new RuncContainerRuntime. There should be 
> no changes to existing code paths. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9959) Work around hard-coded tmp and /var/tmp bind-mounts in the container's working directory

2019-11-15 Thread Shane Kumpf (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975058#comment-16975058
 ] 

Shane Kumpf commented on YARN-9959:
---

Sounds like a reasonable approach to me. Thanks!

> Work around hard-coded tmp and /var/tmp bind-mounts in the container's 
> working directory
> 
>
> Key: YARN-9959
> URL: https://issues.apache.org/jira/browse/YARN-9959
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Priority: Major
>
> {noformat}
> addRuncMountLocation(mounts, containerWorkDir.toString() +
> "/private_slash_tmp", "/tmp", true, true);
> addRuncMountLocation(mounts, containerWorkDir.toString() +
> "/private_var_slash_tmp", "/var/tmp", true, true);
> {noformat}
> It would be good to remove the hard-coded tmp mounts from the 
> {{RuncContainerRuntime}} in place of something general or possibly a tmpfs. 
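
For illustration only, below is a minimal sketch of what the "something general or possibly a tmpfs" replacement could look like, expressed as an OCI runtime-spec style mount entry. The class and helper names are hypothetical and are not RuncContainerRuntime API; only the mount field names ("destination", "type", "source", "options") come from the OCI spec.
{code:java}
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a tmpfs-backed /tmp entry for the container's OCI runtime
// config, instead of bind-mounting <workdir>/private_slash_tmp from the host.
public class TmpfsMountSketch {
  static Map<String, Object> tmpfsMount(String destination) {
    return Map.of(
        "destination", destination,
        "type", "tmpfs",
        "source", "tmpfs",
        "options", List.of("nosuid", "nodev", "mode=1777"));
  }

  public static void main(String[] args) {
    // Emit entries for both paths the current patch hard-codes.
    System.out.println(tmpfsMount("/tmp"));
    System.out.println(tmpfsMount("/var/tmp"));
  }
}
{code}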



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9562) Add Java changes for the new RuncContainerRuntime

2019-11-13 Thread Shane Kumpf (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973463#comment-16973463
 ] 

Shane Kumpf edited comment on YARN-9562 at 11/13/19 3:55 PM:
-

Hey Eric, sorry for the delay. Just a note that my patch testing workflow is to 
always start with a fresh checkout of the source, apply the patches, and start 
a brand new VM, so there wasn't a switch from local to non-local. 

Here is a listing of what remains after I get that exception during cleanup.
{code}
[root@y7001 ~]# find /tmp/hadoop-yarn/nm-local-dir/ -ls
678266130 drwxr-xr-x   5 yarn yarn   57 Nov 13 15:36 
/tmp/hadoop-yarn/nm-local-dir/
87034050 drwxr-xr-x   3 yarn yarn   24 Nov 13 15:47 
/tmp/hadoop-yarn/nm-local-dir/usercache
1088699380 drwxr-s---   4 nobody   hadoop 39 Nov 13 15:47 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser
708090610 drwxr-s---   3 nobody   hadoop 44 Nov 13 15:49 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache
87034500 drwxr-s---   3 nobody   hadoop 52 Nov 13 15:50 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003
386355360 drwx--s---   3 nobody   hadoop 31 Nov 13 15:50 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02
87112680 drwx--x---   3 nobody   hadoop 25 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp
1088699500 drwxr-xr-x   3 root root   26 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp/hadoop-yarn
87112700 drwxr-xr-x   4 root root   40 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp/hadoop-yarn/nm-local-dir
386355400 drwxr-xr-x   3 root root   24 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp/hadoop-yarn/nm-local-dir/usercache
708090980 drwxr-xr-x   4 root root   39 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser
1088699510 drwxr-xr-x   3 root root   44 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache
87112710 drwxr-xr-x   2 root root6 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003
708090990 drwxr-xr-x   2 root root6 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/filecache
386355410 drwxr-xr-x   2 root root6 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp/hadoop-yarn/nm-local-dir/filecache
338681210 drwx--x---   2 nobody   hadoop  6 Nov 13 15:47 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/filecache
383884700 drwxr-xr-x   4 yarn yarn   26 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/filecache
1088699180 drwxr-xr-x   2 yarn yarn  155 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/filecache/11
1088699194 -r-xr-xr-x   1 yarn yarn 2183 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/filecache/11/5e35e350aded98340bc8fcb0ba392d809c807bc3eb5c618d4a0674d98d88bccd
1088699444 -rw-r--r--   1 yarn yarn   28 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/filecache/11/.5e35e350aded98340bc8fcb0ba392d809c807bc3eb5c618d4a0674d98d88bccd.crc
708090930 drwxr-xr-x   2 yarn yarn  165 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/filecache/10
70809094 72260 -r-xr-xr-x   1 yarn yarn 73994240 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/filecache/10/ab5ef0e5819490abe86106fd9f4381123e37a03e80e650be39f7938d30ecb530.sqsh
70809095  568 -rw-r--r--   1 yarn yarn   578088 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/filecache/10/.ab5ef0e5819490abe86106fd9f4381123e37a03e80e650be39f7938d30ecb530.sqsh.crc
67826614

[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime

2019-11-13 Thread Shane Kumpf (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973463#comment-16973463
 ] 

Shane Kumpf commented on YARN-9562:
---

Hey Eric, sorry for the delay. Just a note that my patch testing workflow is to 
always start with a fresh checkout of the source, apply the patches, and start 
a brand new VM, so there wasn't a switch from local to non-local. 

Here is a listing of what remains after I get that exception during cleanup.
{code}
[root@y7001 ~]# find 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/ -ls
708090610 drwxr-s---   3 nobody   hadoop 44 Nov 13 15:49 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/
87034500 drwxr-s---   3 nobody   hadoop 52 Nov 13 15:50 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003
386355360 drwx--s---   3 nobody   hadoop 31 Nov 13 15:50 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02
87112680 drwx--x---   3 nobody   hadoop 25 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp
1088699500 drwxr-xr-x   3 root root   26 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp/hadoop-yarn
87112700 drwxr-xr-x   4 root root   40 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp/hadoop-yarn/nm-local-dir
386355400 drwxr-xr-x   3 root root   24 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp/hadoop-yarn/nm-local-dir/usercache
708090980 drwxr-xr-x   4 root root   39 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser
1088699510 drwxr-xr-x   3 root root   44 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache
87112710 drwxr-xr-x   2 root root6 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003
708090990 drwxr-xr-x   2 root root6 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/filecache
386355410 drwxr-xr-x   2 root root6 Nov 13 15:48 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573659389683_0003/container_1573659389683_0003_01_02/private_slash_tmp/hadoop-yarn/nm-local-dir/filecache
{code}

I haven't tested removing it and am out of time at the moment, but it seems 
related to the private_slash_tmp dir, as that is what remains.

> Add Java changes for the new RuncContainerRuntime
> -
>
> Key: YARN-9562
> URL: https://issues.apache.org/jira/browse/YARN-9562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9562.001.patch, YARN-9562.002.patch, 
> YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, 
> YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, 
> YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, 
> YARN-9562.012.patch, YARN-9562.013.patch, YARN-9562.014.patch
>
>
> This JIRA will be used to add the Java changes for the new 
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the 
> existing DockerLinuxContainerRuntime code once it is moved up into an 
> abstract class that can be extended. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime

2019-11-10 Thread Shane Kumpf (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971141#comment-16971141
 ] 

Shane Kumpf commented on YARN-9562:
---

Thanks for the new patches, [~ebadger]! I was able to successfully run a dshell 
and MR PI job leveraging runC with these patches.
{code}
[root@y7001 ~]# runc list
ID   PID STATUS  BUNDLE 

 CREATED  OWNER
container_e02_1573397883403_0003_01_02   32546   running 
/tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1573397883403_0003/container_e02_1573397883403_0003_01_02
   2019-11-10T15:03:22.810203063Z   root
{code}

However, cleanup of the container resources is failing due to permission denied 
issues:
{code}
2019-11-10 15:03:11,637 INFO 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
absolute path : 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573397883403_0002/container_e02_1573397883403_0002_01_02
2019-11-10 15:03:11,653 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 255. Privileged Execution Operation Stderr:
Nonzero exit code=-1, error message='Unknown error code'

Stdout: main : command provided 3
main : run as user is nobody
main : requested yarn user is hadoopuser
failed to rmdir application_1573397883403_0002: Permission denied
failed to rmdir appcache: Permission denied
failed to rmdir filecache: Permission denied
failed to rmdir hadoopuser: Permission denied
failed to rmdir usercache: Permission denied
failed to rmdir filecache: Permission denied
failed to rmdir nm-local-dir: Permission denied
failed to rmdir hadoop-yarn: Directory not empty
failed to rmdir private_slash_tmp: Directory not empty
Error while deleting 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573397883403_0002/container_e02_1573397883403_0002_01_02:
 39 (Directory not empty)

Full command array for failed execution:
[/usr/local/hadoop/bin/container-executor, nobody, hadoopuser, 3, 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573397883403_0002/container_e02_1573397883403_0002_01_02]
2019-11-10 15:03:11,653 ERROR 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: DeleteAsUser 
for 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1573397883403_0002/container_e02_1573397883403_0002_01_02
 returned with exit code: 255
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=255: Nonzero exit code=-1, error message='Unknown 
error code'

at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:871)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.FileDeletionTask.run(FileDeletionTask.java:125)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: ExitCodeException exitCode=255: Nonzero exit code=-1, error 
message='Unknown error code'
{code}

> Add Java changes for the new RuncContainerRuntime
> -
>
> Key: YARN-9562
> URL: https://issues.apache.org/jira/browse/YARN-9562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9562.001.patch, YARN-9562.002.patch, 
> YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, 
> YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, 
> YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, 
> YARN-9562.012.patch, YARN-9562.013.patch, 

[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime

2019-11-07 Thread Shane Kumpf (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969572#comment-16969572
 ] 

Shane Kumpf commented on YARN-9562:
---

Thanks again, Eric! I'll give the latest patches a try.

{quote} 
These variables are used in create_local_dirs. I'm not super familiar with the 
feature, but I was under the impression that they were not tied to any specific 
runtime. 
{quote}
You are correct. I guess I've overlooked this in the past.

{quote}What do you suggest as an alternative? Add both /tmp and /var/tmp as 
tmpfs in the runC config?{quote}
My initial thought would be to handle these mounts via the default-rw-mounts 
setting in yarn-site rather than hard-coding them for every container. That said, 
I do see the challenge that poses, since the mount is inside the container work dir 
in the current patch. For this initial cut, I'm fine with leaving it as is, and 
we can open an issue to revisit.
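
To make that concrete, here is a minimal sketch of the admin-side alternative, assuming the existing Docker runtime property (yarn.nodemanager.runtime.linux.docker.default-rw-mounts) with its comma-separated source:destination value format; a runC analogue of this property is exactly what the follow-up issue would need to add. The scratch path is a made-up example.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: the Docker runtime's admin-configured default mounts, set
// programmatically here purely for illustration (normally this lives in
// yarn-site.xml). No runC equivalent of this property exists yet.
public class DefaultMountsSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Comma-separated source:destination pairs every container may mount read-write.
    conf.set("yarn.nodemanager.runtime.linux.docker.default-rw-mounts",
        "/var/lib/shared-scratch:/scratch");
    System.out.println(
        conf.get("yarn.nodemanager.runtime.linux.docker.default-rw-mounts"));
  }
}
{code}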

{quote}Agreed. Should I file JIRAs for the features or add comments into the 
code or add documentation or what?{quote}
JIRAs under the runC umbrella maybe? or do we want to try to close that out 
relatively quickly? I can help open the issues, I wasn't intending to put that 
burden on you. :)

{quote} I don't want to attempt to install the packages for them. {quote}
Adding that check sounds like a reasonable solution to me without being 
intrusive. No other issues to report on usability.

> Add Java changes for the new RuncContainerRuntime
> -
>
> Key: YARN-9562
> URL: https://issues.apache.org/jira/browse/YARN-9562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9562.001.patch, YARN-9562.002.patch, 
> YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, 
> YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, 
> YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, 
> YARN-9562.012.patch, YARN-9562.013.patch, YARN-9562.014.patch
>
>
> This JIRA will be used to add the Java changes for the new 
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the 
> existing DockerLinuxContainerRuntime code once it is moved up into an 
> abstract class that can be extended. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime

2019-11-06 Thread Shane Kumpf (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968539#comment-16968539
 ] 

Shane Kumpf commented on YARN-9562:
---

bq. So you're setting linux-container-executor.nonsecure-mode.limit-users to 
true with linux-container-executor.nonsecure-mode.local-user set to nobody in 
your yarn-site.xml? Is that the use case here?

Exactly correct
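
For anyone following along, a minimal sketch of that setup expressed through the Hadoop Configuration API (equivalent to setting the same two properties in yarn-site.xml); this is illustrative only, not code from the patch.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustrative only: the nonsecure-mode configuration described above, where
// every container is forced to run as the single local user "nobody".
public class NonsecureModeSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    conf.setBoolean(
        "yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users", true);
    conf.set(
        "yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user", "nobody");
    System.out.println(conf.get(
        "yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user"));
  }
}
{code}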

> Add Java changes for the new RuncContainerRuntime
> -
>
> Key: YARN-9562
> URL: https://issues.apache.org/jira/browse/YARN-9562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9562.001.patch, YARN-9562.002.patch, 
> YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, 
> YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, 
> YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, 
> YARN-9562.012.patch, YARN-9562.013.patch
>
>
> This JIRA will be used to add the Java changes for the new 
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the 
> existing DockerLinuxContainerRuntime code once it is moved up into an 
> abstract class that can be extended. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime

2019-11-05 Thread Shane Kumpf (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967937#comment-16967937
 ] 

Shane Kumpf commented on YARN-9562:
---

Hey [~ebadger]. Thanks for your (and everyone else's) hard work here. Overall 
this looks to be coming together nicely.

I've taken a look at the code and have a couple of items, but nothing blocking. 
However, I'm having a bit of trouble getting runC containers working so far. 
I'm out of time to continue troubleshooting right now, but this is what I'm 
seeing; both dshell and MR pi fail the same way. Docker MR jobs are working. I am 
running all containers as the nobody user in this case.
{code:java}
2019-11-05 22:40:14,225 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 35. Privileged Execution Operation Stderr:
Bad/Missing runC int
Could not create container dirs
Could not create local files and directories
Nonzero exit code=35, error message='Could not create work dirs'
Stdout: Can't create directory 
/tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1572993484434_0003/container_e04_1572993484434_0003_01_02
 - Permission denied
Full command array for failed execution:
[/usr/local/hadoop/bin/container-executor, --run-runc-container, 
/tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1572993484434_0003/container_e04_1572993484434_0003_01_02/runc-config.json]{code}
 

Here are some questions/nits on the patch. None of these are blockers IMO.

Questions/Comments:

1) Why are the keystore and truststore needed within RuncContainerExecutorConfig?

2) I'm not a big fan of hard-coded mounts like this. This would also be 
problematic for systemd-based containers, where systemd expects /tmp to be a 
tmpfs.
{code:java}
addRuncMountLocation(mounts, containerWorkDir.toString() +
"/private_slash_tmp", "/tmp", true, true);
addRuncMountLocation(mounts, containerWorkDir.toString() +
"/private_var_slash_tmp", "/var/tmp", true, true);
{code}
3) It would be great to track these disabled features for future implementation.
{code:java}
  public String getExposedPorts(Container container) {
return null;
  }

  public String[] getIpAndHost(Container container) {
return null;
  }

  public IOStreamPair execContainer(ContainerExecContext ctx)
  throws ContainerExecutionException {
return null;
  }

  public void reapContainer(ContainerRuntimeContext ctx)
  throws ContainerExecutionException {
  }

  public void relaunchContainer(ContainerRuntimeContext ctx)
  throws ContainerExecutionException {
  }
{code}
Nits:

1) clean up the whitespace around Container#getContainerRuntimeData

2) RuncContainerExecutorConfig typo in class javadoc

3) YarnConfiguration DEFAULT_NM_RUNC_ALLOWED_CONTAINER_NETWORKS and 
DEFAULT_NM_RUNC_ALLOWED_CONTAINER_RUNTIMES - copy and paste error on the javadoc

4) Many of the tests create tmpDirs but don't appear to clean them up. 
TestRuncContainerRuntime creates two temp dirs, one via mkdirs and the other 
via a Rule.
{code:java}
TestDockerContainerRuntime mkdirs for tmpDir
TestHdfsManifestToResourcesPlugin creates a tmpDir but doesn't clean it up
TestRuncContainerRuntime has both a tmpDir and TempDir created by a @Rule
{code}
5) Docs
 * Overview: "if created", newline after runC in second paragraph.
 * Docker to squash section: first paragraph "Getting" newline.
 * I'm fine with leaving the reference to the docker_to_squash.py patch for now 
until we have a better story, but I did need to do a few steps to get that tool 
working: 1) create the hdfs runc-root as root, and 2) install skopeo, 
squashfs-tools, and attr.

> Add Java changes for the new RuncContainerRuntime
> -
>
> Key: YARN-9562
> URL: https://issues.apache.org/jira/browse/YARN-9562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9562.001.patch, YARN-9562.002.patch, 
> YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, 
> YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, 
> YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, 
> YARN-9562.012.patch, YARN-9562.013.patch
>
>
> This JIRA will be used to add the Java changes for the new 
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the 
> existing DockerLinuxContainerRuntime code once it is moved up into an 
> abstract class that can be extended. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9860) Enable service mode for Docker containers on YARN

2019-10-08 Thread Shane Kumpf (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946765#comment-16946765
 ] 

Shane Kumpf edited comment on YARN-9860 at 10/8/19 11:51 AM:
-

{quote}Have changed the Keytab file visibility to PRIVATE and left others with 
default APPLICATION and can be overridden by user{quote}

 I don't think we want to change existing behavior with this patch. If it was 
previously APPLICATION, I think it should stay APPLICATION. If it really should 
be PRIVATE, that should be a follow up. I expect moving those resources to 
private is the cause of the exception above.


was (Author: shaneku...@gmail.com):
{quote}Have changed the Keytab file visibility to PRIVATE and left others with 
default APPLICATION and can be overridden by user.\{quote}

I don't think we want to change existing behavior with this patch. If it was 
previously APPLICATION, I think it should stay APPLICATION. If it really should 
be PRIVATE, that should be a follow up. I expect moving those resources to 
private is the cause of the exception above.

> Enable service mode for Docker containers on YARN
> -
>
> Key: YARN-9860
> URL: https://issues.apache.org/jira/browse/YARN-9860
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9860-001.patch, YARN-9860-002.patch, 
> YARN-9860-003.patch, YARN-9860-004.patch, YARN-9860-005.patch, 
> YARN-9860-006.patch, YARN-9860-007.patch
>
>
> This task is to add support to YARN for running Docker containers in "Service 
> Mode". 
> Service Mode - Run the container as defined by the image, but still allow for 
> injecting configuration. 
> Background:
>   Entrypoint mode helped - now able to use the ENV and ENTRYPOINT/CMD as 
> defined in the image. However, still requires modification to official images 
> due to user propagation
> User propagation is problematic for running a secure cluster with sssd
>   
> Implementation:
>   Must be enabled via c-e.cfg (example: docker.service-mode.allowed=true)
>   Must be requested at runtime - (example: 
> YARN_CONTAINER_RUNTIME_DOCKER_SERVICE_MODE=true)
>   Entrypoint mode is default enabled for this mode (If Service Mode is 
> requested, YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE should be set 
> to true)
>   Writable log mount will not be added - stdout logging may still work 
> with entrypoint mode - remove the writable bind mounts
>   User and groups will not be propagated (now: docker run --user nobody 
> --group-add=nobody  , after: docker run  )
>   Read-only resources mounted at the file level, files get chmod 777, 
> parent directory only accessible by the run as user.
> cc [~shaneku...@gmail.com]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9860) Enable service mode for Docker containers on YARN

2019-10-08 Thread Shane Kumpf (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946765#comment-16946765
 ] 

Shane Kumpf commented on YARN-9860:
---

{quote}Have changed the Keytab file visibility to PRIVATE and left others with 
default APPLICATION and can be overridden by user.\{quote}

I don't think we want to change existing behavior with this patch. If it was 
previously APPLICATION, I think it should stay APPLICATION. If it really should 
be PRIVATE, that should be a follow up. I expect moving those resources to 
private is the cause of the exception above.

> Enable service mode for Docker containers on YARN
> -
>
> Key: YARN-9860
> URL: https://issues.apache.org/jira/browse/YARN-9860
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9860-001.patch, YARN-9860-002.patch, 
> YARN-9860-003.patch, YARN-9860-004.patch, YARN-9860-005.patch, 
> YARN-9860-006.patch, YARN-9860-007.patch
>
>
> This task is to add support to YARN for running Docker containers in "Service 
> Mode". 
> Service Mode - Run the container as defined by the image, but still allow for 
> injecting configuration. 
> Background:
>   Entrypoint mode helped - now able to use the ENV and ENTRYPOINT/CMD as 
> defined in the image. However, still requires modification to official images 
> due to user propagation
> User propagation is problematic for running a secure cluster with sssd
>   
> Implementation:
>   Must be enabled via c-e.cfg (example: docker.service-mode.allowed=true)
>   Must be requested at runtime - (example: 
> YARN_CONTAINER_RUNTIME_DOCKER_SERVICE_MODE=true)
>   Entrypoint mode is default enabled for this mode (If Service Mode is 
> requested, YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE should be set 
> to true)
>   Writable log mount will not be added - stdout logging may still work 
> with entrypoint mode - remove the writable bind mounts
>   User and groups will not be propagated (now: docker run --user nobody 
> --group-add=nobody  , after: docker run  )
>   Read-only resources mounted at the file level, files get chmod 777, 
> parent directory only accessible by the run as user.
> cc [~shaneku...@gmail.com]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9860) Enable service mode for Docker containers on YARN

2019-10-02 Thread Shane Kumpf (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942957#comment-16942957
 ] 

Shane Kumpf commented on YARN-9860:
---

{quote}It looks like we are reverting to our old habit of using 0 for true. It 
would be more consistent to use is_feature_enabled() method to determine if 
service_mode is enabled, and reduce some code debris.
{quote}
Good points, I agree these comments should be addressed.
{quote}Container-executor already dup the container run output into stdout and 
stderr log files with proper user permission for entrypoint mode because the 
log files are initialized as user who runs the container executor rather than 
the user in the container. It works for both secure and non-secure mode. I fail 
to see the need to craft logging mechanism for the given reasoning for service 
mode. Let me know if I missed something.
{quote}
Excellent, I can confirm this is working exactly how we'd want. I overlooked 
this before. Seems logging isn't an issue after all. Thanks for pointing that 
out!

I did retest the patch today and it is still working as expected.

With the patch applied in my dev VM, below are the ps output and logs from the official 
postgres image running under YARN with zero changes!

*ps:*
{code:java}
root@centos7-0:/# ps -ef
UIDPID  PPID  C STIME TTY  TIME CMD
postgres 1 0  0 16:13 ?00:00:00 postgres
postgres53 1  0 16:13 ?00:00:00 postgres: checkpointer
postgres54 1  0 16:13 ?00:00:00 postgres: background writer
postgres55 1  0 16:13 ?00:00:00 postgres: walwriter
postgres56 1  0 16:13 ?00:00:00 postgres: autovacuum launcher
postgres57 1  0 16:13 ?00:00:00 postgres: stats collector
postgres58 1  0 16:13 ?00:00:00 postgres: logical replication 
launcher
root59 0  4 16:14 pts/000:00:00 bash
root6459  0 16:14 pts/000:00:00 ps -ef
{code}

*Logs:*
{code:java}
[root@y7001 ~]# yarn logs -applicationId application_1570018164872_0005 
-containerId container_1570018164872_0005_01_02
2019-10-02 16:26:40,269 INFO client.RMProxy: Connecting to ResourceManager at 
y7001.yns.foo.com/192.168.70.211:9104
Container: container_1570018164872_0005_01_02 on y7001.yns.foo.com:9105
LogAggregationType: LOCAL
===
LogType:stdout.txt
LogLastModifiedTime:Wed Oct 02 16:13:31 + 2019
LogLength:2638
LogContents:
Launching docker container...
Docker run command: /usr/bin/docker run 
--name=container_1570018164872_0005_01_02 --net=host -v 
/tmp/hadoop-yarn/nm-local-dir/filecache/13/httpd-proxy.conf:/etc/httpd/conf.d/httpd-proxy.conf:ro
 --cgroup-parent=/hadoop-yarn/container_1570018164872_0005_01_02 
--cap-drop=ALL --cap-add=SYS_CHROOT --cap-add=MKNOD --cap-add=SETFCAP 
--cap-add=SETPCAP --cap-add=FSETID --cap-add=CHOWN --cap-add=AUDIT_WRITE 
--cap-add=SETGID --cap-add=NET_RAW --cap-add=FOWNER --cap-add=SETUID 
--cap-add=DAC_OVERRIDE --cap-add=KILL --cap-add=NET_BIND_SERVICE 
--hostname=centos7-0.skumpftest.hadoopuser.ynsdev --env-file 
/tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1570018164872_0005/container_1570018164872_0005_01_02/docker.container_1570018164872_0005_01_028354800474788286290.env
 library/postgres
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Etc/UTC
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

pg_ctl -D /var/lib/postgresql/data -l logfile start

waiting for server to start2019-10-02 16:13:31.231 UTC [43] LOG:  listening 
on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2019-10-02 16:13:31.253 UTC [44] LOG:  database system was shut down at 
2019-10-02 16:13:30 UTC
2019-10-02 16:13:31.259 UTC [43] LOG:  database system is ready to accept 
connections
 done
server started

/usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*

waiting for server to shut down...2019-10-02 16:13:31.322 UTC [43] LOG:  
received fast shutdown request
.2019-10-02 16:13:31.325 UTC [43] LOG:  aborting any active transactions
2019-10-02 16:13:31.329 UTC [43] LOG:  background worker "logical replication 

[jira] [Commented] (YARN-9860) Enable service mode for Docker containers on YARN

2019-10-02 Thread Shane Kumpf (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942699#comment-16942699
 ] 

Shane Kumpf commented on YARN-9860:
---

{quote}Can you give more clear definition of service mode?
{quote}
To simplify it the best I can, in Service Mode, YARN does not set the user 
(--user and --group-add) when running the container. The rest of the changes 
are in support of dropping the user in this mode. A simple use case where this 
is needed is running the official postgres image without modification. Note 
that this mode is disabled by default to limit any security implications.
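
As a rough, standalone sketch of that difference (this is not the actual DockerLinuxContainerRuntime command-construction code, and the helper names are made up), the user and group flags are simply omitted from the docker run invocation when service mode is requested:
{code:java}
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the behavioral difference only; the real change lives in
// DockerLinuxContainerRuntime's docker run command construction.
public class ServiceModeRunCommandSketch {
  static List<String> dockerRunArgs(String image, String runAsUser, boolean serviceMode) {
    List<String> args = new ArrayList<>(List.of("docker", "run"));
    if (!serviceMode) {
      // Normal mode: YARN forces the container to run as the submitting (or local) user.
      args.add("--user=" + runAsUser);
      args.add("--group-add=" + runAsUser);
    }
    // Service mode: no --user/--group-add, so the image's own USER directive applies.
    args.add(image);
    return args;
  }

  public static void main(String[] args) {
    System.out.println(dockerRunArgs("library/postgres", "nobody", false));
    System.out.println(dockerRunArgs("library/postgres", "nobody", true));
  }
}
{code}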
{quote}I don't understand the reason to add --user= parameter only when service 
mode is enabled.
{quote}
That code does the opposite of what you stated: the user is only passed when 
service mode is NOT enabled, which is what we want.
{quote}If there are no log directories, how would you attack debugging 
container failures?
{quote}
You are spot on that this will be an issue. The challenge is that if we mount 
the read-write log dirs into the container and the container user isn't the 
user YARN expects, the writes could fail or YARN may be unable to clean up the 
logs. I talked with Craig on this a bit and he had some interesting thoughts on 
how we might handle it with FUSE. For the sake of this patch, I didn't want to 
get bogged down in the details there, given this has enough going on already. 
Could we address logging in a follow-up? In the meantime, with debug delay 
enabled, doing a {{docker logs}} on the exited container will allow admins to 
take a look, since the output redirection typically done by YARN is dropped in 
Service Mode.

I've done extensive testing of an earlier version of this patch and it 
addresses the use case and works as expected. I'm going to do some additional 
testing today with the patch here to make sure there are no regressions. 

> Enable service mode for Docker containers on YARN
> -
>
> Key: YARN-9860
> URL: https://issues.apache.org/jira/browse/YARN-9860
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9860-001.patch, YARN-9860-002.patch
>
>
> This task is to add support to YARN for running Docker containers in "Service 
> Mode". 
> Service Mode - Run the container as defined by the image, but still allow for 
> injecting configuration. 
> Background:
>   Entrypoint mode helped - now able to use the ENV and ENTRYPOINT/CMD as 
> defined in the image. However, still requires modification to official images 
> due to user propagation
> User propagation is problematic for running a secure cluster with sssd
>   
> Implementation:
>   Must be enabled via c-e.cfg (example: docker.service-mode.allowed=true)
>   Must be requested at runtime - (example: 
> YARN_CONTAINER_RUNTIME_DOCKER_SERVICE_MODE=true)
>   Entrypoint mode is default enabled for this mode (If Service Mode is 
> requested, YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE should be set 
> to true)
>   Writable log mount will not be added - stdout logging may still work 
> with entrypoint mode - remove the writable bind mounts
>   User and groups will not be propagated (now: docker run --user nobody 
> --group-add=nobody  , after: docker run  )
>   Read-only resources mounted at the file level, files get chmod 777, 
> parent directory only accessible by the run as user.
> cc [~shaneku...@gmail.com]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8045) Reduce log output from container status calls

2019-08-07 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902067#comment-16902067
 ] 

Shane Kumpf commented on YARN-8045:
---

+1 on the 2.8 patch. Feel free to go ahead with committing it.

> Reduce log output from container status calls
> -
>
> Key: YARN-8045
> URL: https://issues.apache.org/jira/browse/YARN-8045
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 3.0.4, 2.8.6, 2.9.3, 3.1.3
>
> Attachments: YARN-8045.001-branch-2.8.patch, YARN-8045.001.patch
>
>
> Each time a container's status is returned a log entry is produced in the NM 
> from {{ContainerManagerImpl}}. The container status includes the diagnostics 
> field for the container. If the diagnostics field contains an exception, it 
> can appear as if the exception is logged repeatedly every second. The 
> diagnostics message can also span many lines, which puts pressure on the logs 
> and makes it harder to read.
> For example:
> {code}
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_e01_1521323860653_0001_01_05
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: 
> RUNNING, Capability: , Diagnostics: [2018-03-17 
> 22:01:00.675]Exception from container-launch.
> Container id: container_e01_1521323860653_0001_01_05
> Exit code: -1
> Exception message: 
> Shell ouput: 
> [2018-03-17 22:01:00.750]Diagnostic message from attempt :
> [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1.
> , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8045) Reduce log output from container status calls

2019-07-25 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893070#comment-16893070
 ] 

Shane Kumpf commented on YARN-8045:
---

No concerns from me. The fix should not break existing parsers, since the field 
is still intact.

> Reduce log output from container status calls
> -
>
> Key: YARN-8045
> URL: https://issues.apache.org/jira/browse/YARN-8045
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8045.001.patch
>
>
> Each time a container's status is returned a log entry is produced in the NM 
> from {{ContainerManagerImpl}}. The container status includes the diagnostics 
> field for the container. If the diagnostics field contains an exception, it 
> can appear as if the exception is logged repeatedly every second. The 
> diagnostics message can also span many lines, which puts pressure on the logs 
> and makes it harder to read.
> For example:
> {code}
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_e01_1521323860653_0001_01_05
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: 
> RUNNING, Capability: , Diagnostics: [2018-03-17 
> 22:01:00.675]Exception from container-launch.
> Container id: container_e01_1521323860653_0001_01_05
> Exit code: -1
> Exception message: 
> Shell ouput: 
> [2018-03-17 22:01:00.750]Diagnostic message from attempt :
> [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1.
> , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9660) Enhance documentation of Docker on YARN support

2019-07-08 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880265#comment-16880265
 ] 

Shane Kumpf commented on YARN-9660:
---

Thanks for the improvements [~pbacsko]! lgtm +1

> Enhance documentation of Docker on YARN support
> ---
>
> Key: YARN-9660
> URL: https://issues.apache.org/jira/browse/YARN-9660
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation, nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9660-001.patch
>
>
> Right now, using Docker on YARN has some hard requirements. If these 
> requirements are not met, then launching the containers will fail and an 
> error message will be printed. Depending on how familiar the user is with 
> Docker, it might or might not be easy for them to understand what went wrong 
> and how to fix the underlying problem.
> It would be important to explicitly document these requirements along with 
> the error messages.
> *#1: CGroups handler cannot be systemd*
> If docker deamon runs with systemd cgroups handler, we receive the following 
> error upon launching a container:
> {noformat}
> Container id: container_1561638268473_0006_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: 
> cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
> See '/usr/bin/docker-current run --help'.
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
> Solution: switch to cgroupfs. Doing so can be OS-specific, but we can 
> document a {{systemctl}} example.
>  
> *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container*
> Some smaller images like "busybox" or "alpine" do not have {{/bin/bash}}. 
> It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and 
> there's only {{/bin/sh}}.
> If we try to use these kinds of images, we'll see the following error message:
> {noformat}
> Container id: container_1561638268473_0015_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: oci 
> runtime error: container_linux.go:235: starting container process caused 
> "exec: \"bash\": executable file not found in $PATH".
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
>  
> *#3: {{find}} command must be available on the {{$PATH}}*
> It seems obvious that we have the {{find}} command, but even very popular 
> images like {{fedora}} require that we install it separately.
> If we don't have {{find}} available, then {{launcher_container.sh}} fails 
> with:
> {noformat}
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. 
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
>  line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. 
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
>  line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> {noformat}
> *#4 Add cmd-line example of how to tag local images*
> This is actually documented under "Privileged Container Security 
> Consideration", but an one-liner would be helpful. I had trouble running a 
> local docker image and tagging it appropriately. Just an example like 
> {{docker tag local_ubuntu local/ubuntu:latest}} is already very informative.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-06-28 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875089#comment-16875089
 ] 

Shane Kumpf commented on YARN-9560:
---

Thanks for the patch and explanation, [~ebadger]. It is a similar pattern to 
what we do in the delegating runtime. I tested out the patch and it looks good 
to me. The failing unit test looks to be unrelated. I'm +1 on patch 013.

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch, YARN-9560.007.patch, YARN-9560.008.patch, 
> YARN-9560.009.patch, YARN-9560.010.patch, YARN-9560.011.patch, 
> YARN-9560.012.patch, YARN-9560.013.patch
>
>
> Since the new RuncContainerRuntime will be using a lot of the same code as 
> DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - RuncContainerRuntime
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics
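
In code terms, the restructuring amounts to a skeleton like the following; the class names come from this issue, while the method and the driver class are placeholders for illustration only, not the real ContainerRuntime interface.
{code:java}
// Skeleton illustration of the proposed hierarchy only; the real classes implement
// YARN's ContainerRuntime interface and carry the shared Docker/runC logic.
abstract class OCIContainerRuntime {
  // Shared behavior (mount handling, env parsing, etc.) moves up to this level.
  abstract void launchContainer(String containerId);
}

class DockerLinuxContainerRuntime extends OCIContainerRuntime {
  @Override
  void launchContainer(String containerId) {
    System.out.println("docker launch for " + containerId);
  }
}

class RuncContainerRuntime extends OCIContainerRuntime {
  @Override
  void launchContainer(String containerId) {
    System.out.println("runc launch for " + containerId);
  }
}

public class HierarchySketch {
  public static void main(String[] args) {
    OCIContainerRuntime docker = new DockerLinuxContainerRuntime();
    OCIContainerRuntime runc = new RuncContainerRuntime();
    docker.launchContainer("container_01");
    runc.launchContainer("container_02");
  }
}
{code}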



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9230) Write a go hdfs driver for Docker Registry

2019-01-30 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756164#comment-16756164
 ] 

Shane Kumpf commented on YARN-9230:
---

Thanks for taking another look at the library, [~eyang]. It sounds like some 
improvements have been made, but gaps still exist. The security gaps in the go 
HDFS client are likely a non-starter for many. I'm inclined to agree that a 
registry storage driver for HDFS won't be implemented until this gap is fixed, 
and fixing the gap is outside the scope of YARN. The issue could be reopened 
if security is properly implemented.

> Write a go hdfs driver for Docker Registry
> --
>
> Key: YARN-9230
> URL: https://issues.apache.org/jira/browse/YARN-9230
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Priority: Major
>
> A number of people have developed go client library for HDFS.  [Example 
> 1|https://medium.com/sqooba/making-hdfs-a-hundred-times-faster-ac75b8b5e0b4] 
> [Example 2|https://github.com/colinmarc/hdfs].
> This can be enhanced into a real storage driver for Docker registry.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9074) Docker container rm command should be executed after stop

2019-01-23 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749972#comment-16749972
 ] 

Shane Kumpf commented on YARN-9074:
---

[~uranus] To add to what [~csingh] noted, the checkstyle should be fixed as 
well. Once those are addressed, I'll commit this. Thanks!

> Docker container rm command should be executed after stop
> -
>
> Key: YARN-9074
> URL: https://issues.apache.org/jira/browse/YARN-9074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9074.001.patch, image-2018-12-01-11-36-12-448.png, 
> image-2018-12-01-11-38-18-191.png
>
>
> {code:java}
> @Override
> public void transition(ContainerImpl container, ContainerEvent event) {
> container.setIsReInitializing(false);
> // Set exit code to 0 on success 
> container.exitCode = 0;
> // TODO: Add containerWorkDir to the deletion service.
> if (DockerLinuxContainerRuntime.isDockerContainerRequested(
> container.daemonConf,
> container.getLaunchContext().getEnvironment())) {
> removeDockerContainer(container);
> }
> if (clCleanupRequired) {
> container.dispatcher.getEventHandler().handle(
> new ContainersLauncherEvent(container,
> ContainersLauncherEventType.CLEANUP_CONTAINER));
> }
> container.cleanup();
> }{code}
> Now, when a container is finished, the NM first executes "_docker rm xxx"_ to 
> remove it, and this work is placed in the DeletionService. See YARN-5366.
> Next, the NM executes the "_docker stop_" and "docker kill" commands. These two 
> commands are wrapped up in the ContainerCleanup thread and executed by 
> ContainersLauncher. See YARN-7644. 
> The above causes the container's cleanup to be split across two threads. I 
> think we should refactor this code so that the entire docker container kill 
> process is placed in the ContainerCleanup thread and "_docker rm_" is 
> executed last.
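
As a standalone illustration of the ordering being argued for (stop first, "docker rm" strictly last, both from one cleanup path), under the assumption that the real change would live in the NodeManager's ContainerCleanup thread rather than a helper like this:
{code:java}
import java.io.IOException;

// Hypothetical sketch of the proposed ordering only; it shells out to the real
// docker CLI and assumes a container with the given ID exists.
public class StopThenRemoveSketch {
  static void stopThenRemove(String containerId) throws IOException, InterruptedException {
    // Stop the container first (the NM may fall back to "docker kill" on timeout)...
    new ProcessBuilder("docker", "stop", containerId).inheritIO().start().waitFor();
    // ...and only then remove it, so "docker rm" always runs last.
    new ProcessBuilder("docker", "rm", containerId).inheritIO().start().waitFor();
  }

  public static void main(String[] args) throws Exception {
    stopThenRemove("container_e01_0000000000000_0001_01_000002");
  }
}
{code}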



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9074) Docker container rm command should be executed after stop

2019-01-12 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741292#comment-16741292
 ] 

Shane Kumpf commented on YARN-9074:
---

lgtm +1

> Docker container rm command should be executed after stop
> -
>
> Key: YARN-9074
> URL: https://issues.apache.org/jira/browse/YARN-9074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9074.001.patch, image-2018-12-01-11-36-12-448.png, 
> image-2018-12-01-11-38-18-191.png
>
>
> {code:java}
> @Override
> public void transition(ContainerImpl container, ContainerEvent event) {
> container.setIsReInitializing(false);
> // Set exit code to 0 on success 
> container.exitCode = 0;
> // TODO: Add containerWorkDir to the deletion service.
> if (DockerLinuxContainerRuntime.isDockerContainerRequested(
> container.daemonConf,
> container.getLaunchContext().getEnvironment())) {
> removeDockerContainer(container);
> }
> if (clCleanupRequired) {
> container.dispatcher.getEventHandler().handle(
> new ContainersLauncherEvent(container,
> ContainersLauncherEventType.CLEANUP_CONTAINER));
> }
> container.cleanup();
> }{code}
> Now, when a container is finished, the NM first executes "_docker rm xxx"_ to 
> remove it, and this work is placed in the DeletionService. See YARN-5366.
> Next, the NM executes the "_docker stop_" and "docker kill" commands. These two 
> commands are wrapped up in the ContainerCleanup thread and executed by 
> ContainersLauncher. See YARN-7644. 
> The above causes the container's cleanup to be split across two threads. I 
> think we should refactor this code so that the entire docker container kill 
> process is placed in the ContainerCleanup thread and "_docker rm_" is 
> executed last.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9074) Docker container rm command should be executed after stop

2018-12-03 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707375#comment-16707375
 ] 

Shane Kumpf commented on YARN-9074:
---

Thanks for the input, [~uranus].
{quote}I think it is not good to keep all the containers on the NM when debugging, 
which will leave a large number of containers remaining for a period of time.
{quote}

That isn't how the feature works. Only if the user explicitly sets the 
environment variable YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL to true will 
the container be kept. Otherwise, ContainersLauncher will call ContainerCleanup, which 
will call reapContainer and remove the container. I think this all 
works as you are describing already. The community felt the deletion of the 
container should be the responsibility of the DeletionService and I'm not 
seeing a strong reason to change it.

> Docker container rm command should be executed after stop
> -
>
> Key: YARN-9074
> URL: https://issues.apache.org/jira/browse/YARN-9074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: image-2018-12-01-11-36-12-448.png, 
> image-2018-12-01-11-38-18-191.png
>
>
> {code:java}
> @Override
> public void transition(ContainerImpl container, ContainerEvent event) {
> container.setIsReInitializing(false);
> // Set exit code to 0 on success 
> container.exitCode = 0;
> // TODO: Add containerWorkDir to the deletion service.
> if (DockerLinuxContainerRuntime.isDockerContainerRequested(
> container.daemonConf,
> container.getLaunchContext().getEnvironment())) {
> removeDockerContainer(container);
> }
> if (clCleanupRequired) {
> container.dispatcher.getEventHandler().handle(
> new ContainersLauncherEvent(container,
> ContainersLauncherEventType.CLEANUP_CONTAINER));
> }
> container.cleanup();
> }{code}
> Now, when a container is finished, the NM first executes "_docker rm xxx_" to 
> remove it, and this task is placed in the DeletionService. See YARN-5366 for 
> details.
> Next, the NM will execute the "_docker stop_" and "_docker kill_" commands. These 
> two commands are wrapped up in the ContainerCleanup thread and executed by 
> ContainersLauncher. See YARN-7644 for details.
> The above causes the container's cleanup to be split across two threads. I 
> think we should refactor this code so that the entire docker container kill 
> process is placed in the ContainerCleanup thread and "_docker rm_" is 
> executed last.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9074) Docker container rm command should be executed after stop

2018-11-30 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705036#comment-16705036
 ] 

Shane Kumpf commented on YARN-9074:
---

{quote}In my opinion, if we want to debug, we can just rerun the docker command 
manually,
{quote}
Recreating the container vs. being able to inspect the state of the existing 
container are two different things in my opinion. Being able to see the state of the 
failed/exited container has value. I think we should retain support for the debug 
deletion delay.
{quote}On the contrary, in most cases, if we don't need to debug, it would be 
unreasonable to remove the container first and then stop the container.
{quote}
IIRC, there isn't a way to avoid the deletion task and continue to support the 
debug delay. Is there an issue that you encountered that you can share more 
detail on? For normal execution, the container will be in an exited state, 
meaning {{docker stop}} won't be called. If you'd like to add an additional 
{{docker rm}} to cleanupContainer when the debug delay is zero, I don't have a 
major concern. If that approach is taken, we'd need to understand the impact 
on relaunch, since relaunch will try to {{docker start}} the existing 
container, so a {{docker rm}} would be undesirable in this case.
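
To make the trade-off concrete, here is a minimal, hypothetical sketch (the class 
and method names below are illustrative, not actual NM code) of the decision being 
discussed: remove the container directly in cleanup only when no debug delay is 
configured and the container will not be relaunched; otherwise leave removal to the 
DeletionService.

{code:java}
// Hypothetical sketch only; names are illustrative, not the actual NM classes.
public class DockerCleanupPolicySketch {

  // True when it is safe to run "docker rm" directly in the ContainerCleanup
  // path: no debug deletion delay is configured and the container will not be
  // docker start'ed again by a relaunch.
  static boolean removeInCleanup(long debugDeletionDelaySec, boolean willRelaunch) {
    return debugDeletionDelaySec == 0 && !willRelaunch;
  }

  public static void main(String[] args) {
    System.out.println(removeInCleanup(0, false));   // true: rm immediately in cleanup
    System.out.println(removeInCleanup(600, false)); // false: DeletionService removes it later
    System.out.println(removeInCleanup(0, true));    // false: relaunch still needs the container
  }
}
{code}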

> Docker container rm command should be executed after stop
> -
>
> Key: YARN-9074
> URL: https://issues.apache.org/jira/browse/YARN-9074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> {code:java}
> @Override
> public void transition(ContainerImpl container, ContainerEvent event) {
> container.setIsReInitializing(false);
> // Set exit code to 0 on success 
> container.exitCode = 0;
> // TODO: Add containerWorkDir to the deletion service.
> if (DockerLinuxContainerRuntime.isDockerContainerRequested(
> container.daemonConf,
> container.getLaunchContext().getEnvironment())) {
> removeDockerContainer(container);
> }
> if (clCleanupRequired) {
> container.dispatcher.getEventHandler().handle(
> new ContainersLauncherEvent(container,
> ContainersLauncherEventType.CLEANUP_CONTAINER));
> }
> container.cleanup();
> }{code}
> Now, when a container is finished, the NM first executes "_docker rm xxx_" to 
> remove it, and this task is placed in the DeletionService. See YARN-5366 for 
> details.
> Next, the NM will execute the "_docker stop_" and "_docker kill_" commands. These 
> two commands are wrapped up in the ContainerCleanup thread and executed by 
> ContainersLauncher. See YARN-7644 for details.
> The above causes the container's cleanup to be split across two threads. I 
> think we should refactor this code so that the entire docker container kill 
> process is placed in the ContainerCleanup thread and "_docker rm_" is 
> executed last.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9074) Docker container rm command should be executed after stop

2018-11-30 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704968#comment-16704968
 ] 

Shane Kumpf commented on YARN-9074:
---

[~ebadger] is correct. To allow for debugging, YARN provides the ability to 
keep artifacts on disk for an admin-supplied amount of time (the debug deletion 
delay). Deleting docker containers needs to be handled by the DeletionService 
so that containers can be kept around for debugging until the debug deletion 
delay expires.

> Docker container rm command should be executed after stop
> -
>
> Key: YARN-9074
> URL: https://issues.apache.org/jira/browse/YARN-9074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> {code:java}
> @Override
> public void transition(ContainerImpl container, ContainerEvent event) {
> container.setIsReInitializing(false);
> // Set exit code to 0 on success 
> container.exitCode = 0;
> // TODO: Add containerWorkDir to the deletion service.
> if (DockerLinuxContainerRuntime.isDockerContainerRequested(
> container.daemonConf,
> container.getLaunchContext().getEnvironment())) {
> removeDockerContainer(container);
> }
> if (clCleanupRequired) {
> container.dispatcher.getEventHandler().handle(
> new ContainersLauncherEvent(container,
> ContainersLauncherEventType.CLEANUP_CONTAINER));
> }
> container.cleanup();
> }{code}
> Now, when a container is finished, the NM first executes "_docker rm xxx_" to 
> remove it, and this task is placed in the DeletionService. See YARN-5366 for 
> details.
> Next, the NM will execute the "_docker stop_" and "_docker kill_" commands. These 
> two commands are wrapped up in the ContainerCleanup thread and executed by 
> ContainersLauncher. See YARN-7644 for details.
> The above causes the container's cleanup to be split across two threads. I 
> think we should refactor this code so that the entire docker container kill 
> process is placed in the ContainerCleanup thread and "_docker rm_" is 
> executed last.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5168) Add port mapping handling when docker container use bridge network

2018-11-15 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688056#comment-16688056
 ] 

Shane Kumpf commented on YARN-5168:
---

I believe there are two options, and we may want to support both.

{{-P}} - Sets up port mappings based on the EXPOSE directives in the 
Dockerfile. Each EXPOSE'd port will be mapped to an ephemeral port on the NM 
host. Docker manages which ephemeral port is used on the NM side.
{{-p <port>}} - Allows the user to provide the container port that should be 
exposed (note that this is NOT <host>:<container>, which would be problematic). Each user 
supplied port will be mapped to an ephemeral port on the NM host. Docker 
manages which ephemeral port is used on the NM side. This would be useful when 
the user has forgotten to add the EXPOSE directive to the Dockerfile and would 
avoid the need to modify the image to expose a port. Note that we'd need the 
user-supplied input to be able to support both TCP and UDP ports. These could be two 
separate env variables that each contain a comma-separated list of ports. 
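
As a rough illustration of the second option, here is a minimal Java sketch, assuming 
a hypothetical env variable name, that expands a comma-separated list of container 
ports into single-port {{-p}} arguments (never host:container mappings), leaving the 
host-side port selection to Docker:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the env variable name below is illustrative only.
public class PortMappingSketch {
  static final String PORTS_ENV = "YARN_CONTAINER_RUNTIME_DOCKER_TCP_PORTS";

  static List<String> portArgs(Map<String, String> env) {
    List<String> args = new ArrayList<>();
    for (String port : env.getOrDefault(PORTS_ENV, "").split(",")) {
      port = port.trim();
      // Only a bare container port is accepted; "host:container" is rejected so
      // Docker always picks the ephemeral host port and conflicts are avoided.
      if (!port.isEmpty() && !port.contains(":")) {
        args.add("-p");
        args.add(port);
      }
    }
    return args;
  }

  public static void main(String[] args) {
    // Prints [-p, 8080, -p, 8081]
    System.out.println(portArgs(Map.of(PORTS_ENV, "8080,8081")));
  }
}
{code}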

{quote}The -P parameter is very useful. You can do this -P 8080, let the host 
pick an unoccupied port and bind it to the 8080 port of the container. Then the 
user can know which physical port is used by obtaining the container 
information.{quote}

What version of Docker does this work on? I'm only able to get the -p 8080 
option to do what you state above. -P requires the EXPOSE directive in the 
Dockerfile.

> Add port mapping handling when docker container use bridge network
> --
>
> Key: YARN-5168
> URL: https://issues.apache.org/jira/browse/YARN-5168
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jun Gong
>Assignee: Xun Liu
>Priority: Major
>  Labels: Docker
>
> YARN-4007 addresses different network setups when launching the docker 
> container. We need support port mapping when docker container uses bridge 
> network.
> The following problems are what we faced:
> 1. Add "-P" to map docker container's exposed ports to automatically.
> 2. Add "-p" to let user specify specific ports to map.
> 3. Add service registry support for bridge network case, then app could find 
> each other. It could be done out of YARN, however it might be more convenient 
> to support it natively in YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5168) Add port mapping handling when docker container use bridge network

2018-11-13 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685724#comment-16685724
 ] 

Shane Kumpf commented on YARN-5168:
---

Great, thanks for validating, that is in line with what I expected. +1 from me on the approach. I 
think this is the best we can do for now without tackling the port as a 
resource feature.

> Add port mapping handling when docker container use bridge network
> --
>
> Key: YARN-5168
> URL: https://issues.apache.org/jira/browse/YARN-5168
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jun Gong
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
>
> YARN-4007 addresses different network setups when launching the docker 
> container. We need support port mapping when docker container uses bridge 
> network.
> The following problems are what we faced:
> 1. Add "-P" to map docker container's exposed ports to automatically.
> 2. Add "-p" to let user specify specific ports to map.
> 3. Add service registry support for bridge network case, then app could find 
> each other. It could be done out of YARN, however it might be more convenient 
> to support it natively in YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5168) Add port mapping handling when docker container use bridge network

2018-11-13 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685660#comment-16685660
 ] 

Shane Kumpf edited comment on YARN-5168 at 11/13/18 7:46 PM:
-

I believe supporting -P will lead to port conflicts. What if two containers 
running on the same NM both expose 8080?

An alternative might be to support the "-p 8080" syntax (NOT "-p 8080:8080", 
only the single port). With this syntax, docker will allocate an ephemeral port 
on the NM and forward traffic to that port to 8080 in the container 
(0.0.0.0:32768 -> container_ip:8080). This allows docker to manage the ports 
and avoid conflict. The user would need to supply the list of ports they want 
to expose at job submission time, likely via an env variable like we do with 
networks.


was (Author: shaneku...@gmail.com):
I believe supporting -P will lead to port conflicts. What if two containers 
running on the same NM both expose 8080?

An alternative might be to support the "-p 8080" syntax (NOT "-p 8080:8080", 
only the single port). With this syntax, docker will allocate an ephemeral port 
on the NM and forward traffic to that port to 8080 in the container 
(0.0.0.0:32768 -> container_ip:8080). The user would need to supply the list of 
ports they want to expose at job submission time, likely via an env variable 
like we do with networks.

> Add port mapping handling when docker container use bridge network
> --
>
> Key: YARN-5168
> URL: https://issues.apache.org/jira/browse/YARN-5168
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jun Gong
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
>
> YARN-4007 addresses different network setups when launching the docker 
> container. We need support port mapping when docker container uses bridge 
> network.
> The following problems are what we faced:
> 1. Add "-P" to map docker container's exposed ports to automatically.
> 2. Add "-p" to let user specify specific ports to map.
> 3. Add service registry support for bridge network case, then app could find 
> each other. It could be done out of YARN, however it might be more convenient 
> to support it natively in YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5168) Add port mapping handling when docker container use bridge network

2018-11-13 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685660#comment-16685660
 ] 

Shane Kumpf commented on YARN-5168:
---

I believe supporting -P will lead to port conflicts. What if two containers 
running on the same NM both expose 8080?

An alternative might be to support the "-p 8080" syntax (NOT "-p 8080:8080", 
only the single port). With this syntax, docker will allocate an ephemeral port 
on the NM and forward traffic to that port to 8080 in the container 
(0.0.0.0:32768 -> container_ip:8080). The user would need to supply the list of 
ports they want to expose at job submission time, likely via an env variable 
like we do with networks.

> Add port mapping handling when docker container use bridge network
> --
>
> Key: YARN-5168
> URL: https://issues.apache.org/jira/browse/YARN-5168
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jun Gong
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
>
> YARN-4007 addresses different network setups when launching the docker 
> container. We need support port mapping when docker container uses bridge 
> network.
> The following problems are what we faced:
> 1. Add "-P" to map docker container's exposed ports to automatically.
> 2. Add "-p" to let user specify specific ports to map.
> 3. Add service registry support for bridge network case, then app could find 
> each other. It could be done out of YARN, however it might be more convenient 
> to support it natively in YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6729) Clarify documentation on how to enable cgroup support

2018-10-30 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-6729:
--
Summary: Clarify documentation on how to enable cgroup support  (was: NM 
percentage-physical-cpu-limit should be always 100 if 
DefaultLCEResourcesHandler is used)

> Clarify documentation on how to enable cgroup support
> -
>
> Key: YARN-6729
> URL: https://issues.apache.org/jira/browse/YARN-6729
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Yufei Gu
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-6729-trunk.001.patch
>
>
> NM percentage-physical-cpu-limit is not honored in 
> DefaultLCEResourcesHandler, which may cause container CPU usage calculation 
> issues, e.g. container vcore usage is potentially more than 100% if 
> percentage-physical-cpu-limit is set to a value less than 100. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6729) NM percentage-physical-cpu-limit should be always 100 if DefaultLCEResourcesHandler is used

2018-10-30 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668663#comment-16668663
 ] 

Shane Kumpf commented on YARN-6729:
---

Thanks, [~tangzhankun]! I'm good with that approach. I'll commit this shortly 
unless there are other comments.
{quote}One thing on my mind is that once we put 
"yarn.nodemanager.resource.cpu.enabled" into the document, does it mean that 
these features are stable? Because, for historical reasons, those 
settings are marked "unstable".
{quote}
Enabling CgroupsLCEResourcesHandler follows the same code path, so I don't have 
much concern. We can discuss the annotations and removal of the deprecated code 
as part of YARN-8924.

> NM percentage-physical-cpu-limit should be always 100 if 
> DefaultLCEResourcesHandler is used
> ---
>
> Key: YARN-6729
> URL: https://issues.apache.org/jira/browse/YARN-6729
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Yufei Gu
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-6729-trunk.001.patch
>
>
> NM percentage-physical-cpu-limit is not honored in 
> DefaultLCEResourcesHandler, which may cause container CPU usage calculation 
> issues, e.g. container vcore usage is potentially more than 100% if 
> percentage-physical-cpu-limit is set to a value less than 100. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6729) NM percentage-physical-cpu-limit should be always 100 if DefaultLCEResourcesHandler is used

2018-10-29 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667780#comment-16667780
 ] 

Shane Kumpf edited comment on YARN-6729 at 10/29/18 9:34 PM:
-

Thanks for the patch, [~tangzhankun]! The update looks good to me overall. 
However, I think these docs could use an overhaul. Below are some of my 
thoughts on the rework. I'm OK if we want to open a follow-on for the 
suggestions below, given it is a more exhaustive rework; just let me know.

 If you enable LCE and specify {{CgroupsLCEResourcesHandler}} as the handler 
class, LCE has code to force the use of {{DefaultLCEResourcesHandler}}, so LCE 
never actually uses {{CgroupsLCEResourcesHandler}}. {{CgroupsLCEResourcesHandler}} is 
still used by the {{ResourceHandler}}, though, to determine whether the CPU controller 
should be set up; however, {{yarn.nodemanager.resource.cpu.enabled}} does the 
exact same thing. Based on this, I don't think we should list 
{{yarn.nodemanager.linux-container-executor.resources-handler.class}} at all in 
the docs, and we should instead guide users towards using the 
{{yarn.nodemanager.resource.cpu.enabled}} form.
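
For illustration, a minimal sketch of what enabling the CPU controller through the 
newer per-resource switch looks like (in a real deployment these keys would be set in 
yarn-site.xml rather than programmatically; this assumes the Hadoop YARN libraries 
are on the classpath):

{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class CpuCgroupConfigSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Enable the cgroups CPU controller via the per-resource switch instead of
    // setting yarn.nodemanager.linux-container-executor.resources-handler.class.
    conf.setBoolean("yarn.nodemanager.resource.cpu.enabled", true);
    conf.set("yarn.nodemanager.container-executor.class",
        "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor");
    System.out.println(conf.get("yarn.nodemanager.resource.cpu.enabled"));
  }
}
{code}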

None of the other {{yarn.nodemanager.resource.*.enabled}} properties are 
listed. It would be good to have links to the elastic memory accounting as well.

Also, the table at the bottom, _CGroups and security,_ really has nothing to do 
with cgroups and should probably be moved to a page about LCE.

I would look to structure these docs something like as follows:
 * Summary
 ** What is cgroups and why would a user want to enable them
 * Prerequisites for enabling YARN cgroup support
 ** Recent OS
 ** Cgroup mounts required and YARN settings
 *** Overview of YARN cgroup mounting capabilities
 ** Enabling LCE
 * Enabling YARN cgroup support
 ** CPU controller
 *** overview of pct and strict limits
 ** Memory controller
 *** Link to memory controller deep dive
 ** Disk controller
 ** Traffic shaping
 ** ...


was (Author: shaneku...@gmail.com):
Thanks for the patch, [~tangzhankun]! The update looks good to me overall. 
However, I think these docs could use an overhaul. Below are some of my 
thoughts on the rework. I'm OK if we want to open a follow on for the 
suggestions below, given it is a more exhaustive rework, just let me know.

 

 

As an example, if you enable LCE and specify {{CgroupsLCEResourcesHandler}} as 
the handler class, LCE has code to force the use of 
{{DefaultLCEResourcesHandler}}, so LCE never uses 
{{CgroupsLCEResourcesHandler}}. {{CgroupsLCEResourcesHandler}} is used by the 
{{ResourceHandler}} though to determine if the CPU controller should be setup, 
however, {{yarn.nodemanager.resource.cpu.enabled}} does the exact same thing. 
Based on this, I don't think we should list 
{{yarn.nodemanager.linux-container-executor.resources-handler.class}} at all in 
the docs and guide users towards using the 
{{yarn.nodemanager.resource.cpu.enabled}} form.

None of the other {{yarn.nodemanager.resource.*.enabled}} properties are listed. It would 
be good to have links to the elastic memory accounting as well.

Also, the table at the bottom, _CGroups and security,_ really has nothing to do 
with cgroups and should probably be moved to a page about LCE.

I would look to structure these docs something like as follows:
 * Summary
 ** What is cgroups and why would a user want to enable them
 * Prerequisites for enabling YARN cgroup support
 ** Recent OS
 ** Cgroup mounts required and YARN settings
 *** Overview of YARN cgroup mounting capabilities
 ** Enabling LCE
 * Enabling YARN cgroup support
 ** CPU controller
 *** overview of pct and strict limits
 ** Memory controller
 *** Link to memory controller deep dive
 ** Disk controller
 ** Traffic shaping
 ** ...

> NM percentage-physical-cpu-limit should be always 100 if 
> DefaultLCEResourcesHandler is used
> ---
>
> Key: YARN-6729
> URL: https://issues.apache.org/jira/browse/YARN-6729
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Yufei Gu
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-6729-trunk.001.patch
>
>
> NM percentage-physical-cpu-limit is not honored in 
> DefaultLCEResourcesHandler, which may cause container CPU usage calculation 
> issues, e.g. container vcore usage is potentially more than 100% if 
> percentage-physical-cpu-limit is set to a value less than 100. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6729) NM percentage-physical-cpu-limit should be always 100 if DefaultLCEResourcesHandler is used

2018-10-29 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667780#comment-16667780
 ] 

Shane Kumpf commented on YARN-6729:
---

Thanks for the patch, [~tangzhankun]! The update looks good to me overall. 
However, I think these docs could use an overhaul. Below are some of my 
thoughts on the rework. I'm OK if we want to open a follow on for the 
suggestions below, given it is a more exhaustive rework, just let me know.

 

 

As an example, if you enable LCE and specify {{CgroupsLCEResourcesHandler}} as 
the handler class, LCE has code to force the use of 
{{DefaultLCEResourcesHandler}}, so LCE never uses 
{{CgroupsLCEResourcesHandler}}. {{CgroupsLCEResourcesHandler}} is used by the 
{{ResourceHandler}} though to determine if the CPU controller should be setup, 
however, {{yarn.nodemanager.resource.cpu.enabled}} does the exact same thing. 
Based on this, I don't think we should list 
{{yarn.nodemanager.linux-container-executor.resources-handler.class}} at all in 
the docs and guide users towards using the 
{{yarn.nodemanager.resource.cpu.enabled}} form.

None of the other {{yarn.nodemanager.resource.*.enabled}} properties are listed. It would 
be good to have links to the elastic memory accounting as well.

Also, the table at the bottom, _CGroups and security,_ really has nothing to do 
with cgroups and should probably be moved to a page about LCE.

I would look to structure these docs something like as follows:
 * Summary
 ** What is cgroups and why would a user want to enable them
 * Prerequisites for enabling YARN cgroup support
 ** Recent OS
 ** Cgroup mounts required and YARN settings
 *** Overview of YARN cgroup mounting capabilities
 ** Enabling LCE
 * Enabling YARN cgroup support
 ** CPU controller
 *** overview of pct and strict limits
 ** Memory controller
 *** Link to memory controller deep dive
 ** Disk controller
 ** Traffic shaping
 ** ...

> NM percentage-physical-cpu-limit should be always 100 if 
> DefaultLCEResourcesHandler is used
> ---
>
> Key: YARN-6729
> URL: https://issues.apache.org/jira/browse/YARN-6729
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Yufei Gu
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-6729-trunk.001.patch
>
>
> NM percentage-physical-cpu-limit is not honored in 
> DefaultLCEResourcesHandler, which may cause container CPU usage calculation 
> issues, e.g. container vcore usage is potentially more than 100% if 
> percentage-physical-cpu-limit is set to a value less than 100. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8924) Refine the document or code related to legacy CPU isolation/enforcement

2018-10-29 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667223#comment-16667223
 ] 

Shane Kumpf commented on YARN-8924:
---

Sorry for the delay. As mentioned, {{LCEResourcesHandler}} and the 
implementations are deprecated. I think it makes sense to update the 
documentation. It looks like that is where things are headed in YARN-6729. I'll 
review that issue.

> Refine the document or code related to legacy CPU isolation/enforcement
> ---
>
> Key: YARN-8924
> URL: https://issues.apache.org/jira/browse/YARN-8924
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Minor
>
> This is to re-think the legacy configuration/code of CPU resource isolation. 
> In YARN-3542, we introduced _CGroupsCpuResourceHandlerImpl_ based on the new 
> _ResourceHandler_ mechanism but left the configuration 
> "yarn.nodemanager.linux-container-executor.resources-handler.class" there for 
> a long time. Now it seems confusing to the end user.
> See YARN-6729, where one sets "_DefaultLCEResourcesHandler_" and finds that giving 
> "percentage-physical-cpu-limit" a value less than "100" doesn't work.
> As far as I know, internally, _CgroupsLCEResourcesHandler_ and 
> _DefaultLCEResourcesHandler_ are both deprecated. YARN won't use them anymore.
> Instead, YARN uses _CGroupsCpuResourceHandlerImpl_ to do CPU isolation, and 
> only in LCE. If we want to enforce CPU usage, we must set LCE and 
> CgroupsLCEResourcesHandler like this:
> {noformat}
> <property>
>   <description>who will execute(launch) the containers.</description>
>   <name>yarn.nodemanager.container-executor.class</name>
>   <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
> </property>
> <property>
>   <description>The class which should help the LCE handle resources.</description>
>   <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
>   <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
> </property>
> {noformat}
> Only with the above settings do CPU-related settings like 
> "percentage-physical-cpu-limit" work as expected.
> To avoid confusion like YARN-6729, we can do two things:
>  # Document more clearly how users should configure CPU 
> isolation/enforcement in "NodeManagerCgroups.md"
>  # Make "ResourceHandlerModuler" stable, remove the legacy code, and update the 
> document to recommend the new setting "yarn.nodemanager.resource.cpu.enabled"
> Thoughts? [~leftnoteasy], [~vinodkv], [~vvasudev]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8785) Improve the error message when a bind mount is not whitelisted

2018-10-02 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635477#comment-16635477
 ] 

Shane Kumpf commented on YARN-8785:
---

Thanks for the contribution [~simonprewo] and thanks to [~eyang] [~Zian Chen] 
[~leftnoteasy] for the assistance and reviews. I committed this to trunk and 
branch-3.1.

> Improve the error message when a bind mount is not whitelisted
> --
>
> Key: YARN-8785
> URL: https://issues.apache.org/jira/browse/YARN-8785
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 2.9.1, 3.1.1, 3.1.2
>Reporter: Simon Prewo
>Assignee: Simon Prewo
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.2
>
> Attachments: YARN-8785-branch-3.1.002.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> A user receives the error message _Invalid docker rw mount_ when a container 
> tries to mount a directory which is not configured in property  
> *docker.allowed.rw-mounts*. 
> {code:java}
> Invalid docker rw mount 
> '/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01:/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01',
>  
> realpath=/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01{code}
> The error message makes the user think "It is not possible due to a docker 
> issue". My suggestion would be to put there a message like *Configuration of 
> the container executor does not allow mounting directory.*.
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c
> CURRENT:
> {code:java}
> permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, 
> mount_src);
> permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, 
> mount_src);
> if (permitted_ro == -1 || permitted_rw == -1) {
>   fprintf(ERRORFILE, "Invalid docker mount '%s', realpath=%s\n", 
> values[i], mount_src);
> ...
> {code}
> NEW:
> {code:java}
> permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, 
> mount_src);
> permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, 
> mount_src);
> if (permitted_ro == -1 || permitted_rw == -1) {
>   fprintf(ERRORFILE, "Configuration of the container executor does not 
> allow mounting directory '%s', realpath=%s\n", values[i], mount_src);
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8785) Improve the error message when a bind mount is not whitelisted

2018-10-02 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8785:
--
Summary: Improve the error message when a bind mount is not whitelisted  
(was: Error Message "Invalid docker rw mount" not helpful)

> Improve the error message when a bind mount is not whitelisted
> --
>
> Key: YARN-8785
> URL: https://issues.apache.org/jira/browse/YARN-8785
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 2.9.1, 3.1.1, 3.1.2
>Reporter: Simon Prewo
>Assignee: Simon Prewo
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.2
>
> Attachments: YARN-8785-branch-3.1.002.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> A user receives the error message _Invalid docker rw mount_ when a container 
> tries to mount a directory which is not configured in property  
> *docker.allowed.rw-mounts*. 
> {code:java}
> Invalid docker rw mount 
> '/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01:/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01',
>  
> realpath=/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01{code}
> The error message makes the user think "It is not possible due to a docker 
> issue". My suggestion would be to put there a message like *Configuration of 
> the container executor does not allow mounting directory.*.
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c
> CURRENT:
> {code:java}
> permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, 
> mount_src);
> permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, 
> mount_src);
> if (permitted_ro == -1 || permitted_rw == -1) {
>   fprintf(ERRORFILE, "Invalid docker mount '%s', realpath=%s\n", 
> values[i], mount_src);
> ...
> {code}
> NEW:
> {code:java}
> permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, 
> mount_src);
> permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, 
> mount_src);
> if (permitted_ro == -1 || permitted_rw == -1) {
>   fprintf(ERRORFILE, "Configuration of the container executor does not 
> allow mounting directory '%s', realpath=%s\n", values[i], mount_src);
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8623) Update Docker examples to use image which exists

2018-09-25 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627274#comment-16627274
 ] 

Shane Kumpf commented on YARN-8623:
---

Thanks [~ccondit-target] for the contribution and thanks to [~elek] and 
[~eyang] for the discussion. I committed this to trunk and branch-3.1.

> Update Docker examples to use image which exists
> 
>
> Key: YARN-8623
> URL: https://issues.apache.org/jira/browse/YARN-8623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
>  Labels: Docker
> Attachments: YARN-8623.001.patch
>
>
> The example Docker image given in the documentation 
> (images/hadoop-docker:latest) does not exist. We could change 
> images/hadoop-docker:latest to apache/hadoop-runner:latest, which does exist. 
> We'd need to do a quick sanity test to see if the image works with YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8623) Update Docker examples to use image which exists

2018-09-25 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627270#comment-16627270
 ] 

Shane Kumpf commented on YARN-8623:
---

+1 thanks for the patch, [~ccondit-target]! So that we have a working example, 
I'm going to commit this. We can continue to expand this section as we make it 
easier to run unmodified applications.

> Update Docker examples to use image which exists
> 
>
> Key: YARN-8623
> URL: https://issues.apache.org/jira/browse/YARN-8623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
>  Labels: Docker
> Attachments: YARN-8623.001.patch
>
>
> The example Docker image given in the documentation 
> (images/hadoop-docker:latest) does not exist. We could change 
> images/hadoop-docker:latest to apache/hadoop-runner:latest, which does exist. 
> We'd need to do a quick sanity test to see if the image works with YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8785) Error Message "Invalid docker rw mount" not helpful

2018-09-25 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627256#comment-16627256
 ] 

Shane Kumpf commented on YARN-8785:
---

I'll note that we will also want a patch for trunk, so you'll want to add a 
"YARN-8785.002.patch" that applies to trunk and upload that here too.

> Error Message "Invalid docker rw mount" not helpful
> ---
>
> Key: YARN-8785
> URL: https://issues.apache.org/jira/browse/YARN-8785
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.9.1, 3.1.1
>Reporter: Simon Prewo
>Assignee: Simon Prewo
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8785.001.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> A user receives the error message _Invalid docker rw mount_ when a container 
> tries to mount a directory which is not configured in property  
> *docker.allowed.rw-mounts*. 
> {code:java}
> Invalid docker rw mount 
> '/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01:/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01',
>  
> realpath=/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01{code}
> The error message makes the user think "It is not possible due to a docker 
> issue". My suggestion would be to put there a message like *Configuration of 
> the container executor does not allow mounting directory.*.
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c
> CURRENT:
> {code:java}
> permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, 
> mount_src);
> permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, 
> mount_src);
> if (permitted_ro == -1 || permitted_rw == -1) {
>   fprintf(ERRORFILE, "Invalid docker mount '%s', realpath=%s\n", 
> values[i], mount_src);
> ...
> {code}
> NEW:
> {code:java}
> permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, 
> mount_src);
> permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, 
> mount_src);
> if (permitted_ro == -1 || permitted_rw == -1) {
>   fprintf(ERRORFILE, "Configuration of the container executor does not 
> allow mounting directory '%s', realpath=%s\n", values[i], mount_src);
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8785) Error Message "Invalid docker rw mount" not helpful

2018-09-25 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627254#comment-16627254
 ] 

Shane Kumpf commented on YARN-8785:
---

[~simonprewo] - to specify a branch, rename your patch file to include the 
branch name: YARN-8785-<branch>.001.patch

Following this convention, the name for this patch would be: 
YARN-8785-branch-3.1.001.patch

> Error Message "Invalid docker rw mount" not helpful
> ---
>
> Key: YARN-8785
> URL: https://issues.apache.org/jira/browse/YARN-8785
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.9.1, 3.1.1
>Reporter: Simon Prewo
>Assignee: Simon Prewo
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8785.001.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> A user receives the error message _Invalid docker rw mount_ when a container 
> tries to mount a directory which is not configured in property  
> *docker.allowed.rw-mounts*. 
> {code:java}
> Invalid docker rw mount 
> '/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01:/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01',
>  
> realpath=/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01{code}
> The error message makes the user think "It is not possible due to a docker 
> issue". My suggestion would be to put there a message like *Configuration of 
> the container executor does not allow mounting directory.*.
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c
> CURRENT:
> {code:java}
> permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, 
> mount_src);
> permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, 
> mount_src);
> if (permitted_ro == -1 || permitted_rw == -1) {
>   fprintf(ERRORFILE, "Invalid docker mount '%s', realpath=%s\n", 
> values[i], mount_src);
> ...
> {code}
> NEW:
> {code:java}
> permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, 
> mount_src);
> permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, 
> mount_src);
> if (permitted_ro == -1 || permitted_rw == -1) {
>   fprintf(ERRORFILE, "Configuration of the container executor does not 
> allow mounting directory '%s', realpath=%s\n", values[i], mount_src);
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8805) Automatically convert the launch command to the exec form when using entrypoint support

2018-09-20 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622503#comment-16622503
 ] 

Shane Kumpf commented on YARN-8805:
---

Thanks, [~Zian Chen]. Please feel free to take this.

> Automatically convert the launch command to the exec form when using 
> entrypoint support
> ---
>
> Key: YARN-8805
> URL: https://issues.apache.org/jira/browse/YARN-8805
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Priority: Major
>  Labels: Docker
>
> When {{YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE}} is true, and a 
> launch command is provided, it is expected that the launch command is 
> provided by the user in exec form.
> For example:
> {code:java}
> "/usr/bin/sleep 6000"{code}
> must be changed to:
> {code}"/usr/bin/sleep,6000"{code}
> If this is not done, the container will never start and will be in a Created 
> state. We should automatically do this conversion vs making the user 
> understand this nuance of using the entrypoint support. Docs should be 
> updated to reflect this change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8805) Automatically convert the launch command to the exec form when using entrypoint support

2018-09-20 Thread Shane Kumpf (JIRA)
Shane Kumpf created YARN-8805:
-

 Summary: Automatically convert the launch command to the exec form 
when using entrypoint support
 Key: YARN-8805
 URL: https://issues.apache.org/jira/browse/YARN-8805
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Shane Kumpf


When {{YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE}} is true, and a 
launch command is provided, it is expected that the launch command is provided 
by the user in exec form.

For example:
{code:java}
"/usr/bin/sleep 6000"{code}
must be changed to:

{code}"/usr/bin/sleep,6000"{code}

If this is not done, the container will never start and will be in a Created 
state. We should automatically do this conversion rather than making the user understand 
this nuance of using the entrypoint support. Docs should be updated to reflect 
this change.
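
A minimal sketch of the conversion being proposed, assuming a simple whitespace split 
is sufficient (a real implementation would need to respect quoting inside the launch 
command):

{code:java}
public class ExecFormSketch {
  // Convert a shell-form launch command into the comma-separated exec form.
  static String toExecForm(String launchCommand) {
    return String.join(",", launchCommand.trim().split("\\s+"));
  }

  public static void main(String[] args) {
    // Prints "/usr/bin/sleep,6000"
    System.out.println(toExecForm("/usr/bin/sleep 6000"));
  }
}
{code}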



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8791) When STOPSIGNAL is not present then docker inspect returns an extra line feed

2018-09-19 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620748#comment-16620748
 ] 

Shane Kumpf commented on YARN-8791:
---

Thanks for the patch, [~csingh]! I have confirmed this fixes the issue. +1 
pending Jenkins
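
For context, a minimal sketch of the kind of defensive handling involved, assuming the 
fix simply strips trailing whitespace from the {{docker inspect}} output and falls back 
to SIGTERM when no STOPSIGNAL is present (illustrative only, not the actual patch):

{code:java}
public class StopSignalSketch {
  // Strip the trailing line feed that docker inspect can append when STOPSIGNAL
  // is absent, and fall back to Docker's default stop signal in that case.
  static String effectiveStopSignal(String inspectOutput) {
    String signal = inspectOutput == null ? "" : inspectOutput.trim();
    return signal.isEmpty() ? "SIGTERM" : signal;
  }

  public static void main(String[] args) {
    System.out.println(effectiveStopSignal("SIGKILL\n")); // SIGKILL
    System.out.println(effectiveStopSignal("\n"));        // SIGTERM
  }
}
{code}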

> When STOPSIGNAL is not present then docker inspect returns an extra line feed
> -
>
> Key: YARN-8791
> URL: https://issues.apache.org/jira/browse/YARN-8791
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8791.001.patch
>
>
> When the STOPSIGNAL is missing, an extra line feed is appended to the 
> output. This messes with the signal sent to the docker container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8783) Improve the documentation for the docker.trusted.registries configuration

2018-09-18 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8783:
--
Summary: Improve the documentation for the docker.trusted.registries 
configuration  (was: Property docker.trusted.registries does not work when 
using a list)

> Improve the documentation for the docker.trusted.registries configuration
> -
>
> Key: YARN-8783
> URL: https://issues.apache.org/jira/browse/YARN-8783
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.1
>Reporter: Simon Prewo
>Priority: Major
>  Labels: Docker, container-executor, docker
>
> I am deploying the default yarn distributed shell example:
> {code:java}
> yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos -shell_command "sleep 90" -jar 
> hadoop-yarn-applications-distributedshell.jar -num_containers 1{code}
> Having a *single trusted registry configured like this works*:
> {code:java}
> docker.trusted.registries=centos{code}
> But having *a list of trusted registries configured fails* ("Shell error 
> output: image: centos is not trusted."):
> {code:java}
> docker.trusted.registries=centos,ubuntu{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8623) Update Docker examples to use image which exists

2018-09-17 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618172#comment-16618172
 ] 

Shane Kumpf commented on YARN-8623:
---

The direction I was proposing for the default MR pi example was to use the 
openjdk:8 image and bind mount /etc/passwd and /etc/group. This should work in 
a good number of "try it out" use cases without needing to modify the existing 
hadoop-runner image. We could point to the SSSD instructions we have as an 
alternative way to manage users. Thoughts?

> Update Docker examples to use image which exists
> 
>
> Key: YARN-8623
> URL: https://issues.apache.org/jira/browse/YARN-8623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Condit
>Priority: Minor
>  Labels: Docker
>
> The example Docker image given in the documentation 
> (images/hadoop-docker:latest) does not exist. We could change 
> images/hadoop-docker:latest to apache/hadoop-runner:latest, which does exist. 
> We'd need to do a quick sanity test to see if the image works with YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8783) Property docker.trusted.registries does not work when using a list

2018-09-17 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8783:
--
Labels: Docker container-executor docker  (was: container-executor docker)

> Property docker.trusted.registries does not work when using a list
> --
>
> Key: YARN-8783
> URL: https://issues.apache.org/jira/browse/YARN-8783
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Simon Prewo
>Priority: Major
>  Labels: Docker, container-executor, docker
>
> I am deploying the default yarn distributed shell example:
> {code:java}
> yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos -shell_command "sleep 90" -jar 
> hadoop-yarn-applications-distributedshell.jar -num_containers 1{code}
> Having a *single trusted registry configured like this works*:
> {code:java}
> docker.trusted.registries=centos{code}
> But having *a list of trusted registries configured fails* ("Shell error 
> output: image: centos is not trusted."):
> {code:java}
> docker.trusted.registries=centos,ubuntu{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8783) Property docker.trusted.registries does not work when using a list

2018-09-17 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8783:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-8472

> Property docker.trusted.registries does not work when using a list
> --
>
> Key: YARN-8783
> URL: https://issues.apache.org/jira/browse/YARN-8783
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.1
>Reporter: Simon Prewo
>Priority: Major
>  Labels: Docker, container-executor, docker
>
> I am deploying the default yarn distributed shell example:
> {code:java}
> yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos -shell_command "sleep 90" -jar 
> hadoop-yarn-applications-distributedshell.jar -num_containers 1{code}
> Having a *single trusted registry configured like this works*:
> {code:java}
> docker.trusted.registries=centos{code}
> But having *a list of trusted registries configured fails* ("Shell error 
> output: image: centos is not trusted."):
> {code:java}
> docker.trusted.registries=centos,ubuntu{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8785) Error Message "Invalid docker rw mount" not helpful

2018-09-17 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8785:
--
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-8472

> Error Message "Invalid docker rw mount" not helpful
> ---
>
> Key: YARN-8785
> URL: https://issues.apache.org/jira/browse/YARN-8785
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.9.1, 3.1.1
>Reporter: Simon Prewo
>Priority: Major
>  Labels: Docker
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> A user receives the error message _Invalid docker rw mount_ when a container 
> tries to mount a directory which is not configured in property  
> *docker.allowed.rw-mounts*. 
> {code:java}
> Invalid docker rw mount 
> '/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01:/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01',
>  
> realpath=/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01{code}
> The error message makes the user think "It is not possible due to a docker 
> issue". My suggestion would be to put there a message like *Configuration of 
> the container executor does not allow mounting directory.*.
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c
> CURRENT:
> {code:java}
> permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, 
> mount_src);
> permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, 
> mount_src);
> if (permitted_ro == -1 || permitted_rw == -1) {
>   fprintf(ERRORFILE, "Invalid docker mount '%s', realpath=%s\n", 
> values[i], mount_src);
> ...
> {code}
> NEW:
> {code:java}
> permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, 
> mount_src);
> permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, 
> mount_src);
> if (permitted_ro == -1 || permitted_rw == -1) {
>   fprintf(ERRORFILE, "Configuration of the container executor does not 
> allow mounting directory '%s', realpath=%s\n", values[i], mount_src);
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8785) Error Message "Invalid docker rw mount" not helpful

2018-09-17 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8785:
--
Labels: Docker  (was: )

> Error Message "Invalid docker rw mount" not helpful
> ---
>
> Key: YARN-8785
> URL: https://issues.apache.org/jira/browse/YARN-8785
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.9.1, 3.1.1
>Reporter: Simon Prewo
>Priority: Major
>  Labels: Docker
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> A user receives the error message _Invalid docker rw mount_ when a container 
> tries to mount a directory that is not configured in the property 
> *docker.allowed.rw-mounts*. 
> {code:java}
> Invalid docker rw mount 
> '/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01:/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01',
>  
> realpath=/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01{code}
> The error message makes the user think "it is not possible due to a Docker 
> issue". My suggestion would be to use a message like *Configuration of 
> the container executor does not allow mounting directory.* instead.
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c
> CURRENT:
> {code:java}
> permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, 
> mount_src);
> permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, 
> mount_src);
> if (permitted_ro == -1 || permitted_rw == -1) {
>   fprintf(ERRORFILE, "Invalid docker mount '%s', realpath=%s\n", 
> values[i], mount_src);
> ...
> {code}
> NEW:
> {code:java}
> permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, 
> mount_src);
> permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, 
> mount_src);
> if (permitted_ro == -1 || permitted_rw == -1) {
>   fprintf(ERRORFILE, "Configuration of the container executor does not 
> allow mounting directory '%s', realpath=%s\n", values[i], mount_src);
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Moved] (YARN-8785) Error Message "Invalid docker rw mount" not helpful

2018-09-17 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf moved HADOOP-15734 to YARN-8785:


Affects Version/s: (was: 3.1.1)
   (was: 2.9.1)
   2.9.1
   3.1.1
  Key: YARN-8785  (was: HADOOP-15734)
  Project: Hadoop YARN  (was: Hadoop Common)

> Error Message "Invalid docker rw mount" not helpful
> ---
>
> Key: YARN-8785
> URL: https://issues.apache.org/jira/browse/YARN-8785
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.1.1, 2.9.1
>Reporter: Simon Prewo
>Priority: Major
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> A user receives the error message _Invalid docker rw mount_ when a container 
> tries to mount a directory that is not configured in the property 
> *docker.allowed.rw-mounts*. 
> {code:java}
> Invalid docker rw mount 
> '/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01:/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01',
>  
> realpath=/usr/local/hadoop/logs/userlogs/application_1536476159258_0004/container_1536476159258_0004_02_01{code}
> The error message makes the user think "it is not possible due to a Docker 
> issue". My suggestion would be to use a message like *Configuration of 
> the container executor does not allow mounting directory.* instead.
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c
> CURRENT:
> {code:java}
> permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, 
> mount_src);
> permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, 
> mount_src);
> if (permitted_ro == -1 || permitted_rw == -1) {
>   fprintf(ERRORFILE, "Invalid docker mount '%s', realpath=%s\n", 
> values[i], mount_src);
> ...
> {code}
> NEW:
> {code:java}
> permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, 
> mount_src);
> permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, 
> mount_src);
> if (permitted_ro == -1 || permitted_rw == -1) {
>   fprintf(ERRORFILE, "Configuration of the container executor does not 
> allow mounting directory '%s', realpath=%s\n", values[i], mount_src);
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8783) Property docker.trusted.registries does not work when using a list

2018-09-17 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617933#comment-16617933
 ] 

Shane Kumpf commented on YARN-8783:
---

[~simonprewo] - one trick here is that there is an implicit namespace on Docker 
Hub for these "official" images: "library". Changing your registry list 
to include "library" and changing your image to "library/centos" should allow 
this to work without the need to tag locally.
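
A minimal sketch of that workaround, with illustrative values only (the
container-executor.cfg [docker] section and the distributed shell command from
the description):
{code}
# container-executor.cfg -- include the implicit "library" namespace alongside
# the other trusted registries (illustrative list)
[docker]
  docker.trusted.registries=library,centos,ubuntu

# reference the official image with its full "library/" prefix
yarn jar hadoop-yarn-applications-distributedshell.jar \
  -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
  -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=library/centos \
  -shell_command "sleep 90" \
  -jar hadoop-yarn-applications-distributedshell.jar -num_containers 1
{code}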

> Property docker.trusted.registries does not work when using a list
> --
>
> Key: YARN-8783
> URL: https://issues.apache.org/jira/browse/YARN-8783
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Simon Prewo
>Priority: Major
>  Labels: container-executor, docker
>
> I am deploying the default yarn distributed shell example:
> {code:java}
> yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos -shell_command "sleep 90" -jar 
> hadoop-yarn-applications-distributedshell.jar -num_containers 1{code}
> Having a *single trusted registry configured like this works*:
> {code:java}
> docker.trusted.registries=centos{code}
> But having *a list of trusted registries configured fails* ("Shell error 
> output: image: centos is not trusted."):
> {code:java}
> docker.trusted.registries=centos,ubuntu{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8045) Reduce log output from container status calls

2018-09-14 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615073#comment-16615073
 ] 

Shane Kumpf commented on YARN-8045:
---

Thanks again for the contribution, [~ccondit-target]. Committed to trunk.

> Reduce log output from container status calls
> -
>
> Key: YARN-8045
> URL: https://issues.apache.org/jira/browse/YARN-8045
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8045.001.patch
>
>
> Each time a container's status is returned a log entry is produced in the NM 
> from {{ContainerManagerImpl}}. The container status includes the diagnostics 
> field for the container. If the diagnostics field contains an exception, it 
> can appear as if the exception is logged repeatedly every second. The 
> diagnostics message can also span many lines, which puts pressure on the logs 
> and makes it harder to read.
> For example:
> {code}
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_e01_1521323860653_0001_01_05
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: 
> RUNNING, Capability: , Diagnostics: [2018-03-17 
> 22:01:00.675]Exception from container-launch.
> Container id: container_e01_1521323860653_0001_01_05
> Exit code: -1
> Exception message: 
> Shell ouput: 
> [2018-03-17 22:01:00.750]Diagnostic message from attempt :
> [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1.
> , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8748) Javadoc warnings within the nodemanager package

2018-09-14 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615055#comment-16615055
 ] 

Shane Kumpf commented on YARN-8748:
---

Thanks again for the contribution, [~ccondit-target]. Committed to trunk.

> Javadoc warnings within the nodemanager package
> ---
>
> Key: YARN-8748
> URL: https://issues.apache.org/jira/browse/YARN-8748
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Trivial
> Fix For: 3.2.0
>
> Attachments: YARN-8748.001.patch
>
>
> There are a number of javadoc warnings in trunk in classes under the 
> nodemanager package. These should be addressed or suppressed.
> {code:java}
> [WARNING] Javadoc Warnings
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java:93:
>  warning - Tag @see: reference not found: 
> ContainerLaunch.ShellScriptBuilder#listDebugInformation
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX (referenced by @value 
> tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_FILE_PERMISSIONS 
> (referenced by @value tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_POLICY (referenced by 
> @value tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_WHITELIST_GROUP 
> (referenced by @value tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_POLICY_GROUP_PREFIX 
> (referenced by @value tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:211:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_WHITELIST_GROUP 
> (referenced by @value tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:211:
>  warning - NMContainerPolicyUtils#SECURITY_FLAG (referenced by @value tag) is 
> an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TrafficControlBandwidthHandlerImpl.java:248:
>  warning - @return tag has no arguments.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8045) Reduce log output from container status calls

2018-09-13 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613969#comment-16613969
 ] 

Shane Kumpf commented on YARN-8045:
---

Thanks for the patch, [~ccondit-target]. This is much better. +1 I'll commit 
this shortly.

> Reduce log output from container status calls
> -
>
> Key: YARN-8045
> URL: https://issues.apache.org/jira/browse/YARN-8045
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Major
> Attachments: YARN-8045.001.patch
>
>
> Each time a container's status is returned a log entry is produced in the NM 
> from {{ContainerManagerImpl}}. The container status includes the diagnostics 
> field for the container. If the diagnostics field contains an exception, it 
> can appear as if the exception is logged repeatedly every second. The 
> diagnostics message can also span many lines, which puts pressure on the logs 
> and makes it harder to read.
> For example:
> {code}
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_e01_1521323860653_0001_01_05
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: 
> RUNNING, Capability: , Diagnostics: [2018-03-17 
> 22:01:00.675]Exception from container-launch.
> Container id: container_e01_1521323860653_0001_01_05
> Exit code: -1
> Exception message: 
> Shell ouput: 
> [2018-03-17 22:01:00.750]Diagnostic message from attempt :
> [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1.
> , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8748) Javadoc warnings within the nodemanager package

2018-09-13 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613661#comment-16613661
 ] 

Shane Kumpf commented on YARN-8748:
---

Thanks for the contribution, [~ccondit-target]. It's a bummer we need to 
introduce new warnings to address these warnings, but I see what you mean. +1 
I'll commit this shortly.

> Javadoc warnings within the nodemanager package
> ---
>
> Key: YARN-8748
> URL: https://issues.apache.org/jira/browse/YARN-8748
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Trivial
> Attachments: YARN-8748.001.patch
>
>
> There are a number of javadoc warnings in trunk in classes under the 
> nodemanager package. These should be addressed or suppressed.
> {code:java}
> [WARNING] Javadoc Warnings
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java:93:
>  warning - Tag @see: reference not found: 
> ContainerLaunch.ShellScriptBuilder#listDebugInformation
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX (referenced by @value 
> tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_FILE_PERMISSIONS 
> (referenced by @value tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_POLICY (referenced by 
> @value tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_WHITELIST_GROUP 
> (referenced by @value tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_POLICY_GROUP_PREFIX 
> (referenced by @value tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:211:
>  warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_WHITELIST_GROUP 
> (referenced by @value tag) is an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:211:
>  warning - NMContainerPolicyUtils#SECURITY_FLAG (referenced by @value tag) is 
> an unknown reference.
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TrafficControlBandwidthHandlerImpl.java:248:
>  warning - @return tag has no arguments.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8768) Javadoc error in node attributes

2018-09-12 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612746#comment-16612746
 ] 

Shane Kumpf commented on YARN-8768:
---

Thanks to [~sunilg] for the contribution and [~rohithsharma] for the review. I 
validated that this fixed the build for me as well. Committed to trunk.

> Javadoc error in node attributes
> 
>
> Key: YARN-8768
> URL: https://issues.apache.org/jira/browse/YARN-8768
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-8768.001.patch
>
>
> fix java doc error from node attributes in yarn-api module.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8045) Reduce log output from container status calls

2018-09-06 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606463#comment-16606463
 ] 

Shane Kumpf commented on YARN-8045:
---

Good call, that sounds good to me.

> Reduce log output from container status calls
> -
>
> Key: YARN-8045
> URL: https://issues.apache.org/jira/browse/YARN-8045
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shane Kumpf
>Priority: Major
>
> Each time a container's status is returned a log entry is produced in the NM 
> from {{ContainerManagerImpl}}. The container status includes the diagnostics 
> field for the container. If the diagnostics field contains an exception, it 
> can appear as if the exception is logged repeatedly every second. The 
> diagnostics message can also span many lines, which puts pressure on the logs 
> and makes it harder to read.
> For example:
> {code}
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_e01_1521323860653_0001_01_05
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: 
> RUNNING, Capability: , Diagnostics: [2018-03-17 
> 22:01:00.675]Exception from container-launch.
> Container id: container_e01_1521323860653_0001_01_05
> Exit code: -1
> Exception message: 
> Shell ouput: 
> [2018-03-17 22:01:00.750]Diagnostic message from attempt :
> [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1.
> , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8045) Reduce log output from container status calls

2018-09-06 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606394#comment-16606394
 ] 

Shane Kumpf commented on YARN-8045:
---

Thanks for the proposal [~ccondit-target]. Moving the meat of the diagnostics 
field to DEBUG makes sense to me and would meet the requirement with minimal 
change.

My one concern is how that might impact compatibility. HADOOP-13714 recently 
updated the [compatibility 
guide|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md#log-output],
 which includes logs. Given that logs are considered Unstable, I think we are 
safe, but there is a note about ensuring existing parsers don't break. Can we 
consider the parser requirement in moving this entry to DEBUG?
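
To make the shape of that change concrete, a rough sketch of the idea (this is
illustrative only, not the attached patch; it assumes the standard
{{ContainerStatus}} accessors and an slf4j-style {{LOG}} in
{{ContainerManagerImpl}}):
{code:java}
// Keep a short INFO line per status call and defer the full status,
// including the multi-line diagnostics field, to DEBUG.
if (LOG.isDebugEnabled()) {
  LOG.debug("Returning ContainerStatus: " + containerStatus);
} else {
  LOG.info("Returning ContainerStatus: [ContainerId: "
      + containerStatus.getContainerId() + ", State: "
      + containerStatus.getState() + ", ExitStatus: "
      + containerStatus.getExitStatus() + "]");
}
{code}
Any parser that expects the diagnostics text at INFO would need to enable DEBUG
for this logger, which is the compatibility question raised above.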

> Reduce log output from container status calls
> -
>
> Key: YARN-8045
> URL: https://issues.apache.org/jira/browse/YARN-8045
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shane Kumpf
>Priority: Major
>
> Each time a container's status is returned a log entry is produced in the NM 
> from {{ContainerManagerImpl}}. The container status includes the diagnostics 
> field for the container. If the diagnostics field contains an exception, it 
> can appear as if the exception is logged repeatedly every second. The 
> diagnostics message can also span many lines, which puts pressure on the logs 
> and makes it harder to read.
> For example:
> {code}
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_e01_1521323860653_0001_01_05
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: 
> RUNNING, Capability: , Diagnostics: [2018-03-17 
> 22:01:00.675]Exception from container-launch.
> Container id: container_e01_1521323860653_0001_01_05
> Exit code: -1
> Exception message: 
> Shell ouput: 
> [2018-03-17 22:01:00.750]Diagnostic message from attempt :
> [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1.
> , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606189#comment-16606189
 ] 

Shane Kumpf commented on YARN-8751:
---

Thanks for the feedback and suggestions everyone. I think the issue is most 
likely to happen under relaunch conditions with a poorly behaving container (as 
noted by [~eyang]). Relaunch (afaik) is only used by YARN Services today, so 
the impact may be isolated. Having said that, based on the conversation here, 
it does appear there are other non-fatal cases that could trigger these errors, 
so I'm +1 on the proposal from [~jlowe] affecting both launch and relaunch.

> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Critical
>  Labels: Docker
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch 
> container failed
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not 
> create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating script paths...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating local dirs...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Path 
> 

[jira] [Updated] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8751:
--
Description: 
{{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
{{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
occurs based on the exit code returned by container-executor, and 7 different 
exit codes cause the NM to be marked UNHEALTHY.
{code:java}
if (exitCode ==
ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
exitCode ==
ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
  throw new ConfigurationException(
  "Linux Container Executor reached unrecoverable exception", e);{code}
I can understand why these are treated as fatal with the existing process 
container model. However, with privileged Docker containers this may be too 
harsh, as privileged Docker containers don't guarantee the user's identity will 
be propagated into the container, so these mismatches can occur. Outside of 
privileged containers, an application may inadvertently change the permissions 
on one of these directories, triggering this condition.

In our case, a container changed the "appcache//" directory 
permissions to 774. Some time later, the process in the container died and the 
Retry Policy kicked in to RELAUNCH the container. When the RELAUNCH occurred, 
container-executor checked the permissions of the 
"appcache//" directory (the existing workdir is retained 
for RELAUNCH) and returned exit code 35. Exit code 35 is 
COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
containers running on that node, when really only this container would have 
been impacted.
{code:java}
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Container id: 
container_e15_1535130383425_0085_01_05
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Exit code: 35
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch container 
failed
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Shell error output: Could not create 
container dirsCould not create local files and directories 5 6
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) -
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Shell output: main : command provided 
4
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - main : run as user is user
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Creating script paths...
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Creating local dirs...
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Path 
/grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05
 has permission 774 but needs permission 750.
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null)
2018-08-31 21:07:22,386 ERROR launcher.ContainerRelaunch 
(ContainerRelaunch.java:call(129)) - Failed to launch container due to 
configuration error.
org.apache.hadoop.yarn.exceptions.ConfigurationException: Linux Container 
Executor reached unrecoverable exception
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleExitCode(LinuxContainerExecutor.java:633)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:573)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
at 

[jira] [Created] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Shane Kumpf (JIRA)
Shane Kumpf created YARN-8751:
-

 Summary: Container-executor permission check errors cause the NM 
to be marked unhealthy
 Key: YARN-8751
 URL: https://issues.apache.org/jira/browse/YARN-8751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Shane Kumpf


{{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
{{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
occurs based on the exit code returned by container-executor, and 7 different 
exit codes cause the NM to be marked UNHEALTHY.
{code:java}
if (exitCode ==
ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
exitCode ==
ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
  throw new ConfigurationException(
  "Linux Container Executor reached unrecoverable exception", e);{code}
I can understand why these are treated as fatal with the existing process 
container model. However, with privileged Docker containers this may be too 
harsh, as privileged Docker containers don't guarantee the user's identity will 
be propagated into the container.

In our case, a privileged container changed the 
"appcache//" directory permissions to 774. Some time later, 
the process in the container died and the Retry Policy kicked in to RELAUNCH 
the container. When the RELAUNCH occurred, container-executor checked the 
permissions of the "appcache//" directory (the existing 
workdir is retained for RELAUNCH) and returned exit code 35. Exit code 35 is 
COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
containers running on that node, when really only this container would have 
been impacted.
{code:java}
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Container id: 
container_e15_1535130383425_0085_01_05
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Exit code: 35
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch container 
failed
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Shell error output: Could not create 
container dirsCould not create local files and directories 5 6
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) -
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Shell output: main : command provided 
4
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - main : run as user is user
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Creating script paths...
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Creating local dirs...
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Path 
/grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05
 has permission 774 but needs permission 750.
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null)
2018-08-31 21:07:22,386 ERROR launcher.ContainerRelaunch 
(ContainerRelaunch.java:call(129)) - Failed to launch container due to 
configuration error.
org.apache.hadoop.yarn.exceptions.ConfigurationException: Linux Container 
Executor reached unrecoverable exception
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleExitCode(LinuxContainerExecutor.java:633)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:573)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)

[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-09-05 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604384#comment-16604384
 ] 

Shane Kumpf commented on YARN-8638:
---

I opened HADOOP-15721 and YARN-8748 to discuss disabling/fixing the two 
pre-commit warnings encountered here.

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch, 
> YARN-8638.003.patch, YARN-8638.004.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> 
>  yarn.nodemanager.runtime.linux.allowed-runtimes
>  default,docker,experimental
> 
>  
> 
>  yarn.nodemanager.runtime.linux.experimental.class
>  com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.
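
To illustrate the extension point described above, a hypothetical runtime
sketch. The method name and signature follow this description rather than any
final committed API, and the remaining {{LinuxContainerRuntime}} lifecycle
methods (initialize, launchContainer, signalContainer, ...) are omitted:
{code:java}
// Hypothetical pluggable runtime wired in via
// yarn.nodemanager.runtime.linux.experimental.class (see the XML above).
public class ExperimentalLinuxContainerRuntime implements LinuxContainerRuntime {

  // Claim ownership of a container when the submitter asked for this runtime
  // through the runtime-type environment variable.
  @Override
  public boolean isRuntimeRequested(Map<String, String> env) {
    return "experimental".equalsIgnoreCase(
        env.get("YARN_CONTAINER_RUNTIME_TYPE"));
  }

  // ... other LinuxContainerRuntime methods elided ...
}
{code}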



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8748) Javadoc warnings within the nodemanager package

2018-09-05 Thread Shane Kumpf (JIRA)
Shane Kumpf created YARN-8748:
-

 Summary: Javadoc warnings within the nodemanager package
 Key: YARN-8748
 URL: https://issues.apache.org/jira/browse/YARN-8748
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.2.0
Reporter: Shane Kumpf


There are a number of javadoc warnings in trunk in classes under the 
nodemanager package. These should be addressed or suppressed.
{code:java}
[WARNING] Javadoc Warnings
[WARNING] 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java:93:
 warning - Tag @see: reference not found: 
ContainerLaunch.ShellScriptBuilder#listDebugInformation
[WARNING] 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
 warning - YarnConfiguration#YARN_CONTAINER_SANDBOX (referenced by @value tag) 
is an unknown reference.
[WARNING] 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
 warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_FILE_PERMISSIONS 
(referenced by @value tag) is an unknown reference.
[WARNING] 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
 warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_POLICY (referenced by 
@value tag) is an unknown reference.
[WARNING] 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
 warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_WHITELIST_GROUP (referenced 
by @value tag) is an unknown reference.
[WARNING] 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:118:
 warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_POLICY_GROUP_PREFIX 
(referenced by @value tag) is an unknown reference.
[WARNING] 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:211:
 warning - YarnConfiguration#YARN_CONTAINER_SANDBOX_WHITELIST_GROUP (referenced 
by @value tag) is an unknown reference.
[WARNING] 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java:211:
 warning - NMContainerPolicyUtils#SECURITY_FLAG (referenced by @value tag) is 
an unknown reference.
[WARNING] 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TrafficControlBandwidthHandlerImpl.java:248:
 warning - @return tag has no arguments.
{code}
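
For reference, warnings of the "referenced by @value tag is an unknown
reference" form usually mean the standard doclet cannot resolve the constant
from that file's context. A generic illustration of two common ways to address
them (shown only as a sketch, not necessarily the fix taken in the attached
patch):
{code:java}
// Warning-producing form: the short reference cannot be resolved here.
/** Determined by {@value YarnConfiguration#YARN_CONTAINER_SANDBOX}. */

// Option 1: fully qualify the constant so the doclet can resolve it.
/** Determined by {@value org.apache.hadoop.yarn.conf.YarnConfiguration#YARN_CONTAINER_SANDBOX}. */

// Option 2: drop the value expansion and use a plain code reference.
/** Determined by {@code YarnConfiguration.YARN_CONTAINER_SANDBOX}. */
{code}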



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-09-05 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604366#comment-16604366
 ] 

Shane Kumpf commented on YARN-8638:
---

Thanks for the contribution, [~ccondit-target] and thank you all for the 
reviews and discussion. I committed this to trunk and branch-3.1.

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch, 
> YARN-8638.003.patch, YARN-8638.004.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> 
>  yarn.nodemanager.runtime.linux.allowed-runtimes
>  default,docker,experimental
> 
>  
> 
>  yarn.nodemanager.runtime.linux.experimental.class
>  com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-09-01 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599643#comment-16599643
 ] 

Shane Kumpf commented on YARN-8638:
---

Thanks for the patch [~ccondit-target]! I have been able to successfully test 
this feature using a pluggable runtime. I can understand your reasoning behind 
ignoring the remaining warnings. It would be good to open an issue  (likely a 
HADOOP JIRA) to start a conversation about removing these checks if they don't 
make sense and/or fixing the current issues. Beyond the warnings, the patch 
lgtm. I'll commit this after the holiday.



> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch, 
> YARN-8638.003.patch, YARN-8638.004.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> 
>  yarn.nodemanager.runtime.linux.allowed-runtimes
>  default,docker,experimental
> 
>  
> 
>  yarn.nodemanager.runtime.linux.experimental.class
>  com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8642) Add support for tmpfs mounts with the Docker runtime

2018-08-29 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596307#comment-16596307
 ] 

Shane Kumpf commented on YARN-8642:
---

Thanks to [~ccondit-target] for the contribution and [~ebadger] and [~eyang] 
for the reviews! I committed this to trunk and branch-3.1.

> Add support for tmpfs mounts with the Docker runtime
> 
>
> Key: YARN-8642
> URL: https://issues.apache.org/jira/browse/YARN-8642
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8642.001.patch, YARN-8642.002.patch
>
>
> Add support to the existing Docker runtime to allow the user to request tmpfs 
> mounts for their containers. For example:
> {code}/usr/bin/docker run --name=container_name --tmpfs /run image 
> /bootstrap/start-systemd
> {code}
> One use case is to allow systemd to run as PID 1 in a non-privileged 
> container, /run is expected to be a tmpfs mount in the container for that to 
> work.
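
For completeness, a sketch of how a job might request such a mount once this
lands. The environment variable name below is an assumption modeled on the
existing YARN_CONTAINER_RUNTIME_DOCKER_* options; check the Docker runtime
documentation for the exact name that ships with the patch:
{code}
# Illustrative distributed shell submission -- the tmpfs variable name is assumed
yarn jar hadoop-yarn-applications-distributedshell.jar \
  -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
  -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=my-systemd-image \
  -shell_env YARN_CONTAINER_RUNTIME_DOCKER_TMPFS_MOUNTS=/run \
  -shell_command "/bootstrap/start-systemd" \
  -jar hadoop-yarn-applications-distributedshell.jar -num_containers 1
{code}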



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8706) DelayedProcessKiller is executed for Docker containers even though docker stop sends a KILL signal after the specified grace period

2018-08-28 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595562#comment-16595562
 ] 

Shane Kumpf commented on YARN-8706:
---

Seems like a reasonable solution to me. {{docker stop}} has been a pain point, 
so removing that call while still supporting STOPSIGNAL sounds like what we 
want.

> DelayedProcessKiller is executed for Docker containers even though docker 
> stop sends a KILL signal after the specified grace period
> ---
>
> Key: YARN-8706
> URL: https://issues.apache.org/jira/browse/YARN-8706
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>  Labels: docker
>
> {{DockerStopCommand}} adds a grace period of 10 seconds.
> 10 seconds is also the default grace period used by docker stop
>  [https://docs.docker.com/engine/reference/commandline/stop/]
> Documentation of docker stop:
> {quote}the main process inside the container will receive {{SIGTERM}}, and 
> after a grace period, {{SIGKILL}}.
> {quote}
> There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes 
> for all containers after a delay when {{sleepDelayBeforeSigKill>0}}. By 
> default this is set to {{250 milliseconds}}, so it always gets executed 
> regardless of the container type.
>  
> For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}} 
> after the grace period:
> - when sleepDelayBeforeSigKill > 10 seconds, there is no point in 
> executing DelayedProcessKiller
> - when sleepDelayBeforeSigKill < 1 second, the grace period should be 
> the smallest value, which is 1 second, because we force a kill 
> after 250 ms anyway
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8706) DelayedProcessKiller is executed for Docker containers even though docker stop sends a KILL signal after the specified grace period

2018-08-28 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595507#comment-16595507
 ] 

Shane Kumpf commented on YARN-8706:
---

Thanks for reporting this, [~csingh]. I know several of us discussed this in 
the past and ran into some sticking points.

As [~ebadger] points out, the reason for using {{docker stop}} is to be able to 
leverage the STOPSIGNAL directive that can be used in Dockerfiles. {{docker 
stop}} will issue the signal defined in the STOPSIGNAL instead of SIGTERM. This 
is important for gracefully stopping databases and even systemd (which expects 
SIGRTMIN+3).
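
As a concrete illustration (the container name is just an example), an image 
that needs a custom stop signal declares it in its Dockerfile, and {{docker stop}} 
will honor it:
{code}
# In the image's Dockerfile (example for systemd as PID 1):
#   STOPSIGNAL SIGRTMIN+3
# docker stop then sends SIGRTMIN+3 instead of SIGTERM, and only sends SIGKILL
# once the grace period expires:
/usr/bin/docker stop --time=10 container_name
{code}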

> DelayedProcessKiller is executed for Docker containers even though docker 
> stop sends a KILL signal after the specified grace period
> ---
>
> Key: YARN-8706
> URL: https://issues.apache.org/jira/browse/YARN-8706
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>  Labels: docker
>
> {{DockerStopCommand}} adds a grace period of 10 seconds.
> 10 seconds is also the default grace period used by docker stop
>  [https://docs.docker.com/engine/reference/commandline/stop/]
> Documentation of docker stop:
> {quote}the main process inside the container will receive {{SIGTERM}}, and 
> after a grace period, {{SIGKILL}}.
> {quote}
> There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes 
> for all containers after a delay when {{sleepDelayBeforeSigKill>0}}. By 
> default this is set to {{250 milliseconds}}, so irrespective of the 
> container type, it will always get executed.
>  
> For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}} 
> after the grace period:
> - when sleepDelayBeforeSigKill > 10 seconds, there is no point in 
> executing DelayedProcessKiller
> - when sleepDelayBeforeSigKill < 1 second, the grace period should be 
> the smallest value, which is 1 second, because we force a kill 
> after 250 ms anyway
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8706) DelayedProcessKiller is executed for Docker containers even though docker stop sends a KILL signal after the specified grace period

2018-08-28 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf reassigned YARN-8706:
-

Assignee: Shane Kumpf  (was: Chandni Singh)

> DelayedProcessKiller is executed for Docker containers even though docker 
> stop sends a KILL signal after the specified grace period
> ---
>
> Key: YARN-8706
> URL: https://issues.apache.org/jira/browse/YARN-8706
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: docker
>
> {{DockerStopCommand}} adds a grace period of 10 seconds.
> 10 seconds is also the default grace period used by docker stop
>  [https://docs.docker.com/engine/reference/commandline/stop/]
> Documentation of docker stop:
> {quote}the main process inside the container will receive {{SIGTERM}}, and 
> after a grace period, {{SIGKILL}}.
> {quote}
> There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes 
> for all containers after a delay when {{sleepDelayBeforeSigKill>0}}. By 
> default this is set to {{250 milliseconds}}, so irrespective of the 
> container type, it will always get executed.
>  
> For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}} 
> after the grace period:
> - when sleepDelayBeforeSigKill > 10 seconds, there is no point in 
> executing DelayedProcessKiller
> - when sleepDelayBeforeSigKill < 1 second, the grace period should be 
> the smallest value, which is 1 second, because we force a kill 
> after 250 ms anyway
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8706) DelayedProcessKiller is executed for Docker containers even though docker stop sends a KILL signal after the specified grace period

2018-08-28 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf reassigned YARN-8706:
-

Assignee: Chandni Singh  (was: Shane Kumpf)

> DelayedProcessKiller is executed for Docker containers even though docker 
> stop sends a KILL signal after the specified grace period
> ---
>
> Key: YARN-8706
> URL: https://issues.apache.org/jira/browse/YARN-8706
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>  Labels: docker
>
> {{DockerStopCommand}} adds a grace period of 10 seconds.
> 10 seconds is also the default grace period used by docker stop
>  [https://docs.docker.com/engine/reference/commandline/stop/]
> Documentation of docker stop:
> {quote}the main process inside the container will receive {{SIGTERM}}, and 
> after a grace period, {{SIGKILL}}.
> {quote}
> There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes 
> for all containers after a delay when {{sleepDelayBeforeSigKill>0}}. By 
> default this is set to {{250 milliseconds}}, so irrespective of the 
> container type, it will always get executed.
>  
> For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}} 
> after the grace period:
> - when sleepDelayBeforeSigKill > 10 seconds, there is no point in 
> executing DelayedProcessKiller
> - when sleepDelayBeforeSigKill < 1 second, the grace period should be 
> the smallest value, which is 1 second, because we force a kill 
> after 250 ms anyway
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8623) Update Docker examples to use image which exists

2018-08-28 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595488#comment-16595488
 ] 

Shane Kumpf commented on YARN-8623:
---

Thanks [~elek]. Craig and I had a quick offline chat. The public _openjdk:8_ 
image works for the MR pi example being documented here. The direction we are 
going is to bind mount the Hadoop bits and config into the container. Using the 
_openjdk:8_ image makes it clear we haven't packaged Hadoop in the image, which 
is important to convey. Given this, I think we should leave 
_apache/hadoop-runner_ as is and use the _openjdk:8_ image in our example.

> Update Docker examples to use image which exists
> 
>
> Key: YARN-8623
> URL: https://issues.apache.org/jira/browse/YARN-8623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Condit
>Priority: Minor
>  Labels: Docker
>
> The example Docker image given in the documentation 
> (images/hadoop-docker:latest) does not exist. We could change 
> images/hadoop-docker:latest to apache/hadoop-runner:latest, which does exist. 
> We'd need to do a quick sanity test to see if the image works with YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8642) Add support for tmpfs mounts with the Docker runtime

2018-08-28 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595446#comment-16595446
 ] 

Shane Kumpf commented on YARN-8642:
---

Thanks for the patch, [~ccondit-target]! With this patch (+ prior features 
we've added), I'm happy to report being able to run systemd as PID 1 in a 
non-privileged container!

{quote}In Redhat, tmpfs is automatically created for /run and /run/users/$uid. 
How to automate mounting of /run/users/$uid with the current 
implementation?{quote}
This need will be dependent on what is running in the container. It would be 
nice to be able to reference UID and GID by variable, as you've outlined. Maybe 
resolving those variables within the mount related environment variables is a 
task the YARN Services AM could handle? Could we discuss in a follow on since 
this seems like a useful feature beyond just the tmpfs mounts?
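
Rough sketch of the variable idea (the environment variable name and 
placeholder syntax below are made up purely for illustration; they are not 
part of this patch):
{code}
# What the user might request:
YARN_CONTAINER_RUNTIME_DOCKER_TMPFS_MOUNTS=/run,/run/user/{{UID}}
# What would be handed to the runtime after resolving the submitting user's
# uid (1005 is just an example):
YARN_CONTAINER_RUNTIME_DOCKER_TMPFS_MOUNTS=/run,/run/user/1005
{code}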

I'm +1 on the latest patch unless there are additional concerns we need to 
address.

> Add support for tmpfs mounts with the Docker runtime
> 
>
> Key: YARN-8642
> URL: https://issues.apache.org/jira/browse/YARN-8642
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8642.001.patch, YARN-8642.002.patch
>
>
> Add support to the existing Docker runtime to allow the user to request tmpfs 
> mounts for their containers. For example:
> {code}/usr/bin/docker run --name=container_name --tmpfs /run image 
> /bootstrap/start-systemd
> {code}
> One use case is to allow systemd to run as PID 1 in a non-privileged 
> container; /run is expected to be a tmpfs mount in the container for that to 
> work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8704) Improve the error message for an invalid docker rw mount to be more informative

2018-08-23 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590633#comment-16590633
 ] 

Shane Kumpf commented on YARN-8704:
---

[~cheersyang] - Thanks for reporting this. Can you share a bit more detail on 
which log you saw the error in? It would also be helpful if you could share the 
log entries surrounding the error. There should be a log entry calling 
out the problematic mount, so I'm curious if it was just overlooked.

> Improve the error message for an invalid docker rw mount to be more 
> informative
> ---
>
> Key: YARN-8704
> URL: https://issues.apache.org/jira/browse/YARN-8704
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Weiwei Yang
>Priority: Minor
>
> Seeing the following error message while starting a privileged docker container
> {noformat}
> Error constructing docker command, docker error code=14, error 
> message='Invalid docker read-write mount'
> {noformat}
> it would be good if it told us which mount is invalid and how to fix it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service

2018-08-23 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf reopened YARN-8675:
---

> Setting hostname of docker container breaks with "host" networking mode for 
> Apps which do not run as a YARN service
> ---
>
> Key: YARN-8675
> URL: https://issues.apache.org/jira/browse/YARN-8675
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Major
>  Labels: Docker
>
> Applications like the Spark AM currently do not run as a YARN service, and 
> setting the hostname breaks driver/executor communication with docker versions 
> >= 1.13.1, especially with wire-encryption turned on.
> YARN-8027 sets the hostname if YARN DNS is enabled. But the cluster could 
> have a mix of YARN service/native Applications.
> The proposal is to not set the hostname when "host" networking mode is 
> enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service

2018-08-23 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590143#comment-16590143
 ] 

Shane Kumpf commented on YARN-8675:
---

Looking back over the history here, I think we made the wrong decision in 
setting the YARN defined hostname when {{\-\-net=host}}. I think we should make 
{{\-\-net=host}} return the NM hostname, even if Registry DNS is enabled, as 
you originally proposed in YARN-7797 via your early patches, [~eyang]. While 
using the YARN defined hostname was nice for testing, it breaks several aspects 
of running both Services and "native" Hadoop frameworks, such as MR and Spark, 
side by side, which is a core goal of the containerization effort. 

The problem isn't the domain; it is that the "ctr" hostname we are setting won't 
exist in DNS for these containers. Concretely, the NM will set 
{{\-\-hostname=ctr-e111-111-11-01-06.domain.site}} even though 
that entry will never be available via DNS, since the Spark job is not running 
as a YARN Service, and not writing any entries to ZK. Anything related to that 
container that relies on DNS lookups will fail.
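
To make that concrete (the hostname is reused from the example above and the 
lookup is only an illustration):
{code}
# What the NM effectively does today with host networking:
/usr/bin/docker run --net=host \
  --hostname=ctr-e111-111-11-01-06.domain.site ...
# Nothing registers that name in DNS for a non-Service app, so lookups fail:
nslookup ctr-e111-111-11-01-06.domain.site
{code}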

> Setting hostname of docker container breaks with "host" networking mode for 
> Apps which do not run as a YARN service
> ---
>
> Key: YARN-8675
> URL: https://issues.apache.org/jira/browse/YARN-8675
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Major
>  Labels: Docker
>
> Applications like the Spark AM currently do not run as a YARN service, and 
> setting the hostname breaks driver/executor communication with docker versions 
> >= 1.13.1, especially with wire-encryption turned on.
> YARN-8027 sets the hostname if YARN DNS is enabled. But the cluster could 
> have a mix of YARN service/native Applications.
> The proposal is to not set the hostname when "host" networking mode is 
> enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3611) Support Docker Containers In LinuxContainerExecutor

2018-08-20 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586630#comment-16586630
 ] 

Shane Kumpf commented on YARN-3611:
---

[~zhouyunfan] - thank you for your interest! Please see the [YARN 
containerization 
docs|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md]
 as a starting point. If you have specific questions after that, please do 
reach out on the hadoop-user mailing list.

> Support Docker Containers In LinuxContainerExecutor
> ---
>
> Key: YARN-3611
> URL: https://issues.apache.org/jira/browse/YARN-3611
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Major
>  Labels: Docker
>
> Support Docker Containers In LinuxContainerExecutor
> LinuxContainerExecutor provides useful functionality today with respect to 
> localization, cgroups based resource management and isolation for CPU, 
> network, disk etc. as well as security with a well-defined mechanism to 
> execute privileged operations using the container-executor utility.  Bringing 
> docker support to LinuxContainerExecutor lets us use all of this 
> functionality when running docker containers under YARN, while not requiring 
> users and admins to configure and use a different ContainerExecutor. 
> There are several aspects here that need to be worked through :
> * Mechanism(s) to let clients request docker-specific functionality - we 
> could initially implement this via environment variables without impacting 
> the client API.
> * Security - both docker daemon as well as application
> * Docker image localization
> * Running a docker container via container-executor as a specified user
> * “Isolate” the docker container in terms of CPU/network/disk/etc
> * Communicating with and/or signaling the running container (ensure correct 
> pid handling)
> * Figure out workarounds for certain performance-sensitive scenarios like 
> HDFS short-circuit reads 
> * All of these need to be achieved without changing the current behavior of 
> LinuxContainerExecutor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8623) Update Docker examples to use image which exists

2018-08-20 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586592#comment-16586592
 ] 

Shane Kumpf edited comment on YARN-8623 at 8/20/18 10:34 PM:
-

[~elek] - thanks, those details are helpful. It does appear 
_apache/hadoop-runner_ is closer to what we want than I originally thought, but 
the user setup clashes with our needs. With a goal of trying to provide a 
working MR pi example, MapReduce expects to run (and write data) as the end 
user (or a static local user, such as nobody, depending on config), so we need 
to propagate the user identity into the container. I expect Spark needs this as 
well.

Removing the use of sudo in the entrypoint script, gating that {{sudo chmod}} 
in the starter script via an env variable, or opening up the sudo rules would 
all seem to work to allow us to use this for YARN as well.
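
For example, the gating option could be as simple as something like this in the 
starter script (the variable name and path are hypothetical):
{code}
# Only do the permission fix-up when explicitly requested; YARN-launched
# containers would leave HADOOP_RUNNER_DO_CHMOD unset.
if [ "${HADOOP_RUNNER_DO_CHMOD:-false}" = "true" ]; then
  sudo chmod 755 /data
fi
{code}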

I think we should open a separate HADOOP Jira to discuss making the image work 
for both cases if that makes sense to others. [~elek] [~ccondit-target] 
thoughts?


was (Author: shaneku...@gmail.com):
[~elek] - thanks, those details are helpful. It does appear 
_apache/hadoop-runner_ is closer to what we want than I originally thought, but 
the user setup clashes with our needs. With a goal of trying to provide a 
working MR pi example, MapReduce expects to run (and write data) as the end 
user (or a static local user, such as nobody, depending on config). I expect 
Spark does as well.

Removing the use of sudo in the entrypoint script, gating that {{sudo chmod}} 
in the starter script via an env variable, or opening up the sudo rules would 
all seem to work to allow us to use this for YARN as well.

I think we should open a separate HADOOP Jira to discuss making the image work 
for both cases if that makes sense to others. [~elek] [~ccondit-target] 
thoughts?

> Update Docker examples to use image which exists
> 
>
> Key: YARN-8623
> URL: https://issues.apache.org/jira/browse/YARN-8623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Condit
>Priority: Minor
>  Labels: Docker
>
> The example Docker image given in the documentation 
> (images/hadoop-docker:latest) does not exist. We could change 
> images/hadoop-docker:latest to apache/hadoop-runner:latest, which does exist. 
> We'd need to do a quick sanity test to see if the image works with YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8623) Update Docker examples to use image which exists

2018-08-20 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586601#comment-16586601
 ] 

Shane Kumpf commented on YARN-8623:
---

I was able to run the below MR pi job with the modified _apache/hadoop-runner_ 
image, after a quick hack to the sudo rules.
{code:java}
YARN_EXAMPLES_JAR=$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar
IMAGE_ID="local/hadoop-runner-new:latest"
MOUNTS="/usr/local/hadoop:/usr/local/hadoop:ro,/etc/hadoop/conf:/etc/hadoop/conf:ro,/etc/passwd:/etc/passwd:ro,/etc/group:/etc/group:ro"

yarn jar $YARN_EXAMPLES_JAR pi \
-Dmapreduce.map.env.YARN_CONTAINER_RUNTIME_TYPE=docker \
-Dmapreduce.map.env.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=$MOUNTS \
-Dmapreduce.map.env.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$IMAGE_ID \
-Dmapreduce.reduce.env.YARN_CONTAINER_RUNTIME_TYPE=docker \
-Dmapreduce.reduce.env.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=$MOUNTS \
-Dmapreduce.reduce.env.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$IMAGE_ID 1 
4{code}
Hadoop bits were installed to {{/usr/local/hadoop}} and the Hadoop config to 
{{/etc/hadoop/conf}} on the host. The appropriate mounts were added to 
{{docker.allowed.ro-mounts}} and the image prefix to 
{{docker.trusted.registries}} in {{container-executor.cfg}}.

The above assumes the use of {{/etc/passwd}} and {{/etc/group}} for propagating 
the user and group into the container. We should point to the other ways of 
[managing user 
propagation|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md#user-management-in-docker-container]
 as part of this example documentation.
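
For reference, the relevant {{container-executor.cfg}} entries for the above 
would look roughly like this (paths and registry prefix taken from the example; 
other settings omitted):
{code}
[docker]
  docker.trusted.registries=local
  docker.allowed.ro-mounts=/usr/local/hadoop,/etc/hadoop/conf,/etc/passwd,/etc/group
{code}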

> Update Docker examples to use image which exists
> 
>
> Key: YARN-8623
> URL: https://issues.apache.org/jira/browse/YARN-8623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Condit
>Priority: Minor
>  Labels: Docker
>
> The example Docker image given in the documentation 
> (images/hadoop-docker:latest) does not exist. We could change 
> images/hadoop-docker:latest to apache/hadoop-runner:latest, which does exist. 
> We'd need to do a quick sanity test to see if the image works with YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8623) Update Docker examples to use image which exists

2018-08-20 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586592#comment-16586592
 ] 

Shane Kumpf commented on YARN-8623:
---

[~elek] - thanks, those details are helpful. It does appear 
_apache/hadoop-runner_ is closer to what we want than I originally thought, but 
the user setup clashes with our needs. With a goal of trying to provide a 
working MR pi example, MapReduce expects to run (and write data) as the end 
user (or a static local user, such as nobody, depending on config). I expect 
Spark does as well.

Removing the use of sudo in the entrypoint script, gating that {{sudo chmod}} 
in the starter script via an env variable, or opening up the sudo rules would 
all seem to work to allow us to use this for YARN as well.

I think we should open a separate HADOOP Jira to discuss making the image work 
for both cases if that makes sense to others. [~elek] [~ccondit-target] 
thoughts?

> Update Docker examples to use image which exists
> 
>
> Key: YARN-8623
> URL: https://issues.apache.org/jira/browse/YARN-8623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Condit
>Priority: Minor
>  Labels: Docker
>
> The example Docker image given in the documentation 
> (images/hadoop-docker:latest) does not exist. We could change 
> images/hadoop-docker:latest to apache/hadoop-runner:latest, which does exist. 
> We'd need to do a quick sanity test to see if the image works with YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8520) Document best practice for user management

2018-08-10 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576832#comment-16576832
 ] 

Shane Kumpf commented on YARN-8520:
---

Thanks for the contribution, [~eyang]! I committed this to trunk and branch-3.1.

> Document best practice for user management
> --
>
> Key: YARN-8520
> URL: https://issues.apache.org/jira/browse/YARN-8520
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation, yarn
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8520.001.patch, YARN-8520.002.patch, 
> YARN-8520.003.patch, YARN-8520.004.patch, YARN-8520.005.patch
>
>
> Docker containers must have usernames and groups consistent with the host 
> operating system when external mount points are exposed to the container.  
> This prevents malicious or unauthorized impersonation from occurring.  This 
> task is to document the best practices to ensure user and group membership are 
> consistent across docker containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-08-10 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576681#comment-16576681
 ] 

Shane Kumpf commented on YARN-8569:
---

Thanks for filing this [~eyang]. I have a use case that could benefit from this 
as well.

When running in containers, one challenging piece is determining how much CPU 
and memory was allocated to the container. Traditional OS tooling shows the 
totals from the host. This is especially problematic for tools like Ambari, 
which use OS tooling to dynamically set configuration. Exposing the resource 
request details via this mechanism could solve this problem.
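
Quick illustration of the mismatch (cgroups v1 paths; the values you would see 
are examples):
{code}
# Inside the container, the usual tooling reports the host's total memory:
grep MemTotal /proc/meminfo
# The actual allocation only shows up in the container's cgroup limit:
cat /sys/fs/cgroup/memory/memory.limit_in_bytes
{code}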

> Create an interface to provide cluster information to application
> -
>
> Key: YARN-8569
> URL: https://issues.apache.org/jira/browse/YARN-8569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Priority: Major
>  Labels: Docker
>
> Some programs require container hostnames to be known for the application to 
> run.  For example, distributed tensorflow requires a launch_command that looks 
> like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell, or the YARN 
> services launch_command.  In addition, the dynamic parameters do not work 
> with the YARN flex command.  This is the classic pain point for application 
> developers attempting to automate system environment settings as parameters 
> to the end-user application.
> It would be great if the YARN Docker integration could provide a simple option 
> to expose hostnames of the yarn service via a mounted file.  The file content 
> gets updated when a flex command is performed.  This allows the application 
> developer to consume system environment settings via a standard interface.  
> It is like /proc/devices for Linux, but for Hadoop.  This may involve 
> updating a file in the distributed cache and allowing the file to be mounted 
> via container-executor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8623) Update Docker examples to use image which exists

2018-08-10 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576655#comment-16576655
 ] 

Shane Kumpf edited comment on YARN-8623 at 8/10/18 6:07 PM:


[~ccondit-target] - thanks for looking into this. I see what you mean about the 
challenge with using that image. I think you are correct that the existing 
apache/hadoop-runner image serves a different type of use case than we need 
here.

IMO, our target should be an image capable of running MapReduce pi, as that's 
the example we provide in the docs. If it also works for the Spark shell 
example we provide in our docs, with the appropriate spark install/config, that 
would be great, but I don't think it's a requirement to start.

Thinking about what we need to meet that goal, I think a majority of the users 
we would be targeting with this guide will have all of Hadoop installed on the 
nodes where these containers are running. Instead of trying to package the 
latest version of Apache Hadoop as an image, I think our example would be 
easier to maintain if we guide the user towards bind mounting the Hadoop 
binaries and configuration from the NodeManager hosts. If we take that 
approach, I believe the image should only need to include a JDK and set up 
JAVA_HOME. We might even be able to use an existing openjdk image.
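
In other words, the image could be as small as something like this (a sketch 
only, assuming the Hadoop binaries and config are bind mounted in from the NM 
host at run time):
{code}
# Dockerfile (essentially one line; the official openjdk images already set
# JAVA_HOME):
#   FROM openjdk:8
# Built and tagged locally, for example:
docker build -t local/hadoop-jdk-only:latest .
{code}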

Assuming we can't leverage an existing image, one question I'm unsure about is 
the process of creating an "official" image under the apache docker hub 
namespace. [~elek] - can you share any insights around this process?

 


was (Author: shaneku...@gmail.com):
[~ccondit-target] - thanks for looking into this. I see what you mean about the 
challenge with using that image. I think you are correct that the existing 
apache/hadoop-runner image serves a different type of use case than we need 
here.

IMO, our target should be an image capable of running MapReduce pi, as that's 
the example we provide in the docs. If it also works for Spark shell example we 
provide in our docs, with the appropriate spark install/config, that would be 
great, but I don't think it's a requirement to start.  
!/jira/images/icons/emoticons/smile.png!

Thinking about what we need to meet that goal, I think a majority of the users 
we would be targeting with this guide will have all of Hadoop installed on the 
nodes where these containers are running. Instead of trying to package the 
latest version of Apache Hadoop as an image, I think our example would be 
easier to maintain if we guide the user towards bind mounting the Hadoop 
binaries and configuration from the NodeManager hosts. If we take that 
approach, I believe the image should only need to include a JDK and set up 
JAVA_HOME. We might even be able to use an existing openjdk image.

Assuming we can't leverage an existing image, one question I'm unsure about is 
the process of creating an "official" image under the apache docker hub 
namespace. [~elek] - can you share any insights around this process?

 

> Update Docker examples to use image which exists
> 
>
> Key: YARN-8623
> URL: https://issues.apache.org/jira/browse/YARN-8623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Condit
>Priority: Minor
>  Labels: Docker
>
> The example Docker image given in the documentation 
> (images/hadoop-docker:latest) does not exist. We could change 
> images/hadoop-docker:latest to apache/hadoop-runner:latest, which does exist. 
> We'd need to do a quick sanity test to see if the image works with YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8623) Update Docker examples to use image which exists

2018-08-10 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576655#comment-16576655
 ] 

Shane Kumpf commented on YARN-8623:
---

[~ccondit-target] - thanks for looking into this. I see what you mean about the 
challenge with using that image. I think you are correct that the existing 
apache/hadoop-runner image serves a different type of use case than we need 
here.

IMO, our target should be an image capable of running MapReduce pi, as that's 
the example we provide in the docs. If it also works for Spark shell example we 
provide in our docs, with the appropriate spark install/config, that would be 
great, but I don't think it's a requirement to start.  
!/jira/images/icons/emoticons/smile.png!

Thinking about what we need to meet that goal, I think a majority of the users 
we would be targeting with this guide will have all of Hadoop installed on the 
nodes where these containers are running. Instead of trying to package the 
latest version of Apache Hadoop as an image, I think our example would be 
easier to maintain if we guide the user towards bind mounting the Hadoop 
binaries and configuration from the NodeManager hosts. If we take that 
approach, I believe the image should only need to include a JDK and set up 
JAVA_HOME. We might even be able to use an existing openjdk image.

Assuming we can't leverage an existing image, one question I'm unsure about is 
the process of creating an "official" image under the apache docker hub 
namespace. [~elek] - can you share any insights around this process?

 

> Update Docker examples to use image which exists
> 
>
> Key: YARN-8623
> URL: https://issues.apache.org/jira/browse/YARN-8623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Condit
>Priority: Minor
>  Labels: Docker
>
> The example Docker image given in the documentation 
> (images/hadoop-docker:latest) does not exist. We could change 
> images/hadoop-docker:latest to apache/hadoop-runner:latest, which does exist. 
> We'd need to do a quick sanity test to see if the image works with YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8520) Document best practice for user management

2018-08-10 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576597#comment-16576597
 ] 

Shane Kumpf commented on YARN-8520:
---

Thanks for the updated patch, [~eyang]! +1 on the latest patch. I'll commit 
this later today if there is no additional feedback.

 

> Document best practice for user management
> --
>
> Key: YARN-8520
> URL: https://issues.apache.org/jira/browse/YARN-8520
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation, yarn
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8520.001.patch, YARN-8520.002.patch, 
> YARN-8520.003.patch, YARN-8520.004.patch, YARN-8520.005.patch
>
>
> Docker containers must have usernames and groups consistent with the host 
> operating system when external mount points are exposed to the container.  
> This prevents malicious or unauthorized impersonation from occurring.  This 
> task is to document the best practices to ensure user and group membership are 
> consistent across docker containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8642) Add support for tmpfs mounts with the Docker runtime

2018-08-09 Thread Shane Kumpf (JIRA)
Shane Kumpf created YARN-8642:
-

 Summary: Add support for tmpfs mounts with the Docker runtime
 Key: YARN-8642
 URL: https://issues.apache.org/jira/browse/YARN-8642
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Shane Kumpf


Add support to the existing Docker runtime to allow the user to request tmpfs 
mounts for their containers. For example:
{code}/usr/bin/docker run --name=container_name --tmpfs /run image 
/bootstrap/start-systemd
{code}

One use case is to allow systemd to run as PID 1 in a non-privileged container; 
/run is expected to be a tmpfs mount in the container for that to work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8287) Update documentation and yarn-default related to the Docker runtime

2018-08-03 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8287:
--
Target Version/s: 3.2.0, 3.1.2  (was: 3.2.0)

> Update documentation and yarn-default related to the Docker runtime
> ---
>
> Key: YARN-8287
> URL: https://issues.apache.org/jira/browse/YARN-8287
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Minor
>  Labels: Docker
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8287.001.patch, YARN-8287.002.patch
>
>
> There are a few typos and omissions in the documentation and yarn-default wrt 
> running Docker containers on YARN. Below is what I noticed, but a more 
> thorough review is still needed:
>  * docker.allowed.volume-drivers is not documented
>  * None of the GPU or FPGA related items are in the Docker docs.
>  * "To run without any capabilites," - typo in yarn-default.xml
>  * remove    from yarn-default.xml
>  * yarn.nodemanager.runtime.linux.docker.delayed-removal.allowed missing from 
> docs
>  * yarn.nodemanager.runtime.linux.docker.stop.grace-period missing from docs
>  * The user remapping features are missing from the docs; we should 
> explicitly call this out.
>  * The privileged container section could use a bit of rework to outline the 
> risks of the feature.
>  * Is it time to remove the security warnings? The community has made many 
> improvements since that warning was added. 
>  * "path within the contatiner" typo



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8287) Update documentation and yarn-default related to the Docker runtime

2018-08-03 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8287:
--
Fix Version/s: 3.1.2
   3.2.0

> Update documentation and yarn-default related to the Docker runtime
> ---
>
> Key: YARN-8287
> URL: https://issues.apache.org/jira/browse/YARN-8287
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Minor
>  Labels: Docker
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8287.001.patch, YARN-8287.002.patch
>
>
> There are a few typos and omissions in the documentation and yarn-default wrt 
> running Docker containers on YARN. Below is what I noticed, but a more 
> thorough review is still needed:
>  * docker.allowed.volume-drivers is not documented
>  * None of the GPU or FPGA related items are in the Docker docs.
>  * "To run without any capabilites," - typo in yarn-default.xml
>  * remove    from yarn-default.xml
>  * yarn.nodemanager.runtime.linux.docker.delayed-removal.allowed missing from 
> docs
>  * yarn.nodemanager.runtime.linux.docker.stop.grace-period missing from docs
>  * The user remapping features are missing from the docs; we should 
> explicitly call this out.
>  * The privileged container section could use a bit of rework to outline the 
> risks of the feature.
>  * Is it time to remove the security warnings? The community has made many 
> improvements since that warning was added. 
>  * "path within the contatiner" typo



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8287) Update documentation and yarn-default related to the Docker runtime

2018-08-03 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568952#comment-16568952
 ] 

Shane Kumpf commented on YARN-8287:
---

Thanks for the contribution [~ccondit-target]! I committed this to trunk and 
branch-3.1.

> Update documentation and yarn-default related to the Docker runtime
> ---
>
> Key: YARN-8287
> URL: https://issues.apache.org/jira/browse/YARN-8287
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Minor
>  Labels: Docker
> Attachments: YARN-8287.001.patch, YARN-8287.002.patch
>
>
> There are a few typos and omissions in the documentation and yarn-default wrt 
> running Docker containers on YARN. Below is what I noticed, but a more 
> thorough review is still needed:
>  * docker.allowed.volume-drivers is not documented
>  * None of the GPU or FPGA related items are in the Docker docs.
>  * "To run without any capabilites," - typo in yarn-default.xml
>  * remove    from yarn-default.xml
>  * yarn.nodemanager.runtime.linux.docker.delayed-removal.allowed missing from 
> docs
>  * yarn.nodemanager.runtime.linux.docker.stop.grace-period missing from docs
>  * The user remapping features are missing from the docs; we should 
> explicitly call this out.
>  * The privileged container section could use a bit of rework to outline the 
> risks of the feature.
>  * Is it time to remove the security warnings? The community has made many 
> improvements since that warning was added. 
>  * "path within the contatiner" typo



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8520) Document best practice for user management

2018-08-03 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568683#comment-16568683
 ] 

Shane Kumpf commented on YARN-8520:
---

Thanks for the patch [~eyang]! Sorry for the delayed review on this. I think 
user management is an important topic, so I'm glad to see additional 
documentation. I've got a few comments:

1) In the "Docker images requirements" section, we call out the requirement 
that the UID must match between the NM host and image. It would be good to add 
a link in the "Docker images requirements" section to the "User Management in 
Docker Container" section to guide the image builder towards the various ways 
to handle users and groups with containers.

2) SSSD is one option for handling this, but there are others. SSSD is not 
necessarily a requirement for YARN containerization either, which isn't clear 
here to a novice. I think it would be good to expand on the /etc/passwd and 
/etc/shadow option (defining users and groups statically in the image) you 
mention as an alternative to SSSD; there is a small sketch of that at the end 
of this comment. nscd and user namespacing could be additional alternatives we 
list in the future.

3) "YARN Docker container support launches container with uid:gid identity." - 
I think this is an important item to highlight and could use some more detail. 
Maybe call out again that it is the uid:gid identity as known by the 
NodeManager host. Also what uid:gid is used in which security mode would be 
helpful to those new to YARN that want to try containerization (e.g. In secure 
mode it is the submitting user, in unsecure mode see [Cgroups and 
Security|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html]).
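
(Going back to point 2, here is a small sketch of the static option, e.g. run 
in a Dockerfile RUN step at image build time; the uid/gid values are examples 
only.)
{code}
groupadd --gid 1005 hadoopuser
useradd --uid 1005 --gid 1005 --no-create-home hadoopuser
{code}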
 

> Document best practice for user management
> --
>
> Key: YARN-8520
> URL: https://issues.apache.org/jira/browse/YARN-8520
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation, yarn
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8520.001.patch
>
>
> Docker containers must have usernames and groups consistent with the host 
> operating system when external mount points are exposed to the container.  
> This prevents malicious or unauthorized impersonation from occurring.  This 
> task is to document the best practices to ensure user and group membership are 
> consistent across docker containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8287) Update documentation and yarn-default related to the Docker runtime

2018-08-03 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568613#comment-16568613
 ] 

Shane Kumpf commented on YARN-8287:
---

Thanks for addressing my comments, [~ccondit-target]! +1 on patch 002. I'll 
commit this shortly.

> Update documentation and yarn-default related to the Docker runtime
> ---
>
> Key: YARN-8287
> URL: https://issues.apache.org/jira/browse/YARN-8287
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Minor
>  Labels: Docker
> Attachments: YARN-8287.001.patch, YARN-8287.002.patch
>
>
> There are a few typos and omissions in the documentation and yarn-default wrt 
> running Docker containers on YARN. Below is what I noticed, but a more 
> thorough review is still needed:
>  * docker.allowed.volume-drivers is not documented
>  * None of the GPU or FPGA related items are in the Docker docs.
>  * "To run without any capabilites," - typo in yarn-default.xml
>  * remove    from yarn-default.xml
>  * yarn.nodemanager.runtime.linux.docker.delayed-removal.allowed missing from 
> docs
>  * yarn.nodemanager.runtime.linux.docker.stop.grace-period missing from docs
>  * The user remapping features are missing from the docs; we should 
> explicitly call this out.
>  * The privileged container section could use a bit of rework to outline the 
> risks of the feature.
>  * Is it time to remove the security warnings? The community has made many 
> improvements since that warning was added. 
>  * "path within the contatiner" typo



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8287) Update documentation and yarn-default related to the Docker runtime

2018-08-03 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568185#comment-16568185
 ] 

Shane Kumpf commented on YARN-8287:
---

Thanks for the patch, [~ccondit-target]! Overall this looks good. A couple of 
items I wanted to bring up after additional review.

1) Let's remove the following, since it refers to the Phase 1 Jira.
{code:java}
Docker support in the LCE is still evolving. To track progress, follow 
YARN-3611, the umbrella JIRA for Docker support improvements. {code}
2) A point of confusion we see is that the example image doesn't exist. We 
could change images/hadoop-docker:latest to apache/hadoop-runner:trunk, which 
does exist. We'd need to do a quick sanity test to see if the image works with 
YARN. Maybe best for a follow on JIRA given the testing involved.

3) With the changes to allow disabling the YARN launch script in favor of 
running whatever is specified in the image, I think the following needs to be 
removed/updated. There is already a section called Docker Container ENTRYPOINT 
support. I think we can clean this up a bit to make it easier to understand this 
feature. Given the need to explain YARN's existing launching logic, this also 
may be best for a follow on ticket as it will require some testing to fully 
describe the feature.
{code:java}
If a Docker image has a command set, the behavior will depend on whether the 
YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE is set to true. If so, the 
command will be overridden when LCE launches the image with YARN's container 
launch script. 
 If a Docker image has an entry point set, the entry point will be honored, but 
the default command may be overridden, as just mentioned above. Unless the 
entry point is something similar to sh -c or 
YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE is set to true, the net 
result will likely be undesirable. Because the YARN container launch script is 
required to correctly launch the YARN task, use of entry points is discouraged. 
{code}
I think if we address #1 as part of this patch and open follow on JIRAs for the 
other two, this is ready to go.

> Update documentation and yarn-default related to the Docker runtime
> ---
>
> Key: YARN-8287
> URL: https://issues.apache.org/jira/browse/YARN-8287
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Minor
>  Labels: Docker
> Attachments: YARN-8287.001.patch
>
>
> There are a few typos and omissions in the documentation and yarn-default wrt 
> running Docker containers on YARN. Below is what I noticed, but a more 
> thorough review is still needed:
>  * docker.allowed.volume-drivers is not documented
>  * None of the GPU or FPGA related items are in the Docker docs.
>  * "To run without any capabilites," - typo in yarn-default.xml
>  * remove    from yarn-default.xml
>  * yarn.nodemanager.runtime.linux.docker.delayed-removal.allowed missing from 
> docs
>  * yarn.nodemanager.runtime.linux.docker.stop.grace-period missing from docs
>  * The user remapping features are missing from the docs; we should 
> explicitly call this out.
>  * The privileged container section could use a bit of rework to outline the 
> risks of the feature.
>  * Is it time to remove the security warnings? The community has made many 
> improvements since that warning was added. 
>  * "path within the contatiner" typo



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8619) Automate docker network configuration through YARN API

2018-08-02 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567596#comment-16567596
 ] 

Shane Kumpf commented on YARN-8619:
---

Thanks for the proposal, [~eyang]! I can see the value in making it easier to 
administer container networking. However, I think we need to make this pluggable 
beyond Docker/macvlan, and I think we can do that with minimal changes to your 
idea by adding a -type option or similar. One of the goals I'd like to see us 
move towards is the adoption of prevailing standards, one of which is CNI, 
which follows a different execution model than Docker's CNM/libnetwork and so 
the macvlan options wouldn't apply. The -type option could lead to a CLI 
similar to below:

Docker:
{code:java}
yarn network -create my-libnetwork-macvlan-net -type docker -conf 
/tmp/network.json{code}
CNI:
{code:java}
yarn network -create my-cni-net -type cni -cni-config /tmp/cni.cfg -cni-plugin 
/tmp/cni-plugin
{code}
As you mention, each host may need a different configuration. We may need a 
-node option to target a specific nodemanager.
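
For example (purely illustrative, none of these flags exist yet):
{code:java}
yarn network -create my-cni-net -type cni -node nm01.example.com \
  -cni-config /tmp/cni.cfg -cni-plugin /tmp/cni-plugin
{code}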

> Automate docker network configuration through YARN API
> --
>
> Key: YARN-8619
> URL: https://issues.apache.org/jira/browse/YARN-8619
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Eric Yang
>Priority: Major
>  Labels: Docker
>
> Docker supports bridge, host, overlay, and macvlan networking.  It might be 
> useful to automate docker network setup through a set of YARN APIs to improve 
> management of docker networks.  Each type of network driver requires a 
> different set of parameters.  For the Hadoop use case, it seems most useful to 
> focus on macvlan networking for ease of use and configuration.  It would be a 
> great addition to support commands like:
> {code}
> yarn network create -d macvlan \
>   --subnet=172.16.86.0/24 \
>   --gateway=172.16.86.1 \
>   -o parent=eth0 \
>   my-macvlan-net
> {code}
> This changes the docker configuration on hosts managed by YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8600) RegistryDNS hang when remote lookup does not reply

2018-08-01 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565792#comment-16565792
 ] 

Shane Kumpf commented on YARN-8600:
---

Thanks for the updated patch [~eyang]! +1 on the 003 patch. I've committed this 
to trunk, branch-3.1.1, and branch-3.1.

> RegistryDNS hang when remote lookup does not reply
> --
>
> Key: YARN-8600
> URL: https://issues.apache.org/jira/browse/YARN-8600
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8600.001.patch, YARN-8600.002.patch, 
> YARN-8600.003.patch
>
>
> If the lookup type does not match the record being queried, the remote DNS 
> server might not reply.  For example, looking up a CNAME record with a PTR 
> address: 1.76.27.172.in-addr.arpa.  This can hang RegistryDNS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8600) RegistryDNS hang when remote lookup does not reply

2018-08-01 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8600:
--
Fix Version/s: 3.1.1
   3.2.0

> RegistryDNS hang when remote lookup does not reply
> --
>
> Key: YARN-8600
> URL: https://issues.apache.org/jira/browse/YARN-8600
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8600.001.patch, YARN-8600.002.patch, 
> YARN-8600.003.patch
>
>
> If the lookup type does not match the record being queried, the remote DNS 
> server might not reply.  For example, looking up a CNAME record with a PTR 
> address: 1.76.27.172.in-addr.arpa.  This can hang RegistryDNS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8600) RegistryDNS hang when remote lookup does not reply

2018-08-01 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8600:
--
Target Version/s: 3.2.0, 3.1.1  (was: 3.2.0, 3.1.2)

> RegistryDNS hang when remote lookup does not reply
> --
>
> Key: YARN-8600
> URL: https://issues.apache.org/jira/browse/YARN-8600
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Critical
> Attachments: YARN-8600.001.patch, YARN-8600.002.patch, 
> YARN-8600.003.patch
>
>
> If the lookup type does not match the record being queried, the remote DNS 
> server might not reply.  For example, looking up a CNAME record with a PTR 
> address: 1.76.27.172.in-addr.arpa.  This can hang RegistryDNS.






[jira] [Commented] (YARN-8600) RegistryDNS hang when remote lookup does not reply

2018-08-01 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565277#comment-16565277
 ] 

Shane Kumpf commented on YARN-8600:
---

Thanks for the patch, [~eyang]. I had some trouble finding an example lookup to 
test, but I was able to find a repro. I can confirm this resolves the issue.

Two minor nits:
* The timeout on the test should be increased, as the lookup thread may not have 
timed out before the test's own timeout fires, causing the test to fail.
* When the timeout exception occurs, I would like to see the query type in the 
exception message to help identify the exact query that hung.

I'm +1 on this patch with those changes. I can make these minor changes at 
commit time if you'd like [~eyang]. I'll commit this with those changes later 
today unless there are other concerns.
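
To illustrate the second nit, here is a minimal sketch, not the actual 
RegistryDNS code, of bounding a remote lookup and naming the query and its 
type when it times out:
{code:java}
// Sketch only: bound a remote lookup so an unresponsive server cannot hang
// the resolver, and identify the query (and its type) when it times out.
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedLookup {
  public static byte[] resolve(Callable<byte[]> lookup, String queryName,
      String queryType, long timeoutMillis) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<byte[]> future = executor.submit(lookup);
      // Wait no longer than the configured timeout for the remote reply.
      return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Surface which query hung, per the review comment above.
      throw new TimeoutException("Remote lookup timed out for " + queryName
          + " (type " + queryType + ") after " + timeoutMillis + " ms");
    } finally {
      executor.shutdownNow();
    }
  }
}
{code}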

> RegistryDNS hang when remote lookup does not reply
> --
>
> Key: YARN-8600
> URL: https://issues.apache.org/jira/browse/YARN-8600
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Critical
> Attachments: YARN-8600.001.patch, YARN-8600.002.patch
>
>
> If the lookup type does not match the record being queried, the remote DNS 
> server might not reply.  For example, looking up a CNAME record with a PTR 
> address: 1.76.27.172.in-addr.arpa.  This can hang RegistryDNS.






[jira] [Commented] (YARN-3611) Support Docker Containers In LinuxContainerExecutor

2018-07-11 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540749#comment-16540749
 ] 

Shane Kumpf commented on YARN-3611:
---

+1000 :) Great effort everyone. I'm excited for what has been achieved and 
where this support is going.

> Support Docker Containers In LinuxContainerExecutor
> ---
>
> Key: YARN-3611
> URL: https://issues.apache.org/jira/browse/YARN-3611
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Major
>  Labels: Docker
>
> Support Docker Containers In LinuxContainerExecutor
> LinuxContainerExecutor provides useful functionality today with respect to 
> localization, cgroups based resource management and isolation for CPU, 
> network, disk etc. as well as security with a well-defined mechanism to 
> execute privileged operations using the container-executor utility.  Bringing 
> docker support to LinuxContainerExecutor lets us use all of this 
> functionality when running docker containers under YARN, while not requiring 
> users and admins to configure and use a different ContainerExecutor. 
> There are several aspects here that need to be worked through :
> * Mechanism(s) to let clients request docker-specific functionality - we 
> could initially implement this via environment variables without impacting 
> the client API.
> * Security - both docker daemon as well as application
> * Docker image localization
> * Running a docker container via container-executor as a specified user
> * “Isolate” the docker container in terms of CPU/network/disk/etc
> * Communicating with and/or signaling the running container (ensure correct 
> pid handling)
> * Figure out workarounds for certain performance-sensitive scenarios like 
> HDFS short-circuit reads 
> * All of these need to be achieved without changing the current behavior of 
> LinuxContainerExecutor
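
As an aside, the environment-variable mechanism from the first bullet appears 
to be what was adopted, since the YARN-8485 example further down uses exactly 
these variables. A minimal client-side sketch follows; the 
YARN_CONTAINER_RUNTIME_* names come from that example, while the class name 
and the fields left null here are illustrative only:
{code:java}
// Sketch of a client opting into the Docker runtime through container
// environment variables only; the ContainerLaunchContext API is unchanged.
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;

public class DockerRuntimeRequest {
  public static ContainerLaunchContext dockerLaunchContext(String image,
      String command) {
    Map<String, String> env = new HashMap<>();
    env.put("YARN_CONTAINER_RUNTIME_TYPE", "docker");
    env.put("YARN_CONTAINER_RUNTIME_DOCKER_IMAGE", image);
    return ContainerLaunchContext.newInstance(
        Collections.<String, LocalResource>emptyMap(), // local resources
        env,                                           // runtime selection
        Collections.singletonList(command),            // container command
        null, null, null);                             // service data, tokens, ACLs
  }
}
{code}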






[jira] [Commented] (YARN-8485) Privileged container app launch is failing intermittently

2018-07-02 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530510#comment-16530510
 ] 

Shane Kumpf commented on YARN-8485:
---

Thanks to [~yeshavora] for reporting this, [~eyang] for the contribution, and 
[~gsaha] for the review! I committed this to trunk and branch-3.1.

> Privileged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.2.0, 3.1.1
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
> docker error code=11, error message='Privileged containers are disabled'
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> 

[jira] [Updated] (YARN-8485) Privileged container app launch is failing intermittently

2018-07-02 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8485:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-3611

> Privileged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.2.0, 3.1.1
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
> docker error code=11, error message='Privileged containers are disabled'
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-06-28 21:21:15,669 INFO  nodemanager.ContainerExecutor 
> 

[jira] [Updated] (YARN-8485) Privileged container app launch is failing intermittently

2018-07-02 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8485:
--
Affects Version/s: 3.1.1
   3.2.0
 Target Version/s: 3.2.0, 3.1.1
   Labels: Docker  (was: )

> Privileged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.2.0, 3.1.1
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
> docker error code=11, error message='Privileged containers are disabled'
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-06-28 

[jira] [Commented] (YARN-8485) Privileged container app launch is failing intermittently

2018-07-02 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530482#comment-16530482
 ] 

Shane Kumpf commented on YARN-8485:
---

{code}by checking /usr/bin/sudo is good enough{code}
I agree this should be enough for now and is the least risky change. We can 
open a follow-on effort to make this configurable if we find an operating 
system where this is needed. +1 on the latest patch, pending pre-commit.
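
For illustration only, the shape of the check being agreed on here, noting 
that the real change lives in the container-executor C code rather than in 
Java:
{code:java}
// Illustrative only: the actual check is implemented in container-executor (C).
// This just shows the decision being approved above: is /usr/bin/sudo present
// and executable on the host?
import java.io.File;

public class SudoCheck {
  public static boolean sudoAvailable() {
    File sudo = new File("/usr/bin/sudo");
    return sudo.isFile() && sudo.canExecute();
  }
}
{code}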

> Privileged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
> docker error code=11, error message='Privileged containers are disabled'
> 

[jira] [Comment Edited] (YARN-8485) Privileged container app launch is failing intermittently

2018-07-02 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530482#comment-16530482
 ] 

Shane Kumpf edited comment on YARN-8485 at 7/2/18 9:58 PM:
---

{quote}by checking /usr/bin/sudo is good enough{quote}
I agree this should be enough for now and is the least risky change. We can 
open a follow-on effort to make this configurable if we find an operating 
system where this is needed. +1 on the latest patch, pending pre-commit.


was (Author: shaneku...@gmail.com):
{code}by checking /usr/bin/sudo is good enough{code}
I agree this should be enough for now and is the least risky change. We can 
open a follow-on effort to make this configurable if we find an operating 
system where this is needed. +1 on the latest patch, pending pre-commit.

> Privileged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 
