[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-06-13 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511814#comment-16511814
 ] 

Hudson commented on YARN-8259:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14424 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14424/])
YARN-8259.  Improve privileged docker container liveliness checks.   
(eyang: rev 22994889dc449f966fb6462a3ac3d3bbaee3ac6a)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/LinuxContainerRuntimeConstants.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java


> Revisit liveliness checks for Docker containers
> ---
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Blocker
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8259.001.patch, YARN-8259.002.patch
>
>
> As privileged containers may execute as a user that does not match the YARN 
> run as user, sending the null signal for liveliness checks could fail. We 
> need to reconsider how liveliness checks are handled in the Docker case.
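
To illustrate the failure mode described above: the null signal (kill -0) only succeeds when the caller is permitted to signal the target, so a process owned by a different user produces a permission error even though it is running. The following is a minimal sketch in Python, not the NodeManager's actual code; `null_signal_check` and its return labels are hypothetical names chosen for illustration.

```python
import os


def null_signal_check(pid, kill=os.kill):
    """Probe a pid with signal 0 and classify the outcome.

    Returns "alive", "dead", or "denied". The labels are illustrative only.
    """
    try:
        kill(pid, 0)  # signal 0: only error checking, nothing is delivered
    except ProcessLookupError:
        return "dead"    # ESRCH: no such process
    except PermissionError:
        return "denied"  # EPERM: the process exists but belongs to another user
    return "alive"
```

A liveliness check that treats every failure of the null signal as "not running" would misreport the "denied" case, which is the situation a privileged container running as another user can create.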



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-06-13 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511444#comment-16511444
 ] 

genericqa commented on YARN-8259:
-

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 35s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for branch |
| +1 | mvninstall | 27m 8s | trunk passed |
| +1 | compile | 8m 41s | trunk passed |
| +1 | checkstyle | 0m 21s | trunk passed |
| +1 | mvnsite | 1m 11s | trunk passed |
| -1 | shadedclient | 3m 58s | branch has errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 0m 59s | trunk passed |
| +1 | javadoc | 0m 53s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 47s | the patch passed |
| +1 | compile | 7m 25s | the patch passed |
| +1 | javac | 7m 25s | the patch passed |
| +1 | checkstyle | 0m 19s | the patch passed |
| +1 | mvnsite | 1m 5s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | shadedclient | 2m 29s | patch has errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 1m 5s | the patch passed |
| +1 | javadoc | 0m 49s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 19m 37s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | unit | 0m 20s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 78m 46s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8259 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927676/YARN-8259.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4581982638e6 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| 

[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-06-12 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510290#comment-16510290
 ] 

Shane Kumpf commented on YARN-8259:
---

Thanks for the input everyone.
{quote}Could you add some information to DockerContainers.md
{quote}
Absolutely, I'll get a new patch up shortly with the doc improvements, thanks 
again for all the feedback [~eyang]




[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-06-12 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510275#comment-16510275
 ] 

Eric Yang commented on YARN-8259:
-

Four people have expressed a preference for option #1, so this patch should be 
ready to commit in its current form.  [~shaneku...@gmail.com] Could you add 
some information to DockerContainers.md, in the Privileged Container Security 
Consideration section, to indicate that the NM user must be whitelisted if the 
hidepid option is enabled?




[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-06-11 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508657#comment-16508657
 ] 

Eric Badger commented on YARN-8259:
---

I would give a slight preference to proposal #1 because of performance, 
especially in the live-restore case. There is a workaround even with hidepid in 
place, so I think the good outweighs the bad. 




[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-06-11 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508516#comment-16508516
 ] 

Eric Yang commented on YARN-8259:
-

I prefer #3 to keep abstraction in place, and improve portability.  #1 with 
documentation is my second choice to address this problem.




[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-06-11 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508198#comment-16508198
 ] 

Jim Brennan commented on YARN-8259:
---

I think we should go with Option 1 with documentation to whitelist the NM user 
if hidepid is enabled.

 

 




[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-06-10 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507437#comment-16507437
 ] 

Shane Kumpf commented on YARN-8259:
---

[~eyang], [~Jim_Brennan], [~ebadger], [~jlowe] - any additional feedback? I'd 
like to get this in soon given the impact. It appears the patch that implements 
#1 still applies if we wanted to go with that for now. We could add 
alternatives later based on user demand. I want everyone to be comfortable with 
the approach though. Thanks!




[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-31 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497266#comment-16497266
 ] 

Shane Kumpf commented on YARN-8259:
---

Thanks for the feedback, [~ebadger].
{quote}if the yarn user is whitelisted for hidepid, then isn't that going to 
get you basically the same situation as checking pids as a privileged user?
{quote}
Perhaps non-starter was a bit harsh. I do see what you mean but I think they 
are a bit different. To clarify, if the admin has explicitly enabled hidepid, 
allowing yarn to bypass that protection via c-e would be surprising behavior, 
IMO. If hidepid is disabled or the yarn user is explicitly whitelisted, then 
the admin should not be surprised that the yarn user can see all pids.




[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-31 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497239#comment-16497239
 ] 

Eric Badger commented on YARN-8259:
---

For proposal #1, if the yarn user is whitelisted for hidepid, then isn't that 
going to get you basically the same situation as checking pids as a privileged 
user? I.e. you'll be able to see all arbitrary pids if you are able to 
compromise the yarn user. If that's a non-starter, then we have no choice but 
to go with proposal #4 (even though I would prefer #1). 




[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-31 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496492#comment-16496492
 ] 

Shane Kumpf commented on YARN-8259:
---

I've been doing additional testing here and could use input from the community 
as all of the solutions have cons. Here is what I've tested and been 
considering.

1) */proc/pid check as yarn*

Pros:
 * No c-e changes
 * Works with Docker live restore

Cons:
 * Breaks down when hidepid is in use
 * Portability (relies on /proc)


2) */proc/pid or kill -0 as privileged user*

Pros:
 * Works with Docker live restore

Cons:
 * Circumvents hidepid, allows the yarn user to check the existence of any pid 
due to use of elevated privileges.
 * Portability (/proc method)


3) *docker inspect*

Pros:
 * No c-e changes
 * Uses the Docker API

Cons:
 * Requires retry handling to support Docker live restore.
 ** In the case of a Docker daemon upgrade, this means the upgrade must 
complete before the retries are exhausted, which could mean hundreds of retries.


4) *Hybrid* (Keep existing kill -0 for non-privileged, docker inspect for 
privileged)

Pros:
 * No c-e changes
 * Limits impacts to live restore

Cons:
 * Requires retry handling to support Docker live restore.
 * Different handling based on container type.


I believe #2 is a non-starter as it silently bypasses the hidepid option.  I'm 
leaning towards striking #3 from the list as well, as we really need the 
recovery logic to be solid, so I don't want to unnecessarily impact 
non-privileged containers, which appear to be working well.

At this point, I'm leaning towards #4 or #1 (with docs indicating that the NM 
user must be whitelisted if hidepid is enabled).
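
As a rough illustration of options #1 and #4 above, here is a minimal Python sketch. It is not taken from the patch: `proc_pid_exists`, `hybrid_liveness`, and their parameters are hypothetical names, and the two check callables passed to `hybrid_liveness` stand in for the existing kill -0 check and a docker inspect based check.

```python
import os


def proc_pid_exists(pid, proc_root="/proc", isdir=os.path.isdir):
    """Option #1: treat the process as alive if <proc_root>/<pid> is a directory.

    When hidepid hides other users' entries from the caller, this returns
    False for a process that is in fact running.
    """
    return isdir(os.path.join(proc_root, str(pid)))


def hybrid_liveness(pid, privileged, signal_check, inspect_check):
    """Option #4: pick the check based on the container type.

    signal_check and inspect_check are caller-supplied callables (assumed
    here) that take a pid and return True when the container is alive.
    """
    check = inspect_check if privileged else signal_check
    return check(pid)
```

The sketch makes the trade-off in the cons lists visible: option #1 is a single filesystem lookup with no daemon round trip, while option #4 needs two code paths that must be kept consistent.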




[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-22 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484271#comment-16484271
 ] 

Eric Yang commented on YARN-8259:
-

[~shaneku...@gmail.com] The proposal to implement both is okay, but we can make 
better software with sensible optimization by picking a solution that works 
for all scenarios without adding extra administration tasks.  There is no 
objection to the current approach.  We are aware that the hidepid corner case 
can create an additional system administration task: whitelisting the node 
manager so that it can access all pids.  We also know that the docker inspect 
approach costs more resources because of the fork and exec.  Human labor to 
configure the OS with knowledge of Hadoop details is usually more expensive 
than adding processors or RAM.  It would be great if the solution could work 
without an additional configuration flag or extra hardware resources.  This 
means that doing the pid check as a privileged user via container-executor may 
be the solution system administrators prefer, since it adds no overhead to 
their administration chores.  Can the /proc pid check work in a 
Docker-in-Docker environment?




[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-22 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483845#comment-16483845
 ] 

Shane Kumpf commented on YARN-8259:
---

{quote}System administrator can reserve one cpu core for node manager and all 
the docker inspect calls are counted toward saturating one cpu core{quote}
I'm less concerned about the cpu usage and more about docker's client/server 
model and the potential for hangs (I've seen many of those in the past under 
load). Personally, I want the /proc route for my systems and am not using 
hidepid. Losing a container due to an intermittent docker issue isn't really 
acceptable to me when an alternative exists that avoids the issue.

What I could do is implement both the /proc and {{docker inspect}} approaches, 
and a configuration switch to choose the implementation for those that use 
hidepid (or a system without /proc). Would that be acceptable?

I'm also going to make this a blocker, as all privileged containers are leaked 
on NM restart today.
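
The configuration switch proposed above could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the mode names "proc" and "docker-inspect", the function `select_liveness_check`, and the idea of passing the implementations in as a mapping. None of these come from the patch or from any YARN configuration property.

```python
def select_liveness_check(mode, implementations):
    """Return the liveliness check registered under the configured mode.

    implementations maps a mode name (hypothetical, e.g. "proc" or
    "docker-inspect") to a callable that takes a pid and returns a bool.
    """
    if mode not in implementations:
        raise ValueError("unknown liveliness check mode: %r" % (mode,))
    return implementations[mode]
```

Failing loudly on an unknown mode, rather than silently falling back to a default, keeps a mistyped setting from quietly selecting the check that does not work under hidepid.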

> Revisit liveliness checks for Docker containers
> ---
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN 
> run as user, sending the null signal for liveliness checks could fail. We 
> need to reconsider how liveliness checks are handled in the Docker case.






[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483057#comment-16483057
 ] 

Eric Yang commented on YARN-8259:
-

System administrator can reserve one cpu core for node manager and all the 
docker inspect calls are counted toward saturating one cpu core, but not more.  
Exact accounting is not available today, but I usually recommend that customers 
do this to avoid system overload.

From a quick look at the YARN code base, I found only one place where the node 
manager reads /proc/[pid]/: CGroupsResourceCalculator.java.  Hence, hidepid 
already does not work with the current implementation.  This can be addressed 
in other JIRAs.  I am +0 on this patch.




[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483041#comment-16483041
 ] 

Eric Badger commented on YARN-8259:
---

Also, I have tested the current patch for correctness. So, if we decide to go 
with the current implementation, I am +1 on the patch. 




[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483038#comment-16483038
 ] 

Eric Badger commented on YARN-8259:
---

bq. If hidepid option is used by system administrator, yarn user might not have 
rights to check if /proc/[pid] exists.
This might be a concern, but there is a workaround that lets the admin 
whitelist the NM user:
https://linux-audit.com/linux-system-hardening-adding-hidepid-to-proc/

bq. Also, the reacquisition code runs signalContainer once per second until the 
application finishes, this resulted in many docker inspect and 
container-executor calls, which are expensive operations.
This worries me the most. Especially on nodes where there are lots of 
containers running concurrently, this could be pretty devastating for rolling 
upgrades.

I'm not sure I have a strong opinion one way or another on retries vs. /proc 
for correctness, but I am worried about overloading the docker daemon with a 
large amount of inspect/ps calls. 
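
For the hidepid workaround mentioned above, the proc filesystem accepts a gid= mount option naming a group that is exempt from hidepid. The Python sketch below shows one way an operator might detect a node where hidepid is on and no exempt group is set. It parses text in the format of /proc/mounts; the function names are hypothetical, and it does not verify that the NM user actually belongs to the exempt group.

```python
def proc_mount_options(mounts_text):
    """Return the mount options of the /proc entry as a dict, or {} if absent."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == "/proc" and fields[2] == "proc":
            options = {}
            for item in fields[3].split(","):
                key, _, value = item.partition("=")
                options[key] = value
            return options
    return {}


def hidepid_without_exempt_group(mounts_text):
    """True when hidepid is active and no exempt group (gid=) is configured."""
    options = proc_mount_options(mounts_text)
    hidepid = options.get("hidepid", "0")
    return hidepid not in ("0", "off") and "gid" not in options
```

A caller would typically pass the contents of /proc/mounts, for example `hidepid_without_exempt_group(open("/proc/mounts").read())`.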




[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483032#comment-16483032
 ] 

Jason Lowe commented on YARN-8259:
--

I do agree with Shane that there are already subsystems that currently rely on 
/proc to function properly, e.g.: container resource monitoring.  Hiding pids 
will break those subsystems.




[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483029#comment-16483029
 ] 

Jason Lowe commented on YARN-8259:
--

Ah comment race with [~eyang], I'll defer until his concerns are addressed.




[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483028#comment-16483028
 ] 

Jason Lowe commented on YARN-8259:
--

Thanks for the patch!  +1 lgtm.  I'll commit this tomorrow if there are no 
objections.




[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482992#comment-16482992
 ] 

Eric Yang commented on YARN-8259:
-

If I am not mistaken, DockerContainerRuntime runs as part of the node 
manager.  If the system administrator uses the hidepid option, the yarn user 
might not have the rights to check whether /proc/[pid] exists.  We might need 
to create an LCE operation to perform the check if we go with the suggested 
pid file check path.
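
To illustrate the concern, here is a minimal sketch of a /proc-based 
existence check. It is not taken from the patch; the function name 
`pid_visible` and the `proc_root` parameter are hypothetical and exist only 
for illustration. When procfs is mounted with hidepid=2, directories for 
processes owned by other users are not visible, so a check like this, run as 
the yarn user, could report a live container process as missing.

```python
import os


def pid_visible(pid, proc_root="/proc"):
    """Return True if <proc_root>/<pid> is visible to the calling user.

    Hypothetical helper, not part of Hadoop. A False result means either
    that the process has exited or that the caller is not allowed to see
    it (for example, procfs mounted with hidepid=2), and the two cases
    cannot be told apart from this check alone.
    """
    return os.path.isdir(os.path.join(proc_root, str(pid)))
```

Because a hidden process and a dead process look the same here, a check of 
this kind would likely need to run with elevated privileges, which is the 
motivation for routing it through an LCE operation.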

I still prefer the docker inspect command path with retry logic.  In a 
non-blocking IO system, it is hard to avoid coding logic for retries.  The 
investment will pay off in the long run, when each retry value is defined and 
optimized to make the system reliable and robust.
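
A rough sketch of what a docker inspect liveliness check with retries could 
look like follows. It is an illustration under stated assumptions, not the 
approach taken in the patch: the function names, the retry count, and the 
delay are all hypothetical, and the `{{.State.Running}}` format string is just 
one of the fields docker inspect can report.

```python
import subprocess
import time


def docker_reports_running(container_id):
    """Ask the Docker CLI whether the container is running.

    Raises a subprocess error if the CLI fails or times out, so the
    caller can decide whether to retry.
    """
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Running}}", container_id],
        capture_output=True, text=True, timeout=10, check=True)
    return result.stdout.strip() == "true"


def is_alive(container_id, probe=docker_reports_running,
             attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Return the probe's answer, retrying when the probe itself fails.

    Only probe failures are retried (CLI error, timeout, missing binary).
    A clean "not running" answer is returned immediately. The attempts
    and delay_seconds defaults are placeholder values, not tuned ones.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return probe(container_id)
        except (subprocess.SubprocessError, OSError) as err:
            last_error = err
            if attempt + 1 < attempts:
                sleep(delay_seconds)
    raise RuntimeError(
        "liveliness probe failed %d time(s)" % attempts) from last_error
```

The sketch raises once the retries are exhausted instead of guessing, 
leaving it to the caller to decide whether an unreachable Docker daemon 
should count as a dead container.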




[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482729#comment-16482729
 ] 

genericqa commented on YARN-8259:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m  8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 32s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 76m 18s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8259 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924356/YARN-8259.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d3ca0d4182cb 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f48fec8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20808/testReport/ |
| Max. process+thread count | 301 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20808/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Revisit liveliness checks for Docker containers
>