[ 
https://issues.apache.org/jira/browse/YARN-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541824#comment-16541824
 ] 

genericqa commented on YARN-8515:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
40m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m  4s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 74m 30s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931329/YARN-8515.001.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux c76b663d45d3 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b37074b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/21222/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21222/testReport/ |
| Max. process+thread count | 301 (vs. ulimit of 10000) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21222/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> container-executor can crash with SIGPIPE after nodemanager restart
> -------------------------------------------------------------------
>
>                 Key: YARN-8515
>                 URL: https://issues.apache.org/jira/browse/YARN-8515
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Major
>              Labels: Docker
>         Attachments: YARN-8515.001.patch
>
>
> When running with docker on large clusters, we have noticed that sometimes 
> docker containers are not removed - they remain in the exited state, and the 
> corresponding container-executor is no longer running.  Upon investigation, 
> we noticed that this always seemed to happen after a nodemanager restart.   
> The sequence leading to the stranded docker containers is:
>  # Nodemanager restarts
>  # Containers are recovered and then run for a while
>  # Containers are killed for some (legitimate) reason
>  # Container-executor exits without removing the docker container.
> After reproducing this on a test cluster, we found that the 
> container-executor was exiting due to a SIGPIPE.
> What is happening is that the shell command executor that is used to start 
> container-executor has threads reading from c-e's stdout and stderr.  When 
> the NM is restarted, these threads are killed.  Then when the 
> container-executor continues executing after the container exits with error, 
> it tries to write to stderr (ERRORFILE) and gets a SIGPIPE.  Since SIGPIPE is 
> not handled, this crashes the container-executor before it can actually 
> remove the docker container.
> We ran into this in branch 2.8.  The way docker containers are removed has 
> been completely redesigned in trunk, so I don't think it will lead to this 
> exact failure, but after an NM restart, potentially any write to stderr or 
> stdout in the container-executor could cause it to crash.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to