[jira] [Commented] (YARN-4152) NM crash when LogAggregationService#stopContainer called for absent container

2015-09-15 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745236#comment-14745236
 ] 

Bibin A Chundatt commented on YARN-4152:


Please review patch attached.

> NM crash when LogAggregationService#stopContainer called for absent container
> -
>
> Key: YARN-4152
> URL: https://issues.apache.org/jira/browse/YARN-4152
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-4152.patch
>
>
> NM crash during of log aggregation.
> Ran Pi job with 500 container and killed application in between
> *Logs*
> {code}
> 2015-09-12 18:44:25,597 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code 
> from container container_e51_1442063466801_0001_01_99 is : 143
> 2015-09-12 18:44:25,670 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_e51_1442063466801_0001_01_000101
> 2015-09-12 18:44:25,670 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Removing container_e51_1442063466801_0001_01_000101 from application 
> application_1442063466801_0001
> 2015-09-12 18:44:25,670 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.stopContainer(LogAggregationService.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:456)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:68)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> 2015-09-12 18:44:25,692 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got 
> event CONTAINER_STOP for appId application_1442063466801_0001
> 2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Exiting, bbye..
> 2015-09-12 18:44:25,692 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=dsperf   
> OPERATION=Container Finished - SucceededTARGET=ContainerImpl
> RESULT=SUCCESS  APPID=application_1442063466801_0001
> CONTAINERID=container_e51_1442063466801_0001_01_000100
> {code}
> *Analysis*
> Looks like for absent container also {{stopContainer}} is called 
> {code}
>   case CONTAINER_FINISHED:
> LogHandlerContainerFinishedEvent containerFinishEvent =
> (LogHandlerContainerFinishedEvent) event;
> stopContainer(containerFinishEvent.getContainerId(),
> containerFinishEvent.getExitCode());
> break;
> {code}
> *Event EventType: KILL_CONTAINER sent to absent container 
> container_e51_1442063466801_0001_01_000101*
> Should skip when {{null==context.getContainers().get(containerId)}} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4152) NM crash when LogAggregationService#stopContainer called for absent container

2015-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742532#comment-14742532
 ] 

Hadoop QA commented on YARN-4152:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 31s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m  7s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 12s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 37s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 16s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   7m 41s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  46m 56s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12755607/0001-YARN-4152.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7269906 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9102/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9102/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9102/console |


This message was automatically generated.

> NM crash when LogAggregationService#stopContainer called for absent container
> -
>
> Key: YARN-4152
> URL: https://issues.apache.org/jira/browse/YARN-4152
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-4152.patch
>
>
> NM crash during of log aggregation.
> Ran Pi job with 500 container and killed application in between
> *Logs*
> {code}
> 2015-09-12 18:44:25,597 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code 
> from container container_e51_1442063466801_0001_01_99 is : 143
> 2015-09-12 18:44:25,670 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_e51_1442063466801_0001_01_000101
> 2015-09-12 18:44:25,670 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Removing container_e51_1442063466801_0001_01_000101 from application 
> application_1442063466801_0001
> 2015-09-12 18:44:25,670 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.stopContainer(LogAggregationService.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:456)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:68)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> 2015-09-12 18:44:25,692 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got 
> event CONTAINER_STOP for appId application_1442063466801_0001
> 2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Exiting, bbye..
> 2015-09-12 18:44:25,692 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=dsperf   
> OPERATION=Container Finished - Succeeded

[jira] [Commented] (YARN-4152) NM crash when LogAggregationService#stopContainer called for absent container

2015-09-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742082#comment-14742082
 ] 

Bibin A Chundatt commented on YARN-4152:


LogAggregationService#stopContainer
{code}
ContainerType containerType = context.getContainers().get(
containerId).getContainerTokenIdentifier().getContainerType();
{code}

> NM crash when LogAggregationService#stopContainer called for absent container
> -
>
> Key: YARN-4152
> URL: https://issues.apache.org/jira/browse/YARN-4152
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> NM crash during of log aggregation.
> Ran Pi job with 500 container and killed application in between
> *Logs*
> {code}
> 2015-09-12 18:44:25,597 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code 
> from container container_e51_1442063466801_0001_01_99 is : 143
> 2015-09-12 18:44:25,670 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_e51_1442063466801_0001_01_000101
> 2015-09-12 18:44:25,670 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Removing container_e51_1442063466801_0001_01_000101 from application 
> application_1442063466801_0001
> 2015-09-12 18:44:25,670 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.stopContainer(LogAggregationService.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:456)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:68)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> 2015-09-12 18:44:25,692 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got 
> event CONTAINER_STOP for appId application_1442063466801_0001
> 2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Exiting, bbye..
> 2015-09-12 18:44:25,692 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=dsperf   
> OPERATION=Container Finished - SucceededTARGET=ContainerImpl
> RESULT=SUCCESS  APPID=application_1442063466801_0001
> CONTAINERID=container_e51_1442063466801_0001_01_000100
> {code}
> *Analysis*
> Looks like for absent container also {{stopContainer}} is called 
> {code}
>   case CONTAINER_FINISHED:
> LogHandlerContainerFinishedEvent containerFinishEvent =
> (LogHandlerContainerFinishedEvent) event;
> stopContainer(containerFinishEvent.getContainerId(),
> containerFinishEvent.getExitCode());
> break;
> {code}
> *Event EventType: KILL_CONTAINER sent to absent container 
> container_e51_1442063466801_0001_01_000101*
> Should skip when {{null==context.getContainers().get(containerId)}} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)