[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS
[ https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742159#comment-14742159 ] Hitesh Shah commented on YARN-3942: --- Thanks [~gss2002]. One point to note - if you use long-running Hive sessions, this will cause an OOM in the timeline server as the data is cached on a per "session" basis. I am not sure if there is another simple way to disable Hive session re-use in the HiveServer. \cc [~vikram.dixit] > Timeline store to read events from HDFS > --- > > Key: YARN-3942 > URL: https://issues.apache.org/jira/browse/YARN-3942 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-3942.001.patch > > > This adds a new timeline store plugin that is intended as a stop-gap measure > to mitigate some of the issues we've seen with ATS v1 while waiting for ATS > v2. The intent of this plugin is to provide a workable solution for running > the Tez UI against the timeline server on large-scale clusters running many > thousands of jobs per day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
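To illustrate the memory concern with long-running Hive sessions, the sketch below (hypothetical Java, not the plugin's actual classes) contrasts an unbounded per-session cache with a size-bounded LRU cache; the class name, field names, and the bound of 100 sessions are assumptions for illustration only.

{code}
// Hypothetical sketch only -- not the actual timeline store plugin code.
// It assumes cached data is keyed by "session" and, without a bound, is only
// evicted when the session ends, which never happens for long-lived Hive sessions.
import java.util.LinkedHashMap;
import java.util.Map;

public class SessionCacheSketch {
  private static final int MAX_CACHED_SESSIONS = 100; // assumed bound

  // Access-ordered LinkedHashMap acting as an LRU cache: once more than
  // MAX_CACHED_SESSIONS entries exist, the least recently used one is dropped,
  // capping memory at the cost of re-reading evicted data from HDFS.
  private final Map<String, Object> cache =
      new LinkedHashMap<String, Object>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
          return size() > MAX_CACHED_SESSIONS;
        }
      };

  public synchronized Object get(String sessionId) {
    return cache.get(sessionId);
  }

  public synchronized void put(String sessionId, Object cachedEntities) {
    cache.put(sessionId, cachedEntities);
  }
}
{code}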
[jira] [Commented] (YARN-4150) Failure in TestNMClient because nodereports were not available
[ https://issues.apache.org/jira/browse/YARN-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742170#comment-14742170 ] Hadoop QA commented on YARN-4150: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 6m 26s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 4s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 31s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 52s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 6m 53s | Tests failed in hadoop-yarn-client. | | | | 25m 11s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.client.TestResourceManagerAdministrationProtocolPBClientImpl | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12755487/YARN-4150.001.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 8c05441 | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/9101/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9101/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9101/console | This message was automatically generated. > Failure in TestNMClient because nodereports were not available > -- > > Key: YARN-4150 > URL: https://issues.apache.org/jira/browse/YARN-4150 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-4150.001.patch > > > Saw a failure in a test run > https://builds.apache.org/job/PreCommit-YARN-Build/9010/testReport/ > java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:635) > at java.util.ArrayList.get(ArrayList.java:411) > at > org.apache.hadoop.yarn.client.api.impl.TestNMClient.allocateContainers(TestNMClient.java:244) > at > org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClientNoCleanupOnStop(TestNMClient.java:210) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
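The IndexOutOfBoundsException occurs because {{allocateContainers}} indexes into an empty node-report list before any NodeManager has registered. One way to harden a test against this, sketched below, is to poll {{YarnClient#getNodeReports}} until a RUNNING node shows up; the helper name and timeout are assumptions, and this is not necessarily what YARN-4150.001.patch does.

{code}
// Sketch of a guard against an empty node-report list in a test; the helper
// name and timeout are illustrative assumptions, not the actual patch.
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class NodeReportWait {
  /** Polls until at least one RUNNING node is reported or the timeout expires. */
  public static List<NodeReport> waitForNodeReports(YarnClient client, long timeoutMs)
      throws IOException, YarnException, InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    List<NodeReport> reports = client.getNodeReports(NodeState.RUNNING);
    while (reports.isEmpty() && System.currentTimeMillis() < deadline) {
      Thread.sleep(100);
      reports = client.getNodeReports(NodeState.RUNNING);
    }
    if (reports.isEmpty()) {
      throw new IllegalStateException("No node reports after " + timeoutMs + " ms");
    }
    return reports; // reports.get(0) is now safe
  }
}
{code}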
[jira] [Commented] (YARN-3717) Expose app/am/queue's node-label-expression to RM web UI / CLI / REST-API
[ https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741921#comment-14741921 ] Hadoop QA commented on YARN-3717: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 24m 40s | Pre-patch trunk has 7 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 8 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 56s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 3m 0s | Site still builds. | | {color:red}-1{color} | checkstyle | 2m 38s | The applied patch generated 3 new checkstyle issues (total was 16, now 18). | | {color:green}+1{color} | whitespace | 0m 28s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 30s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 7m 19s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 6m 56s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 3m 12s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 54m 12s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 126m 11s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12755534/YARN-3717.20150912-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle site | | git revision | trunk / 9538af0 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9099/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9099/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9099/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/9099/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9099/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/9099/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9099/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9099/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9099/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9099/console | This message was automatically generated. > Expose app/am/queue's node-label-expression to RM web UI / CLI / REST-API > - > > Key: YARN-3717 > URL: https://issues.apache.org/jira/browse/YARN-3717 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, > YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, > YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch, > YARN-3717.20150911-1.patch, YARN-3717.20150912-1.patch > > > 1> Add the default-node-Label expression for each queue in scheduler page. > 2> In Application/Appattempt page show the app configured node label > expression for AM and Job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3717) Expose app/am/queue's node-label-expression to RM web UI / CLI / REST-API
[ https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741966#comment-14741966 ] Naganarasimha G R commented on YARN-3717: - The checkstyle comments are not related to the patch. > Expose app/am/queue's node-label-expression to RM web UI / CLI / REST-API > - > > Key: YARN-3717 > URL: https://issues.apache.org/jira/browse/YARN-3717 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, > YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, > YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch, > YARN-3717.20150911-1.patch, YARN-3717.20150912-1.patch > > > 1> Add the default-node-Label expression for each queue in scheduler page. > 2> In Application/Appattempt page show the app configured node label > expression for AM and Job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741995#comment-14741995 ] Junping Du commented on YARN-313: - The test failures are not related and look like intermittent failures. Manually kicking off the Jenkins test again. > Add Admin API for supporting node resource configuration in command line > > > Key: YARN-313 > URL: https://issues.apache.org/jira/browse/YARN-313 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Junping Du >Assignee: Inigo Goiri >Priority: Critical > Attachments: YARN-313-sample.patch, YARN-313-v1.patch, > YARN-313-v10.patch, YARN-313-v11.patch, YARN-313-v2.patch, YARN-313-v3.patch, > YARN-313-v4.patch, YARN-313-v5.patch, YARN-313-v6.patch, YARN-313-v7.patch, > YARN-313-v8.patch, YARN-313-v9.patch > > > We should provide some admin interface, e.g. "yarn rmadmin -refreshResources" > to support changes of node's resource specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742015#comment-14742015 ] Hadoop QA commented on YARN-313: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 51s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 7m 42s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 57s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 9s | The applied patch generated 1 new checkstyle issues (total was 231, now 231). | | {color:green}+1{color} | whitespace | 0m 10s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 30s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 28s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 6m 56s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 54m 7s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 111m 46s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12755528/YARN-313-v11.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d845547 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9100/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9100/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/9100/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9100/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9100/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9100/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9100/console | This message was automatically generated. 
> Add Admin API for supporting node resource configuration in command line > > > Key: YARN-313 > URL: https://issues.apache.org/jira/browse/YARN-313 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Junping Du >Assignee: Inigo Goiri >Priority: Critical > Attachments: YARN-313-sample.patch, YARN-313-v1.patch, > YARN-313-v10.patch, YARN-313-v11.patch, YARN-313-v2.patch, YARN-313-v3.patch, > YARN-313-v4.patch, YARN-313-v5.patch, YARN-313-v6.patch, YARN-313-v7.patch, > YARN-313-v8.patch, YARN-313-v9.patch > > > We should provide some admin interface, e.g. "yarn rmadmin -refreshResources" > to support changes of node's resource specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4120) FSAppAttempt.getResourceUsage() should not take preemptedResource into account
[ https://issues.apache.org/jira/browse/YARN-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741974#comment-14741974 ] Arun Suresh commented on YARN-4120: --- Currently, pre-emption happens in two passes: 1) Iterate through the leaf Queues and find resource deficits (the sum of resources to be preempted from other apps to allow apps below their share to run) 2) Iterate through Schedulables and pre-empt enough containers to match the resources collected in 1. One goal of YARN-2154 (when it's ready) is to try to address the issue brought up here, wherein, instead of asking a root queue to find Schedulables within its hierarchy that can pre-empt containers, it tries to match an actual resource ask (by an app below fair share) with containers from apps (above fair share). I believe the above logic might solve both issues raised here. Thoughts? > FSAppAttempt.getResourceUsage() should not take preemptedResource into account > -- > > Key: YARN-4120 > URL: https://issues.apache.org/jira/browse/YARN-4120 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Xianyin Xin > > When computing resource usage for Schedulables, the following code is involved, > {{FSAppAttempt.getResourceUsage}}, > {code} > public Resource getResourceUsage() { > return Resources.subtract(getCurrentConsumption(), getPreemptedResources()); > } > {code} > and this value is aggregated to FSLeafQueues and FSParentQueues. In my > opinion, taking {{preemptedResource}} into account here is not reasonable; > there are two main reasons, > # it is something in the future, i.e., even though these resources are marked as > preempted, they are currently used by the app, and they will be > subtracted from {{currentConsumption}} once the preemption is finished. It's > not reasonable to account for them ahead of time. > # there's another problem here, consider the following case, > {code} > root >/\ > queue1 queue2 > /\ > queue1.3, queue1.4 > {code} > suppose queue1.3 needs resources and it can preempt resources from queue1.4, > the preemption happens in the interior of queue1. But when computing the resource > usage of queue1, {{queue1.resourceUsage = its_current_resource_usage - > preemption}} according to the current code, which is unfair to queue2 when > allocating resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
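For reference, below is a minimal sketch of the behaviour the description argues for: stop subtracting resources that are merely marked for preemption when reporting usage upward to the queues. This is an illustration of the argument, not the committed change; the class and field names are placeholders.

{code}
// Sketch of the argument above, not the committed fix: containers marked for
// preemption are still held by the app, so the usage aggregated into
// FSLeafQueue/FSParentQueue should reflect what is actually occupied.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class ResourceUsageSketch {
  private Resource currentConsumption;  // what the app holds right now
  private Resource preemptedResources;  // marked for preemption, not yet released

  /** Current behaviour quoted in the description: understates usage while preemption is pending. */
  public Resource getResourceUsageSubtractingPreempted() {
    return Resources.subtract(currentConsumption, preemptedResources);
  }

  /** View argued for in this issue: report real occupancy until containers are actually released. */
  public Resource getResourceUsage() {
    return currentConsumption;
  }
}
{code}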
[jira] [Created] (YARN-4152) NM crash when LogAggregationService#stopContainer called for absent container
Bibin A Chundatt created YARN-4152: -- Summary: NM crash when LogAggregationService#stopContainer called for absent container Key: YARN-4152 URL: https://issues.apache.org/jira/browse/YARN-4152 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical NM crash during log aggregation. *Logs* {code} 2015-09-12 18:44:25,597 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_e51_1442063466801_0001_01_99 is : 143 2015-09-12 18:44:25,670 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_e51_1442063466801_0001_01_000101 2015-09-12 18:44:25,670 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_e51_1442063466801_0001_01_000101 from application application_1442063466801_0001 2015-09-12 18:44:25,670 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.stopContainer(LogAggregationService.java:422) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:456) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:68) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) at java.lang.Thread.run(Thread.java:745) 2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1442063466801_0001 2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=dsperf OPERATION=Container Finished - SucceededTARGET=ContainerImpl RESULT=SUCCESS APPID=application_1442063466801_0001 CONTAINERID=container_e51_1442063466801_0001_01_000100 {code} *Analysis* Looks like {{stopContainer}} is also called for absent containers {code} case CONTAINER_FINISHED: LogHandlerContainerFinishedEvent containerFinishEvent = (LogHandlerContainerFinishedEvent) event; stopContainer(containerFinishEvent.getContainerId(), containerFinishEvent.getExitCode()); break; {code} Should skip when {{null==context.getContainers().get(containerId)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4152) NM crash when LogAggregationService#stopContainer called for absent container
[ https://issues.apache.org/jira/browse/YARN-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4152: --- Description: NM crash during log aggregation. Ran a Pi job with 500 containers and killed the application in between *Logs* {code} 2015-09-12 18:44:25,597 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_e51_1442063466801_0001_01_99 is : 143 2015-09-12 18:44:25,670 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_e51_1442063466801_0001_01_000101 2015-09-12 18:44:25,670 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_e51_1442063466801_0001_01_000101 from application application_1442063466801_0001 2015-09-12 18:44:25,670 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.stopContainer(LogAggregationService.java:422) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:456) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:68) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) at java.lang.Thread.run(Thread.java:745) 2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1442063466801_0001 2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=dsperf OPERATION=Container Finished - SucceededTARGET=ContainerImpl RESULT=SUCCESS APPID=application_1442063466801_0001 CONTAINERID=container_e51_1442063466801_0001_01_000100 {code} *Analysis* Looks like {{stopContainer}} is also called for absent containers {code} case CONTAINER_FINISHED: LogHandlerContainerFinishedEvent containerFinishEvent = (LogHandlerContainerFinishedEvent) event; stopContainer(containerFinishEvent.getContainerId(), containerFinishEvent.getExitCode()); break; {code} *Event EventType: KILL_CONTAINER sent to absent container container_e51_1442063466801_0001_01_000101* Should skip when {{null==context.getContainers().get(containerId)}} was: NM crash during of log aggregation. 
*Logs* {code} 2015-09-12 18:44:25,597 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_e51_1442063466801_0001_01_99 is : 143 2015-09-12 18:44:25,670 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_e51_1442063466801_0001_01_000101 2015-09-12 18:44:25,670 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_e51_1442063466801_0001_01_000101 from application application_1442063466801_0001 2015-09-12 18:44:25,670 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.stopContainer(LogAggregationService.java:422) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:456) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:68) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) at java.lang.Thread.run(Thread.java:745) 2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1442063466801_0001 2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=dsperf OPERATION=Container Finished - SucceededTARGET=ContainerImpl RESULT=SUCCESS APPID=application_1442063466801_0001 CONTAINERID=container_e51_1442063466801_0001_01_000100 {code} *Analysis* Looks like for absent container also {{stopContainer}} is called {code} case
[jira] [Commented] (YARN-4152) NM crash when LogAggregationService#stopContainer called for absent container
[ https://issues.apache.org/jira/browse/YARN-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742082#comment-14742082 ] Bibin A Chundatt commented on YARN-4152: LogAggregationService#stopContainer {code} ContainerType containerType = context.getContainers().get( containerId).getContainerTokenIdentifier().getContainerType(); {code} > NM crash when LogAggregationService#stopContainer called for absent container > - > > Key: YARN-4152 > URL: https://issues.apache.org/jira/browse/YARN-4152 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > > NM crash during of log aggregation. > Ran Pi job with 500 container and killed application in between > *Logs* > {code} > 2015-09-12 18:44:25,597 WARN > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code > from container container_e51_1442063466801_0001_01_99 is : 143 > 2015-09-12 18:44:25,670 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Event EventType: KILL_CONTAINER sent to absent container > container_e51_1442063466801_0001_01_000101 > 2015-09-12 18:44:25,670 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Removing container_e51_1442063466801_0001_01_000101 from application > application_1442063466801_0001 > 2015-09-12 18:44:25,670 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.stopContainer(LogAggregationService.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:456) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:68) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > at java.lang.Thread.run(Thread.java:745) > 2015-09-12 18:44:25,692 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got > event CONTAINER_STOP for appId application_1442063466801_0001 > 2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: > Exiting, bbye.. > 2015-09-12 18:44:25,692 INFO > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=dsperf > OPERATION=Container Finished - SucceededTARGET=ContainerImpl > RESULT=SUCCESS APPID=application_1442063466801_0001 > CONTAINERID=container_e51_1442063466801_0001_01_000100 > {code} > *Analysis* > Looks like for absent container also {{stopContainer}} is called > {code} > case CONTAINER_FINISHED: > LogHandlerContainerFinishedEvent containerFinishEvent = > (LogHandlerContainerFinishedEvent) event; > stopContainer(containerFinishEvent.getContainerId(), > containerFinishEvent.getExitCode()); > break; > {code} > *Event EventType: KILL_CONTAINER sent to absent container > container_e51_1442063466801_0001_01_000101* > Should skip when {{null==context.getContainers().get(containerId)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
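A minimal sketch of the suggested guard follows, assuming the method shape implied by the stack trace and the snippet quoted above; it is illustrative of the "skip absent containers" suggestion, not the committed patch.

{code}
// Illustrative fragment for LogAggregationService#stopContainer, sketching the
// null-check fix suggested above; not the committed patch.
private void stopContainer(ContainerId containerId, int exitCode) {
  Container container = context.getContainers().get(containerId);
  if (container == null) {
    // Container already removed (e.g. KILL_CONTAINER delivered to an absent
    // container while the application is being killed): log and skip instead
    // of dereferencing null and crashing the dispatcher thread.
    LOG.warn("Log aggregation cannot find container " + containerId
        + "; skipping CONTAINER_FINISHED handling");
    return;
  }
  ContainerType containerType =
      container.getContainerTokenIdentifier().getContainerType();
  // ... existing handling continues with containerType and exitCode ...
}
{code}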