[ https://issues.apache.org/jira/browse/YARN-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954732#comment-14954732 ]
Hadoop QA commented on YARN-4227:
---------------------------------

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 19m 15s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 9m 53s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 11m 36s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 0m 56s | The applied patch generated 1 new checkstyle issues (total was 71, now 72). |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 44s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 44s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 57m 48s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 104m 2s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched |
| | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
| | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12766280/YARN-4227.3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 5b6bae0 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9423/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9423/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9423/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9423/console |

This message was automatically generated.
> FairScheduler: RM quits processing expired container from a removed node
> ------------------------------------------------------------------------
>
>                 Key: YARN-4227
>                 URL: https://issues.apache.org/jira/browse/YARN-4227
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.3.0, 2.5.0, 2.7.1
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Critical
>         Attachments: YARN-4227.2.patch, YARN-4227.3.patch, YARN-4227.patch
>
>
> Under some circumstances the node is removed before an expired container
> event is processed, causing the RM to exit:
> {code}
> 2015-10-04 21:14:01,063 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1436927988321_1307950_01_000012 Timed out after 600 secs
> 2015-10-04 21:14:01,063 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1436927988321_1307950_01_000012 Container Transitioned from ACQUIRED to EXPIRED
> 2015-10-04 21:14:01,063 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp: Completed container: container_1436927988321_1307950_01_000012 in state: EXPIRED event:EXPIRE
> 2015-10-04 21:14:01,063 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=system_op OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1436927988321_1307950 CONTAINERID=container_1436927988321_1307950_01_000012
> 2015-10-04 21:14:01,063 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_EXPIRED to the scheduler
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:849)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1273)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:585)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-10-04 21:14:01,063 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> The stack trace is from 2.3.0, but the same issue has been observed in 2.5.0
> and 2.6.0 by different customers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
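The ordering problem in the report (node removal processed before the CONTAINER_EXPIRED event for a container on that node) can be sketched as a small, self-contained model. This is a hypothetical illustration, not the FairScheduler code and not the YARN-4227 patch: `ToyScheduler`, `Node`, `removeNode`, `completedContainer` and `completedContainerUnguarded` are made-up names. It contrasts an unguarded lookup, which dereferences `null` once the node is gone (the kind of failure seen at `FairScheduler.completedContainer`), with one defensive pattern: log and skip the late event so no exception escapes the dispatcher thread.

{code}
import java.util.HashMap;
import java.util.Map;

public class ExpiredContainerRace {

    // Hypothetical stand-in for a scheduler node that tracks used memory (MB).
    static final class Node {
        final String id;
        long usedMb;

        Node(String id, long usedMb) {
            this.id = id;
            this.usedMb = usedMb;
        }

        void release(long mb) {
            usedMb -= mb;
        }
    }

    // Hypothetical scheduler holding a node-id -> node table.
    static final class ToyScheduler {
        private final Map<String, Node> nodes = new HashMap<>();

        void addNode(Node n) {
            nodes.put(n.id, n);
        }

        void removeNode(String nodeId) {
            nodes.remove(nodeId);
        }

        // Unguarded: after removeNode(), get() returns null and
        // release() throws NullPointerException.
        void completedContainerUnguarded(String nodeId, long mb) {
            Node node = nodes.get(nodeId);
            node.release(mb);
        }

        // Guarded: a late completion for a removed node is logged and skipped.
        // Returns true only if resources were actually released.
        boolean completedContainer(String nodeId, long mb) {
            Node node = nodes.get(nodeId);
            if (node == null) {
                System.out.println("Skipping completed container: node "
                        + nodeId + " already removed");
                return false;
            }
            node.release(mb);
            return true;
        }
    }

    public static void main(String[] args) {
        ToyScheduler s = new ToyScheduler();
        s.addNode(new Node("host1:8041", 2048));

        // Event order from the report: the node is removed before the expiry.
        s.removeNode("host1:8041");

        boolean released = s.completedContainer("host1:8041", 1024);
        System.out.println("released=" + released);

        try {
            s.completedContainerUnguarded("host1:8041", 1024);
        } catch (NullPointerException e) {
            System.out.println("unguarded path threw NullPointerException");
        }
    }
}
{code}

Running `main` prints the skip message, then `released=false`, then the NullPointerException notice. Whether a real fix should skip the event, release resources elsewhere, or reorder events is a design decision for the scheduler; the sketch only shows why the missing null check is fatal when the dispatcher thread does not catch the exception.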