[ https://issues.apache.org/jira/browse/YARN-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16217266#comment-16217266 ]
Robert Kanter edited comment on YARN-7382 at 10/24/17 5:03 PM:
---------------------------------------------------------------
Test failures occur even without the patch. I've filed YARN-7385.
was (Author: rkanter):
Test failures occur even without the path. I've filed YARN-7385.
> NoSuchElementException in FairScheduler after failover causes RM crash
> ----------------------------------------------------------------------
>
> Key: YARN-7382
> URL: https://issues.apache.org/jira/browse/YARN-7382
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.9.0, 3.0.0
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Priority: Blocker
> Attachments: YARN-7382.001.patch
>
>
> If an RM failover occurs while an MR job (e.g. sleep) is running, then once the
> maps reach 100%, the now-active RM crashes with:
> {noformat}
> 2017-10-18 15:02:05,347 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1508361403235_0001_01_000002 Container Transitioned from RUNNING to COMPLETED
> 2017-10-18 15:02:05,347 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=systest OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1508361403235_0001 CONTAINERID=container_1508361403235_0001_01_000002 RESOURCE=<memory:1024, vCores:1>
> 2017-10-18 15:02:05,349 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type NODE_UPDATE to the Event Dispatcher
> java.util.NoSuchElementException
>     at java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
>     at java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:371)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:901)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1326)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:371)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:221)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:221)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1019)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:887)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1104)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:128)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>     at java.lang.Thread.run(Thread.java:748)
> 2017-10-18 15:02:05,360 INFO org.apache.hadoop.yarn.event.EventDispatcher: Exiting, bbye..
> {noformat}
> This leaves the cluster with no RMs!
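
The top two frames of the trace are plain JDK behavior: ConcurrentSkipListSet.first() throws NoSuchElementException when the set is empty. The sketch below reproduces only that behavior in isolation. It is not the YARN-7382 patch and does not use any YARN classes; the class name EmptySkipListSetDemo and the helper firstOrNull are hypothetical names chosen for illustration, and Integer keys stand in for whatever key type the scheduler actually stores.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentSkipListSet;

class EmptySkipListSetDemo {

    // Returns true if first() on an empty set throws, as in the trace above.
    static boolean firstThrowsOnEmpty() {
        ConcurrentSkipListSet<Integer> keys = new ConcurrentSkipListSet<>();
        try {
            keys.first();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    // Hypothetical defensive accessor: smallest key, or null if the set is empty.
    // Catching the exception avoids the race that an isEmpty()-then-first()
    // check would have if another thread removed the last element in between.
    static Integer firstOrNull(ConcurrentSkipListSet<Integer> keys) {
        try {
            return keys.first();
        } catch (NoSuchElementException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("first() throws on empty set: " + firstThrowsOnEmpty());

        ConcurrentSkipListSet<Integer> keys = new ConcurrentSkipListSet<>();
        System.out.println("firstOrNull on empty set: " + firstOrNull(keys));

        keys.add(3);
        keys.add(1);
        System.out.println("firstOrNull after adds: " + firstOrNull(keys));
    }
}
```

Whether a null-returning accessor, an explicit emptiness check in the caller, or some other change is the right fix for the scheduler is a design decision for the actual patch; the sketch only shows why an unguarded first() call can terminate the event dispatcher thread.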