Dustin Cote created YARN-3419: --------------------------------- Summary: ConcurrentModificationException in FSLeafQueue Key: YARN-3419 URL: https://issues.apache.org/jira/browse/YARN-3419 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.5.0 Reporter: Dustin Cote Assignee: Dustin Cote
Heavy Resource Manager use causes a manifestation of a ConcurrentModificationException in FSLeafQueue. Doesn't look like FSLeafQueue does anything except add, remove, traverse, and get sorted, so I think we could use a CopyOnWriteArrayList that will use a bit more memory but remove these exceptions. Seems to me that there will be relatively few app adds compared to the number of traversals. Stack trace below: 2015-03-27 00:47:34,773 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1427401429921_3388 java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859) at java.util.ArrayList$Itr.next(ArrayList.java:831) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:929) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:922) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:110) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:765) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:746) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) -- This message was sent by Atlassian JIRA (v6.3.4#6332)