[
https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755851#comment-13755851
]
Vinod Kumar Vavilapalli commented on YARN-292:
----------------------------------------------
bq. 3. The application is in FiFoScheduler#applications, but RMAppAttemptImpl
doesn't get it. First of all, FiFoScheduler#applications is a TreeMap, which is
not thread safe (FairScheduler#applications is a HashMap while
CapcityScheduler#applications is a ConcurrentHashMap). Second, the methods of
accessing the map are not consistently synchronized, thus, read and write on
the same map can operate simultaneously. RMAppAttemptImpl on the thread of
AsyncDispatcher will eventually call FiFoScheduler#applications#get in
AMContainerAllocatedTransition, while FiFoScheduler on thread of
SchedulerEventDispatcher will use FiFoScheduler#applications#add|remove.
Therefore, getting null when the application actually exists happens under a
big number of concurrent operations.
This doesn't sound right to me. The thing is scheduler will be told to remove
app only by RMAppAttempt. Now if the RMAppAttempt is going to
AMContainerAllocatedTransition, it cannot tell the scheduler to remove app.
While the theory of unsafe data-structures seems right, I still can't see the
case when the original exception can happen. Clearly the app was removed, then
the RMAppAttempt would have gone into KILLING state, right? If so, why is it
now trying to get the AM Container?
> ResourceManager throws ArrayIndexOutOfBoundsException while handling
> CONTAINER_ALLOCATED for application attempt
> ----------------------------------------------------------------------------------------------------------------
>
> Key: YARN-292
> URL: https://issues.apache.org/jira/browse/YARN-292
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.0.1-alpha
> Reporter: Devaraj K
> Assignee: Zhijie Shen
> Attachments: YARN-292.1.patch, YARN-292.2.patch, YARN-292.3.patch
>
>
> {code:xml}
> 2012-12-26 08:41:15,030 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler:
> Calling allocate on removed or non existant application
> appattempt_1356385141279_49525_000001
> 2012-12-26 08:41:15,031 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in
> handling event type CONTAINER_ALLOCATED for applicationAttempt
> application_1356385141279_49525
> java.lang.ArrayIndexOutOfBoundsException: 0
> at java.util.Arrays$ArrayList.get(Arrays.java:3381)
> at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
> at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
> at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
> at
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
> at
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
> at java.lang.Thread.run(Thread.java:662)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira