[
https://issues.apache.org/jira/browse/YARN-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991594#comment-13991594
]
Wangda Tan commented on YARN-1986:
----------------------------------
[~sandyr], as I understand it, this happens with the following event
sequence. [~zhiguohong], please correct me if I am wrong.
{code}
1) APP_ADDED // will create SchedulerApplication
2) NODE_UPDATE // will access SchedulerApplication.getCurrentAppAttempt(), which is still null :(
3) APP_ATTEMPT_ADDED // will call SchedulerApplication.setCurrentAppAttempt(...)
{code}
Instead of
{code}
1) APP_ADDED // will create SchedulerApplication
2) APP_ATTEMPT_ADDED // will call SchedulerApplication.setCurrentAppAttempt(...)
3) NODE_UPDATE // will access SchedulerApplication.getCurrentAppAttempt(), which is no longer null :)
{code}
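To make the two orderings concrete, here is a minimal, self-contained sketch. It is not the Hadoop code: EventOrderDemo, App, and replay are hypothetical names, and the attempt is modelled as a plain String. It only shows what a NODE_UPDATE handler would observe in each ordering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the event ordering; not the real Hadoop classes.
class EventOrderDemo {
    enum Event { APP_ADDED, APP_ATTEMPT_ADDED, NODE_UPDATE }

    /** Stand-in for SchedulerApplication: the attempt stays null until APP_ATTEMPT_ADDED. */
    static class App {
        String currentAttempt;
    }

    /** Replays events in order and records what each NODE_UPDATE observed. */
    static List<String> replay(Event... events) {
        App app = null;
        List<String> trace = new ArrayList<>();
        for (Event e : events) {
            switch (e) {
                case APP_ADDED:
                    app = new App();
                    break;
                case APP_ATTEMPT_ADDED:
                    if (app != null) {
                        app.currentAttempt = "attempt_1";
                    }
                    break;
                case NODE_UPDATE:
                    if (app == null) {
                        trace.add("no app");
                    } else if (app.currentAttempt == null) {
                        trace.add("attempt is null"); // the window in which the NPE occurs
                    } else {
                        trace.add("attempt is set");
                    }
                    break;
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        // Problematic ordering: prints [attempt is null]
        System.out.println(replay(Event.APP_ADDED, Event.NODE_UPDATE, Event.APP_ATTEMPT_ADDED));
        // Expected ordering: prints [attempt is set]
        System.out.println(replay(Event.APP_ADDED, Event.APP_ATTEMPT_ADDED, Event.NODE_UPDATE));
    }
}
```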
I think the patch should be enough to fix this problem, but I have some comments:
1) A simple, deterministic test case (instead of a random test) should be
included to guard against regressions in the future.
2) Should we reconsider whether the APP_ADDED event can be removed from the
scheduler? Keeping only the APP_ATTEMPT_ADDED event would make creating the
application and setting its attempt atomic in this case.
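A deterministic test along the lines of comment 1 could drive the events in the problematic order directly. The sketch below assumes the fix amounts to skipping applications that have no current attempt; that is an assumption, since the patch itself is not quoted here. GuardedSchedulerSketch and its methods are hypothetical stand-ins, not FifoScheduler.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical miniature scheduler; names mirror the discussion but this is not Hadoop code.
class GuardedSchedulerSketch {
    static class App {
        String currentAttempt; // null until appAttemptAdded is called
    }

    private final Map<String, App> apps = new LinkedHashMap<>();

    void appAdded(String appId) {
        apps.put(appId, new App());
    }

    void appAttemptAdded(String appId, String attemptId) {
        App app = apps.get(appId);
        if (app != null) {
            app.currentAttempt = attemptId;
        }
    }

    /** Returns how many applications were considered for assignment on this update. */
    int nodeUpdate() {
        int considered = 0;
        for (App app : apps.values()) {
            if (app.currentAttempt == null) {
                continue; // assumed guard: the attempt has not been registered yet
            }
            considered++;
        }
        return considered;
    }

    public static void main(String[] args) {
        GuardedSchedulerSketch scheduler = new GuardedSchedulerSketch();
        scheduler.appAdded("app_1");
        // NODE_UPDATE arrives before APP_ATTEMPT_ADDED: must not throw.
        System.out.println(scheduler.nodeUpdate()); // prints 0
        scheduler.appAttemptAdded("app_1", "attempt_1");
        System.out.println(scheduler.nodeUpdate()); // prints 1
    }
}
```

Because the ordering is fixed by the test itself, a failure is reproducible on every run, which a randomized test cannot guarantee.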
> After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
> ----------------------------------------------------------
>
> Key: YARN-1986
> URL: https://issues.apache.org/jira/browse/YARN-1986
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.4.0
> Reporter: Jon Bringhurst
> Assignee: Hong Zhiguo
> Attachments: YARN-1986-testcase.patch, YARN-1986.patch
>
>
> After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
> After RM was restarted, the job runs without a problem.
> {noformat}
> 19:11:13,441 FATAL ResourceManager:600 - Error in handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
> at java.lang.Thread.run(Thread.java:744)
> 19:11:13,443 INFO ResourceManager:604 - Exiting, bbye..
> {noformat}