[ 
https://issues.apache.org/jira/browse/YARN-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991594#comment-13991594
 ] 

Wangda Tan commented on YARN-1986:
----------------------------------

[~sandyr], in my understanding, this happens with the following event 
sequence. [~zhiguohong], please correct me if I am wrong.
{code}
1) APP_ADDED // will create SchedulerApplication
2) NODE_UPDATE // will access SchedulerApplication.getCurrentAppAttempt(), which is still null at this point :(
3) APP_ATTEMPT_ADDED // will call SchedulerApplication.setCurrentAppAttempt(...)
{code}
Instead of the expected sequence:
{code}
1) APP_ADDED // will create SchedulerApplication
2) APP_ATTEMPT_ADDED // will call SchedulerApplication.setCurrentAppAttempt(...)
3) NODE_UPDATE // will access SchedulerApplication.getCurrentAppAttempt(), which is not null now :)
{code}
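
To make the ordering problem concrete, here is a minimal, self-contained sketch of the two sequences above. It is not the FifoScheduler code and not the attached patch: the class EventOrderSketch, the nested App class, and the use of plain strings for application and attempt IDs are all hypothetical stand-ins. It only illustrates one way a NODE_UPDATE handler could tolerate an application whose current attempt has not been set yet.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EventOrderSketch {
    /** Hypothetical stand-in for SchedulerApplication. */
    static final class App {
        // Stays null until APP_ATTEMPT_ADDED has been handled.
        private String currentAppAttempt;

        String getCurrentAppAttempt() { return currentAppAttempt; }

        void setCurrentAppAttempt(String attempt) { currentAppAttempt = attempt; }
    }

    private final Map<String, App> applications = new LinkedHashMap<>();

    /** APP_ADDED: registers the application without any attempt. */
    void appAdded(String appId) {
        applications.put(appId, new App());
    }

    /** APP_ATTEMPT_ADDED: sets the current attempt on a registered application. */
    void appAttemptAdded(String appId, String attemptId) {
        applications.get(appId).setCurrentAppAttempt(attemptId);
    }

    /** NODE_UPDATE: returns the attempts that could be offered containers. */
    List<String> nodeUpdate() {
        List<String> schedulable = new ArrayList<>();
        for (App app : applications.values()) {
            String attempt = app.getCurrentAppAttempt();
            if (attempt == null) {
                // APP_ADDED was handled but APP_ATTEMPT_ADDED was not; skip
                // this application instead of dereferencing null.
                continue;
            }
            schedulable.add(attempt);
        }
        return schedulable;
    }

    public static void main(String[] args) {
        EventOrderSketch scheduler = new EventOrderSketch();
        scheduler.appAdded("app_1");                       // 1) APP_ADDED
        System.out.println(scheduler.nodeUpdate());        // 2) NODE_UPDATE arrives early
        scheduler.appAttemptAdded("app_1", "attempt_1");   // 3) APP_ATTEMPT_ADDED
        System.out.println(scheduler.nodeUpdate());        // later NODE_UPDATE
    }
}
```

Replaying the problematic sequence in main() also has the shape of a deterministic regression test: fire the three events in a fixed order and check that the early NODE_UPDATE does not throw.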

I think the patch should be enough to fix this problem, but I have some 
comments:
1) A simple, deterministic test case (instead of a random test) should be 
included to guard against future regressions.
2) Should we reconsider whether the APP_ADDED event can be removed from the 
scheduler? If only the APP_ATTEMPT_ADDED event remains, adding the application 
and its attempt becomes atomic in this case.
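
As a rough illustration of the second comment, the sketch below assumes a scheduler that only ever learns about an application through a single attempt-added event. The class AtomicAddSketch and its methods are hypothetical and do not correspond to any existing YARN API; the point is only that a reader of the map can never observe an application without a current attempt.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class AtomicAddSketch {
    /** Hypothetical application record that cannot exist without an attempt. */
    static final class App {
        private String currentAppAttempt;

        App(String attemptId) {
            this.currentAppAttempt = Objects.requireNonNull(attemptId);
        }

        String getCurrentAppAttempt() { return currentAppAttempt; }

        void setCurrentAppAttempt(String attemptId) {
            this.currentAppAttempt = Objects.requireNonNull(attemptId);
        }
    }

    private final Map<String, App> applications = new HashMap<>();

    /** The only entry point: creates the app on its first attempt, updates it on later ones. */
    void appAttemptAdded(String appId, String attemptId) {
        App app = applications.get(appId);
        if (app == null) {
            applications.put(appId, new App(attemptId));
        } else {
            app.setCurrentAppAttempt(attemptId);
        }
    }

    /** Returns the current attempt, or null only if the app is unknown. */
    String currentAttemptOf(String appId) {
        App app = applications.get(appId);
        return app == null ? null : app.getCurrentAppAttempt();
    }
}
```

With this shape, a NODE_UPDATE handled before the attempt-added event simply sees no application at all, rather than a half-initialized one.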

> After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
> ----------------------------------------------------------
>
>                 Key: YARN-1986
>                 URL: https://issues.apache.org/jira/browse/YARN-1986
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Jon Bringhurst
>            Assignee: Hong Zhiguo
>         Attachments: YARN-1986-testcase.patch, YARN-1986.patch
>
>
> After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
> After RM was restarted, the job runs without a problem.
> {noformat}
> 19:11:13,441 FATAL ResourceManager:600 - Error in handling event type 
> NODE_UPDATE to the scheduler
> java.lang.NullPointerException
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>       at java.lang.Thread.run(Thread.java:744)
> 19:11:13,443  INFO ResourceManager:604 - Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
