[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882025#comment-13882025
 ] 

Karthik Kambatla commented on YARN-1618:
----------------------------------------

bq. All we need to do is go from NEW->KILLED on KILL event and ignore START 
event in KILLED state.
Agreed. I posted a patch (yarn-1618-2.patch) to handle this. I tested the 
patch on a secure cluster and verified that the RM no longer crashes when I 
run an Oozie job with an incorrect RM address.
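
For readers following along, the transition change quoted above can be sketched 
roughly as follows. This is a simplified, self-contained model and not the 
actual RM application state machine or the contents of the patch: the class 
AppStateSketch, its enums, and the NEW -> NEW_SAVING transition on START are 
assumptions made purely for illustration.

```java
// Hypothetical, simplified model of the transitions discussed above.
// It is NOT the real YARN state machine; names are illustrative only.
class AppStateSketch {
    enum State { NEW, NEW_SAVING, KILLED }
    enum Event { START, KILL }

    private State state = State.NEW;

    State getState() {
        return state;
    }

    void handle(Event event) {
        switch (state) {
            case NEW:
                if (event == Event.KILL) {
                    // Go straight from NEW to KILLED. Nothing has been
                    // written to the state-store yet, so there is no
                    // entry to update.
                    state = State.KILLED;
                } else if (event == Event.START) {
                    // Assumed normal path: begin saving the new app.
                    state = State.NEW_SAVING;
                }
                break;
            case KILLED:
                // Ignore START (and any repeated KILL) once the app has
                // been killed; the state does not change.
                break;
            default:
                // Other states are outside the scope of this sketch.
                break;
        }
    }
}
```

In this model, a KILL that arrives before START leaves the app in KILLED, and 
a late START is a no-op rather than an invalid transition.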

bq. The point about saving app before scheduler acknowledges is a known issue. 
If that is the only issue, we can close as a duplicate of YARN-1507 which 
already exists.
I think there is merit in fixing the bug here and using YARN-1507 to ensure 
the app is saved only after the scheduler acknowledges it.

> Applications transition from NEW to FINAL_SAVING, and try to update 
> non-existing entries in the state-store
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1618
>                 URL: https://issues.apache.org/jira/browse/YARN-1618
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>         Attachments: yarn-1618-1.patch, yarn-1618-2.patch
>
>
> YARN-891 augments the RMStateStore to store information on completed 
> applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
> This leads to the RM trying to update entries in the state-store that do not 
> exist. On ZKRMStateStore, this leads to the RM crashing. 
> Previous description:
> ZKRMStateStore fails to handle updates to znodes that don't exist. For 
> instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
> In these cases, the store should create the missing znode and handle the 
> update.
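
The "create the missing znode and handle the update" behavior from the 
previous description amounts to an update-or-create (upsert) operation. A 
minimal sketch, assuming an in-memory map in place of ZooKeeper; the class 
UpsertStoreSketch, its methods, and the example path are hypothetical and are 
not part of ZKRMStateStore:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a state-store; not ZKRMStateStore.
class UpsertStoreSketch {
    private final Map<String, byte[]> nodes = new HashMap<>();

    boolean exists(String path) {
        return nodes.containsKey(path);
    }

    byte[] get(String path) {
        byte[] data = nodes.get(path);
        return data == null ? null : data.clone();
    }

    // Updates the entry at 'path', creating it first if it is missing.
    // Returns true if the entry had to be created, false if it already
    // existed and was simply overwritten.
    boolean updateOrCreate(String path, byte[] data) {
        boolean created = !nodes.containsKey(path);
        nodes.put(path, data.clone());
        return created;
    }
}
```

With this shape, an update for an app that was never stored (for example, one 
at a hypothetical path such as "/apps/app_1") creates the entry instead of 
failing.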



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
