[ https://issues.apache.org/jira/browse/YARN-472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605974#comment-13605974 ]

Bikas Saha commented on YARN-472:
---------------------------------

I think we are on the same page. It's not easy to make the AM just crash, 
because it has multiple threads, shutdown hooks, etc. Do you have any 
suggestions?
It looks like the cleanest way is to follow the normal shutdown path but skip 
the deletion of the staging dir and the unregister. The rest of the committer 
and history stuff should work fine after all the fixes we made to that code. 
Unless this is the last/successful attempt, history should be available for 
recovery and commit should not happen.
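
To make the proposal concrete, here is a minimal sketch of the idea, not the 
actual MRAppMaster code. Every name in it (RebootShutdownSketch, 
shutdownSteps, the step strings) is hypothetical and only illustrates the 
ordering: the normal shutdown path always runs, and the unregister and the 
staging dir deletion are skipped when the shutdown was caused by a reboot 
command from the RM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Hadoop.
class RebootShutdownSketch {

    /**
     * Returns the ordered shutdown steps for one AM attempt.
     *
     * @param rebootRequested true if the RM sent a reboot command
     *                        (assumed flag, set by whoever handles the command)
     */
    static List<String> shutdownSteps(boolean rebootRequested) {
        List<String> steps = new ArrayList<>();

        // Normal shutdown path: always runs, so threads and services
        // are stopped the usual way instead of crashing the process.
        steps.add("stop-services");
        steps.add("flush-history");

        // Only a non-reboot shutdown unregisters and cleans up. After a
        // reboot the next attempt still needs the staging dir and history.
        if (!rebootRequested) {
            steps.add("unregister-from-rm");
            steps.add("delete-staging-dir");
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println("reboot: " + shutdownSteps(true));
        System.out.println("normal: " + shutdownSteps(false));
    }
}
```

The sketch deliberately leaves out the last-attempt case and the committer, 
since how those interact with a reboot is exactly what is being discussed.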
                
> MR app master deletes staging dir when sent a reboot command from the RM
> ------------------------------------------------------------------------
>
>                 Key: YARN-472
>                 URL: https://issues.apache.org/jira/browse/YARN-472
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: jian he
>            Assignee: jian he
>         Attachments: YARN-472.1.patch
>
>
> If the RM is restarted when the MR job is running, then it sends a reboot 
> command to the job. The job ends up deleting the staging dir and that causes 
> the next attempt to fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
