[ https://issues.apache.org/jira/browse/YARN-472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605974#comment-13605974 ]
Bikas Saha commented on YARN-472:
---------------------------------
I think we are on the same page. It's not easy to make the AM simply crash,
because it has multiple threads, shutdown hooks, etc. Do you have any
suggestions?
It looks like the cleanest way is to follow the normal shutdown path but skip
deleting the staging dir and unregistering from the RM. The rest of the
committer and history code should work fine after all the fixes we made to it.
Unless this is the last (or the successful) attempt, history should remain
available for recovery and the commit should not happen.
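
To make the proposal concrete, here is a rough sketch of the decision described
above. It is not the actual MRAppMaster code: the class, method, and step names
are all hypothetical and only illustrate which steps would be skipped on a
reboot. It also assumes, as a simplification, that any shutdown not triggered
by a reboot is a normal job completion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; none of these names exist in the real MR AM.
class RebootAwareShutdown {

    /**
     * Returns the shutdown steps that would run, in order.
     *
     * rebootRequested: the RM told this AM to reboot (e.g. after an RM restart).
     * lastAttempt:     no further AM attempts are allowed for this job.
     */
    static List<String> shutdown(boolean rebootRequested, boolean lastAttempt) {
        List<String> steps = new ArrayList<>();

        // Always follow the normal shutdown path for services and history,
        // so history stays available for the next attempt to recover from.
        steps.add("stopServices");
        steps.add("flushHistory");

        // On a reboot with attempts remaining, keep the staging dir and stay
        // registered so that the next attempt can start cleanly.
        boolean keepStateForNextAttempt = rebootRequested && !lastAttempt;
        if (!keepStateForNextAttempt) {
            steps.add("unregisterFromRM");
            steps.add("deleteStagingDir");
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println("reboot, attempts left: " + shutdown(true, false));
        System.out.println("reboot, last attempt:  " + shutdown(true, true));
        System.out.println("normal completion:     " + shutdown(false, false));
    }
}
```

The point of the single flag is that the reboot case reuses the ordinary
shutdown sequence and only drops the two steps that break the next attempt.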
> MR app master deletes staging dir when sent a reboot command from the RM
> ------------------------------------------------------------------------
>
> Key: YARN-472
> URL: https://issues.apache.org/jira/browse/YARN-472
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: jian he
> Assignee: jian he
> Attachments: YARN-472.1.patch
>
>
> If the RM is restarted when the MR job is running, then it sends a reboot
> command to the job. The job ends up deleting the staging dir and that causes
> the next attempt to fail.