[ https://issues.apache.org/jira/browse/YARN-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723891#comment-13723891 ]
Jason Lowe commented on YARN-917:
---------------------------------
Yes, that's exactly what I was proposing with my first comment.
Originally the staging directory cleanup was after the unregister, but there
was a problem. Before the FINISHING state was added, the RM would kill the AM
container as soon as it unregistered. This meant that sometimes the AM would
be killed by the NM (if the NM happened to heartbeat soon enough) before the AM
had a chance to clean up the staging directory, and over time staging
directories would start piling up and filling user quotas. The initial fix was
to move the staging directory cleanup from after unregistering to just before.
Now that there's a FINISHING state that allows the AM some time to clean up
before it is killed, it should have plenty of time to remove the staging
directory before the RM tries to kill it after unregistering.
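
To make the proposed ordering concrete, here is a minimal sketch of
"unregister first, then remove the staging directory". It is only a model, not
the actual MR AM code: ResourceManagerClient, StagingDir and shutdown() are
hypothetical names invented for illustration, and skipping the cleanup when
unregistration fails is an assumption about the desired behavior, so that a
later AM attempt still finds its staging files.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the AM shutdown ordering; not actual MapReduce AM code. */
public class ShutdownOrderSketch {

    /** Stand-in for the call that unregisters the AM from the RM (hypothetical). */
    interface ResourceManagerClient {
        void unregister() throws Exception;
    }

    /** Stand-in for the job's staging directory (hypothetical). */
    static class StagingDir {
        boolean exists = true;

        void delete() {
            exists = false;
        }
    }

    /**
     * Unregister first, then clean up. Returns the steps that were performed,
     * in order, so the ordering can be inspected.
     */
    static List<String> shutdown(ResourceManagerClient rm, StagingDir dir) {
        List<String> steps = new ArrayList<>();
        try {
            rm.unregister();
            steps.add("unregistered");
        } catch (Exception e) {
            // Assumption: if unregistration fails (for example, the RM is
            // restarting), leave the staging directory in place so that a
            // later attempt can still read it.
            steps.add("unregister-failed");
            return steps;
        }
        // Only reached after a successful unregister. This relies on the AM
        // being given time to finish (the FINISHING state) before it is killed.
        dir.delete();
        steps.add("staging-dir-removed");
        return steps;
    }

    public static void main(String[] args) {
        StagingDir cleaned = new StagingDir();
        System.out.println(shutdown(() -> { }, cleaned)
                + " stagingDirExists=" + cleaned.exists);

        StagingDir kept = new StagingDir();
        System.out.println(shutdown(() -> { throw new Exception("RM unavailable"); }, kept)
                + " stagingDirExists=" + kept.exists);
    }
}
```

In this model the staging directory is removed only after a successful
unregister, so an RM restart between the two steps cannot leave a retried
attempt without its staging files.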
> Job can fail when RM restarts after staging dir is cleaned but before MR
> successfully unregister with RM
> --------------------------------------------------------------------------------------------------------
>
> Key: YARN-917
> URL: https://issues.apache.org/jira/browse/YARN-917
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Jian He
> Assignee: Jian He
>