[ https://issues.apache.org/jira/browse/YARN-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110852#comment-16110852 ]
Eric Payne commented on YARN-6846:
----------------------------------
{quote}
these aren't the methods that are setting errno in the code change in the first
block referenced (1837).
{code}
- if (rmdir(path) != 0) {
+ if (rmdir(path) != 0 && errno != ENOENT) {
{code}
rmdir(path) is what sets errno here, and it can fail with ENOENT.
{quote}
Thanks [~ebadger]. You are correct. I misread the code.
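
For reference, the pattern in the quoted hunk can be isolated as a small helper. This is only a sketch, not code from the patch: the name rmdir_ignore_missing and the error message are hypothetical, and it assumes the caller treats a directory that is already gone as a successful delete.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the YARN-6846 patch): remove a directory,
 * treating "already removed by someone else" as success.
 * Returns 0 on success or if the directory no longer exists, -1 otherwise.
 */
static int rmdir_ignore_missing(const char *path) {
  /* errno is only inspected when rmdir() itself has just failed. */
  if (rmdir(path) != 0 && errno != ENOENT) {
    fprintf(stderr, "rmdir of %s failed: %s\n", path, strerror(errno));
    return -1;
  }
  return 0;
}
```

Because errno is read only after rmdir() fails, a stale value left by an earlier call cannot mask a real failure, and any error other than ENOENT (for example ENOTEMPTY or EACCES) is still reported.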
> Nodemanager can fail to fully delete application local directories when
> applications are killed
> -----------------------------------------------------------------------------------------------
>
> Key: YARN-6846
> URL: https://issues.apache.org/jira/browse/YARN-6846
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.8.1
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
> Attachments: YARN-6846.001.patch, YARN-6846.002.patch,
> YARN-6846.003.patch
>
>
> When an application is killed, all of its running containers are killed and
> the app waits for the containers to complete before cleaning up. As each
> container completes, its container directory is deleted via the
> DeletionService. After all containers have completed, the app completes and
> the app directory is deleted. If the app completes quickly enough, the
> deletion of the container and app directories can race against each other.
> If the container deletion executor deletes a file just before the application
> deletion executor tries to delete the same file, the application deletion
> executor can fail, leaving the remaining entries in the application directory
> lingering.