[ https://issues.apache.org/jira/browse/YARN-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723580#comment-16723580 ]
Chandni Singh commented on YARN-9126:
-------------------------------------
Two changes caused the issue:
- YARN-7644: cleanup of the working directory is now done asynchronously.
- YARN-8569: this introduced a sysfs directory in the container's working
directory, which needs to be deleted during cleanup of the working directory.
Attached is patch 001. [~eyang], could you please take a look?
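
For readers following along, here is a rough sketch of how the two changes above can interact. It is plain Python, not NodeManager code and not the attached patch; the function names (`localize`, `cleanup`, `reinit`) and the file name `app.jar` are made up for illustration. It assumes that localization refuses to overwrite existing entries, and shows one possible remedy (waiting for the asynchronous cleanup to finish before localizing again), which may differ from what the patch does.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def localize(work_dir, resources):
    """Populate the working directory; fail if anything is left over."""
    os.makedirs(work_dir, exist_ok=True)
    # os.mkdir raises FileExistsError when a stale sysfs directory remains.
    os.mkdir(os.path.join(work_dir, "sysfs"))
    for name in resources:
        # Mode "x" creates the file exclusively and raises FileExistsError
        # if it is already present (standing in for the reported failure).
        with open(os.path.join(work_dir, name), "x") as handle:
            handle.write("resource\n")


def cleanup(work_dir):
    """Remove the whole working directory, including the sysfs subdirectory."""
    shutil.rmtree(work_dir, ignore_errors=True)


def reinit(executor, work_dir, resources):
    """Run cleanup on another thread, but wait for it before localizing."""
    pending = executor.submit(cleanup, work_dir)
    pending.result()  # without this wait, localize() could race the deletion
    localize(work_dir, resources)


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    work = os.path.join(base, "container_01")  # hypothetical directory name
    localize(work, ["app.jar"])
    with ThreadPoolExecutor(max_workers=1) as pool:
        reinit(pool, work, ["app.jar"])
    print(sorted(os.listdir(work)))
    shutil.rmtree(base)
```

If the `pending.result()` line is dropped, `localize` may run while `sysfs` or the resource files are still on disk, which is the "already exists" situation described in the issue.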
> Container reinit always fails in branch-3.2 and trunk
> -----------------------------------------------------
>
> Key: YARN-9126
> URL: https://issues.apache.org/jira/browse/YARN-9126
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Yang
> Assignee: Chandni Singh
> Priority: Major
> Labels: docker
> Attachments: YARN-9126.001.patch
>
>
> When upgrading a container, container reinitialization always fails with code
> 33. This error code means that a file being localized already exists when
> the resource files are copied. The container retries with another container
> ID, hence the problem is masked.
> The Hadoop 3.1.x relaunch logic seems to have some way to prevent this bug
> from happening. The same logic might be useful in branch-3.2 and trunk.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)