[
https://issues.apache.org/jira/browse/YARN-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721666#comment-16721666
]
Eric Yang commented on YARN-9126:
---------------------------------
We have three options here:
Option 1 - Force file copy to overwrite existing files in container-executor.
Option 2 - Delete the entire working directory, and allow the reinit process to
reconstruct the whole thing.
Option 3 - Delete the files that will be localized prior to launching
container-executor. The only problem is that the files are owned by the
application runner, so this approach requires calling container-executor to
delete the targeted list of files and then running reinit again during
relaunch. It is a more expensive process than option 1. The only rationale
for this solution is that deleting files as the application runner protects
against overwriting other people's files.
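
To make the trade-off concrete, here is a rough sketch of the three
strategies. It is plain Python, not the container-executor code (which is C),
and every function name in it is hypothetical. It assumes the current failure
comes from a copy that refuses to replace an existing file, and it does not
model switching to the application runner's user, which is the whole point of
option 3 in practice.

```python
import os
import shutil


def copy_no_overwrite(src, dst):
    """Assumed current behaviour: the copy fails if dst already exists."""
    # O_EXCL makes the open raise FileExistsError when dst is present.
    fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
        shutil.copyfileobj(inp, out)


def copy_force_overwrite(src, dst):
    """Option 1 (sketch): replace whatever is already at dst."""
    shutil.copyfile(src, dst)


def rebuild_workdir(work_dir, resources):
    """Option 2 (sketch): drop the working directory and rebuild it.

    resources maps a file name inside work_dir to its source path.
    """
    shutil.rmtree(work_dir, ignore_errors=True)
    os.makedirs(work_dir)
    for name, src in resources.items():
        copy_no_overwrite(src, os.path.join(work_dir, name))


def delete_then_copy(src, dst):
    """Option 3 (sketch): delete only the targeted file, then copy as usual.

    In the real flow the delete would run as the application runner.
    """
    try:
        os.remove(dst)
    except FileNotFoundError:
        pass  # nothing to clean up on a first launch
    copy_no_overwrite(src, dst)
```

Option 1 is a single step, while option 3 needs a delete pass before the copy,
which is where the extra cost comes from.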
> Container reinit always fails in branch-3.2 and trunk
> -----------------------------------------------------
>
> Key: YARN-9126
> URL: https://issues.apache.org/jira/browse/YARN-9126
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Yang
> Priority: Major
> Labels: docker
>
> When upgrading a container, container reinitialization always fails with
> code 33. This error code means that a file being localized already exists
> when the resource files are copied. The container then retries with another
> container ID, so the problem is masked.
> The Hadoop 3.1.x relaunch logic seems to have some way to prevent this bug
> from happening. The same logic might be useful in branch-3.2 and trunk.