[ 
https://issues.apache.org/jira/browse/YARN-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721666#comment-16721666
 ] 

Eric Yang commented on YARN-9126:
---------------------------------

We have three options here:

Option 1 - Force the file copy in container-executor to overwrite existing 
files.
Option 2 - Delete the entire working directory and let the reinit process 
reconstruct it.
Option 3 - Delete the files that will be localized before launching 
container-executor.  The catch is that the files are owned by the application 
runner, so this approach requires calling container-executor to delete the 
targeted list of files, then running reinit again during relaunch.  It is more 
expensive than option 1.  The only reason to choose it is that deleting files 
as the application runner protects against overwriting other people's files.
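
To make the trade-offs concrete, here is a rough sketch of the three options. 
It is not the container-executor code (which is written in C); it only models 
the file handling in Python. The function names, the file names resource.txt 
and other.txt, and the assumption that the failing copy behaves like an 
exclusive create are all hypothetical. File ownership and running as the 
application runner are not modeled.

```python
import os
import shutil
import tempfile


def copy_exclusive(src, dst):
    """Copy src to dst, failing if dst already exists (assumed failure mode)."""
    fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
        shutil.copyfileobj(inp, out)


def copy_overwrite(src, dst):
    """Option 1: truncate and rewrite dst if it already exists."""
    fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
        shutil.copyfileobj(inp, out)


def reset_workdir(workdir):
    """Option 2: remove the whole working directory and recreate it empty."""
    shutil.rmtree(workdir, ignore_errors=True)
    os.makedirs(workdir)


def delete_targets(workdir, names):
    """Option 3: delete only the files that are about to be localized."""
    for name in names:
        try:
            os.remove(os.path.join(workdir, name))
        except FileNotFoundError:
            pass  # nothing to delete for this name


def read_text(path):
    with open(path) as f:
        return f.read()


def write_text(path, text):
    with open(path, "w") as f:
        f.write(text)


def demo():
    results = {}
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "resource.txt")        # hypothetical resource
        workdir = os.path.join(root, "workdir")
        dst = os.path.join(workdir, "resource.txt")
        other = os.path.join(workdir, "other.txt")      # unrelated file
        os.makedirs(workdir)
        write_text(src, "v2")
        write_text(dst, "v1")
        write_text(other, "keep")

        # Without any fix, the copy fails because dst is left over.
        try:
            copy_exclusive(src, dst)
            results["no_fix"] = "copied"
        except FileExistsError:
            results["no_fix"] = "exists"

        # Option 1: overwrite in place.
        copy_overwrite(src, dst)
        results["option1"] = read_text(dst)

        # Option 3: delete the targeted files, then copy as before.
        write_text(dst, "v1")
        delete_targets(workdir, ["resource.txt"])
        copy_exclusive(src, dst)
        results["option3"] = read_text(dst)
        results["option3_keeps_other"] = os.path.exists(other)

        # Option 2: wipe the working directory, then copy as before.
        reset_workdir(workdir)
        copy_exclusive(src, dst)
        results["option2"] = read_text(dst)
        results["option2_keeps_other"] = os.path.exists(other)
    return results
```

In this sketch, option 2 also removes the unrelated file, while options 1 and 
3 leave it alone. Option 3 needs a delete pass in addition to the copy, which 
corresponds to the extra container-executor call mentioned above.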

> Container reinit always fails in branch-3.2 and trunk
> -----------------------------------------------------
>
>                 Key: YARN-9126
>                 URL: https://issues.apache.org/jira/browse/YARN-9126
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Yang
>            Priority: Major
>              Labels: docker
>
> When upgrading a container, container reinitialization always fails with 
> code 33.  This error code means that a file being localized already exists 
> when the resource files are copied.  The container retries with another 
> container ID, so the problem is masked.
> The Hadoop 3.1.x relaunch logic seems to have some way to prevent this bug 
> from happening.  The same logic might be useful in branch-3.2 and trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
