[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664229#comment-16664229 ]
Jason Lowe commented on YARN-8672:
----------------------------------
Thanks for updating the patch! I'm still skeptical that this will work well
in practice for some corner cases. For example, what if FileDeletionService
has been configured with a delay? The delete of the old tokens file would
then be deferred significantly, and it could fire just as a new localizer is
creating its tokens file.
A couple of potential fixes:
# Have localizers use token files private to that localizer instance. Then
each localizer is responsible for reaping its own tokens file, with no risk
of deleting it just as a new localizer spins up to use it. It seems we will
always have a race when sharing the tokens file, since we cannot control the
time between when we ask for something to be deleted and when it actually
gets deleted.
# Never delete the tokens file until the container completes. This could have
implications if the tokens file needs to differ between localizers (e.g., if
the credentials of the container were updated after the first localizer
started).
My preference would be to separate the token files.
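To illustrate the first option, here is a minimal sketch of per-localizer token files. It is not the NodeManager's actual code: the class name LocalizerTokenFiles, its methods, and the <containerId>.<localizerId>.tokens naming scheme are all assumptions made up for this example. The only point is that each localizer writes and later deletes a path no other localizer uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper showing token files that are private to one localizer.
 * None of these names come from the Hadoop code base.
 */
public final class LocalizerTokenFiles {
    private LocalizerTokenFiles() {}

    /** Builds a path that is unique per (container, localizer) pair. */
    static Path tokenFileFor(Path nmPrivateDir, String containerId,
                             String localizerId) {
        // Assumed naming scheme: <containerId>.<localizerId>.tokens
        return nmPrivateDir.resolve(
            containerId + "." + localizerId + ".tokens");
    }

    /** Writes the credentials to this localizer's own tokens file. */
    static Path writeTokens(Path nmPrivateDir, String containerId,
                            String localizerId, byte[] credentials)
            throws IOException {
        Path file = tokenFileFor(nmPrivateDir, containerId, localizerId);
        Files.write(file, credentials);
        return file;
    }

    /**
     * Removes one localizer's tokens file. Even if this runs late, it can
     * only ever hit the path owned by the localizer that has finished.
     */
    static void reap(Path tokenFile) throws IOException {
        Files.deleteIfExists(tokenFile);
    }
}
```

With this layout, a delayed delete for one localizer cannot remove the file a newer localizer for the same container has just written, because the two paths differ.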
> TestContainerManager#testLocalingResourceWhileContainerRunning occasionally
> times out
> -------------------------------------------------------------------------------------
>
> Key: YARN-8672
> URL: https://issues.apache.org/jira/browse/YARN-8672
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 3.2.0
> Reporter: Jason Lowe
> Assignee: Chandni Singh
> Priority: Major
> Attachments: YARN-8672.001.patch, YARN-8672.002.patch,
> YARN-8672.003.patch, YARN-8672.004.patch
>
>
> Precommit builds have been failing in
> TestContainerManager#testLocalingResourceWhileContainerRunning. I have been
> able to reproduce the problem without any patch applied if I run the test
> enough times. It looks like something is removing container tokens from the
> nmPrivate area just as a new localizer starts.