[ 
https://issues.apache.org/jira/browse/YARN-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835712#comment-16835712
 ] 

Eric Badger commented on YARN-9527:
-----------------------------------

Thanks for the analysis and patches, [~Jim_Brennan]! I believe I understand the 
problem and the patch you put up to fix it.

Moving the {{getPathForLocalization()}} logic into {{findNextResource()}} makes 
a lot of sense: we no longer have to go through the bad resources one heartbeat 
at a time, and we actually remove them from the pending list.
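
To illustrate the single-pass idea, here is a minimal sketch in plain Java. 
It is not the code from the patch: {{PendingScan}}, {{findNext}} and the 
{{usable}} predicate are hypothetical names, and the predicate stands in for 
whatever check decides that a pending resource cannot be localized.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper for illustration only; not part of YARN. */
class PendingScan {
    /**
     * Walks the pending list once. Entries that fail the check are removed
     * as they are encountered, and the first entry that passes is returned
     * (it stays in the list). Returns null if no entry passes.
     */
    static <T> T findNext(List<T> pending, Predicate<T> usable) {
        Iterator<T> it = pending.iterator();
        while (it.hasNext()) {
            T candidate = it.next();
            if (usable.test(candidate)) {
                return candidate;
            }
            // Drop the bad entry now instead of revisiting it on a later call.
            it.remove();
        }
        return null;
    }
}
```

With the check inside the loop, one call clears every bad entry that precedes 
the first good one, rather than discovering them one call at a time.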

I'm not super wild about adding an LRU cache of 128 recent entries since it 
only makes the race less likely to occur instead of fixing it outright. 
However, this code is very complex and I can understand why you would want to 
make a minimally invasive change. I would like to hear other people's thoughts 
on this.
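
For reference, a bounded LRU map can be sketched with the standard 
{{java.util.LinkedHashMap}}. This is only an illustration of the technique, 
not the patch's implementation; the class name {{RecentEntries}} is 
hypothetical, and the capacity would be 128 in the case discussed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded LRU map for illustration; not the code from the patch. */
class RecentEntries<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    RecentEntries(int capacity) {
        // accessOrder = true keeps entries ordered from least to most
        // recently accessed, which is what makes eviction LRU.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evict once the bound is exceeded.
        return size() > capacity;
    }
}
```

The sketch also shows the limitation: once an entry has been evicted, a late 
duplicate for it is no longer recognized, so the bound narrows the window for 
the race rather than closing it.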

It would also be good to prove that this fix actually works and, more 
importantly, doesn't break anything else. I think we should wait for that 
before we put this in (if others agree with the approach).

> Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file
> -------------------------------------------------------------------------
>
>                 Key: YARN-9527
>                 URL: https://issues.apache.org/jira/browse/YARN-9527
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.8.5, 3.1.2
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Major
>         Attachments: YARN-9527.001.patch, YARN-9527.002.patch, 
> YARN-9527.003.patch
>
>
> A rogue ContainerLocalizer can get stuck in a loop continuously downloading 
> the same file while generating an "Invalid event: LOCALIZED at LOCALIZED" 
> exception on each iteration.  Sometimes this continues long enough that it 
> fills up a disk or depletes available inodes for the filesystem.


