[
https://issues.apache.org/jira/browse/YARN-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836638#comment-16836638
]
Jim Brennan commented on YARN-9527:
-----------------------------------
I was able to reproduce the problem in branch-2.8 on a one-node cluster by
changing ApplicationImpl.AppInitDoneTransition() to send a ContainerKillEvent
immediately after the first ContainerInitEvent is sent. The kill is sent only
once, so it's a one-shot trigger for the NM.
I restarted the nodemanager with this change and then ran a sleep job with a
list of files to localize.
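For anyone who wants to try the same kind of one-shot injection, here is a minimal standalone sketch of the idea. It is not the actual change to ApplicationImpl: the class name, the method, and the use of plain strings in place of ContainerInitEvent/ContainerKillEvent are all hypothetical, and it only illustrates gating the kill on a flag so that it fires after the first init event and never again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model of a one-shot kill injection; none of these names are real NM classes. */
public class OneShotKillSketch {
  // Flips to true the first time a kill is injected, so the hack fires once per process.
  private static final AtomicBoolean KILL_INJECTED = new AtomicBoolean(false);

  /** Returns the events that would be dispatched, in order, for the given container ids. */
  public static List<String> initContainers(List<String> containerIds) {
    List<String> events = new ArrayList<>();
    for (String id : containerIds) {
      events.add("INIT_CONTAINER " + id);
      // Only the very first init event is followed by a kill.
      if (KILL_INJECTED.compareAndSet(false, true)) {
        events.add("KILL_CONTAINER " + id);
      }
    }
    return events;
  }

  public static void main(String[] args) {
    System.out.println(initContainers(Arrays.asList("c1", "c2")));
    System.out.println(initContainers(Arrays.asList("c3")));
  }
}
```

In this sketch the flag is static, so restarting the process (here, the NM) re-arms the trigger.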
{noformat}
hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
  sleep \
  -files file1,file2,file3,file4,file5,file6,file7,file8,file9,file10,file11,file12,file13,file14,file15,file16,file17 \
  -m 10 -r 10 -mt 10000 -rt 10000
{noformat}
Without my fix, this causes a rogue ContainerLocalizer to get stuck in the
"Invalid event: LOCALIZED at LOCALIZED" loop every time. I have verified that
my fix prevents this. I have also verified that the fix without the LRUCache
portion (just the findNextResource change) does not fix the problem (at least
for this test case).
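As background on the exception text, here is a simplified, standalone sketch of why a second LOCALIZED event for a resource that is already LOCALIZED gets rejected. It is a toy state machine, not the NodeManager's actual resource state machine: the class name and the reduced set of states and events are assumptions for illustration, and the real code has more states and transitions.

```java
/** Toy resource state machine; a simplified stand-in, not the real NM implementation. */
public class LocalizedResourceSketch {
  public enum State { INIT, DOWNLOADING, LOCALIZED }
  public enum Event { REQUEST, LOCALIZED }

  private State state = State.INIT;

  public State getState() {
    return state;
  }

  /** Applies an event, or throws if the event is not valid in the current state. */
  public void handle(Event event) {
    switch (state) {
      case INIT:
        if (event == Event.REQUEST) { state = State.DOWNLOADING; return; }
        break;
      case DOWNLOADING:
        if (event == Event.REQUEST) { return; }  // another requester waits on the same download
        if (event == Event.LOCALIZED) { state = State.LOCALIZED; return; }
        break;
      case LOCALIZED:
        if (event == Event.REQUEST) { return; }  // already on disk, nothing to download
        break;
    }
    throw new IllegalStateException("Invalid event: " + event + " at " + state);
  }

  public static void main(String[] args) {
    LocalizedResourceSketch resource = new LocalizedResourceSketch();
    resource.handle(Event.REQUEST);
    resource.handle(Event.LOCALIZED);
    try {
      // A localizer reporting the same download again triggers the invalid transition.
      resource.handle(Event.LOCALIZED);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In this toy model, a localizer that keeps downloading and reporting the same file would hit the rejected transition on every iteration.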
> Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file
> -------------------------------------------------------------------------
>
> Key: YARN-9527
> URL: https://issues.apache.org/jira/browse/YARN-9527
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.8.5, 3.1.2
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Major
> Attachments: YARN-9527.001.patch, YARN-9527.002.patch,
> YARN-9527.003.patch, YARN-9527.004.patch
>
>
> A rogue ContainerLocalizer can get stuck in a loop continuously downloading
> the same file while generating an "Invalid event: LOCALIZED at LOCALIZED"
> exception on each iteration. Sometimes this continues long enough that it
> fills up a disk or depletes available inodes for the filesystem.