[ https://issues.apache.org/jira/browse/YARN-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865956#comment-13865956 ]
Jason Lowe commented on YARN-1575:
----------------------------------
I think there's a race condition in the public localizer. The code adds
requests to the queue like this:
{code}
if (rsrc.tryAcquire()) {
  ....
  pending.put(queue.submit(new FSDownload(lfs, null, conf,
      publicDirDestPath, resource)), request);
{code}
and it pulls requests like this:
{code}
while (!Thread.currentThread().isInterrupted()) {
  try {
    Future<Path> completed = queue.take();
    LocalizerResourceRequestEvent assoc = pending.remove(completed);
    try {
      Path local = completed.get();
      if (null == assoc) {
        LOG.error("Localized unkonwn resource to " + completed);
{code}
{{pending}} is a ConcurrentHashMap, but that's insufficient. The {{queue.submit}}
argument is evaluated before {{pending.put}} runs, so the download can finish
and wake the consumer thread before the producer thread has recorded the
mapping. The consumer's {{pending.remove}} then returns null, leaving it with a
completed request that has no corresponding pending entry.
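One way to close the window would be to make the submit-and-record step atomic with respect to the consumer's lookup. The sketch below is only an illustration of that idea, not the NodeManager code or a proposed patch: the class {{PendingRaceSketch}}, the method {{countUnmatched}}, and the String/Integer stand-ins for Path and LocalizerResourceRequestEvent are all hypothetical. It guards both the producer's {{pending.put(queue.submit(...))}} and the consumer's {{pending.remove}} with the same lock, so a consumer that takes a finished future early has to wait until the mapping exists.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the public localizer's producer/consumer pair.
class PendingRaceSketch {

  // Submits `requests` trivial tasks and returns how many completions
  // the consumer could not match to a pending entry.
  static int countUnmatched(int requests) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    CompletionService<String> queue = new ExecutorCompletionService<>(pool);
    // A plain HashMap is enough here because every access holds the lock.
    Map<Future<String>, Integer> pending = new HashMap<>();
    int[] unmatched = {0};

    Thread consumer = new Thread(() -> {
      try {
        for (int i = 0; i < requests; i++) {
          Future<String> completed = queue.take();
          Integer assoc;
          // Blocks while the producer is between submit and put.
          synchronized (pending) {
            assoc = pending.remove(completed);
          }
          if (assoc == null) {
            unmatched[0]++;
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.start();

    for (int i = 0; i < requests; i++) {
      final int id = i;
      // Submit and record the mapping as one step under the lock.
      synchronized (pending) {
        pending.put(queue.submit(() -> "path-" + id), id);
      }
    }

    consumer.join();
    pool.shutdown();
    return unmatched[0];
  }

  public static void main(String[] args) throws Exception {
    System.out.println("unmatched = " + countUnmatched(10000));
  }
}
```

Removing the two {{synchronized}} blocks (and switching back to a ConcurrentHashMap) brings back the window described above, although how often it is actually hit depends on scheduling.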
> Public localizer crashes with "Localized unkown resource"
> ---------------------------------------------------------
>
> Key: YARN-1575
> URL: https://issues.apache.org/jira/browse/YARN-1575
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 0.23.10, 2.2.0
> Reporter: Jason Lowe
> Priority: Critical
>
> The public localizer can crash with the error:
> {noformat}
> 2014-01-08 14:11:43,212 [Thread-467] ERROR
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
> Localized unkonwn resource to java.util.concurrent.FutureTask@852e26
> 2014-01-08 14:11:43,212 [Thread-467] INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
> Public cache exiting
> {noformat}