[
https://issues.apache.org/jira/browse/YARN-11147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shilun Fan updated YARN-11147:
------------------------------
Target Version/s: 3.4.0
Affects Version/s: 3.4.0
> ResourceUsage and QueueCapacities classes provide node label iterators that
> are not thread safe
> -----------------------------------------------------------------------------------------------
>
> Key: YARN-11147
> URL: https://issues.apache.org/jira/browse/YARN-11147
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 3.4.0
> Reporter: András Győri
> Assignee: András Győri
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> AbstractResourceUsage#getNodePartitionsSet and
> QueueCapacities#getNodePartitionsSet return the keySet of the underlying
> HashMap, which is a live view of the map's keys rather than a copy. If another
> thread modifies the map while the view is being iterated, the iterator throws
> a ConcurrentModificationException, as the following stack trace shows:
> {code:java}
> 2022-04-28 13:21:53,692 FATAL org.apache.hadoop.yarn.event.EventDispatcher:
> Error in handling event type NODE_LABELS_UPDATE to the Event Dispatcher
> java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
> at java.util.HashMap$KeyIterator.next(HashMap.java:1469)
> at com.google.common.collect.Sets$1$1.computeNext(Sets.java:758)
> at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
> at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:236)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:1281)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:2115)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1900)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:169)
> at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:748)
> {code}
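
The failure mode described in the issue can be reproduced in isolation. The following is a minimal sketch, not the YARN code and not necessarily the fix applied in the patch: NodePartitionsSketch, liveView, snapshot and failsWhenModified are hypothetical names introduced only for illustration. To keep the result deterministic, the map is modified from the iterating thread itself instead of from a second thread; the fail-fast check in HashMap's iterator is the same in both cases. Returning a copy of the key set is shown as one possible mitigation.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a class that keeps per-node-label data in a HashMap.
class NodePartitionsSketch {
  private final Map<String, Long> usages = new HashMap<>();

  void put(String label, long value) {
    usages.put(label, value);
  }

  // Returns the live keySet view: adding or removing a key in the map
  // invalidates any iterator that is already running over this set.
  Set<String> liveView() {
    return usages.keySet();
  }

  // Returns an independent copy of the keys taken at call time (assumed
  // mitigation for illustration; later changes to the map do not affect it).
  Set<String> snapshot() {
    return new HashSet<>(usages.keySet());
  }

  // Iterates over the given labels and adds a new label to the holder during
  // iteration. Returns true if a ConcurrentModificationException was thrown.
  static boolean failsWhenModified(NodePartitionsSketch holder, Set<String> labels) {
    try {
      for (String label : labels) {
        holder.put(label + "-new", 0L);
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    NodePartitionsSketch live = new NodePartitionsSketch();
    live.put("", 1L);
    live.put("gpu", 2L);
    System.out.println("live view fails: " + failsWhenModified(live, live.liveView()));

    NodePartitionsSketch copied = new NodePartitionsSketch();
    copied.put("", 1L);
    copied.put("gpu", 2L);
    System.out.println("snapshot fails: " + failsWhenModified(copied, copied.snapshot()));
  }
}
```

A copy trades a small allocation per call for isolation from later changes. In genuinely multi-threaded code the copy itself would still have to be taken under the same lock that guards writes to the map, or the map would have to be a concurrent implementation.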