[ https://issues.apache.org/jira/browse/YARN-9320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated YARN-9320:
-----------------------------------
Description:
We are running a snapshot of the 2.9 branch; unfortunately, I'm not sure off the
top of my head which version it corresponds to. I can look it up if that's
important, but I haven't found an existing bug like this, so I suspect it also
affects current versions unless it was fixed by accident.
If it helps, the cluster is very large (1000s of NMs), so we expect frequent
node failures/restarts; also, some apps may specify misconfigured node labels,
so node-label-related code paths may hit corner cases. Still, this shouldn't
happen based on a user-supplied parameter.
{noformat}
2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor]
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils:
queueCapacities.getNodePartitionsSet() changed
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:303)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.updateClusterResource(LeafQueue.java:1879)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:897)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:1775)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1633)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:67)
{noformat}
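For anyone triaging: the trace ends in HashMap$KeyIterator.next, which is the JDK's fail-fast check firing because the map was structurally modified while its key set was being iterated. The sketch below is NOT the YARN code; the class, method, and label names are all made up for illustration. It reproduces the same exception on a single thread and shows one possible mitigation (iterating over a copy of the keys). In the real scheduler the modification may well come from another thread, in which case the copy would have to be taken under the same lock the writers hold, or the map replaced with a concurrent one.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a per-partition capacities map; not the real YARN classes.
class PartitionIterationSketch {

    static Map<String, Float> newCapacities() {
        Map<String, Float> capacities = new HashMap<>();
        capacities.put("", 1.0f);      // default partition
        capacities.put("gpu", 0.5f);   // made-up node labels
        capacities.put("ssd", 0.25f);
        return capacities;
    }

    // Iterates the live key-set view while the map gains a new key.
    // Returns true if the fail-fast iterator threw.
    static boolean iterateLiveView(Map<String, Float> capacities) {
        try {
            for (String partition : capacities.keySet()) {
                // Structural modification during iteration (simulates a label being added).
                capacities.put("added-for-" + partition, 0.0f);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterates a snapshot of the keys, so later changes to the map cannot
    // invalidate the iterator. Returns the partitions that were visited.
    static List<String> iterateSnapshot(Map<String, Float> capacities) {
        List<String> visited = new ArrayList<>();
        for (String partition : new ArrayList<>(capacities.keySet())) {
            capacities.put("added-for-" + partition, 0.0f);
            visited.add(partition);
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println("live view threw: " + iterateLiveView(newCapacities()));
        System.out.println("snapshot visited: " + iterateSnapshot(newCapacities()).size());
    }
}
```

The snapshot trades a small per-call allocation for safety; whether that is acceptable on the scheduler's hot path is a separate question.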
was:
We are running a snapshot of the 2.9 branch; unfortunately, I'm not sure off the
top of my head which version it corresponds to. I can look it up if that's
important, but I haven't found an existing bug like this, so I suspect it also
affects current versions unless it was fixed by accident.
If it helps, the cluster is very large (1000s of NMs), so we expect frequent
node failures; also, some apps may specify misconfigured node labels,
so node-label-related code paths may hit corner cases. Still, this shouldn't
happen based on a user-supplied parameter.
{noformat}
2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor]
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils:
queueCapacities.getNodePartitionsSet() changed
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:303)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.updateClusterResource(LeafQueue.java:1879)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:897)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:1775)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1633)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:67)
{noformat}
> ConcurrentModificationException in capacity scheduler (updateQueueStatistics)
> -----------------------------------------------------------------------------
>
> Key: YARN-9320
> URL: https://issues.apache.org/jira/browse/YARN-9320
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.9.3
> Reporter: Sergey Shelukhin
> Priority: Critical