[
https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528957#comment-15528957
]
Yufei Gu commented on YARN-4743:
--------------------------------
Hi [~gzh1992n], thanks for working on this. Some thoughts about the patch: both
0.5 (a weight less than 1.0) and 0.0 are valid weight values in the fair
scheduler. One use case for a zero weight is a queue that runs jobs only when
the non-zero-weight queues have no jobs to run. So it makes no sense to me to
enforce a weight larger than 1.0.
> ResourceManager crash because TimSort
> -------------------------------------
>
> Key: YARN-4743
> URL: https://issues.apache.org/jira/browse/YARN-4743
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.6.4
> Reporter: Zephyr Guo
> Fix For: 3.0.0-alpha1
>
> Attachments: YARN-4743-v1.patch, YARN-CDH5.4.7.patch, timsort.log
>
>
> {code}
> 2016-02-26 14:08:50,821 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeCollapse(TimSort.java:410)
>     at java.util.TimSort.sort(TimSort.java:214)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-02-26 14:08:50,822 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> Actually, this issue was found in 2.6.0-cdh5.4.7.
> I think the cause is that {{Resource}} is modified while we are sorting
> {{runnableApps}}.
> {code:title=FSLeafQueue.java}
> Comparator<Schedulable> comparator = policy.getComparator();
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> {code}
> {code:title=FairShareComparator}
> public int compare(Schedulable s1, Schedulable s2) {
>   ......
>       s1.getResourceUsage(), minShare1);
>   boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
>       s2.getResourceUsage(), minShare2);
>   minShareRatio1 = (double) s1.getResourceUsage().getMemory()
>       / Resources.max(RESOURCE_CALCULATOR, null, minShare1,
>       ONE).getMemory();
>   minShareRatio2 = (double) s2.getResourceUsage().getMemory()
>       / Resources.max(RESOURCE_CALCULATOR, null, minShare2,
>       ONE).getMemory();
>   ......
> {code}
> {{getResourceUsage}} returns the current Resource, which is unstable: it can
> change while the sort is in progress.
> {code:title=FSAppAttempt.java}
> @Override
> public Resource getResourceUsage() {
>   // Here getPreemptedResources() always returns zero, except in
>   // a preemption round
>   return Resources.subtract(getCurrentConsumption(),
>       getPreemptedResources());
> }
> {code}
> {code:title=SchedulerApplicationAttempt}
> public Resource getCurrentConsumption() {
>   return currentConsumption;
> }
> // This method may modify current Resource.
> public synchronized void recoverContainer(RMContainer rmContainer) {
>   ......
>   Resources.addTo(currentConsumption, rmContainer.getContainer()
>       .getResource());
>   ......
> }
> {code}
> I suggest using a stable Resource in the comparator.
> Am I missing something?
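
The suggestion in the description (compare against a stable value) could look roughly like the sketch below. It is not the attached patch and does not use the real scheduler classes: {{App}}, {{Snapshot}}, {{usedMemory}} and {{sortByStableUsage}} are hypothetical names made up for illustration. The idea is to read each app's usage exactly once before sorting, so the comparator only ever sees values that cannot change under it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SnapshotSort {
    /** Hypothetical stand-in for a schedulable app whose usage may change at any time. */
    static final class App {
        final String name;
        volatile long usedMemory; // in a real scheduler, other threads could update this

        App(String name, long usedMemory) {
            this.name = name;
            this.usedMemory = usedMemory;
        }
    }

    /** Immutable pairing of an app with the usage value read once, before the sort. */
    static final class Snapshot {
        final App app;
        final long usedMemory;

        Snapshot(App app) {
            this.app = app;
            this.usedMemory = app.usedMemory; // single read; never re-read during the sort
        }
    }

    /** Returns the apps ordered by the usage captured at the start of the call. */
    static List<App> sortByStableUsage(List<App> apps) {
        List<Snapshot> snapshots = new ArrayList<>(apps.size());
        for (App a : apps) {
            snapshots.add(new Snapshot(a));
        }
        // The comparator reads only the immutable snapshot field, so its answers
        // stay consistent even if App.usedMemory changes concurrently.
        snapshots.sort(Comparator.comparingLong((Snapshot s) -> s.usedMemory));

        List<App> sorted = new ArrayList<>(apps.size());
        for (Snapshot s : snapshots) {
            sorted.add(s.app);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<App> apps = new ArrayList<>();
        apps.add(new App("c", 3072));
        apps.add(new App("a", 1024));
        apps.add(new App("b", 2048));
        for (App a : sortByStableUsage(apps)) {
            System.out.println(a.name + " " + a.usedMemory);
        }
    }
}
```

The trade-off is that the resulting order reflects usage at snapshot time and may be slightly stale by the time it is used, plus one extra allocation per app on each sort.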