[ 
https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zephyr Guo updated YARN-4743:
-----------------------------
    Description: 
{code}
2016-02-26 14:08:50,821 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.IllegalArgumentException: Comparison method violates its general contract!
         at java.util.TimSort.mergeHi(TimSort.java:868)
         at java.util.TimSort.mergeAt(TimSort.java:485)
         at java.util.TimSort.mergeCollapse(TimSort.java:410)
         at java.util.TimSort.sort(TimSort.java:214)
         at java.util.TimSort.sort(TimSort.java:173)
         at java.util.Arrays.sort(Arrays.java:659)
         at java.util.Collections.sort(Collections.java:217)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
         at java.lang.Thread.run(Thread.java:745)
2016-02-26 14:08:50,822 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}

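This bug was actually found in 2.6.0-cdh. {{FairShareComparator}} is not 
transitive.

The ratio below evaluates to NaN when memorySize=0 and weight=0. Every ordered 
comparison against NaN is false, so the comparator cannot order such an entry 
consistently.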
{code:title=FairSharePolicy.java}
useToWeightRatio1 = s1.getResourceUsage().getMemorySize() /
  s1.getWeights().getWeight(ResourceType.MEMORY)
{code}
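
The following is a minimal, self-contained sketch of why a NaN ratio breaks the comparator contract. It is not the YARN code: {{Sched}}, {{NAIVE}} and {{TOTAL}} are hypothetical names, and the division is assumed to be floating-point as in the snippet above. {{TOTAL}} shows one possible guard (ordering through {{Double.compare}}), not necessarily the fix chosen in the patch.

```java
import java.util.Comparator;

// Hypothetical stand-in for a schedulable entity; NOT the YARN Schedulable interface.
final class Sched {
    final String name;
    final long memorySize;
    final double weight;

    Sched(String name, long memorySize, double weight) {
        this.name = name;
        this.memorySize = memorySize;
        this.weight = weight;
    }

    // Same shape as the ratio in the snippet above, with explicit double division.
    double useToWeightRatio() {
        return (double) memorySize / weight; // 0 / 0.0 yields NaN
    }
}

final class NaNComparatorDemo {
    // Naive comparator built on < and >. Both tests are false when either
    // side is NaN, so a NaN entry compares as "equal" to every other entry.
    static final Comparator<Sched> NAIVE = (a, b) -> {
        double r1 = a.useToWeightRatio();
        double r2 = b.useToWeightRatio();
        if (r1 < r2) return -1;
        if (r1 > r2) return 1;
        return 0;
    };

    // One possible guard (an assumption, not the patch): Double.compare
    // defines a total order in which NaN is greater than any other value.
    static final Comparator<Sched> TOTAL =
        Comparator.comparingDouble(Sched::useToWeightRatio);

    public static void main(String[] args) {
        Sched low = new Sched("low", 1, 1.0);   // ratio 1.0
        Sched nan = new Sched("nan", 0, 0.0);   // ratio NaN
        Sched high = new Sched("high", 2, 1.0); // ratio 2.0

        // low == nan and nan == high, yet low < high: the contract is violated.
        System.out.println(NAIVE.compare(low, nan));
        System.out.println(NAIVE.compare(nan, high));
        System.out.println(NAIVE.compare(low, high));

        // With a total order the NaN entry has a consistent position.
        System.out.println(TOTAL.compare(low, nan) < 0);
        System.out.println(TOTAL.compare(nan, high) > 0);
    }
}
```

With the naive comparator, TimSort can observe these contradictory answers during a merge and throw the IllegalArgumentException shown in the log.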


  was:
{code}
2016-02-26 14:08:50,821 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.IllegalArgumentException: Comparison method violates its general contract!
         at java.util.TimSort.mergeHi(TimSort.java:868)
         at java.util.TimSort.mergeAt(TimSort.java:485)
         at java.util.TimSort.mergeCollapse(TimSort.java:410)
         at java.util.TimSort.sort(TimSort.java:214)
         at java.util.TimSort.sort(TimSort.java:173)
         at java.util.Arrays.sort(Arrays.java:659)
         at java.util.Collections.sort(Collections.java:217)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
         at java.lang.Thread.run(Thread.java:745)
2016-02-26 14:08:50,822 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}

This issue was actually found in 2.6.0-cdh5.4.7.
I think the cause is that {{Resource}} is modified while {{runnableApps}} is 
being sorted.
{code:title=FSLeafQueue.java}
    Comparator<Schedulable> comparator = policy.getComparator();
    writeLock.lock();
    try {
      Collections.sort(runnableApps, comparator);
    } finally {
      writeLock.unlock();
    }
    readLock.lock();
{code}

{code:title=FairShareComparator}
public int compare(Schedulable s1, Schedulable s2) {
......
          s1.getResourceUsage(), minShare1);
      boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
          s2.getResourceUsage(), minShare2);
      minShareRatio1 = (double) s1.getResourceUsage().getMemory()
          / Resources.max(RESOURCE_CALCULATOR, null, minShare1, ONE).getMemory();
      minShareRatio2 = (double) s2.getResourceUsage().getMemory()
          / Resources.max(RESOURCE_CALCULATOR, null, minShare2, ONE).getMemory();
......
{code}
{{getResourceUsage}} returns the current Resource, which is not stable: it can 
change while the sort is still running.
{code:title=FSAppAttempt.java}
@Override
  public Resource getResourceUsage() {
    // Here the getPreemptedResources() always return zero, except in
    // a preemption round
    return Resources.subtract(getCurrentConsumption(), getPreemptedResources());
  }
{code}
{code:title=SchedulerApplicationAttempt}
 public Resource getCurrentConsumption() {
    return currentConsumption;
  }

// This method may modify current Resource.
public synchronized void recoverContainer(RMContainer rmContainer) {
......
    Resources.addTo(currentConsumption, rmContainer.getContainer()
      .getResource());
......
  }
{code}
I suggest using a stable Resource value in the comparator.
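
A rough sketch of that idea, under stated assumptions: {{App}}, {{usedMemory}} and {{sortByUsage}} are hypothetical names for illustration, not YARN classes or methods. Each sort key is read exactly once before sorting, so the comparator sees fixed values even if another thread updates usage in the meantime.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical app whose usage may be changed by another thread at any time.
final class App {
    final String id;
    final AtomicLong usedMemory;

    App(String id, long usedMemory) {
        this.id = id;
        this.usedMemory = new AtomicLong(usedMemory);
    }
}

final class SnapshotSort {
    // Returns a new list sorted by usage. The live usage of each app is read
    // once into a snapshot; the comparator only ever consults the snapshot.
    static List<App> sortByUsage(List<App> apps) {
        Map<App, Long> snapshot = new IdentityHashMap<>();
        for (App app : apps) {
            snapshot.put(app, app.usedMemory.get());
        }
        List<App> sorted = new ArrayList<>(apps);
        sorted.sort(Comparator.comparingLong((App a) -> snapshot.get(a)));
        return sorted;
    }

    public static void main(String[] args) {
        List<App> apps = new ArrayList<>();
        apps.add(new App("a", 3));
        apps.add(new App("b", 1));
        apps.add(new App("c", 2));
        for (App app : sortByUsage(apps)) {
            System.out.println(app.id + " " + app.usedMemory.get());
        }
    }
}
```

The trade-off is one extra map per sort, in exchange for keys that cannot change between two comparisons of the same element.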

Is there anything wrong with my reasoning?


> ResourceManager crash because TimSort
> -------------------------------------
>
>                 Key: YARN-4743
>                 URL: https://issues.apache.org/jira/browse/YARN-4743
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Zephyr Guo
>            Assignee: Zephyr Guo
>         Attachments: YARN-4743-v1.patch, YARN-4743-v2.patch, timsort.log
>
>


