[ 
https://issues.apache.org/jira/browse/YARN-11615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787526#comment-17787526
 ] 

Jepson commented on YARN-11615:
-------------------------------

*+Current version: 2.9.2, FairScheduler.java+*
*The code below already wraps the sort in “synchronized”, but the error still occurs.*

void continuousSchedulingAttempt() throws InterruptedException {
    long start = getClock().getTime();
    List<FSSchedulerNode> nodeIdList;
    // Hold a lock to prevent comparator order changes due to changes of node
    // unallocated resources
    {color:#4C9AFF}synchronized (this) {
      nodeIdList = nodeTracker.sortedNodeList(nodeAvailableResourceComparator);
    }{color}

    // iterate all nodes
    for (FSSchedulerNode node : nodeIdList) {
      try {
        if (Resources.fitsIn(minimumAllocation,
            node.getUnallocatedResource())) {
          attemptScheduling(node);
        }
      } catch (Throwable ex) {
        LOG.error("Error while attempting scheduling for node " + node +
            ": " + ex.toString(), ex);
        if ((ex instanceof YarnRuntimeException) &&
            (ex.getCause() instanceof InterruptedException)) {
          // AsyncDispatcher translates InterruptedException to
          // YarnRuntimeException with cause InterruptedException.
          // Need to throw InterruptedException to stop schedulingThread.
          throw (InterruptedException)ex.getCause();
        }
      }
    }

    long duration = getClock().getTime() - start;
    fsOpDurations.addContinuousSchedulingRunDuration(duration);
  }
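
One possible explanation, not confirmed by this report, is that the values read by the comparator change while the sort is running. That could happen if another thread updates a node's unallocated resource without holding the same monitor, in which case synchronized (this) around the sort does not help and TimSort can still see inconsistent comparison results. A minimal sketch of a common workaround follows: read each sort key once into an immutable snapshot and sort the snapshots. The names Node, Keyed, unallocatedMb and sortedByAvailableDesc are hypothetical stand-ins for illustration only; they are not the FairScheduler or ClusterNodeTracker API, and this is not the fix applied upstream.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class SnapshotSort {

  /** Hypothetical stand-in for a scheduler node whose free memory may change on other threads. */
  static final class Node {
    final String name;
    final AtomicLong unallocatedMb;

    Node(String name, long unallocatedMb) {
      this.name = name;
      this.unallocatedMb = new AtomicLong(unallocatedMb);
    }
  }

  /** Immutable pair of a node and the sort key that was read exactly once. */
  static final class Keyed {
    final Node node;
    final long key;

    Keyed(Node node, long key) {
      this.node = node;
      this.key = key;
    }
  }

  /** Returns the nodes ordered by snapshotted free memory, largest first. */
  static List<Node> sortedByAvailableDesc(List<Node> nodes) {
    // Read every key once, before sorting starts.
    List<Keyed> keyed = new ArrayList<>(nodes.size());
    for (Node n : nodes) {
      keyed.add(new Keyed(n, n.unallocatedMb.get()));
    }
    // The comparator only sees the frozen keys, so its answers stay
    // consistent even if the live values change during the sort.
    keyed.sort(Comparator.comparingLong((Keyed k) -> k.key).reversed());

    List<Node> result = new ArrayList<>(keyed.size());
    for (Keyed k : keyed) {
      result.add(k.node);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Node> nodes = new ArrayList<>();
    nodes.add(new Node("node-a", 1024));
    nodes.add(new Node("node-b", 8192));
    nodes.add(new Node("node-c", 4096));
    for (Node n : sortedByAvailableDesc(nodes)) {
      System.out.println(n.name + " " + n.unallocatedMb.get());
    }
  }
}
```

The trade-off is that the resulting order reflects the values at snapshot time and may be slightly stale by the time scheduling is attempted, which the loop above already tolerates by re-checking Resources.fitsIn for each node.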

> Comparison method violates its general contract
> -----------------------------------------------
>
>                 Key: YARN-11615
>                 URL: https://issues.apache.org/jira/browse/YARN-11615
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.9.2
>            Reporter: Jepson
>            Priority: Major
>
> 2023-11-18 04:04:43,578 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received 
> RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, 
> FairSchedulerContinuousScheduling, that exited unexpectedly: 
> *{color:#DE350B}java.lang.IllegalArgumentException: Comparison method 
> violates its general contract!{color}*
>       at java.util.TimSort.mergeHi(TimSort.java:899)
>       at java.util.TimSort.mergeAt(TimSort.java:516)
>       at java.util.TimSort.mergeCollapse(TimSort.java:441)
>       at java.util.TimSort.sort(TimSort.java:245)
>       at java.util.Arrays.sort(Arrays.java:1512)
>       at java.util.ArrayList.sort(ArrayList.java:1462)
>       at java.util.Collections.sort(Collections.java:177)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:351)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:917)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)
> 2023-11-18 04:04:43,578 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning 
> the resource manager to standby.
> 2023-11-18 04:04:43,578 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
>  Container container_e92_1700226234054_0098_02_000163 completed with event 
> FINISHED, but corresponding RMContainer doesn't exist.
> 2023-11-18 04:04:43,579 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning 
> RM to Standby mode
> 2023-11-18 04:04:43,579 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning 
> to standby state
> 2023-11-18 04:04:43,579 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher:
>  
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
>  interrupted. Returning.
> 2023-11-18 04:04:43,583 INFO org.apache.hadoop.ipc.Server: Stopping server on 
> 23140
> 2023-11-18 04:04:43,587 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 23140
> 2023-11-18 04:04:43,587 INFO org.apache.hadoop.ipc.Server: Stopping server on 
> 23130
> 2023-11-18 04:04:43,587 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
>  Container container_e92_1700226234054_0098_02_000155 completed with event 
> FINISHED, but corresponding RMContainer doesn't exist.
> 2023-11-18 04:04:43,587 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
> 2023-11-18 04:04:43,589 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 23130
> 2023-11-18 04:04:43,590 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
> {color:#DE350B}*2023-11-18 04:04:43,590 INFO org.apache.hadoop.ipc.Server: 
> Stopping server on 8031*{color}
> 2023-11-18 04:04:43,597 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 8031
> 2023-11-18 04:04:43,597 INFO 
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: NMLivelinessMonitor 
> thread interrupted
> 2023-11-18 04:04:43,597 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
> 2023-11-18 04:04:43,597 ERROR org.apache.hadoop.yarn.event.EventDispatcher: 
> Returning, interrupted : java.lang.InterruptedException
> 2023-11-18 04:04:43,598 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService:
>  Interrupted while waiting to reload alloc configuration
> 2023-11-18 04:04:43,598 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
>  Scheduler UpdateThread interrupted. Exiting.
> 2023-11-18 04:04:43,598 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> AsyncDispatcher is draining to stop, ignoring any new events.
> 2023-11-18 04:04:43,598 INFO 
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor
>  thread interrupted
> 2023-11-18 04:04:43,598 INFO 
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer
>  thread interrupted
> 2023-11-18 04:04:43,598 INFO 
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor 
> thread interrupted
> 2023-11-18 04:04:43,598 INFO 
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor 
> thread interrupted
> 2023-11-18 04:04:43,599 ERROR 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
>  ExpiredTokenRemover received java.lang.InterruptedException: sleep 
> interrupted
> 2023-11-18 04:04:43,599 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager 
> metrics system...
> 2023-11-18 04:04:43,601 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system stopped.
> 2023-11-18 04:04:43,601 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system shutdown complete.
> 2023-11-18 04:04:43,601 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> AsyncDispatcher is draining to stop, ignoring any new events.
> 2023-11-18 04:04:43,604 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Registering class 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEventType for class 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher
> 2023-11-18 04:04:43,606 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM:
>  NMTokenKeyRollingInterval: 86400000ms and NMTokenKeyActivationDelay: 900000ms



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
