[ https://issues.apache.org/jira/browse/YARN-11615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787526#comment-17787526 ]
Jepson commented on YARN-11615:
-------------------------------
*+Current version: 2.9.2, FairScheduler.java+*
*The code already wraps the sort in a “synchronized” block, but this error still occurs.*
void continuousSchedulingAttempt() throws InterruptedException {
  long start = getClock().getTime();
  List<FSSchedulerNode> nodeIdList;
  // Hold a lock to prevent comparator order changes due to changes of node
  // unallocated resources
  {color:#4C9AFF}synchronized (this) {
    nodeIdList = nodeTracker.sortedNodeList(nodeAvailableResourceComparator);
  }{color}
  // iterate all nodes
  for (FSSchedulerNode node : nodeIdList) {
    try {
      if (Resources.fitsIn(minimumAllocation,
          node.getUnallocatedResource())) {
        attemptScheduling(node);
      }
    } catch (Throwable ex) {
      LOG.error("Error while attempting scheduling for node " + node +
          ": " + ex.toString(), ex);
      if ((ex instanceof YarnRuntimeException) &&
          (ex.getCause() instanceof InterruptedException)) {
        // AsyncDispatcher translates InterruptedException to
        // YarnRuntimeException with cause InterruptedException.
        // Need to throw InterruptedException to stop schedulingThread.
        throw (InterruptedException)ex.getCause();
      }
    }
  }
  long duration = getClock().getTime() - start;
  fsOpDurations.addContinuousSchedulingRunDuration(duration);
}
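
One possible reason the synchronized (this) block does not help (an assumption, not confirmed in this issue): if another thread updates a node's unallocated resource without holding the same monitor, the values the comparator reads can still change while TimSort is running. A minimal sketch of one common mitigation is to read each sort key once into an immutable snapshot and sort on the snapshot. Node, Keyed, unallocatedMb and sortedBySnapshot below are hypothetical stand-ins for illustration, not Hadoop classes and not the actual fix for this issue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

final class SnapshotSort {
    /** Hypothetical stand-in for a scheduler node; not the Hadoop class. */
    static final class Node {
        final String id;
        // May be changed by other threads at any time.
        final AtomicLong unallocatedMb;

        Node(String id, long unallocatedMb) {
            this.id = id;
            this.unallocatedMb = new AtomicLong(unallocatedMb);
        }
    }

    /** Immutable pair of a node and the key value read from it exactly once. */
    static final class Keyed {
        final Node node;
        final long key;

        Keyed(Node node, long key) {
            this.node = node;
            this.key = key;
        }
    }

    /**
     * Returns the nodes ordered by unallocated memory, largest first
     * (the ordering is an assumption made for this sketch).
     * The comparator only sees the frozen keys, so concurrent updates to
     * a node cannot make it inconsistent in the middle of the sort.
     */
    static List<Node> sortedBySnapshot(List<Node> nodes) {
        List<Keyed> keyed = new ArrayList<>(nodes.size());
        for (Node n : nodes) {
            keyed.add(new Keyed(n, n.unallocatedMb.get())); // single read per node
        }
        keyed.sort(Comparator.comparingLong((Keyed k) -> k.key).reversed());

        List<Node> sorted = new ArrayList<>(keyed.size());
        for (Keyed k : keyed) {
            sorted.add(k.node);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("a", 1024));
        nodes.add(new Node("b", 4096));
        nodes.add(new Node("c", 2048));

        StringBuilder ids = new StringBuilder();
        for (Node n : sortedBySnapshot(nodes)) {
            ids.append(n.id).append(' ');
        }
        System.out.println(ids.toString().trim());
    }
}
```

The trade-off is that the resulting order reflects the values at snapshot time and may be slightly stale by the time the nodes are iterated, which the existing fitsIn check before attemptScheduling would already tolerate.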
> Comparison method violates its general contract
> -----------------------------------------------
>
> Key: YARN-11615
> URL: https://issues.apache.org/jira/browse/YARN-11615
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.9.2
> Reporter: Jepson
> Priority: Major
>
> 2023-11-18 04:04:43,578 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received
> RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread,
> FairSchedulerContinuousScheduling, that exited unexpectedly:
> *{color:#DE350B}java.lang.IllegalArgumentException: Comparison method
> violates its general contract!{color}*
> at java.util.TimSort.mergeHi(TimSort.java:899)
> at java.util.TimSort.mergeAt(TimSort.java:516)
> at java.util.TimSort.mergeCollapse(TimSort.java:441)
> at java.util.TimSort.sort(TimSort.java:245)
> at java.util.Arrays.sort(Arrays.java:1512)
> at java.util.ArrayList.sort(ArrayList.java:1462)
> at java.util.Collections.sort(Collections.java:177)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:351)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:917)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)
> 2023-11-18 04:04:43,578 WARN
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning
> the resource manager to standby.
> 2023-11-18 04:04:43,578 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
> Container container_e92_1700226234054_0098_02_000163 completed with event
> FINISHED, but corresponding RMContainer doesn't exist.
> 2023-11-18 04:04:43,579 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning
> RM to Standby mode
> 2023-11-18 04:04:43,579 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning
> to standby state
> 2023-11-18 04:04:43,579 WARN
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher:
>
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
> interrupted. Returning.
> 2023-11-18 04:04:43,583 INFO org.apache.hadoop.ipc.Server: Stopping server on
> 23140
> 2023-11-18 04:04:43,587 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server listener on 23140
> 2023-11-18 04:04:43,587 INFO org.apache.hadoop.ipc.Server: Stopping server on
> 23130
> 2023-11-18 04:04:43,587 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
> Container container_e92_1700226234054_0098_02_000155 completed with event
> FINISHED, but corresponding RMContainer doesn't exist.
> 2023-11-18 04:04:43,587 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server Responder
> 2023-11-18 04:04:43,589 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server listener on 23130
> 2023-11-18 04:04:43,590 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server Responder
> {color:#DE350B}*2023-11-18 04:04:43,590 INFO org.apache.hadoop.ipc.Server:
> Stopping server on 8031*{color}
> 2023-11-18 04:04:43,597 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server listener on 8031
> 2023-11-18 04:04:43,597 INFO
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: NMLivelinessMonitor
> thread interrupted
> 2023-11-18 04:04:43,597 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server Responder
> 2023-11-18 04:04:43,597 ERROR org.apache.hadoop.yarn.event.EventDispatcher:
> Returning, interrupted : java.lang.InterruptedException
> 2023-11-18 04:04:43,598 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService:
> Interrupted while waiting to reload alloc configuration
> 2023-11-18 04:04:43,598 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
> Scheduler UpdateThread interrupted. Exiting.
> 2023-11-18 04:04:43,598 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> AsyncDispatcher is draining to stop, ignoring any new events.
> 2023-11-18 04:04:43,598 INFO
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor:
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor
> thread interrupted
> 2023-11-18 04:04:43,598 INFO
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor:
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer
> thread interrupted
> 2023-11-18 04:04:43,598 INFO
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor
> thread interrupted
> 2023-11-18 04:04:43,598 INFO
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor
> thread interrupted
> 2023-11-18 04:04:43,599 ERROR
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> ExpiredTokenRemover received java.lang.InterruptedException: sleep
> interrupted
> 2023-11-18 04:04:43,599 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager
> metrics system...
> 2023-11-18 04:04:43,601 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics
> system stopped.
> 2023-11-18 04:04:43,601 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics
> system shutdown complete.
> 2023-11-18 04:04:43,601 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> AsyncDispatcher is draining to stop, ignoring any new events.
> 2023-11-18 04:04:43,604 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> Registering class
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEventType for class
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher
> 2023-11-18 04:04:43,606 INFO
> org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM:
> NMTokenKeyRollingInterval: 86400000ms and NMTokenKeyActivationDelay: 900000ms