[
https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460280#comment-17460280
]
Hadoop QA commented on YARN-10178:
----------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m
39s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m
0s{color} | {color:green}{color} | {color:green} No case conflicting files
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m
0s{color} | {color:green}{color} | {color:green} The patch does not contain any
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to
include any new or modified tests. Please justify why no new tests are needed
for this patch. Also please list what manual steps were performed to verify
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 33m
23s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m
4s{color} | {color:green}{color} | {color:green} trunk passed with JDK
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m
58s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m
46s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m
11s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}
16m 42s{color} | {color:green}{color} | {color:green} branch has no errors when
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
54s{color} | {color:green}{color} | {color:green} trunk passed with JDK
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
44s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 20m
20s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 2m
0s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m
55s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m
0s{color} | {color:green}{color} | {color:green} the patch passed with JDK
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m
0s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m
49s{color} | {color:green}{color} | {color:green} the patch passed with JDK
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m
49s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m
38s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m
53s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m
0s{color} | {color:green}{color} | {color:green} The patch has no whitespace
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}
14m 6s{color} | {color:green}{color} | {color:green} patch has no errors when
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
45s{color} | {color:green}{color} | {color:green} the patch passed with JDK
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
40s{color} | {color:green}{color} | {color:green} the patch passed with JDK
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 2m
7s{color} | {color:green}{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 96m
15s{color} | {color:green}{color} | {color:green}
hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m
33s{color} | {color:green}{color} | {color:green} The patch does not generate
ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}177m 19s{color} |
{color:black}{color} | {color:black}{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base:
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1263/artifact/out/Dockerfile
|
| JIRA Issue | YARN-10178 |
| JIRA Patch URL |
https://issues.apache.org/jira/secure/attachment/13037536/YARN-10178.006.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite
unit shadedclient findbugs checkstyle spotbugs |
| uname | Linux fb9c797d9fa9 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / f43ac31b445 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions |
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
/usr/lib/jvm/java-8-openjdk-amd64:Private
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results |
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1263/testReport/ |
| Max. process+thread count | 962 (vs. ulimit of 5500) |
| modules | C:
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
U:
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
|
| Console output |
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1263/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |
This message was automatically generated.
> Global Scheduler async thread crash caused by 'Comparison method violates its
> general contract'
> -----------------------------------------------------------------------------------------------
>
> Key: YARN-10178
> URL: https://issues.apache.org/jira/browse/YARN-10178
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 3.2.1
> Reporter: tuyu
> Assignee: Qi Zhu
> Priority: Major
> Attachments: YARN-10178.001.patch, YARN-10178.002.patch,
> YARN-10178.003.patch, YARN-10178.004.patch, YARN-10178.005.patch,
> YARN-10178.006.patch
>
>
> Global Scheduler Async Thread crash stack
> {code:java}
> ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received
> RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread,
> Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException:
> Comparison method violates its general contract!
> at
> java.util.TimSort.mergeHi(TimSort.java:899)
> at java.util.TimSort.mergeAt(TimSort.java:516)
> at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
> at java.util.TimSort.sort(TimSort.java:254)
> at java.util.Arrays.sort(Arrays.java:1512)
> at java.util.ArrayList.sort(ArrayList.java:1462)
> at java.util.Collections.sort(Collections.java:177)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616)
> {code}
> JAVA 8 Arrays.sort default use timsort algo, and timsort has few require
> {code:java}
> 1.x.compareTo(y) != y.compareTo(x)
> 2.x>y,y>z --> x > z
> 3.x=y, x.compareTo(z) == y.compareTo(z)
> {code}
> if not Arrays paramters not satify this require,TimSort will throw
> 'java.lang.IllegalArgumentException'
> look at PriorityUtilizationQueueOrderingPolicy.compare function,we will know
> Capacity Scheduler use this these queue resource usage to compare
> {code:java}
> AbsoluteUsedCapacity
> UsedCapacity
> ConfiguredMinResource
> AbsoluteCapacity
> {code}
> In Capacity Scheduler Global Scheduler AsyncThread use
> PriorityUtilizationQueueOrderingPolicy function to choose queue to assign
> container,and construct a CSAssignment struct, and use
> submitResourceCommitRequest function add CSAssignment to backlogs
> ResourceCommitterService will tryCommit this CSAssignment,look tryCommit
> function,there will update queue resource usage
> {code:java}
> public boolean tryCommit(Resource cluster, ResourceCommitRequest r,
> boolean updatePending) {
> long commitStart = System.nanoTime();
> ResourceCommitRequest<FiCaSchedulerApp, FiCaSchedulerNode> request =
> (ResourceCommitRequest<FiCaSchedulerApp, FiCaSchedulerNode>) r;
>
> ...
> boolean isSuccess = false;
> if (attemptId != null) {
> FiCaSchedulerApp app = getApplicationAttempt(attemptId);
> // Required sanity check for attemptId - when async-scheduling enabled,
> // proposal might be outdated if AM failover just finished
> // and proposal queue was not be consumed in time
> if (app != null && attemptId.equals(app.getApplicationAttemptId())) {
> if (app.accept(cluster, request, updatePending)
> && app.apply(cluster, request, updatePending)) { // apply this
> resource
> ...
> }
> }
> }
> return isSuccess;
> }
> }
> {code}
> {code:java}
> public boolean apply(Resource cluster, ResourceCommitRequest<FiCaSchedulerApp,
> FiCaSchedulerNode> request, boolean updatePending) {
> ...
> if (!reReservation) {
> getCSLeafQueue().apply(cluster, request);
> }
> ...
> }
> {code}
> LeafQueue.apply invok allocateResource
> {code:java}
> void allocateResource(Resource clusterResource,
> Resource resource, String nodePartition) {
> try {
> writeLock.lock(); // only lock leaf queue lock
> queueUsage.incUsed(nodePartition, resource);
>
> ++numContainers;
>
> CSQueueUtils.updateQueueStatistics(resourceCalculator, clusterResource,
> this, labelManager, nodePartition); // there will update queue
> statistics
> } finally {
> writeLock.unlock();
> }
> }
> {code}
> we found ResourceCommitterService will only lock leaf queue to update queue
> statistics, but AsyncThread use sortAndGetChildrenAllocationIterator only
> lock queue root queue lock
> {code:java}
> ParentQueue.java
> private Iterator<CSQueue> sortAndGetChildrenAllocationIterator(
> String partition) {
> try {
> readLock.lock();
> return queueOrderingPolicy.getAssignmentIterator(partition);
> } finally {
> readLock.unlock();
> }
> }
> {code}
> so if multi async thread compare queue usage statistics and
> ResourceCommitterService apply leaf queue change statistics concurrent, will
> break TimSort algo required, and cause thread crash
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]