[
https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17274391#comment-17274391
]
Hadoop QA commented on YARN-10178:
----------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m
15s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m
0s{color} | {color:green}{color} | {color:green} No case conflicting files
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m
0s{color} | {color:green}{color} | {color:green} The patch does not contain any
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color}
| {color:green}test4tests{color} | {color:green} The patch appears to include 1
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m
41s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m
2s{color} | {color:green}{color} | {color:green} trunk passed with JDK
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m
52s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private
Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m
45s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m
58s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}
17m 31s{color} | {color:green}{color} | {color:green} branch has no errors when
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
40s{color} | {color:green}{color} | {color:green} trunk passed with JDK
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
37s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private
Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m
51s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs
config; considering switching to SpotBugs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m
49s{color} |
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/562/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-warnings.html{color}
| {color:red}
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
in trunk has 1 extant findbugs warnings. {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m
49s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m
53s{color} | {color:green}{color} | {color:green} the patch passed with JDK
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 53s{color}
|
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/562/artifact/out/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04.txt{color}
| {color:red}
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 generated 2 new + 42
unchanged - 0 fixed = 44 total (was 42) {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m
45s{color} | {color:green}{color} | {color:green} the patch passed with JDK
Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 45s{color}
|
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/562/artifact/out/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01.txt{color}
| {color:red}
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 generated 2 new
+ 30 unchanged - 0 fixed = 32 total (was 30) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}
0m 40s{color} |
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/562/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
| {color:orange}
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
The patch generated 1 new + 117 unchanged - 0 fixed = 118 total (was 117)
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m
48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m
0s{color} | {color:green}{color} | {color:green} The patch has no whitespace
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}
15m 38s{color} | {color:green}{color} | {color:green} patch has no errors when
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
37s{color} | {color:green}{color} | {color:green} the patch passed with JDK
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
33s{color} | {color:green}{color} | {color:green} the patch passed with JDK
Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m
52s{color} | {color:green}{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 98m
29s{color} | {color:green}{color} | {color:green}
hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m
27s{color} | {color:green}{color} | {color:green} The patch does not generate
ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}170m 52s{color} |
{color:black}{color} | {color:black}{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base:
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/562/artifact/out/Dockerfile
|
| JIRA Issue | YARN-10178 |
| JIRA Patch URL |
https://issues.apache.org/jira/secure/attachment/13019617/YARN-10178.004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite
unit shadedclient findbugs checkstyle |
| uname | Linux 0f0a6f24bb65 4.15.0-126-generic #129-Ubuntu SMP Mon Nov 23
18:53:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / fa15594ae60 |
| Default Java | Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| Multi-JDK versions |
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
/usr/lib/jvm/java-8-openjdk-amd64:Private
Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| Test Results |
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/562/testReport/ |
| Max. process+thread count | 819 (vs. ulimit of 5500) |
| modules | C:
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
U:
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
|
| Console output |
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/562/console |
| versions | git=2.25.1 maven=3.6.3 findbugs=4.0.6 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |
This message was automatically generated.
> Global Scheduler async thread crash caused by 'Comparison method violates its
> general contract'
> -----------------------------------------------------------------------------------------------
>
> Key: YARN-10178
> URL: https://issues.apache.org/jira/browse/YARN-10178
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 3.2.1
> Reporter: tuyu
> Assignee: zhuqi
> Priority: Major
> Attachments: YARN-10178.001.patch, YARN-10178.002.patch,
> YARN-10178.003.patch, YARN-10178.004.patch
>
>
> Global Scheduler Async Thread crash stack
> {code:java}
> ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: Comparison method violates its general contract!
>     at java.util.TimSort.mergeHi(TimSort.java:899)
>     at java.util.TimSort.mergeAt(TimSort.java:516)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
>     at java.util.TimSort.sort(TimSort.java:254)
>     at java.util.Arrays.sort(Arrays.java:1512)
>     at java.util.ArrayList.sort(ArrayList.java:1462)
>     at java.util.Collections.sort(Collections.java:177)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616)
> {code}
> In Java 8, Arrays.sort uses the TimSort algorithm by default for object
> arrays, and TimSort requires the comparison to satisfy a few conditions:
> {code:java}
> 1. sgn(x.compareTo(y)) == -sgn(y.compareTo(x))
> 2. x > y and y > z  -->  x > z
> 3. x.compareTo(y) == 0  -->  sgn(x.compareTo(z)) == sgn(y.compareTo(z))
> {code}
> If the comparison used by Arrays.sort does not satisfy these requirements,
> TimSort may throw 'java.lang.IllegalArgumentException'.
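To make the three requirements above concrete, here is a minimal sketch of a brute-force checker that tests a comparator against a small list of sample values. The class name `ComparatorContractCheck` and the method `countViolations` are hypothetical names chosen for illustration; nothing like this exists in the Hadoop code base as far as this thread shows.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ComparatorContractCheck {

    /** Counts violations of the three comparator rules over all sample combinations. */
    static <T> int countViolations(List<T> samples, Comparator<? super T> cmp) {
        int violations = 0;
        for (T x : samples) {
            for (T y : samples) {
                int xy = Integer.signum(cmp.compare(x, y));
                // Rule 1: sgn(compare(x, y)) == -sgn(compare(y, x))
                if (xy != -Integer.signum(cmp.compare(y, x))) {
                    violations++;
                }
                for (T z : samples) {
                    int yz = Integer.signum(cmp.compare(y, z));
                    int xz = Integer.signum(cmp.compare(x, z));
                    // Rule 2: x > y and y > z implies x > z
                    if (xy > 0 && yz > 0 && xz <= 0) {
                        violations++;
                    }
                    // Rule 3: x == y implies compare(x, z) and compare(y, z) agree in sign
                    if (xy == 0 && xz != yz) {
                        violations++;
                    }
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<Double> usage = Arrays.asList(0.10, 0.45, 0.45, 0.90);

        // A well-behaved comparator over the sample values.
        Comparator<Double> good = Double::compare;

        // A deliberately broken comparator: it always answers "less than",
        // so compare(x, y) and compare(y, x) are both negative.
        Comparator<Double> broken = (a, b) -> -1;

        System.out.println("good violations:   " + countViolations(usage, good));
        System.out.println("broken violations: " + countViolations(usage, broken));
    }
}
```

A check like this only covers the samples it is given, and it cannot detect a comparator whose inputs change between calls, which is the situation described further down in this issue.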
> Looking at the PriorityUtilizationQueueOrderingPolicy.compare function, we can
> see that the Capacity Scheduler compares queues using these resource usage values:
> {code:java}
> AbsoluteUsedCapacity
> UsedCapacity
> ConfiguredMinResource
> AbsoluteCapacity
> {code}
> In the Capacity Scheduler, the Global Scheduler AsyncThread uses
> PriorityUtilizationQueueOrderingPolicy to choose the queue to assign a
> container to, constructs a CSAssignment, and calls
> submitResourceCommitRequest to add the CSAssignment to the backlogs.
> ResourceCommitterService then calls tryCommit on this CSAssignment. Looking
> at the tryCommit function, this is where queue resource usage gets updated:
> {code:java}
> public boolean tryCommit(Resource cluster, ResourceCommitRequest r,
>     boolean updatePending) {
>   long commitStart = System.nanoTime();
>   ResourceCommitRequest<FiCaSchedulerApp, FiCaSchedulerNode> request =
>       (ResourceCommitRequest<FiCaSchedulerApp, FiCaSchedulerNode>) r;
>
>   ...
>   boolean isSuccess = false;
>   if (attemptId != null) {
>     FiCaSchedulerApp app = getApplicationAttempt(attemptId);
>     // Required sanity check for attemptId - when async-scheduling enabled,
>     // proposal might be outdated if AM failover just finished
>     // and proposal queue was not consumed in time
>     if (app != null && attemptId.equals(app.getApplicationAttemptId())) {
>       // apply() is where the resource is applied to the queue
>       if (app.accept(cluster, request, updatePending)
>           && app.apply(cluster, request, updatePending)) {
>         ...
>       }
>     }
>   }
>   return isSuccess;
> }
> {code}
> {code:java}
> public boolean apply(Resource cluster, ResourceCommitRequest<FiCaSchedulerApp,
>     FiCaSchedulerNode> request, boolean updatePending) {
>   ...
>   if (!reReservation) {
>     getCSLeafQueue().apply(cluster, request);
>   }
>   ...
> }
> {code}
> LeafQueue.apply invokes allocateResource:
> {code:java}
> void allocateResource(Resource clusterResource,
>     Resource resource, String nodePartition) {
>   try {
>     writeLock.lock(); // only the leaf queue's own lock is taken
>     queueUsage.incUsed(nodePartition, resource);
>
>     ++numContainers;
>
>     // queue statistics are updated here
>     CSQueueUtils.updateQueueStatistics(resourceCalculator, clusterResource,
>         this, labelManager, nodePartition);
>   } finally {
>     writeLock.unlock();
>   }
> }
> {code}
> We found that ResourceCommitterService takes only the leaf queue's lock when
> it updates queue statistics, while the AsyncThread, in
> sortAndGetChildrenAllocationIterator, takes only the parent (root) queue's
> read lock:
> {code:java}
> // ParentQueue.java
> private Iterator<CSQueue> sortAndGetChildrenAllocationIterator(
>     String partition) {
>   try {
>     readLock.lock();
>     return queueOrderingPolicy.getAssignmentIterator(partition);
>   } finally {
>     readLock.unlock();
>   }
> }
> {code}
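The locking mismatch described above can be illustrated in isolation. The sketch below uses two plain `ReentrantReadWriteLock` instances as stand-ins for the parent and leaf queue locks; `IndependentLocksDemo` and `writerAcquires` are hypothetical names, and the real queue classes are not involved. It shows that holding a read lock on one lock object does nothing to stop a writer on a different lock object.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class IndependentLocksDemo {

    /**
     * Holds the read lock of {@code held} on the calling thread, then reports
     * whether a second thread can take the write lock of {@code wanted}.
     */
    static boolean writerAcquires(ReentrantReadWriteLock held,
                                  ReentrantReadWriteLock wanted) {
        AtomicBoolean acquired = new AtomicBoolean(false);
        held.readLock().lock(); // stands in for the sorting thread
        try {
            Thread committer = new Thread(() -> {
                // tryLock() does not block; it only reports whether the lock was free
                if (wanted.writeLock().tryLock()) {
                    try {
                        acquired.set(true); // a statistics update could happen here
                    } finally {
                        wanted.writeLock().unlock();
                    }
                }
            });
            committer.start();
            try {
                committer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        } finally {
            held.readLock().unlock();
        }
        return acquired.get();
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock parentLock = new ReentrantReadWriteLock();
        ReentrantReadWriteLock leafLock = new ReentrantReadWriteLock();

        // Different lock objects: the writer is not excluded by the reader.
        System.out.println("leaf writer proceeds:   " + writerAcquires(parentLock, leafLock));
        // Same lock object: the writer is excluded while the read lock is held.
        System.out.println("parent writer proceeds: " + writerAcquires(parentLock, parentLock));
    }
}
```

Under these assumptions the first call reports that the writer got through and the second reports that it did not, which matches the description: the parent read lock gives the sorter no protection against updates made under a leaf queue's lock.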
> So if multiple async threads compare queue usage statistics while
> ResourceCommitterService concurrently applies changes to leaf queue
> statistics, the ordering seen by the comparator can change in the middle of
> a sort. This breaks the TimSort requirements and crashes the thread.
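One general way to keep a sort stable against concurrent updates is to copy the values the comparator needs into immutable snapshot objects and sort those instead of the live objects. The sketch below shows that idea only; it is not taken from the attached patches, and `MutableQueue`, `QueueSnapshot`, and `sortByUsedCapacity` are hypothetical names with a single made-up sort key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SnapshotSortSketch {

    /** Hypothetical live queue; another thread may change usedCapacity at any time. */
    static class MutableQueue {
        final String name;
        volatile double usedCapacity;

        MutableQueue(String name, double usedCapacity) {
            this.name = name;
            this.usedCapacity = usedCapacity;
        }
    }

    /** Immutable copy of the sort key, read exactly once per sort. */
    static final class QueueSnapshot {
        final MutableQueue queue;
        final double usedCapacity;

        QueueSnapshot(MutableQueue queue) {
            this.queue = queue;
            this.usedCapacity = queue.usedCapacity;
        }
    }

    /** Sorts by the snapshotted value, so the comparator never sees a changing key. */
    static List<MutableQueue> sortByUsedCapacity(List<MutableQueue> queues) {
        List<QueueSnapshot> snapshots = new ArrayList<>(queues.size());
        for (MutableQueue q : queues) {
            snapshots.add(new QueueSnapshot(q));
        }
        snapshots.sort(Comparator.comparingDouble((QueueSnapshot s) -> s.usedCapacity));

        List<MutableQueue> sorted = new ArrayList<>(snapshots.size());
        for (QueueSnapshot s : snapshots) {
            sorted.add(s.queue);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<MutableQueue> queues = Arrays.asList(
            new MutableQueue("b", 0.9),
            new MutableQueue("a", 0.1),
            new MutableQueue("c", 0.5));
        for (MutableQueue q : sortByUsedCapacity(queues)) {
            System.out.println(q.name + " " + q.usedCapacity);
        }
    }
}
```

The trade-off is that the resulting order reflects the values at snapshot time and may already be slightly stale when it is used, but the comparator stays consistent for the whole sort, so TimSort has nothing to reject.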