[jira] [Commented] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799746#comment-17799746 ] yanbin.zhang commented on YARN-7592: [~slfan1989] Thank you for your prompt reply. > yarn.federation.failover.enabled missing in yarn-default.xml > > > Key: YARN-7592 > URL: https://issues.apache.org/jira/browse/YARN-7592 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Affects Versions: 3.0.0-beta1 >Reporter: Gera Shegalov >Priority: Major > Attachments: IssueReproduce.patch > > > yarn.federation.failover.enabled should be documented in yarn-default.xml. I > am also not sure why it should be true by default and force the HA retry > policy in {{RMProxy#createRMProxy}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799725#comment-17799725 ] yanbin.zhang commented on YARN-7592: [~slfan1989] Do you have any thoughts on this? This bug seems to have not been resolved yet. > yarn.federation.failover.enabled missing in yarn-default.xml > > > Key: YARN-7592 > URL: https://issues.apache.org/jira/browse/YARN-7592 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Affects Versions: 3.0.0-beta1 >Reporter: Gera Shegalov >Priority: Major > Attachments: IssueReproduce.patch > > > yarn.federation.failover.enabled should be documented in yarn-default.xml. I > am also not sure why it should be true by default and force the HA retry > policy in {{RMProxy#createRMProxy}}
[jira] [Created] (YARN-11633) [Federation] Improve LoadBasedRouterPolicy To Use Available vcores
yanbin.zhang created YARN-11633: --- Summary: [Federation] Improve LoadBasedRouterPolicy To Use Available vcores Key: YARN-11633 URL: https://issues.apache.org/jira/browse/YARN-11633 Project: Hadoop YARN Issue Type: Improvement Components: federation Affects Versions: 3.3.6 Reporter: yanbin.zhang When selecting a subcluster, consider not only available memory but also available vcore
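The idea in YARN-11633 (look at available vcores as well as available memory when choosing a subcluster) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual LoadBasedRouterPolicy code: the class `LoadBasedSelectionSketch`, the type `SubClusterLoad`, and the rule "most free memory among subclusters that still have free vcores, ties broken by vcores" are all hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; these types are not part of the YARN Router code base.
final class LoadBasedSelectionSketch {

  /** Free capacity reported by one subcluster (illustrative fields only). */
  static final class SubClusterLoad {
    final String id;
    final long availableMB;
    final int availableVCores;

    SubClusterLoad(String id, long availableMB, int availableVCores) {
      this.id = id;
      this.availableMB = availableMB;
      this.availableVCores = availableVCores;
    }
  }

  /**
   * Picks the subcluster with the most available memory among those that
   * still have free vcores; ties on memory go to the one with more vcores.
   * Returns an empty Optional when no subcluster has both resources free.
   */
  static Optional<String> select(List<SubClusterLoad> loads) {
    return loads.stream()
        // A subcluster with memory but no vcores cannot run a new container.
        .filter(l -> l.availableMB > 0 && l.availableVCores > 0)
        .max(Comparator.comparingLong((SubClusterLoad l) -> l.availableMB)
            .thenComparingInt(l -> l.availableVCores))
        .map(l -> l.id);
  }

  public static void main(String[] args) {
    List<SubClusterLoad> loads = Arrays.asList(
        new SubClusterLoad("sc1", 64_000L, 0),   // most memory, but no vcores left
        new SubClusterLoad("sc2", 32_000L, 8),
        new SubClusterLoad("sc3", 32_000L, 16));
    System.out.println(select(loads).orElse("none")); // prints sc3
  }
}
```

A memory-only policy would pick sc1 here even though it has no vcores left, which is the situation the improvement is meant to avoid.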
[jira] [Updated] (YARN-11624) CapacityScheduler: Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yanbin.zhang updated YARN-11624: Description: Disable AM-preemption for CapacityScheduler, like FairScheduler: -YARN-9537- (was: Disable AM-preemption for CapacityScheduler like fair-scheduler: -YARN-9537-) > CapacityScheduler: Add configuration to disable AM preemption > - > > Key: YARN-11624 > URL: https://issues.apache.org/jira/browse/YARN-11624 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: yanbin.zhang >Priority: Major > > Disable AM-preemption for CapacityScheduler, like FairScheduler: -YARN-9537-
[jira] [Updated] (YARN-11624) CapacityScheduler: Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yanbin.zhang updated YARN-11624: Description: Like FairScheduler feature: YARN-9537, for CapacityScheduler to disable AM-preemption. (was: Like FairScheduler feature: YARN-9537, add global flag for CapacityScheduler to disable AM-preemption.) > CapacityScheduler: Add configuration to disable AM preemption > - > > Key: YARN-11624 > URL: https://issues.apache.org/jira/browse/YARN-11624 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: yanbin.zhang >Priority: Major > > Like FairScheduler feature: YARN-9537, for CapacityScheduler to disable > AM-preemption.
[jira] [Updated] (YARN-11624) CapacityScheduler: Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yanbin.zhang updated YARN-11624: Description: Like FairScheduler feature: YARN-9537, add global flag for CapacityScheduler to disable AM-preemption. (was: Like FairScheduler feature: YARN-10625, add global flag for CapacityScheduler to disable AM-preemption.) > CapacityScheduler: Add configuration to disable AM preemption > - > > Key: YARN-11624 > URL: https://issues.apache.org/jira/browse/YARN-11624 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: yanbin.zhang >Priority: Major > > Like FairScheduler feature: YARN-9537, add global flag for CapacityScheduler > to disable AM-preemption.
[jira] [Updated] (YARN-11624) CapacityScheduler: Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yanbin.zhang updated YARN-11624: Summary: CapacityScheduler: Add configuration to disable AM preemption (was: CapacityScheduler: add global flag to disable AM-preemption) > CapacityScheduler: Add configuration to disable AM preemption > - > > Key: YARN-11624 > URL: https://issues.apache.org/jira/browse/YARN-11624 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: yanbin.zhang >Priority: Major > > Like FairScheduler feature: YARN-10625, add global flag for CapacityScheduler > to disable AM-preemption.
[jira] [Created] (YARN-11624) CapacityScheduler: add global flag to disable AM-preemption
yanbin.zhang created YARN-11624: --- Summary: CapacityScheduler: add global flag to disable AM-preemption Key: YARN-11624 URL: https://issues.apache.org/jira/browse/YARN-11624 Project: Hadoop YARN Issue Type: Improvement Components: capacity scheduler Reporter: yanbin.zhang Like FairScheduler feature: YARN-10625, add global flag for CapacityScheduler to disable AM-preemption.
[jira] [Commented] (YARN-11115) Add configuration to disable AM preemption for capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-11115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792755#comment-17792755 ] yanbin.zhang commented on YARN-11115: - Take it up. > Add configuration to disable AM preemption for capacity scheduler > - > > Key: YARN-11115 > URL: https://issues.apache.org/jira/browse/YARN-11115 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Yuan Luo >Assignee: Ashutosh Gupta >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > I think it's necessary to add configuration to disable AM preemption for > capacity-scheduler, like fair-scheduler feature: YARN-9537.
[jira] [Created] (YARN-11623) FairScheduler: Document AM preemption related changes (YARN-9537 and YARN-10625)
yanbin.zhang created YARN-11623: --- Summary: FairScheduler: Document AM preemption related changes (YARN-9537 and YARN-10625) Key: YARN-11623 URL: https://issues.apache.org/jira/browse/YARN-11623 Project: Hadoop YARN Issue Type: Task Components: fairscheduler Reporter: yanbin.zhang Extend the documentation with these enhancements about YARN-9537 and YARN-10625.
[jira] [Commented] (YARN-10631) Document AM preemption related changes (YARN-9537 and YARN-10625)
[ https://issues.apache.org/jira/browse/YARN-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792656#comment-17792656 ] yanbin.zhang commented on YARN-10631: - take it up > Document AM preemption related changes (YARN-9537 and YARN-10625) > - > > Key: YARN-10631 > URL: https://issues.apache.org/jira/browse/YARN-10631 > Project: Hadoop YARN > Issue Type: Task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > > Preemption-related changes were introduced in YARN-9537 and YARN-10625. > These also introduce new properties which are not documented for Fair > Scheduler. Extend the documentation with these enhancements.
[jira] (YARN-10900) Yarn nodes missing from router web app
[ https://issues.apache.org/jira/browse/YARN-10900 ] yanbin.zhang deleted comment on YARN-10900: - was (Author: it_singer): After enabling YARN Federation, I executed the command {code:java} yarn jar hadoop-mapreduce-examples-3.3.0.jar pi 16 1000{code} , and used services across other sub-clusters in my cluster, and reported an error 'Invalid AMRMToken from appattempt_xxx'. I don't know about you. How is it configured? My version is 3.3.0. [~Babbleshack] [~zhangjunj] > Yarn nodes missing from router web app > -- > > Key: YARN-10900 > URL: https://issues.apache.org/jira/browse/YARN-10900 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, router >Affects Versions: 3.2.1 >Reporter: Babble Shack >Priority: Major > Fix For: 3.4.0 > > > {color:#172b4d}Hi, > I am trying to configure YARN Federation mode.I seem to be able to schedule > to all nodes in my federation across each of my subclusters. > > However my federation router shows both of my subclusters, but nodes from > only a single cluster.{color} > > > {color:#ff}[!https://preview.redd.it/hdjwtn43ptj71.png?width=1437=png=webp=2d55343688c0de7a6f3da629e334cd318219c392!|https://preview.redd.it/hdjwtn43ptj71.png?width=1437=png=webp=2d55343688c0de7a6f3da629e334cd318219c392]{color} > {color:#172b4d}Federation Page – Showing both clusters and both nodes{color} > {color:#172b4d} > This page is showing both of my clusters, configured with a single <8 CPU, > 7GB> node.{color} > {color:#172b4d}However the "Nodes" and "About" pages are invalid.{color} > > > [!https://preview.redd.it/lawgst2yotj71.png?width=1373=png=webp=d06663904538bc993418c6184c3686cc2b02ea6e!|https://preview.redd.it/lawgst2yotj71.png?width=1373=png=webp=d06663904538bc993418c6184c3686cc2b02ea6e] > {color:#172b4d}Nodes Page – showing nodes from only one cluster{color} > > > 
[!https://preview.redd.it/dtuqblquotj71.png?width=482=png=webp=df740ff49df8b8de5015bccc936c9accee65aee0!|https://preview.redd.it/dtuqblquotj71.png?width=482=png=webp=df740ff49df8b8de5015bccc936c9accee65aee0] > {color:#172b4d}About Page – showing nodes from only one cluster{color} > {color:#172b4d}Each node is configured as follows:{color} > {color:#172b4d}*Minimum Memory Allocation:* 512MB{color} > {color:#172b4d}*Minimum CPU Allocation:* 1{color} > {color:#172b4d}*Maximum Memory Allocation:* 7168{color} > {color:#172b4d}*Maximum CPU Allocation:* 7{color} > {color:#172b4d}Federation configuration can be found at this > [link|https://drive.google.com/file/d/16xc2V7CvJLVQgsaDEOHhIaDrz5dnKxB_/view?usp=sharing]{color} > {color:#172b4d}Has anyone had an issue like this before, does anyone have any > solutions?{color}
[jira] [Commented] (YARN-10900) Yarn nodes missing from router web app
[ https://issues.apache.org/jira/browse/YARN-10900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788666#comment-17788666 ] yanbin.zhang commented on YARN-10900: - After enabling YARN Federation, I ran the command {code:java} yarn jar hadoop-mapreduce-examples-3.3.0.jar pi 16 1000{code}. When the job used resources in other sub-clusters of my cluster, it reported the error 'Invalid AMRMToken from appattempt_xxx'. Have you run into this as well? How is your cluster configured? My version is 3.3.0. [~Babbleshack] [~zhangjunj] > Yarn nodes missing from router web app > -- > > Key: YARN-10900 > URL: https://issues.apache.org/jira/browse/YARN-10900 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, router >Affects Versions: 3.2.1 >Reporter: Babble Shack >Priority: Major > Fix For: 3.4.0 > > > {color:#172b4d}Hi, > I am trying to configure YARN Federation mode.I seem to be able to schedule > to all nodes in my federation across each of my subclusters. > > However my federation router shows both of my subclusters, but nodes from > only a single cluster.{color} > > > {color:#ff}[!https://preview.redd.it/hdjwtn43ptj71.png?width=1437=png=webp=2d55343688c0de7a6f3da629e334cd318219c392!|https://preview.redd.it/hdjwtn43ptj71.png?width=1437=png=webp=2d55343688c0de7a6f3da629e334cd318219c392]{color} > {color:#172b4d}Federation Page – Showing both clusters and both nodes{color} > {color:#172b4d} > This page is showing both of my clusters, configured with a single <8 CPU, > 7GB> node.{color} > {color:#172b4d}However the "Nodes" and "About" pages are invalid.{color} > > > [!https://preview.redd.it/lawgst2yotj71.png?width=1373=png=webp=d06663904538bc993418c6184c3686cc2b02ea6e!|https://preview.redd.it/lawgst2yotj71.png?width=1373=png=webp=d06663904538bc993418c6184c3686cc2b02ea6e] > {color:#172b4d}Nodes Page – showing nodes from only one cluster{color} > > > 
[!https://preview.redd.it/dtuqblquotj71.png?width=482=png=webp=df740ff49df8b8de5015bccc936c9accee65aee0!|https://preview.redd.it/dtuqblquotj71.png?width=482=png=webp=df740ff49df8b8de5015bccc936c9accee65aee0] > {color:#172b4d}About Page – showing nodes from only one cluster{color} > {color:#172b4d}Each node is configured as follows:{color} > {color:#172b4d}*Minimum Memory Allocation:* 512MB{color} > {color:#172b4d}*Minimum CPU Allocation:* 1{color} > {color:#172b4d}*Maximum Memory Allocation:* 7168{color} > {color:#172b4d}*Maximum CPU Allocation:* 7{color} > {color:#172b4d}Federation configuration can be found at this > [link|https://drive.google.com/file/d/16xc2V7CvJLVQgsaDEOHhIaDrz5dnKxB_/view?usp=sharing]{color} > {color:#172b4d}Has anyone had an issue like this before, does anyone have any > solutions?{color}
[jira] [Updated] (YARN-11604) Fix code annotation errors such as class DefaultClientRequestInterceptor
[ https://issues.apache.org/jira/browse/YARN-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yanbin.zhang updated YARN-11604: Description: Fix code annotation errors such as class DefaultClientRequestInterceptor: !image-2023-11-02-10-46-50-762.png! was:Fix code annotation errors such as class DefaultClientRequestInterceptor > Fix code annotation errors such as class DefaultClientRequestInterceptor > > > Key: YARN-11604 > URL: https://issues.apache.org/jira/browse/YARN-11604 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Affects Versions: 3.3.6 >Reporter: yanbin.zhang >Priority: Trivial > Attachments: image-2023-11-02-10-46-50-762.png > > > Fix code annotation errors such as class DefaultClientRequestInterceptor: > !image-2023-11-02-10-46-50-762.png!
[jira] [Updated] (YARN-11604) Fix code annotation errors such as class DefaultClientRequestInterceptor
[ https://issues.apache.org/jira/browse/YARN-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yanbin.zhang updated YARN-11604: Attachment: image-2023-11-02-10-46-50-762.png > Fix code annotation errors such as class DefaultClientRequestInterceptor > > > Key: YARN-11604 > URL: https://issues.apache.org/jira/browse/YARN-11604 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Affects Versions: 3.3.6 >Reporter: yanbin.zhang >Priority: Trivial > Attachments: image-2023-11-02-10-46-50-762.png > > > Fix code annotation errors such as class DefaultClientRequestInterceptor
[jira] [Created] (YARN-11604) Fix code annotation errors such as class DefaultClientRequestInterceptor
yanbin.zhang created YARN-11604: --- Summary: Fix code annotation errors such as class DefaultClientRequestInterceptor Key: YARN-11604 URL: https://issues.apache.org/jira/browse/YARN-11604 Project: Hadoop YARN Issue Type: Bug Components: federation Affects Versions: 3.3.6 Reporter: yanbin.zhang Fix code annotation errors such as class DefaultClientRequestInterceptor
[jira] [Commented] (YARN-11600) After jetty is upgraded to 9.4.51.v20230217, sls cannot load js/css
[ https://issues.apache.org/jira/browse/YARN-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780265#comment-17780265 ] yanbin.zhang commented on YARN-11600: - Thank you very much [~zuston] > After jetty is upgraded to 9.4.51.v20230217, sls cannot load js/css > --- > > Key: YARN-11600 > URL: https://issues.apache.org/jira/browse/YARN-11600 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: yanbin.zhang >Priority: Major > Attachments: image-2023-10-26-09-52-30-975.png > > > !image-2023-10-26-09-52-30-975.png!
[jira] [Created] (YARN-11600) After jetty is upgraded to 9.4.51.v20230217, sls cannot load js/css
yanbin.zhang created YARN-11600: --- Summary: After jetty is upgraded to 9.4.51.v20230217, sls cannot load js/css Key: YARN-11600 URL: https://issues.apache.org/jira/browse/YARN-11600 Project: Hadoop YARN Issue Type: Bug Components: yarn Reporter: yanbin.zhang Attachments: image-2023-10-26-09-52-30-975.png !image-2023-10-26-09-52-30-975.png!
[jira] [Created] (YARN-11591) Fix some wrong symbols in Federation.md
yanbin.zhang created YARN-11591: --- Summary: Fix some wrong symbols in Federation.md Key: YARN-11591 URL: https://issues.apache.org/jira/browse/YARN-11591 Project: Hadoop YARN Issue Type: Improvement Reporter: yanbin.zhang Fix some wrong symbols in Federation.md
[jira] [Commented] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17374442#comment-17374442 ]

yanbin.zhang commented on YARN-10178:

Can the problem be solved if the preemption function is turned off? [~zhuqi] [~tuyu] [~wangda] [~pbacsko]

> Global Scheduler async thread crash caused by 'Comparison method violates its general contract'
>
> Key: YARN-10178
> URL: https://issues.apache.org/jira/browse/YARN-10178
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 3.2.1
> Reporter: tuyu
> Assignee: Qi Zhu
> Priority: Major
> Attachments: YARN-10178.001.patch, YARN-10178.002.patch, YARN-10178.003.patch, YARN-10178.004.patch, YARN-10178.005.patch
>
> Global Scheduler Async Thread crash stack
> {code:java}
> ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: Comparison method violates its general contract!
>     at java.util.TimSort.mergeHi(TimSort.java:899)
>     at java.util.TimSort.mergeAt(TimSort.java:516)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
>     at java.util.TimSort.sort(TimSort.java:254)
>     at java.util.Arrays.sort(Arrays.java:1512)
>     at java.util.ArrayList.sort(ArrayList.java:1462)
>     at java.util.Collections.sort(Collections.java:177)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616)
> {code}
> In Java 8, Arrays.sort uses TimSort by default for object arrays, and TimSort relies on the comparison satisfying a few requirements:
> {code:java}
> 1. sgn(x.compareTo(y)) == -sgn(y.compareTo(x))
> 2. x > y, y > z --> x > z
> 3. x.compareTo(y) == 0 --> sgn(x.compareTo(z)) == sgn(y.compareTo(z))
> {code}
> If the elements being sorted do not satisfy these requirements, TimSort may throw 'java.lang.IllegalArgumentException'.
> Looking at the PriorityUtilizationQueueOrderingPolicy.compare function, we can see that the Capacity Scheduler compares queues using these queue resource usage values:
> {code:java}
> AbsoluteUsedCapacity
> UsedCapacity
> ConfiguredMinResource
> AbsoluteCapacity
> {code}
> In the Capacity Scheduler, the Global Scheduler AsyncThread uses PriorityUtilizationQueueOrderingPolicy to choose a queue to assign a container to, constructs a CSAssignment struct, and uses the submitResourceCommitRequest function to add the CSAssignment to the backlogs.
> ResourceCommitterService will tryCommit this CSAssignment. Looking at the tryCommit function, it updates the queue resource usage:
> {code:java}
> public boolean tryCommit(Resource cluster, ResourceCommitRequest r,
>     boolean updatePending) {
>   long commitStart = System.nanoTime();
>   ResourceCommitRequest request =
>       (ResourceCommitRequest) r;
>
>   ...
>   boolean isSuccess = false;
>   if (attemptId != null) {
>     FiCaSchedulerApp app = getApplicationAttempt(attemptId);
>     // Required sanity check for attemptId - when async-scheduling enabled,
>     // proposal might be outdated if AM failover just finished
>     // and proposal queue was not be consumed in time
>     if (app != null && attemptId.equals(app.getApplicationAttemptId())) {
>       if (app.accept(cluster, request, updatePending)
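The YARN-10178 description above ties the crash to sorting queues by usage values that another thread updates during the sort. One general way to keep a comparator consistent in that situation is to copy each value once into an immutable snapshot and sort the snapshots. The Java sketch below illustrates only that general technique; it is not the YARN-10178 patch, and `SnapshotSortSketch`, `MutableQueue`, and `QueueSnapshot` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; these types are not Capacity Scheduler classes.
final class SnapshotSortSketch {

  /** A queue whose usage may be changed by another thread at any time. */
  static final class MutableQueue {
    final String name;
    volatile double usedCapacity;

    MutableQueue(String name, double usedCapacity) {
      this.name = name;
      this.usedCapacity = usedCapacity;
    }
  }

  /** Immutable copy of the value the comparator looks at. */
  static final class QueueSnapshot {
    final MutableQueue queue;
    final double usedCapacity;

    QueueSnapshot(MutableQueue queue, double usedCapacity) {
      this.queue = queue;
      this.usedCapacity = usedCapacity;
    }
  }

  /** Returns the queues ordered by ascending usage as seen at snapshot time. */
  static List<MutableQueue> sortByUsage(List<MutableQueue> queues) {
    List<QueueSnapshot> snapshots = new ArrayList<>(queues.size());
    for (MutableQueue q : queues) {
      // Read the mutable field exactly once per queue.
      snapshots.add(new QueueSnapshot(q, q.usedCapacity));
    }
    // The comparator only sees frozen values, so its answers stay
    // consistent for the whole sort even if the live queues change.
    snapshots.sort(Comparator.comparingDouble((QueueSnapshot s) -> s.usedCapacity));
    List<MutableQueue> sorted = new ArrayList<>(snapshots.size());
    for (QueueSnapshot s : snapshots) {
      sorted.add(s.queue);
    }
    return sorted;
  }

  public static void main(String[] args) {
    List<MutableQueue> queues = Arrays.asList(
        new MutableQueue("a", 0.5),
        new MutableQueue("b", 0.1),
        new MutableQueue("c", 0.3));
    for (MutableQueue q : sortByUsage(queues)) {
      System.out.println(q.name); // prints b, c, a on separate lines
    }
  }
}
```

The trade-off is that the resulting order reflects usage at snapshot time and may already be slightly stale when it is used.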
[jira] [Issue Comment Deleted] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yanbin.zhang updated YARN-10178: Comment: was deleted (was: If you turn off the preemption function and solve the problem ?[~zhuqi][~wangda][~tuyu]) > Global Scheduler async thread crash caused by 'Comparison method violates its > general contract' > --- > > Key: YARN-10178 > URL: https://issues.apache.org/jira/browse/YARN-10178 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.2.1 >Reporter: tuyu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10178.001.patch, YARN-10178.002.patch, > YARN-10178.003.patch, YARN-10178.004.patch, YARN-10178.005.patch
[jira] [Commented] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373412#comment-17373412 ] yanbin.zhang commented on YARN-10178: - If you turn off the preemption function and solve the problem ?[~zhuqi][~wangda][~tuyu] > Global Scheduler async thread crash caused by 'Comparison method violates its > general contract' > --- > > Key: YARN-10178 > URL: https://issues.apache.org/jira/browse/YARN-10178 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.2.1 >Reporter: tuyu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10178.001.patch, YARN-10178.002.patch, > YARN-10178.003.patch, YARN-10178.004.patch, YARN-10178.005.patch