[jira] [Commented] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17457519#comment-17457519 ]

Qi Zhu commented on YARN-10178:
-------------------------------

Hi [~epayne] [~gandras], thanks for looking into this problem. [~gandras], feel free to assign this to yourself; I have had no free time recently. I did test the latest patch earlier. Thanks a lot.

> Global Scheduler async thread crash caused by 'Comparison method violates its general contract'
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-10178
>                 URL: https://issues.apache.org/jira/browse/YARN-10178
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 3.2.1
>            Reporter: tuyu
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: YARN-10178.001.patch, YARN-10178.002.patch, YARN-10178.003.patch, YARN-10178.004.patch, YARN-10178.005.patch
>
>
> Global Scheduler Async Thread crash stack
> {code:java}
> ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: Comparison method violates its general contract!
>       at java.util.TimSort.mergeHi(TimSort.java:899)
>       at java.util.TimSort.mergeAt(TimSort.java:516)
>       at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
>       at java.util.TimSort.sort(TimSort.java:254)
>       at java.util.Arrays.sort(Arrays.java:1512)
>       at java.util.ArrayList.sort(ArrayList.java:1462)
>       at java.util.Collections.sort(Collections.java:177)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616)
> {code}
> Java 8's Arrays.sort uses the TimSort algorithm by default, and TimSort requires the comparison method to satisfy a few rules:
> {code:java}
> 1. sgn(x.compareTo(y)) == -sgn(y.compareTo(x))
> 2. x > y and y > z imply x > z
> 3. x.compareTo(y) == 0 implies sgn(x.compareTo(z)) == sgn(y.compareTo(z)) for all z
> {code}
> If the array elements do not satisfy these requirements, TimSort throws 'java.lang.IllegalArgumentException'.
> Looking at the PriorityUtilizationQueueOrderingPolicy.compare function, we can see that the Capacity Scheduler compares queues using these resource-usage values:
> {code:java}
> AbsoluteUsedCapacity
> UsedCapacity
> ConfiguredMinResource
> AbsoluteCapacity
> {code}
> In the Capacity Scheduler, the global scheduler async thread uses PriorityUtilizationQueueOrderingPolicy to choose which queue to assign a container to, constructs a CSAssignment, and uses submitResourceCommitRequest to add the CSAssignment to the backlog. ResourceCommitterService will then tryCommit this CSAssignment; looking at the tryCommit function, it updates the queue resource usage:
> {code:java}
> public boolean tryCommit(Resource cluster, ResourceCommitRequest r,
>     boolean updatePending) {
>   long commitStart = System.nanoTime();
>   ResourceCommitRequest request = (ResourceCommitRequest) r;
>
>   ...
>   boolean isSuccess = false;
>   if (attemptId != null) {
>     FiCaSchedulerApp app = getApplicationAttempt(attemptId);
>     // Required sanity check for attemptId - when async-scheduling enabled,
>     // proposal might be outdated if AM failover just finished
>     // and proposal queue was not be consumed in time
>     if (app != null
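The crash described above follows from the committer thread mutating queue usage values while the async thread is mid-sort: the same pair of queues can then compare differently at different points in the sort, which violates TimSort's contract. One common way to keep the comparator consistent is to snapshot the mutable keys once before sorting. The sketch below is a minimal, hypothetical illustration of that pattern (the `Queue` and `QueueSnapshot` classes are stand-ins, not YARN code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SnapshotSortDemo {
    // Hypothetical stand-in for a queue whose usage a committer thread updates.
    static class Queue {
        final String name;
        volatile double usedCapacity; // mutated concurrently in the real scheduler

        Queue(String name, double usedCapacity) {
            this.name = name;
            this.usedCapacity = usedCapacity;
        }
    }

    // Snapshot wrapper: the mutable key is copied exactly once, so every
    // comparison during the sort sees the same value and the contract holds.
    static class QueueSnapshot {
        final Queue queue;
        final double usedCapacity;

        QueueSnapshot(Queue q) {
            this.queue = q;
            this.usedCapacity = q.usedCapacity; // single read of the volatile field
        }
    }

    static List<Queue> sortByUsage(List<Queue> queues) {
        List<QueueSnapshot> snapshots = new ArrayList<>();
        for (Queue q : queues) {
            snapshots.add(new QueueSnapshot(q));
        }
        // The comparator reads only the immutable snapshot field.
        snapshots.sort(Comparator.comparingDouble(s -> s.usedCapacity));
        List<Queue> sorted = new ArrayList<>();
        for (QueueSnapshot s : snapshots) {
            sorted.add(s.queue);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Queue> queues = new ArrayList<>();
        queues.add(new Queue("a", 0.9));
        queues.add(new Queue("b", 0.1));
        queues.add(new Queue("c", 0.5));
        List<Queue> sorted = sortByUsage(queues);
        System.out.println(sorted.get(0).name + sorted.get(1).name + sorted.get(2).name);
        // prints "bca"
    }
}
```

Sorting the snapshots rather than the live objects satisfies TimSort's requirements even while another thread keeps updating the underlying queues; only the moment the snapshot is taken matters.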
[jira] [Updated] (YARN-11034) Add enhanced headroom in AllocateResponse
[ https://issues.apache.org/jira/browse/YARN-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Minni Mittal updated YARN-11034:
--------------------------------
    Description: Add enhanced headroom in allocate response. This provides a channel for RMs to return load information for AMRMProxy and decision making when rerouting resource requests.  (was: Add enhanced headroom in allocate response. This provides a channel for RMs to return load information for AMRMProxy and decision making when rerouting resource requests.)

> Add enhanced headroom in AllocateResponse
> -----------------------------------------
>
>                 Key: YARN-11034
>                 URL: https://issues.apache.org/jira/browse/YARN-11034
>             Project: Hadoop YARN
>          Issue Type: Task
>            Reporter: Minni Mittal
>            Assignee: Minni Mittal
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add enhanced headroom in allocate response. This provides a channel for RMs to return load information for AMRMProxy and decision making when rerouting resource requests.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11033) isAbsoluteResource is not correct for dynamically created queues
[ https://issues.apache.org/jira/browse/YARN-11033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17457165#comment-17457165 ]

Szilard Nemeth commented on YARN-11033:
---------------------------------------

Hi [~tdomok],
Just committed your patch to trunk. Could you please check whether it's required to backport this to branch-3.3 / branch-3.2? Thanks.

> isAbsoluteResource is not correct for dynamically created queues
> ----------------------------------------------------------------
>
>                 Key: YARN-11033
>                 URL: https://issues.apache.org/jira/browse/YARN-11033
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.4.0
>            Reporter: Tamas Domok
>            Assignee: Tamas Domok
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> The property *isAbsoluteResource* was added to the scheduler response in YARN-10237. It uses pattern matching on the capacity configuration value, but for dynamically created queues (using legacy AQC) the capacity configuration is not available in that form.
> *AbstractCSQueue.getCapacityConfigType()* can be used instead to determine whether the queue uses absolute resources.
> The *isAbsoluteResource* property was also not added to the root queue; that should be fixed for consistency too. E.g. the *mode* property is added for the root queue as well as for the other queues.
[jira] [Updated] (YARN-11033) isAbsoluteResource is not correct for dynamically created queues
[ https://issues.apache.org/jira/browse/YARN-11033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth updated YARN-11033:
----------------------------------
    Fix Version/s: 3.4.0

> isAbsoluteResource is not correct for dynamically created queues
> ----------------------------------------------------------------
>
>                 Key: YARN-11033
>                 URL: https://issues.apache.org/jira/browse/YARN-11033
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.4.0
>            Reporter: Tamas Domok
>            Assignee: Tamas Domok
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The property *isAbsoluteResource* was added to the scheduler response in YARN-10237. It uses pattern matching on the capacity configuration value, but for dynamically created queues (using legacy AQC) the capacity configuration is not available in that form.
> *AbstractCSQueue.getCapacityConfigType()* can be used instead to determine whether the queue uses absolute resources.
> The *isAbsoluteResource* property was also not added to the root queue; that should be fixed for consistency too. E.g. the *mode* property is added for the root queue as well as for the other queues.
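The issue above contrasts inferring the capacity mode from the raw configuration string with asking the queue for its resolved capacity-config type. The sketch below is a hypothetical, self-contained illustration of why the type check is more reliable for dynamically created queues; the enum, the `Queue` class, and the bracket-prefix pattern are simplified stand-ins, not the real AbstractCSQueue API:

```java
public class CapacityModeDemo {
    // Hypothetical stand-in for AbstractCSQueue.CapacityConfigType.
    enum CapacityConfigType { NONE, PERCENTAGE, ABSOLUTE_RESOURCE }

    static class Queue {
        final String configuredCapacity;   // may be null for dynamically created queues
        final CapacityConfigType type;     // resolved mode, always available

        Queue(String configuredCapacity, CapacityConfigType type) {
            this.configuredCapacity = configuredCapacity;
            this.type = type;
        }
    }

    // Fragile: pattern matching on the raw config string, which a
    // dynamically created queue may not have at all.
    static boolean isAbsoluteByPattern(Queue q) {
        return q.configuredCapacity != null
            && q.configuredCapacity.trim().startsWith("[");
    }

    // Robust: ask the queue for its resolved capacity mode.
    static boolean isAbsoluteByType(Queue q) {
        return q.type == CapacityConfigType.ABSOLUTE_RESOURCE;
    }

    public static void main(String[] args) {
        // A dynamic queue: no raw config string, but the mode is resolved.
        Queue dynamic = new Queue(null, CapacityConfigType.ABSOLUTE_RESOURCE);
        System.out.println(isAbsoluteByPattern(dynamic)); // false - wrong answer
        System.out.println(isAbsoluteByType(dynamic));    // true
    }
}
```

The string-matching check silently reports the wrong mode as soon as the raw configuration value is absent, whereas the type check reflects what the scheduler actually resolved for the queue.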
[jira] [Updated] (YARN-11042) Fix testQueueSubmitWithACLsEnabledWithQueueMapping in TestAppManager
[ https://issues.apache.org/jira/browse/YARN-11042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YARN-11042:
----------------------------------
    Labels: pull-request-available  (was: )

> Fix testQueueSubmitWithACLsEnabledWithQueueMapping in TestAppManager
> --------------------------------------------------------------------
>
>                 Key: YARN-11042
>                 URL: https://issues.apache.org/jira/browse/YARN-11042
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Tamas Domok
>            Assignee: Tamas Domok
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Unfortunately I changed the application submit context from *oldQueue* to *test* in YARN-11038 in one of the two test cases. It should be oldQueue, so that the placement manager is tested as well.
[jira] [Created] (YARN-11042) Fix testQueueSubmitWithACLsEnabledWithQueueMapping in TestAppManager
Tamas Domok created YARN-11042:
----------------------------------

             Summary: Fix testQueueSubmitWithACLsEnabledWithQueueMapping in TestAppManager
                 Key: YARN-11042
                 URL: https://issues.apache.org/jira/browse/YARN-11042
             Project: Hadoop YARN
          Issue Type: Sub-task
          Components: yarn
            Reporter: Tamas Domok
            Assignee: Tamas Domok

Unfortunately I changed the application submit context from *oldQueue* to *test* in YARN-11038 in one of the two test cases. It should be oldQueue, so that the placement manager is tested as well.
[jira] [Updated] (YARN-10880) nodelabels update log is too noisy
[ https://issues.apache.org/jira/browse/YARN-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YARN-10880:
----------------------------------
    Labels: pull-request-available  (was: )

> nodelabels update log is too noisy
> ----------------------------------
>
>                 Key: YARN-10880
>                 URL: https://issues.apache.org/jira/browse/YARN-10880
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.3.1
>            Reporter: LuoGe
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: wx20210806-093...@2x.png, YARN-10880.001.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When using the YARN *Distributed* NodeLabel setup, every time a node updates, the RM prints the INFO log "INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: No Modified Node label Mapping to replace". The log is too noisy (see the attached picture), so could we change it to DEBUG or remove it?
[jira] [Updated] (YARN-10880) nodelabels update log is too noisy
[ https://issues.apache.org/jira/browse/YARN-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LuoGe updated YARN-10880:
-------------------------
    Summary: nodelabels update log is too noisy  (was: nodelabels update log is to noisy)

> nodelabels update log is too noisy
> ----------------------------------
>
>                 Key: YARN-10880
>                 URL: https://issues.apache.org/jira/browse/YARN-10880
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.3.1
>            Reporter: LuoGe
>            Priority: Minor
>         Attachments: wx20210806-093...@2x.png, YARN-10880.001.patch
>
>
> When using the YARN *Distributed* NodeLabel setup, every time a node updates, the RM prints the INFO log "INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: No Modified Node label Mapping to replace". The log is too noisy (see the attached picture), so could we change it to DEBUG or remove it?
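The change requested above amounts to demoting a per-heartbeat message from INFO to DEBUG, so it only appears when debug logging is explicitly enabled. A minimal sketch of the level semantics using java.util.logging (YARN itself logs through SLF4J, so this is only an illustration; FINE plays the role of DEBUG here):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogLevelDemo {
    public static void main(String[] args) {
        Logger log = Logger.getLogger("CommonNodeLabelsManager");
        log.setLevel(Level.INFO);

        // With the logger at INFO, a FINE (debug-equivalent) message is
        // suppressed, so a per-heartbeat message stops flooding the log.
        System.out.println(log.isLoggable(Level.INFO));  // true
        System.out.println(log.isLoggable(Level.FINE));  // false

        // Guarding with isLoggable also skips building the message string
        // entirely on the hot path when debug logging is off.
        if (log.isLoggable(Level.FINE)) {
            log.fine("No Modified Node label Mapping to replace");
        }
    }
}
```

Operators who need the message back for troubleshooting can still enable it by raising that logger to the debug level, which is why demoting is usually preferred over removing the statement outright.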