[jira] [Commented] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'

2021-12-10 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17457519#comment-17457519
 ] 

Qi Zhu commented on YARN-10178:
---

Hi [~epayne] [~gandras], thanks for looking into this problem. [~gandras], 
feel free to assign this to yourself; I have had no free time recently. I 
tested the latest patch previously. Thanks a lot.

 

> Global Scheduler async thread crash caused by 'Comparison method violates its 
> general contract'
> ---
>
> Key: YARN-10178
> URL: https://issues.apache.org/jira/browse/YARN-10178
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 3.2.1
> Reporter: tuyu
> Assignee: Qi Zhu
> Priority: Major
> Attachments: YARN-10178.001.patch, YARN-10178.002.patch, 
> YARN-10178.003.patch, YARN-10178.004.patch, YARN-10178.005.patch
>
>
> Global Scheduler Async Thread crash stack
> {code:java}
> ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: Comparison method violates its general contract!
>     at java.util.TimSort.mergeHi(TimSort.java:899)
>     at java.util.TimSort.mergeAt(TimSort.java:516)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
>     at java.util.TimSort.sort(TimSort.java:254)
>     at java.util.Arrays.sort(Arrays.java:1512)
>     at java.util.ArrayList.sort(ArrayList.java:1462)
>     at java.util.Collections.sort(Collections.java:177)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616)
> {code}
> In Java 8, Arrays.sort uses TimSort by default for object arrays, and TimSort 
> places a few requirements on the comparison method:
> {code:java}
> 1. sgn(x.compareTo(y)) == -sgn(y.compareTo(x))
> 2. x > y and y > z --> x > z
> 3. x.compareTo(y) == 0 --> sgn(x.compareTo(z)) == sgn(y.compareTo(z))
> {code}
> If the comparison method does not satisfy these requirements, TimSort may 
> throw 'java.lang.IllegalArgumentException'.
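The comparator requirements listed above (sign symmetry, transitivity, and consistent treatment of equal elements) can be checked by brute force on a small sample. The following is a minimal sketch, not part of the YARN code base; `ComparatorContractCheck` and `findViolation` are hypothetical names chosen for illustration. It uses a subtraction-based integer comparator as the broken example because its overflow behaviour is easy to reproduce.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Brute-force check of the Comparator contract on a small sample (illustrative only). */
class ComparatorContractCheck {

  /** Returns null if no violation is found, otherwise a description of the first one. */
  static <T> String findViolation(List<T> items, Comparator<T> cmp) {
    for (T x : items) {
      for (T y : items) {
        int xy = Integer.signum(cmp.compare(x, y));
        int yx = Integer.signum(cmp.compare(y, x));
        // Requirement 1: swapping the arguments must flip the sign.
        if (xy != -yx) {
          return "sign symmetry: " + x + ", " + y;
        }
        for (T z : items) {
          int yz = Integer.signum(cmp.compare(y, z));
          int xz = Integer.signum(cmp.compare(x, z));
          // Requirement 2: x > y and y > z must imply x > z.
          if (xy > 0 && yz > 0 && xz <= 0) {
            return "transitivity: " + x + ", " + y + ", " + z;
          }
          // Requirement 3: equal elements must compare the same way to any z.
          if (xy == 0 && xz != yz) {
            return "consistency: " + x + ", " + y + ", " + z;
          }
        }
      }
    }
    return null;
  }

  public static void main(String[] args) {
    List<Integer> sample =
        Arrays.asList(Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE);
    // Subtraction overflows for large magnitudes, so this comparator is broken.
    Comparator<Integer> broken = (a, b) -> a - b;
    Comparator<Integer> correct = Integer::compare;
    System.out.println("broken:  " + findViolation(sample, broken));
    System.out.println("correct: " + findViolation(sample, correct));
  }
}
```

A check like this only covers the sampled values, so passing it does not prove a comparator is correct; it is mainly useful for reproducing a suspected violation in a unit test.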
> Looking at the PriorityUtilizationQueueOrderingPolicy.compare function, we can 
> see that the Capacity Scheduler compares queues using these resource usage values:
> {code:java}
> AbsoluteUsedCapacity
> UsedCapacity
> ConfiguredMinResource
> AbsoluteCapacity
> {code}
> In the Capacity Scheduler, the Global Scheduler async thread uses 
> PriorityUtilizationQueueOrderingPolicy to choose a queue to assign a 
> container to, constructs a CSAssignment, and calls 
> submitResourceCommitRequest to add the CSAssignment to the backlog.
> ResourceCommitterService then calls tryCommit on this CSAssignment. Looking at 
> the tryCommit function, it updates the queue resource usage:
> {code:java}
> public boolean tryCommit(Resource cluster, ResourceCommitRequest r,
>     boolean updatePending) {
>   long commitStart = System.nanoTime();
>   ResourceCommitRequest request =
>       (ResourceCommitRequest) r;
>
>   ...
>   boolean isSuccess = false;
>   if (attemptId != null) {
>     FiCaSchedulerApp app = getApplicationAttempt(attemptId);
>     // Required sanity check for attemptId - when async-scheduling enabled,
>     // proposal might be outdated if AM failover just finished
>     // and proposal queue was not be consumed in time
>     if (app != null 

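The quoted description is cut off here, but the scenario it sets up is that the async scheduling thread sorts queues by usage values that the commit path can update at the same time. If a comparator reads such live values, two comparisons of the same pair may disagree within one sort, which is one way to trigger the TimSort exception. A minimal sketch of one common mitigation, sorting immutable snapshots, follows. `LiveQueue`, `QueueSnapshot`, and `sortByUsage` are hypothetical names, and this is not claimed to be what the attached patches do.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch: sort immutable snapshots instead of live, concurrently updated queues. */
class SnapshotSortSketch {

  /** Hypothetical stand-in for a scheduler queue whose usage changes concurrently. */
  static final class LiveQueue {
    final String name;
    volatile double usedCapacity;

    LiveQueue(String name, double usedCapacity) {
      this.name = name;
      this.usedCapacity = usedCapacity;
    }
  }

  /** Immutable copy of the value the comparator needs. */
  static final class QueueSnapshot {
    final LiveQueue queue;
    final double usedCapacity;

    QueueSnapshot(LiveQueue queue) {
      this.queue = queue;
      this.usedCapacity = queue.usedCapacity; // read exactly once per sort
    }
  }

  /** Returns the queues ordered by the usage observed at snapshot time. */
  static List<LiveQueue> sortByUsage(List<LiveQueue> queues) {
    List<QueueSnapshot> snapshots = new ArrayList<>(queues.size());
    for (LiveQueue q : queues) {
      snapshots.add(new QueueSnapshot(q));
    }
    // The comparator only sees frozen values, so its answers stay consistent
    // for the whole sort even if another thread updates the live queues.
    snapshots.sort(Comparator.comparingDouble((QueueSnapshot s) -> s.usedCapacity));
    List<LiveQueue> ordered = new ArrayList<>(snapshots.size());
    for (QueueSnapshot s : snapshots) {
      ordered.add(s.queue);
    }
    return ordered;
  }

  public static void main(String[] args) {
    List<LiveQueue> queues = new ArrayList<>();
    queues.add(new LiveQueue("a", 0.7));
    queues.add(new LiveQueue("b", 0.2));
    queues.add(new LiveQueue("c", 0.5));
    for (LiveQueue q : sortByUsage(queues)) {
      System.out.println(q.name);
    }
  }
}
```

The trade-off is that the resulting order reflects usage at snapshot time and may be slightly stale by the time it is consumed, which is usually acceptable for a scheduling heuristic.
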
[jira] [Updated] (YARN-11034) Add enhanced headroom in AllocateResponse

2021-12-10 Thread Minni Mittal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Minni Mittal updated YARN-11034:

Description: Add enhanced headroom in allocate response. This provides a 
channel for RMs to return load information for AMRMProxy and decision making 
when rerouting resource requests.   (was: Add enhanced headroom in allocate 
response. This provides a channel for RMs to return load information for 
AMRMProxy and decision making when rerouting resource requests.)

> Add enhanced headroom in AllocateResponse
> -
>
> Key: YARN-11034
> URL: https://issues.apache.org/jira/browse/YARN-11034
> Project: Hadoop YARN
> Issue Type: Task
> Reporter: Minni Mittal
> Assignee: Minni Mittal
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Add enhanced headroom in allocate response. This provides a channel for RMs 
> to return load information for AMRMProxy and decision making when rerouting 
> resource requests. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11033) isAbsoluteResource is not correct for dynamically created queues

2021-12-10 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17457165#comment-17457165
 ] 

Szilard Nemeth commented on YARN-11033:
---

Hi [~tdomok],
Just committed your patch to trunk.
Could you please check whether it's required to backport this to branch-3.3 / 
branch-3.2?
Thanks.

> isAbsoluteResource is not correct for dynamically created queues
> 
>
> Key: YARN-11033
> URL: https://issues.apache.org/jira/browse/YARN-11033
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.4.0
> Reporter: Tamas Domok
> Assignee: Tamas Domok
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> The property *isAbsoluteResource* was added to the scheduler response in 
> YARN-10237. It uses pattern matching on the capacity configuration value, but 
> for dynamically created queues (using legacy AQC) the capacity configuration 
> is not available in that form.
> *AbstractCSQueue.getCapacityConfigType()* can be used instead to determine 
> whether the queue uses absolute resources.
> The *isAbsoluteResource* property was also not added to the root queue; that 
> should be fixed for consistency. For example, the *mode* property is added for 
> the root queue as well as for the other queues.
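
The difference between the two approaches described above can be sketched as follows. This is an illustration under stated assumptions, not the actual patch: the `CapacityConfigType` enum here is a local stand-in for the type returned by *AbstractCSQueue.getCapacityConfigType()*, the bracketed value format is assumed, and a missing configuration string is modelled as `null`.

```java
import java.util.regex.Pattern;

/** Sketch: deriving "is absolute resource" from a config string vs. a typed mode. */
class AbsoluteResourceSketch {

  /** Hypothetical stand-in for a queue's capacity configuration type. */
  enum CapacityConfigType { PERCENTAGE, ABSOLUTE_RESOURCE }

  // Assumed shape of an absolute-resource value, e.g. "[memory=4096,vcores=4]".
  private static final Pattern ABSOLUTE = Pattern.compile("\\[.+\\]");

  /** String-based check: only works when the raw configuration value is available. */
  static boolean isAbsoluteByPattern(String configuredCapacity) {
    return configuredCapacity != null
        && ABSOLUTE.matcher(configuredCapacity.trim()).matches();
  }

  /** Type-based check: works for any queue that knows its own config type. */
  static boolean isAbsoluteByType(CapacityConfigType type) {
    return type == CapacityConfigType.ABSOLUTE_RESOURCE;
  }

  public static void main(String[] args) {
    // Assumption: a dynamically created queue has no capacity string of its own.
    String missingConfig = null;
    System.out.println(isAbsoluteByPattern(missingConfig));
    System.out.println(isAbsoluteByType(CapacityConfigType.ABSOLUTE_RESOURCE));
  }
}
```

The string-based check reports `false` whenever the raw value is missing, regardless of how the queue is actually configured, which is the kind of mismatch the issue describes.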




[jira] [Updated] (YARN-11033) isAbsoluteResource is not correct for dynamically created queues

2021-12-10 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-11033:
--
Fix Version/s: 3.4.0

> isAbsoluteResource is not correct for dynamically created queues
> 
>
> Key: YARN-11033
> URL: https://issues.apache.org/jira/browse/YARN-11033
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.4.0
> Reporter: Tamas Domok
> Assignee: Tamas Domok
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> The property *isAbsoluteResource* was added to the scheduler response in 
> YARN-10237. It uses pattern matching on the capacity configuration value, but 
> for dynamically created queues (using legacy AQC) the capacity configuration 
> is not available in that form.
> *AbstractCSQueue.getCapacityConfigType()* can be used instead to determine 
> whether the queue uses absolute resources.
> The *isAbsoluteResource* property was also not added to the root queue; that 
> should be fixed for consistency. For example, the *mode* property is added for 
> the root queue as well as for the other queues.




[jira] [Updated] (YARN-11042) Fix testQueueSubmitWithACLsEnabledWithQueueMapping in TestAppManager

2021-12-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11042:
--
Labels: pull-request-available  (was: )

> Fix testQueueSubmitWithACLsEnabledWithQueueMapping in TestAppManager
> 
>
> Key: YARN-11042
> URL: https://issues.apache.org/jira/browse/YARN-11042
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Reporter: Tamas Domok
> Assignee: Tamas Domok
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Unfortunately I changed the application submit context from *oldQueue* to 
> *test* in YARN-11038 in one of the two test cases. It should be oldQueue, so 
> the placement manager is tested as well.




[jira] [Created] (YARN-11042) Fix testQueueSubmitWithACLsEnabledWithQueueMapping in TestAppManager

2021-12-10 Thread Tamas Domok (Jira)
Tamas Domok created YARN-11042:
--

Summary: Fix testQueueSubmitWithACLsEnabledWithQueueMapping in TestAppManager
Key: YARN-11042
URL: https://issues.apache.org/jira/browse/YARN-11042
Project: Hadoop YARN
Issue Type: Sub-task
Components: yarn
Reporter: Tamas Domok
Assignee: Tamas Domok


Unfortunately I changed the application submit context from *oldQueue* to 
*test* in YARN-11038 in one of the two test cases. It should be oldQueue, so the 
placement manager is tested as well.




[jira] [Updated] (YARN-10880) nodelabels update log is too noisy

2021-12-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-10880:
--
Labels: pull-request-available  (was: )

> nodelabels update log is too noisy
> --
>
> Key: YARN-10880
> URL: https://issues.apache.org/jira/browse/YARN-10880
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.3.1
> Reporter: LuoGe
> Priority: Minor
> Labels: pull-request-available
> Attachments: wx20210806-093...@2x.png, YARN-10880.001.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> With a YARN *Distributed* NodeLabel setup, every time a node updates, the RM 
> prints the INFO log “INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: No Modified Node 
> label Mapping to replace”. The log is too noisy (see the attached picture), so 
> can we change it to DEBUG or remove it?
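
The change requested above amounts to logging the "nothing to do" case at a debug-like level while keeping real updates at INFO. A minimal sketch follows, using `java.util.logging` so that it is self-contained; `Level.FINE` stands in for DEBUG here, and the class name, logger name, method, and the second log message are hypothetical rather than taken from `CommonNodeLabelsManager`.

```java
import java.util.Collections;
import java.util.Map;
import java.util.logging.Logger;

/** Sketch: report the no-op case at a debug-like level instead of INFO. */
class QuietNoOpLogSketch {
  // Fixed logger name chosen for this sketch only.
  static final Logger LOG = Logger.getLogger("nodelabels.sketch");

  /** Hypothetical update method; the map holds node-to-label changes. */
  static void replaceLabels(Map<String, String> modifiedMappings) {
    if (modifiedMappings.isEmpty()) {
      // FINE is hidden under the default INFO threshold, so frequent
      // node heartbeats with no label change no longer flood the log.
      LOG.fine("No Modified Node label Mapping to replace");
      return;
    }
    // Real changes are still visible at INFO.
    LOG.info("Replacing labels on " + modifiedMappings.size() + " node(s)");
  }

  public static void main(String[] args) {
    replaceLabels(Collections.<String, String>emptyMap());
    replaceLabels(Collections.singletonMap("node1", "gpu"));
  }
}
```

Lowering the level rather than removing the statement keeps the message available when debug logging is switched on for troubleshooting.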




[jira] [Updated] (YARN-10880) nodelabels update log is too noisy

2021-12-10 Thread LuoGe (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LuoGe updated YARN-10880:
-
Summary: nodelabels update log is too noisy  (was: nodelabels update log is 
to noisy)

> nodelabels update log is too noisy
> --
>
> Key: YARN-10880
> URL: https://issues.apache.org/jira/browse/YARN-10880
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.3.1
> Reporter: LuoGe
> Priority: Minor
> Attachments: wx20210806-093...@2x.png, YARN-10880.001.patch
>
>
> With a YARN *Distributed* NodeLabel setup, every time a node updates, the RM 
> prints the INFO log “INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: No Modified Node 
> label Mapping to replace”. The log is too noisy (see the attached picture), so 
> can we change it to DEBUG or remove it?


