[jira] [Commented] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'

2021-07-02 Thread yanbin.zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373412#comment-17373412
 ] 

yanbin.zhang commented on YARN-10178:
-

If you turn off the preemption function and solve the problem 
?[~zhuqi][~wangda][~tuyu]

> Global Scheduler async thread crash caused by 'Comparison method violates its 
> general contract'
> ---
>
> Key: YARN-10178
> URL: https://issues.apache.org/jira/browse/YARN-10178
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.2.1
>Reporter: tuyu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10178.001.patch, YARN-10178.002.patch, 
> YARN-10178.003.patch, YARN-10178.004.patch, YARN-10178.005.patch
>
>
> Global Scheduler Async Thread crash stack
> {code:java}
> ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received 
> RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, 
> Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: 
> Comparison method violates its general contract!  
>at 
> java.util.TimSort.mergeHi(TimSort.java:899)
> at java.util.TimSort.mergeAt(TimSort.java:516)
> at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
> at java.util.TimSort.sort(TimSort.java:254)
> at java.util.Arrays.sort(Arrays.java:1512)
> at java.util.ArrayList.sort(ArrayList.java:1462)
> at java.util.Collections.sort(Collections.java:177)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616)
> {code}
> JAVA 8 Arrays.sort default use timsort algo, and timsort has  few require 
> {code:java}
> 1.x.compareTo(y) != y.compareTo(x)
> 2.x>y,y>z --> x > z
> 3.x=y, x.compareTo(z) == y.compareTo(z)
> {code}
> if not Arrays paramters not satify this require,TimSort will throw 
> 'java.lang.IllegalArgumentException'
> look at PriorityUtilizationQueueOrderingPolicy.compare function,we will know 
> Capacity Scheduler use this these queue resource usage to compare
> {code:java}
> AbsoluteUsedCapacity
> UsedCapacity
> ConfiguredMinResource
> AbsoluteCapacity
> {code}
> In Capacity Scheduler Global Scheduler AsyncThread use 
> PriorityUtilizationQueueOrderingPolicy function to choose queue to assign 
> container,and construct a CSAssignment struct, and use 
> submitResourceCommitRequest function add CSAssignment to backlogs
> ResourceCommitterService  will tryCommit this CSAssignment,look tryCommit 
> function,there will update queue resource usage
> {code:java}
> public boolean tryCommit(Resource cluster, ResourceCommitRequest r,
> boolean updatePending) {
>   long commitStart = System.nanoTime();
>   ResourceCommitRequest request =
>   (ResourceCommitRequest) r;
>  
>   ...
>   boolean isSuccess = false;
>   if (attemptId != null) {
> FiCaSchedulerApp app = getApplicationAttempt(attemptId);
> // Required sanity check for attemptId - when async-scheduling enabled,
> // proposal might be outdated if AM failover just finished
> // and proposal queue was not be consumed in time
> if (app != null && attemptId.equals(app.getApplicationAttemptId())) {
>   if (app.accept(cluster, request, 

[jira] [Updated] (YARN-10838) Implement an optimised version of Configuration getPropsWithPrefix

2021-07-02 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10838:

Attachment: (was: YARN-10838.001.patch)

> Implement an optimised version of Configuration getPropsWithPrefix
> --
>
> Key: YARN-10838
> URL: https://issues.apache.org/jira/browse/YARN-10838
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10838.001.patch
>
>
> AutoCreatedQueueTemplate also has multiple call to 
> Configuration#getPropsWithPrefix. It must be eliminated in order to improve 
> the performance on reinitialisation. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10838) Implement an optimised version of Configuration getPropsWithPrefix

2021-07-02 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10838:

Attachment: YARN-10838.001.patch

> Implement an optimised version of Configuration getPropsWithPrefix
> --
>
> Key: YARN-10838
> URL: https://issues.apache.org/jira/browse/YARN-10838
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10838.001.patch
>
>
> AutoCreatedQueueTemplate also has multiple call to 
> Configuration#getPropsWithPrefix. It must be eliminated in order to improve 
> the performance on reinitialisation. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10838) Implement an optimised version of Configuration getPropsWithPrefix

2021-07-02 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10838:

Summary: Implement an optimised version of Configuration getPropsWithPrefix 
 (was: Improve auto queue template parsing performance)

> Implement an optimised version of Configuration getPropsWithPrefix
> --
>
> Key: YARN-10838
> URL: https://issues.apache.org/jira/browse/YARN-10838
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
>
> AutoCreatedQueueTemplate also has multiple call to 
> Configuration#getPropsWithPrefix. It must be eliminated in order to improve 
> the performance on reinitialisation. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org