[ 
https://issues.apache.org/jira/browse/YARN-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated YARN-5188:
------------------------------
    Comment: was deleted

(was: The deadlock is the same as YARN-4090. getQueueUserAcls holds the object 
locks of root and root.Parent and waits for root.Parent.Child, while 
decResourceUsage holds the object lock of root.Parent.Child and waits for 
root.Parent. That's a deadlock.)
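The comment above describes a lock-ordering inversion. One path takes queue locks from parent to child (root, root.Parent, then root.Parent.Child). The other takes them from child to parent. The following is a minimal sketch of one general remedy, not necessarily what YARN-4090 changed. It assumes plain `synchronized` monitors on a hypothetical `Queue` class that is not FSQueue, and the method names `readAcls` and `decUsage` are illustrative stand-ins. Both paths always acquire locks in the same root-to-leaf order, so neither can hold a child while waiting for its parent.

```java
import java.util.ArrayList;
import java.util.List;

public class LockOrderingSketch {
    // Hypothetical stand-in for a scheduler queue; not the real FSQueue class.
    static final class Queue {
        final String name;
        final Queue parent;
        long usage;

        Queue(String name, Queue parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    // Path from the root down to q; every caller locks in this order.
    static List<Queue> pathFromRoot(Queue q) {
        List<Queue> path = new ArrayList<>();
        for (Queue cur = q; cur != null; cur = cur.parent) {
            path.add(0, cur);
        }
        return path;
    }

    // Takes the monitors of path[i..] root-first, then runs the action.
    static void withLocks(List<Queue> path, int i, Runnable action) {
        if (i == path.size()) {
            action.run();
            return;
        }
        synchronized (path.get(i)) {
            withLocks(path, i + 1, action);
        }
    }

    // Analogue of the read path (like getQueueUserAcls): locks root -> leaf.
    static void readAcls(Queue leaf) {
        withLocks(pathFromRoot(leaf), 0, () -> { /* read ACLs here */ });
    }

    // Analogue of decResourceUsage: updates leaf -> root, but still acquires
    // the locks root -> leaf, so it never waits in the opposite order.
    static void decUsage(Queue leaf, long amount) {
        withLocks(pathFromRoot(leaf), 0, () -> {
            for (Queue cur = leaf; cur != null; cur = cur.parent) {
                cur.usage -= amount;
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        Queue root = new Queue("root", null);
        Queue parent = new Queue("root.Parent", root);
        Queue child = new Queue("root.Parent.Child", parent);
        root.usage = parent.usage = child.usage = 100_000;
        Thread reader = new Thread(() -> { for (int i = 0; i < 100_000; i++) readAcls(child); });
        Thread updater = new Thread(() -> { for (int i = 0; i < 100_000; i++) decUsage(child, 1); });
        reader.start();
        updater.start();
        reader.join();
        updater.join();
        System.out.println("no deadlock; root usage = " + root.usage); // prints 0 for root usage
    }
}
```

The cost of this scheme is holding every ancestor lock for the duration of an update. Finer-grained alternatives, such as read-write locks or lock-free counters, trade that off differently.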

> FairScheduler performance bug
> -----------------------------
>
>                 Key: YARN-5188
>                 URL: https://issues.apache.org/jira/browse/YARN-5188
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.5.0
>            Reporter: ChenFolin
>         Attachments: YARN-5188-1.patch
>
>
>  My Hadoop cluster recently encountered a performance problem. Details 
> follow.
> There are two points that can cause this performance issue.
> 1: Applications are sorted in FSLeafQueue before every container assignment. 
> A TreeSet is not the best choice here. Why not keep the applications ordered 
> at all times, and use binary search to restore the order when an 
> application's resource usage changes?
> 2: Queue sorting and assignContainerPreCheck both recompute the resource 
> usage of every leaf queue. Why can't we keep each leaf queue's usage in 
> memory and update it whenever a container is assigned or released?
>        
>        Container assignment in the ResourceManager slows down as the 
> number of running and pending applications grows. In our case the cluster 
> shows a large amount of pending MB and pending vcores, while current cluster 
> utilization may be below 20%.
>        In the ResourceManager logs, each container assignment may take 
> 5 ~ 10 ms, versus only 0 ~ 1 ms normally.
>      
>        I used TestFairScheduler to reproduce the scenario:
>  
>        Only one queue: root.default
>      10240 apps.
>  
>        Average container assignment time: 6753.9 us (6.7539 ms)
>      Application sort time (FSLeafQueue: Collections.sort(runnableApps, 
> comparator);): 4657.01 us (4.657 ms)
>      Leaf queue resource usage computation: 905.171 us (0.905171 ms)
>      
>      With only root.default, one container assignment consists of (one 
> application sort) + 2 * (leaf queue usage computation).
>        Based on these measurements, I think container assignment has a 
> performance problem.

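The description gives a cost model: one application sort plus two leaf-queue usage computations per assignment. As a rough check, the measured averages from the single-queue test (root.default, 10240 apps) add up as shown below. The sort alone is about 69% of the measured time. The remaining ~287 us is whatever else the assignment path spends, which the measurements do not break down.

```latex
T_{\text{assign}} \approx T_{\text{sort}} + 2\,T_{\text{usage}}
  = 4657.01 + 2 \times 905.171
  = 6467.352\ \mu\text{s}
  \approx 95.8\% \text{ of the measured } 6753.9\ \mu\text{s}
```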

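Point 1 of the description proposes keeping runnable applications permanently ordered rather than re-sorting the whole list before every assignment. This is a minimal sketch of that idea under stated assumptions. It uses an ArrayList plus `Collections.binarySearch` in place of the TreeSet mentioned in the description. Each application is reduced to a hypothetical `App` holding a single `usage` number. The comparator orders by (usage, id) as a simplified stand-in for the real fair-share comparator; the id tie-break keeps the order total. The names `SortedApps`, `add` and `updateUsage` are illustrative and are not FairScheduler APIs. When one application's usage changes, it is located under its old key, removed, and reinserted at the position reported by binary search. That costs O(log n) comparisons plus an O(n) array shift, instead of an O(n log n) full sort.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortedApps {
    // Hypothetical application record: only what the comparator needs.
    static final class App {
        final int id;
        long usage;

        App(int id, long usage) {
            this.id = id;
            this.usage = usage;
        }
    }

    // Ascending usage (least-served first); id breaks ties so the order is total.
    static final Comparator<App> ORDER =
            Comparator.comparingLong((App a) -> a.usage).thenComparingInt(a -> a.id);

    private final List<App> apps = new ArrayList<>();

    // binarySearch returns -(insertionPoint) - 1 when the key is absent.
    void add(App app) {
        int idx = Collections.binarySearch(apps, app, ORDER);
        if (idx >= 0) throw new IllegalArgumentException("duplicate app " + app.id);
        apps.add(-idx - 1, app);
    }

    // Find the app under its OLD key, remove it, update the key, reinsert.
    void updateUsage(App app, long newUsage) {
        int idx = Collections.binarySearch(apps, app, ORDER);
        if (idx < 0) throw new IllegalArgumentException("unknown app " + app.id);
        apps.remove(idx);
        app.usage = newUsage;
        add(app);
    }

    List<App> view() {
        return Collections.unmodifiableList(apps);
    }

    public static void main(String[] args) {
        SortedApps q = new SortedApps();
        App a = new App(1, 300), b = new App(2, 100), c = new App(3, 200);
        q.add(a);
        q.add(b);
        q.add(c);
        q.updateUsage(b, 500); // b was least-served; after an assignment it moves to the end
        for (App app : q.view()) {
            System.out.println(app.id + " " + app.usage); // prints "3 200", "1 300", "2 500"
        }
    }
}
```

The key must never be mutated behind the list's back. Every usage change has to go through `updateUsage`, or binary search will no longer find the element.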

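Point 2 of the description suggests replacing the per-assignment recomputation of leaf-queue usage with a value kept in memory. This is a minimal sketch, assuming usage is a single memory figure in MB. It also assumes every assignment and release passes through two hypothetical hooks, `onContainerAssigned` and `onContainerReleased`; these are illustrative names, not FairScheduler methods. Reading the cached total is O(1), while `recomputeUsageMb` shows the O(number of apps) walk the cache is meant to avoid. The total is an `AtomicLong` so readers can call `getUsageMb` without taking the queue's lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class CachedLeafQueueUsage {
    // Hypothetical per-application usage, kept only to show the recompute cost.
    private final Map<String, Long> appUsageMb = new HashMap<>();
    // Cached total, updated incrementally on every assign/release.
    private final AtomicLong usageMb = new AtomicLong();

    synchronized void onContainerAssigned(String appId, long mb) {
        appUsageMb.merge(appId, mb, Long::sum);
        usageMb.addAndGet(mb);
    }

    synchronized void onContainerReleased(String appId, long mb) {
        Long cur = appUsageMb.get(appId);
        if (cur == null || cur < mb) {
            throw new IllegalStateException("release exceeds usage for " + appId);
        }
        appUsageMb.put(appId, cur - mb);
        usageMb.addAndGet(-mb);
    }

    // O(1): what a queue sort or a pre-check would read.
    long getUsageMb() {
        return usageMb.get();
    }

    // O(number of apps): the recomputation the cache replaces.
    synchronized long recomputeUsageMb() {
        long total = 0;
        for (long mb : appUsageMb.values()) {
            total += mb;
        }
        return total;
    }

    public static void main(String[] args) {
        CachedLeafQueueUsage q = new CachedLeafQueueUsage();
        q.onContainerAssigned("app_1", 1024);
        q.onContainerAssigned("app_2", 2048);
        q.onContainerReleased("app_1", 1024);
        System.out.println(q.getUsageMb() + " " + q.recomputeUsageMb()); // prints "2048 2048"
    }
}
```

The trade-off is that correctness now depends on every usage change passing through the hooks. A missed release leaves the cached total permanently wrong, so a periodic comparison against `recomputeUsageMb` may be worth keeping as a sanity check.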
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
