[ 
https://issues.apache.org/jira/browse/YARN-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947594#comment-15947594
 ] 

Yufei Gu commented on YARN-6407:
--------------------------------

Hi [~zhengchenyu], thanks for filing this jira. IIUC, you reduced the frequency 
of NM node updates to avoid flooding the network in a 5k-node cluster, but 
continuous scheduling is not necessary as long as there are still enough node 
update events in the cluster. Besides improving the locking in FairScheduler, 
we can always balance the continuous scheduling interval against the NM node 
update frequency to get better scheduling latency.
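
To put rough numbers on that balance, here is a back-of-the-envelope sketch in 
Java. It assumes yarn.scheduler.fair.assignmultiple is enabled, so each node 
update can assign at most max.assign containers. The heartbeat intervals are 
hypothetical values picked for illustration, and the class and method names 
are made up here; they are not part of Hadoop.

```java
public class SchedulingOpportunities {

    /**
     * Upper bound on containers that node-update (heartbeat) events alone
     * can assign per second, assuming every update assigns maxAssign containers.
     */
    static double maxAssignmentsPerSecond(int nodes, long heartbeatIntervalMs, int maxAssign) {
        // Each node sends one update per heartbeat interval.
        double nodeUpdatesPerSecond = nodes * 1000.0 / heartbeatIntervalMs;
        return nodeUpdatesPerSecond * maxAssign;
    }

    public static void main(String[] args) {
        int nodes = 5000;   // cluster size mentioned in the issue
        int maxAssign = 5;  // yarn.scheduler.fair.max.assign from the issue

        // Hypothetical NM heartbeat intervals; the issue does not state the real one.
        long[] heartbeatIntervalsMs = {1000, 3000, 10000};

        for (long intervalMs : heartbeatIntervalsMs) {
            System.out.printf("heartbeat=%d ms -> at most %.0f assignments/s%n",
                    intervalMs, maxAssignmentsPerSecond(nodes, intervalMs, maxAssign));
        }
    }
}
```

Under these assumptions a 1 s heartbeat allows at most 25,000 assignments per 
second from node updates alone, and a 10 s heartbeat allows at most 2,500. If 
the bound is comfortably above the observed container request rate, continuous 
scheduling probably adds little besides lock pressure.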

> Improve and fix locks of RM scheduler
> -------------------------------------
>
>                 Key: YARN-6407
>                 URL: https://issues.apache.org/jira/browse/YARN-6407
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.1
>         Environment: CentOS 7, 1 Gigabit Ethernet
>            Reporter: zhengchenyu
>             Fix For: 2.7.1
>
>   Original Estimate: 2m
>  Remaining Estimate: 2m
>
> First, this issue does not duplicate YARN-3091.
> Our cluster has 5k nodes, and each server is configured with 1 Gigabit 
> Ethernet, so the network is the bottleneck in our cluster.
> We must distcp data from the warehouse. Because of the 1 Gigabit Ethernet, we 
> must set yarn.scheduler.fair.max.assign to 5; otherwise we get hotspots.
> Setting max.assign to 5 reduces the scheduler's assignment capacity, so 
> we started the ContinuousSchedulingThread. 
> As more applications run in our cluster, and with the 
> ContinuousSchedulingThread enabled, lock contention becomes more serious. 
> In our cluster, the call queue of the ApplicationMasterService's RPC is 
> occasionally high. We are worried that more problems will occur in the future 
> as more applications run.
> Here is our chain of reasoning:
> "1 Gigabit Ethernet" and "data hot spot" ==> "set 
> yarn.scheduler.fair.max.assign to 5" ==> "ContinuousSchedulingThread is 
> started" and "more applications" ==> "lock contention"
> I know YARN-3091 addressed this problem, but its patch aims to change the 
> object lock to a read-write lock. That change is still coarse-grained. So I 
> think we should lock individual resources instead of large sections of code.
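
The difference between the two locking styles discussed in the description can 
be sketched in Java roughly as follows. This is only an illustration under 
simplifying assumptions: Node, ReadWriteLockedScheduler and 
PerNodeLockedScheduler are hypothetical names, not FairScheduler classes, and 
the real scheduler has far more shared state than a map of nodes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockGranularitySketch {

    // Hypothetical stand-in for per-node bookkeeping; not a Hadoop class.
    static final class Node {
        private int availableMb;
        Node(int availableMb) { this.availableMb = availableMb; }
    }

    // Coarse-grained: one read-write lock guards the whole scheduler.
    // Readers can proceed in parallel, but every allocation on any node
    // takes the single write lock and blocks all other callers.
    static final class ReadWriteLockedScheduler {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final Map<String, Node> nodes = new HashMap<>();

        void addNode(String id, int availableMb) {
            lock.writeLock().lock();
            try {
                nodes.put(id, new Node(availableMb));
            } finally {
                lock.writeLock().unlock();
            }
        }

        boolean allocate(String id, int mb) {
            lock.writeLock().lock();
            try {
                Node node = nodes.get(id);
                if (node == null || node.availableMb < mb) {
                    return false;
                }
                node.availableMb -= mb;
                return true;
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    // Finer-grained: each node is locked on its own, so allocations on
    // different nodes do not contend with each other.
    static final class PerNodeLockedScheduler {
        private final Map<String, Node> nodes = new ConcurrentHashMap<>();

        void addNode(String id, int availableMb) {
            nodes.put(id, new Node(availableMb));
        }

        boolean allocate(String id, int mb) {
            Node node = nodes.get(id);
            if (node == null) {
                return false;
            }
            synchronized (node) { // lock only the resource being changed
                if (node.availableMb < mb) {
                    return false;
                }
                node.availableMb -= mb;
                return true;
            }
        }

        int available(String id) {
            Node node = nodes.get(id);
            if (node == null) {
                return 0;
            }
            synchronized (node) {
                return node.availableMb;
            }
        }
    }

    public static void main(String[] args) {
        PerNodeLockedScheduler scheduler = new PerNodeLockedScheduler();
        scheduler.addNode("node-1", 4096);
        System.out.println(scheduler.allocate("node-1", 1024)); // fits within 4096 MB
        System.out.println(scheduler.available("node-1"));
    }
}
```

The per-node variant only helps when the critical work really is local to one 
node. Anything that touches cluster-wide state, such as queue usage or fair 
shares, would still need its own synchronization.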



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
