[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288633#comment-14288633
 ] 

Li Lu commented on YARN-3091:
-----------------------------

Maybe we want to tweak the wording/organization of this JIRA a little bit? In 
the description of this JIRA, two major points are raised:

bq. Many unnecessary synchronized locks, we have seen several cases recently 
that too frequent access of scheduler makes scheduler hang. Which could be 
addressed by using read/write lock. Components include scheduler, CS queues, 
apps
I agree that a readers-writer lock is a viable approach for many 
synchronization performance issues, but other synchronization mechanisms (such 
as concurrent data structures) are also options. 
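
To make the two options concrete, here is a minimal sketch. The class, field, 
and method names are hypothetical stand-ins for illustration only, not the 
actual scheduler or CS queue code. The aggregate counter is guarded by a 
ReentrantReadWriteLock, while the per-app map relies on ConcurrentHashMap and 
needs no explicit lock:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a scheduler queue; not actual YARN code.
class QueueUsageSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Guarded by 'lock': many readers may proceed in parallel, writers are exclusive.
    private long usedMemoryMb;

    // Alternative mechanism: a concurrent map that needs no explicit lock.
    private final Map<String, Long> perAppMemoryMb = new ConcurrentHashMap<>();

    long getUsedMemoryMb() {
        lock.readLock().lock();
        try {
            return usedMemoryMb;
        } finally {
            lock.readLock().unlock();
        }
    }

    void allocate(String appId, long memoryMb) {
        lock.writeLock().lock();
        try {
            usedMemoryMb += memoryMb;
        } finally {
            lock.writeLock().unlock();
        }
        // Atomic per-key update provided by ConcurrentHashMap.merge.
        perAppMemoryMb.merge(appId, memoryMb, Long::sum);
    }

    long getAppMemoryMb(String appId) {
        return perAppMemoryMb.getOrDefault(appId, 0L);
    }
}
```

Note that in this sketch the total and the per-app map are not updated 
atomically with respect to each other. If readers need a consistent view of 
both, the map update would have to move under the write lock as well.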

bq. Some fields not properly locked (Like clusterResource)
Improperly synchronized accesses may cause data races, and are generally 
considered bugs in Java programs (even though the Java memory model still 
gives racy programs some limited guarantees). To me, it would be better if the 
second point were categorized as bug fixes, rather than improvements, for the 
RM scheduler code. 
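
To illustrate the kind of fix this implies, here is a minimal sketch for a 
field like clusterResource. It assumes a hypothetical immutable 
ResourceSnapshot type rather than the real YARN Resource class, and it shows 
only one possible approach: publishing immutable snapshots through a volatile 
reference so that unlocked readers never observe a partially updated value.

```java
// Hypothetical immutable value type; the real clusterResource is a YARN Resource.
final class ResourceSnapshot {
    final long memoryMb;
    final int vcores;

    ResourceSnapshot(long memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }
}

// Hypothetical holder; names are assumptions made for illustration.
class ClusterResourceHolder {
    // volatile gives readers a happens-before edge with the latest write.
    private volatile ResourceSnapshot clusterResource = new ResourceSnapshot(0, 0);

    // Serializes the read-modify-write so concurrent updates are not lost.
    private final Object updateLock = new Object();

    // Lock-free read: always returns a complete, immutable snapshot.
    ResourceSnapshot get() {
        return clusterResource;
    }

    void addNode(long memoryMb, int vcores) {
        synchronized (updateLock) {
            ResourceSnapshot cur = clusterResource;
            clusterResource =
                new ResourceSnapshot(cur.memoryMb + memoryMb, cur.vcores + vcores);
        }
    }
}
```

Without the volatile keyword (or some other form of synchronization on the 
read path), a reader thread would be racing with addNode and might keep seeing 
a stale snapshot.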

Therefore, maybe we want to solve the problem in two steps: a) fixing 
improperly synchronized data accesses in the RM scheduler (correctness) and b) 
improving synchronization performance for the RM scheduler code (performance)? 
I'm not sure whether there should be two separate JIRAs to track this, or 
whether we can combine both in one "giant" JIRA. 

> [Umbrella] Improve locks of RM scheduler
> ----------------------------------------
>
>                 Key: YARN-3091
>                 URL: https://issues.apache.org/jira/browse/YARN-3091
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler, fairscheduler, resourcemanager, 
> scheduler
>            Reporter: Wangda Tan
>
> In existing YARN RM scheduler, there're some issues of using locks. For 
> example:
> - Many unnecessary synchronized locks, we have seen several cases recently 
> that too frequent access of scheduler makes scheduler hang. Which could be 
> addressed by using read/write lock. Components include scheduler, CS queues, 
> apps
> - Some fields not properly locked (Like clusterResource)
> We can address them together in this ticket.
> (More details see comments below)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
