[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288633#comment-14288633 ]
Li Lu commented on YARN-3091:
-----------------------------

Maybe we want to tweak the wording/organization of this JIRA a little bit? In the description of this JIRA, two major points are raised:

bq. Many unnecessary synchronized locks, we have seen several cases recently that too frequent access of scheduler makes scheduler hang. Which could be addressed by using read/write lock. Components include scheduler, CS queues, apps

I agree that a readers-writer lock is a viable approach for many synchronization performance issues, but other synchronization mechanisms (such as concurrent data structures) may also be options.

bq. Some fields not properly locked (Like clusterResource)

Improperly synchronized accesses may cause data races, and are generally considered bugs in Java programs (even though the Java memory model provides some guarantees for racy programs). To me, it would be better if the second point were categorized as a bug fix, rather than an improvement, for the RM scheduler code.

Therefore, maybe we want to solve the problem in two steps:
a) fixing improperly synchronized data accesses in the RM scheduler (correctness), and
b) improving synchronization performance of the RM scheduler code (performance)?

I'm not sure whether there should be two separate JIRAs to track this, or whether we can combine both in one "giant" JIRA.

> [Umbrella] Improve locks of RM scheduler
> ----------------------------------------
>
>                 Key: YARN-3091
>                 URL: https://issues.apache.org/jira/browse/YARN-3091
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler, fairscheduler, resourcemanager, scheduler
>            Reporter: Wangda Tan
>
> In existing YARN RM scheduler, there're some issues of using locks. For example:
> - Many unnecessary synchronized locks, we have seen several cases recently that too frequent access of scheduler makes scheduler hang. Which could be addressed by using read/write lock. Components include scheduler, CS queues, apps
> - Some fields not properly locked (Like clusterResource)
> We can address them together in this ticket.
> (More details see comments below)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
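
For illustration only, the read/write-lock idea raised in the description might look like the sketch below. This is not the actual RM scheduler code: the class ClusterResourceSketch and its fields memoryMb and vcores are hypothetical stand-ins for a shared value such as clusterResource. Writers take the exclusive write lock so both fields change together, while frequent readers share the read lock and do not block one another.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for shared scheduler state; not the actual YARN classes.
class ClusterResourceSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Both fields are guarded by `lock` and must be updated together.
    private long memoryMb;
    private int vcores;

    // Writers take the exclusive write lock (correctness: no unlocked updates).
    void addNode(long nodeMemoryMb, int nodeVcores) {
        lock.writeLock().lock();
        try {
            memoryMb += nodeMemoryMb;
            vcores += nodeVcores;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers share the read lock, so concurrent queries do not serialize
    // behind each other (performance: cheaper than one synchronized monitor).
    long[] snapshot() {
        lock.readLock().lock();
        try {
            return new long[] {memoryMb, vcores};
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ClusterResourceSketch cluster = new ClusterResourceSketch();
        Thread[] writers = new Thread[4];
        for (int i = 0; i < writers.length; i++) {
            // Each writer thread registers 1000 hypothetical nodes.
            writers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    cluster.addNode(1024, 1);
                }
            });
            writers[i].start();
        }
        for (Thread t : writers) {
            t.join();
        }
        long[] s = cluster.snapshot();
        System.out.println("memoryMb=" + s[0] + " vcores=" + s[1]);
    }
}
```

Whether a read/write lock or a concurrent data structure fits better depends on the read/write ratio of each component, which this sketch does not measure.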