[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288878#comment-14288878 ]

Tsuyoshi OZAWA commented on YARN-3091:

Adding HADOOP-9213 as a related issue, which adds support for JCarder with Jenkins CI.

> [Umbrella] Improve locks of RM scheduler
>
> Key: YARN-3091
> URL: https://issues.apache.org/jira/browse/YARN-3091
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacityscheduler, fairscheduler, resourcemanager, scheduler
> Reporter: Wangda Tan
>
> In existing YARN RM scheduler, there're some issues of using locks. For example:
> - Many unnecessary synchronized locks, we have seen several cases recently that too frequent access of scheduler makes scheduler hang. Which could be addressed by using read/write lock. Components include scheduler, CS queues, apps
> - Some fields not properly locked (Like clusterResource)
> We can address them together in this ticket.
> (More details see comments below)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288856#comment-14288856 ]

Sunil G commented on YARN-3091:

A code review will likely turn up more issues in the areas mentioned by [~leftnoteasy]. I suggest we start a task to run JCarder so that we can clearly pinpoint the locking problems. That would help us design these subtasks with more clarity, and it may also help in verifying the changes afterwards.
[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288755#comment-14288755 ]

Li Lu commented on YARN-3091:

I agree we should review the exclude list for potential synchronization problems. However, note that findbugs relies on static analysis of Java code, which may produce both false positives and false negatives when detecting concurrency-related bugs. In the long term we may want to consider other tools to help detect improper synchronization (although a perfect solution would be hard). For the short term, I think [~leftnoteasy] raised a very valid point (this JIRA), so let's get the problems solved.
[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288746#comment-14288746 ]

Rohith commented on YARN-3091:

bq. fixing improperly synchronized data accesses in RM scheduler (correctness)

Currently the findbugs exclude XML masks warnings such as IS2_INCONSISTENT_SYNC. I believe these exclude lists should be reviewed, along with assumptions such as a class being expected to be thread-safe. There was a recent discussion on this in the community: [Discussion thread|http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201412.mbox/%3CCALwhT97BqK_zjQ=MCO_c=Y=7r9ewLN2Ab_qm=vqekvxgzrq...@mail.gmail.com%3E]

For identifying the first level of problems, I think enabling these findbugs warning types would help.
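For illustration, the kind of access pattern that an inconsistent-synchronization check such as IS2_INCONSISTENT_SYNC is aimed at can be sketched as below. This is a minimal sketch, not code from the RM scheduler: the class QueueUsageSketch and all of its members are hypothetical names, and whether findbugs actually reports a given field depends on its own heuristics.

```java
// Hypothetical class for illustration only; not taken from the YARN code base.
final class QueueUsageSketch {
    // Intended to be guarded by the object's intrinsic lock.
    private int usedContainers;

    // Writes always happen while holding the lock.
    synchronized void allocate() {
        usedContainers++;
    }

    synchronized void release() {
        usedContainers--;
    }

    // Problem: this read does not take the lock, so the field is accessed
    // both with and without synchronization. This mixed pattern is what an
    // inconsistent-synchronization check is meant to surface.
    int getUsedContainersRacy() {
        return usedContainers;
    }

    // Consistent version: the read takes the same lock as the writers.
    synchronized int getUsedContainers() {
        return usedContainers;
    }
}
```

Masking the warning in an exclude file hides getUsedContainersRacy() from review, which is why re-enabling the check is a cheap first pass before any lock redesign.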
[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288633#comment-14288633 ]

Li Lu commented on YARN-3091:

Maybe we want to tweak the wording/organization of this JIRA a little bit? The description of this JIRA raises two major points:

bq. Many unnecessary synchronized locks, we have seen several cases recently that too frequent access of scheduler makes scheduler hang. Which could be addressed by using read/write lock. Components include scheduler, CS queues, apps

I agree that a readers-writer lock is a viable approach for many synchronization performance issues, but other synchronization mechanisms (such as concurrent data structures) may also be options.

bq. Some fields not properly locked (Like clusterResource)

Improperly synchronized accesses may cause data races, and are generally considered bugs in Java programs (even though the Java memory model provides some guarantees for racy programs). To me, it would be better to categorize the second point as bug fixes, rather than improvements, for the RM scheduler code.

Therefore, maybe we want to solve the problem in two steps: a) fix improperly synchronized data accesses in the RM scheduler (correctness) and b) improve synchronization performance for the RM scheduler code (performance)? I'm not sure whether there should be two separate JIRAs to track this, or whether we can combine both in one "giant" JIRA.
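A minimal sketch of the synchronized -> read/write lock idea discussed above, assuming a hypothetical holder class for a field like clusterResource. ClusterResourceSketch, addNode and snapshot are illustrative names only and do not correspond to the actual YARN classes or methods; the only real API used is java.util.concurrent.locks.ReentrantReadWriteLock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a scheduler field such as clusterResource;
// not the actual YARN implementation.
final class ClusterResourceSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long memoryMb; // guarded by lock
    private int vcores;    // guarded by lock

    // Writers take the exclusive lock so both fields change together.
    void addNode(long nodeMemoryMb, int nodeVcores) {
        lock.writeLock().lock();
        try {
            memoryMb += nodeMemoryMb;
            vcores += nodeVcores;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers share the lock: they do not block each other, and they
    // never observe a half-applied update from a writer.
    long[] snapshot() {
        lock.readLock().lock();
        try {
            return new long[] {memoryMb, vcores};
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

This covers both points at once: every access to the fields goes through the same lock (correctness), and read-heavy callers no longer serialize behind a single monitor (performance). Whether a read/write lock or a concurrent data structure fits better would depend on the access pattern of each class.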
[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288460#comment-14288460 ]

Varun Saxena commented on YARN-3091:

Ok...
[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288446#comment-14288446 ]

Wangda Tan commented on YARN-3091:

I think maybe it can be part of a separate fine-grained-lock-enhancement-for-FairScheduler task (if other similar fine-grained changes are needed)? To keep every patch easy to review, the {{AbstractYarnScheduler - CapacityScheduler - FairScheduler}} subtask could address only the general synchronized lock -> r/w lock change.
[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288421#comment-14288421 ]

Varun Saxena commented on YARN-3091:

Yeah. I meant YARN-3008 can probably be made one subtask of this. It can address this part: AbstractYarnScheduler - CapacityScheduler - FairScheduler
[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288417#comment-14288417 ]

Wangda Tan commented on YARN-3091:

Thanks [~varun_saxena] for pointing this out. I think such fine-grained locking enhancements should also be included in this umbrella ticket. This JIRA is intended to track scheduler lock improvements in general, not only for one particular scheduler type.
[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288412#comment-14288412 ]

Varun Saxena commented on YARN-3091:

That one is for FairScheduler
[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288411#comment-14288411 ]

Varun Saxena commented on YARN-3091:

Similar to YARN-3008? Maybe that can be linked to this.
[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288400#comment-14288400 ]

Wangda Tan commented on YARN-3091:

Since some class hierarchies span modules (like AbstractYarnScheduler, which is inherited by CS and Fair), I suggest making the subtasks class-family-wise. The subtasks I propose are:
# AbstractYarnScheduler - CapacityScheduler - FairScheduler
# SchedulerApplicationAttempt - FiCaSchedulerApp - FSAppAttempt
# AbstractCSQueue - ParentQueue - LeafQueue
# AppSchedulingInfo

Hope to get your thoughts on this. If you agree, I will go ahead and create sub-tickets.

Thanks,
Wangda