[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288446#comment-14288446
 ] 

Wangda Tan commented on YARN-3091:
--

I think it could be part of a separate 
fine-grained-lock-enhancement-for-FairScheduler task (if other similar 
fine-grained changes are needed)? To keep each patch easy to review, the 
{{AbstractYarnScheduler - CapacityScheduler - FairScheduler}} sub-task could 
address only the conversion of general synchronized locks to r/w locks.

 [Umbrella] Improve locks of RM scheduler
 

 Key: YARN-3091
 URL: https://issues.apache.org/jira/browse/YARN-3091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, resourcemanager, 
 scheduler
Reporter: Wangda Tan

 In the existing YARN RM scheduler, there are some issues with how locks are 
 used. For example:
 - Many unnecessary synchronized locks. We have recently seen several cases 
 where too-frequent access to the scheduler makes the scheduler hang. This 
 could be addressed by using read/write locks. Affected components include 
 the scheduler, CS queues, and apps.
 - Some fields are not properly locked (like clusterResource).
 We can address them together in this ticket.
 (For more details, see the comments below.)
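
 As a rough illustration of the first point, the sketch below shows the 
 general shape of replacing a coarse synchronized method with a read/write 
 lock. The class SchedulerState and its fields are hypothetical stand-ins, not 
 actual YARN code; only java.util.concurrent.locks.ReentrantReadWriteLock is a 
 real JDK class.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a scheduler component; not actual YARN code.
class SchedulerState {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long clusterMemoryMb; // guarded by lock
    private int clusterVcores;    // guarded by lock

    // Writers (for example, a node joining the cluster) take the exclusive write lock.
    void addNode(long memoryMb, int vcores) {
        lock.writeLock().lock();
        try {
            clusterMemoryMb += memoryMb;
            clusterVcores += vcores;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers share the read lock, so frequent queries do not block one another.
    long[] snapshot() {
        lock.readLock().lock();
        try {
            return new long[] {clusterMemoryMb, clusterVcores};
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        SchedulerState state = new SchedulerState();
        state.addNode(8192L, 8);
        state.addNode(4096L, 4);
        long[] snap = state.snapshot();
        System.out.println(snap[0] + " MB, " + snap[1] + " vcores");
    }
}
```

 Whether this helps in practice depends on the read/write ratio; with mostly 
 writes, a read/write lock may not outperform plain synchronized.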



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288633#comment-14288633
 ] 

Li Lu commented on YARN-3091:
-

Maybe we want to tweak the wording/organization of this JIRA a little bit? In 
the description of this JIRA, two major points are raised:

bq. Many unnecessary synchronized locks, we have seen several cases recently 
that too frequent access of scheduler makes scheduler hang. Which could be 
addressed by using read/write lock. Components include scheduler, CS queues, 
apps
I agree that a readers-writer lock is a viable approach for many 
synchronization performance issues, but other synchronization mechanisms (such 
as concurrent data structures) may also be options. 
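
A minimal sketch of the concurrent-data-structure alternative, assuming a 
hypothetical per-application counter (AppRegistry and its methods are made up 
for illustration and are not YARN classes). It relies only on 
ConcurrentHashMap and LongAdder from the JDK, so no explicit lock is needed:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical registry of per-application container counts; not actual YARN code.
class AppRegistry {
    private final ConcurrentHashMap<String, LongAdder> containersByApp =
            new ConcurrentHashMap<>();

    // computeIfAbsent creates the counter atomically on first use.
    void containerAllocated(String appId) {
        containersByApp.computeIfAbsent(appId, k -> new LongAdder()).increment();
    }

    // Reads never block writers; unknown applications report zero.
    long containers(String appId) {
        LongAdder adder = containersByApp.get(appId);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        AppRegistry registry = new AppRegistry();
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                registry.containerAllocated("app_1");
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(registry.containers("app_1"));
    }
}
```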

bq. Some fields not properly locked (Like clusterResource)
Improperly synchronized accesses may cause data races, and are generally 
considered bugs in Java programs (even though the Java memory model provides 
some guarantees for racy programs). To me, it would be better if the second 
point were categorized as a bug fix, rather than an improvement, for the RM 
scheduler code. 
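
To make the correctness concern concrete, here is one possible way to publish 
a field like clusterResource safely. This is only a sketch under assumptions: 
ClusterTracker and ResourceSnapshot are hypothetical names, and the snapshot 
type is immutable here, which is not necessarily how the real YARN resource 
type behaves.

```java
// Hypothetical sketch; names are illustrative and this is not actual YARN code.
class ClusterTracker {
    // volatile makes each newly published snapshot visible to all reader threads.
    private volatile ResourceSnapshot clusterResource = new ResourceSnapshot(0L, 0);

    // Updates are serialized so that concurrent additions are not lost.
    synchronized void addNode(long memoryMb, int vcores) {
        ResourceSnapshot current = clusterResource;
        clusterResource = new ResourceSnapshot(
                current.memoryMb + memoryMb, current.vcores + vcores);
    }

    // Lock-free read: the caller always sees a consistent (memory, vcores) pair.
    ResourceSnapshot getClusterResource() {
        return clusterResource;
    }

    public static void main(String[] args) {
        ClusterTracker tracker = new ClusterTracker();
        tracker.addNode(8192L, 8);
        ResourceSnapshot seen = tracker.getClusterResource();
        System.out.println(seen.memoryMb + " MB, " + seen.vcores + " vcores");
    }
}

// Immutable value object (an assumption made for this sketch).
final class ResourceSnapshot {
    final long memoryMb;
    final int vcores;

    ResourceSnapshot(long memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }
}
```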

Therefore, maybe we want to solve the problem in two steps: a) fixing 
improperly synchronized data accesses in the RM scheduler (correctness) and b) 
improving synchronization performance for the RM scheduler code (performance)? 
I'm not sure whether there should be two separate JIRAs to track this, or 
whether we can combine both in one giant JIRA. 



[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288746#comment-14288746
 ] 

Rohith commented on YARN-3091:
--

bq. fixing improperly synchronized data accesses in RM scheduler (correctness)
Currently the findbugs exclude XML masks warnings such as 
IS2_INCONSISTENT_SYNC. I believe these exclude lists should be reviewed, given 
that there are now assumptions such as a class being expected to be 
thread-safe. This was recently discussed in the community: 
[Discussion 
thread|http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201412.mbox/%3CCALwhT97BqK_zjQ=MCO_c=Y=7r9ewLN2Ab_qm=vqekvxgzrq...@mail.gmail.com%3E]

To identify the first level of problems, I think enabling these findbugs 
warning types would help.



[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288755#comment-14288755
 ] 

Li Lu commented on YARN-3091:
-

I agree we should review the exclude list for potential synchronization 
problems. However, note that findbugs uses static analysis of Java bytecode, 
which may produce both false positives and false negatives when detecting 
concurrency-related bugs. In the long term we may want to consider other 
tools to help detect improper synchronization (although a perfect solution 
would be hard). For the short term, I think [~leftnoteasy] raised a very valid 
point (this JIRA), so let's get the problems solved. 



[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288400#comment-14288400
 ] 

Wangda Tan commented on YARN-3091:
--

Since some class hierarchies span modules (for example, AbstractYarnScheduler 
is inherited by the Capacity and Fair schedulers), I suggest organizing 
sub-tasks by class family. The sub-tasks I propose are:
# AbstractYarnScheduler - CapacityScheduler - FairScheduler
# SchedulerApplicationAttempt - FiCaSchedulerApp - FSAppAttempt
# AbstractCSQueue - ParentQueue - LeafQueue
# AppSchedulingInfo

I hope to get your thoughts on this. If you agree, I will go ahead and create 
the sub-tickets.

Thanks,
Wangda



[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288460#comment-14288460
 ] 

Varun Saxena commented on YARN-3091:


Ok...



[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288411#comment-14288411
 ] 

Varun Saxena commented on YARN-3091:


Similar to YARN-3008? Maybe that can be linked to this.



[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288421#comment-14288421
 ] 

Varun Saxena commented on YARN-3091:


Yeah. I meant YARN-3008 can probably be made a subtask of this.
It can address this part:
AbstractYarnScheduler - CapacityScheduler - FairScheduler



[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288412#comment-14288412
 ] 

Varun Saxena commented on YARN-3091:


That one is for FairScheduler



[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288417#comment-14288417
 ] 

Wangda Tan commented on YARN-3091:
--

Thanks [~varun_saxena] for pointing this out. I think such fine-grained 
locking enhancements should also be included in this umbrella ticket. This 
JIRA is intended to track scheduler lock improvements in general, not only 
those for one particular scheduler type.



[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288856#comment-14288856
 ] 

Sunil G commented on YARN-3091:
---

From code review, we can expect to find more issues in the areas mentioned by 
[~leftnoteasy].
I think we could start a task to run JCarder to clearly pinpoint the locking 
problems. It could help us design these sub-tasks with more clarity and may 
also help verify the changes afterwards.
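
For context, the kind of problem a lock-graph tool looks for is inconsistent 
lock ordering between threads. The sketch below is illustrative only: 
LockOrderDemo, queueLock and appLock are hypothetical names, not YARN code. It 
shows the safe pattern, where every path acquires the two locks in the same 
order.

```java
// Hypothetical sketch; names are illustrative and this is not actual YARN code.
class LockOrderDemo {
    private final Object queueLock = new Object();
    private final Object appLock = new Object();
    private int outstanding = 0; // guarded by queueLock and appLock together

    // Every method takes queueLock first, then appLock. If one path took them
    // in the opposite order, two threads could each hold one lock while waiting
    // for the other; that cycle in the lock graph is a potential deadlock.
    void allocate() {
        synchronized (queueLock) {
            synchronized (appLock) {
                outstanding++;
            }
        }
    }

    void release() {
        synchronized (queueLock) {
            synchronized (appLock) {
                outstanding--;
            }
        }
    }

    int getOutstanding() {
        synchronized (queueLock) {
            synchronized (appLock) {
                return outstanding;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockOrderDemo demo = new LockOrderDemo();
        Thread allocator = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                demo.allocate();
            }
        });
        Thread releaser = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                demo.release();
            }
        });
        allocator.start();
        releaser.start();
        allocator.join();
        releaser.join();
        System.out.println("outstanding = " + demo.getOutstanding());
    }
}
```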






[jira] [Commented] (YARN-3091) [Umbrella] Improve locks of RM scheduler

2015-01-22 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288878#comment-14288878
 ] 

Tsuyoshi OZAWA commented on YARN-3091:
--

Adding HADOOP-9213 as a related issue; it adds support for JCarder in 
Jenkins CI. 
