[jira] [Updated] (YARN-2881) Implement PlanFollower for FairScheduler

2014-12-22 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2881:

Attachment: YARN-2881.001.patch

Implements the rest of the Fair Scheduler reservation system. 
Tested with the added unit tests, and by using the sample from YARN-2609 and 
validating in the UI from YARN-2664.

 Implement PlanFollower for FairScheduler
 

 Key: YARN-2881
 URL: https://issues.apache.org/jira/browse/YARN-2881
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2881.001.patch, YARN-2881.prelim.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1206) as

2014-12-22 Thread Mukesh Jha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukesh Jha updated YARN-1206:
-
Summary: as  (was: AM container log link broken on NM web page even though 
local container logs are available)

 as
 --

 Key: YARN-1206
 URL: https://issues.apache.org/jira/browse/YARN-1206
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Rohith
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1206.1.patch, YARN-1206.patch


 With log aggregation disabled, while a container is running its logs link works 
 properly, but after the application finishes, the link shows 'Container 
 does not exist.'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1206) AM container log link broken on NM web page

2014-12-22 Thread Mukesh Jha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukesh Jha updated YARN-1206:
-
Summary: AM container log link broken on NM web page  (was: as)

 AM container log link broken on NM web page
 ---

 Key: YARN-1206
 URL: https://issues.apache.org/jira/browse/YARN-1206
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Rohith
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1206.1.patch, YARN-1206.patch


 With log aggregation disabled, while a container is running its logs link works 
 properly, but after the application finishes, the link shows 'Container 
 does not exist.'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-22 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1429#comment-1429
 ] 

Rohith commented on YARN-2340:
--

[~jianhe] Kindly review the patch

 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2340.patch


 While a job is in progress, change the queue state to STOPPED and then restart the RM. 
 Observe that the standby RM fails to come up as active, throwing the NPE below:
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
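A hypothetical sketch of the kind of defensive check that avoids this NPE during recovery; the actual change in 0001-YARN-2340.patch may differ, and the class, map and names below are illustrative only:

{code}
import java.util.HashMap;
import java.util.Map;

// Simplified illustration: if the application was never registered with the
// scheduler (e.g. because its queue was STOPPED when it was resubmitted during
// recovery), skip the attempt instead of dereferencing a null entry.
public class AddAttemptGuardSketch {
  private final Map<String, Object> applications = new HashMap<String, Object>();

  void addApplicationAttempt(String appId, String attemptId) {
    Object app = applications.get(appId);
    if (app == null) {
      System.out.println("Application " + appId
          + " is not registered with the scheduler; ignoring attempt " + attemptId);
      return; // previously this fell through and threw a NullPointerException
    }
    // ... proceed with normal attempt handling ...
  }

  public static void main(String[] args) {
    new AddAttemptGuardSketch().addApplicationAttempt("application_1406116264351_0014",
        "appattempt_1406116264351_0014_000002");
  }
}
{code}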



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2881) Implement PlanFollower for FairScheduler

2014-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255589#comment-14255589
 ] 

Hadoop QA commented on YARN-2881:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688624/YARN-2881.001.patch
  against trunk revision ecf1469.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 17 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6167//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6167//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6167//console

This message is automatically generated.

 Implement PlanFollower for FairScheduler
 

 Key: YARN-2881
 URL: https://issues.apache.org/jira/browse/YARN-2881
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2881.001.patch, YARN-2881.prelim.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255646#comment-14255646
 ] 

Hudson commented on YARN-2939:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6774 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6774/])
YARN-2939. Fix new findbugs warnings in hadoop-yarn-common. (Li Lu via 
junping_du) (junping_du: rev a696fbb001b946ae75f3b8e962839c2fd3decfa1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/VisualizeStateMachine.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/LinuxResourceCalculatorPlugin.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/PriorityPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/NodeLabelsStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorPlugin.java


 Fix new findbugs warnings in hadoop-yarn-common
 ---

 Key: YARN-2939
 URL: https://issues.apache.org/jira/browse/YARN-2939
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Saxena
Assignee: Li Lu
  Labels: findbugs
 Fix For: 2.7.0

 Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255763#comment-14255763
 ] 

Hudson commented on YARN-2939:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1981 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1981/])
YARN-2939. Fix new findbugs warnings in hadoop-yarn-common. (Li Lu via 
junping_du) (junping_du: rev a696fbb001b946ae75f3b8e962839c2fd3decfa1)
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/NodeLabelsStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/VisualizeStateMachine.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorPlugin.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/LinuxResourceCalculatorPlugin.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/PriorityPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java


 Fix new findbugs warnings in hadoop-yarn-common
 ---

 Key: YARN-2939
 URL: https://issues.apache.org/jira/browse/YARN-2939
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Saxena
Assignee: Li Lu
  Labels: findbugs
 Fix For: 2.7.0

 Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255770#comment-14255770
 ] 

Hudson commented on YARN-2939:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #46 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/46/])
YARN-2939. Fix new findbugs warnings in hadoop-yarn-common. (Li Lu via 
junping_du) (junping_du: rev a696fbb001b946ae75f3b8e962839c2fd3decfa1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/PriorityPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorPlugin.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/LinuxResourceCalculatorPlugin.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/NodeLabelsStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/VisualizeStateMachine.java


 Fix new findbugs warnings in hadoop-yarn-common
 ---

 Key: YARN-2939
 URL: https://issues.apache.org/jira/browse/YARN-2939
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Saxena
Assignee: Li Lu
  Labels: findbugs
 Fix For: 2.7.0

 Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255808#comment-14255808
 ] 

Hudson commented on YARN-2939:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #50 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/50/])
YARN-2939. Fix new findbugs warnings in hadoop-yarn-common. (Li Lu via 
junping_du) (junping_du: rev a696fbb001b946ae75f3b8e962839c2fd3decfa1)
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/PriorityPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/NodeLabelsStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/VisualizeStateMachine.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorPlugin.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/LinuxResourceCalculatorPlugin.java


 Fix new findbugs warnings in hadoop-yarn-common
 ---

 Key: YARN-2939
 URL: https://issues.apache.org/jira/browse/YARN-2939
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Saxena
Assignee: Li Lu
  Labels: findbugs
 Fix For: 2.7.0

 Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255838#comment-14255838
 ] 

Hudson commented on YARN-2939:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2000 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2000/])
YARN-2939. Fix new findbugs warnings in hadoop-yarn-common. (Li Lu via 
junping_du) (junping_du: rev a696fbb001b946ae75f3b8e962839c2fd3decfa1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/VisualizeStateMachine.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/PriorityPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/NodeLabelsStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/LinuxResourceCalculatorPlugin.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorPlugin.java


 Fix new findbugs warnings in hadoop-yarn-common
 ---

 Key: YARN-2939
 URL: https://issues.apache.org/jira/browse/YARN-2939
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Saxena
Assignee: Li Lu
  Labels: findbugs
 Fix For: 2.7.0

 Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2946) DeadLocks in RMStateStore-ZKRMStateStore

2014-12-22 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2946:
-
Attachment: 0003-YARN-2946.patch

 DeadLocks in RMStateStore-ZKRMStateStore
 --

 Key: YARN-2946
 URL: https://issues.apache.org/jira/browse/YARN-2946
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-2946.patch, 0001-YARN-2946.patch, 
 0002-YARN-2946.patch, 0003-YARN-2946.patch, 
 RM_BeforeFix_Deadlock_cycle_1.png, RM_BeforeFix_Deadlock_cycle_2.png, 
 TestYARN2946.java


 Found one deadlock in ZKRMStateStore.
 # In the initial stage, zkClient is null because of a ZooKeeper Disconnected event.
 # While ZKRMStateStore#runWithCheck() does wait(zkSessionTimeout) for zkClient to 
 re-establish the ZooKeeper connection via either a SyncConnected or an Expired event, 
 it is highly possible that another thread obtains the lock on 
 {{ZKRMStateStore.this}} from state-machine transition events. This causes the 
 deadlock in ZKRMStateStore.
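A simplified, hypothetical illustration of the lock-ordering cycle described above; the real cycle involves ZKRMStateStore.this and the RMStateStore state machine (see the attached RM_BeforeFix_Deadlock_cycle_*.png for the actual threads):

{code}
// Two code paths acquire the same pair of locks in opposite order. If they run
// concurrently, each thread holds one lock and blocks on the other: deadlock.
public class LockOrderDeadlockSketch {
  private final Object storeLock = new Object();         // stands in for ZKRMStateStore.this
  private final Object stateMachineLock = new Object();  // stands in for the state-machine lock

  // Path 1: a store operation holds the store lock, then needs the state
  // machine (e.g. to notify it that the ZK operation failed or timed out).
  void storeOperation() {
    synchronized (storeLock) {
      synchronized (stateMachineLock) {
        // notify state machine
      }
    }
  }

  // Path 2: a state-machine transition holds the state-machine lock, then
  // needs the store lock to dispatch a store call.
  void stateTransition() {
    synchronized (stateMachineLock) {
      synchronized (storeLock) {
        // invoke store
      }
    }
  }
}
{code}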



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2946) DeadLocks in RMStateStore-ZKRMStateStore

2014-12-22 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255864#comment-14255864
 ] 

Rohith commented on YARN-2946:
--

Thanks [~jianhe] for the review. I updated the patch to fix the above review comment.
Kindly review the updated patch.

 DeadLocks in RMStateStore-ZKRMStateStore
 --

 Key: YARN-2946
 URL: https://issues.apache.org/jira/browse/YARN-2946
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-2946.patch, 0001-YARN-2946.patch, 
 0002-YARN-2946.patch, 0003-YARN-2946.patch, 
 RM_BeforeFix_Deadlock_cycle_1.png, RM_BeforeFix_Deadlock_cycle_2.png, 
 TestYARN2946.java


 Found one deadlock in ZKRMStateStore.
 # In the initial stage, zkClient is null because of a ZooKeeper Disconnected event.
 # While ZKRMStateStore#runWithCheck() does wait(zkSessionTimeout) for zkClient to 
 re-establish the ZooKeeper connection via either a SyncConnected or an Expired event, 
 it is highly possible that another thread obtains the lock on 
 {{ZKRMStateStore.this}} from state-machine transition events. This causes the 
 deadlock in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common

2014-12-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255888#comment-14255888
 ] 

Varun Saxena commented on YARN-2939:


[~djp], [~jianhe], if you have time, could you have a look at the other 3 similar 
issues as well: YARN-2940, YARN-2937 and YARN-2938?

 Fix new findbugs warnings in hadoop-yarn-common
 ---

 Key: YARN-2939
 URL: https://issues.apache.org/jira/browse/YARN-2939
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Saxena
Assignee: Li Lu
  Labels: findbugs
 Fix For: 2.7.0

 Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-868) YarnClient should set the service address in tokens returned by getRMDelegationToken()

2014-12-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255891#comment-14255891
 ] 

Varun Saxena commented on YARN-868:
---

The findbugs warnings are to be fixed by YARN-2937 to YARN-2940.

 YarnClient should set the service address in tokens returned by 
 getRMDelegationToken()
 --

 Key: YARN-868
 URL: https://issues.apache.org/jira/browse/YARN-868
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Varun Saxena
 Attachments: YARN-868.patch


 Either the client should set this information into the token, or the client 
 layer should expose an API that returns the service address.
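A hedged sketch of the first option (the client sets the service into the token) using the existing SecurityUtil helper; the class name and RM address below are illustrative:

{code}
import java.net.InetSocketAddress;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.token.Token;

public class SetTokenServiceSketch {
  public static void setService(Token<?> rmDelegationToken) {
    // Illustrative address; a real client would use the configured RM address.
    InetSocketAddress rmAddress = new InetSocketAddress("rm.example.com", 8032);
    // Encodes the address into the token's service field so later consumers can
    // locate the RM without consulting the client configuration. Depending on
    // configuration this is stored as host:port or IP:port.
    SecurityUtil.setTokenService(rmDelegationToken, rmAddress);
  }
}
{code}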



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2946) DeadLocks in RMStateStore-ZKRMStateStore

2014-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255964#comment-14255964
 ] 

Hadoop QA commented on YARN-2946:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688659/0003-YARN-2946.patch
  against trunk revision a696fbb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 15 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6168//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6168//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6168//console

This message is automatically generated.

 DeadLocks in RMStateStore-ZKRMStateStore
 --

 Key: YARN-2946
 URL: https://issues.apache.org/jira/browse/YARN-2946
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-2946.patch, 0001-YARN-2946.patch, 
 0002-YARN-2946.patch, 0003-YARN-2946.patch, 
 RM_BeforeFix_Deadlock_cycle_1.png, RM_BeforeFix_Deadlock_cycle_2.png, 
 TestYARN2946.java


 Found one deadlock in ZKRMStateStore.
 # In the initial stage, zkClient is null because of a ZooKeeper Disconnected event.
 # While ZKRMStateStore#runWithCheck() does wait(zkSessionTimeout) for zkClient to 
 re-establish the ZooKeeper connection via either a SyncConnected or an Expired event, 
 it is highly possible that another thread obtains the lock on 
 {{ZKRMStateStore.this}} from state-machine transition events. This causes the 
 deadlock in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2971) RM uses conf instead of token service address to renew timeline delegation tokens

2014-12-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255968#comment-14255968
 ] 

Zhijie Shen commented on YARN-2971:
---

[~jeagles], one question: why would serviceURI be different from restURI? I 
suppose the service will be the [host:port] of the restURI if we get and renew 
the DT at the same ATS?

 RM uses conf instead of token service address to renew timeline delegation 
 tokens
 -

 Key: YARN-2971
 URL: https://issues.apache.org/jira/browse/YARN-2971
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2971-v1.patch


 The TimelineClientImpl renewDelegationToken uses the incorrect web address to 
 renew timeline delegation tokens. It should read the service address out of 
 the token when renewing the delegation token.
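A hedged sketch of the direction the description suggests: derive the renew target from the token itself rather than from the local configuration. This is illustrative only, not the actual TimelineClientImpl change in YARN-2971-v1.patch:

{code}
import java.net.InetSocketAddress;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.token.Token;

public class RenewAddressFromTokenSketch {
  public static InetSocketAddress renewTarget(Token<?> timelineDelegationToken) {
    // The service field ("host:port") was populated when the token was fetched;
    // SecurityUtil turns it back into a socket address for the renew call.
    return SecurityUtil.getTokenServiceAddr(timelineDelegationToken);
  }
}
{code}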



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2014-12-22 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-2933:

Attachment: YARN-2933-2.patch

Thanks [~wangda] for the review.
Makes sense.
Updating the patch.

Thanks,
Mayank

 Capacity Scheduler preemption policy should only consider capacity without 
 labels temporarily
 -

 Key: YARN-2933
 URL: https://issues.apache.org/jira/browse/YARN-2933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
 Attachments: YARN-2933-1.patch, YARN-2933-2.patch


 Currently, we have capacity enforcement on each queue for each label in 
 CapacityScheduler, but we don't have a preemption policy to support that. 
 YARN-2498 targets preemption that respects node labels, but we have 
 some gaps in the code base; for example, queues/FiCaScheduler should be able to get 
 usedResource/pendingResource, etc. by label. These items potentially require 
 refactoring CS, which we need to spend some time thinking about carefully.
 For now, what we can do immediately is calculate ideal_allocation and 
 preempt containers only for resources on nodes without labels, to avoid 
 regressions like the following: a cluster has some nodes with labels and some 
 without; assume queueA isn't satisfied for resources without labels, but the 
 current preemption policy may preempt resources from nodes with labels for 
 queueA, which is not correct.
 Again, this is just a short-term enhancement; YARN-2498 will handle 
 preemption respecting node labels for the Capacity Scheduler, which is our 
 final target. 
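An illustrative sketch (not the ProportionalCapacityPreemptionPolicy code) of the short-term idea: run the ideal_allocation math over the no-label partition only, i.e. the total cluster resource minus the resources on nodes that carry any label, so preemption never touches labeled nodes.

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class NoLabelPreemptionSketch {
  // clusterTotal: total resource of the cluster; onLabeledNodes: sum of the
  // resources of all nodes that have at least one label.
  public static Resource preemptableTotal(Resource clusterTotal, Resource onLabeledNodes) {
    // Component-wise (memory, vcores) subtraction; the result is the resource
    // pool over which ideal_allocation and preemption are computed.
    return Resources.subtract(clusterTotal, onLabeledNodes);
  }
}
{code}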



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2970) NodeLabel operations in RMAdmin CLI get missing in help command.

2014-12-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256052#comment-14256052
 ] 

Varun Saxena commented on YARN-2970:


[~djp], kindly review.

 NodeLabel operations in RMAdmin CLI get missing in help command.
 

 Key: YARN-2970
 URL: https://issues.apache.org/jira/browse/YARN-2970
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Affects Versions: 2.6.0
Reporter: Junping Du
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-2970.patch


 NodeLabel operations in the RMAdmin CLI are missing from the help command, 
 which I noticed while debugging YARN-313; we should add them like the other commands:
 {noformat} 
 yarn rmadmin [-refreshQueues] [-refreshNodes] [-refreshResources] 
 [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
 [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
 [-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]) [-help 
 [cmd]]
-refreshQueues: Reload the queues' acls, states and scheduler specific 
 properties.
 ResourceManager will reload the mapred-queues configuration 
 file.
-refreshNodes: Refresh the hosts information at the ResourceManager.
-refreshResources: Refresh resources of NodeManagers at the 
 ResourceManager.
-refreshSuperUserGroupsConfiguration: Refresh superuser proxy groups 
 mappings
-refreshUserToGroupsMappings: Refresh user-to-groups mappings
-refreshAdminAcls: Refresh acls for administration of ResourceManager
-refreshServiceAcl: Reload the service-level authorization policy file.
 ResoureceManager will reload the authorization policy file.
-getGroups [username]: Get the groups which given user belongs to.
-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]): 
 Update resource on specific node.
-help [cmd]: Displays help for the given command or all commands if none 
 is specified.
-addToClusterNodeLabels [label1,label2,label3] (label splitted by ,): 
 add to cluster node labels
-removeFromClusterNodeLabels [label1,label2,label3] (label splitted by 
 ,): remove from cluster node labels
-replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2]: 
 replace labels on nodes
-directlyAccessNodeLabelStore: Directly access node label store, with this 
 option, all node label related operations will not connect RM. Instead, they 
 will access/modify stored node labels directly. By default, it is false 
 (access via RM). AND PLEASE NOTE: if you configured 
 yarn.node-labels.fs-store.root-dir to a local directory (instead of NFS or 
 HDFS), this option will only work when the command run on the machine where 
 RM is running.
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2420) Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on cluster load

2014-12-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256070#comment-14256070
 ] 

Karthik Kambatla commented on YARN-2420:


Should we resolve this as Won't Fix since we want to move to a world with 
only continuous scheduling? 

 Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on 
 cluster load
 ---

 Key: YARN-2420
 URL: https://issues.apache.org/jira/browse/YARN-2420
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2420) Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on cluster load

2014-12-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan resolved YARN-2420.
---
Resolution: Won't Fix

 Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on 
 cluster load
 ---

 Key: YARN-2420
 URL: https://issues.apache.org/jira/browse/YARN-2420
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2420) Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on cluster load

2014-12-22 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256072#comment-14256072
 ] 

Wei Yan commented on YARN-2420:
---

Sure, I'll close it.

 Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on 
 cluster load
 ---

 Key: YARN-2420
 URL: https://issues.apache.org/jira/browse/YARN-2420
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2014-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256111#comment-14256111
 ] 

Hadoop QA commented on YARN-2933:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688695/YARN-2933-2.patch
  against trunk revision a696fbb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/6169//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 16 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6169//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6169//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6169//console

This message is automatically generated.

 Capacity Scheduler preemption policy should only consider capacity without 
 labels temporarily
 -

 Key: YARN-2933
 URL: https://issues.apache.org/jira/browse/YARN-2933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
 Attachments: YARN-2933-1.patch, YARN-2933-2.patch


 Currently, we have capacity enforcement on each queue for each label in 
 CapacityScheduler, but we don't have a preemption policy to support that. 
 YARN-2498 targets preemption that respects node labels, but we have 
 some gaps in the code base; for example, queues/FiCaScheduler should be able to get 
 usedResource/pendingResource, etc. by label. These items potentially require 
 refactoring CS, which we need to spend some time thinking about carefully.
 For now, what we can do immediately is calculate ideal_allocation and 
 preempt containers only for resources on nodes without labels, to avoid 
 regressions like the following: a cluster has some nodes with labels and some 
 without; assume queueA isn't satisfied for resources without labels, but the 
 current preemption policy may preempt resources from nodes with labels for 
 queueA, which is not correct.
 Again, this is just a short-term enhancement; YARN-2498 will handle 
 preemption respecting node labels for the Capacity Scheduler, which is our 
 final target. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2984) Metrics for container's actual memory usage

2014-12-22 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-2984:
--

 Summary: Metrics for container's actual memory usage
 Key: YARN-2984
 URL: https://issues.apache.org/jira/browse/YARN-2984
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


It would be nice to capture resource usage per container, for a variety of 
reasons. This JIRA is to track memory usage. 

YARN-2965 tracks the resource usage on the node, and the two implementations 
should reuse code as much as possible. 
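A hedged, self-contained sketch of the metric being proposed (not the prelim patch): on Linux, a container process's resident memory can be sampled by reading VmRSS from /proc/pid/status; the NodeManager's existing process-tree utilities do something similar over the container's whole process tree.

{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ContainerRssSketch {
  // Returns the resident set size of the given pid in kB, or -1 if unavailable.
  public static long rssKiloBytes(int pid) throws IOException {
    for (String line : Files.readAllLines(
        Paths.get("/proc/" + pid + "/status"), StandardCharsets.UTF_8)) {
      if (line.startsWith("VmRSS:")) {
        // The line looks like "VmRSS:     551234 kB".
        return Long.parseLong(line.replaceAll("[^0-9]", ""));
      }
    }
    return -1;
  }
}
{code}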





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2984) Metrics for container's actual memory usage

2014-12-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2984:
---
Attachment: yarn-2984-prelim.patch

Posted a prelim patch (yarn-2984-prelim.patch) that captures my intention. I am 
thinking of adding a config to enable/disable this tracking. 

[~rgrandl] - can you take a look and see if this can co-exist with YARN-2965? 

 Metrics for container's actual memory usage
 ---

 Key: YARN-2984
 URL: https://issues.apache.org/jira/browse/YARN-2984
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2984-prelim.patch


 It would be nice to capture resource usage per container, for a variety of 
 reasons. This JIRA is to track memory usage. 
 YARN-2965 tracks the resource usage on the node, and the two implementations 
 should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2938) Fix new findbugs warnings in hadoop-yarn-resourcemanager and hadoop-yarn-applicationhistoryservice

2014-12-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256132#comment-14256132
 ] 

Zhijie Shen commented on YARN-2938:
---

[~varun_saxena], the explanation is detailed and sounds good. Would you 
please rebase the patch?

bq. But as it is a rank 18 findbugs issue, didn't bother changing it. If you 
want, can fix it.

Yes, please do.

 Fix new findbugs warnings in hadoop-yarn-resourcemanager and 
 hadoop-yarn-applicationhistoryservice
 --

 Key: YARN-2938
 URL: https://issues.apache.org/jira/browse/YARN-2938
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Saxena
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: FindBugs Report.html, YARN-2938.001.patch, 
 YARN-2938.002.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2937) Fix new findbugs warnings in hadoop-yarn-nodemanager

2014-12-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256229#comment-14256229
 ] 

Zhijie Shen commented on YARN-2937:
---

[~varun_saxena], thanks for your explanation, which makes sense to me. Again, 
this patch doesn't apply either; please kindly update it. And there is one 
additional comment.

The 
[document|http://docs.oracle.com/javase/7/docs/api/java/io/PrintWriter.html] 
says {{Methods in this class never throw I/O exceptions, although some of its 
constructors may. The client may inquire as to whether any errors have occurred 
by invoking checkError().}} Previously, if an I/O error occurred, we would always 
throw an exception. Now some errors are just recorded as WARN logs. IMHO, we 
should throw an IOException here.
{code}
+if (pw.checkError()) {
+  LOG.warn("Error while closing cgroup file " + path);
{code}
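A hedged sketch of the behaviour suggested above (not necessarily the final patch): since PrintWriter swallows I/O errors, surface them via checkError() and rethrow as an IOException so callers keep seeing write/close failures as before.

{code}
import java.io.IOException;
import java.io.PrintWriter;

public class CgroupWriteSketch {
  public static void writeValue(PrintWriter pw, String path, String value) throws IOException {
    pw.write(value);
    pw.close();
    // PrintWriter never throws from write/close; checkError() reports whether
    // any I/O error occurred, and we convert that into an exception.
    if (pw.checkError()) {
      throw new IOException("Error while writing/closing cgroup file " + path);
    }
  }
}
{code}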

 Fix new findbugs warnings in hadoop-yarn-nodemanager
 

 Key: YARN-2937
 URL: https://issues.apache.org/jira/browse/YARN-2937
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Saxena
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-2937.001.patch, YARN-2937.002.patch, 
 YARN-2937.003.patch, YARN-2937.004.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location

2014-12-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-314:
-

Assignee: (was: Karthik Kambatla)

I haven't been able to spend more time to clean this up. Marking it unassigned 
in case anyone wants to work on this. 

 Schedulers should allow resource requests of different sizes at the same 
 priority and location
 --

 Key: YARN-314
 URL: https://issues.apache.org/jira/browse/YARN-314
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
 Fix For: 2.7.0

 Attachments: yarn-314-prelim.patch


 Currently, resource requests for the same container and locality are expected 
 to all be the same size.
 While it doesn't look like it's needed for apps currently, and can be 
 circumvented by specifying different priorities if absolutely necessary, it 
 seems to me that the ability to request containers with different resource 
 requirements at the same priority level should be there for the future and 
 for completeness' sake.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256362#comment-14256362
 ] 

Jian He commented on YARN-2340:
---

looks good, +1

 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2340.patch


 While a job is in progress, change the queue state to STOPPED and then restart the RM. 
 Observe that the standby RM fails to come up as active, throwing the NPE below:
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed

2014-12-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256368#comment-14256368
 ] 

Jian He commented on YARN-2920:
---

looks good, +1

 CapacityScheduler should be notified when labels on nodes changed
 -

 Key: YARN-2920
 URL: https://issues.apache.org/jira/browse/YARN-2920
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2920.1.patch, YARN-2920.2.patch, YARN-2920.3.patch, 
 YARN-2920.4.patch, YARN-2920.5.patch, YARN-2920.6.patch


 Currently, changes to labels on nodes are only handled by 
 RMNodeLabelsManager, but that is not enough when labels on nodes change:
 - The scheduler should be able to take actions on running containers (like 
 kill/preempt/do-nothing).
 - Used/available capacity in the scheduler should be updated for future 
 planning.
 We need to add a new event to pass such updates to the scheduler.
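An illustrative sketch of such an event (the committed class is NodeLabelsUpdateSchedulerEvent; the class, field and accessor names here are assumptions): it simply carries the new node-to-labels mapping into the scheduler's event handler so the scheduler can update capacities and act on running containers.

{code}
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.NodeId;

public class NodeLabelsUpdateSketchEvent {
  private final Map<NodeId, Set<String>> updatedNodeToLabels;

  public NodeLabelsUpdateSketchEvent(Map<NodeId, Set<String>> updatedNodeToLabels) {
    this.updatedNodeToLabels = updatedNodeToLabels;
  }

  public Map<NodeId, Set<String>> getUpdatedNodeToLabels() {
    return updatedNodeToLabels;
  }
}
{code}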



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256391#comment-14256391
 ] 

Hudson commented on YARN-2920:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6776 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6776/])
YARN-2920. Changed CapacityScheduler to kill containers on nodes where node 
labels are changed. Contributed by  Wangda Tan (jianhe: rev 
fdf042dfffa4d2474e3cac86cfb8fe9ee4648beb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/SchedulerEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerNodeLabelUpdate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/NodeLabelsUpdateSchedulerEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java


 CapacityScheduler should be notified when labels on nodes changed
 -

 Key: YARN-2920
 URL: https://issues.apache.org/jira/browse/YARN-2920
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.7.0

 Attachments: YARN-2920.1.patch, YARN-2920.2.patch, YARN-2920.3.patch, 
 YARN-2920.4.patch, YARN-2920.5.patch, YARN-2920.6.patch


 Currently, changes to labels on nodes are only handled by 
 RMNodeLabelsManager, but that is not enough when labels on nodes change:
 - The scheduler should be able to take actions on running containers (like 
 kill/preempt/do-nothing).
 - Used/available capacity in the scheduler should be updated for future 
 planning.
 We need to add a new event to pass such updates to the scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2929) Adding separator ApplicationConstants.FILE_PATH_SEPARATOR for better Windows support

2014-12-22 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256390#comment-14256390
 ] 

Chris Nauroth commented on YARN-2929:
-

[~ozawa], thank you for providing the additional details.

Right now, the typical workflow is that the application submission context 
controls the Java classpath by setting the {{CLASSPATH}} environment variable.  
If we take the example of MapReduce, the relevant code is in {{YARNRunner}} and 
{{MRApps}}.  There is support in there for handling environment variables in 
cross-platform application submission.  However, even putting that aside, I 
believe there is no problem with using '/' for a Windows job submission if you 
use this technique.  The NodeManager ultimately translates the classpath 
through {{Path}} and bundles the whole classpath into a jar file manifest to be 
referenced by the running container.

I believe the only problem shown in the example is that the application 
submission is trying to set classpath by command-line argument.  Is it possible 
to switch to using the {{CLASSPATH}} environment variable technique, similar to 
how the MapReduce code does it?  If yes, then there is no need for the proposed 
patch.

There are also some other potential issues with trying to pass classpath on the 
command line in Windows.  It's very easy to hit the Windows maximum command 
line length limitation of 8191 characters.  NodeManager already has the logic 
to work around this by bundling the classpath into a jar file manifest, and 
you'd get that for free by using the environment variable technique.
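A hedged sketch of the technique described above: put the classpath into the {{CLASSPATH}} environment variable of the container launch context rather than into a command-line argument, using the cross-platform separator and the {{$$()}} expansion so the NodeManager resolves it on whichever OS it runs on. The class name is illustrative.

{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class CrossPlatformClasspathSketch {
  public static Map<String, String> buildEnv(Configuration conf) {
    Map<String, String> env = new HashMap<String, String>();
    StringBuilder cp = new StringBuilder(Environment.PWD.$$())
        .append(ApplicationConstants.CLASS_PATH_SEPARATOR).append("./*");
    for (String entry : conf.getStrings(
        YarnConfiguration.YARN_APPLICATION_CLASSPATH,
        YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)) {
      cp.append(ApplicationConstants.CLASS_PATH_SEPARATOR).append(entry.trim());
    }
    // The map is later passed to ContainerLaunchContext#setEnvironment(env);
    // the NodeManager expands PWD and the separator for the node's platform.
    env.put(Environment.CLASSPATH.name(), cp.toString());
    return env;
  }
}
{code}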

 Adding separator ApplicationConstants.FILE_PATH_SEPARATOR for better Windows 
 support
 

 Key: YARN-2929
 URL: https://issues.apache.org/jira/browse/YARN-2929
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2929.001.patch


 Some frameworks like Spark are working to run jobs on Windows (SPARK-1825). 
 For better multi-platform support, we should introduce 
 ApplicationConstants.FILE_PATH_SEPARATOR to make file paths 
 platform-independent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-810) Support CGroup ceiling enforcement on CPU

2014-12-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-810:
-
Attachment: YARN-810-3.patch

Rebased and attached a new patch for review.

 Support CGroup ceiling enforcement on CPU
 -

 Key: YARN-810
 URL: https://issues.apache.org/jira/browse/YARN-810
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.1.0-beta, 2.0.5-alpha
Reporter: Chris Riccomini
Assignee: Sandy Ryza
 Attachments: YARN-810-3.patch, YARN-810.patch, YARN-810.patch


 Problem statement:
 YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. 
 Containers are then allowed to request vcores between the minimum and maximum 
 defined in the yarn-site.xml.
 In the case where a single-threaded container requests 1 vcore, with a 
 pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of 
 the core it's using, provided that no other container is also using it. This 
 happens, even though the only guarantee that YARN/CGroups is making is that 
 the container will get at least 1/4th of the core.
 If a second container then comes along, the second container can take 
 resources from the first, provided that the first container is still getting 
 at least its fair share (1/4th).
 There are certain cases where this is desirable. There are also certain cases 
 where it might be desirable to have a hard limit on CPU usage, and not allow 
 the process to go above the specified resource requirement, even if it's 
 available.
 Here's an RFC that describes the problem in more detail:
 http://lwn.net/Articles/336127/
 Solution:
 As it happens, when CFS is used in combination with CGroups, you can enforce 
 a ceiling using two files in cgroups:
 {noformat}
 cpu.cfs_quota_us
 cpu.cfs_period_us
 {noformat}
 The usage of these two files is documented in more detail here:
 https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
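A hypothetical sketch (not the attached patch) of how such a hard ceiling could be written into a container's cgroup, assuming a 100 ms CFS period; the class, method, and parameter names are illustrative:

{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CfsQuotaSketch {
  private static final long PERIOD_US = 100000L; // cpu.cfs_period_us, 100 ms

  // vcores: vcores granted to the container; nodeVcores: vcores advertised by
  // the NM; yarnCpuCores: physical cores YARN may use on this node.
  public static void setCeiling(String containerCgroupDir, int vcores,
      int nodeVcores, int yarnCpuCores) throws IOException {
    long quotaUs = (long) (PERIOD_US * yarnCpuCores * ((double) vcores / nodeVcores));
    Files.write(Paths.get(containerCgroupDir, "cpu.cfs_period_us"),
        Long.toString(PERIOD_US).getBytes(StandardCharsets.UTF_8));
    Files.write(Paths.get(containerCgroupDir, "cpu.cfs_quota_us"),
        Long.toString(quotaUs).getBytes(StandardCharsets.UTF_8));
  }
}
{code}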
 Testing:
 I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, 
 it behaves as described above (it is a soft cap, and allows containers to use 
 more than they asked for). I then tested CFS CPU quotas manually with YARN.
 First, you can see that CFS is in use in the CGroup, based on the file names:
 {noformat}
 [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/
 total 0
 -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs
 drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares
 -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat
 -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release
 -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks
 [criccomi@eat1-qa464 ~]$ sudo -u app cat
 /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us
 100000
 [criccomi@eat1-qa464 ~]$ sudo -u app cat
 /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us
 -1
 {noformat}
 Oddly, it appears that the cfs_period_us is set to .1s, not 1s.
 We can place processes in hard limits. I have process 4370 running YARN 
 container container_1371141151815_0003_01_03 on a host. By default, it's 
 running at ~300% cpu usage.
 {noformat}
 CPU
 4370 criccomi  20   0 1157m 551m  14m S 240.3  0.8  87:10.91 ...
 {noformat}
 When I set the CFS quota:
 {noformat}
 echo 1000 > 
 /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
  CPU
 4370 criccomi  20   0 1157m 563m  14m S  1.0  0.8  90:08.39 ...
 {noformat}
 It drops to 1% usage, and you can see the box has room to spare:
 {noformat}
 Cpu(s):  2.4%us,  1.0%sy,  0.0%ni, 92.2%id,  4.2%wa,  0.0%hi,  0.1%si, 
 0.0%st
 {noformat}
 Turning the quota back to -1:
 {noformat}
 echo -1 > 
 /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
 {noformat}
 Burns the cores again:
 {noformat}
 Cpu(s): 11.1%us,  1.7%sy,  0.0%ni, 83.9%id,  3.1%wa,  0.0%hi,  0.2%si, 
 0.0%st
 CPU
 4370 criccomi  20   0 1157m 563m  14m S 253.9  0.8  89:32.31 ...
 {noformat}
 On my dev box, I was testing CGroups by running a python process eight times, 
 to burn through all the cores, since it was doing as described above (giving 
 extra CPU to the process, even with a cpu.shares limit). Toggling the 
 cfs_quota_us seems to enforce a hard limit.
 Implementation:
 What do you guys think about introducing a variable to 

[jira] [Updated] (YARN-810) Support CGroup ceiling enforcement on CPU

2014-12-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-810:
-
Attachment: YARN-810-4.patch

 Support CGroup ceiling enforcement on CPU
 -

 Key: YARN-810
 URL: https://issues.apache.org/jira/browse/YARN-810
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.1.0-beta, 2.0.5-alpha
Reporter: Chris Riccomini
Assignee: Sandy Ryza
 Attachments: YARN-810-3.patch, YARN-810-4.patch, YARN-810.patch, 
 YARN-810.patch



[jira] [Commented] (YARN-2970) NodeLabel operations in RMAdmin CLI get missing in help command.

2014-12-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256419#comment-14256419
 ] 

Junping Du commented on YARN-2970:
--

Thanks [~varun_saxena] for the patch! 
I think we should keep the sequence of commands consistent with the usage info 
below. I would prefer help to show up last, as is the general convention, so we 
probably should change RMAdminCLI.ADMIN_USAGE. Otherwise it looks fine.
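
To illustrate the ordering point (a toy example, not the actual 
{{RMAdminCLI.ADMIN_USAGE}} definition): with an insertion-ordered map backing 
the usage output, declaring the node-label commands in place and {{-help}} last 
is all it takes to keep the printed help in the desired order:
{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class UsageOrderExample {
  public static void main(String[] args) {
    // Insertion order is what gets printed, so declare commands in the order
    // they should appear and keep -help as the very last entry.
    Map<String, String> usage = new LinkedHashMap<String, String>();
    usage.put("-refreshQueues",
        "Reload the queues' acls, states and scheduler specific properties.");
    usage.put("-addToClusterNodeLabels [label1,label2,label3]",
        "add to cluster node labels");
    usage.put("-removeFromClusterNodeLabels [label1,label2,label3]",
        "remove from cluster node labels");
    usage.put("-replaceLabelsOnNode [node1:port,label1,label2]",
        "replace labels on nodes");
    usage.put("-help [cmd]",
        "Displays help for the given command or all commands if none is specified.");

    for (Map.Entry<String, String> e : usage.entrySet()) {
      System.out.println("   " + e.getKey() + ": " + e.getValue());
    }
  }
}
{code}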


 NodeLabel operations in RMAdmin CLI get missing in help command.
 

 Key: YARN-2970
 URL: https://issues.apache.org/jira/browse/YARN-2970
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Affects Versions: 2.6.0
Reporter: Junping Du
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-2970.patch


 NodeLabel operations in RMAdmin CLI are missing from the help command, which 
 I noticed while debugging YARN-313. We should add them like the other commands:
 {noformat} 
 yarn rmadmin [-refreshQueues] [-refreshNodes] [-refreshResources] 
 [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
 [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
 [-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]) [-help 
 [cmd]]
-refreshQueues: Reload the queues' acls, states and scheduler specific 
 properties.
 ResourceManager will reload the mapred-queues configuration 
 file.
-refreshNodes: Refresh the hosts information at the ResourceManager.
-refreshResources: Refresh resources of NodeManagers at the 
 ResourceManager.
-refreshSuperUserGroupsConfiguration: Refresh superuser proxy groups 
 mappings
-refreshUserToGroupsMappings: Refresh user-to-groups mappings
-refreshAdminAcls: Refresh acls for administration of ResourceManager
-refreshServiceAcl: Reload the service-level authorization policy file.
 ResoureceManager will reload the authorization policy file.
-getGroups [username]: Get the groups which given user belongs to.
-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]): 
 Update resource on specific node.
-help [cmd]: Displays help for the given command or all commands if none 
 is specified.
-addToClusterNodeLabels [label1,label2,label3] (label splitted by ,): 
 add to cluster node labels
-removeFromClusterNodeLabels [label1,label2,label3] (label splitted by 
 ,): remove from cluster node labels
-replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2]: 
 replace labels on nodes
-directlyAccessNodeLabelStore: Directly access node label store, with this 
 option, all node label related operations will not connect RM. Instead, they 
 will access/modify stored node labels directly. By default, it is false 
 (access via RM). AND PLEASE NOTE: if you configured 
 yarn.node-labels.fs-store.root-dir to a local directory (instead of NFS or 
 HDFS), this option will only work when the command run on the machine where 
 RM is running.
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common

2014-12-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256442#comment-14256442
 ] 

Junping Du commented on YARN-2939:
--

bq. Junping Du, Jian He, if you have time, can you have a look at other 3 
similar issues as well. YARN-2940, YARN-2937 and YARN-2938
Sure. I will review them today.


 Fix new findbugs warnings in hadoop-yarn-common
 ---

 Key: YARN-2939
 URL: https://issues.apache.org/jira/browse/YARN-2939
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Saxena
Assignee: Li Lu
  Labels: findbugs
 Fix For: 2.7.0

 Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-810) Support CGroup ceiling enforcement on CPU

2014-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256508#comment-14256508
 ] 

Hadoop QA commented on YARN-810:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688757/YARN-810-3.patch
  against trunk revision fdf042d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 52 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6170//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6170//artifact/patchprocess/newPatchFindbugsWarningshadoop-sls.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6170//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6170//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6170//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6170//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-applications-distributedshell.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6170//console

This message is automatically generated.

 Support CGroup ceiling enforcement on CPU
 -

 Key: YARN-810
 URL: https://issues.apache.org/jira/browse/YARN-810
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.1.0-beta, 2.0.5-alpha
Reporter: Chris Riccomini
Assignee: Sandy Ryza
 Attachments: YARN-810-3.patch, YARN-810-4.patch, YARN-810.patch, 
 YARN-810.patch



[jira] [Commented] (YARN-810) Support CGroup ceiling enforcement on CPU

2014-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256529#comment-14256529
 ] 

Hadoop QA commented on YARN-810:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688764/YARN-810-4.patch
  against trunk revision fdf042d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 52 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6171//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6171//artifact/patchprocess/newPatchFindbugsWarningshadoop-sls.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6171//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-applications-distributedshell.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6171//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6171//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6171//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6171//console

This message is automatically generated.

 Support CGroup ceiling enforcement on CPU
 -

 Key: YARN-810
 URL: https://issues.apache.org/jira/browse/YARN-810
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.1.0-beta, 2.0.5-alpha
Reporter: Chris Riccomini
Assignee: Sandy Ryza
 Attachments: YARN-810-3.patch, YARN-810-4.patch, YARN-810.patch, 
 YARN-810.patch



[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager

2014-12-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256547#comment-14256547
 ] 

Junping Du commented on YARN-914:
-

Hi [~mingma], Thanks for comments here.
bq. So YARN will reduce the capacity of the nodes as part of the decomission 
process until all its map output are fetched or until all the applications the 
node touches have completed?
Yes. I am not sure it is necessary for YARN to additionally mark the node as 
decommissioned, since the node's resource is already updated to 0 and no 
container will get a chance to be allocated on it. Auxiliary services should 
still be running, and they shouldn't consume much resource if there are no 
service requests.

bq. In addition, it will be interesting to understand how you handle long 
running jobs.
Do you mean long-running services? 
First, I think we should support a timeout when draining a node's resources 
(ResourceOption already has a timeout in the design), so running containers are 
preempted if they run out of time. 
Second, we should support a special container tag for long-running services 
(there is some discussion in YARN-1039) so we don't have to wait for such 
containers to finish until the timeout. 
Third, from an operations perspective, we could add a long-running label to 
specific nodes and try not to decommission nodes that carry it.
Let me know if this makes sense to you.
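
On the first point, a minimal sketch of what the drain step could look like 
with the existing records API (the host/port below are placeholders, and the 
over-commit timeout semantics are exactly the part that still needs the work 
described above):
{code}
import java.util.Collections;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceOption;

public class DrainNodeSketch {
  public static void main(String[] args) {
    // Placeholder node; in practice this comes from the nodes being decommissioned.
    NodeId node = NodeId.newInstance("nm-host.example.com", 45454);

    // Shrink the node to zero capacity so nothing new is scheduled on it, and
    // carry an over-commit timeout after which running containers may be
    // preempted (units and enforcement follow ResourceOption's contract).
    ResourceOption drained =
        ResourceOption.newInstance(Resource.newInstance(0, 0), 30 * 60);

    Map<NodeId, ResourceOption> update = Collections.singletonMap(node, drained);

    // This map is what an update-node-resource request to the RM admin
    // protocol would carry -- the same path `yarn rmadmin -updateNodeResource`
    // uses -- before the graceful-decommission logic waits for the drain.
    System.out.println(update);
  }
}
{code}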


 Support graceful decommission of nodemanager
 

 Key: YARN-914
 URL: https://issues.apache.org/jira/browse/YARN-914
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Luke Lu
Assignee: Junping Du

 When NMs are decommissioned for non-fault reasons (capacity change etc.), 
 it's desirable to minimize the impact to running applications.
 Currently if a NM is decommissioned, all running containers on the NM need to 
 be rescheduled on other NMs. Further more, for finished map tasks, if their 
 map output are not fetched by the reducers of the job, these map tasks will 
 need to be rerun as well.
 We propose to introduce a mechanism to optionally gracefully decommission a 
 node manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256584#comment-14256584
 ] 

Hadoop QA commented on YARN-2340:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687993/0001-YARN-2340.patch
  against trunk revision fdf042d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 15 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6173//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6173//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6173//console

This message is automatically generated.

 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2340.patch


 While a job is in progress, change the Queue state to STOPPED and then restart the RM. 
 Observe that the standby RM fails to come up as active, throwing the NPE below:
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2946) DeadLocks in RMStateStore-ZKRMStateStore

2014-12-22 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256589#comment-14256589
 ] 

Rohith commented on YARN-2946:
--

Checked the test failures: TestContainerAllocation fails intermittently in trunk,
and I am looking into the TestRMRestart test failure.
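
For reviewers who have not looked at the attached cycle diagrams yet, a 
generic, runnable illustration of how two monitors taken in opposite orders 
produce a deadlock cycle (deliberately toy code, not the 
RMStateStore/ZKRMStateStore implementation):
{code}
public class DeadlockCycleDemo {
  private static final Object STORE_LOCK = new Object(); // stands in for the store-level lock
  private static final Object ZK_LOCK = new Object();    // stands in for ZKRMStateStore.this

  public static void main(String[] args) {
    Thread dispatcher = new Thread(() -> {
      synchronized (STORE_LOCK) {          // e.g. handling a store event
        sleep(100);
        synchronized (ZK_LOCK) {           // then calling into the ZK layer
          System.out.println("dispatcher done");
        }
      }
    }, "event-dispatcher");

    Thread zkRetry = new Thread(() -> {
      synchronized (ZK_LOCK) {             // e.g. riding out a ZK reconnect
        sleep(100);
        synchronized (STORE_LOCK) {        // then notifying the store
          System.out.println("zk-retry done");
        }
      }
    }, "zk-retry");

    dispatcher.start();
    zkRetry.start();                       // with the sleeps, this deadlocks
  }

  private static void sleep(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
  }
}
{code}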

 DeadLocks in RMStateStore-ZKRMStateStore
 --

 Key: YARN-2946
 URL: https://issues.apache.org/jira/browse/YARN-2946
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-2946.patch, 0001-YARN-2946.patch, 
 0002-YARN-2946.patch, 0003-YARN-2946.patch, 
 RM_BeforeFix_Deadlock_cycle_1.png, RM_BeforeFix_Deadlock_cycle_2.png, 
 TestYARN2946.java


 Found one deadlock in ZKRMStateStore.
 # In the initial stage, zkClient is null because of a ZK Disconnected event.
 # While ZKRMStateStore#runWithCheck() does wait(zkSessionTimeout) for zkClient to 
 re-establish the ZooKeeper connection via either a SyncConnected or an Expired 
 event, it is highly possible that another thread obtains the lock on 
 {{ZKRMStateStore.this}} from state machine transition events. This causes a 
 deadlock in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2946) DeadLocks in RMStateStore-ZKRMStateStore

2014-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256592#comment-14256592
 ] 

Hadoop QA commented on YARN-2946:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688659/0003-YARN-2946.patch
  against trunk revision fdf042d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 15 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6172//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6172//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6172//console

This message is automatically generated.

 DeadLocks in RMStateStore-ZKRMStateStore
 --

 Key: YARN-2946
 URL: https://issues.apache.org/jira/browse/YARN-2946
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-2946.patch, 0001-YARN-2946.patch, 
 0002-YARN-2946.patch, 0003-YARN-2946.patch, 
 RM_BeforeFix_Deadlock_cycle_1.png, RM_BeforeFix_Deadlock_cycle_2.png, 
 TestYARN2946.java





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2946) DeadLocks in RMStateStore-ZKRMStateStore

2014-12-22 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256618#comment-14256618
 ] 

Rohith commented on YARN-2946:
--

Since handleStoreEvent() is called from the event dispatcher for RMApp store 
events and synchronously for DT store events, TestRMRestart was overriding 
handleStoreEvent() to simulate the test scenario, which was causing the 
startup failure.
I will correct the test case and update the patch.

 DeadLocks in RMStateStore-ZKRMStateStore
 --

 Key: YARN-2946
 URL: https://issues.apache.org/jira/browse/YARN-2946
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-2946.patch, 0001-YARN-2946.patch, 
 0002-YARN-2946.patch, 0003-YARN-2946.patch, 
 RM_BeforeFix_Deadlock_cycle_1.png, RM_BeforeFix_Deadlock_cycle_2.png, 
 TestYARN2946.java





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-22 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256631#comment-14256631
 ] 

Rohith commented on YARN-2340:
--

There are so many tests failing randomly in trunk!

 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2340.patch





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256637#comment-14256637
 ] 

Jian He commented on YARN-2340:
---

Right, we should spend time fixing these.

 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2340.patch





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256641#comment-14256641
 ] 

Hudson commented on YARN-2340:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6777 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6777/])
YARN-2340. Fixed NPE when queue is stopped during RM restart. Contributed by 
Rohith Sharmaks (jianhe: rev 0d89859b51157078cc504ac81dc8aa75ce6b1782)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 NPE thrown when RM restart after queue is STOPPED. There after RM can not 
 recovery application's and remain in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: Capacityscheduler with Queue a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.0

 Attachments: 0001-YARN-2340.patch





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)