[jira] [Updated] (YARN-3717) Improve RM node labels web UI

2015-08-27 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3717:

Attachment: (was: YARN-3717.20150826-1.patch)

 Improve RM node labels web UI
 -

 Key: YARN-3717
 URL: https://issues.apache.org/jira/browse/YARN-3717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
 YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, 
 YARN-3717.20150825-1.patch


 1. Add the default node-label expression for each queue in the scheduler page.
 2. In the Application/AppAttempt page, show the application's configured node label 
 expression for the AM and the job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2015-08-27 Thread Xianyin Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyin Xin updated YARN-4090:
--
Summary: Make Collections.sort() more efficient in FSParentQueue.java  
(was: Make Collections.sort() more efficient in FSParent.java)

 Make Collections.sort() more efficient in FSParentQueue.java
 

 Key: YARN-4090
 URL: https://issues.apache.org/jira/browse/YARN-4090
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Xianyin Xin

 Collections.sort() consumes too much time in a scheduling round.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2015-08-27 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716785#comment-14716785
 ] 

Xianyin Xin commented on YARN-4090:
---

We should also pay attention to the ReadLock.lock() and unlock() calls in the 
first image, which also cost much time.

 Make Collections.sort() more efficient in FSParentQueue.java
 

 Key: YARN-4090
 URL: https://issues.apache.org/jira/browse/YARN-4090
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Xianyin Xin
 Attachments: sampling1.jpg, sampling2.jpg


 Collections.sort() consumes too much time in a scheduling round.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3893:
---
Attachment: 0009-YARN-3893.patch

Attaching a patch after handling the comments.
# timeout updated in the test case
# renamed {{ACTIVE_REFRESH_FAIL}} to {{TRANSITION_TO_ACTIVE_FAILED}}
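
For context, a minimal sketch of the fail-back idea being discussed here: if 
{{refreshAll()}} fails during the transition to active, the RM steps back to standby 
instead of staying active alongside the other RM. All class and method names below 
are illustrative assumptions, not the committed patch.

{code}
// Illustrative only: class, interface, and method names here are assumptions,
// not the committed YARN-3893 patch.
class AdminServiceSketch {

  interface ResourceManagerLike {
    void transitionToActive() throws Exception;
    void transitionToStandby() throws Exception;
  }

  private final ResourceManagerLike rm;

  AdminServiceSketch(ResourceManagerLike rm) {
    this.rm = rm;
  }

  synchronized void transitionToActive() throws Exception {
    rm.transitionToActive();            // become active first
    try {
      refreshAll();                     // reload queues, ACLs, user-group mappings
    } catch (Exception refreshFailure) {
      // On any refresh failure, step back to standby so the two RMs can never
      // both report "active", then surface the failure to the caller.
      rm.transitionToStandby();
      throw new Exception(
          "Transition to active failed while running refreshAll()", refreshFailure);
    }
  }

  private void refreshAll() throws Exception {
    // e.g. refreshQueues(); refreshAdminAcls(); refreshUserToGroupsMappings();
  }
}
{code}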

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 0009-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Both RMs will continuously try to become active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4067) available resource could be set negative

2015-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716638#comment-14716638
 ] 

Hadoop QA commented on YARN-4067:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 43s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 43s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 28s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 50s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 28s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  58m 12s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  97m 54s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12751600/YARN-4067.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0bf2854 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8928/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8928/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8928/console |


This message was automatically generated.

 available resource could be set negative
 

 Key: YARN-4067
 URL: https://issues.apache.org/jira/browse/YARN-4067
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Chang Li
Assignee: Chang Li
 Attachments: YARN-4067.patch


 As mentioned in YARN-4045 by [~leftnoteasy], available memory could go 
 negative due to reservations. Propose to use componentwiseMax in 
 updateQueueStatistics in order to cap negative values at zero.
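
 A minimal sketch of the clamping idea, assuming the {{Resources}} utility methods 
 named above; the surrounding update logic is simplified and is not the actual patch:

 {code}
 import org.apache.hadoop.yarn.api.records.Resource;
 import org.apache.hadoop.yarn.util.resource.Resources;

 // Minimal sketch of the clamping idea; the surrounding update logic is
 // simplified and is not the actual patch.
 class AvailableResourceSketch {
   static Resource clampedAvailable(Resource clusterLimit, Resource used) {
     Resource available = Resources.subtract(clusterLimit, used);
     // With reservations, "used" can exceed the limit and make a component
     // negative; cap each component (memory, vcores) at zero.
     return Resources.componentwiseMax(available, Resources.none());
   }
 }
 {code}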



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3717) Improve RM node labels web UI

2015-08-27 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3717:

Attachment: YARN-3717.20150826-1.patch

 Improve RM node labels web UI
 -

 Key: YARN-3717
 URL: https://issues.apache.org/jira/browse/YARN-3717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
 YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, 
 YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch


 1. Add the default node-label expression for each queue in the scheduler page.
 2. In the Application/AppAttempt page, show the application's configured node label 
 expression for the AM and the job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4090) Make Collections.sort() more efficient in FSParent.java

2015-08-27 Thread Xianyin Xin (JIRA)
Xianyin Xin created YARN-4090:
-

 Summary: Make Collections.sort() more efficient in FSParent.java
 Key: YARN-4090
 URL: https://issues.apache.org/jira/browse/YARN-4090
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Xianyin Xin


Collections.sort() consumes too much time in a scheduling round.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716784#comment-14716784
 ] 

Bibin A Chundatt commented on YARN-3893:


Test case failures are not related to this patch.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Both RMs will continuously try to become active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3717) Improve RM node labels web UI

2015-08-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716670#comment-14716670
 ] 

Naganarasimha G R commented on YARN-3717:
-

Seems like there is some issue in the build process: test results are not getting 
reported properly. Deleting and re-uploading the patch.

 Improve RM node labels web UI
 -

 Key: YARN-3717
 URL: https://issues.apache.org/jira/browse/YARN-3717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
 YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, 
 YARN-3717.20150825-1.patch


 1. Add the default node-label expression for each queue in the scheduler page.
 2. In the Application/AppAttempt page, show the application's configured node label 
 expression for the AM and the job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2015-08-27 Thread Xianyin Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyin Xin updated YARN-4090:
--
Attachment: sampling2.jpg
sampling1.jpg

I constructed a queue hierarchy with 3 levels: root has three children, child1, 
child2 and child3; child1 has leaf queues child1.child1~10, while child2 and child3 
each have leaf queues child1~15, so the number of leaf queues is 40. A total of 
1000 apps ran randomly on the leaf queues. The sampling results show that about 
2/3 of the CPU time of FSParentQueue.assignContainers() was spent in 
Collections.sort(). Within Collections.sort(), about 40% was spent in 
SchedulerApplicationAttempt.getCurrentConsumption() and about 36% in 
Resources.subtract(). The former is costly because FSParentQueue.getResourceUsage() 
recurses over its children, while for the latter the clone() inside subtract() 
takes much CPU time.
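
A minimal sketch of why such a comparator-driven sort gets expensive, and of the 
usual mitigation of snapshotting usage once per sort; the types below are simplified 
stand-ins, not the real FSParentQueue code:

{code}
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the fair-scheduler types; this only illustrates
// the cost pattern described above and one common mitigation.
class QueueSortSketch {
  interface Schedulable {
    String getName();
    long getResourceUsage();   // in the real code this recurses over children
  }

  // Naive: the comparator recomputes usage on every comparison, so an
  // O(n log n) sort triggers O(n log n) recursive usage computations
  // (plus any Resource cloning those computations do).
  static void sortNaive(List<Schedulable> queues) {
    Collections.sort(queues, new Comparator<Schedulable>() {
      @Override
      public int compare(Schedulable a, Schedulable b) {
        return Long.compare(a.getResourceUsage(), b.getResourceUsage());
      }
    });
  }

  // Mitigation: compute each queue's usage once, then sort on the snapshot.
  static void sortWithSnapshot(List<Schedulable> queues) {
    final Map<String, Long> usage = new HashMap<String, Long>();
    for (Schedulable q : queues) {
      usage.put(q.getName(), q.getResourceUsage());
    }
    Collections.sort(queues, new Comparator<Schedulable>() {
      @Override
      public int compare(Schedulable a, Schedulable b) {
        return Long.compare(usage.get(a.getName()), usage.get(b.getName()));
      }
    });
  }
}
{code}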

 Make Collections.sort() more efficient in FSParentQueue.java
 

 Key: YARN-4090
 URL: https://issues.apache.org/jira/browse/YARN-4090
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Xianyin Xin
 Attachments: sampling1.jpg, sampling2.jpg


 Collections.sort() consumes too much time in a scheduling round.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716779#comment-14716779
 ] 

Hadoop QA commented on YARN-3893:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 26s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  2s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 51s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 32s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  53m 42s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m 44s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752740/0010-YARN-3893.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0bf2854 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8929/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8929/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8929/console |


This message was automatically generated.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Both RMs will continuously try to become active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716642#comment-14716642
 ] 

Naganarasimha G R commented on YARN-3893:
-

Hi [~bibinchundatt],
2. There are test cases related to transition in 
TestRMAdminService.testRMHAWithFileSystemBasedConfiguration, but most of it is 
covered in TestRMHA, so I think it should be fine.

3. Well, IMHO it would be better handled by the latter approach I suggested: 
{{refreshAll}} is just a private method, while the actual operation that failed 
is transitionToActive, which is more readable than 
{{ACTIVE_REFRESH_FAIL}}.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Both RMs will continuously try to become active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716645#comment-14716645
 ] 

Naganarasimha G R commented on YARN-3893:
-

Oops, saw this message late!

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Both RMs will continuously try to become active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3893:
---
Attachment: 0010-YARN-3893.patch

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Both RMs will continuously try to become active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716644#comment-14716644
 ] 

Naganarasimha G R commented on YARN-3893:
-

Oops, saw this message late!

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Both RMs will continuously try to become active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717073#comment-14717073
 ] 

Varun Saxena commented on YARN-4074:


OK, will have a look. We don't need to support a query like "list all the flow 
runs for a flow"?

 [timeline reader] implement support for querying for flows and flow runs
 

 Key: YARN-4074
 URL: https://issues.apache.org/jira/browse/YARN-4074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: YARN-4074-YARN-2928.POC.001.patch


 Implement support for querying for flows and flow runs.
 We should be able to query for the most recent N flows, etc.
 This includes changes to the {{TimelineReader}} API if necessary, as well as 
 implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity

2015-08-27 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4091:
--
Attachment: Improvement on debugdiagnostic information - YARN.pdf

 Improvement: Introduce more debug/diagnostics information to detail out 
 scheduler activity
 --

 Key: YARN-4091
 URL: https://issues.apache.org/jira/browse/YARN-4091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, resourcemanager
Affects Versions: 2.7.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Improvement on debugdiagnostic information - YARN.pdf


 As schedulers gain various new capabilities, more configurations that tune the 
 schedulers start to take actions such as limiting the containers assigned to an 
 application or introducing a delay before allocating a container. No clear 
 information is passed down from the scheduler to the outside world in these 
 scenarios, which makes debugging much tougher.
 This ticket is an effort to introduce more well-defined states at the various 
 points where the scheduler skips or rejects a container assignment, activates an 
 application, etc. Such information will help users understand what is happening 
 in the scheduler.
 Attaching a short proposal for initial discussion. We would like to improve 
 on this as we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717146#comment-14717146
 ] 

Sangjin Lee commented on YARN-4074:
---

cc [~gtCarrera9] and [~vrushalic] also for their thoughts.

There are some options for this, and there are pros and cons. I'm leaning 
towards the current proposal ((1) below) for now, but we could enhance this 
later as the UI jells more.

# do a specific entity query for each of the flow runs obtained from the flow 
activity entity
# return all flow runs (possibly with limits and time windows) for the given 
flow
# do a single query for all flow runs specified as a list of flow run id's

One interesting thing to note is that a flow activity entity (record) is an 
activity of that flow *for a given day*. In other words, there can be multiple 
flow activity entities for the same flow. The flow runs that are returned in 
the flow activity entity are only for that given day.

Then the question is, when I click that flow activity record, what flow runs do 
I expect to see? It's a bit ambiguous, but I think it might make more sense to 
return only the flow runs that are referenced in that particular day if we're 
using the flow activity to render the landing page.

If we assume that, then (2) is probably not needed for this. That leaves us 
with (1) or (3). The benefit of (1) is that it fits easily into the existing 
reader API (getEntity). The downside is that you may need to make multiple 
reader calls to retrieve the flow runs. But normally the number of flow runs in 
a day for a given flow should be very small, so it might not be a big deal.

One hybrid approach may be for the REST API to support URLs based on the list 
while the web service code makes multiple reader getEntity() calls; a rough 
sketch of that follows below. We'd still need to define the form of the URLs to 
support that type of query.

Thoughts?
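
To make the hybrid idea concrete, a rough sketch; the reader interface and entity 
type below are hypothetical stand-ins, not the actual YARN-2928 reader API:

{code}
import java.util.ArrayList;
import java.util.List;

// TimelineReaderLike and FlowRunEntity are hypothetical stand-ins for the
// YARN-2928 reader API, not its actual types or signatures.
class FlowRunLookupSketch {

  interface TimelineReaderLike {
    FlowRunEntity getEntity(String cluster, String user, String flow, long runId);
  }

  static class FlowRunEntity {
    // fields elided
  }

  // The web service would parse the run ids out of the request URL and then
  // issue one getEntity() call per run. Since the number of runs referenced by
  // one day's flow activity record is expected to be small, the extra calls
  // should be cheap.
  static List<FlowRunEntity> fetchFlowRuns(TimelineReaderLike reader,
      String cluster, String user, String flow, List<Long> runIds) {
    List<FlowRunEntity> runs = new ArrayList<FlowRunEntity>();
    for (Long runId : runIds) {
      runs.add(reader.getEntity(cluster, user, flow, runId));
    }
    return runs;
  }
}
{code}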

 [timeline reader] implement support for querying for flows and flow runs
 

 Key: YARN-4074
 URL: https://issues.apache.org/jira/browse/YARN-4074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: YARN-4074-YARN-2928.POC.001.patch


 Implement support for querying for flows and flow runs.
 We should be able to query for the most recent N flows, etc.
 This includes changes to the {{TimelineReader}} API if necessary, as well as 
 implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity

2015-08-27 Thread Sunil G (JIRA)
Sunil G created YARN-4091:
-

 Summary: Improvement: Introduce more debug/diagnostics information 
to detail out scheduler activity
 Key: YARN-4091
 URL: https://issues.apache.org/jira/browse/YARN-4091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, resourcemanager
Affects Versions: 2.7.0
Reporter: Sunil G
Assignee: Sunil G


As schedulers gain various new capabilities, more configurations that tune the 
schedulers start to take actions such as limiting the containers assigned to an 
application or introducing a delay before allocating a container. No clear 
information is passed down from the scheduler to the outside world in these 
scenarios, which makes debugging much tougher.

This ticket is an effort to introduce more well-defined states at the various 
points where the scheduler skips or rejects a container assignment, activates an 
application, etc. Such information will help users understand what is happening 
in the scheduler.

Attaching a short proposal for initial discussion. We would like to improve on 
this as we discuss.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4036) Findbugs warnings in hadoop-yarn-server-common

2015-08-27 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4036:
---
Attachment: findbugs_report.html

 Findbugs warnings in hadoop-yarn-server-common
 --

 Key: YARN-4036
 URL: https://issues.apache.org/jira/browse/YARN-4036
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: findbugs_report.html


 Refer to 
 https://issues.apache.org/jira/browse/YARN-3232?focusedCommentId=14679146&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14679146



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717131#comment-14717131
 ] 

Jian He commented on YARN-4087:
---

[~bibinchundatt], the logic is that the default value for RM_FAIL_FAST is the 
value of YARN_FAIL_FAST.
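
A small sketch of the fallback chain being described; RM_FAIL_FAST and 
YARN_FAIL_FAST are taken from the comment, while DEFAULT_YARN_FAIL_FAST as the 
final fallback is an assumption:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch of the fallback chain described above; DEFAULT_YARN_FAIL_FAST as the
// final fallback is an assumption.
class FailFastSketch {
  static boolean shouldRMFailFast(Configuration conf) {
    // RM_FAIL_FAST, if unset, falls back to YARN_FAIL_FAST, which in turn
    // falls back to the compiled-in default.
    return conf.getBoolean(YarnConfiguration.RM_FAIL_FAST,
        conf.getBoolean(YarnConfiguration.YARN_FAIL_FAST,
            YarnConfiguration.DEFAULT_YARN_FAIL_FAST));
  }
}
{code}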

 Set YARN_FAIL_FAST to be false by default
 -

 Key: YARN-4087
 URL: https://issues.apache.org/jira/browse/YARN-4087
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-4087.1.patch


 Increasingly, I feel setting this property to be false makes more sense 
 especially in production environment, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717065#comment-14717065
 ] 

Varun Saxena commented on YARN-3528:


+1, latest patch LGTM

 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
 YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible for scheduled or precommit tests to run 
 consistently on the ASF Jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep for 12345 shows up many places in the test suite where this 
 practice has developed:
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through port scanning and dynamic port allocation. 
 Can someone please do this?
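
 One common way to avoid hard-coded ports in tests is to ask the OS for a free 
 ephemeral port just before starting the service under test; a generic sketch, 
 not the YARN-3528 patch itself:

 {code}
 import java.io.IOException;
 import java.net.ServerSocket;

 // Generic illustration only, not the YARN-3528 patch: ask the OS for a free
 // ephemeral port just before starting the service under test.
 class FreePortSketch {
   static int findFreePort() throws IOException {
     ServerSocket socket = new ServerSocket(0);   // port 0 = any free port
     try {
       return socket.getLocalPort();
     } finally {
       socket.close();
     }
   }
 }
 {code}

 There is still a small window in which another process could grab the port after 
 the socket is closed, so binding the service to port 0 directly and asking it which 
 port it received is even safer where the service supports it.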



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3717) Improve RM node labels web UI

2015-08-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716967#comment-14716967
 ] 

Naganarasimha G R commented on YARN-3717:
-

Hi [~leftnoteasy], 
based on the previous Jenkins report the patch seems to be in a reviewable state. 
Can you take a look at the patch?

 Improve RM node labels web UI
 -

 Key: YARN-3717
 URL: https://issues.apache.org/jira/browse/YARN-3717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
 YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, 
 YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch


 1. Add the default node-label expression for each queue in the scheduler page.
 2. In the Application/AppAttempt page, show the application's configured node label 
 expression for the AM and the job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3717) Improve RM node labels web UI

2015-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716903#comment-14716903
 ] 

Hadoop QA commented on YARN-3717:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  25m  1s | Pre-patch trunk has 7 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 7 new or modified test files. |
| {color:green}+1{color} | javac |   7m 50s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  2s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   3m  2s | Site still builds. |
| {color:red}-1{color} | checkstyle |   2m 45s | The applied patch generated  3 
new checkstyle issues (total was 16, now 18). |
| {color:green}+1{color} | whitespace |   0m 12s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 31s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   7m 32s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   7m  0s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  1s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   3m 57s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |  58m  0s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 131m 34s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752747/YARN-3717.20150826-1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / 0bf2854 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8930/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/console |


This message was automatically generated.

 Improve RM node labels web UI
 -

 Key: YARN-3717
 URL: https://issues.apache.org/jira/browse/YARN-3717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
 YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, 
 YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch


 1. Add the default node-label expression for each queue in the scheduler page.
 2. In the Application/AppAttempt page, show the application's configured node label 
 expression for the AM and the job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4056) Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities}

2015-08-27 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717267#comment-14717267
 ] 

Srikanth Kandula commented on YARN-4056:


I looked. Sort of similar, but not really. The similarity is that both allow 
multiple containers to be allocated within fewer calls. 

The difference is in the policies and the complexity. Bundling allows any 
arbitrary subset of 'legit' tasks to be assigned, whereas assignMultiple simply 
assigns the first few. For example, bundling can decide that the 2nd, 3rd and 
10th tasks are a good choice in contrast to assigning just the 1st task (the 
others may not fit); assignMultiple does not allow for this.

Bundling is slightly more complex because the actual assignment is deferred 
until the loop finishes, whereas assignMultiple assigns each task in place and 
keeps going.

The patch is with [~chris.douglas] for an internal review.

We are pushing out a bundler that mimics the current scheduler. All the tests 
pass and there is no performance change, as expected. Note, however, that the 
allocations are still deferred.

Better bundlers are in the works.
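
A toy sketch of the difference described above; the request type and the "fits" 
check are stand-ins, not the actual scheduler code:

{code}
import java.util.ArrayList;
import java.util.List;

// Toy illustration of assignMultiple-style vs. bundling-style selection;
// the request type and the "fits" check are stand-ins, not scheduler code.
class BundlingSketch {

  static class Request {
    final int size;
    Request(int size) { this.size = size; }
  }

  // assignMultiple-style: walk the list in order and assign requests in place,
  // stopping at the first request that does not fit on the node.
  static List<Request> assignInOrder(List<Request> requests, int nodeCapacity) {
    List<Request> assigned = new ArrayList<Request>();
    int free = nodeCapacity;
    for (Request r : requests) {
      if (r.size > free) {
        break;
      }
      assigned.add(r);
      free -= r.size;
    }
    return assigned;
  }

  // Bundling-style: collect candidates during the pass without committing,
  // so e.g. the 2nd, 3rd and 10th requests can be taken even if the 1st does
  // not fit; the bundle is committed only after the loop finishes.
  static List<Request> bundle(List<Request> requests, int nodeCapacity) {
    List<Request> candidates = new ArrayList<Request>();
    int free = nodeCapacity;
    for (Request r : requests) {
      if (r.size <= free) {          // may skip a request that does not fit
        candidates.add(r);
        free -= r.size;
      }
    }
    // Commit the whole bundle at the end of the pass.
    return new ArrayList<Request>(candidates);
  }
}
{code}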

 Bundling: Searching for multiple containers in a single pass over {queues, 
 applications, priorities}
 

 Key: YARN-4056
 URL: https://issues.apache.org/jira/browse/YARN-4056
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Srikanth Kandula
Assignee: Robert Grandl
 Attachments: bundling.docx


 More than one container is allocated on many NM heartbeats. Yet, the current 
 scheduler allocates exactly one container per iteration over {{queues, 
 applications, priorities}}. When there are many queues, applications, or 
 priorities, allocating only one container per iteration can needlessly 
 increase the duration of the NM heartbeat.
  
 In this JIRA, we propose bundling. That is, allow arbitrarily many containers 
 to be allocated in a single iteration over {{queues, applications and 
 priorities}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-27 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717257#comment-14717257
 ] 

Junping Du commented on YARN-4074:
--

Thanks for uploading a patch, [~sjlee0]! Sorry for coming to this late, but I 
have a critical question on the TimelineReader interface:
bq. Currently I am not planning to add new flow-specific methods to the 
TimelineReader interface.
If so, how do we query the latest N records with the existing getEntities() API? 
Actually, I think we should refactor the existing getEntities() API before things 
get worse. It includes too many parameters, most of them optional. This is very 
unwieldy, easily causes bugs, and is very hard to extend in the future. 
Instead, we should define something like an EntityFilter class to hold most of 
these optional fields (time range, top N, info/config/metric sub-filters, etc.), 
which could also be extended easily for other filters in the future (a rough 
sketch follows below). Thoughts?

Still walking through your POC patch; more comments will come after.
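
For illustration only, the kind of filter object being suggested could look roughly 
like this; the class, field, and builder names are made up, not an agreed API:

{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustration only: the class, field, and builder names are made up, not an
// agreed API for the timeline reader.
class EntityFilterSketch {
  private final Long createdTimeBegin;
  private final Long createdTimeEnd;
  private final Integer limit;                 // e.g. "latest N records"
  private final Set<String> metricsToRetrieve;

  private EntityFilterSketch(Builder b) {
    this.createdTimeBegin = b.createdTimeBegin;
    this.createdTimeEnd = b.createdTimeEnd;
    this.limit = b.limit;
    this.metricsToRetrieve = Collections.unmodifiableSet(b.metricsToRetrieve);
  }

  static class Builder {
    private Long createdTimeBegin;
    private Long createdTimeEnd;
    private Integer limit;
    private final Set<String> metricsToRetrieve = new HashSet<String>();

    Builder createdTimeRange(long begin, long end) {
      this.createdTimeBegin = begin;
      this.createdTimeEnd = end;
      return this;
    }

    Builder limit(int n) {
      this.limit = n;
      return this;
    }

    Builder retrieveMetric(String metricId) {
      this.metricsToRetrieve.add(metricId);
      return this;
    }

    EntityFilterSketch build() {
      return new EntityFilterSketch(this);
    }
  }
}
{code}

A getEntities(context, filter) overload could then take one such object, so a call 
site would read like new EntityFilterSketch.Builder().limit(20).createdTimeRange(start, end).build() 
instead of a long positional parameter list, and new filters would not break existing callers.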



 [timeline reader] implement support for querying for flows and flow runs
 

 Key: YARN-4074
 URL: https://issues.apache.org/jira/browse/YARN-4074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: YARN-4074-YARN-2928.POC.001.patch


 Implement support for querying for flows and flow runs.
 We should be able to query for the most recent N flows, etc.
 This includes changes to the {{TimelineReader}} API if necessary, as well as 
 implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class

2015-08-27 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717294#comment-14717294
 ] 

Srikanth Kandula commented on YARN-4081:


Ease of expression is a great thing to have, and so is extending to multiple 
resources. That is all cool.

I am mostly worried about the performance impact of replacing a small data 
structure that has native types with a much larger data structure that has 
user-defined types. Could you run a profile? How much more space would a 
resource object take up now? How much more time would it take to initialize and 
garbage-collect 10K resource objects?
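
A very rough harness for the kind of measurement being asked for might look like 
the following; MapBackedResource is a stand-in for a map-backed, multi-resource-type 
Resource rather than the actual YARN-3926 class, and the heap delta is only 
indicative (a profiler or JMH would give better numbers):

{code}
import java.util.HashMap;
import java.util.Map;

// Very rough harness; MapBackedResource is a stand-in for a map-backed
// multi-resource-type Resource, not the actual YARN-3926 class.
class ResourceFootprintSketch {

  static class MapBackedResource {
    final Map<String, Long> values = new HashMap<String, Long>();

    MapBackedResource(long memory, long vcores) {
      values.put("memory", memory);
      values.put("vcores", vcores);
    }
  }

  public static void main(String[] args) {
    final int n = 10000;
    Runtime rt = Runtime.getRuntime();
    System.gc();
    long heapBefore = rt.totalMemory() - rt.freeMemory();
    long start = System.nanoTime();

    MapBackedResource[] resources = new MapBackedResource[n];
    for (int i = 0; i < n; i++) {
      resources[i] = new MapBackedResource(1024, 1);
    }

    long elapsedMs = (System.nanoTime() - start) / 1000000;
    long heapAfter = rt.totalMemory() - rt.freeMemory();
    // Keep the array reachable so the allocations are not optimized away.
    System.out.println("allocated " + resources.length + " objects in "
        + elapsedMs + " ms, approx heap delta "
        + (heapAfter - heapBefore) / 1024 + " KB");
  }
}
{code}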

 Add support for multiple resource types in the Resource class
 -

 Key: YARN-4081
 URL: https://issues.apache.org/jira/browse/YARN-4081
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: YARN-4081-YARN-3926.001.patch


 For adding support for multiple resource types, we need to add support for 
 this in the Resource class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4036) Findbugs warnings in hadoop-yarn-server-common

2015-08-27 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4036:
---
Attachment: (was: findbugs_report.html)

 Findbugs warnings in hadoop-yarn-server-common
 --

 Key: YARN-4036
 URL: https://issues.apache.org/jira/browse/YARN-4036
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena

 Refer to 
 https://issues.apache.org/jira/browse/YARN-3232?focusedCommentId=14679146&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14679146



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat

2015-08-27 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717317#comment-14717317
 ] 

Srikanth Kandula commented on YARN-1012:


Ack. Will do.

 Report NM aggregated container resource utilization in heartbeat
 

 Key: YARN-1012
 URL: https://issues.apache.org/jira/browse/YARN-1012
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Arun C Murthy
Assignee: Inigo Goiri
 Fix For: 2.8.0

 Attachments: YARN-1012-1.patch, YARN-1012-10.patch, 
 YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, 
 YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, 
 YARN-1012-9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717480#comment-14717480
 ] 

Hudson commented on YARN-3250:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8359 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8359/])
YARN-3250. Support admin cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev a9c8ea71aa427ff5f25caec98be15bc880e578a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceManagerAdministrationProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityResponsePBImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceManagerAdministrationProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/resourcemanager_administration_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityRequestPBImpl.java


 Support admin cli interface in for Application Priority
 ---

 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S
 Fix For: 2.8.0

 Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
 0003-YARN-3250.patch


 Current Application Priority Manager supports only configuration via file. 
 To support runtime configurations for admin cli and REST, a common management 
 interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-27 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4087:
--
Attachment: YARN-4087.2.patch

 Set YARN_FAIL_FAST to be false by default
 -

 Key: YARN-4087
 URL: https://issues.apache.org/jira/browse/YARN-4087
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-4087.1.patch, YARN-4087.2.patch


 Increasingly, I feel setting this property to be false makes more sense 
 especially in production environment, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717522#comment-14717522
 ] 

Hudson commented on YARN-3250:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #313 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/313/])
YARN-3250. Support admin cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev a9c8ea71aa427ff5f25caec98be15bc880e578a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityResponse.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceManagerAdministrationProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/resourcemanager_administration_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceManagerAdministrationProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java


 Support admin cli interface in for Application Priority
 ---

 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S
 Fix For: 2.8.0

 Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
 0003-YARN-3250.patch


 Current Application Priority Manager supports only configuration via file. 
 To support runtime configurations for admin cli and REST, a common management 
 interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4085) Generate file with container resource limits in the container work dir

2015-08-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717524#comment-14717524
 ] 

Hitesh Shah commented on YARN-4085:
---

Could the values be set in the environment instead of in a file? If a file is 
used, should it be a properties file with all useful information written into 
it, not just the resource size info? 
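
Purely to illustrate the properties-file option (the file name and keys below 
are assumptions, not an agreed format), a minimal sketch of what the NM could 
drop into the container work dir:

{code}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Properties;

// Hypothetical helper, not the actual NM code; file name and keys are assumptions.
public class ContainerLimitsFileWriter {
  public static void write(File containerWorkDir, long memoryMb, int vcores)
      throws IOException {
    Properties limits = new Properties();
    limits.setProperty("yarn.container.resource.memory-mb", String.valueOf(memoryMb));
    limits.setProperty("yarn.container.resource.vcores", String.valueOf(vcores));
    File limitsFile = new File(containerWorkDir, "container-limits.properties");
    try (OutputStream out = new FileOutputStream(limitsFile)) {
      limits.store(out, "Resource limits imposed by the NodeManager");
    }
  }
}
{code}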

 Generate file with container resource limits in the container work dir
 --

 Key: YARN-4085
 URL: https://issues.apache.org/jira/browse/YARN-4085
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Minor

 Currently, a container doesn't know what resource limits are being imposed on 
 it. It would be helpful if the NM generated a simple file in the container 
 work dir with the resource limits specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API

2015-08-27 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717539#comment-14717539
 ] 

Vinod Kumar Vavilapalli commented on YARN-4009:
---

Cross origin support exists for Timeline Service V1, linking related tickets.

 CORS support for ResourceManager REST API
 -

 Key: YARN-4009
 URL: https://issues.apache.org/jira/browse/YARN-4009
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Prakash Ramachandran

 Currently the REST API's do not have CORS support. This means any UI (running 
 in browser) cannot consume the REST API's. For ex Tez UI would like to use 
 the REST API for getting application, application attempt information exposed 
 by the API's. 
 It would be very useful if CORS is enabled for the REST API's.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers

2015-08-27 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3920:

Attachment: YARN-3920.004.patch

 FairScheduler Reserving a node for a container should be configurable to 
 allow it used only for large containers
 

 Key: YARN-3920
 URL: https://issues.apache.org/jira/browse/YARN-3920
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
 YARN-3920.004.patch, YARN-3920.004.patch, yARN-3920.001.patch, 
 yARN-3920.002.patch, yARN-3920.003.patch


 Reserving a node for a container was designed for preventing large containers 
 from starvation from small requests that keep getting into a node. Today we 
 let this be used even for a small container request. This has a huge impact 
 on scheduling since we block other scheduling requests until that reservation 
 is fulfilled. We should make this configurable so its impact can be minimized 
 by limiting it for large container requests as originally intended. 
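
(Not part of the original notification; a hedged sketch of the configurable 
behavior being proposed, with a reservation threshold resource whose name and 
wiring are invented for illustration:)

{code}
import org.apache.hadoop.yarn.api.records.Resource;

// Hypothetical check, not the patch itself: only reserve a node when the request
// is at least as large as a configured threshold.
public class ReservationThresholdCheck {
  private final Resource reservationThreshold;

  public ReservationThresholdCheck(Resource reservationThreshold) {
    this.reservationThreshold = reservationThreshold;
  }

  public boolean shouldReserve(Resource requested) {
    return requested.getMemory() >= reservationThreshold.getMemory()
        || requested.getVirtualCores() >= reservationThreshold.getVirtualCores();
  }
}
{code}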



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717526#comment-14717526
 ] 

Hitesh Shah commented on YARN-4087:
---

It would be good to rename the config property to something that provides a bit 
more clarity on what the config knob is meant to control. 

 Set YARN_FAIL_FAST to be false by default
 -

 Key: YARN-4087
 URL: https://issues.apache.org/jira/browse/YARN-4087
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-4087.1.patch


 Increasingly, I feel setting this property to be false makes more sense 
 especially in production environment, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717568#comment-14717568
 ] 

Jian He commented on YARN-4087:
---

YARN_FAIL_FAST is a global knob that controls all components, e.g. RM and NM; 
the config description provides that clarification. I just can't think of a 
concise and meaningful name, so any naming suggestion is welcome.

Updated the patch to clarify the config description further.
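
For illustration only (a sketch based on the property names discussed here; 
the final patch may differ), the intended relationship between the global knob 
and the RM-specific one:

{code}
import org.apache.hadoop.conf.Configuration;

// Hedged sketch: the RM-specific knob falls back to the global one, which in turn
// defaults to false as proposed in this JIRA. Property names are assumptions.
public class FailFastExample {
  public static boolean shouldRmFailFast(Configuration conf) {
    boolean globalDefault = conf.getBoolean("yarn.fail-fast", false);
    return conf.getBoolean("yarn.resourcemanager.fail-fast", globalDefault);
  }
}
{code}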

 Set YARN_FAIL_FAST to be false by default
 -

 Key: YARN-4087
 URL: https://issues.apache.org/jira/browse/YARN-4087
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-4087.1.patch


 Increasingly, I feel setting this property to be false makes more sense 
 especially in production environment, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717639#comment-14717639
 ] 

Hudson commented on YARN-3250:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #318 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/318/])
YARN-3250. Support admin cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev a9c8ea71aa427ff5f25caec98be15bc880e578a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/resourcemanager_administration_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceManagerAdministrationProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceManagerAdministrationProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java


 Support admin cli interface in for Application Priority
 ---

 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S
 Fix For: 2.8.0

 Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
 0003-YARN-3250.patch


 Current Application Priority Manager supports only configuration via file. 
 To support runtime configurations for admin cli and REST, a common management 
 interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717634#comment-14717634
 ] 

Hudson commented on YARN-3250:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1046 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1046/])
YARN-3250. Support admin cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev a9c8ea71aa427ff5f25caec98be15bc880e578a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceManagerAdministrationProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceManagerAdministrationProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/resourcemanager_administration_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityResponsePBImpl.java


 Support admin cli interface in for Application Priority
 ---

 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S
 Fix For: 2.8.0

 Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
 0003-YARN-3250.patch


 Current Application Priority Manager supports only configuration via file. 
 To support runtime configurations for admin cli and REST, a common management 
 interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-27 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717758#comment-14717758
 ] 

Li Lu commented on YARN-4074:
-

Thanks [~sjlee0]! I looked at the current POC patch and have some comments:
# In general, I'm OK with this approach. I think the current FlowEntity design 
should provide sufficient information for the web UI POC. 
# As a general question, since we're returning our timeline entities as JSON 
in our web service, we need to somehow rebuild those entities on the js client 
side, right? If that is the case, do we need to provide a js object model 
consistent with our TimelineEntity object model? I'm not a front-end expert, 
so I'd like to learn the typical practice for this problem. 
# Please make sure, in the final patch, to change the timeline schema creator 
so that we're consistent with the list of tables. Maybe we'd like to find 
better ways to keep all these tables consistent across the writer, reader and 
schema creator in the future. 
# I agree with all of you that we may want to refactor the current 
implementation. For example, we may not want to dispatch an incoming timeline 
entity to different tables through a list of if-statements (deciding which 
table an entity goes to has already caused me some confusion while rebasing 
the offline aggregator patch); a rough sketch of an alternative follows this 
list. The parsing logic could also be isolated fairly easily, I believe. 
# Some changes in files like FlowActivityRowKey.java are not included in this 
patch? 
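
A rough, purely illustrative sketch of that alternative (the type strings and 
table names are placeholders, not the actual schema constants):

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical router for illustration only; real code would use the schema's
// table classes and TimelineEntityType constants instead of raw strings.
public class EntityTableRouter {
  private final Map<String, String> tableByType = new HashMap<String, String>();

  public EntityTableRouter() {
    tableByType.put("YARN_FLOW_ACTIVITY", "timelineservice.flowactivity");
    tableByType.put("YARN_FLOW_RUN", "timelineservice.flowrun");
    tableByType.put("YARN_APPLICATION", "timelineservice.application");
  }

  // Falls back to the generic entity table for unknown types.
  public String tableFor(String entityType) {
    String table = tableByType.get(entityType);
    return table != null ? table : "timelineservice.entity";
  }
}
{code}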


 [timeline reader] implement support for querying for flows and flow runs
 

 Key: YARN-4074
 URL: https://issues.apache.org/jira/browse/YARN-4074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: YARN-4074-YARN-2928.POC.001.patch, 
 YARN-4074-YARN-2928.POC.002.patch


 Implement support for querying for flows and flow runs.
 We should be able to query for the most recent N flows, etc.
 This includes changes to the {{TimelineReader}} API if necessary, as well as 
 implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-27 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4074:
--
Attachment: YARN-4074-YARN-2928.POC.002.patch

Posting a v.2 POC patch. This adds the flow run query.

As for [~djp]'s comments, yes, I agree that the reader code needs more serious 
refactoring, both in the API as well as the implementation.

I believe [~varun_saxena]'s looking into cleaning up the filters, and so on in 
YARN-3863. So improving the API would be taken up by Varun. Varun?

I'd also like to refactor the implementation further to restructure it. This 
POC patch is by no means an indication of its final form. I just wanted to get 
it out there so we can ensure it is correct and discuss the approach taken 
here. I hope that clarifies things a bit.

 [timeline reader] implement support for querying for flows and flow runs
 

 Key: YARN-4074
 URL: https://issues.apache.org/jira/browse/YARN-4074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: YARN-4074-YARN-2928.POC.001.patch, 
 YARN-4074-YARN-2928.POC.002.patch


 Implement support for querying for flows and flow runs.
 We should be able to query for the most recent N flows, etc.
 This includes changes to the {{TimelineReader}} API if necessary, as well as 
 implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-27 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717681#comment-14717681
 ] 

Li Lu commented on YARN-4074:
-

Hi [~sjlee0], so far the first option looks good to me. The upside is that it 
fits our web UI POC requirements fine, and it's relatively clean to maintain. 
The downside is that in order to support some complex use cases, we would need 
to compose multiple queries. For the current stage I think it's fine, and we 
can use it to bootstrap our web UI renderers. 

 [timeline reader] implement support for querying for flows and flow runs
 

 Key: YARN-4074
 URL: https://issues.apache.org/jira/browse/YARN-4074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: YARN-4074-YARN-2928.POC.001.patch, 
 YARN-4074-YARN-2928.POC.002.patch


 Implement support for querying for flows and flow runs.
 We should be able to query for the most recent N flows, etc.
 This includes changes to the {{TimelineReader}} API if necessary, as well as 
 implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-27 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717721#comment-14717721
 ] 

Junping Du commented on YARN-4074:
--

OK. Having a separate JIRA to track this refactoring work should be fine. 
Thanks for pointing to that JIRA. 

 [timeline reader] implement support for querying for flows and flow runs
 

 Key: YARN-4074
 URL: https://issues.apache.org/jira/browse/YARN-4074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: YARN-4074-YARN-2928.POC.001.patch, 
 YARN-4074-YARN-2928.POC.002.patch


 Implement support for querying for flows and flow runs.
 We should be able to query for the most recent N flows, etc.
 This includes changes to the {{TimelineReader}} API if necessary, as well as 
 implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4043) Change logging of warning message : an attempt to override final parameter:

2015-08-27 Thread Spandan Dutta (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Spandan Dutta updated YARN-4043:

Description: 
In the following 
[function|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L2739].
 

When the attr is in the list of final attrs it just outputs this message 
without actually updating any resources as per my understanding. 

We change this to debug logging.

  was:
In the following 
[function|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L2739].
 

When the attr is in the list of final attrs it just outputs this message 
without actually updating any resources as per my understanding. 

We should remove this warning.


 Change logging of warning message : an attempt to override final parameter:
 -

 Key: YARN-4043
 URL: https://issues.apache.org/jira/browse/YARN-4043
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Spandan Dutta
Assignee: Spandan Dutta
 Attachments: warn-msg.patch, warn-msg.patch


 In the following 
 [function|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L2739].
  
 When the attr is in the list of final attrs it just outputs this message 
 without actually updating any resources as per my understanding. 
 We change this to debug logging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4043) Change logging of warning message : an attempt to override final parameter:

2015-08-27 Thread Spandan Dutta (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Spandan Dutta updated YARN-4043:

Summary: Change logging of warning message : an attempt to override final 
parameter:  (was: Unnecessary warning message : an attempt to override final 
parameter:)

 Change logging of warning message : an attempt to override final parameter:
 -

 Key: YARN-4043
 URL: https://issues.apache.org/jira/browse/YARN-4043
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Spandan Dutta
Assignee: Spandan Dutta
 Attachments: warn-msg.patch, warn-msg.patch


 In the following 
 [function|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L2739].
  
 When the attr is in the list of final attrs it just outputs this message 
 without actually updating any resources as per my understanding. 
 We should remove this warning.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717746#comment-14717746
 ] 

Bikas Saha commented on YARN-4088:
--

Is the suggestion to process them concurrently? I'm not quite sure what async 
means here; is it async wrt the RPC thread?
Another alternative would be to dynamically adjust the NM heartbeat interval. 
IIRC, the next NM heartbeat interval is sent by the RM in the response to the 
heartbeat; if not, this could be added. The RM could then increase the interval 
until it reaches a steady/stable state of heartbeat processing. This would help 
in self-adjusting to cluster size: small intervals for small clusters and 
larger ones for large clusters. The interval could be tuned up under high load 
and tuned back down once the load diminishes.
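
To make the dynamic-interval idea concrete, a hedged sketch (the load signals 
and the 10x cap are made up; only the response setter is taken from the 
existing protocol, assuming it is still called setNextHeartBeatInterval):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.api.protocolrecords.NodeHeartbeatResponse;

// Hedged sketch only: scale the next NM heartbeat interval with observed RM load.
// The load signals (pendingHeartbeats, activeNodes) are hypothetical.
public class AdaptiveHeartbeatInterval {
  public static void adjust(Configuration conf, NodeHeartbeatResponse response,
      int pendingHeartbeats, int activeNodes) {
    long base = conf.getLong(YarnConfiguration.RM_NM_HEARTBEAT_INTERVAL_MS,
        YarnConfiguration.DEFAULT_RM_NM_HEARTBEAT_INTERVAL_MS);
    double load = activeNodes == 0 ? 0.0 : pendingHeartbeats / (double) activeNodes;
    long next = Math.min(10 * base, (long) (base * Math.max(1.0, load)));
    response.setNextHeartBeatInterval(next);  // NM honors this on its next cycle
  }
}
{code}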

 RM should be able to process heartbeats from NM asynchronously
 --

 Key: YARN-4088
 URL: https://issues.apache.org/jira/browse/YARN-4088
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Reporter: Srikanth Kandula

 Today, the RM sequentially processes one heartbeat after another. 
 Imagine a 3000 server cluster with each server heart-beating every 3s. This 
 gives the RM 1ms on average to process each NM heartbeat. That is tough.
 It is true that there are several underlying datastructures that will be 
 touched during heartbeat processing. So, it is non-trivial to parallelize the 
 NM heartbeat. Yet, it is quite doable...
 Parallelizing the NM heartbeat would substantially improve the scalability of 
 the RM, allowing it to either 
 a) run larger clusters or 
 b) support faster heartbeats or dynamic scaling of heartbeats
 c) take more asks from each application or 
 c) use cleverer/ more expensive algorithms such as node labels or better 
 packing or ...
 Indeed the RM's scalability limit has been cited as the motivating reason for 
 a variety of efforts which will become less needed if this can be solved. 
 Ditto for slow heartbeats.  See Sparrow and Mercury papers for example.
 Can we take a shot at this?
 If not, could we discuss why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers

2015-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717659#comment-14717659
 ] 

Hadoop QA commented on YARN-3920:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 57s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m  2s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 10s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 41s | The applied patch generated  
127 new checkstyle issues (total was 0, now 127). |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 33s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  53m 43s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m 44s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752847/YARN-3920.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a9c8ea7 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8931/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8931/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8931/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8931/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8931/console |


This message was automatically generated.

 FairScheduler Reserving a node for a container should be configurable to 
 allow it used only for large containers
 

 Key: YARN-3920
 URL: https://issues.apache.org/jira/browse/YARN-3920
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
 YARN-3920.004.patch, YARN-3920.004.patch, yARN-3920.001.patch, 
 yARN-3920.002.patch, yARN-3920.003.patch


 Reserving a node for a container was designed for preventing large containers 
 from starvation from small requests that keep getting into a node. Today we 
 let this be used even for a small container request. This has a huge impact 
 on scheduling since we block other scheduling requests until that reservation 
 is fulfilled. We should make this configurable so its impact can be minimized 
 by limiting it for large container requests as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717664#comment-14717664
 ] 

Hudson commented on YARN-3250:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #305 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/305/])
YARN-3250. Support admin cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev a9c8ea71aa427ff5f25caec98be15bc880e578a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceManagerAdministrationProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityResponsePBImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceManagerAdministrationProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/resourcemanager_administration_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityResponse.java


 Support admin cli interface in for Application Priority
 ---

 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S
 Fix For: 2.8.0

 Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
 0003-YARN-3250.patch


 Current Application Priority Manager supports only configuration via file. 
 To support runtime configurations for admin cli and REST, a common management 
 interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717666#comment-14717666
 ] 

Hadoop QA commented on YARN-4087:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 23s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  3s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m  0s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 12s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 59s | Tests passed in 
hadoop-yarn-common. |
| | |  46m 22s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752862/YARN-4087.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a9c8ea7 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8932/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8932/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8932/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8932/console |


This message was automatically generated.

 Set YARN_FAIL_FAST to be false by default
 -

 Key: YARN-4087
 URL: https://issues.apache.org/jira/browse/YARN-4087
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-4087.1.patch, YARN-4087.2.patch


 Increasingly, I feel setting this property to be false makes more sense 
 especially in production environment, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage

2015-08-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717865#comment-14717865
 ] 

Sangjin Lee commented on YARN-4053:
---

[~vrushalic], [~jrottinghuis], and I discussed the supported types a little 
more, and we're of the opinion that we can *start* by supporting only longs 
for now (i.e. no floating point types), and consider adding a floating point 
type (namely double) to the list of supported types later. So for now, how 
about assuming (and enforcing) long as the type of the metric values, and 
pursuing how to add double later if we need it? Thoughts?
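
As a concrete (and hedged) illustration of what enforcing long would look like 
on the write path, using HBase's Bytes utility; this is a sketch, not the 
actual codec in the patch:

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.util.Bytes;

// Hedged sketch: accept only integral metric values and store them as 8-byte longs.
public class MetricValueCodec {
  public static byte[] encode(Object value) throws IOException {
    if (value instanceof Integer || value instanceof Long) {
      return Bytes.toBytes(((Number) value).longValue());
    }
    throw new IOException("Unsupported metric value type: "
        + (value == null ? "null" : value.getClass().getName()));
  }

  public static long decode(byte[] bytes) {
    return Bytes.toLong(bytes);
  }
}
{code}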

 Change the way metric values are stored in HBase Storage
 

 Key: YARN-4053
 URL: https://issues.apache.org/jira/browse/YARN-4053
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: YARN-4053-YARN-2928.01.patch


 Currently HBase implementation uses GenericObjectMapper to convert and store 
 values in backend HBase storage. This converts everything into a string 
 representation(ASCII/UTF-8 encoded byte array).
 While this is fine in most cases, it does not quite serve our use case for 
 metrics. 
 So we need to decide how are we going to encode and decode metric values and 
 store them in HBase.
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717905#comment-14717905
 ] 

Hudson commented on YARN-3250:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2243 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2243/])
YARN-3250. Support admin cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev a9c8ea71aa427ff5f25caec98be15bc880e578a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceManagerAdministrationProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceManagerAdministrationProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/resourcemanager_administration_protocol.proto
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityRequest.java


 Support admin cli interface in for Application Priority
 ---

 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S
 Fix For: 2.8.0

 Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
 0003-YARN-3250.patch


 Current Application Priority Manager supports only configuration via file. 
 To support runtime configurations for admin cli and REST, a common management 
 interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate

2015-08-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2865:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestRMHA before the push. Patch 
applied cleanly.

 Application recovery continuously fails with Application with id already 
 present. Cannot duplicate
 

 Key: YARN-2865
 URL: https://issues.apache.org/jira/browse/YARN-2865
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
Priority: Critical
  Labels: 2.6.1-candidate
 Fix For: 2.7.0, 2.6.1

 Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch


 YARN-2588 handles exception thrown while transitioningToActive and reset 
 activeServices. But it misses out clearing RMcontext apps/nodes details and 
 ClusterMetrics and QueueMetrics. This causes application recovery to fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2816) NM fail to start with NPE during container recovery

2015-08-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2816:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestNMLeveldbStateStoreService 
before the push. Patch applied cleanly.

 NM fail to start with NPE during container recovery
 ---

 Key: YARN-2816
 URL: https://issues.apache.org/jira/browse/YARN-2816
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: 2.6.1-candidate
 Fix For: 2.7.0, 2.6.1

 Attachments: YARN-2816.000.patch, YARN-2816.001.patch, 
 YARN-2816.002.patch, leveldb_records.txt


 NM fail to start with NPE during container recovery.
 We saw the following crash happen:
 2014-10-30 22:22:37,211 INFO org.apache.hadoop.service.AbstractService: 
 Service 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
  failed in state INITED; cause: java.lang.NullPointerException
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:289)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:252)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:235)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:250)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:445)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492)
 The reason is some DB files used in NMLeveldbStateStoreService are 
 accidentally deleted to save disk space at 
 /tmp/hadoop-yarn/yarn-nm-recovery/yarn-nm-state. This leaves some incomplete 
 container record which don't have CONTAINER_REQUEST_KEY_SUFFIX(startRequest) 
 entry in the DB. When container is recovered at 
 ContainerManagerImpl#recoverContainer, 
 The NullPointerException at the following code cause NM shutdown.
 {code}
 StartContainerRequest req = rcs.getStartRequest();
 ContainerLaunchContext launchContext = req.getContainerLaunchContext();
 {code}
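 
 (A hedged sketch follows, not the committed fix: one way to avoid the NPE is 
 to treat a missing start request as an incomplete record and skip it during 
 recovery.)
 {code}
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.yarn.api.protocolrecords.StartContainerRequest;
 import org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.RecoveredContainerState;

 // Hedged sketch, not the committed fix: treat a missing start request as an
 // incomplete record and skip it instead of hitting the NPE during recovery.
 public class ContainerRecoveryGuard {
   private static final Log LOG = LogFactory.getLog(ContainerRecoveryGuard.class);

   public static boolean isRecoverable(RecoveredContainerState rcs) {
     StartContainerRequest req = rcs.getStartRequest();
     if (req == null) {
       LOG.warn("Skipping recovery of an incomplete container record"
           + " (missing start request): " + rcs);
       return false;
     }
     return true;
   }
 }
 {code}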



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717887#comment-14717887
 ] 

Srikanth Kandula commented on YARN-4088:


The problem with slower heartbeats is that if the tasks are short-running, 
there will be a cluster-wide throughput drop due to the feedback delay. This is 
one of the points on which Sparrow (Spark) and Mercury hammer YARN. Of course, 
reusing containers *can* help, but other ducks have to align well. In general, 
slowing the heartbeat is not a good thing.

 RM should be able to process heartbeats from NM asynchronously
 --

 Key: YARN-4088
 URL: https://issues.apache.org/jira/browse/YARN-4088
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Reporter: Srikanth Kandula

 Today, the RM sequentially processes one heartbeat after another. 
 Imagine a 3000 server cluster with each server heart-beating every 3s. This 
 gives the RM 1ms on average to process each NM heartbeat. That is tough.
 It is true that there are several underlying datastructures that will be 
 touched during heartbeat processing. So, it is non-trivial to parallelize the 
 NM heartbeat. Yet, it is quite doable...
 Parallelizing the NM heartbeat would substantially improve the scalability of 
 the RM, allowing it to either 
 a) run larger clusters or 
 b) support faster heartbeats or dynamic scaling of heartbeats
 c) take more asks from each application or 
 c) use cleverer/ more expensive algorithms such as node labels or better 
 packing or ...
 Indeed the RM's scalability limit has been cited as the motivating reason for 
 a variety of efforts which will become less needed if this can be solved. 
 Ditto for slow heartbeats.  See Sparrow and Mercury papers for example.
 Can we take a shot at this?
 If not, could we discuss why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity

2015-08-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717880#comment-14717880
 ] 

Naganarasimha G R commented on YARN-4091:
-

Seems like the goal of YARN-3946 is a subset of this JIRA.

 Improvement: Introduce more debug/diagnostics information to detail out 
 scheduler activity
 --

 Key: YARN-4091
 URL: https://issues.apache.org/jira/browse/YARN-4091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, resourcemanager
Affects Versions: 2.7.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Improvement on debugdiagnostic information - YARN.pdf


 As schedulers are improved with various new capabilities, more configurations 
 which tunes the schedulers starts to take actions such as limit assigning 
 containers to an application, or introduce delay to allocate container etc. 
 There are no clear information passed down from scheduler to outerworld under 
 these various scenarios. This makes debugging very tougher.
 This ticket is an effort to introduce more defined states on various parts in 
 scheduler where it skips/rejects container assignment, activate application 
 etc. Such information will help user to know whats happening in scheduler.
 Attaching a short proposal for initial discussion. We would like to improve 
 on this as we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717907#comment-14717907
 ] 

Bikas Saha commented on YARN-4088:
--

Right. So the combined objective is to continue to have small heartbeat 
intervals with larger clusters while still using the central scheduler for all 
allocations. Clearly, in theory, that is a bottleneck by design and our attempt 
is to engineer our way out of it for medium size clusters. Right? :)

 RM should be able to process heartbeats from NM asynchronously
 --

 Key: YARN-4088
 URL: https://issues.apache.org/jira/browse/YARN-4088
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Reporter: Srikanth Kandula

 Today, the RM sequentially processes one heartbeat after another. 
 Imagine a 3000 server cluster with each server heart-beating every 3s. This 
 gives the RM 1ms on average to process each NM heartbeat. That is tough.
 It is true that there are several underlying datastructures that will be 
 touched during heartbeat processing. So, it is non-trivial to parallelize the 
 NM heartbeat. Yet, it is quite doable...
 Parallelizing the NM heartbeat would substantially improve the scalability of 
 the RM, allowing it to either 
 a) run larger clusters or 
 b) support faster heartbeats or dynamic scaling of heartbeats
 c) take more asks from each application or 
 c) use cleverer/ more expensive algorithms such as node labels or better 
 packing or ...
 Indeed the RM's scalability limit has been cited as the motivating reason for 
 a variety of efforts which will become less needed if this can be solved. 
 Ditto for slow heartbeats.  See Sparrow and Mercury papers for example.
 Can we take a shot at this?
 If not, could we discuss why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2890) MiniYarnCluster should turn on timeline service if configured to do so

2015-08-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2890:
--
   Labels: 2.6.1-candidate 2.7.2-candidate  (was: 2.6.1-candidate)
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestJobHistoryEventHandler, 
TestMRTimelineEventHandling and TestDistributedShell before the push. Patch 
applied cleanly.

 MiniYarnCluster should turn on timeline service if configured to do so
 --

 Key: YARN-2890
 URL: https://issues.apache.org/jira/browse/YARN-2890
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
  Labels: 2.6.1-candidate, 2.7.2-candidate
 Fix For: 2.6.1, 2.8.0

 Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, 
 YARN-2890.4.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, 
 YARN-2890.patch, YARN-2890.patch


 Currently the MiniMRYarnCluster does not consider the configuration value for 
 enabling timeline service before starting. The MiniYarnCluster should only 
 start the timeline service if it is configured to do so.
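
(Not part of the original report; a hedged sketch of the check the fix boils 
down to, consulting the configuration before deciding to start the timeline 
service in the mini cluster:)

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Hedged sketch only; the actual MiniYARNCluster wiring may differ.
public class TimelineServiceToggle {
  public static boolean shouldStartTimelineService(Configuration conf) {
    return conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
  }
}
{code}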



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3817) [Aggregation] Flow and User level aggregation on Application States table

2015-08-27 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717861#comment-14717861
 ] 

Li Lu commented on YARN-3817:
-

Oh BTW the patch is based on YARN-3816-YARN-2928-v1.patch in YARN-3816. 

 [Aggregation] Flow and User level aggregation on Application States table
 -

 Key: YARN-3817
 URL: https://issues.apache.org/jira/browse/YARN-3817
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Junping Du
Assignee: Li Lu
 Attachments: Detail Design for Flow and User Level Aggregation.pdf, 
 YARN-3817-poc-v1.patch


 We need time-based flow/user level aggregation to present flow/user related 
 states to end users.
 Flow level represents summary info of a specific flow. User level aggregation 
 represents summary info of a specific user, it should include summary info of 
 accumulated and statistic means (by two levels: application and flow), like: 
 number of Flows, applications, resource consumption, resource means per app 
 or flow, etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2015-08-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2414:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestAppPage before the push. Patch 
applied cleanly.

 RM web UI: app page will crash if app is failed before any attempt has been 
 created
 ---

 Key: YARN-2414
 URL: https://issues.apache.org/jira/browse/YARN-2414
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Zhijie Shen
Assignee: Wangda Tan
  Labels: 2.6.1-candidate
 Fix For: 2.7.0, 2.6.1

 Attachments: YARN-2414.20141104-1.patch, YARN-2414.20141104-2.patch, 
 YARN-2414.patch


 {code}
 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/app/application_1407887030038_0001
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at 
 

[jira] [Updated] (YARN-2906) CapacitySchedulerPage shows HTML tags for a queue's Active Users

2015-08-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2906:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation before the push. Patch applied cleanly.


 CapacitySchedulerPage shows HTML tags for a queue's Active Users
 

 Key: YARN-2906
 URL: https://issues.apache.org/jira/browse/YARN-2906
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Jason Lowe
Assignee: Jason Lowe
  Labels: 2.6.1-candidate
 Fix For: 2.7.0, 2.6.1

 Attachments: YARN-2906v1.patch


 On the capacity scheduler web page, expanding the details of a queue shows 
 HTML tags among the details for the active users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage

2015-08-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717863#comment-14717863
 ] 

Sangjin Lee commented on YARN-4053:
---

Thanks [~varun_saxena] for the discussion. As you said, one thing that really 
causes issues is when inconsistent values are used for the same metric. At a 
high level, I think we need to ask these questions:

- How important is it to support this scenario?
- If we don't really support this scenario, then what is the minimally 
acceptable behavior if that were to happen?

The gist of the problem is that one cannot really write/read consistent values 
without knowing the right type of the metric. The user will likely not know 
that either for the write or read path. In the face of this, the main 
difference between approach #1 (encoding it into the value) and approach #2 
(adding it to the column qualifier) is that approach #1 will mix different-type 
values into a single time series (column), and approach #2 will effectively 
create two separate time series (columns). The rest is the fallout.
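To make the trade-off concrete, here is a minimal, self-contained sketch (plain Java, not the actual YARN-4053 patch) of what the two approaches could look like. The helper names, the one-byte type tag, and the "!" qualifier separator are assumptions for illustration only.

{code}
// Contrast of the two approaches discussed above (illustrative only):
// approach #1 tags the cell *value* with a type marker, approach #2 puts the
// type into the *column qualifier*, which splits the metric into two columns.
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class MetricEncodingSketch {

  enum MetricType { LONG, DOUBLE }

  // Approach #1: the type travels with the value, so a single column (one time
  // series) can end up holding a mix of long- and double-encoded cells.
  static byte[] encodeValueWithTypeTag(Number value) {
    ByteBuffer buf = ByteBuffer.allocate(1 + Long.BYTES);
    if (value instanceof Double || value instanceof Float) {
      buf.put((byte) MetricType.DOUBLE.ordinal()).putDouble(value.doubleValue());
    } else {
      buf.put((byte) MetricType.LONG.ordinal()).putLong(value.longValue());
    }
    return buf.array();
  }

  // Approach #2: the type is part of the column qualifier, so inconsistent
  // writers effectively create two separate columns for the "same" metric.
  static byte[] qualifierWithTypeSuffix(String metricName, Number value) {
    MetricType type = (value instanceof Double || value instanceof Float)
        ? MetricType.DOUBLE : MetricType.LONG;
    return (metricName + "!" + type.name()).getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(encodeValueWithTypeTag(42L).length);  // 9 bytes: tag + long
    System.out.println(new String(
        qualifierWithTypeSuffix("HDFS_BYTES_READ", 1.5),
        StandardCharsets.UTF_8));                            // HDFS_BYTES_READ!DOUBLE
  }
}
{code}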


 Change the way metric values are stored in HBase Storage
 

 Key: YARN-4053
 URL: https://issues.apache.org/jira/browse/YARN-4053
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: YARN-4053-YARN-2928.01.patch


 Currently the HBase implementation uses GenericObjectMapper to convert and store 
 values in the backend HBase storage. This converts everything into a string 
 representation (ASCII/UTF-8 encoded byte array).
 While this is fine in most cases, it does not quite serve our use case for 
 metrics. 
 So we need to decide how we are going to encode and decode metric values and 
 store them in HBase.
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2856) Application recovery throw InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED

2015-08-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2856:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestRMAppTransitions before the 
push. Patch applied cleanly.

 Application recovery throw InvalidStateTransitonException: Invalid event: 
 ATTEMPT_KILLED at ACCEPTED
 

 Key: YARN-2856
 URL: https://issues.apache.org/jira/browse/YARN-2856
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
Priority: Critical
  Labels: 2.6.1-candidate
 Fix For: 2.7.0, 2.6.1

 Attachments: YARN-2856.1.patch, YARN-2856.patch


 It is observed that recovering an application whose attempt ended in the KILLED 
 final state throws the exception below, and the application remains in the ACCEPTED state forever.
 {code}
 2014-11-12 02:34:10,602 | ERROR | AsyncDispatcher event handler | Can't 
 handle this event at current state | 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:673)
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ATTEMPT_KILLED at ACCEPTED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:671)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:90)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:730)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:714)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:745)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2905) AggregatedLogsBlock page can infinitely loop if the aggregated log file is corrupted

2015-08-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2905:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation before the push. Patch applied cleanly.

 AggregatedLogsBlock page can infinitely loop if the aggregated log file is 
 corrupted
 

 Key: YARN-2905
 URL: https://issues.apache.org/jira/browse/YARN-2905
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Jason Lowe
Assignee: Varun Saxena
Priority: Blocker
  Labels: 2.6.1-candidate
 Fix For: 2.7.0, 2.6.1

 Attachments: YARN-2905.patch


 If the AggregatedLogsBlock page tries to serve up a portion of a log file 
 that has been corrupted (e.g., the case that was fixed by YARN-2724), 
 then it can spin forever trying to seek to the targeted log segment.
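A generic sketch of the kind of progress guard that avoids such a spin is below; it is illustrative only, not the actual AggregatedLogsBlock fix, and the LogSegmentReader interface with its getPos()-style offset is an assumption.

{code}
// Illustrative only: a read loop that bails out if the reader stops making
// forward progress (e.g. because the aggregated log file is corrupted),
// instead of seeking forever.
import java.io.IOException;

public class BoundedSeekSketch {

  interface LogSegmentReader {
    long getPos() throws IOException;          // current offset in the file
    boolean nextSegment() throws IOException;  // advance to the next log segment
    String segmentName() throws IOException;
  }

  static boolean seekToSegment(LogSegmentReader reader, String target)
      throws IOException {
    long lastPos = -1L;
    while (reader.nextSegment()) {
      long pos = reader.getPos();
      if (pos <= lastPos) {
        // No forward progress: the file is likely corrupted, so give up
        // rather than spinning forever.
        return false;
      }
      lastPos = pos;
      if (target.equals(reader.segmentName())) {
        return true;
      }
    }
    return false;  // reached end of file without finding the segment
  }
}
{code}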



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2884) Proxying all AM-RM communications

2015-08-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716153#comment-14716153
 ] 

Jian He commented on YARN-2884:
---

Looks good to me overall, but I think there are still some problems with the 
AMRMProxyToken implementation. Basically, a long-running service may not work 
with the AMRMProxy.

1) The code below in DefaultRequestInterceptor should create and return a new 
AMRMProxyToken in the final returned allocate response when needed. Otherwise, 
the AM will fail to talk to the AMRMProxy after the key is rolled over in the 
AMRMProxyTokenSecretManager. 
{code}
  @Override
  public AllocateResponse allocate(AllocateRequest request)
      throws YarnException, IOException {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Forwarding allocate request to the real YARN RM");
    }
    AllocateResponse allocateResponse = rmClient.allocate(request);
    if (allocateResponse.getAMRMToken() != null) {
      updateAMRMToken(allocateResponse.getAMRMToken());
    }
    return allocateResponse;
  }
{code}
 The code below in ApplicationMasterService#allocate shows how that is done.
{code}
  if (nextMasterKey != null
      && nextMasterKey.getMasterKey().getKeyId() != amrmTokenIdentifier
        .getKeyId()) {
    RMAppAttemptImpl appAttemptImpl = (RMAppAttemptImpl) appAttempt;
    Token<AMRMTokenIdentifier> amrmToken = appAttempt.getAMRMToken();
    if (nextMasterKey.getMasterKey().getKeyId() !=
        appAttemptImpl.getAMRMTokenKeyId()) {
      LOG.info("The AMRMToken has been rolled-over. Send new AMRMToken back"
          + " to application: " + applicationId);
      amrmToken = rmContext.getAMRMTokenSecretManager()
          .createAndGetAMRMToken(appAttemptId);
      appAttemptImpl.setAMRMToken(amrmToken);
    }
    allocateResponse.setAMRMToken(org.apache.hadoop.yarn.api.records.Token
        .newInstance(amrmToken.getIdentifier(), amrmToken.getKind()
            .toString(), amrmToken.getPassword(), amrmToken.getService()
            .toString()));
  }
{code}
2) Some methods inside the AMRMProxyTokenSecretManager are not used at all. Shall we 
remove them?

3) I think we need at least one end-to-end test for this. We can use 
MiniYarnCluster to simulate the whole flow: the AM talks to the AMRMProxy, which 
talks to the RM to register/allocate/finish. In the test, we should also reduce 
RM_AMRM_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS so that we can exercise the 
token renewal behavior. I'm OK with a separate JIRA to track the end-to-end 
test, as this is a bit of work. A rough sketch of the interceptor-side token 
refresh from point 1 is included below.
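The sketch below shows one shape the point 1) change could take, assuming the interceptor keeps the RM-issued token to itself and hands the AM a proxy-issued one. The secretManager field and its isMasterKeyRolledOver(...) / createAndGetAMRMProxyToken(...) helpers are assumptions; the actual patch may expose this differently.

{code}
// Rough sketch only, not the patch under review.
@Override
public AllocateResponse allocate(AllocateRequest request)
    throws YarnException, IOException {
  if (LOG.isDebugEnabled()) {
    LOG.debug("Forwarding allocate request to the real YARN RM");
  }
  AllocateResponse allocateResponse = rmClient.allocate(request);
  if (allocateResponse.getAMRMToken() != null) {
    // Token issued by the real RM: keep it locally so the proxy can keep
    // talking to the RM, but do not leak it to the AM.
    updateAMRMToken(allocateResponse.getAMRMToken());
  }
  if (secretManager.isMasterKeyRolledOver(getApplicationAttemptId())) {   // hypothetical check
    Token<AMRMTokenIdentifier> proxyToken =
        secretManager.createAndGetAMRMProxyToken(getApplicationAttemptId()); // hypothetical helper
    allocateResponse.setAMRMToken(org.apache.hadoop.yarn.api.records.Token
        .newInstance(proxyToken.getIdentifier(), proxyToken.getKind().toString(),
            proxyToken.getPassword(), proxyToken.getService().toString()));
  }
  return allocateResponse;
}
{code}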


 Proxying all AM-RM communications
 -

 Key: YARN-2884
 URL: https://issues.apache.org/jira/browse/YARN-2884
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Carlo Curino
Assignee: Kishore Chaliparambil
 Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, 
 YARN-2884-V11.patch, YARN-2884-V2.patch, YARN-2884-V3.patch, 
 YARN-2884-V4.patch, YARN-2884-V5.patch, YARN-2884-V6.patch, 
 YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch


 We introduce the notion of an RMProxy, running on each node (or once per 
 rack). Upon start the AM is forced (via tokens and configuration) to direct 
 all its requests to a new services running on the NM that provide a proxy to 
 the central RM. 
 This give us a place to:
 1) perform distributed scheduling decisions
 2) throttling mis-behaving AMs
 3) mask the access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers

2015-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716162#comment-14716162
 ] 

Hadoop QA commented on YARN-3920:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 43s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:red}-1{color} | javac |   3m 37s | The patch appears to cause the 
build to fail. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752667/YARN-3920.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4cbbfa2 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8924/console |


This message was automatically generated.

 FairScheduler Reserving a node for a container should be configurable to 
 allow it used only for large containers
 

 Key: YARN-3920
 URL: https://issues.apache.org/jira/browse/YARN-3920
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
 YARN-3920.004.patch, yARN-3920.001.patch, yARN-3920.002.patch, 
 yARN-3920.003.patch


 Reserving a node for a container was designed to prevent large containers 
 from being starved by small requests that keep landing on a node. Today we 
 allow this even for a small container request. This has a huge impact 
 on scheduling since we block other scheduling requests until that reservation 
 is fulfilled. We should make this configurable so its impact can be minimized 
 by limiting it to large container requests, as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat

2015-08-27 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716178#comment-14716178
 ] 

Inigo Goiri commented on YARN-1012:
---

I think this is very YARN-specific. It does rely on the ResourceCalculator and 
related classes, which come from Common, though.

Regarding adding network and disk usage, I fully agree. You guys should first 
extend ResourceUtilization (as done in this patch) to support disk and network 
and then extend the node resource monitor (YARN-3534) to collect it from the 
node.

 Report NM aggregated container resource utilization in heartbeat
 

 Key: YARN-1012
 URL: https://issues.apache.org/jira/browse/YARN-1012
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Arun C Murthy
Assignee: Inigo Goiri
 Fix For: 2.8.0

 Attachments: YARN-1012-1.patch, YARN-1012-10.patch, 
 YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, 
 YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, 
 YARN-1012-9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers

2015-08-27 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3920:

Attachment: YARN-3920.004.patch

retrigger

 FairScheduler Reserving a node for a container should be configurable to 
 allow it used only for large containers
 

 Key: YARN-3920
 URL: https://issues.apache.org/jira/browse/YARN-3920
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
 YARN-3920.004.patch, yARN-3920.001.patch, yARN-3920.002.patch, 
 yARN-3920.003.patch


 Reserving a node for a container was designed to prevent large containers 
 from being starved by small requests that keep landing on a node. Today we 
 allow this even for a small container request. This has a huge impact 
 on scheduling since we block other scheduling requests until that reservation 
 is fulfilled. We should make this configurable so its impact can be minimized 
 by limiting it to large container requests, as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4083) Add a discovery mechanism for the scheduler address

2015-08-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716146#comment-14716146
 ] 

Jian He commented on YARN-4083:
---

One other thing to think about: if the NM dies, should the AM fall back to the 
RM?
Also, in case of RM HA there will be multiple RM scheduler addresses, so simply 
swapping out a single scheduler address will not work.

 Add a discovery mechanism for the scheduler address
 

 Key: YARN-4083
 URL: https://issues.apache.org/jira/browse/YARN-4083
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan

 Today many apps like Distributed Shell, REEF, etc rely on the fact that the 
 HADOOP_CONF_DIR of the NM is on the classpath to discover the scheduler 
 address. This JIRA proposes the addition of an explicit discovery mechanism 
 for the scheduler address



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716434#comment-14716434
 ] 

Hadoop QA commented on YARN-3528:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   8m  6s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 8 new or modified test files. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 20s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 46s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 4  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 26s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m  6s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  22m 57s | Tests passed in 
hadoop-common. |
| {color:red}-1{color} | yarn tests |   7m 24s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  53m 37s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752705/YARN-3528-008.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / fdb56f7 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8926/artifact/patchprocess/whitespace.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8926/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8926/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8926/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8926/console |


This message was automatically generated.

 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
 YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible to have scheduled or precommit tests to run 
 consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep of 12345 shows up many places in the test suite where this 
 practise has developed.
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through portscanning and dynamic port allocation. 
 Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716447#comment-14716447
 ] 

Brahma Reddy Battula commented on YARN-3528:


The following test case failure is unrelated, and it can be handled in YARN-3433.

{noformat}
testNodeContainerXML(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers)
  Time elapsed: 0.008 sec   ERROR!
com.sun.jersey.test.framework.spi.container.TestContainerException: 
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at 
org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
at 
org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
at 
org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
at 
org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
at 
org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
at 
com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
at 
com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
at 
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
at 
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.init(GrizzlyWebTestContainerFactory.java:129)
at 
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.init(GrizzlyWebTestContainerFactory.java:86)
at 
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
at 
com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
at com.sun.jersey.test.framework.JerseyTest.init(JerseyTest.java:217)
at 
org.apache.hadoop.yarn.webapp.JerseyTestBase.init(JerseyTestBase.java:27)
at 
org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers.init(TestNMWebServicesContainers.java:180)
{noformat}


 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
 YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible to have scheduled or precommit tests to run 
 consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep of 12345 shows up many places in the test suite where this 
 practise has developed.
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through portscanning and dynamic port allocation. 
 Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4089) Race condition when calling AbstractYarnScheduler.completedContainer.

2015-08-27 Thread Shiwei Guo (JIRA)
Shiwei Guo created YARN-4089:


 Summary: Race condition when calling 
AbstractYarnScheduler.completedContainer.
 Key: YARN-4089
 URL: https://issues.apache.org/jira/browse/YARN-4089
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.1, 2.5.2, 2.7.0, 2.6.0
Reporter: Shiwei Guo


There is a race condition in calling AbstractYarnScheduler.completedContainer 
that makes the usedResource counter of an application inaccurate. In the worst 
case, the scheduler will not allocate any resource to any application in 
some queue (when usedResource becomes negative) even though there is plenty of 
free resource to be allocated.

It also causes the Scheduler UI and metrics to report negative resource usage 
values. Our cluster has the capacity to run 13000+ containers, but the web 
UI says:

- Containers Running: -26546
- Memory Used: -82.38 TB
- VCores Used: -26451

This is how it happens in FairScheduler:

The completedContainer method calls application.containerCompleted, which 
subtracts the resources used by this container from the usedResource counter 
of the application. So, if completedContainer is called twice with the 
same container, too much is subtracted from the counter. The same goes for the 
updateRootQueueMetrics call, which is why we can see negative allocatedMemory on 
the root queue.

The solution is to check whether the container being supplied is still live 
inside completedContainer (as shown in the patch). There is some checking 
before calling completedContainer, but that's not enough.

For a deeper discussion, completedContainer may be called from two 
places:

1. Triggered by the RMContainerEventType.FINISHED event:

{code:title=FairScheduler.nodeUpdate}
// Process completed containers
for (ContainerStatus completedContainer : completedContainers) {
  ContainerId containerId = completedContainer.getContainerId();
  LOG.debug("Container FINISHED: " + containerId);
  completedContainer(getRMContainer(containerId),
      completedContainer, RMContainerEventType.FINISHED);
}
{code}

2. Triggered by RMContainerEventType.RELEASED:

{code:title=AbstractYarnScheduler.releaseContainers}
completedContainer(rmContainer,
    SchedulerUtils.createAbnormalContainerStatus(containerId,
      SchedulerUtils.RELEASED_CONTAINER), RMContainerEventType.RELEASED);
{code}

RMContainerEventType.RELEASED is not triggered by the MapReduce ApplicationMaster, 
so we won't see this problem on MR jobs. But TEZ triggers it when it no longer 
needs the container, while the NodeManager also reports a container-complete 
message to the RM, which in turn triggers the RMContainerEventType.FINISHED 
event. If the RMContainerEventType.FINISHED event reaches the RM earlier than the 
TEZ AM's release, the problem happens.

This behavior is more easily seen if the cluster has a TimelineServer set up 
for TEZ, which makes it more likely that the TEZ AM sends the 
RMContainerEventType.RELEASED event later than the NM sends 
RMContainerEventType.FINISHED.
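An illustrative sketch of the guard described above follows; it is not necessarily the exact patch, and the liveness lookup (getCurrentAttemptForContainer / getRMContainer) is assumed to be available on the scheduler side.

{code}
// Sketch only: skip completedContainer() if the container is no longer live for
// its application attempt, so a FINISHED event arriving after a RELEASED event
// (or vice versa) cannot subtract the same resources twice.
protected synchronized void completedContainer(RMContainer rmContainer,
    ContainerStatus containerStatus, RMContainerEventType event) {
  if (rmContainer == null) {
    LOG.info("Null container completed...");
    return;
  }
  ContainerId containerId = rmContainer.getContainerId();
  SchedulerApplicationAttempt app =
      getCurrentAttemptForContainer(containerId);   // assumed lookup helper
  if (app == null || app.getRMContainer(containerId) == null) {
    // Already processed: ignore, otherwise usedResource goes negative.
    LOG.info("Container " + containerId
        + " already completed, ignoring event " + event);
    return;
  }
  // ... existing completion logic (update app, queues, metrics, etc.) ...
}
{code}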



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4089) Race condition when calling AbstractYarnScheduler.completedContainer.

2015-08-27 Thread Shiwei Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shiwei Guo updated YARN-4089:
-
Attachment: YARN-4089.001.patch

 Race condition when calling AbstractYarnScheduler.completedContainer.
 -

 Key: YARN-4089
 URL: https://issues.apache.org/jira/browse/YARN-4089
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0, 2.5.2, 2.7.1
Reporter: Shiwei Guo
  Labels: patch
 Attachments: YARN-4089.001.patch


 There is a race condition in calling 
 AbstractYarnScheduler.completedContainer that makes the usedResource 
 counter of an application inaccurate. In the worst case, the scheduler will 
 not allocate any resource to any application in some queue (when 
 usedResource becomes negative) even though there is plenty of free resource to 
 be allocated.
 It also causes the Scheduler UI and metrics to report negative resource usage 
 values. Our cluster has the capacity to run 13000+ containers, but the web 
 UI says:
 - Containers Running: -26546
 - Memory Used: -82.38 TB
 - VCores Used: -26451
 This is how it happens in FairScheduler:
 The completedContainer method calls application.containerCompleted, which 
 subtracts the resources used by this container from the usedResource 
 counter of the application. So, if completedContainer is called twice 
 with the same container, too much is subtracted from the counter. The same 
 goes for the updateRootQueueMetrics call, which is why we can see negative 
 allocatedMemory on the root queue.
 The solution is to check whether the container being supplied is still live 
 inside completedContainer (as shown in the patch). There is some checking 
 before calling completedContainer, but that's not enough.
 For a deeper discussion, completedContainer may be called from two 
 places:
 1. Triggered by the RMContainerEventType.FINISHED event:
 {code:title=FairScheduler.nodeUpdate}
 // Process completed containers
 for (ContainerStatus completedContainer : completedContainers) {
   ContainerId containerId = completedContainer.getContainerId();
   LOG.debug("Container FINISHED: " + containerId);
   completedContainer(getRMContainer(containerId),
       completedContainer, RMContainerEventType.FINISHED);
 }
 {code}
 2. Triggered by RMContainerEventType.RELEASED:
 {code:title=AbstractYarnScheduler.releaseContainers}
 completedContainer(rmContainer,
     SchedulerUtils.createAbnormalContainerStatus(containerId,
       SchedulerUtils.RELEASED_CONTAINER), RMContainerEventType.RELEASED);
 {code}
 RMContainerEventType.RELEASED is not triggered by the MapReduce 
 ApplicationMaster, so we won't see this problem on MR jobs. But TEZ 
 triggers it when it no longer needs the container, while the NodeManager 
 also reports a container-complete message to the RM, which in turn triggers 
 the RMContainerEventType.FINISHED event. If the RMContainerEventType.FINISHED 
 event reaches the RM earlier than the TEZ AM's release, the problem happens.
 This behavior is more easily seen if the cluster has a TimelineServer 
 set up for TEZ, which makes it more likely that the TEZ AM sends the 
 RMContainerEventType.RELEASED event later than the NM sends 
 RMContainerEventType.FINISHED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics

2015-08-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716497#comment-14716497
 ] 

Varun Saxena commented on YARN-3816:


Thanks [~djp] for the replies. 

bq. This will be part of failed over JIRAs
Ok.

bq. I would prefer to use TreeMap because it sort key (timestamp) when 
accessing it. aggregateTo() algorithm assume metrics are sorted by timestamp.
Hmm... both getValues and getValuesJAXB return the same map, but I hadn't noticed 
the return types. So we would have to typecast the return value from getValues to 
use methods specific to TreeMap. In that case, I guess it's fine to use 
getValuesJAXB.

bq. aggregateTo is not straighfoward and generic useful like methods in 
TimelineMetricCalculator, so let's hold on to expose it as utility class for 
now. Make it static sounds good though.
Ok.

I had one more question which you missed.
The TimelineMetric#toAggregate flag is meant to indicate whether a metric needs to 
be aggregated, but are we also planning to use it to indicate that a metric *is* an 
aggregated metric? If yes, we should probably set this flag for each 
metric processed in TimelineCollector#appendAggregatedMetricsToEntities.
As Li said above, will we be differentiating aggregated metrics from 
non-aggregated ones?
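To illustrate why a sorted view of the values matters here, a tiny self-contained sketch (plain Java, not the TimelineMetric or aggregateTo code itself): timestamps kept in a TreeMap come back in order regardless of insertion order, so "latest value" and ordered passes are straightforward.

{code}
import java.util.Map;
import java.util.TreeMap;

public class SortedMetricSketch {
  public static void main(String[] args) {
    // Metric samples keyed by timestamp; a TreeMap keeps them sorted.
    TreeMap<Long, Number> values = new TreeMap<>();
    values.put(1440662400000L, 10L);   // earlier sample
    values.put(1440662460000L, 25L);   // latest sample
    values.put(1440662430000L, 15L);   // inserted out of order

    // Latest value regardless of insertion order:
    Map.Entry<Long, Number> latest = values.lastEntry();
    System.out.println("latest=" + latest.getValue());   // 25

    // Ordered iteration, as an aggregation pass would assume:
    long sum = 0;
    for (Number v : values.values()) {
      sum += v.longValue();
    }
    System.out.println("sum=" + sum);                     // 50
  }
}
{code}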

 [Aggregation] App-level Aggregation for YARN system metrics
 ---

 Key: YARN-3816
 URL: https://issues.apache.org/jira/browse/YARN-3816
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Junping Du
Assignee: Junping Du
 Attachments: Application Level Aggregation of Timeline Data.pdf, 
 YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch


 We need application level aggregation of Timeline data:
 - To present end user aggregated states for each application, include: 
 resource (CPU, Memory) consumption across all containers, number of 
 containers launched/completed/failed, etc. We need this for apps while they 
 are running as well as when they are done.
 - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
 aggregated to show details of states in framework level.
 - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
 on Application-level aggregations rather than raw entity-level data as much 
 less raws need to scan (with filter out non-aggregated entities, like: 
 events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Resources(both core and memory) are being negative

2015-08-27 Thread Shiwei Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716506#comment-14716506
 ] 

Shiwei Guo commented on YARN-3933:
--

I created a new [YARN-4089|https://issues.apache.org/jira/browse/YARN-4089] to 
describe the race condition bug for the FairScheduler. I'm a newbie to the Hadoop 
community, so I hope I didn't do anything wrong. Thanks.

 Resources(both core and memory) are being negative
 --

 Key: YARN-3933
 URL: https://issues.apache.org/jira/browse/YARN-3933
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.2
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
  Labels: patch
 Attachments: patch.BUGFIX-JIRA-YARN-3933.txt


 In our cluster we are seeing available memory and cores go negative. 
 Initial inspection:
 Scenario no. 1: 
 In the capacity scheduler, the method allocateContainersToNode() checks whether 
 there are excess reservations of containers for an application; if they are 
 no longer needed, it calls queue.completedContainer(), which causes 
 resources to go negative even though they were never assigned in the first place. 
 I am still looking through the code. Can somebody suggest how to simulate 
 excess container assignments?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3528:
---
Attachment: YARN-3528-008.patch

 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
 YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible to have scheduled or precommit tests to run 
 consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep of 12345 shows up many places in the test suite where this 
 practise has developed.
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through portscanning and dynamic port allocation. 
 Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716346#comment-14716346
 ] 

Hadoop QA commented on YARN-3528:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   8m 26s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 8 new or modified test files. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 45s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 4  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m  5s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  22m 48s | Tests passed in 
hadoop-common. |
| {color:red}-1{color} | yarn tests |   6m 53s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  53m 19s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.nodemanager.TestNodeStatusUpdater |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752696/YARN-3528-007.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / fdb56f7 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8925/artifact/patchprocess/whitespace.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8925/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8925/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8925/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8925/console |


This message was automatically generated.

 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
 YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible to have scheduled or precommit tests to run 
 consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep of 12345 shows up many places in the test suite where this 
 practise has developed.
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through portscanning and dynamic port allocation. 
 Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716343#comment-14716343
 ] 

Brahma Reddy Battula commented on YARN-3528:


Hmm. Updated the patch, which addresses all the comments. [~rkanter] and 
[~varun_saxena], kindly review. Thanks.

 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
 YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible to have scheduled or precommit tests to run 
 consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep of 12345 shows up many places in the test suite where this 
 practise has developed.
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through portscanning and dynamic port allocation. 
 Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716371#comment-14716371
 ] 

Brahma Reddy Battula commented on YARN-3528:


Updated the 008 patch to address the above test case failures.

 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
 YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible to have scheduled or precommit tests to run 
 consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep of 12345 shows up many places in the test suite where this 
 practise has developed.
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through portscanning and dynamic port allocation. 
 Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class

2015-08-27 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716228#comment-14716228
 ] 

Varun Vasudev commented on YARN-4081:
-

Thanks for the feedback Srikanth!

bq. why use a Map. Is there a rough idea how many different resources one may 
want to encode?

I would like the supported resource types to be configurable rather than set in 
code. There's a proposal attached in the parent JIRA that goes into more detail on 
this. We've seen tickets filed on YARN for adding disk, network, and HDFS 
bandwidth as resource types. I would prefer to let users just 
configure the types they want to use and allow them to add arbitrary resource 
types for scheduling (for example, scheduling based on the number of licenses 
available on a node). Is there an alternate structure you would prefer for me 
to use?

bq. Ditto for encapsulating strings in URIs
In the proposal, I propose using URIs as the identifier for the resource 
type (similar to what Kubernetes uses).

bq. ResourceInformation wrapper over doubles
I didn't understand this - are you asking why we're using ResourceInformation 
instead of using doubles?
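For illustration only (the class shape and field names below are assumptions, not the classes in the attached patch): a map keyed by a URI-style string, with a small value wrapper instead of a bare double, is roughly the structure being discussed. It keeps the set of resource types open-ended so disk, network, or even license counts can be added purely by configuration.

{code}
import java.util.HashMap;
import java.util.Map;

public class ResourceTypesSketch {

  // A tiny wrapper instead of a bare double: it can carry units and leaves room
  // for later additions without changing the shape of the map.
  static final class ResourceInformation {
    final String units;
    final double value;
    ResourceInformation(String units, double value) {
      this.units = units;
      this.value = value;
    }
    @Override public String toString() {
      return value + " " + units;
    }
  }

  public static void main(String[] args) {
    // Keyed by a URI-style identifier so arbitrary, configured types can coexist.
    Map<String, ResourceInformation> resources = new HashMap<>();
    resources.put("yarn.io/memory", new ResourceInformation("MB", 8192));
    resources.put("yarn.io/vcores", new ResourceInformation("", 4));
    resources.put("example.com/gpu-licenses", new ResourceInformation("", 2));

    resources.forEach((k, v) -> System.out.println(k + " -> " + v));
  }
}
{code}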

 Add support for multiple resource types in the Resource class
 -

 Key: YARN-4081
 URL: https://issues.apache.org/jira/browse/YARN-4081
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: YARN-4081-YARN-3926.001.patch


 For adding support for multiple resource types, we need to add support for 
 this in the Resource class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED

2015-08-27 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716206#comment-14716206
 ] 

zhihai xu commented on YARN-3798:
-

[~ozawa], Yes, the latest patch YARN-3798-branch-2.7.006.patch looks good to me.

 ZKRMStateStore shouldn't create new session without occurrance of 
 SESSIONEXPIED
 ---

 Key: YARN-3798
 URL: https://issues.apache.org/jira/browse/YARN-3798
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt
Assignee: Varun Saxena
Priority: Blocker
  Labels: 2.6.1-candidate
 Attachments: RM.log, YARN-3798-2.7.002.patch, 
 YARN-3798-branch-2.6.01.patch, YARN-3798-branch-2.7.002.patch, 
 YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, 
 YARN-3798-branch-2.7.005.patch, YARN-3798-branch-2.7.006.patch, 
 YARN-3798-branch-2.7.patch


 RM going down with NoNode exception during create of znode for appattempt
 *Please find the exception logs*
 {code}
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session connected
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session restored
 2015-06-09 10:09:44,886 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Exception while executing a ZK operation.
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
   at java.lang.Thread.run(Thread.java:745)
 2015-06-09 10:09:44,887 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
 out ZK retries. Giving up!
 2015-06-09 10:09:44,887 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
 updating appAttempt: appattempt_1433764310492_7152_01
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
 

[jira] [Updated] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3528:
---
Attachment: YARN-3528-007.patch

 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
 YARN-3528-007.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible to have scheduled or precommit tests to run 
 consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep of 12345 shows up many places in the test suite where this 
 practise has developed.
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through portscanning and dynamic port allocation. 
 Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716306#comment-14716306
 ] 

Varun Saxena commented on YARN-3528:


Thanks for updating the patch [~brahmareddy].

In the latest patch, the same port has been assigned to both NM_ADDRESS and 
NM_LOCALIZER_ADDRESS. I haven't run the test, but this should lead to a 
BindException in the test. 
{code}
conf.set(YarnConfiguration.NM_ADDRESS, localhostAddress + ":" + port);
conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS, localhostAddress + ":"
    + port);
{code}
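One way the test could avoid the clash (a sketch, assuming it is acceptable to bind-and-release ephemeral ports rather than use a fixed list): ask the OS for two distinct free ports and assign one to each address.

{code}
// Sketch only: pick two distinct ephemeral ports so NM_ADDRESS and
// NM_LOCALIZER_ADDRESS never collide. There is a small race between releasing
// the probe sockets and the NM binding, the usual trade-off with this trick.
import java.io.IOException;
import java.net.ServerSocket;

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class FreePortSketch {

  static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      socket.setReuseAddress(true);
      return socket.getLocalPort();
    }
  }

  public static void main(String[] args) throws IOException {
    YarnConfiguration conf = new YarnConfiguration();
    String localhostAddress = "127.0.0.1";

    int nmPort = findFreePort();
    int localizerPort = findFreePort();
    while (localizerPort == nmPort) {   // unlikely, but keep them distinct
      localizerPort = findFreePort();
    }

    conf.set(YarnConfiguration.NM_ADDRESS, localhostAddress + ":" + nmPort);
    conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS,
        localhostAddress + ":" + localizerPort);
    System.out.println(conf.get(YarnConfiguration.NM_ADDRESS) + " / "
        + conf.get(YarnConfiguration.NM_LOCALIZER_ADDRESS));
  }
}
{code}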

 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
 YARN-3528-007.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible to have scheduled or precommit tests to run 
 consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep of 12345 shows up many places in the test suite where this 
 practise has developed.
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through portscanning and dynamic port allocation. 
 Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716543#comment-14716543
 ] 

Naganarasimha G R commented on YARN-3893:
-

Hi [~bibinchundatt],
Thanks for the patch. The test cases ran fine, and the approach and test case seem 
fine, but a few comments from my side:
# The timeout of 90 is on the higher side; is that much required, or was it for 
local testing?
# Instead of a test case in RMHA, can we think of adding it to TestRMAdminService, 
as the failure is related to transitioning to Active?
# Maybe while throwing the RMFatalEvent it would be better to wrap the existing 
exception in another one whose message says that the transition to active failed, 
so that the RM logs clearly show which operation it exited on. Or maybe, for the 
eventType, instead of {{ACTIVE_REFRESH_FAIL}} we can have the more intuitive 
name {{TRANSITION_TO_ACTIVE_FAILED}}.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Both RMs will continuously try to become active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716563#comment-14716563
 ] 

Bibin A Chundatt commented on YARN-3893:


Hi Naga,

Thanks for looking into the patch.

{quote}
The timeout of 90 is on the higher side; is that much required, or was it for 
local testing?
{quote}
Will update the same.

{quote}
Instead of a test case in RMHA, can we think of adding it to TestRMAdminService, 
as the failure is related to transitioning to Active?
{quote}
As I understand it, all transitionToActive / HA related test cases are added in 
the same class.

3. {{TRANSITION_TO_ACTIVE_FAILED}}: it is not actually the transition that fails 
but {{refreshAll}}, right? That is the reason I gave it a specific name.

Points 2 and 3 are not mandatory fix items, right?

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Both RMs will continuously try to become active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both web UIs show as active
 # Status shown as active for both RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717831#comment-14717831
 ] 

Bikas Saha commented on YARN-4088:
--

Why not on a 3K cluster? We could slow down heartbeats to, say, 10s on a 3K-node 
cluster. That should work, though I agree that NM info would be stale for longer, 
if that's your point.
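
A rough back-of-envelope restatement of the budget being discussed (the numbers just extend the arithmetic in the description):
{code}
3000 NMs @ 3s interval  -> ~1000 heartbeats/s -> ~1 ms of RM time per heartbeat
3000 NMs @ 10s interval -> ~300 heartbeats/s  -> ~3.3 ms per heartbeat, but NM state may be up to 10s stale
{code}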

 RM should be able to process heartbeats from NM asynchronously
 --

 Key: YARN-4088
 URL: https://issues.apache.org/jira/browse/YARN-4088
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Reporter: Srikanth Kandula

 Today, the RM sequentially processes one heartbeat after another. 
 Imagine a 3000 server cluster with each server heart-beating every 3s. This 
 gives the RM 1ms on average to process each NM heartbeat. That is tough.
 It is true that there are several underlying datastructures that will be 
 touched during heartbeat processing. So, it is non-trivial to parallelize the 
 NM heartbeat. Yet, it is quite doable...
 Parallelizing the NM heartbeat would substantially improve the scalability of 
 the RM, allowing it to either 
 a) run larger clusters, or
 b) support faster heartbeats or dynamic scaling of heartbeats, or
 c) take more asks from each application, or
 d) use cleverer / more expensive algorithms such as node labels or better 
 packing, or ...
 Indeed the RM's scalability limit has been cited as the motivating reason for 
 a variety of efforts which will become less needed if this can be solved. 
 Ditto for slow heartbeats.  See Sparrow and Mercury papers for example.
 Can we take a shot at this?
 If not, could we discuss why?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717785#comment-14717785
 ] 

Hudson commented on YARN-3250:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2262 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2262/])
YARN-3250. Support admin cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev a9c8ea71aa427ff5f25caec98be15bc880e578a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/resourcemanager_administration_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceManagerAdministrationProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceManagerAdministrationProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java


 Support admin cli interface in for Application Priority
 ---

 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S
 Fix For: 2.8.0

 Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
 0003-YARN-3250.patch


 Current Application Priority Manager supports only configuration via file. 
 To support runtime configurations for admin cli and REST, a common management 
 interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717798#comment-14717798
 ] 

Srikanth Kandula commented on YARN-4088:


Yes, concurrently. Your suggestion is a good one, in that it does give the RM more 
time to be clever on small clusters; but no such luck on, say, a 3K-server cluster. 
Avoiding the serialization of heartbeat processing may be the answer to most of the 
other problems.

 RM should be able to process heartbeats from NM asynchronously
 --

 Key: YARN-4088
 URL: https://issues.apache.org/jira/browse/YARN-4088
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Reporter: Srikanth Kandula

 Today, the RM sequentially processes one heartbeat after another. 
 Imagine a 3000 server cluster with each server heart-beating every 3s. This 
 gives the RM 1ms on average to process each NM heartbeat. That is tough.
 It is true that there are several underlying datastructures that will be 
 touched during heartbeat processing. So, it is non-trivial to parallelize the 
 NM heartbeat. Yet, it is quite doable...
 Parallelizing the NM heartbeat would substantially improve the scalability of 
 the RM, allowing it to either 
 a) run larger clusters, or
 b) support faster heartbeats or dynamic scaling of heartbeats, or
 c) take more asks from each application, or
 d) use cleverer / more expensive algorithms such as node labels or better 
 packing, or ...
 Indeed the RM's scalability limit has been cited as the motivating reason for 
 a variety of efforts which will become less needed if this can be solved. 
 Ditto for slow heartbeats.  See Sparrow and Mercury papers for example.
 Can we take a shot at this?
 If not, could we discuss why?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717855#comment-14717855
 ] 

Sangjin Lee commented on YARN-4074:
---

Thanks [~gtCarrera9] for your comments.

{quote}
As a general question, since we're returning our timeline entities as JSON in 
our web service, we need to somehow rebuild those entities on the JS client 
side, right? If that is the case, do we need to provide a JS object model 
consistent with our TimelineEntity object model? I'm not a front-end expert, so 
I'd like to learn the typical practice for this problem.
{quote}
I'm not intimately familiar with that either; I hope someone who is familiar 
could comment.

I'm going to do some refactoring to move away from the if-else branch (yuck). 
There are aspects such as input validation, getting results from HBase, and 
creating the entity objects that can be isolated more clearly. I need to give 
some more thought to how to encapsulate them. This has some bearing on the 
filter-related work that Varun is doing, so I'll try not to touch that area in 
this JIRA.
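
A minimal sketch of the kind of separation under discussion (not the POC code; every name here is hypothetical): validate inputs, fetch raw rows, then build entities, so each concern can be tested and changed independently of the if-else dispatch:
{code}
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public final class FlowQuerySketch {
  // Hypothetical entry point: one query path broken into three isolated steps.
  public List<String> queryFlows(String clusterId, String flowName, int limit) {
    validate(clusterId, flowName, limit);            // 1. input validation
    List<byte[]> rows = fetchRows(flowName, limit);  // 2. storage access (HBase in the real code)
    return buildEntities(rows);                      // 3. entity construction
  }

  private void validate(String clusterId, String flowName, int limit) {
    if (clusterId == null || flowName == null || limit <= 0) {
      throw new IllegalArgumentException("clusterId, flowName and a positive limit are required");
    }
  }

  private List<byte[]> fetchRows(String flowName, int limit) {
    // Stand-in for the HBase scan; returns fake rows for illustration only.
    List<byte[]> rows = new ArrayList<>();
    for (int i = 0; i < limit; i++) {
      rows.add((flowName + "-run-" + i).getBytes(StandardCharsets.UTF_8));
    }
    return rows;
  }

  private List<String> buildEntities(List<byte[]> rows) {
    List<String> entities = new ArrayList<>();
    for (byte[] row : rows) {
      // Real code would build TimelineEntity objects here.
      entities.add(new String(row, StandardCharsets.UTF_8));
    }
    return entities;
  }
}
{code}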

One thing I forgot to mention is that the current POC patch is a diff against 
the patch for YARN-3901, so as to isolate the changes for this JIRA. The patch 
for YARN-3901 needs to be reviewed and committed before this one can be. That's 
why this patch is missing what's included in the YARN-3901 patch.

 [timeline reader] implement support for querying for flows and flow runs
 

 Key: YARN-4074
 URL: https://issues.apache.org/jira/browse/YARN-4074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: YARN-4074-YARN-2928.POC.001.patch, 
 YARN-4074-YARN-2928.POC.002.patch


 Implement support for querying for flows and flow runs.
 We should be able to query for the most recent N flows, etc.
 This includes changes to the {{TimelineReader}} API if necessary, as well as 
 implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3817) [Aggregation] Flow and User level aggregation on Application States table

2015-08-27 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3817:

Attachment: YARN-3817-poc-v1.patch

I'm attaching the first POC patch of our Phoenix-based offline aggregator. The 
current patch adds a MapReduce-based offline aggregator that gathers information 
from our HBase storage, performs the flow- and user-based aggregation, and writes 
the aggregated data back to Phoenix. Generally, the expected input to the offline 
aggregator is a list of flows (the active flows of the past time period, or a 
specially created list of flows within a given time window). The offline 
aggregator first aggregates all flow run data for each flow, in both the mapper 
and the reducer, and then writes the results back into Phoenix. Meanwhile, the 
aggregated data is passed along to the user-level aggregation, which performs 
aggregations similar to the flow-level ones. There is a TimelineEntityWritable 
class to transfer TimelineEntities. 
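
For readers less familiar with the shape of such a job, here is a minimal standalone sketch of the map/reduce aggregation pattern described above (this is not the POC code: it uses plain Text/LongWritable instead of TimelineEntityWritable and an assumed tab-separated input, purely to illustrate grouping flow runs by flow and summing a metric):
{code}
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class FlowAggregationSketch {

  // Assumed input line per flow run: "<flowId>\t<metricValue>".
  public static class FlowRunMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t");
      if (parts.length == 2) {
        context.write(new Text(parts[0]), new LongWritable(Long.parseLong(parts[1])));
      }
    }
  }

  // Groups all runs of a flow and emits the flow-level aggregate; a user-level pass
  // would re-key the output by user and repeat the same summation.
  public static class FlowReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text flowId, Iterable<LongWritable> runMetrics, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : runMetrics) {
        sum += v.get();
      }
      context.write(flowId, new LongWritable(sum));
    }
  }
}
{code}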

Some TODOs:
1. Centralize some of the HBase reader related code shared by the aggregation 
HBase reader and the HBase reader. 
2. Create a trigger to launch the aggregator on a schedule or in an ad-hoc fashion. 
3. Separate configs. 
4. Support aggregation over a specific time period. 
5. More tests. 

Future TODOs: 
Reorganize our storage package and unit tests.

Some extra work performed in this patch:
1. No longer storing info fields in the Phoenix writer. 
2. Escaping special characters in the Phoenix writer by quoting all column names 
(per the Phoenix team's suggestion). 
3. Centralizing tests for aggregation and Phoenix. 
4. Removed the unused TestTimelineWriterUtil. 


 [Aggregation] Flow and User level aggregation on Application States table
 -

 Key: YARN-3817
 URL: https://issues.apache.org/jira/browse/YARN-3817
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Junping Du
Assignee: Li Lu
 Attachments: Detail Design for Flow and User Level Aggregation.pdf, 
 YARN-3817-poc-v1.patch


 We need time-based flow/user level aggregation to present flow/user related 
 states to end users.
 Flow level represents summary info of a specific flow. User level aggregation 
 represents summary info of a specific user, it should include summary info of 
 accumulated and statistic means (by two levels: application and flow), like: 
 number of Flows, applications, resource consumption, resource means per app 
 or flow, etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)