[jira] [Commented] (YARN-3525) Rename fair scheduler properties increment-allocation-mb and increment-allocation-vcores
[ https://issues.apache.org/jira/browse/YARN-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507381#comment-14507381 ] Wei Yan commented on YARN-3525: --- Hi, [~bibinchundatt], I think the reason is that these properties are looked up in yarn-site.xml. Rename fair scheduler properties increment-allocation-mb and increment-allocation-vcores Key: YARN-3525 URL: https://issues.apache.org/jira/browse/YARN-3525 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Rename the two properties below, since they are used only by the fair scheduler: {color:blue}yarn.scheduler.increment-allocation-mb{color} to {color:red}yarn.scheduler.fair.increment-allocation-mb{color} {color:blue}yarn.scheduler.increment-allocation-vcores{color} to {color:red}yarn.scheduler.fair.increment-allocation-vcores{color} All other fair-scheduler-only properties already use the {color:red}yarn.scheduler.fair{color} prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
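As a point of reference for the rename discussed above, here is a minimal, hypothetical sketch of reading the proposed new property names through the common Configuration API; the default values (1024 MB, 1 vcore) are illustrative assumptions, not taken from this JIRA.
{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch: look up the proposed fair-scheduler property names from
// yarn-site.xml via Configuration. Defaults below are illustrative only.
public class FairIncrementConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.addResource("yarn-site.xml"); // these properties are looked up here
    int incrementMb = conf.getInt("yarn.scheduler.fair.increment-allocation-mb", 1024);
    int incrementVcores = conf.getInt("yarn.scheduler.fair.increment-allocation-vcores", 1);
    System.out.println("increment-allocation-mb=" + incrementMb
        + ", increment-allocation-vcores=" + incrementVcores);
  }
}
{code}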
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507332#comment-14507332 ] Lei Guo commented on YARN-1963: --- Agree with [~jlowe]: the integer is the base of priority, and a label should be just an alias used during application submission. If we keep both labels and integers in the system, it could become complicated when the administrator changes the label/range mapping. It's true that we do not expect the user to assign many different priorities, but we may enhance the scheduler to calculate priority dynamically based on certain criteria, for example the pending time or the current time frame. In that case, the priority could be any number. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: 0001-YARN-1963-prototype.patch, YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
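To make the "integer is the base, label is an alias" idea above concrete, here is a small hypothetical sketch (class, label names, and values are illustrative and not part of any YARN API): labels are resolved to integers once at submission time, so a later change to the label mapping by an administrator does not affect already-submitted applications.
{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of resolving priority labels to integers at submission time.
public class PriorityLabelResolver {
  private final Map<String, Integer> labelToPriority = new HashMap<String, Integer>();

  public PriorityLabelResolver() {
    // Illustrative mapping an administrator might configure.
    labelToPriority.put("LOW", 1);
    labelToPriority.put("NORMAL", 5);
    labelToPriority.put("HIGH", 10);
  }

  /** Resolve a submitted label to its integer priority; plain integers pass through. */
  public int resolve(String submitted) {
    Integer mapped = labelToPriority.get(submitted);
    return mapped != null ? mapped : Integer.parseInt(submitted);
  }

  public static void main(String[] args) {
    PriorityLabelResolver resolver = new PriorityLabelResolver();
    System.out.println(resolver.resolve("HIGH")); // 10
    System.out.println(resolver.resolve("42"));   // 42, so dynamically calculated priorities remain possible
  }
}
{code}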
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507385#comment-14507385 ] Rohith commented on YARN-3225: -- +1 (non-binding), LGTM. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225-1.patch, YARN-3225-2.patch, YARN-3225-3.patch, YARN-3225-4.patch, YARN-3225-5.patch, YARN-3225.patch, YARN-914.patch A new CLI (or an existing CLI with new parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507029#comment-14507029 ] Hadoop QA commented on YARN-2740: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 10 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 31s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 22s | The applied patch generated 5 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 0s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 0s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 54m 14s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 100m 10s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727201/YARN-2740.20150422-2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b08908a | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7444/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7444/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7444/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7444/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7444/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7444/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7444//console | This message was automatically generated. 
ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Fix For: 2.8.0 Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, YARN-2740.20150327-1.patch, YARN-2740.20150411-1.patch, YARN-2740.20150411-2.patch, YARN-2740.20150411-3.patch, YARN-2740.20150417-1.patch, YARN-2740.20150420-1.patch, YARN-2740.20150421-1.patch, YARN-2740.20150422-2.patch According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change-labels-on-node operations. - CommonNodeLabelsManager shouldn't persist labels on nodes when NMs heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507396#comment-14507396 ] Naganarasimha G R commented on YARN-2740: - The unit test failure is not related to this patch. I can fix the whitespace issue, but I am not clear about the checkstyle output; I will correct it once I get some confirmation from Allen. ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Fix For: 2.8.0 Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, YARN-2740.20150327-1.patch, YARN-2740.20150411-1.patch, YARN-2740.20150411-2.patch, YARN-2740.20150411-3.patch, YARN-2740.20150417-1.patch, YARN-2740.20150420-1.patch, YARN-2740.20150421-1.patch, YARN-2740.20150422-2.patch According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change-labels-on-node operations. - CommonNodeLabelsManager shouldn't persist labels on nodes when NMs heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507255#comment-14507255 ] sandflee commented on YARN-3387: It seems to be a bug in launchAM in MockRM.java. launchAM does: 1) wait for the app to become ACCEPTED (after this the appAttempt is created); 2) node heartbeat; 3) wait for the appAttempt to become ALLOCATED. If the node heartbeat is handled before the appAttempt becomes SCHEDULED, the appAttempt state will never reach ALLOCATED unless another NM heartbeat comes, just as in the failed cases https://builds.apache.org/job/PreCommit-YARN-Build/7410//testReport/org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager/TestAMRestart/testShouldNotCountFailureToMaxAttemptRetry/ https://builds.apache.org/job/PreCommit-YARN-Build/7410//testReport/org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager/TestAMRestart/testPreemptedAMRestartOnRMRestart/ container complete message couldn't pass to am if am restarted and rm changed - Key: YARN-3387 URL: https://issues.apache.org/jira/browse/YARN-3387 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: sandflee Priority: Critical Labels: patch Attachments: YARN-3387.001.patch, YARN-3387.002.patch Suppose AM work preserving and RM HA are enabled. The container-complete message is passed to appAttempt.justFinishedContainers in the RM. Normally, all attempts of one app share the same justFinishedContainers, but after the RM changes, every attempt has its own justFinishedContainers, so in the situation below the container-complete message cannot be passed to the AM: 1) the AM restarts; 2) the RM changes; 3) a container launched by the first AM completes. The container-complete message will be passed to appAttempt1, not appAttempt2, but the AM pulls finished containers from appAttempt2 (currentAppAttempt). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
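A minimal sketch of the ordering fix implied by the comment above, written against the MockRM/MockNM test helpers as I recall them; the exact method signatures (waitForState, nodeHeartbeat, sendAMLaunched) should be treated as assumptions and checked against the actual MockRM.java. The key point is waiting for SCHEDULED before the heartbeat so the allocation race cannot occur.
{code}
import org.apache.hadoop.yarn.server.resourcemanager.MockAM;
import org.apache.hadoop.yarn.server.resourcemanager.MockNM;
import org.apache.hadoop.yarn.server.resourcemanager.MockRM;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttempt;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptState;

// Sketch of a launchAM that only heartbeats once the attempt is SCHEDULED.
public class LaunchAMOrderingSketch {
  public static MockAM launchAM(RMApp app, MockRM rm, MockNM nm) throws Exception {
    RMAppAttempt attempt = app.getCurrentAppAttempt();
    // Wait until the attempt is SCHEDULED so the heartbeat below cannot be
    // processed while the attempt is still waiting to be scheduled.
    rm.waitForState(attempt.getAppAttemptId(), RMAppAttemptState.SCHEDULED);
    nm.nodeHeartbeat(true); // the heartbeat triggers the AM container allocation
    rm.waitForState(attempt.getAppAttemptId(), RMAppAttemptState.ALLOCATED);
    return rm.sendAMLaunched(attempt.getAppAttemptId());
  }
}
{code}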
[jira] [Commented] (YARN-2444) Primary filters added after first submission not indexed, cause exceptions in logs.
[ https://issues.apache.org/jira/browse/YARN-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507286#comment-14507286 ] Steve Loughran commented on YARN-2444: -- +add test to submit 100+K events and see what happens. Primary filters added after first submission not indexed, cause exceptions in logs. --- Key: YARN-2444 URL: https://issues.apache.org/jira/browse/YARN-2444 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.5.0 Reporter: Marcelo Vanzin Assignee: Steve Loughran Attachments: YARN-2444-001.patch, ats.java, org.apache.hadoop.yarn.server.timeline.TestTimelineClientPut-output.txt See attached code for an example. The code creates an entity with a primary filter, submits it to the ATS. After that, a new primary filter value is added and the entity is resubmitted. At that point two things can be seen: - Searching for the new primary filter value does not return the entity - The following exception shows up in the logs: {noformat} 14/08/22 11:33:42 ERROR webapp.TimelineWebServices: Error when verifying access for user dr.who (auth:SIMPLE) on the events of the timeline entity { id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test } org.apache.hadoop.yarn.exceptions.YarnException: Owner information of the timeline entity { id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test } is corrupted. at org.apache.hadoop.yarn.server.timeline.security.TimelineACLsManager.checkAccess(TimelineACLsManager.java:67) at org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:172) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
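The reproduction described above can be sketched against the public TimelineClient/TimelineEntity API roughly as follows; this is a hedged sketch rather than the attached ats.java, and the entity id, type, and filter values are illustrative placeholders.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;

// Sketch: put an entity with one primary filter, then re-put it with an extra
// filter value; per this JIRA the new value is not indexed by the ATS.
public class PrimaryFilterRepro {
  public static void main(String[] args) throws Exception {
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(new Configuration());
    client.start();
    try {
      TimelineEntity entity = new TimelineEntity();
      entity.setEntityId("testid-1234");
      entity.setEntityType("test");
      entity.setStartTime(System.currentTimeMillis());
      entity.addPrimaryFilter("user", "alice");
      client.putEntities(entity); // first submission: "alice" is indexed

      entity.addPrimaryFilter("user", "bob"); // value added after first submission
      client.putEntities(entity); // resubmission: "bob" is not searchable
    } finally {
      client.stop();
    }
  }
}
{code}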
[jira] [Commented] (YARN-3495) Confusing log generated by FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507084#comment-14507084 ] Hudson commented on YARN-3495: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2121 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2121/]) YARN-3495. Confusing log generated by FairScheduler. Contributed by Brahma Reddy Battula. (ozawa: rev 105afd54779852c518b978101f23526143e234a5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt Confusing log generated by FairScheduler Key: YARN-3495 URL: https://issues.apache.org/jira/browse/YARN-3495 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: YARN-3495.patch 2015-04-16 12:03:48,531 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should be settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507137#comment-14507137 ] Wangda Tan commented on YARN-3413: -- The test build failures seem to be caused by the build environment; I can get the build to pass locally, so I retriggered Jenkins. The checkstyle and whitespace checks are newly added with Allen's patch; I will try to fix them. Node label attributes (like exclusivity) should be settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, YARN-3413.4.patch, YARN-3413.5.patch As mentioned in https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947, changing node label exclusivity and/or other attributes may not be a real use case, and we should also support setting node label attributes while adding them to the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-3434: Attachment: YARN-3434.patch Upmerged patch to latest. Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch ULF was set to 1.0, yet a user was able to consume 1.4X the queue capacity. It looks like when this application launched, it reserved about 1000 containers, each 8G, within about 5 seconds. I think this allowed the logic in assignToUser() to allow the user limit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3495) Confusing log generated by FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507039#comment-14507039 ] Hudson commented on YARN-3495: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #172 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/172/]) YARN-3495. Confusing log generated by FairScheduler. Contributed by Brahma Reddy Battula. (ozawa: rev 105afd54779852c518b978101f23526143e234a5) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java Confusing log generated by FairScheduler Key: YARN-3495 URL: https://issues.apache.org/jira/browse/YARN-3495 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: YARN-3495.patch 2015-04-16 12:03:48,531 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3410) YARN admin should be able to remove individual application records from RMStateStore
[ https://issues.apache.org/jira/browse/YARN-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507045#comment-14507045 ] Hudson commented on YARN-3410: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #172 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/172/]) YARN-3410. YARN admin should be able to remove individual application records from RMStateStore. (Rohith Sharmaks via wangda) (wangda: rev e71d0d87d9b388f211a8eb3d2cd9af347abf9bda) * hadoop-yarn-project/hadoop-yarn/bin/yarn * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/LeveldbRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestLeveldbRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java YARN-3410. Addendum fix for compilation error. Contributed by Rohith. 
(aajisaka: rev b08908ae5eaf60a7fc70bf60493a533e915553c5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md YARN admin should be able to remove individual application records from RMStateStore Key: YARN-3410 URL: https://issues.apache.org/jira/browse/YARN-3410 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, yarn Reporter: Wangda Tan Assignee: Rohith Priority: Critical Fix For: 2.8.0 Attachments: 0001-YARN-3410-v1.patch, 0001-YARN-3410.patch, 0001-YARN-3410.patch, 0002-YARN-3410.patch, 0003-YARN-3410.patch, 0004-YARN-3410-addendum-branch-2.patch, 0004-YARN-3410-addendum.patch, 0004-YARN-3410-branch-2.patch, 0004-YARN-3410.patch When the RM state store has entered an unexpected state (one example is YARN-2340: an attempt is not in a final state but the app has already completed), the RM can never come up unless the RMStateStore is formatted. I think we should support removing individual application records from the RMStateStore, to unblock the RM and let the admin choose between waiting for a fix and formatting the state store. In addition, the RM should be able to report all fatal errors (which will shut down the RM) during app recovery; this can save the admin some time in removing apps in a bad state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3503) Expose disk utilization percentage and bad local and log dir counts on NM via JMX
[ https://issues.apache.org/jira/browse/YARN-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507050#comment-14507050 ] Hudson commented on YARN-3503: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #172 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/172/]) YARN-3503. Expose disk utilization percentage and bad local and log dir counts in NM metrics. Contributed by Varun Vasudev (jianhe: rev 674c7ef64916fabbe59c8d6cdd50ca19cf7ddb7c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/metrics/NodeManagerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java Expose disk utilization percentage and bad local and log dir counts on NM via JMX - Key: YARN-3503 URL: https://issues.apache.org/jira/browse/YARN-3503 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: YARN-3503.0.patch It would be useful to expose the disk utilization as well as the number of bad local disks on the NMs via JMX so that alerts can be setup for nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3494) Expose AM resource limit and usage in QueueMetrics
[ https://issues.apache.org/jira/browse/YARN-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507044#comment-14507044 ] Hudson commented on YARN-3494: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #172 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/172/]) YARN-3494. Expose AM resource limit and usage in CS QueueMetrics. Contributed by Rohith Sharmaks (jianhe: rev bdd90110e6904b59746812d9a093924a65e72280) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueMetrics.java Expose AM resource limit and usage in QueueMetrics --- Key: YARN-3494 URL: https://issues.apache.org/jira/browse/YARN-3494 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.8.0 Attachments: 0001-YARN-3494.patch, 0002-YARN-3494.patch, 0002-YARN-3494.patch Now we have the AM resource limit and user limit shown on the web UI, it would be useful to expose them in the QueueMetrics as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3503) Expose disk utilization percentage and bad local and log dir counts on NM via JMX
[ https://issues.apache.org/jira/browse/YARN-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507095#comment-14507095 ] Hudson commented on YARN-3503: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2121 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2121/]) YARN-3503. Expose disk utilization percentage and bad local and log dir counts in NM metrics. Contributed by Varun Vasudev (jianhe: rev 674c7ef64916fabbe59c8d6cdd50ca19cf7ddb7c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/metrics/NodeManagerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLocalDirsHandlerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java Expose disk utilization percentage and bad local and log dir counts on NM via JMX - Key: YARN-3503 URL: https://issues.apache.org/jira/browse/YARN-3503 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: YARN-3503.0.patch It would be useful to expose the disk utilization as well as the number of bad local disks on the NMs via JMX so that alerts can be setup for nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3410) YARN admin should be able to remove individual application records from RMStateStore
[ https://issues.apache.org/jira/browse/YARN-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507090#comment-14507090 ] Hudson commented on YARN-3410: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2121 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2121/]) YARN-3410. YARN admin should be able to remove individual application records from RMStateStore. (Rohith Sharmaks via wangda) (wangda: rev e71d0d87d9b388f211a8eb3d2cd9af347abf9bda) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestLeveldbRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/LeveldbRMStateStore.java * hadoop-yarn-project/hadoop-yarn/bin/yarn * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java YARN-3410. Addendum fix for compilation error. Contributed by Rohith. 
(aajisaka: rev b08908ae5eaf60a7fc70bf60493a533e915553c5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md YARN admin should be able to remove individual application records from RMStateStore Key: YARN-3410 URL: https://issues.apache.org/jira/browse/YARN-3410 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, yarn Reporter: Wangda Tan Assignee: Rohith Priority: Critical Fix For: 2.8.0 Attachments: 0001-YARN-3410-v1.patch, 0001-YARN-3410.patch, 0001-YARN-3410.patch, 0002-YARN-3410.patch, 0003-YARN-3410.patch, 0004-YARN-3410-addendum-branch-2.patch, 0004-YARN-3410-addendum.patch, 0004-YARN-3410-branch-2.patch, 0004-YARN-3410.patch When RM state store entered an unexpected state, one example is YARN-2340, when an attempt is not in final state but app already completed, RM can never get up unless format RMStateStore. I think we should support remove individual application records from RMStateStore to unblock RM admin make choice of either waiting for a fix or format state store. In addition, RM should be able to report all fatal errors (which will shutdown RM) when doing app recovery, this can save admin some time to remove apps in bad state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3494) Expose AM resource limit and usage in QueueMetrics
[ https://issues.apache.org/jira/browse/YARN-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507089#comment-14507089 ] Hudson commented on YARN-3494: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2121 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2121/]) YARN-3494. Expose AM resource limit and usage in CS QueueMetrics. Contributed by Rohith Sharmaks (jianhe: rev bdd90110e6904b59746812d9a093924a65e72280) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java Expose AM resource limit and usage in QueueMetrics --- Key: YARN-3494 URL: https://issues.apache.org/jira/browse/YARN-3494 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Fix For: 2.8.0 Attachments: 0001-YARN-3494.patch, 0002-YARN-3494.patch, 0002-YARN-3494.patch Now we have the AM resource limit and user limit shown on the web UI, it would be useful to expose them in the QueueMetrics as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507123#comment-14507123 ] Hadoop QA commented on YARN-3225: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 54s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 28s | The applied patch generated 18 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 48s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 7m 17s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 55m 49s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 110m 44s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727006/YARN-3225-5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b08908a | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7446/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7446/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7446/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7446/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7446/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7446/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7446/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7446//console | This message was automatically generated. 
New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225-1.patch, YARN-3225-2.patch, YARN-3225-3.patch, YARN-3225-4.patch, YARN-3225-5.patch, YARN-3225.patch, YARN-914.patch A new CLI (or an existing CLI with new parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506991#comment-14506991 ] gu-chi commented on YARN-2308: -- Hi, Chang Li, as I went through the patches that you attached, previously there was {code} +if (application == null) { +  LOG.info("can't retrieve application attempt"); +  return; +} {code} but the patch that was finally merged does not have this modification. Was this changed on purpose? What was the concern? I am now facing a scenario where the app status is FINISHED and the AppAttempt status is null, so during recovery the application is null in CS and the NPE occurs. I think that if the application == null condition were there, the issue I am seeing would not occur. NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Chang Li Priority: Critical Fix For: 2.6.0 Attachments: YARN-2308.0.patch, YARN-2308.1.patch, jira2308.patch, jira2308.patch, jira2308.patch I encountered an NPE when the RM restarted {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And the RM fails to restart. This is caused by the queue configuration having changed: I removed some queues and added new ones. So when the RM restarts, it tries to recover the historical applications, and when any queue of these applications has been removed, the NPE is raised. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507222#comment-14507222 ] Hadoop QA commented on YARN-3434: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 33s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 30s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 17s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 55m 9s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 96m 8s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727222/YARN-3434.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b08908a | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7447/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7447/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7447/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7447//console | This message was automatically generated. Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch ULF was set to 1.0 User was able to consume 1.4X queue capacity. It looks like when this application launched, it reserved about 1000 containers, each 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507181#comment-14507181 ] Junping Du commented on YARN-3225: -- The v5 patch LGTM. The new Jenkins run just reports some trivial format issues but fails to report the details needed to address them. +1. I will go ahead and commit it shortly. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225-1.patch, YARN-3225-2.patch, YARN-3225-3.patch, YARN-3225-4.patch, YARN-3225-5.patch, YARN-3225.patch, YARN-914.patch A new CLI (or an existing CLI with new parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507245#comment-14507245 ] Chang Li commented on YARN-2308: Hi [~gu chi], this jira is intended to fix the NPE caused by a queue that went missing because of a queue configuration change during RM restart. I did some early work on this problem, and my initial approach was to do a null check at the exact place the NPE happened in addApplicationAttempt. Craig Welch carried on with a different approach: the final patch checks whether the queue has been removed and errors out. I think your problem is worth filing a separate jira; I'd also like to take on the issue you mentioned. Thanks. NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Chang Li Priority: Critical Fix For: 2.6.0 Attachments: YARN-2308.0.patch, YARN-2308.1.patch, jira2308.patch, jira2308.patch, jira2308.patch I encountered an NPE when the RM restarted {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And the RM fails to restart. This is caused by the queue configuration having changed: I removed some queues and added new ones. So when the RM restarts, it tries to recover the historical applications, and when any queue of these applications has been removed, the NPE is raised. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
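For readers following this thread, here is an illustrative sketch only (not the actual YARN-2308 patch, whose details are not shown in this thread) of the "err out when the queue was removed" idea: fail fast with a clear message instead of letting a later dereference throw an NPE. The helper class and method names are hypothetical.
{code}
import java.util.Map;

import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueue;

// Hypothetical helper: look up a queue and fail fast if it no longer exists.
public class QueueLookupSketch {
  public static CSQueue getQueueOrFail(Map<String, CSQueue> queues, String queueName) {
    CSQueue queue = queues.get(queueName);
    if (queue == null) {
      throw new YarnRuntimeException("Queue " + queueName
          + " no longer exists; it may have been removed from the scheduler configuration");
    }
    return queue;
  }
}
{code}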
[jira] [Commented] (YARN-2038) Revisit how AMs learn of containers from previous attempts
[ https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507266#comment-14507266 ] sandflee commented on YARN-2038: If the NM re-registers with the RM within a short time, we can add an interface to ApplicationMasterService to tell the AM that the container has come back. If the NM has not re-registered with the RM after the NM expiry time, the RM knows nothing about that NM. Could the AM tell the RM the node and container info through ApplicationMasterService.registerApplicationMaster while re-registering with the RM? With this info, the RM could treat the unregistered NM as a lost node after the NM expiry time and pass the container-complete message to the AM. In this solution, we need the AM to store the container info. Revisit how AMs learn of containers from previous attempts -- Key: YARN-2038 URL: https://issues.apache.org/jira/browse/YARN-2038 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Based on YARN-556, we need to update the way AMs learn about containers allocated in previous attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
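For context on the mechanism this JIRA revisits, the sketch below shows the existing path by which an AM learns of containers that survived a previous attempt: the list returned at registration time. This is a simplified sketch; the host, port, and tracking-URL values are placeholders, and it assumes it runs inside an AM that already holds a valid AMRM token.
{code}
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

// Sketch: on (re-)registration the RM hands back the containers from previous attempts.
public class PreviousContainersSketch {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> amrmClient = AMRMClient.createAMRMClient();
    amrmClient.init(new Configuration());
    amrmClient.start();
    RegisterApplicationMasterResponse response =
        amrmClient.registerApplicationMaster("am-host", 0, "");
    List<Container> previous = response.getContainersFromPreviousAttempts();
    System.out.println("Containers carried over from previous attempts: " + previous.size());
    // A work-preserving AM would re-adopt these containers here and keep its own
    // record of them, which is what the comment above relies on.
  }
}
{code}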
[jira] [Commented] (YARN-3409) Add constraint node labels
[ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507169#comment-14507169 ] David Villegas commented on YARN-3409: -- Hi [~wangda], Are you planning to make the constraints static, i.e., set by an administrator? Or dynamic, so that they could reflect the current state of the cluster? I was wondering if this type of label could be used to implement anti-affinity as described in YARN-1042. It seems to me this feature could potentially be similar to [Condor ClassAds|http://research.cs.wisc.edu/htcondor/manual/v7.6/4_1Condor_s_ClassAd.html], where a container request could specify things like the average load of the machine, or whether it is already running containers for a particular application type. Add constraint node labels -- Key: YARN-3409 URL: https://issues.apache.org/jira/browse/YARN-3409 Project: Hadoop YARN Issue Type: Sub-task Components: api, capacityscheduler, client Reporter: Wangda Tan Assignee: Wangda Tan Specifying only one label for each node (in other words, partitioning a cluster) is a way to determine how the resources of a specific set of nodes can be shared by a group of entities (like teams, departments, etc.). Partitions of a cluster have the following characteristics: - The cluster is divided into several disjoint sub-clusters. - ACLs/priority can apply to a partition (only the market team has priority to use the partition). - Percentages of capacity can apply to a partition (the market team has 40% minimum capacity and the dev team has 60% minimum capacity of the partition). Constraints are orthogonal to partitions; they describe attributes of a node's hardware/software, just for affinity. Some examples of constraints: - glibc version - JDK version - type of CPU (x86_64/i686) - type of OS (windows, linux, etc.) With this, an application can ask for a resource that has (glibc.version = 2.20, JDK.version = 8u20, x86_64). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
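Since no constraint API exists in YARN at this point, the following is a purely hypothetical sketch of the matching idea described in the issue: a node advertises attribute values and a request lists required values that must all match. The attribute names come from the examples above; the class and everything else is illustrative.
{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical constraint matching: every requested attribute must match the node's value.
public class ConstraintMatchSketch {
  public static boolean matches(Map<String, String> nodeAttributes,
                                Map<String, String> requested) {
    for (Map.Entry<String, String> constraint : requested.entrySet()) {
      if (!constraint.getValue().equals(nodeAttributes.get(constraint.getKey()))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> node = new HashMap<String, String>();
    node.put("glibc.version", "2.20");
    node.put("JDK.version", "8u20");
    node.put("cpu", "x86_64");

    Map<String, String> request = new HashMap<String, String>();
    request.put("JDK.version", "8u20");
    request.put("cpu", "x86_64");

    System.out.println(matches(node, request)); // true
  }
}
{code}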
[jira] [Updated] (YARN-3511) Add errors and warnings page to ATS
[ https://issues.apache.org/jira/browse/YARN-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3511: Attachment: YARN-3511.002.patch Added check to ensure only admins can access errors and warnings page. Add errors and warnings page to ATS --- Key: YARN-3511 URL: https://issues.apache.org/jira/browse/YARN-3511 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-3511.001.patch, YARN-3511.002.patch YARN-2901 adds the capability to view errors and warnings on the web UI. The ATS was missed out. Add support for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507416#comment-14507416 ] Rohith commented on YARN-3223: -- [~varun_saxena] Are you working on this JIRA? Would you mind if I take it over if you have not started working on it? Resource update during NM graceful decommission --- Key: YARN-3223 URL: https://issues.apache.org/jira/browse/YARN-3223 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Junping Du Assignee: Varun Saxena During NM graceful decommission, we should handle the resource update properly, including: make RMNode keep track of the old resource for a possible rollback, keep the available resource at 0, and update the used resource as containers finish. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3522: -- Attachment: YARN-3522.1.patch DistributedShell uses the wrong user to put timeline data - Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3522.1.patch YARN-3287 breaks the timeline access control of distributed shell. In distributed shell AM: {code} if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { // Creating the Timeline Client timelineClient = TimelineClient.createTimelineClient(); timelineClient.init(conf); timelineClient.start(); } else { timelineClient = null; LOG.warn(Timeline service is not enabled); } {code} {code} ugi.doAs(new PrivilegedExceptionActionTimelinePutResponse() { @Override public TimelinePutResponse run() throws Exception { return timelineClient.putEntities(entity); } }); {code} YARN-3287 changes the timeline client to get the right ugi at serviceInit, but DS AM still doesn't use submitter ugi to init timeline client, but use the ugi for each put entity call. It result in the wrong user of the put request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3522: -- Attachment: (was: YARN-3522.1.patch) DistributedShell uses the wrong user to put timeline data - Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3522.1.patch YARN-3287 breaks the timeline access control of distributed shell. In distributed shell AM: {code} if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { // Creating the Timeline Client timelineClient = TimelineClient.createTimelineClient(); timelineClient.init(conf); timelineClient.start(); } else { timelineClient = null; LOG.warn(Timeline service is not enabled); } {code} {code} ugi.doAs(new PrivilegedExceptionActionTimelinePutResponse() { @Override public TimelinePutResponse run() throws Exception { return timelineClient.putEntities(entity); } }); {code} YARN-3287 changes the timeline client to get the right ugi at serviceInit, but DS AM still doesn't use submitter ugi to init timeline client, but use the ugi for each put entity call. It result in the wrong user of the put request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507503#comment-14507503 ] Zhijie Shen commented on YARN-3522: --- /cc [~jeagles] DistributedShell uses the wrong user to put timeline data - Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3522.1.patch YARN-3287 breaks the timeline access control of distributed shell. In distributed shell AM: {code} if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { // Creating the Timeline Client timelineClient = TimelineClient.createTimelineClient(); timelineClient.init(conf); timelineClient.start(); } else { timelineClient = null; LOG.warn(Timeline service is not enabled); } {code} {code} ugi.doAs(new PrivilegedExceptionActionTimelinePutResponse() { @Override public TimelinePutResponse run() throws Exception { return timelineClient.putEntities(entity); } }); {code} YARN-3287 changes the timeline client to get the right ugi at serviceInit, but DS AM still doesn't use submitter ugi to init timeline client, but use the ugi for each put entity call. It result in the wrong user of the put request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507409#comment-14507409 ] Wangda Tan commented on YARN-2308: -- [~gu chi], the issue you mentioned seems like already solved by YARN-2340. Could you please check? NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Chang Li Priority: Critical Fix For: 2.6.0 Attachments: YARN-2308.0.patch, YARN-2308.1.patch, jira2308.patch, jira2308.patch, jira2308.patch I encountered a NPE when RM restart {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And RM will be failed to restart. This is caused by queue configuration changed, I removed some queues and added new queues. So when RM restarts, it tries to recover history applications, and when any of queues of these applications removed, NPE will be raised. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507482#comment-14507482 ] Inigo Goiri commented on YARN-3458: --- Does anybody have input on the findbugs issues? What about the unit test? Any proposals? CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor Labels: containers, metrics, windows Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached is a proposal for how to do it: I reused the CpuTimeTracker, using 1 jiffy = 1 ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507504#comment-14507504 ] Junping Du commented on YARN-3411: -- Thanks [~vrushalic] for reply! bq. But I will be uploading a refined patch + some more changes like Metric writing soon. +1. The plan sounds good to me. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507511#comment-14507511 ] Hadoop QA commented on YARN-3413: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 31s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 19 new or modified test files. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 30 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 25s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 27s | The applied patch generated 11 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 2s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | mapreduce tests | 107m 21s | Tests passed in hadoop-mapreduce-client-jobclient. | | {color:green}+1{color} | yarn tests | 0m 28s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 7m 0s | Tests passed in hadoop-yarn-applications-distributedshell. | | {color:green}+1{color} | yarn tests | 7m 13s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 54m 14s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 223m 41s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727015/YARN-3413.5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b08908a | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7448/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7448/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/7448/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7448/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7448/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7448/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7448/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7448/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7448/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7448//console | This message was automatically generated. Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, YARN-3413.4.patch, YARN-3413.5.patch As mentioned in : https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. Changing node label exclusivity and/or other attributes may not be a real use case, and also we should support setting node label attributes whiling adding them to cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507531#comment-14507531 ] Wangda Tan commented on YARN-3413: -- The failed test (TestAMRestart) is not related to this patch; it passes locally. The checkstyle/whitespace reports lack details and contain only minor formatting suggestions. Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, YARN-3413.4.patch, YARN-3413.5.patch As mentioned in: https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. Changing node label exclusivity and/or other attributes may not be a real use case, and we should also support setting node label attributes while adding them to the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507436#comment-14507436 ] Wangda Tan commented on YARN-2740: -- Latest patch LGTM, +1. The checkstyle result lacks details and contains only minor formatting suggestions; will commit today. ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Fix For: 2.8.0 Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, YARN-2740.20150327-1.patch, YARN-2740.20150411-1.patch, YARN-2740.20150411-2.patch, YARN-2740.20150411-3.patch, YARN-2740.20150417-1.patch, YARN-2740.20150420-1.patch, YARN-2740.20150421-1.patch, YARN-2740.20150422-2.patch According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change-labels-on-node operations. - CommonNodeLabelsManager shouldn't persist labels on nodes when NMs heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3525) Rename fair scheduler properties increment-allocation-mb and increment-allocation-vcores
[ https://issues.apache.org/jira/browse/YARN-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507449#comment-14507449 ] Bibin A Chundatt commented on YARN-3525: Hi [~ywskycn], thank you for looking into the issue. All the fair scheduler properties except the queue-level configuration are specified only in yarn-site.xml (about 13 properties). Reference: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html Apart from the two properties mentioned, all of them follow the same fair-scheduler-specific prefix pattern, so I feel we should align these two properties as well. Rename fair scheduler properties increment-allocation-mb and increment-allocation-vcores Key: YARN-3525 URL: https://issues.apache.org/jira/browse/YARN-3525 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Rename the two properties below, since they are only used by the fair scheduler: {color:blue}yarn.scheduler.increment-allocation-mb{color} to {color:red}yarn.scheduler.fair.increment-allocation-mb{color} {color:blue}yarn.scheduler.increment-allocation-vcores{color} to {color:red}yarn.scheduler.fair.increment-allocation-vcores{color} All other fair-scheduler-only properties use the {color:red}yarn.scheduler.fair{color} prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
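For illustration only, assuming the renamed keys proposed above are adopted, the increments would be read from yarn-site.xml roughly as below (the default values here are illustrative, not necessarily the shipped defaults):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class FairIncrementConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Proposed fair-scheduler-prefixed names from the JIRA description.
    int incrementMb = conf.getInt("yarn.scheduler.fair.increment-allocation-mb", 1024);
    int incrementVcores = conf.getInt("yarn.scheduler.fair.increment-allocation-vcores", 1);
    System.out.println(incrementMb + " MB / " + incrementVcores + " vcores");
  }
}
{code}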
[jira] [Commented] (YARN-3511) Add errors and warnings page to ATS
[ https://issues.apache.org/jira/browse/YARN-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507489#comment-14507489 ] Hadoop QA commented on YARN-3511: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 55s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 45s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 3m 51s | There were no new checkstyle issues. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 51s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 3m 13s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 54m 29s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 99m 46s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727254/YARN-3511.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b08908a | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7449/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7449/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7449/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7449/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7449//console | This message was automatically generated. Add errors and warnings page to ATS --- Key: YARN-3511 URL: https://issues.apache.org/jira/browse/YARN-3511 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-3511.001.patch, YARN-3511.002.patch YARN-2901 adds the capability to view errors and warnings on the web UI. The ATS was missed out. Add support for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507501#comment-14507501 ] Sunil G commented on YARN-1963: --- Thank you [~grey] for sharing your thoughts. As per the design, only the integer will be used inside the schedulers, so all comparisons and operations can be done on integers. However, we can have a label mapping for each integer which can be used during application submission, for viewing in the UI, etc. Labels would be added only as mappings to integers. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: 0001-YARN-1963-prototype.patch, YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
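A small sketch of the label-to-integer mapping described above, with made-up label names and values; the point is that the label is resolved once at submission time and the scheduler only ever compares the integer:
{code}
import java.util.HashMap;
import java.util.Map;

public class PriorityLabelSketch {
  public static void main(String[] args) {
    // Hypothetical admin-defined aliases; the integers are what the scheduler orders by.
    Map<String, Integer> labelToPriority = new HashMap<>();
    labelToPriority.put("LOW", 1);
    labelToPriority.put("NORMAL", 5);
    labelToPriority.put("HIGH", 10);

    // Resolved once when the application is submitted.
    int submittedPriority = labelToPriority.getOrDefault("HIGH", 5);
    System.out.println(submittedPriority); // 10
  }
}
{code}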
[jira] [Commented] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507516#comment-14507516 ] Hadoop QA commented on YARN-3522: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727285/YARN-3522.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 1f4767c | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7450//console | This message was automatically generated. DistributedShell uses the wrong user to put timeline data - Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3522.1.patch YARN-3287 breaks the timeline access control of distributed shell. In distributed shell AM: {code} if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { // Creating the Timeline Client timelineClient = TimelineClient.createTimelineClient(); timelineClient.init(conf); timelineClient.start(); } else { timelineClient = null; LOG.warn(Timeline service is not enabled); } {code} {code} ugi.doAs(new PrivilegedExceptionActionTimelinePutResponse() { @Override public TimelinePutResponse run() throws Exception { return timelineClient.putEntities(entity); } }); {code} YARN-3287 changes the timeline client to get the right ugi at serviceInit, but DS AM still doesn't use submitter ugi to init timeline client, but use the ugi for each put entity call. It result in the wrong user of the put request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3522: -- Attachment: YARN-3522.2.patch Previous patch was not generated correctly. Create a new one. DistributedShell uses the wrong user to put timeline data - Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3522.1.patch, YARN-3522.2.patch YARN-3287 breaks the timeline access control of distributed shell. In distributed shell AM: {code} if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { // Creating the Timeline Client timelineClient = TimelineClient.createTimelineClient(); timelineClient.init(conf); timelineClient.start(); } else { timelineClient = null; LOG.warn(Timeline service is not enabled); } {code} {code} ugi.doAs(new PrivilegedExceptionActionTimelinePutResponse() { @Override public TimelinePutResponse run() throws Exception { return timelineClient.putEntities(entity); } }); {code} YARN-3287 changes the timeline client to get the right ugi at serviceInit, but DS AM still doesn't use submitter ugi to init timeline client, but use the ugi for each put entity call. It result in the wrong user of the put request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly
[ https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2605: Attachment: YARN-2605.2.patch [RM HA] Rest api endpoints doing redirect incorrectly - Key: YARN-2605 URL: https://issues.apache.org/jira/browse/YARN-2605 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: bc Wong Assignee: Xuan Gong Labels: newbie Attachments: YARN-2605.1.patch, YARN-2605.2.patch The standby RM's webui tries to do a redirect via meta-refresh. That is fine for pages designed to be viewed by web browsers. But the API endpoints shouldn't do that. Most programmatic HTTP clients do not do meta-refresh. I'd suggest HTTP 303, or return a well-defined error message (json or xml) stating that the standby status and a link to the active RM. The standby RM is returning this today: {noformat} $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics HTTP/1.1 200 OK Cache-Control: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Content-Type: text/plain; charset=UTF-8 Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics Content-Length: 117 Server: Jetty(6.1.26) This is standby RM. Redirecting to the current active RM: http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
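A minimal sketch of the behaviour bc Wong suggests for the REST endpoints (this is not the attached patch; the class, method and variable names are illustrative): the standby RM would answer /ws/* requests with an HTTP 303 and a Location header instead of a meta-refresh page, so programmatic clients can follow the redirect or surface a clear error.
{code}
import javax.servlet.http.HttpServletResponse;

public class StandbyRedirectSketch {
  static void redirectToActiveRm(HttpServletResponse response, String activeRmUrl) {
    response.setStatus(HttpServletResponse.SC_SEE_OTHER); // 303 See Other
    response.setHeader("Location", activeRmUrl);
    // A short JSON/XML body could additionally state the standby status and the active RM URL.
  }
}
{code}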
[jira] [Updated] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3413: - Attachment: YARN-3413.6.patch Fixed trivial whitespace checks. (Ver.6) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, YARN-3413.4.patch, YARN-3413.5.patch, YARN-3413.6.patch As mentioned in : https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. Changing node label exclusivity and/or other attributes may not be a real use case, and also we should support setting node label attributes whiling adding them to cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507689#comment-14507689 ] Hadoop QA commented on YARN-3413: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 1s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727305/YARN-3413.6.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 12f4df0 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7453//console | This message was automatically generated. Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, YARN-3413.4.patch, YARN-3413.5.patch, YARN-3413.6.patch As mentioned in : https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. Changing node label exclusivity and/or other attributes may not be a real use case, and also we should support setting node label attributes whiling adding them to cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: (was: output_minicluster.rtf) Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: output_minicluster.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: output_minicluster2.txt Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: (was: output_minicluster.txt) Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507701#comment-14507701 ] Craig Welch commented on YARN-3319: --- bq. Some minor comments about configuration part by index: 1) done 2) done 3) done - see below bq. Do you think is it better to make property in queue-name.ordering-policy.policy-name.property-key?... Now that there is not proper composition only one policy can be active at a time and it shouldn't be necessary to namespace config items this way. At the same time, I could see us getting back to proper composition at some point, where this would be helpful. I've implemented it as a prefix convention in the policy instead of constraining the contents of the map in the capacity scheduler configuration. This is because we still support passing a class name as the policy type, which would make the configurations for class name based items unwieldy. It would also allow us to have shared configuration items between policies if we do end up with proper composition again. The end result of the configuration was as you suggested 4) done 5) done bq. FairOrderingPolicy: all 3 done bq. Findbugs warning? Failed to stage change, so it didn't make it into patch, should be there now. Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, YARN-3319.72.patch, YARN-3319.73.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
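A minimal sketch of the sizeBasedWeight adjustment described in the issue (not the actual FairOrderingPolicy code; names are illustrative): the usage value used for ordering is divided by Math.log1p(memory demand) / Math.log(2), so applications with larger demand compare as if they had used less, offsetting the natural preference for small applications.
{code}
public final class SizeBasedWeightSketch {
  static double adjustedUsage(double usedMemoryMb, double demandedMemoryMb) {
    double weight = Math.log1p(demandedMemoryMb) / Math.log(2);
    return usedMemoryMb / weight;
  }

  public static void main(String[] args) {
    // Same usage, different demand: the larger-demand app gets the smaller adjusted value,
    // so it is offered allocations earlier under the fair ordering.
    System.out.println(adjustedUsage(4096, 8192));   // ~315
    System.out.println(adjustedUsage(4096, 131072)); // ~241
  }
}
{code}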
[jira] [Assigned] (YARN-3530) ATS throws exception on trying to filter results without otherinfo.
[ https://issues.apache.org/jira/browse/YARN-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-3530: - Assignee: Zhijie Shen ATS throws exception on trying to filter results without otherinfo. --- Key: YARN-3530 URL: https://issues.apache.org/jira/browse/YARN-3530 Project: Hadoop YARN Issue Type: Bug Components: yarn Reporter: Sreenath Somarajapuram Assignee: Zhijie Shen Priority: Blocker Scenario: Am attempting to make data loading faster by fetching otherinfo on demand. As shown in the attached image, the patch adds a 'Load Counters' checkbox. It would be disabled by default, and on clicking, the counter data also would be loaded. Issue: Things are good when otherinfo is loaded. But ATS throws exception on trying to filter on status or applicationId without otherinfo in fields list. In other words, using fields=events,primaryfilters with secondaryFilter=status:RUNNING will return { exception: WebApplicationException, message: java.lang.NullPointerException, javaClassName: javax.ws.rs.WebApplicationException } from the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
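For reference, a request of the shape below reproduces the reported behaviour, since otherinfo is absent from the fields list (the host, port, and entity type are placeholders, not values from the report):
{noformat}
GET http://<timeline-server-host>:8188/ws/v1/timeline/<ENTITY_TYPE>?fields=events,primaryfilters&secondaryFilter=status:RUNNING
{noformat}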
[jira] [Created] (YARN-3531) Make good local directories available to ContainerExecutors at initialization time
Sidharta Seethana created YARN-3531: --- Summary: Make good local directories available to ContainerExecutors at initialization time Key: YARN-3531 URL: https://issues.apache.org/jira/browse/YARN-3531 Project: Hadoop YARN Issue Type: Improvement Reporter: Sidharta Seethana Currently, in the NodeManager's serviceInit() function, the configured executor is initialized before the node health checker/directory handler services are initialized. There are use cases where executor initialization requires access to 'good' local directories (e.g., for creation of temporary files; see YARN-3366). We need to figure out a way to make this possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507685#comment-14507685 ] Jian He commented on YARN-3522: --- - I think YARN-3287 in some sense is incompatible, since it forces user to use doAs to create the timeLineClient which is not required before. Is this ok ? I suggest adding a code comment in TimeLineClient#createTimelineClient to say caller must use doAs to create the timeLineClient - start and end event occurred in the same run() method ? {code} if(timelineClient != null) { publishApplicationAttemptEvent(timelineClient, appAttemptID.toString(), DSEvent.DS_APP_ATTEMPT_START, domainId, appSubmitterUgi); } {code} DistributedShell uses the wrong user to put timeline data - Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3522.1.patch, YARN-3522.2.patch YARN-3287 breaks the timeline access control of distributed shell. In distributed shell AM: {code} if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { // Creating the Timeline Client timelineClient = TimelineClient.createTimelineClient(); timelineClient.init(conf); timelineClient.start(); } else { timelineClient = null; LOG.warn(Timeline service is not enabled); } {code} {code} ugi.doAs(new PrivilegedExceptionActionTimelinePutResponse() { @Override public TimelinePutResponse run() throws Exception { return timelineClient.putEntities(entity); } }); {code} YARN-3287 changes the timeline client to get the right ugi at serviceInit, but DS AM still doesn't use submitter ugi to init timeline client, but use the ugi for each put entity call. It result in the wrong user of the put request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
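Building on the review comment above, a minimal sketch of the direction being discussed (this is not the committed YARN-3522 patch): the DS AM would create and start the TimelineClient inside the submitter's UGI, so that the ugi captured at serviceInit (per YARN-3287) is the submitter and later putEntities calls carry that identity.
{code}
// Sketch only, assuming the fix direction discussed above.
// Requires java.security.PrivilegedExceptionAction and
// org.apache.hadoop.yarn.client.api.TimelineClient.
timelineClient = appSubmitterUgi.doAs(
    new PrivilegedExceptionAction<TimelineClient>() {
      @Override
      public TimelineClient run() throws Exception {
        // Create, init, and start the client as the submitting user.
        TimelineClient client = TimelineClient.createTimelineClient();
        client.init(conf);
        client.start();
        return client;
      }
    });
{code}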
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: output_minicluster.rtf Thanks [~gtCarrera9] for filing the jira! Current status: I presently am using the hbase minicluster from HBaseTestingUtility in the unit tests for YARN-3411. Right now, I have my setup working in eclipse. Attaching the eclipse log that shows that a mini hbase cluster/zookeeper/ regionservers are starting and creating tables and shutting down when I run the unit test from org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl. Some relevant code bits: {code} private static HBaseTestingUtility UTIL; @BeforeClass public static void setupBeforeClass() throws Exception { UTIL = new HBaseTestingUtility(); UTIL.startMiniCluster(); createSchema(); } @AfterClass public static void tearDownAfterClass() throws Exception { UTIL.shutdownMiniCluster(); } {code} Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: output_minicluster.rtf After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507576#comment-14507576 ] Wangda Tan commented on YARN-3413: -- Commented on HADOOP-11746: https://issues.apache.org/jira/browse/HADOOP-11746?focusedCommentId=14507573page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507573 as well. Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, YARN-3413.4.patch, YARN-3413.5.patch As mentioned in : https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. Changing node label exclusivity and/or other attributes may not be a real use case, and also we should support setting node label attributes whiling adding them to cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
Li Lu created YARN-3529: --- Summary: Add miniHBase cluster and Phoenix support to ATS v2 unit tests Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3530) ATS throws exception on trying to filter results without otherinfo.
Sreenath Somarajapuram created YARN-3530: Summary: ATS throws exception on trying to filter results without otherinfo. Key: YARN-3530 URL: https://issues.apache.org/jira/browse/YARN-3530 Project: Hadoop YARN Issue Type: Bug Components: yarn Reporter: Sreenath Somarajapuram Priority: Blocker Scenario: Am attempting to make data loading faster by fetching otherinfo on demand. As shown in the attached image, the patch adds a 'Load Counters' checkbox. It would be disabled by default, and on clicking, the counter data also would be loaded. Issue: Things are good when otherinfo is loaded. But ATS throws exception on trying to filter on status or applicationId without otherinfo in fields list. In other words, using fields=events,primaryfilters with secondaryFilter=status:RUNNING will return { exception: WebApplicationException, message: java.lang.NullPointerException, javaClassName: javax.ws.rs.WebApplicationException } from the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3530) ATS throws exception on trying to filter results without otherinfo.
[ https://issues.apache.org/jira/browse/YARN-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3530: -- Component/s: (was: yarn) timelineserver Priority: Critical (was: Blocker) Target Version/s: 2.8.0 ATS throws exception on trying to filter results without otherinfo. --- Key: YARN-3530 URL: https://issues.apache.org/jira/browse/YARN-3530 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Sreenath Somarajapuram Assignee: Zhijie Shen Priority: Critical Scenario: Am attempting to make data loading faster by fetching otherinfo on demand. As shown in the attached image, the patch adds a 'Load Counters' checkbox. It would be disabled by default, and on clicking, the counter data also would be loaded. Issue: Things are good when otherinfo is loaded. But ATS throws exception on trying to filter on status or applicationId without otherinfo in fields list. In other words, using fields=events,primaryfilters with secondaryFilter=status:RUNNING will return { exception: WebApplicationException, message: java.lang.NullPointerException, javaClassName: javax.ws.rs.WebApplicationException } from the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507641#comment-14507641 ] Hadoop QA commented on YARN-3522: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 36s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 7m 46s | The applied patch generated 2 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 58s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 6m 43s | Tests failed in hadoop-yarn-applications-distributedshell. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | | | 52m 37s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.applications.distributedshell.TestDistributedShell | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727290/YARN-3522.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 1f4767c | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7451/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7451/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7451/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7451/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7451//console | This message was automatically generated. DistributedShell uses the wrong user to put timeline data - Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3522.1.patch, YARN-3522.2.patch YARN-3287 breaks the timeline access control of distributed shell. 
In distributed shell AM: {code} if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { // Creating the Timeline Client timelineClient = TimelineClient.createTimelineClient(); timelineClient.init(conf); timelineClient.start(); } else { timelineClient = null; LOG.warn(Timeline service is not enabled); } {code} {code} ugi.doAs(new PrivilegedExceptionActionTimelinePutResponse() { @Override public TimelinePutResponse run() throws Exception { return timelineClient.putEntities(entity); } }); {code} YARN-3287 changes the timeline client to get the right ugi at serviceInit, but DS AM still doesn't use submitter ugi to init timeline client, but use the ugi for each put entity call. It result in the wrong user of the put request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507657#comment-14507657 ] Sidharta Seethana commented on YARN-3366: - Hi [~vinodkv], {quote} conf.get(hadoop.tmp.dir): We should write to the nmPrivate directories instead of /tmp. {quote} Digging in this further, it turns out that the change is far from trivial because of the way initialization works in the node manager today. I filed a separate JIRA to track this : https://issues.apache.org/jira/browse/YARN-3531 . I'll update the patch based on the rest of the feedback as discussed above. thanks Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, YARN-3366.006.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly
[ https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507656#comment-14507656 ] Xuan Gong commented on YARN-2605: - Uploaded a new patch, and verified in a single node HA cluster. [RM HA] Rest api endpoints doing redirect incorrectly - Key: YARN-2605 URL: https://issues.apache.org/jira/browse/YARN-2605 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: bc Wong Assignee: Xuan Gong Labels: newbie Attachments: YARN-2605.1.patch, YARN-2605.2.patch The standby RM's webui tries to do a redirect via meta-refresh. That is fine for pages designed to be viewed by web browsers. But the API endpoints shouldn't do that. Most programmatic HTTP clients do not do meta-refresh. I'd suggest HTTP 303, or return a well-defined error message (json or xml) stating that the standby status and a link to the active RM. The standby RM is returning this today: {noformat} $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics HTTP/1.1 200 OK Cache-Control: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Content-Type: text/plain; charset=UTF-8 Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics Content-Length: 117 Server: Jetty(6.1.26) This is standby RM. Redirecting to the current active RM: http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: output_minicluster.txt Attaching the eclipse log as a .txt Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: output_minicluster.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.74.patch Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, YARN-3319.72.patch, YARN-3319.73.patch, YARN-3319.74.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-2168) SCM/Client/NM/Admin protocols
[ https://issues.apache.org/jira/browse/YARN-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reopened YARN-2168: --- SCM/Client/NM/Admin protocols - Key: YARN-2168 URL: https://issues.apache.org/jira/browse/YARN-2168 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-2168-trunk-v1.patch, YARN-2168-trunk-v2.patch This jira is meant to be used to review the main shared cache APIs. They are as follows: * ClientSCMProtocol - The protocol between the yarn client and the cache manager. This protocol controls how resources in the cache are claimed and released. ** UseSharedCacheResourceRequest ** UseSharedCacheResourceResponse ** ReleaseSharedCacheResourceRequest ** ReleaseSharedCacheResourceResponse * SCMAdminProtocol - This is an administrative protocol for the cache manager. It allows administrators to manually trigger cleaner runs. ** RunSharedCacheCleanerTaskRequest ** RunSharedCacheCleanerTaskResponse * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the cache manager. This allows the NodeManager to coordinate with the cache manager when uploading new resources to the shared cache. ** NotifySCMRequest ** NotifySCMResponse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (YARN-2168) SCM/Client/NM/Admin protocols
[ https://issues.apache.org/jira/browse/YARN-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli closed YARN-2168. - SCM/Client/NM/Admin protocols - Key: YARN-2168 URL: https://issues.apache.org/jira/browse/YARN-2168 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-2168-trunk-v1.patch, YARN-2168-trunk-v2.patch This jira is meant to be used to review the main shared cache APIs. They are as follows: * ClientSCMProtocol - The protocol between the yarn client and the cache manager. This protocol controls how resources in the cache are claimed and released. ** UseSharedCacheResourceRequest ** UseSharedCacheResourceResponse ** ReleaseSharedCacheResourceRequest ** ReleaseSharedCacheResourceResponse * SCMAdminProtocol - This is an administrative protocol for the cache manager. It allows administrators to manually trigger cleaner runs. ** RunSharedCacheCleanerTaskRequest ** RunSharedCacheCleanerTaskResponse * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the cache manager. This allows the NodeManager to coordinate with the cache manager when uploading new resources to the shared cache. ** NotifySCMRequest ** NotifySCMResponse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: (was: output_minicluster2.txt) Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2168) SCM/Client/NM/Admin protocols
[ https://issues.apache.org/jira/browse/YARN-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-2168. --- Resolution: Duplicate Fix Version/s: 2.7.0 Resolving this instead as a duplicate. SCM/Client/NM/Admin protocols - Key: YARN-2168 URL: https://issues.apache.org/jira/browse/YARN-2168 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 2.7.0 Attachments: YARN-2168-trunk-v1.patch, YARN-2168-trunk-v2.patch This jira is meant to be used to review the main shared cache APIs. They are as follows: * ClientSCMProtocol - The protocol between the yarn client and the cache manager. This protocol controls how resources in the cache are claimed and released. ** UseSharedCacheResourceRequest ** UseSharedCacheResourceResponse ** ReleaseSharedCacheResourceRequest ** ReleaseSharedCacheResourceResponse * SCMAdminProtocol - This is an administrative protocol for the cache manager. It allows administrators to manually trigger cleaner runs. ** RunSharedCacheCleanerTaskRequest ** RunSharedCacheCleanerTaskResponse * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the cache manager. This allows the NodeManager to coordinate with the cache manager when uploading new resources to the shared cache. ** NotifySCMRequest ** NotifySCMResponse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-2654) Revisit all shared cache config parameters to ensure quality names
[ https://issues.apache.org/jira/browse/YARN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reopened YARN-2654: --- Revisit all shared cache config parameters to ensure quality names -- Key: YARN-2654 URL: https://issues.apache.org/jira/browse/YARN-2654 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Blocker Attachments: shared_cache_config_parameters.txt Revisit all the shared cache config parameters in YarnConfiguration and yarn-default.xml to ensure quality names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: output_minicluster2.txt Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2654) Revisit all shared cache config parameters to ensure quality names
[ https://issues.apache.org/jira/browse/YARN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-2654. --- Resolution: Won't Fix Closing as 'Won't Fix' Revisit all shared cache config parameters to ensure quality names -- Key: YARN-2654 URL: https://issues.apache.org/jira/browse/YARN-2654 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Blocker Attachments: shared_cache_config_parameters.txt Revisit all the shared cache config parameters in YarnConfiguration and yarn-default.xml to ensure quality names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-3434: Attachment: YARN-3434.patch Fixed the line length and the whitespace style issues. Other than that I moved things around and it's just complaining about the same things more. Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch ULF was set to 1.0. The user was able to consume 1.4X queue capacity. It looks like when this application launched, it reserved about 1000 containers, 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to let the user limit be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3390) Reuse TimelineCollectorManager for RM
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508097#comment-14508097 ] Li Lu commented on YARN-3390: - Hi [~zjshen], thanks for the patch! Here are some of my comments. Most of them are quite minor: # Changes in RMContainerAllocator.java appear to be irrelevant. Seems like this is changed by an IDE by mistake (on a refactoring)? # In the following lines, the first null assignment to value is marked as redundant:
{code}
+    for (String tag : app.getApplicationTags()) {
+      String value = null;
+      if ((value = getFlowContext(TimelineUtils.FLOW_NAME_TAG_PREFIX, tag)) != null
+          && !value.isEmpty()) {
+        collector.getTimelineEntityContext().setFlowName(value);
+      } else if ((value = getFlowContext(TimelineUtils.FLOW_VERSION_TAG_PREFIX, tag)) != null
+          && !value.isEmpty()) {
+        collector.getTimelineEntityContext().setFlowVersion(value);
+      } else if ((value = getFlowContext(TimelineUtils.FLOW_RUN_ID_TAG_PREFIX, tag)) != null
+          && !value.isEmpty()) {
+        collector.getTimelineEntityContext().setFlowRunId(Long.valueOf(value));
+      }
{code}
Maybe we'd like to use a switch statement to deal with this? We may first split the tag into two parts, based on the first ":", and then switch on the first part of the returned array to set the second part of the array into flow name, version, and run id. Am I missing any fundamental obstacles for us to do this here? (String switch is available from Java 7) # Rename {{MyNMTimelineCollectorManager}} in TestTimelineServiceClientIntegration with something indicating it's for testing? # In the following lines:
{code}
-  protected TimelineCollectorContext getTimelineEntityContext() {
+  public TimelineCollectorContext getTimelineEntityContext() {
{code}
We're exposing TimelineCollectorContext but we're not annotating the class. Even though we may treat unannotated classes as Audience.Private, maybe we'd like to mark it as unstable? # In TimelineCollectorManager, I'm still having this question, although we may not want to address it in this JIRA: are there any special consistency requirements that prevent us from using ConcurrentHashMap? # In TimelineCollectorWebService, why are we removing the utility function {{getCollector}}? I think we can reuse it when adding new web services. Reuse TimelineCollectorManager for RM - Key: YARN-3390 URL: https://issues.apache.org/jira/browse/YARN-3390 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3390.1.patch RMTimelineCollector should have the context info of each app whose entity has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
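As a side note on the switch-statement suggestion in the review comment above, here is a minimal standalone sketch of splitting a tag on the first ":" and switching on the prefix. It is not the patch code; the prefix constant values and the FlowContext holder below are placeholders chosen for the example.
{code}
// Illustrative sketch of the string-switch approach (Java 7+), not the YARN-3390 patch.
public class FlowTagParserSketch {

  // Placeholder prefix values; the real constants live in TimelineUtils.
  static final String FLOW_NAME_TAG_PREFIX = "TIMELINE_FLOW_NAME_TAG";
  static final String FLOW_VERSION_TAG_PREFIX = "TIMELINE_FLOW_VERSION_TAG";
  static final String FLOW_RUN_ID_TAG_PREFIX = "TIMELINE_FLOW_RUN_ID_TAG";

  static class FlowContext {
    String flowName;
    String flowVersion;
    long flowRunId;
  }

  // Split each application tag on the first ':' and switch on the prefix part.
  static void applyTag(String tag, FlowContext context) {
    int idx = tag.indexOf(':');
    if (idx <= 0 || idx == tag.length() - 1) {
      return; // not a prefix:value tag, or the value is empty
    }
    String prefix = tag.substring(0, idx);
    String value = tag.substring(idx + 1);
    switch (prefix) {
      case FLOW_NAME_TAG_PREFIX:
        context.flowName = value;
        break;
      case FLOW_VERSION_TAG_PREFIX:
        context.flowVersion = value;
        break;
      case FLOW_RUN_ID_TAG_PREFIX:
        context.flowRunId = Long.parseLong(value); // assumes a numeric run id
        break;
      default:
        // unrelated tag, ignore
    }
  }
}
{code}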
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508151#comment-14508151 ] Sangjin Lee commented on YARN-3437: --- Could you kindly take a look at the latest patch? Thanks! convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch, YARN-3437.003.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508204#comment-14508204 ] Vinod Kumar Vavilapalli commented on YARN-3366: --- +1 for the latest patch. Checking this in. Can you file a ticket for the checkstyle rules' issues? Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, YARN-3366.006.patch, YARN-3366.007.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508228#comment-14508228 ] Hudson commented on YARN-3366: -- FAILURE: Integrated in Hadoop-trunk-Commit #7642 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7642/]) YARN-3366. Enhanced NodeManager to support classifying/shaping outgoing network bandwidth traffic originating from YARN containers Contributed by Sidharta Seethana. (vinodkv: rev a100be685cc4521e9949589948219231aa5d2733) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestResourceHandlerModule.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestTrafficControlBandwidthHandlerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestTrafficController.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerModule.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TrafficController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/OutboundBandwidthResourceHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TrafficControlBandwidthHandlerImpl.java Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Fix For: 2.8.0 Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, YARN-3366.006.patch, YARN-3366.007.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508378#comment-14508378 ] Sangjin Lee commented on YARN-3437: --- Thanks for the review [~djp]! bq. For performance perspective, we should move LOG.info() out of synchronized block (may be move out of collector.start()?). I can move the LOG.info() call outside the synchronized block. That said, I don't think this would have a meaningful performance impact. Aside from the fact that logging calls are usually synchronized themselves, it is reasonable to expect that the contention for this lock (collectors) would be quite low. We're talking about contention when multiple AMs are competing to create collectors on the same node, and the chances that there is any contention on this lock would be very low. Also, when you said may be move out of collector.start(), did you mean moving the collector.start() call outside the synchronization block? If so, I'd be hesitant to do that. We just had a discussion on this in another JIRA (see https://issues.apache.org/jira/browse/YARN-3390?focusedCommentId=14508121page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14508121). bq. we don't need to LOG.ERROR (replace with INFO?) That is a good suggestion. I'll update this (and remove()) to lower the logging level for this. bq. For remove(), similar that we should move collector.stop() and LOG.info() out of synchronized block. This we can do safely. I'll update the patch. convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch, YARN-3437.003.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
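To make the locking discussion above concrete, here is a minimal sketch of a remove() method where only the map mutation stays inside the synchronized block, while stop() and logging happen outside it. This is not the actual TimelineCollectorManager code; the class, field, and println-as-logger below are assumptions for the example.
{code}
// Minimal sketch of the remove() restructuring discussed above (assumed names).
import java.util.HashMap;
import java.util.Map;

public class CollectorMapSketch {

  interface Collector {
    void stop();
  }

  private final Map<String, Collector> collectors = new HashMap<>();

  public boolean remove(String appId) {
    Collector removed;
    synchronized (collectors) {
      removed = collectors.remove(appId);   // keep only the map update under the lock
    }
    if (removed == null) {
      // System.out.println stands in for a logger at INFO level
      System.out.println("the collector for " + appId + " does not exist!");
      return false;
    }
    removed.stop();                          // stop and log outside the synchronized block
    System.out.println("the collector service for " + appId + " was removed");
    return true;
  }
}
{code}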
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508220#comment-14508220 ] Sidharta Seethana commented on YARN-3366: - Here is the ticket : https://issues.apache.org/jira/browse/HADOOP-11869 Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, YARN-3366.006.patch, YARN-3366.007.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3532) nodemanager version in RM nodes page didn't update when NMs rejoin
[ https://issues.apache.org/jira/browse/YARN-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508312#comment-14508312 ] Brahma Reddy Battula commented on YARN-3532: The findbugs warnings are handled in HADOOP-11821. nodemanager version in RM nodes page didn't update when NMs rejoin -- Key: YARN-3532 URL: https://issues.apache.org/jira/browse/YARN-3532 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-3532.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3532) nodemanager version in RM nodes page didn't update when NMs rejoin
[ https://issues.apache.org/jira/browse/YARN-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508402#comment-14508402 ] Rohith commented on YARN-3532: -- Is it dup of YARN-1981? nodemanager version in RM nodes page didn't update when NMs rejoin -- Key: YARN-3532 URL: https://issues.apache.org/jira/browse/YARN-3532 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-3532.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled
[ https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508414#comment-14508414 ] Rohith commented on YARN-3533: -- +1(non-binding) LGTM .. Test: Fix launchAM in MockRM to wait for attempt to be scheduled Key: YARN-3533 URL: https://issues.apache.org/jira/browse/YARN-3533 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3533.001.patch MockRM#launchAM fails in many test runs because it does not wait for the app attempt to be scheduled before NM update is sent as noted in [recent builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3448: -- Component/s: timelineserver Add Rolling Time To Lives Level DB Plugin Capabilities -- Key: YARN-3448 URL: https://issues.apache.org/jira/browse/YARN-3448 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3448.1.patch, YARN-3448.10.patch, YARN-3448.2.patch, YARN-3448.3.patch, YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch, YARN-3448.9.patch For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities one record at a time. An exclusive write lock is held during the entire deletion phase, which in practice can be hours. If we are willing to relax some of the consistency constraints, other performance-enhancing techniques can be employed to maximize the throughput and minimize locking time. Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. This allows each database to maximize read cache effectiveness based on the unique usage patterns of each database. With 5 separate databases each lookup is much faster. This can also help with I/O by putting the entity and index databases on separate disks. Use rolling DBs for the entity and index DBs. 99.9% of the data is in these two sections, at roughly a 4:1 ratio (index to entity), at least for Tez. We can replace DB record removal with file system removal if we create a rolling set of databases that age out and can be efficiently removed. To do this we must place a constraint to always put an entity's events into its correct rolling db instance based on start time. This allows us to stitch the data back together while reading and to do artificial paging. Relax the synchronous write constraints. If we are willing to accept losing some records that were not flushed by the operating system during a crash, we can use async writes, which can be much faster. Prefer sequential writes. Sequential writes can be several times faster than random writes. Spend some small effort arranging the writes in a way that will trend towards sequential write performance over random write performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
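To illustrate the rolling-database idea in the description above, here is a small sketch of bucketing entities into rolling leveldb instances by start time, so that whole databases can age out and be deleted from the filesystem instead of removing records one at a time. It is not the YARN-3448 patch; the roll period, naming scheme, and retention check are assumptions made for the example.
{code}
// Illustrative sketch of start-time bucketing for rolling DB instances (assumed values).
import java.util.concurrent.TimeUnit;

public class RollingDbSketch {

  static final long ROLL_PERIOD_MS = TimeUnit.HOURS.toMillis(1); // assumed roll interval

  // All of an entity's events go to the bucket derived from its start time, which keeps
  // the entity in a single rolling instance and lets reads stitch buckets back together.
  static long bucketFor(long entityStartTimeMs) {
    return entityStartTimeMs - (entityStartTimeMs % ROLL_PERIOD_MS);
  }

  static String dbNameFor(long entityStartTimeMs) {
    return "entity-ldb." + bucketFor(entityStartTimeMs); // assumed naming scheme
  }

  // Retention: delete whole databases older than the TTL instead of individual records.
  static boolean shouldDelete(long bucketStartMs, long nowMs, long ttlMs) {
    return bucketStartMs + ROLL_PERIOD_MS + ttlMs < nowMs;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    System.out.println(dbNameFor(now));
    System.out.println(shouldDelete(bucketFor(now) - TimeUnit.DAYS.toMillis(8),
        now, TimeUnit.DAYS.toMillis(7)));
  }
}
{code}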
[jira] [Commented] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508304#comment-14508304 ] Siddharth Wagle commented on YARN-3529: --- Enlisted deps here:
{code}
<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-core</artifactId>
  <version>${phoenix.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-annotations</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- for unit tests only -->
<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-core</artifactId>
  <type>test-jar</type>
  <version>${phoenix.version}</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-it</artifactId>
  <version>${hbase.version}</version>
  <scope>test</scope>
  <classifier>tests</classifier>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-testing-util</artifactId>
  <version>${hbase.version}</version>
  <scope>test</scope>
  <optional>true</optional>
  <exclusions>
    <exclusion>
      <groupId>org.jruby</groupId>
      <artifactId>jruby-complete</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
{code}
Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: AbstractMiniHBaseClusterTest.java, output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3534) Report node resource utilization
Inigo Goiri created YARN-3534: - Summary: Report node resource utilization Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
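As an aside on the feature described above, the following is a hypothetical sketch of a node utilization sampler of the kind a NodeManager could run; it is not the actual NodeResourceMonitor, and the class names and heartbeat wiring are assumptions. It uses the JDK's OperatingSystemMXBean to sample CPU and physical memory utilization.
{code}
// Hypothetical sketch only -- not the NodeResourceMonitor from this JIRA.
import java.lang.management.ManagementFactory;

public class NodeUtilizationSamplerSketch {

  static class NodeUtilization {
    final double cpuUsage;        // fraction of total CPU in use, 0.0 - 1.0
    final long usedPhysicalMemMB; // physical memory in use, in MB
    NodeUtilization(double cpu, long memMB) {
      this.cpuUsage = cpu;
      this.usedPhysicalMemMB = memMB;
    }
  }

  static NodeUtilization sample() {
    com.sun.management.OperatingSystemMXBean os =
        (com.sun.management.OperatingSystemMXBean)
            ManagementFactory.getOperatingSystemMXBean();
    long usedBytes = os.getTotalPhysicalMemorySize() - os.getFreePhysicalMemorySize();
    return new NodeUtilization(os.getSystemCpuLoad(), usedBytes / (1024 * 1024));
  }

  public static void main(String[] args) {
    NodeUtilization u = sample();
    // In the feature described above, values like these would ride on the NM->RM heartbeat.
    System.out.printf("cpu=%.2f usedMemMB=%d%n", u.cpuUsage, u.usedPhysicalMemMB);
  }
}
{code}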
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508400#comment-14508400 ] Hadoop QA commented on YARN-3437: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727521/YARN-3437.004.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / a100be6 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7466//console | This message was automatically generated. convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch, YARN-3437.003.patch, YARN-3437.004.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508183#comment-14508183 ] Wangda Tan commented on YARN-3413: -- The failed test is not related to the patch. Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, YARN-3413.4.patch, YARN-3413.5.patch, YARN-3413.6.patch, YARN-3413.7.patch As mentioned in: https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. Changing node label exclusivity and/or other attributes may not be a real use case, and also we should support setting node label attributes while adding them to the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3437: -- Attachment: YARN-3437.004.patch Patch v.4. - moved logging statements out of the synchronized blocks - dropped logging level from ERROR to INFO - reduced the synchronization scope in remove() convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch, YARN-3437.003.patch, YARN-3437.004.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508192#comment-14508192 ] Sidharta Seethana commented on YARN-3366: - The test failure is unrelated to this patch. The checkstyle script and the rules in place need to be revisited - for example, I see "line too long" warnings for import statements. Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, YARN-3366.006.patch, YARN-3366.007.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3448: -- Attachment: YARN-3448.12.patch Improved the patch further by running the code with a Java profiler. This patch is 25% faster and generates a roughly 20% smaller database than the previous version. - Removed the unnecessary PREFIX; since each type is in its own database, the prefix is not needed to distinguish them. - Removed unused invisible related entities to reduce further operations. - Changed the database serialization method to more quickly generate a smaller serialized size for the primary filter values and other info. The library introduced, fast-serialization, is verified Apache License 2.0. - Profiling showed much time spent converting Strings to byte arrays. Converted the strings once and reused them for all the database keys. - Reduced the read cache and write buffer size to take into consideration the 7 day default retention. - Removed insert time from the start time database. This feature is used to detect changes since the last query, but is not functional since it forces a scan of all data entries. Could be added back at a later time. Add Rolling Time To Lives Level DB Plugin Capabilities -- Key: YARN-3448 URL: https://issues.apache.org/jira/browse/YARN-3448 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3448.1.patch, YARN-3448.10.patch, YARN-3448.12.patch, YARN-3448.2.patch, YARN-3448.3.patch, YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch, YARN-3448.9.patch For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities one record at a time. An exclusive write lock is held during the entire deletion phase, which in practice can be hours. If we are willing to relax some of the consistency constraints, other performance-enhancing techniques can be employed to maximize the throughput and minimize locking time. Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. This allows each database to maximize read cache effectiveness based on the unique usage patterns of each database. With 5 separate databases each lookup is much faster. This can also help with I/O by putting the entity and index databases on separate disks. Use rolling DBs for the entity and index DBs. 99.9% of the data is in these two sections, at roughly a 4:1 ratio (index to entity), at least for Tez. We can replace DB record removal with file system removal if we create a rolling set of databases that age out and can be efficiently removed. To do this we must place a constraint to always put an entity's events into its correct rolling db instance based on start time. This allows us to stitch the data back together while reading and to do artificial paging. Relax the synchronous write constraints. If we are willing to accept losing some records that were not flushed by the operating system during a crash, we can use async writes, which can be much faster. Prefer sequential writes. Sequential writes can be several times faster than random writes. Spend some small effort arranging the writes in a way that will trend towards sequential write performance over random write performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3532) nodemanager version in RM nodes page didn't update when NMs rejoin
[ https://issues.apache.org/jira/browse/YARN-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508250#comment-14508250 ] Hadoop QA commented on YARN-3532: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 7s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 59s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 31s | The applied patch generated 4 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 1s | The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | tools/hadoop tests | 0m 52s | Tests passed in hadoop-sls. | | {color:red}-1{color} | yarn tests | 52m 7s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 95m 50s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-sls | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSLoadFile(String, String):in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSLoadFile(String, String): new java.io.FileReader(String) At RumenToSLSConverter.java:[line 122] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSLoadFile(String, String):in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSLoadFile(String, String): new java.io.FileWriter(String) At RumenToSLSConverter.java:[line 124] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSNodeFile(String):in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSNodeFile(String): new java.io.FileWriter(String) At RumenToSLSConverter.java:[line 145] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.SLSRunner.startAMFromSLSTraces(Resource, int):in org.apache.hadoop.yarn.sls.SLSRunner.startAMFromSLSTraces(Resource, int): new java.io.FileReader(String) At SLSRunner.java:[line 280] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics():in org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics(): new java.io.FileWriter(String) At ResourceSchedulerWrapper.java:[line 490] | | | Found reliance on default encoding in new org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper$MetricsLogRunnable(ResourceSchedulerWrapper):in new org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper$MetricsLogRunnable(ResourceSchedulerWrapper): new java.io.FileWriter(String) At ResourceSchedulerWrapper.java:[line 695] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics():in 
org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics(): new java.io.FileWriter(String) At SLSCapacityScheduler.java:[line 493] | | | Found reliance on default encoding in new org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler):in new org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler): new java.io.FileWriter(String) At SLSCapacityScheduler.java:[line 698] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.utils.SLSUtils.parseNodesFromNodeFile(String):in org.apache.hadoop.yarn.sls.utils.SLSUtils.parseNodesFromNodeFile(String): new java.io.FileReader(String) At SLSUtils.java:[line 119] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.utils.SLSUtils.parseNodesFromSLSTrace(String):in org.apache.hadoop.yarn.sls.utils.SLSUtils.parseNodesFromSLSTrace(String): new java.io.FileReader(String) At SLSUtils.java:[line 92] | | | Class org.apache.hadoop.yarn.sls.web.SLSWebApp defines non-transient non-serializable instance field handleOperTimecostHistogramMap In SLSWebApp.java:instance field handleOperTimecostHistogramMap In SLSWebApp.java | | | Class org.apache.hadoop.yarn.sls.web.SLSWebApp defines non-transient non-serializable instance field queueAllocatedMemoryCounterMap In
[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508279#comment-14508279 ] Hadoop QA commented on YARN-3448: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 37s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 23s | The applied patch generated 6 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 8s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 27s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 3m 17s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | | | 45m 35s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727487/YARN-3448.12.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a100be6 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7465/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7465/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7465/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7465/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7465//console | This message was automatically generated. Add Rolling Time To Lives Level DB Plugin Capabilities -- Key: YARN-3448 URL: https://issues.apache.org/jira/browse/YARN-3448 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3448.1.patch, YARN-3448.10.patch, YARN-3448.12.patch, YARN-3448.2.patch, YARN-3448.3.patch, YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch, YARN-3448.9.patch For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities record at a time. An exclusive write lock is held during the entire deletion phase which in practice can be hours. If we are to relax some of the consistency constraints, other performance enhancing techniques can be employed to maximize the throughput and minimize locking time. Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. 
This allows each database to maximize read cache effectiveness based on the unique usage patterns of each database. With 5 separate databases each lookup is much faster. This can also help with I/O by putting the entity and index databases on separate disks. Use rolling DBs for the entity and index DBs. 99.9% of the data is in these two sections, at roughly a 4:1 ratio (index to entity), at least for Tez. We can replace DB record removal with file system removal if we create a rolling set of databases that age out and can be efficiently removed. To do this we must place a constraint to always put an entity's events into its correct rolling db instance based on start time. This allows us to stitch the data back together while reading and to do artificial paging. Relax the synchronous write constraints. If we are willing to accept losing some records that were not flushed by the operating system during a crash, we can use async writes, which can be much faster. Prefer sequential writes. Sequential writes can be several times faster than random writes. Spend some small effort arranging the writes in a way that will trend towards sequential write performance over random write performance. -- This message was sent by Atlassian JIRA
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507917#comment-14507917 ] Craig Welch commented on YARN-3319: --- The failed tests pass on my box with the patch, unrelated. The checkstyle is referring to ResourceLimits, which the patch doesn't change... poking around in the build artifacts there are some exceptions in some of the checkstyle stuff, I'm not sure it's actually working correctly Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, YARN-3319.72.patch, YARN-3319.73.patch, YARN-3319.74.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507914#comment-14507914 ] Vrushali C commented on YARN-3134: -- Hi [~gtCarrera9] Thanks for the patch, I have some questions: - I don't see the isRelatedTo and relatesTo entities being written in this patch. - For the metrics timeseries, I see that the metric values are being written as a ;-separated list of values in a string, is that right? But I could not figure out where the timestamps associated with each metric value are stored. Storing metric values as strings would, I think, make it harder to run numerical queries, like how many entities had GC MILLIS that were more than 25% of the CPU MILLIS. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Attachments: YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134DataSchema.pdf Quote the introduction on the Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simplify our implementation of reading/writing data from/to HBase, and can easily build indexes and compose complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507952#comment-14507952 ] Hadoop QA commented on YARN-3434: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 36s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 15s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 52m 23s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 93m 31s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA | | | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727317/YARN-3434.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a3b1d8c | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7455/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7455/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7455/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7455/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7455//console | This message was automatically generated. Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch ULF was set to 1.0 User was able to consume 1.4X queue capacity. It looks like when this application launched, it reserved about 1000 containers, each 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-3366: Attachment: YARN-3366.007.patch Attaching a new patch based on code-review feedback from [~vinodkv] Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, YARN-3366.006.patch, YARN-3366.007.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3390) Reuse TimelineCollectorManager for RM
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508121#comment-14508121 ] Sangjin Lee commented on YARN-3390: --- bq. In TimelineCollectorManager, I'm still having this question, although we may not want to address it in this JIRA: are there any special consistency requirements that prevent us from using ConcurrentHashMap? I can answer this as I added that code. :) In putIfAbsent(), it needs to start the collector as well if get() returns null. If we used ConcurrentHashMap and removed synchronization, multiple threads could start their own collectors unnecessarily. It is probably not a show stopper but less than desirable. Also, in real life the contention on TimelineCollectorManager is low enough that synchronization should be perfectly adequate. If we want to do this without synchronization, then we would want to use something like guava's LoadingCache. Reuse TimelineCollectorManager for RM - Key: YARN-3390 URL: https://issues.apache.org/jira/browse/YARN-3390 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3390.1.patch RMTimelineCollector should have the context info of each app whose entity has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
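As an illustration of the "something like guava's LoadingCache" alternative mentioned in the comment above, here is a minimal sketch; the collector type, its constructor, and start() below are assumptions, not the actual TimelineCollectorManager code. Guava's per-key loading guarantee means only one thread builds and starts a collector for a given app id, without a single global lock.
{code}
// Illustrative sketch using Guava's LoadingCache (assumed collector type).
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class CollectorCacheSketch {

  static class Collector {
    private final String appId;
    Collector(String appId) { this.appId = appId; }
    void start() { System.out.println("started collector for " + appId); }
  }

  private final LoadingCache<String, Collector> collectors = CacheBuilder.newBuilder()
      .build(new CacheLoader<String, Collector>() {
        @Override
        public Collector load(String appId) {
          Collector collector = new Collector(appId);
          collector.start(); // runs at most once per app id on a cache miss
          return collector;
        }
      });

  public Collector getOrCreate(String appId) {
    return collectors.getUnchecked(appId);
  }

  public static void main(String[] args) {
    CollectorCacheSketch sketch = new CollectorCacheSketch();
    sketch.getOrCreate("application_1");
    sketch.getOrCreate("application_1"); // second call reuses the cached collector
  }
}
{code}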
[jira] [Assigned] (YARN-3532) nodemanager version in RM nodes page didn't update when NMs rejoin
[ https://issues.apache.org/jira/browse/YARN-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li reassigned YARN-3532: - Assignee: Siqi Li nodemanager version in RM nodes page didn't update when NMs rejoin -- Key: YARN-3532 URL: https://issues.apache.org/jira/browse/YARN-3532 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508155#comment-14508155 ] Hadoop QA commented on YARN-3366: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 12s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 59s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 30s | The applied patch generated 6 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 27s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 5m 48s | Tests failed in hadoop-yarn-server-nodemanager. | | | | 49m 29s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727369/YARN-3366.007.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0ebe84d | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7463/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7463/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7463/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7463/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7463//console | This message was automatically generated. Outbound network bandwidth : classify/shape traffic originating from YARN containers Key: YARN-3366 URL: https://issues.apache.org/jira/browse/YARN-3366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3366.001.patch, YARN-3366.002.patch, YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, YARN-3366.006.patch, YARN-3366.007.patch In order to be able to isolate based on/enforce outbound traffic bandwidth limits, we need a mechanism to classify/shape network traffic in the nodemanager. For more information on the design, please see the attached design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly
[ https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507858#comment-14507858 ] Hadoop QA commented on YARN-2605: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 7m 40s | There were no new checkstyle issues. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 59s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 6m 57s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 52m 33s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 103m 32s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727301/YARN-2605.2.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 12f4df0 | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7452/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7452/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7452/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7452//console | This message was automatically generated. [RM HA] Rest api endpoints doing redirect incorrectly - Key: YARN-2605 URL: https://issues.apache.org/jira/browse/YARN-2605 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: bc Wong Assignee: Xuan Gong Labels: newbie Attachments: YARN-2605.1.patch, YARN-2605.2.patch The standby RM's webui tries to do a redirect via meta-refresh. That is fine for pages designed to be viewed by web browsers. But the API endpoints shouldn't do that. Most programmatic HTTP clients do not do meta-refresh. I'd suggest HTTP 303, or return a well-defined error message (json or xml) stating that the standby status and a link to the active RM. The standby RM is returning this today: {noformat} $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics HTTP/1.1 200 OK Cache-Control: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Content-Type: text/plain; charset=UTF-8 Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics Content-Length: 117 Server: Jetty(6.1.26) This is standby RM. 
Redirecting to the current active RM: http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
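To make the problem concrete: the Refresh header in the standby response above is honoured by browsers but ignored by most HTTP libraries, so a programmatic client has to parse it by hand. The sketch below is not from the patch (the host name is the placeholder from the report); it shows that manual handling, which is exactly what an HTTP 303 with a Location header would make unnecessary.
{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Scanner;

/**
 * Sketch of what a client currently has to do against the standby RM:
 * the 200 response carries "Refresh: 3; url=...", so the redirect target
 * must be extracted manually. A 303 + Location would let the HTTP client
 * follow the redirect on its own.
 */
public class RmMetricsClientSketch {
  public static String fetchMetrics(String rmUrl) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(rmUrl).openConnection();
    String refresh = conn.getHeaderField("Refresh");
    if (conn.getResponseCode() == 200 && refresh != null && refresh.contains("url=")) {
      // Standby RM: pull the active RM's URL out of the meta-refresh header.
      String activeUrl = refresh.substring(refresh.indexOf("url=") + 4).trim();
      conn.disconnect();
      conn = (HttpURLConnection) new URL(activeUrl).openConnection();
    }
    try (Scanner sc = new Scanner(conn.getInputStream(), "UTF-8")) {
      sc.useDelimiter("\\A");
      return sc.hasNext() ? sc.next() : "";
    }
  }

  public static void main(String[] args) throws IOException {
    // Host name taken from the report above; stands in for any RM endpoint.
    System.out.println(fetchMetrics(
        "http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics"));
  }
}
{code}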
[jira] [Updated] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-3434: Attachment: YARN-3434.patch Attaching the exact same patch to kick Jenkins again Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch The user limit factor (ULF) was set to 1.0, yet the user was able to consume 1.4X the queue capacity. It looks like when this application launched, it reserved about 1000 containers of 8G each within about 5 seconds. I think this allowed the logic in assignToUser() to let the user limit be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
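A toy simulation of the race described above, not the CapacityScheduler's code: it assumes a user-limit check that looks only at what the user has already been allocated, so a burst of reservations is admitted without ever being charged against the limit. The cluster size, queue size and container size are made up to roughly reproduce the 1.4X figure from the report.
{code:java}
/**
 * Toy model (not the real assignToUser()): back-to-back reservations pass
 * the user-limit check because reserved-but-not-allocated resources are not
 * counted, and the limit is never rechecked when they convert to allocations.
 */
public class UserLimitRaceSketch {
  public static void main(String[] args) {
    final long clusterMb = 140_000;        // hypothetical cluster, 1.4x the queue
    final long queueCapacityMb = 100_000;  // hypothetical queue capacity
    final double userLimitFactor = 1.0;    // ULF from the report
    final long containerMb = 8_192;        // ~8G containers

    long used = 0;      // resources actually allocated to the user
    long reserved = 0;  // reserved for the user, not charged against the limit

    // Reservations arrive in a burst; each is checked against 'used' only.
    while (used + reserved + containerMb <= clusterMb) {
      boolean withinLimit = used + containerMb <= queueCapacityMb * userLimitFactor;
      if (!withinLimit) {
        break; // never triggers: 'used' is still 0 during the burst
      }
      reserved += containerMb;
    }
    // Reservations later convert to allocations without re-checking the limit.
    used += reserved;
    System.out.printf("user consumed %.2fx of queue capacity%n",
        (double) used / queueCapacityMb);
  }
}
{code}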
[jira] [Created] (YARN-3532) nodemanager version in RM nodes page didn't update when NMs rejoin
Siqi Li created YARN-3532: - Summary: nodemanager version in RM nodes page didn't update when NMs rejoin Key: YARN-3532 URL: https://issues.apache.org/jira/browse/YARN-3532 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled
Anubhav Dhoot created YARN-3533: --- Summary: Test: Fix launchAM in MockRM to wait for attempt to be scheduled Key: YARN-3533 URL: https://issues.apache.org/jira/browse/YARN-3533 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot MockRM#launchAM fails in many test runs because it does not wait for the app attempt to be scheduled before the NM update is sent, as noted in [recent builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
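A sketch of the ordering the summary asks for, not the actual YARN-3533 patch: wait until the attempt has actually been scheduled before sending the NM heartbeat that drives the AM container allocation. The helper names (waitForState, nodeHeartbeat, sendAMLaunched) follow MockRM's style but are quoted from memory, not from the patch.
{code:java}
// Inside a test utility in hadoop-yarn-server-resourcemanager test code;
// MockRM, MockNM, MockAM, RMApp, RMAppAttempt and RMAppAttemptState are the
// existing test/RM classes. Sketch only.
public static MockAM launchAMSketch(RMApp app, MockRM rm, MockNM nm)
    throws Exception {
  RMAppAttempt attempt = app.getCurrentAppAttempt();
  // Missing step in the flaky version: without this wait the heartbeat can
  // arrive while the attempt has not yet reached SCHEDULED.
  rm.waitForState(attempt.getAppAttemptId(), RMAppAttemptState.SCHEDULED);
  nm.nodeHeartbeat(true); // node update triggers the AM container allocation
  rm.waitForState(attempt.getAppAttemptId(), RMAppAttemptState.ALLOCATED);
  return rm.sendAMLaunched(attempt.getAppAttemptId());
}
{code}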
[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508167#comment-14508167 ] Anubhav Dhoot commented on YARN-3387: - Thanks [~sandflee] for reporting the issue. I have opened YARN-3533 to fix this. container complete message couldn't pass to am if am restarted and rm changed - Key: YARN-3387 URL: https://issues.apache.org/jira/browse/YARN-3387 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: sandflee Priority: Critical Labels: patch Attachments: YARN-3387.001.patch, YARN-3387.002.patch Suppose AM work preserving and RM HA are enabled. The container complete message is passed to appAttempt.justFinishedContainers in the RM. Normally, all attempts of one app share the same justFinishedContainers, but when the RM changes, every attempt has its own justFinishedContainers, so in the situation below the container complete message cannot be passed to the AM: 1) the AM restarts, 2) the RM changes, 3) a container launched by the first AM completes. The container complete message is passed to appAttempt1, not appAttempt2, but the AM pulls finished containers from appAttempt2 (currentAppAttempt). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
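To restate the failure mode in code form (a toy model, not the RM's classes): finished-container notifications are keyed by the attempt that launched the container, while allocate() reads only from the current attempt, so after an AM restart under a new RM the first attempt's completions are never returned to the AM. The class and identifiers below are illustrative.
{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the bug, not RM code: per-attempt finished-container lists. */
public class JustFinishedContainersSketch {
  // After an RM failover each attempt ends up with its own list (the bug);
  // before, attempts of one app effectively shared a single list.
  static Map<String, List<String>> justFinishedByAttempt = new HashMap<>();

  static void containerFinished(String launchingAttempt, String containerId) {
    justFinishedByAttempt
        .computeIfAbsent(launchingAttempt, k -> new ArrayList<>())
        .add(containerId);
  }

  static List<String> amAllocate(String currentAttempt) {
    // The AM only ever asks the *current* attempt.
    return justFinishedByAttempt.getOrDefault(currentAttempt, new ArrayList<>());
  }

  public static void main(String[] args) {
    // 1) AM restarts, 2) RM has changed, 3) a container from attempt 1 completes.
    containerFinished("appattempt_1", "container_007");
    // attempt 2 is current, so the completion above is never delivered:
    System.out.println(amAllocate("appattempt_2"));   // prints []
  }
}
{code}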