[jira] [Commented] (YARN-3375) NodeHealthScriptRunner.shouldRun() check is performing 3 times for starting NodeHealthScriptRunner

2015-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370844#comment-14370844
 ] 

Hadoop QA commented on YARN-3375:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705561/YARN-3375.patch
  against trunk revision 4e886eb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7038//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7038//console

This message is automatically generated.

> NodeHealthScriptRunner.shouldRun() check is performing 3 times for starting 
> NodeHealthScriptRunner
> --
>
> Key: YARN-3375
> URL: https://issues.apache.org/jira/browse/YARN-3375
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Devaraj K
>Assignee: Devaraj K
> Attachments: YARN-3375.patch
>
>
> 1. The NodeHealthScriptRunner.shouldRun() check happens 3 times when starting 
> the NodeHealthScriptRunner.
> {code:title=NodeManager.java|borderStyle=solid}
> if(!NodeHealthScriptRunner.shouldRun(nodeHealthScript)) {
>   LOG.info("Abey khali");
>   return null;
> }
> {code}
> {code:title=NodeHealthCheckerService.java|borderStyle=solid}
> if (NodeHealthScriptRunner.shouldRun(
> conf.get(YarnConfiguration.NM_HEALTH_CHECK_SCRIPT_PATH))) {
>   addService(nodeHealthScriptRunner);
> }
> {code}
> {code:title=NodeHealthScriptRunner.java|borderStyle=solid}
> if (!shouldRun(nodeHealthScript)) {
>   LOG.info("Not starting node health monitor");
>   return;
> }
> {code}
> 2. If no node health script is configured, or the configured health script does 
> not have execute permission, the NM logs the message below.
> {code:xml}
> 2015-03-19 19:55:45,713 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Abey khali
> {code}
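One possible direction (a minimal sketch only, not the attached patch): keep the 
shouldRun() check in a single place, e.g. NodeHealthCheckerService, and let the 
other callers rely on the service simply not being added. Names follow the 
existing code above; the log message is assumed.
{code}
// Sketch only: perform the shouldRun() check once, where the service is wired up.
String healthScriptPath = conf.get(YarnConfiguration.NM_HEALTH_CHECK_SCRIPT_PATH);
if (NodeHealthScriptRunner.shouldRun(healthScriptPath)) {
  addService(nodeHealthScriptRunner);
} else {
  LOG.info("Node health script is not configured or is not executable; "
      + "not starting NodeHealthScriptRunner");
}
{code}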



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-19 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-3021:

Attachment: YARN-3021.005.patch

> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails because the B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-19 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370811#comment-14370811
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi Jian,

Thanks a lot for your detailed review and comments! I'm attaching rev5 to 
address all of them.

* Replaced {{new Text("HDFS_DELEGATION_TOKEN")}} with the predefined constant.
* About "does conf.getStrings strip off the leading or ending empty strings? if 
not, we may strip those off.", I followed {{JobSubmitter#populateTokenCache}}. 
I think it makes sense for the user not to put in leading or trailing empty strings.
* Removed NON_RENEWER, but we still use an empty renewer string instead of null 
(see the sketch after this list).
* I tested rev4 earlier, and I also tested rev5 on real clusters.
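To illustrate the empty-renewer approach from the list above, here is a minimal 
sketch (the helper name, its placement, and the use of the predefined kind 
constant are assumptions, not the actual rev5 code):
{code}
import java.io.IOException;
import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.Token;

// Sketch only: skip scheduling renewal for an HDFS delegation token whose
// renewer was deliberately left empty by the submitter.
@SuppressWarnings("unchecked")
private static boolean skipTokenRenewal(Token<?> token) throws IOException {
  if (!DelegationTokenIdentifier.HDFS_DELEGATION_KIND.equals(token.getKind())) {
    return false;
  }
  Text renewer =
      ((Token<DelegationTokenIdentifier>) token).decodeIdentifier().getRenewer();
  return renewer != null && renewer.toString().isEmpty();
}
{code}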

Thanks for taking a look at the new rev.



> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails because the B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3378) a load test client that can replay a volume of history files

2015-03-19 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370784#comment-14370784
 ] 

Li Lu commented on YARN-3378:
-

Hi [~Naganarasimha], YARN-2556 was mainly opened for measuring the existing ATS 
v1's performance. In this JIRA, our main focus is to build a client that 
generates a reasonable load to guide the v2 timeline service's storage design. 
From our discussion about Phoenix/HBase, I believe this is a necessary step for 
us to understand our v2 design. These two JIRAs also work on two different 
branches. Just as timeline service v1 and v2 (and all related JIRAs) may 
co-exist in the YARN-2928 branch, I don't see any reason to prevent both JIRAs 
from existing. If you have any special concerns about this, feel free to let us 
know. Thanks! 

> a load test client that can replay a volume of history files
> 
>
> Key: YARN-3378
> URL: https://issues.apache.org/jira/browse/YARN-3378
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
>
> It might be good to create a load test client that can replay a large volume 
> of history files into the timeline service. One can envision running such a 
> load test client as a mapreduce job to generate a fair amount of load. It 
> would be useful to spot-check correctness and, more importantly, observe 
> performance characteristics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI

2015-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370777#comment-14370777
 ] 

Hudson commented on YARN-3379:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7379 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7379/])
YARN-3379. Fixed missing data in localityTable and ResourceRequests table in RM 
WebUI. Contributed by Xuan Gong (jianhe: rev 
4e886eb9cbd2dcb128bbfd17309c734083093a4c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppAttemptPage.java
* hadoop-yarn-project/CHANGES.txt


> Missing data in localityTable and ResourceRequests table in RM WebUI
> 
>
> Key: YARN-3379
> URL: https://issues.apache.org/jira/browse/YARN-3379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3379.1.patch, YARN-3379.2.patch, 
> YARN-3379.3.1.patch, YARN-3379.3.patch
>
>
> After YARN-1809, we have a common appBlock/attemptBlock for both the RM WebUI 
> and the AHS WebUI.
> But some information, such as containerLocalityStatistics and 
> ResourceRequests, is only useful for running applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3378) a load test client that can replay a volume of history files

2015-03-19 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370772#comment-14370772
 ] 

Naganarasimha G R commented on YARN-3378:
-

Hi [~sjlee0] & [~gtCarrera], is this jira planning to do something different 
from YARN-2556? A patch is already available there. If the current scope of 
this jira does not target anything different from the older one, then we can 
close this jira and continue with YARN-2556; otherwise, we can close the older 
jira and either leverage the patch here or continue with the new changes...

> a load test client that can replay a volume of history files
> 
>
> Key: YARN-3378
> URL: https://issues.apache.org/jira/browse/YARN-3378
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
>
> It might be good to create a load test client that can replay a large volume 
> of history files into the timeline service. One can envision running such a 
> load test client as a mapreduce job to generate a fair amount of load. It 
> would be useful to spot-check correctness and, more importantly, observe 
> performance characteristics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3126) FairScheduler: queue's usedResource is always more than the maxResource limit

2015-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370767#comment-14370767
 ] 

Hadoop QA commented on YARN-3126:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12697432/resourcelimit-02.patch
  against trunk revision e37ca22.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/7035//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7035//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7035//console

This message is automatically generated.

> FairScheduler: queue's usedResource is always more than the maxResource limit
> -
>
> Key: YARN-3126
> URL: https://issues.apache.org/jira/browse/YARN-3126
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.3.0
> Environment: hadoop2.3.0. fair scheduler. spark 1.1.0. 
>Reporter: Xia Hu
>  Labels: assignContainer, fairscheduler, resources
> Fix For: trunk-win
>
> Attachments: resourcelimit-02.patch, resourcelimit.patch
>
>
> When submitting a spark application (both spark-on-yarn-cluster and 
> spark-on-yarn-client modes), the queue's usedResources assigned by the 
> fairscheduler can exceed the queue's maxResources limit.
> Reading the fairscheduler code, I believe this happens because the requested 
> resources are not checked when assigning a container.
> Here is the detail:
> 1. Choose a queue. In this step, assignContainerPreCheck verifies whether the 
> queue's usedResource exceeds its max. 
> 2. Then choose an app in that queue. 
> 3. Then choose a container. And here is the problem: there is no check whether 
> this container would push the queue's resources over its max limit. If a 
> queue's usedResource is 13G and the maxResource limit is 16G, a container 
> asking for 4G may still be assigned successfully. 
> This problem always shows up with spark applications, because different 
> applications can request different container sizes. 
> By the way, I have already applied the patch from YARN-2083. 
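For illustration, the missing guard might look roughly like the sketch below 
(the queue accessors and the placement of the check are assumed; this is not the 
attached patch):
{code}
// Sketch only: before assigning, check that the container would not push the
// queue past its configured maximum share.
Resource usageAfterAssign =
    Resources.add(queue.getResourceUsage(), container.getResource());
if (!Resources.fitsIn(usageAfterAssign, queue.getMaxShare())) {
  return Resources.none();   // skip this assignment
}
{code}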



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3382) Some of UserMetricsInfo metrics are incorrectly set to root queue metrics

2015-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370768#comment-14370768
 ] 

Hadoop QA commented on YARN-3382:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705811/YARN-3382.patch
  against trunk revision e37ca22.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7036//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7036//console

This message is automatically generated.

> Some of UserMetricsInfo metrics are incorrectly set to root queue metrics
> -
>
> Key: YARN-3382
> URL: https://issues.apache.org/jira/browse/YARN-3382
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Rohit Agarwal
>Assignee: Rohit Agarwal
> Attachments: YARN-3382.patch
>
>
> {{appsCompleted}}, {{appsPending}}, {{appsRunning}} etc. in 
> {{UserMetricsInfo}} are incorrectly set to the root queue's value instead of 
> the user's value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3368) Improve YARN web UI

2015-03-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370738#comment-14370738
 ] 

Jian He commented on YARN-3368:
---

Yes, we should refine the current web services and also expose the information 
that is missing. 

The intention is to build a nicer UI using front-end tools like Bootstrap. 
And yes, it can be done on a branch.

> Improve YARN web UI
> ---
>
> Key: YARN-3368
> URL: https://issues.apache.org/jira/browse/YARN-3368
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>
> The goal is to improve YARN UI for better usability.
> We may take advantage of some existing front-end frameworks to build a 
> fancier, easier-to-use UI. 
> The old UI will continue to exist until we feel it's ready to flip to the new UI.
> This serves as an umbrella jira to track the tasks. We can do this in a 
> branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3381) A typographical error in "InvalidStateTransitonException"

2015-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370727#comment-14370727
 ] 

Hadoop QA commented on YARN-3381:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705819/YARN-3381.patch
  against trunk revision e37ca22.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7037//console

This message is automatically generated.

> A typographical error in "InvalidStateTransitonException"
> -
>
> Key: YARN-3381
> URL: https://issues.apache.org/jira/browse/YARN-3381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.6.0
>Reporter: John Wang
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3381.patch
>
>
> Appears that "InvalidStateTransitonException" should be 
> "InvalidStateTransitionException".  Transition was misspelled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die

2015-03-19 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370661#comment-14370661
 ] 

Rohith commented on YARN-3369:
--

Thanks [~brahmareddy] for providing the patch.
I think all callers of getResourceRequest() need to be verified for null 
checks. The code below should also check for null; otherwise an NPE can be thrown. 
{code}
 public synchronized Resource getResource(Priority priority) {
ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
return request.getCapability();
  }
{code}
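For example, a null-safe variant could look like the sketch below (whether to 
return a zero Resource or null is a design choice; this is not the committed fix):
{code}
public synchronized Resource getResource(Priority priority) {
  ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
  // Sketch only: guard the missing ANY-level request so callers do not hit an NPE.
  return (request == null)
      ? org.apache.hadoop.yarn.util.resource.Resources.none()
      : request.getCapability();
}
{code}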

> Missing NullPointer check in AppSchedulingInfo causes RM to die 
> 
>
> Key: YARN-3369
> URL: https://issues.apache.org/jira/browse/YARN-3369
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: YARN-3369.2.patch, YARN-3369.patch
>
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> The first line calls getResourceRequest, which can return null.
> {code}
> synchronized public ResourceRequest getResourceRequest(
> Priority priority, String resourceName) {
>   Map<String, ResourceRequest> nodeRequests = requests.get(priority);
>   return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
> at java.lang.Thread.run(Thread.java:722)
> {color:red} *2015-03-17 14:14:04,758 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, 
> bbye..*{color} {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3381) A typographical error in "InvalidStateTransitonException"

2015-03-19 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370651#comment-14370651
 ] 

Brahma Reddy Battula commented on YARN-3381:


Attached the patch. Kindly review!

> A typographical error in "InvalidStateTransitonException"
> -
>
> Key: YARN-3381
> URL: https://issues.apache.org/jira/browse/YARN-3381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.6.0
>Reporter: John Wang
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3381.patch
>
>
> Appears that "InvalidStateTransitonException" should be 
> "InvalidStateTransitionException".  Transition was misspelled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-03-19 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370649#comment-14370649
 ] 

Devaraj K commented on YARN-3225:
-

OK, sure. I will update the patch soon. Thanks.

> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> ---
>
> Key: YARN-3225
> URL: https://issues.apache.org/jira/browse/YARN-3225
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Devaraj K
> Attachments: YARN-3225.patch, YARN-914.patch
>
>
> A new CLI (or an existing CLI with new parameters) should put each node on the 
> decommission list into decommissioning status and track a timeout to terminate 
> the nodes that haven't finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3381) A typographical error in "InvalidStateTransitonException"

2015-03-19 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3381:
---
Attachment: YARN-3381.patch

> A typographical error in "InvalidStateTransitonException"
> -
>
> Key: YARN-3381
> URL: https://issues.apache.org/jira/browse/YARN-3381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.6.0
>Reporter: John Wang
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3381.patch
>
>
> Appears that "InvalidStateTransitonException" should be 
> "InvalidStateTransitionException".  Transition was misspelled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3381) A typographical error in "InvalidStateTransitonException"

2015-03-19 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-3381:
---
Target Version/s: 3.0.0  (was: 2.8.0)
Hadoop Flags: Incompatible change

Marking this as an incompatible change.
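One way to soften the incompatibility, sketched below purely for illustration 
(it assumes the corrected class already exists and is not part of the attached 
patch), is to keep the misspelled name as a deprecated subclass:
{code}
// Sketch only: retain the old, misspelled class as a deprecated alias so
// existing callers keep compiling against the old name.
@Deprecated
public class InvalidStateTransitonException extends InvalidStateTransitionException {
  public InvalidStateTransitonException(Enum<?> currentState, Enum<?> event) {
    super(currentState, event);
  }
}
{code}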

> A typographical error in "InvalidStateTransitonException"
> -
>
> Key: YARN-3381
> URL: https://issues.apache.org/jira/browse/YARN-3381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.6.0
>Reporter: John Wang
>Assignee: Brahma Reddy Battula
>
> Appears that "InvalidStateTransitonException" should be 
> "InvalidStateTransitionException".  Transition was misspelled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3111) Fix ratio problem on FairScheduler page

2015-03-19 Thread Peng Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370625#comment-14370625
 ] 

Peng Zhang commented on YARN-3111:
--

[~ashwinshankar77]
Thanks, I got it.

I'll update the patch to implement 1 & 3 from your suggestions.
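A rough sketch of the direction for items 1 and 3 (variable names are assumed; 
this is not the actual patch):
{code}
// Sketch only: use the dominant share for DRF and guard against an empty
// cluster so the page does not show "NaN% used".
float ratio;
if (clusterResource.getMemory() == 0 || clusterResource.getVirtualCores() == 0) {
  ratio = 0f;
} else {
  float memShare = (float) used.getMemory() / clusterResource.getMemory();
  float cpuShare = (float) used.getVirtualCores() / clusterResource.getVirtualCores();
  ratio = Math.max(memShare, cpuShare);
}
{code}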

> Fix ratio problem on FairScheduler page
> ---
>
> Key: YARN-3111
> URL: https://issues.apache.org/jira/browse/YARN-3111
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Minor
> Attachments: YARN-3111.1.patch, YARN-3111.png, parenttooltip.png
>
>
> Found 3 problems on the FairScheduler page:
> 1. Only memory is computed for the ratio, even when the queue schedulingPolicy 
> is DRF.
> 2. When min resources are configured larger than the real resources, the steady 
> fair share ratio is so large that it runs off the page.
> 3. When the cluster resources are 0 (no NodeManager started), the ratio is 
> displayed as "NaN% used".
> The attached image shows a snapshot of the above problems. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3382) Some of UserMetricsInfo metrics are incorrectly set to root queue metrics

2015-03-19 Thread Rohit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohit Agarwal updated YARN-3382:

Attachment: YARN-3382.patch

Patch attached.

> Some of UserMetricsInfo metrics are incorrectly set to root queue metrics
> -
>
> Key: YARN-3382
> URL: https://issues.apache.org/jira/browse/YARN-3382
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Rohit Agarwal
>Assignee: Rohit Agarwal
> Attachments: YARN-3382.patch
>
>
> {{appsCompleted}}, {{appsPending}}, {{appsRunning}} etc. in 
> {{UserMetricsInfo}} are incorrectly set to the root queue's value instead of 
> the user's value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3382) Some of UserMetricsInfo metrics are incorrectly set to root queue metrics

2015-03-19 Thread Rohit Agarwal (JIRA)
Rohit Agarwal created YARN-3382:
---

 Summary: Some of UserMetricsInfo metrics are incorrectly set to 
root queue metrics
 Key: YARN-3382
 URL: https://issues.apache.org/jira/browse/YARN-3382
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Rohit Agarwal
Assignee: Rohit Agarwal


{{appsCompleted}}, {{appsPending}}, {{appsRunning}} etc. in {{UserMetricsInfo}} 
are incorrectly set to the root queue's value instead of the user's value.
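For illustration, a fix along these lines might read the per-user QueueMetrics 
instead (a sketch with assumed field names, not the attached patch):
{code}
// Sketch only: populate per-user counts from the user's QueueMetrics rather
// than from the root queue's metrics.
QueueMetrics userMetrics = metrics.getUserMetrics(user);
if (userMetrics != null) {
  this.appsCompleted = userMetrics.getAppsCompleted();
  this.appsPending = userMetrics.getAppsPending();
  this.appsRunning = userMetrics.getAppsRunning();
  this.appsFailed = userMetrics.getAppsFailed();
  this.appsKilled = userMetrics.getAppsKilled();
}
{code}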



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3381) A typographical error in "InvalidStateTransitonException"

2015-03-19 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula reassigned YARN-3381:
--

Assignee: Brahma Reddy Battula

> A typographical error in "InvalidStateTransitonException"
> -
>
> Key: YARN-3381
> URL: https://issues.apache.org/jira/browse/YARN-3381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.6.0
>Reporter: John Wang
>Assignee: Brahma Reddy Battula
>
> Appears that "InvalidStateTransitonException" should be 
> "InvalidStateTransitionException".  Transition was misspelled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3381) A typographical error in "InvalidStateTransitonException"

2015-03-19 Thread John Wang (JIRA)
John Wang created YARN-3381:
---

 Summary: A typographical error in "InvalidStateTransitonException"
 Key: YARN-3381
 URL: https://issues.apache.org/jira/browse/YARN-3381
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.6.0
Reporter: John Wang


Appears that "InvalidStateTransitonException" should be 
"InvalidStateTransitionException".  Transition was misspelled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die

2015-03-19 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370573#comment-14370573
 ] 

Brahma Reddy Battula commented on YARN-3369:


Thanks a lot!!!

> Missing NullPointer check in AppSchedulingInfo causes RM to die 
> 
>
> Key: YARN-3369
> URL: https://issues.apache.org/jira/browse/YARN-3369
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: YARN-3369.2.patch, YARN-3369.patch
>
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> The first line calls getResourceRequest, which can return null.
> {code}
> synchronized public ResourceRequest getResourceRequest(
> Priority priority, String resourceName) {
>   Map<String, ResourceRequest> nodeRequests = requests.get(priority);
>   return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
> at java.lang.Thread.run(Thread.java:722)
> {color:red} *2015-03-17 14:14:04,758 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, 
> bbye..*{color} {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3345) Add non-exclusive node label RMAdmin CLI/API

2015-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370549#comment-14370549
 ] 

Hadoop QA commented on YARN-3345:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705763/YARN-3345.5.patch
  against trunk revision 91baca1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 9 
warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/7034//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.api.TestPBImplRecords

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7034//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7034//console

This message is automatically generated.

> Add non-exclusive node label RMAdmin CLI/API
> 
>
> Key: YARN-3345
> URL: https://issues.apache.org/jira/browse/YARN-3345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3345.1.patch, YARN-3345.2.patch, YARN-3345.3.patch, 
> YARN-3345.4.patch, YARN-3345.5.patch
>
>
> As described in YARN-3214 (see design doc attached to that JIRA), we need add 
> non-exclusive node label RMAdmin API and CLI implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3356) Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track used-resources-by-label.

2015-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370503#comment-14370503
 ] 

Hadoop QA commented on YARN-3356:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705759/YARN-3356.5.patch
  against trunk revision 91baca1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7033//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7033//console

This message is automatically generated.

> Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track 
> used-resources-by-label.
> --
>
> Key: YARN-3356
> URL: https://issues.apache.org/jira/browse/YARN-3356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3356.1.patch, YARN-3356.2.patch, YARN-3356.3.patch, 
> YARN-3356.4.patch, YARN-3356.5.patch
>
>
> Similar to YARN-3099, Capacity Scheduler's LeafQueue.User/FiCaSchedulerApp 
> should use ResourceUsage to track resource-usage/pending by label for 
> better resource tracking and preemption. 
> Also, when an application's pending resources change (container allocated, 
> app completed, moved, etc.), we need to update the ResourceUsage of the queue 
> hierarchy.
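For reference, the ResourceUsage pattern introduced by YARN-3099 is used roughly 
as in the sketch below (variable names assumed):
{code}
// Sketch only: a shared ResourceUsage object tracks used/pending by node label.
ResourceUsage appResourceUsage = new ResourceUsage();
// On container allocation against a label:
appResourceUsage.incUsed(nodeLabel, container.getResource());
appResourceUsage.decPending(nodeLabel, container.getResource());
// On container release:
appResourceUsage.decUsed(nodeLabel, container.getResource());
{code}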



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die

2015-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370480#comment-14370480
 ] 

Hadoop QA commented on YARN-3369:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705751/YARN-3369.2.patch
  against trunk revision 91baca1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7032//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7032//console

This message is automatically generated.

> Missing NullPointer check in AppSchedulingInfo causes RM to die 
> 
>
> Key: YARN-3369
> URL: https://issues.apache.org/jira/browse/YARN-3369
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: YARN-3369.2.patch, YARN-3369.patch
>
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> The first line calls getResourceRequest, which can return null.
> {code}
> synchronized public ResourceRequest getResourceRequest(
> Priority priority, String resourceName) {
>   Map<String, ResourceRequest> nodeRequests = requests.get(priority);
>   return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedu

[jira] [Commented] (YARN-2828) Enable auto refresh of web pages (using http parameter)

2015-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370467#comment-14370467
 ] 

Hadoop QA commented on YARN-2828:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705729/YARN-2828.006.patch
  against trunk revision 91baca1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7029//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7029//console

This message is automatically generated.

> Enable auto refresh of web pages (using http parameter)
> ---
>
> Key: YARN-2828
> URL: https://issues.apache.org/jira/browse/YARN-2828
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Tim Robertson
>Assignee: Vijay Bhat
>Priority: Minor
> Attachments: YARN-2828.001.patch, YARN-2828.002.patch, 
> YARN-2828.003.patch, YARN-2828.004.patch, YARN-2828.005.patch, 
> YARN-2828.006.patch
>
>
> The MR1 Job Tracker had a useful HTTP parameter, e.g. "&refresh=3", that 
> could be appended to URLs to enable a page reload.  This was very useful 
> when developing mapreduce jobs, especially to watch counters changing.  This 
> is lost in the Yarn interface.
> It could be implemented as a page element (e.g. a drop-down), but I'd 
> recommend not cluttering the page further, and simply bringing back the 
> optional "refresh" HTTP param.  It worked really nicely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI

2015-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370449#comment-14370449
 ] 

Hadoop QA commented on YARN-3379:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705739/YARN-3379.3.1.patch
  against trunk revision 91baca1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7031//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7031//console

This message is automatically generated.

> Missing data in localityTable and ResourceRequests table in RM WebUI
> 
>
> Key: YARN-3379
> URL: https://issues.apache.org/jira/browse/YARN-3379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3379.1.patch, YARN-3379.2.patch, 
> YARN-3379.3.1.patch, YARN-3379.3.patch
>
>
> After YARN-1809, we have a common appBlock/attemptBlock for both the RM WebUI 
> and the AHS WebUI.
> But some information, such as containerLocalityStatistics and 
> ResourceRequests, is only useful for running applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state

2015-03-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370446#comment-14370446
 ] 

Junping Du commented on YARN-3212:
--

Thanks [~jlowe] and [~mingma] for review and comments!
bq. Do we want to handle the DECOMMISSIONING_WITH_TIMEOUT event when the node 
is already in the DECOMMISSIONING state? Curious if we might get a duplicate 
decommission event somehow and want to ignore it or if we know for sure this 
cannot happen in practice.
This case can happen when a user submits another decommission CLI command while 
the node is still decommissioning. I think we can just ignore it for now, as 
nothing needs to be updated if the node is already decommissioning. We will not 
track or update the timeout on the RM side (we may only pass it to the AM for 
notification), per the discussions in YARN-914 and YARN-3225. 

bq. Do we want to consider DECOMMISSIONING nodes as not active? There are 
containers actively running on them, and in that sense they are participating 
in the cluster (and contributing to the overall cluster resource). I think they 
should still be considered active, but I could be persuaded otherwise.
I think we discussed this on YARN-914 before. The conclusion so far is to keep a 
decommissioning node as active (although that may break some services - I am not 
100% sure on this) and to make the node's resource equal to the resources of its 
assigned containers at any time. Do we need to change this conclusion?

bq.  In the reconnected node transition there is a switch statement that will 
debug-log an unexpected state message when in fact the DECOMMISSIONING state is 
expected for this transition.
That's a good point. Will fix it in the v3 patch. Thanks!

bq. Curious why the null check is needed in handleNMContainerStatuses? What 
about this change allows the container statuses to be null?
I think so. It looks like the RMNodeReconnectEvent comes from 
RegisterNodeManagerRequest, and the containerStatuses field (read from the 
proto) can be null. So there is an NPE bug here, which I found through a unit 
test that created an event like "new RMNodeReconnectEvent(node.getNodeID(), 
node, null, null)" even before this patch. Am I missing something here?

bq. It would be nice to see some refactoring of the common code between 
StatusUpdateWhenHealthyTransition, StatusUpdateWhenUnhealthyTransition, and 
StatusUpdateWhenDecommissioningTransition.
Yes, I should have done this earlier. Will do it in the v3 patch.

bq. These change seems unnecessary?
These are still necessary because we changed the state transition from one final 
state to multiple final states (like the example below), and the interface only 
accepts an EnumSet. 
{code}
   public static class ReconnectNodeTransition implements
-      SingleArcTransition<RMNodeImpl, RMNodeEvent> {
+      MultipleArcTransition<RMNodeImpl, RMNodeEvent, NodeState> {
{code}
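For reference, a multi-arc transition picks its final state at runtime, roughly 
as in the sketch below (the returned states are illustrative only and assume the 
DECOMMISSIONING value added by this JIRA; this is not the actual patch logic):
{code}
// Sketch only: a MultipleArcTransition returns the post-transition NodeState
// instead of declaring a single fixed final state.
public static class ReconnectNodeTransition implements
    MultipleArcTransition<RMNodeImpl, RMNodeEvent, NodeState> {
  @Override
  public NodeState transition(RMNodeImpl rmNode, RMNodeEvent event) {
    // e.g. stay DECOMMISSIONING if the node was being decommissioned,
    // otherwise go back to RUNNING.
    return rmNode.getState() == NodeState.DECOMMISSIONING
        ? NodeState.DECOMMISSIONING : NodeState.RUNNING;
  }
}
{code}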

bq. Do we need to support the scenario where NM becomes dead when it is being 
decommissioned? Say decommission timeout is 30 minutes larger than the NM 
liveness timeout. The node drops out of the cluster for some time and rejoin 
later all within the decommission time out. Will Yarn show the status as just 
dead node, or
{dead, decommissioning}
Now the node can become LOST (dead) while it is decommissioning. This is no 
different from a running node being lost, except that it cannot join back unless 
the user puts it back through recommissioning. Does that make sense?

bq. Seems useful for admins to know about it. If we need that, we can consider 
two types of NodeState. One is liveness state, one is admin state. Then you 
will have different combinations.
We can add the necessary logging to let admins know about it. Are you talking 
about a scenario like this: an admin puts some nodes into decommissioning with a 
timeout, an upgrade script performs an OS upgrade and finishes with a restart at 
some arbitrary time that could be shorter than the decommissioning timeout, and 
the admin wants these nodes to join back automatically. But how would YARN know 
whether the admin wants these nodes back after a restart? An explicit add back 
to the whitelist (recommission) may still be necessary.

> RMNode State Transition Update with DECOMMISSIONING state
> -
>
> Key: YARN-3212
> URL: https://issues.apache.org/jira/browse/YARN-3212
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, 
> YARN-3212-v2.patch
>
>
> As proposed in YARN-914, a new “DECOMMISSIONING” state will be added, which 
> can be entered from the “running” state when triggered by a new 
> “decommissioning” event. 
> This new state can transition to “decommissioned” on Resource_Update if there 
> are no running apps on this NM, if the NM reconnects after a restart, or when 
> it receives a DECOMMISSIONED event (after the timeout from the CLI).
> In addition, it can go back to “running” if the user decides to cancel prev

[jira] [Created] (YARN-3380) Add protobuf compatibility checker to jenkins test runs

2015-03-19 Thread Li Lu (JIRA)
Li Lu created YARN-3380:
---

 Summary: Add protobuf compatibility checker to jenkins test runs
 Key: YARN-3380
 URL: https://issues.apache.org/jira/browse/YARN-3380
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu


We may want to run the protobuf compatibility checker for each incoming patch, 
to prevent incompatible changes for rolling upgrades. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370440#comment-14370440
 ] 

Jian He commented on YARN-3021:
---

thanks Yongjun, some comments on the patch !

- DelegationTokenRenewer: the skipTokenRenewal check should be done under the 
existing code {{if (token.getKind().equals(new 
Text("HDFS_DELEGATION_TOKEN")))}} as below. And I think only doing this check 
is enough, we don't need checks in other places.
{code}
  if (token.isManaged()) {
if (token.getKind().equals(new Text("HDFS_DELEGATION_TOKEN"))) {
  LOG.info(applicationId + " found existing hdfs token " + token);
  hasHdfsToken = true;
  Text renewer = ((Token) token).
  decodeIdentifier().getRenewer();
  if ((renewer != null && renewer.toString()
  .equals(Token.NON_RENEWER))) {
continue;
  }
}
{code}

- does conf.getStrings strip off leading or trailing empty strings? If not, we may 
strip those off (see the sketch after this list).
{code}
String [] nns =

conf.getStrings(MRJobConfig.JOB_NAMENODES_TOKEN_RENEWAL_EXCLUDE);
{code}
- given that this is a work-around fix, maybe not adding the NON_RENEWER 
publicly in common ? just check for null ?
- Did you test the patch on real cluster ?
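Regarding the conf.getStrings question above: if the values are not trimmed, a 
plain-Java cleanup along the lines of the sketch below (a hypothetical helper, not 
part of the patch; Configuration#getTrimmedStrings might also already cover this) 
would drop blank entries.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrimExcludeList {
  // Drop null/blank entries and trim whitespace from a comma-split value,
  // so a trailing comma or stray spaces in the config do not produce
  // empty "namenode" entries.
  static String[] cleanup(String[] values) {
    if (values == null) {
      return new String[0];
    }
    List<String> out = new ArrayList<>();
    for (String v : values) {
      if (v != null && !v.trim().isEmpty()) {
        out.add(v.trim());
      }
    }
    return out.toArray(new String[0]);
  }

  public static void main(String[] args) {
    String[] raw = " nn1.example.com, ,nn2.example.com,".split(",");
    System.out.println(Arrays.toString(cleanup(raw)));
    // prints [nn1.example.com, nn2.example.com]
  }
}
{code}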

> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails cause B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370440#comment-14370440
 ] 

Jian He edited comment on YARN-3021 at 3/20/15 12:29 AM:
-

thanks Yongjun, some comments on the patch :

- DelegationTokenRenewer: the skipTokenRenewal check should be done under the 
existing code {{if (token.getKind().equals(new 
Text("HDFS_DELEGATION_TOKEN")))}} as below. And I think only doing this check 
is enough, we don't need checks in other places.
{code}
  if (token.isManaged()) {
if (token.getKind().equals(new Text("HDFS_DELEGATION_TOKEN"))) {
  LOG.info(applicationId + " found existing hdfs token " + token);
  hasHdfsToken = true;
  Text renewer = ((Token) token).
  decodeIdentifier().getRenewer();
  if ((renewer != null && renewer.toString()
  .equals(Token.NON_RENEWER))) {
continue;
  }
}
{code}

- does conf.getStrings strip off the leading or ending empty strings? if not, 
we may strip those off.
{code}
String [] nns =

conf.getStrings(MRJobConfig.JOB_NAMENODES_TOKEN_RENEWAL_EXCLUDE);
{code}
- given that this is a work-around fix, maybe not adding the NON_RENEWER 
publicly in common ? just check for null ?
- Did you test the patch on real cluster ?


was (Author: jianhe):
thanks Yongjun, some comments on the patch !

- DelegationTokenRenewer: the skipTokenRenewal check should be done under the 
existing code {{if (token.getKind().equals(new 
Text("HDFS_DELEGATION_TOKEN")))}} as below. And I think only doing this check 
is enough, we don't need checks in other places.
{code}
  if (token.isManaged()) {
if (token.getKind().equals(new Text("HDFS_DELEGATION_TOKEN"))) {
  LOG.info(applicationId + " found existing hdfs token " + token);
  hasHdfsToken = true;
  Text renewer = ((Token) token).
  decodeIdentifier().getRenewer();
  if ((renewer != null && renewer.toString()
  .equals(Token.NON_RENEWER))) {
continue;
  }
}
{code}

- does conf.getStrings strip off the leading or ending empty strings? if not, 
we may strip those off.
{code}
String [] nns =

conf.getStrings(MRJobConfig.JOB_NAMENODES_TOKEN_RENEWAL_EXCLUDE);
{code}
- given that this is a work-around fix, maybe not adding the NON_RENEWER 
publicly in common ? just check for null ?
- Did you test the patch on real cluster ?

> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails cause B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2015-03-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370438#comment-14370438
 ] 

Wangda Tan commented on YARN-3362:
--

For the active-user info, we need some queue-user-by-label metrics as well, such as 
used-resource-by-user-and-label, which can be placed in the queue-label metrics 
table.

> Add node label usage in RM CapacityScheduler web UI
> ---
>
> Key: YARN-3362
> URL: https://issues.apache.org/jira/browse/YARN-3362
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager, webapp
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>
> We don't have node label usage in RM CapacityScheduler web UI now, without 
> this, user will be hard to understand what happened to nodes have labels 
> assign to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2015-03-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370434#comment-14370434
 ] 

Wangda Tan commented on YARN-3362:
--

Hi [~Naganarasimha],
Thanks for your comments,
bq. There will be some common queue metrics across the labels, wont it get 
repeated across for each label if a queue is mapped to multiple labels?
Some common fields may get repeated (like absolute max capacity, etc.). Repeating 
some of them is not a very big issue to me. I think we can show 
queue-label-metrics + queue-common-metrics for each queue-label.

bq. IIUC most of the queue Metrics might not be specific to a label, like 
Capacity, Absolute max capacity, Max apps, Max AM's per user etc... . Correct 
me if my understanding on this is wrong.
Yes, they are, but there are more parameters/metrics in queues for both label and 
queue; different labels under the same queue can have different 
user-limit/capacity/maximum-capacity/max-am-resource, etc. We also need to show 
them to users if possible.

bq. Apart from the label specific queue metrics like (label capacity, label abs 
capacity,used) are there any new Label specific queue metrics you have in your 
mind ?
I think the above answers your question.

bq. would it be better to list like
If we have this view:
1) How do you show label-specific metrics?
2) What does "used-resource" at the queue level mean (used-resource makes more 
sense when it's per-label)?
3) How do you check "label-wise" resource usage for parent queues?

bq. Also if required we can have seperate page (/in the labels page/append at 
the end of CS page) like
I think my proposal is still a little clearer: we need to show label-wise metrics 
to the user. With that, the user can clearly understand resource usage for each 
partition (just check each label's usage). A parent's label-wise usage can be shown 
as well.

> Add node label usage in RM CapacityScheduler web UI
> ---
>
> Key: YARN-3362
> URL: https://issues.apache.org/jira/browse/YARN-3362
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager, webapp
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>
> We don't have node label usage in RM CapacityScheduler web UI now, without 
> this, user will be hard to understand what happened to nodes have labels 
> assign to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context

2015-03-19 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370432#comment-14370432
 ] 

Robert Kanter commented on YARN-3040:
-

Sorry I didn't reply earlier.  I still haven't found the cycles to do patches 
for ATS work, so please go ahead and continue working on the updated patch.  
I'll be sure to review it.

> [Data Model] Make putEntities operation be aware of the app's context
> -
>
> Key: YARN-3040
> URL: https://issues.apache.org/jira/browse/YARN-3040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3040.1.patch
>
>
> Per design in YARN-2928, implement client-side API for handling *flows*. 
> Frameworks should be able to define and pass in all attributes of flows and 
> flow runs to YARN, and they should be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3345) Add non-exclusive node label RMAdmin CLI/API

2015-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3345:
-
Attachment: YARN-3345.5.patch

bq. public/unstable annotations for the newly added records, e.g. 
SetNodeLabelsAttributesRequest, 
Done 

bq. NodeLabelAttributes -> NodeLabel, so that AddToClusterNodeLabelsRequest can 
later on use the same data structure.
Done 

bq. for node exclusiveness - I think we may use NodeLabel#(get/set)IsExclusive
Done 

bq. “ an un existed node-label=%s” - “non-existing node-label”
Done

bq. throw YarnException instead of IOException
Done

bq. below code, how about user wants to set the attributes to be empty
{code}
if (attr.getAttributes().isEmpty()) {
  // simply ignore
  continue;
}
{code}
Done, removed map of attributes, added top level "shareable"
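The question quoted above is really about distinguishing "attributes not supplied" 
from "attributes explicitly set to empty". A common convention is to treat null as 
"leave unchanged" and an empty collection as "clear"; the class below is a minimal 
illustrative sketch, not the patch's actual API.
{code}
import java.util.HashSet;
import java.util.Set;

public class NullVsEmptySketch {
  private Set<String> attributes = new HashSet<>();

  // null  -> leave existing attributes unchanged
  // empty -> explicitly clear the attributes
  // other -> replace with the supplied set
  void updateAttributes(Set<String> newAttributes) {
    if (newAttributes == null) {
      return;                                   // "not set" in the request
    }
    attributes = new HashSet<>(newAttributes);  // empty set clears
  }

  public static void main(String[] args) {
    NullVsEmptySketch s = new NullVsEmptySketch();
    Set<String> labels = new HashSet<>();
    labels.add("shareable");
    s.updateAttributes(labels);
    s.updateAttributes(null);                   // no change
    System.out.println(s.attributes);           // [shareable]
    s.updateAttributes(new HashSet<String>());  // explicit clear
    System.out.println(s.attributes);           // []
  }
}
{code}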

bq. add a newInstance method in SetNodeLabelsAttributesResponse and use that
Done

bq. revert RMNodeLabelsManager change
There are some renames, so we cannot revert the RMNodeLabelsMgr changes.

> Add non-exclusive node label RMAdmin CLI/API
> 
>
> Key: YARN-3345
> URL: https://issues.apache.org/jira/browse/YARN-3345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3345.1.patch, YARN-3345.2.patch, YARN-3345.3.patch, 
> YARN-3345.4.patch, YARN-3345.5.patch
>
>
> As described in YARN-3214 (see design doc attached to that JIRA), we need add 
> non-exclusive node label RMAdmin API and CLI implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3356) Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track used-resources-by-label.

2015-03-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370401#comment-14370401
 ] 

Wangda Tan commented on YARN-3356:
--

Thanks for review! Addressed in (ver.5)

> Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track 
> used-resources-by-label.
> --
>
> Key: YARN-3356
> URL: https://issues.apache.org/jira/browse/YARN-3356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3356.1.patch, YARN-3356.2.patch, YARN-3356.3.patch, 
> YARN-3356.4.patch, YARN-3356.5.patch
>
>
> Simliar to YARN-3099, Capacity Scheduler's LeafQueue.User/FiCaSchedulerApp 
> should use ResourceRequest to track resource-usage/pending by label for 
> better resource tracking and preemption. 
> And also, when application's pending resource changed (container allocated, 
> app completed, moved, etc.), we need update ResourceUsage of queue 
> hierarchies.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3356) Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track used-resources-by-label.

2015-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3356:
-
Attachment: YARN-3356.5.patch

> Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track 
> used-resources-by-label.
> --
>
> Key: YARN-3356
> URL: https://issues.apache.org/jira/browse/YARN-3356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3356.1.patch, YARN-3356.2.patch, YARN-3356.3.patch, 
> YARN-3356.4.patch, YARN-3356.5.patch
>
>
> Simliar to YARN-3099, Capacity Scheduler's LeafQueue.User/FiCaSchedulerApp 
> should use ResourceRequest to track resource-usage/pending by label for 
> better resource tracking and preemption. 
> And also, when application's pending resource changed (container allocated, 
> app completed, moved, etc.), we need update ResourceUsage of queue 
> hierarchies.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3356) Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track used-resources-by-label.

2015-03-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370396#comment-14370396
 ] 

Jian He commented on YARN-3356:
---

- how about doing the for loop inside the write lock ?
{code}
public void copyAllUsed(ResourceUsage other) {
  for (Entry<String, UsageByLabel> entry : other.usages.entrySet()) {
setUsed(entry.getKey(), 
Resources.clone(entry.getValue().getUsed()));
  }
}
{code}
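A simplified, self-contained sketch of that suggestion (stand-in map and lock, not 
the actual ResourceUsage internals; locking of the source object is omitted for 
brevity): take the write lock once around the whole copy loop so readers never 
observe a half-copied map.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CopyAllUsedSketch {
  private final Map<String, Long> usages = new HashMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Copy every per-label usage from "other" while holding the write lock
  // for the entire loop, instead of locking per entry.
  public void copyAllUsed(CopyAllUsedSketch other) {
    lock.writeLock().lock();
    try {
      for (Map.Entry<String, Long> entry : other.usages.entrySet()) {
        usages.put(entry.getKey(), entry.getValue());
      }
    } finally {
      lock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    CopyAllUsedSketch src = new CopyAllUsedSketch();
    src.usages.put("labelX", 4096L);
    CopyAllUsedSketch dst = new CopyAllUsedSketch();
    dst.copyAllUsed(src);
    System.out.println(dst.usages);  // {labelX=4096}
  }
}
{code}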

> Capacity Scheduler FiCaSchedulerApp should use ResourceUsage to track 
> used-resources-by-label.
> --
>
> Key: YARN-3356
> URL: https://issues.apache.org/jira/browse/YARN-3356
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3356.1.patch, YARN-3356.2.patch, YARN-3356.3.patch, 
> YARN-3356.4.patch
>
>
> Simliar to YARN-3099, Capacity Scheduler's LeafQueue.User/FiCaSchedulerApp 
> should use ResourceRequest to track resource-usage/pending by label for 
> better resource tracking and preemption. 
> And also, when application's pending resource changed (container allocated, 
> app completed, moved, etc.), we need update ResourceUsage of queue 
> hierarchies.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370386#comment-14370386
 ] 

Hadoop QA commented on YARN-2495:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12705721/YARN-2495.20150320-1.patch
  against trunk revision 91baca1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7028//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7028//console

This message is automatically generated.

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, 
> YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, 
> YARN-2495_20141022.1.patch
>
>
> Target of this JIRA is to allow admin specify labels in each NM, this covers
> - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or 
> using script suggested by [~aw] (YARN-2729) )
> - NM will send labels to RM via ResourceTracker API
> - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-03-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370370#comment-14370370
 ] 

Junping Du commented on YARN-3225:
--

bq. I feel timeout would be enough, anyway we can wait for some other to 
comment or suggest here.
I am fine with keeping timeout here for now. We can discuss the naming issue in 
other JIRAs later. It shouldn't block the major feature here.

> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> ---
>
> Key: YARN-3225
> URL: https://issues.apache.org/jira/browse/YARN-3225
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Devaraj K
> Attachments: YARN-3225.patch, YARN-914.patch
>
>
> New CLI (or existing CLI with parameters) should put each node on 
> decommission list to decommissioning status and track timeout to terminate 
> the nodes that haven't get finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI

2015-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370363#comment-14370363
 ] 

Hadoop QA commented on YARN-3379:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705737/YARN-3379.3.patch
  against trunk revision 91baca1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7030//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7030//console

This message is automatically generated.

> Missing data in localityTable and ResourceRequests table in RM WebUI
> 
>
> Key: YARN-3379
> URL: https://issues.apache.org/jira/browse/YARN-3379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3379.1.patch, YARN-3379.2.patch, 
> YARN-3379.3.1.patch, YARN-3379.3.patch
>
>
> After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and 
> AHS WebUI.
> But there are some information, such as containerLocalityStatistics, 
> ResourceRequests, are only useful for the Running Applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI

2015-03-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370361#comment-14370361
 ] 

Jian He commented on YARN-3379:
---

Looks good overall; the attempt page has some format issues, which are tracked at 
YARN-3301.

> Missing data in localityTable and ResourceRequests table in RM WebUI
> 
>
> Key: YARN-3379
> URL: https://issues.apache.org/jira/browse/YARN-3379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3379.1.patch, YARN-3379.2.patch, 
> YARN-3379.3.1.patch, YARN-3379.3.patch
>
>
> After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and 
> AHS WebUI.
> But there are some information, such as containerLocalityStatistics, 
> ResourceRequests, are only useful for the Running Applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die

2015-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3369:
-
Attachment: YARN-3369.2.patch

Attached the same patch with correct indentation; will commit when Jenkins comes 
back.

> Missing NullPointer check in AppSchedulingInfo causes RM to die 
> 
>
> Key: YARN-3369
> URL: https://issues.apache.org/jira/browse/YARN-3369
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: YARN-3369.2.patch, YARN-3369.patch
>
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> the first line calls getResourceRequest and it can return null.
> {code}
> synchronized public ResourceRequest getResourceRequest(
> Priority priority, String resourceName) {
> Map nodeRequests = requests.get(priority);
> return  (nodeRequests == null) ? {color:red} null : 
> nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
> at java.lang.Thread.run(Thread.java:722)
> {color:red} *2015-03-17 14:14:04,758 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, 
> bbye..*{color} {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370333#comment-14370333
 ] 

Wangda Tan commented on YARN-2495:
--

Went through the patch again; hopefully this is the last round from my side :)

1) StringArrayProto.stringElement -> elements

2) On second thought, I think the default in {{optional bool 
areNodeLabelsAcceptedByRM = 7 \[default = false\];}} should be true to be more 
defensive: we need to make sure there's no error when somebody forgets to set this 
field.

3) testNodeHeartbeatRequestPBImplWithNullLabels: remove 
{{original.setNodeLabels(null);}}; the test should still pass.

4) NodeLabelsProviderService -> NodeLabelsProvider: like most other modules, we 
don't need to make "service" part of the class name. Change the subclasses and 
NodeManager.createNodeLabelsProviderService as well.

5) NodeStatusUpdaterImpl.run:
{code}
int lastHeartbeatID = 0;
Set nodeLabelsLastUpdatedToRM = null;
if (hasNodeLabelsProvider) {
{code}
Regardless of hasNodeLabelsProvider, shouldn't nodeLabelsLastUpdatedToRM be null? 
The default meaning is "no change" rather than "empty", correct?

6) nodeLabelsLastUpdatedToRM -> lastUpdatedNodeLabelsToRM

7) areNodeLabelsUpdated: do we need to check for null? And could you add more tests 
to cover the case when the newly fetched node labels and/or the last node labels 
are null?
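As an illustration of the null handling being asked about, a self-contained sketch 
follows; the real method signature and the exact policy for a null side (treat it 
as "changed" vs. "no change") are design choices in the patch, so this is only one 
possible reading.
{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class LabelsUpdatedCheck {
  // One possible policy: report a change when exactly one side is null,
  // and otherwise compare the two sets; null/null means nothing to report.
  static boolean areNodeLabelsUpdated(Set<String> newLabels,
      Set<String> lastLabels) {
    if (newLabels == null && lastLabels == null) {
      return false;
    }
    if (newLabels == null || lastLabels == null) {
      return true;
    }
    return !newLabels.equals(lastLabels);
  }

  public static void main(String[] args) {
    Set<String> gpu = new HashSet<>(Collections.singletonList("GPU"));
    System.out.println(areNodeLabelsUpdated(null, null));  // false
    System.out.println(areNodeLabelsUpdated(gpu, null));   // true
    System.out.println(areNodeLabelsUpdated(gpu, gpu));    // false
  }
}
{code}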


> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, 
> YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, 
> YARN-2495_20141022.1.patch
>
>
> Target of this JIRA is to allow admin specify labels in each NM, this covers
> - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or 
> using script suggested by [~aw] (YARN-2729) )
> - NM will send labels to RM via ResourceTracker API
> - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI

2015-03-19 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3379:

Attachment: YARN-3379.3.1.patch

> Missing data in localityTable and ResourceRequests table in RM WebUI
> 
>
> Key: YARN-3379
> URL: https://issues.apache.org/jira/browse/YARN-3379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3379.1.patch, YARN-3379.2.patch, 
> YARN-3379.3.1.patch, YARN-3379.3.patch
>
>
> After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and 
> AHS WebUI.
> But there are some information, such as containerLocalityStatistics, 
> ResourceRequests, are only useful for the Running Applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector

2015-03-19 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370299#comment-14370299
 ] 

Sangjin Lee commented on YARN-3034:
---

[~Naganarasimha], I do see the commit, and I'm able to pull it 
(dda84085cabd8fdf143b380e54e1730802fd9912). You might want to try again.

> [Collector wireup] Implement RM starting its timeline collector
> ---
>
> Key: YARN-3034
> URL: https://issues.apache.org/jira/browse/YARN-3034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, 
> YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch
>
>
> Per design in YARN-2928, implement resource managers starting their own ATS 
> writers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI

2015-03-19 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370290#comment-14370290
 ] 

Xuan Gong commented on YARN-3379:
-

Submitted a new patch after some offline discussion with Jian.

> Missing data in localityTable and ResourceRequests table in RM WebUI
> 
>
> Key: YARN-3379
> URL: https://issues.apache.org/jira/browse/YARN-3379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3379.1.patch, YARN-3379.2.patch, YARN-3379.3.patch
>
>
> After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and 
> AHS WebUI.
> But there are some information, such as containerLocalityStatistics, 
> ResourceRequests, are only useful for the Running Applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI

2015-03-19 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3379:

Attachment: YARN-3379.3.patch

> Missing data in localityTable and ResourceRequests table in RM WebUI
> 
>
> Key: YARN-3379
> URL: https://issues.apache.org/jira/browse/YARN-3379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3379.1.patch, YARN-3379.2.patch, YARN-3379.3.patch
>
>
> After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and 
> AHS WebUI.
> But there are some information, such as containerLocalityStatistics, 
> ResourceRequests, are only useful for the Running Applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2581) NMs need to find a way to get LogAggregationContext

2015-03-19 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370279#comment-14370279
 ] 

Anubhav Dhoot commented on YARN-2581:
-

This is a breaking change and should have been marked as such. This is the error 
seen on upgrading from the previous version:
{noformat}
2015-03-17 10:29:57,984 FATAL 
org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
NodeManager
org.apache.hadoop.service.ServiceStateException: java.io.EOFException
at 
org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:253)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:462)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:509)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at 
org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:187)
at 
org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:142)
at 
org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:271)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:298)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:254)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:237)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
... 5 more
2015-03-17 10:29:57,995 INFO 
org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
{noformat}

> NMs need to find a way to get LogAggregationContext
> ---
>
> Key: YARN-2581
> URL: https://issues.apache.org/jira/browse/YARN-2581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2581.1.patch, YARN-2581.2.patch, YARN-2581.3.patch, 
> YARN-2581.4.patch
>
>
> After YARN-2569, we have LogAggregationContext for application in 
> ApplicationSubmissionContext. NMs need to find a way to get this information.
> We have this requirement: For all containers in the same application should 
> honor the same LogAggregationContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die

2015-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370271#comment-14370271
 ] 

Hadoop QA commented on YARN-3369:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705656/YARN-3369.patch
  against trunk revision 61a4c7f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

  {color:red}-1 javac{color}.  The applied patch generated 1152 javac 
compiler warnings (more than the trunk's current 206 warnings).

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
43 warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/7026//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.ipc.TestRPCWaitForProxy
  org.apache.hadoop.tracing.TestTracing
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.qjournal.TestSecureNNWithQJM

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7026//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7026//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7026//console

This message is automatically generated.

> Missing NullPointer check in AppSchedulingInfo causes RM to die 
> 
>
> Key: YARN-3369
> URL: https://issues.apache.org/jira/browse/YARN-3369
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: YARN-3369.patch
>
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> the first line calls getResourceRequest and it can return null.
> {code}
> synchronized public ResourceRequest getResourceRequest(
> Priority priority, String resourceName) {
> Map nodeRequests = requests.get(priority);
> return  (nodeRequests == null) ? {color:red} null : 
> nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.j

[jira] [Commented] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI

2015-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370246#comment-14370246
 ] 

Hadoop QA commented on YARN-3379:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705704/YARN-3379.2.patch
  against trunk revision 61a4c7f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7027//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7027//console

This message is automatically generated.

> Missing data in localityTable and ResourceRequests table in RM WebUI
> 
>
> Key: YARN-3379
> URL: https://issues.apache.org/jira/browse/YARN-3379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3379.1.patch, YARN-3379.2.patch
>
>
> After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and 
> AHS WebUI.
> But there are some information, such as containerLocalityStatistics, 
> ResourceRequests, are only useful for the Running Applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2828) Enable auto refresh of web pages (using http parameter)

2015-03-19 Thread Vijay Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Bhat updated YARN-2828:
-
Attachment: YARN-2828.006.patch

> Enable auto refresh of web pages (using http parameter)
> ---
>
> Key: YARN-2828
> URL: https://issues.apache.org/jira/browse/YARN-2828
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Tim Robertson
>Assignee: Vijay Bhat
>Priority: Minor
> Attachments: YARN-2828.001.patch, YARN-2828.002.patch, 
> YARN-2828.003.patch, YARN-2828.004.patch, YARN-2828.005.patch, 
> YARN-2828.006.patch
>
>
> The MR1 Job Tracker had a useful HTTP parameter of e.g. "&refresh=3" that 
> could be appended to URLs which enabled a page reload.  This was very useful 
> when developing mapreduce jobs, especially to watch counters changing.  This 
> is lost in the the Yarn interface.
> Could be implemented as a page element (e.g. drop down or so), but I'd 
> recommend that the page not be more cluttered, and simply bring back the 
> optional "refresh" HTTP param.  It worked really nicely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-19 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2495:

Attachment: YARN-2495.20150320-1.patch

Hi [~wangda], 
Updated patch with the following changes:
* Typo, lable -> label, 
* NodeStatusUpdaterImpl: no need to call nodeLabelsProvider.getNodeLabels() 
twice when register/heartbeat
* HeartBeat -> Heartbeat
* NodeStatusUpdaterImpl: When labels are rejected by RM, you should log it with 
diag message.
* StringArrayProto instead of NodeIdToLabelsProto
* NodeStatusUpdaterTest.testNMRegistrationWithLabels  to 
testNodeStatusUpdaterForNodeLabels
* TestResourceTrackerService, lblsMgr -> nodeLabelsMgr 
* Validation for "heartbeat without updating labels"

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, 
> YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, 
> YARN-2495_20141022.1.patch
>
>
> Target of this JIRA is to allow admin specify labels in each NM, this covers
> - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or 
> using script suggested by [~aw] (YARN-2729) )
> - NM will send labels to RM via ResourceTracker API
> - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI

2015-03-19 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3379:

Attachment: YARN-3379.2.patch

> Missing data in localityTable and ResourceRequests table in RM WebUI
> 
>
> Key: YARN-3379
> URL: https://issues.apache.org/jira/browse/YARN-3379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3379.1.patch, YARN-3379.2.patch
>
>
> After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and 
> AHS WebUI.
> But there are some information, such as containerLocalityStatistics, 
> ResourceRequests, are only useful for the Running Applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370056#comment-14370056
 ] 

Wangda Tan commented on YARN-2495:
--

ResourceTrackerForLabels.labels -> lastReceivedLabels

{code}
// heartbeat without updating labels
nm.getNodeStatusUpdater().sendOutofBandHeartBeat();
resourceTracker.waitTillHeartBeat();
resourceTracker.resetNMHeartbeatReceiveFlag();
{code}

Do we need to add some check for this operation?

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, 
> YARN-2495.20150318-1.patch, YARN-2495_20141022.1.patch
>
>
> Target of this JIRA is to allow admin specify labels in each NM, this covers
> - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or 
> using script suggested by [~aw] (YARN-2729) )
> - NM will send labels to RM via ResourceTracker API
> - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI

2015-03-19 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3379:

Attachment: YARN-3379.1.patch

> Missing data in localityTable and ResourceRequests table in RM WebUI
> 
>
> Key: YARN-3379
> URL: https://issues.apache.org/jira/browse/YARN-3379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3379.1.patch
>
>
> After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and 
> AHS WebUI.
> But there are some information, such as containerLocalityStatistics, 
> ResourceRequests, are only useful for the Running Applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI

2015-03-19 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3379:

Description: 
After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and AHS 
WebUI.
But there are some information, such as containerLocalityStatistics, 
ResourceRequests, are only useful for the Running Applications.

> Missing data in localityTable and ResourceRequests table in RM WebUI
> 
>
> Key: YARN-3379
> URL: https://issues.apache.org/jira/browse/YARN-3379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>
> After YARN-1809, we have common appBlock/attemptBlock for both RM WebUI and 
> AHS WebUI.
> But there are some information, such as containerLocalityStatistics, 
> ResourceRequests, are only useful for the Running Applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI

2015-03-19 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-3379:
---

Assignee: Xuan Gong

> Missing data in localityTable and ResourceRequests table in RM WebUI
> 
>
> Key: YARN-3379
> URL: https://issues.apache.org/jira/browse/YARN-3379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3379) Missing data in localityTable and ResourceRequests table in RM WebUI

2015-03-19 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-3379:
---

 Summary: Missing data in localityTable and ResourceRequests table 
in RM WebUI
 Key: YARN-3379
 URL: https://issues.apache.org/jira/browse/YARN-3379
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector

2015-03-19 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370041#comment-14370041
 ] 

Naganarasimha G R commented on YARN-3034:
-

Hi [~djp], thanks for informing me. I was trying to update from the YARN-2928 
branch but unfortunately was not getting the modifications from YARN-; I 
tried using {{git pull -v --progress "origin"}} and also pulling through tools 
like git-cola. Anyway, I will try to clone the branch tomorrow and try again. 
Meanwhile, can you check once whether you are able to get the modifications in the 
YARN-2928 branch?

> [Collector wireup] Implement RM starting its timeline collector
> ---
>
> Key: YARN-3034
> URL: https://issues.apache.org/jira/browse/YARN-3034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, 
> YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch
>
>
> Per design in YARN-2928, implement resource managers starting their own ATS 
> writers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3181) FairScheduler: Fix up outdated findbugs issues

2015-03-19 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3181:
---
Priority: Major  (was: Blocker)

The earlier commit was causing issues, and I have reverted it recently. I don't 
think this is an urgent concern.

The only reason we would want to fix this is to generally reduce the tech debt 
in the FairScheduler.

Am I missing something? 

> FairScheduler: Fix up outdated findbugs issues
> --
>
> Key: YARN-3181
> URL: https://issues.apache.org/jira/browse/YARN-3181
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3181-002.patch, yarn-3181-1.patch
>
>
> In FairScheduler, we have excluded some findbugs-reported errors. Some of 
> them aren't applicable anymore, and there are a few that can be easily fixed 
> without needing an exclusion. It would be nice to fix them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2369) Environment variable handling assumes values should be appended

2015-03-19 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369965#comment-14369965
 ] 

Jason Lowe commented on YARN-2369:
--

Sure, feel free to post a proposed design/patch for it.

> Environment variable handling assumes values should be appended
> ---
>
> Key: YARN-2369
> URL: https://issues.apache.org/jira/browse/YARN-2369
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jason Lowe
>Assignee: Dustin Cote
>
> When processing environment variables for a container context the code 
> assumes that the value should be appended to any pre-existing value in the 
> environment.  This may be desired behavior for handling path-like environment 
> variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a 
> non-intuitive and harmful way to handle any variable that does not have 
> path-like semantics.
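
To make the distinction concrete, here is a minimal sketch (plain Java, not the 
actual YARN container-launch code; the variable names and values are made up):

{code}
// Illustrative only: append vs. replace semantics for environment values.
import java.util.HashMap;
import java.util.Map;

public class EnvHandlingExample {
  public static void main(String[] args) {
    Map<String, String> env = new HashMap<>();
    env.put("PATH", "/usr/bin");
    env.put("JAVA_HOME", "/opt/jdk-old");

    // Append semantics: reasonable for path-like variables such as PATH.
    env.merge("PATH", "/opt/hadoop/bin", (oldVal, addVal) -> oldVal + ":" + addVal);

    // Replace semantics: what callers usually expect for non-path variables.
    env.put("JAVA_HOME", "/opt/jdk-new");

    System.out.println(env.get("PATH"));      // /usr/bin:/opt/hadoop/bin
    System.out.println(env.get("JAVA_HOME")); // /opt/jdk-new
  }
}
{code}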



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2369) Environment variable handling assumes values should be appended

2015-03-19 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369953#comment-14369953
 ] 

Dustin Cote commented on YARN-2369:
---

[~jlowe] or [~aw], is this one still needed? If it is, I'd like to take a crack 
at it. I've had problems with LD_LIBRARY_PATH in my own experience, so if it 
isn't fixed by something else in a later version, I think it should be.

> Environment variable handling assumes values should be appended
> ---
>
> Key: YARN-2369
> URL: https://issues.apache.org/jira/browse/YARN-2369
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jason Lowe
>Assignee: Dustin Cote
>
> When processing environment variables for a container context the code 
> assumes that the value should be appended to any pre-existing value in the 
> environment.  This may be desired behavior for handling path-like environment 
> variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a 
> non-intuitive and harmful way to handle any variable that does not have 
> path-like semantics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2740) RM AdminService should prevent admin change labels on nodes when distributed node label configuration enabled

2015-03-19 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned YARN-2740:
---

Assignee: Naganarasimha G R  (was: Wangda Tan)

> RM AdminService should prevent admin change labels on nodes when distributed 
> node label configuration enabled
> -
>
> Key: YARN-2740
> URL: https://issues.apache.org/jira/browse/YARN-2740
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch
>
>
> According to YARN-2495, node labels will be specified when the NM does its 
> heartbeat. We shouldn't allow an admin to modify labels on nodes when 
> distributed node label configuration is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2740) RM AdminService should prevent admin change labels on nodes when distributed node label configuration enabled

2015-03-19 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2740:

Attachment: YARN-2740.20150320-1.patch

Hi [~wangda],
As per your  
[comment|https://issues.apache.org/jira/browse/YARN-2495?focusedCommentId=14353353&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14353353]
 in YARN-2495,
bq. when distributed node label configuration is set, any direct modify node to 
labels mapping from RMAdminCLI should be rejected (like -replaceNodeToLabels). 
This can be done in a separated JIRA.
Since a JIRA already existed for this, I am taking it over. The RMWebServices 
flow also needed a similar check, so I have added that check along with a test 
case for it.
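
A rough sketch of the kind of guard described above (the method names and the 
distributed-configuration check are illustrative, not the actual patch):

{code}
// Illustrative only: reject direct node-to-labels updates when labels are
// configured in a distributed fashion on the NMs.
private void checkDirectLabelUpdateAllowed() throws IOException {
  // "isDistributedNodeLabelConfiguration" is a hypothetical helper for this sketch.
  if (isDistributedNodeLabelConfiguration(getConfig())) {
    throw new IOException("Modifying node-to-labels mappings via RMAdminCLI or "
        + "RMWebServices is not allowed when distributed node label "
        + "configuration is enabled");
  }
}
{code}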

> RM AdminService should prevent admin change labels on nodes when distributed 
> node label configuration enabled
> -
>
> Key: YARN-2740
> URL: https://issues.apache.org/jira/browse/YARN-2740
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch
>
>
> According to YARN-2495, node labels will be specified when the NM does its 
> heartbeat. We shouldn't allow an admin to modify labels on nodes when 
> distributed node label configuration is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3377) TestTimelineServiceClientIntegration fails

2015-03-19 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369926#comment-14369926
 ] 

Zhijie Shen commented on YARN-3377:
---

Saw that failure too. Will review the patch.

> TestTimelineServiceClientIntegration fails
> --
>
> Key: YARN-3377
> URL: https://issues.apache.org/jira/browse/YARN-3377
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Minor
> Attachments: YARN-3377.001.patch
>
>
> TestTimelineServiceClientIntegration fails. It appears we are getting 500 
> from the timeline collector. This appears to be mostly an issue with the test 
> itself.
> {noformat}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 33.503 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration
> testPutEntities(org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration)
>   Time elapsed: 32.606 sec  <<< ERROR!
> org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response 
> from the timeline server.
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:457)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:391)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:368)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:342)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration.testPutEntities(TestTimelineServiceClientIntegration.java:74)
> {noformat}
> The relevant piece from the server side:
> {noformat}
> Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.PackagesResourceConfig init
> INFO: Scanning for root resource and provider classes in the packages:
>   org.apache.hadoop.yarn.server.timelineservice.collector
>   org.apache.hadoop.yarn.webapp
>   org.apache.hadoop.yarn.webapp
> Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig 
> logClasses
> INFO: Root resource classes found:
>   class org.apache.hadoop.yarn.webapp.MyTestWebService
>   class 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorWebService
> Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig 
> logClasses
> INFO: Provider classes found:
>   class org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider
>   class org.apache.hadoop.yarn.webapp.GenericExceptionHandler
>   class org.apache.hadoop.yarn.webapp.MyTestJAXBContextResolver
> Mar 19, 2015 10:48:30 AM 
> com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
> INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
> Mar 19, 2015 10:48:31 AM 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 
> resolve
> SEVERE: null
> java.lang.IllegalAccessException: Class 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 can 
> not access a member of class 
> org.apache.hadoop.yarn.webapp.MyTestWebService$MyInfo with modifiers "public"
>   at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:95)
>   at java.lang.Class.newInstance0(Class.java:366)
>   at java.lang.Class.newInstance(Class.java:325)
>   at 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467)
>   at 
> com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181)
>   at 
> com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81)
>   at 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518)
>   at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124)
>   at 
> com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
>   at 
> com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120)
>   at 
> com.sun.jersey.server.impl.wadl.WadlMethodFactory$WadlOptionsMethodDispatcher.dispatch(WadlMethodFactory.java:98)
>   at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>   at 
> com.sun.jersey.server.impl.u

[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die

2015-03-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369917#comment-14369917
 ] 

Wangda Tan commented on YARN-3369:
--

I think we don't need a null check here, since the map will always be created 
when the priority is added.

> Missing NullPointer check in AppSchedulingInfo causes RM to die 
> 
>
> Key: YARN-3369
> URL: https://issues.apache.org/jira/browse/YARN-3369
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: YARN-3369.patch
>
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> the first line calls getResourceRequest and it can return null.
> {code}
> synchronized public ResourceRequest getResourceRequest(
> Priority priority, String resourceName) {
> Map nodeRequests = requests.get(priority);
> return  (nodeRequests == null) ? {color:red} null : 
> nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
> at java.lang.Thread.run(Thread.java:722)
> {color:red} *2015-03-17 14:14:04,758 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, 
> bbye..*{color} {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3377) TestTimelineServiceClientIntegration fails

2015-03-19 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369915#comment-14369915
 ] 

Sangjin Lee commented on YARN-3377:
---

With the patch all our tests pass. Could someone take a quick look and provide 
reviews? Thanks!

> TestTimelineServiceClientIntegration fails
> --
>
> Key: YARN-3377
> URL: https://issues.apache.org/jira/browse/YARN-3377
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Minor
> Attachments: YARN-3377.001.patch
>
>
> TestTimelineServiceClientIntegration fails. It appears we are getting 500 
> from the timeline collector. This appears to be mostly an issue with the test 
> itself.
> {noformat}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 33.503 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration
> testPutEntities(org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration)
>   Time elapsed: 32.606 sec  <<< ERROR!
> org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response 
> from the timeline server.
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:457)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:391)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:368)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:342)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration.testPutEntities(TestTimelineServiceClientIntegration.java:74)
> {noformat}
> The relevant piece from the server side:
> {noformat}
> Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.PackagesResourceConfig init
> INFO: Scanning for root resource and provider classes in the packages:
>   org.apache.hadoop.yarn.server.timelineservice.collector
>   org.apache.hadoop.yarn.webapp
>   org.apache.hadoop.yarn.webapp
> Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig 
> logClasses
> INFO: Root resource classes found:
>   class org.apache.hadoop.yarn.webapp.MyTestWebService
>   class 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorWebService
> Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig 
> logClasses
> INFO: Provider classes found:
>   class org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider
>   class org.apache.hadoop.yarn.webapp.GenericExceptionHandler
>   class org.apache.hadoop.yarn.webapp.MyTestJAXBContextResolver
> Mar 19, 2015 10:48:30 AM 
> com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
> INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
> Mar 19, 2015 10:48:31 AM 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 
> resolve
> SEVERE: null
> java.lang.IllegalAccessException: Class 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 can 
> not access a member of class 
> org.apache.hadoop.yarn.webapp.MyTestWebService$MyInfo with modifiers "public"
>   at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:95)
>   at java.lang.Class.newInstance0(Class.java:366)
>   at java.lang.Class.newInstance(Class.java:325)
>   at 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467)
>   at 
> com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181)
>   at 
> com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81)
>   at 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518)
>   at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124)
>   at 
> com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
>   at 
> com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120)
>   at 
> com.sun.jersey.server.impl.wadl.WadlMethodFactory$WadlOptionsMethodDispatcher.dispatch(WadlMethodFactory.java:98)
>   at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.

[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die

2015-03-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369914#comment-14369914
 ] 

Wangda Tan commented on YARN-3369:
--

Mostly LGTM. Could you indent the code inside the inner {{if ...}} block?

> Missing NullPointer check in AppSchedulingInfo causes RM to die 
> 
>
> Key: YARN-3369
> URL: https://issues.apache.org/jira/browse/YARN-3369
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: YARN-3369.patch
>
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> the first line calls getResourceRequest and it can return null.
> {code}
> synchronized public ResourceRequest getResourceRequest(
> Priority priority, String resourceName) {
> Map nodeRequests = requests.get(priority);
> return  (nodeRequests == null) ? {color:red} null : 
> nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
> at java.lang.Thread.run(Thread.java:722)
> {color:red} *2015-03-17 14:14:04,758 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, 
> bbye..*{color} {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3377) TestTimelineServiceClientIntegration fails

2015-03-19 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3377:
--
Attachment: YARN-3377.001.patch

In TimelineCollectorManager.startWebApp(), the singleton instance of 
TimelineCollectorManager is set on the context. This is fine in a normal 
situation, but some tests need to provide mocked instances instead of the 
singleton. The patch keeps the same behavior for the non-test case, but makes 
the tests work as well.
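
As an illustration of one common way to keep a singleton for production while 
letting tests inject a mock (not necessarily how this patch does it; all names 
below are made up):

{code}
// Illustrative pattern only. The web app resolves the manager through an
// overridable hook, so production keeps its singleton while a test subclass
// can return a mock instead.
import java.util.HashMap;
import java.util.Map;

public class CollectorWebAppSketch {
  private static final Object SINGLETON = new Object(); // stands in for the real manager

  /** Production default: hand out the singleton. Tests override this method. */
  protected Object resolveCollectorManager() {
    return SINGLETON;
  }

  /** The context attribute name below is made up for this sketch. */
  public Map<String, Object> startWebApp() {
    Map<String, Object> context = new HashMap<>();
    context.put("collector.manager", resolveCollectorManager());
    return context;
  }
}
{code}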

> TestTimelineServiceClientIntegration fails
> --
>
> Key: YARN-3377
> URL: https://issues.apache.org/jira/browse/YARN-3377
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Minor
> Attachments: YARN-3377.001.patch
>
>
> TestTimelineServiceClientIntegration fails. It appears we are getting 500 
> from the timeline collector. This appears to be mostly an issue with the test 
> itself.
> {noformat}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 33.503 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration
> testPutEntities(org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration)
>   Time elapsed: 32.606 sec  <<< ERROR!
> org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response 
> from the timeline server.
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:457)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:391)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:368)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:342)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration.testPutEntities(TestTimelineServiceClientIntegration.java:74)
> {noformat}
> The relevant piece from the server side:
> {noformat}
> Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.PackagesResourceConfig init
> INFO: Scanning for root resource and provider classes in the packages:
>   org.apache.hadoop.yarn.server.timelineservice.collector
>   org.apache.hadoop.yarn.webapp
>   org.apache.hadoop.yarn.webapp
> Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig 
> logClasses
> INFO: Root resource classes found:
>   class org.apache.hadoop.yarn.webapp.MyTestWebService
>   class 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorWebService
> Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig 
> logClasses
> INFO: Provider classes found:
>   class org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider
>   class org.apache.hadoop.yarn.webapp.GenericExceptionHandler
>   class org.apache.hadoop.yarn.webapp.MyTestJAXBContextResolver
> Mar 19, 2015 10:48:30 AM 
> com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
> INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
> Mar 19, 2015 10:48:31 AM 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 
> resolve
> SEVERE: null
> java.lang.IllegalAccessException: Class 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 can 
> not access a member of class 
> org.apache.hadoop.yarn.webapp.MyTestWebService$MyInfo with modifiers "public"
>   at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:95)
>   at java.lang.Class.newInstance0(Class.java:366)
>   at java.lang.Class.newInstance(Class.java:325)
>   at 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467)
>   at 
> com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181)
>   at 
> com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81)
>   at 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518)
>   at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124)
>   at 
> com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
>   at 
> com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120)
>   at 
> com.sun.

[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369909#comment-14369909
 ] 

Wangda Tan commented on YARN-2495:
--

Hi [~Naganarasimha],
For 1)
I suggest using a separate PB message, such as NodeLabelsProto (or something 
generic like StringArrayProto), which contains a {{repeated string}}.
Using NodeIdToLabelsProto without using the nodeId will confuse people.

About test cases:
1. NodeStatusUpdaterTest:
Some places that need to be covered:
- NM register: should check RTS (ResourceTrackerService) labels (done)
- NM heartbeat: should check RTS labels (TODO)
- NM heartbeat without update: should check RTS received labels (TODO)
- NM heartbeat with update: should check RTS received labels (TODO)

2. TestResourceTrackerService
The tests generally look good to me.
- lblsMgr -> nodeLabelsMgr or labelsMgr

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, 
> YARN-2495.20150318-1.patch, YARN-2495_20141022.1.patch
>
>
> The target of this JIRA is to allow an admin to specify labels on each NM. This covers:
> - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or 
> using the script suggested by [~aw] (YARN-2729))
> - The NM will send labels to the RM via the ResourceTracker API
> - The RM will set labels in NodeLabelManager when the NM registers/updates labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2369) Environment variable handling assumes values should be appended

2015-03-19 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote reassigned YARN-2369:
-

Assignee: Dustin Cote

> Environment variable handling assumes values should be appended
> ---
>
> Key: YARN-2369
> URL: https://issues.apache.org/jira/browse/YARN-2369
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jason Lowe
>Assignee: Dustin Cote
>
> When processing environment variables for a container context the code 
> assumes that the value should be appended to any pre-existing value in the 
> environment.  This may be desired behavior for handling path-like environment 
> variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a 
> non-intuitive and harmful way to handle any variable that does not have 
> path-like semantics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die

2015-03-19 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369897#comment-14369897
 ] 

Brahma Reddy Battula commented on YARN-3369:


Attached a patch. [~leftnoteasy] and [~kasha], kindly review!

> Missing NullPointer check in AppSchedulingInfo causes RM to die 
> 
>
> Key: YARN-3369
> URL: https://issues.apache.org/jira/browse/YARN-3369
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: YARN-3369.patch
>
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> the first line calls getResourceRequest and it can return null.
> {code}
> synchronized public ResourceRequest getResourceRequest(
> Priority priority, String resourceName) {
> Map nodeRequests = requests.get(priority);
> return  (nodeRequests == null) ? {color:red} null : 
> nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
> at java.lang.Thread.run(Thread.java:722)
> {color:red} *2015-03-17 14:14:04,758 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, 
> bbye..*{color} {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3181) FairScheduler: Fix up outdated findbugs issues

2015-03-19 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369892#comment-14369892
 ] 

Brahma Reddy Battula commented on YARN-3181:


[~ka...@cloudera.com], kindly review the attached patch. This is failing all 
the patches, so I am marking it as a blocker. Please feel free to change the 
severity if you don't agree with me.

> FairScheduler: Fix up outdated findbugs issues
> --
>
> Key: YARN-3181
> URL: https://issues.apache.org/jira/browse/YARN-3181
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: YARN-3181-002.patch, yarn-3181-1.patch
>
>
> In FairScheduler, we have excluded some findbugs-reported errors. Some of 
> them aren't applicable anymore, and there are a few that can be easily fixed 
> without needing an exclusion. It would be nice to fix them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-19 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369886#comment-14369886
 ] 

Yongjun Zhang commented on YARN-3021:
-

Running the failed test TestRM locally succeeds.


> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails cause B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.
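
A hedged sketch of that "tolerate the renewal failure, just skip scheduling" 
idea (the method and logger names here are hypothetical, not the RM's actual 
DelegationTokenRenewer API):

{code}
// Illustrative only: validate the token once, but do not fail the submission
// if renewal is impossible across a one-way trust boundary.
try {
  renewToken(token);       // hypothetical: one-time validation at submission
  scheduleRenewal(token);  // hypothetical: only schedule auto-renewal on success
} catch (IOException e) {
  // Realm B will not trust realm A's RM principal, so renewal can legitimately
  // fail here. Log it and move on instead of bubbling the error to the client.
  LOG.warn("Could not renew " + token + "; skipping automatic renewal", e);
}
{code}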



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3181) FairScheduler: Fix up outdated findbugs issues

2015-03-19 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3181:
---
Priority: Blocker  (was: Major)

> FairScheduler: Fix up outdated findbugs issues
> --
>
> Key: YARN-3181
> URL: https://issues.apache.org/jira/browse/YARN-3181
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: YARN-3181-002.patch, yarn-3181-1.patch
>
>
> In FairScheduler, we have excluded some findbugs-reported errors. Some of 
> them aren't applicable anymore, and there are a few that can be easily fixed 
> without needing an exclusion. It would be nice to fix them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3181) FairScheduler: Fix up outdated findbugs issues

2015-03-19 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3181:
---
Issue Type: Bug  (was: Improvement)

> FairScheduler: Fix up outdated findbugs issues
> --
>
> Key: YARN-3181
> URL: https://issues.apache.org/jira/browse/YARN-3181
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3181-002.patch, yarn-3181-1.patch
>
>
> In FairScheduler, we have excluded some findbugs-reported errors. Some of 
> them aren't applicable anymore, and there are a few that can be easily fixed 
> without needing an exclusion. It would be nice to fix them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die

2015-03-19 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369875#comment-14369875
 ] 

Brahma Reddy Battula commented on YARN-3369:


Yes, adding a null check is sufficient. The map for the priority can be null if 
the container got allocated under any other priority; please check the following 
snippet for the same.

{code}
Map<String, ResourceRequest> nodeRequests = requests.get(priority);
return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
{code}
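
For reference, a minimal sketch of the guarded check being discussed 
(illustrative only; the actual change in checkForDeactivation() may differ):

{code}
// Guard against a null ResourceRequest before dereferencing it.
ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
if (request != null && request.getNumContainers() > 0) {
  // ... existing deactivation logic ...
}
{code}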

> Missing NullPointer check in AppSchedulingInfo causes RM to die 
> 
>
> Key: YARN-3369
> URL: https://issues.apache.org/jira/browse/YARN-3369
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: YARN-3369.patch
>
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> the first line calls getResourceRequest and it can return null.
> {code}
> synchronized public ResourceRequest getResourceRequest(
> Priority priority, String resourceName) {
> Map nodeRequests = requests.get(priority);
> return  (nodeRequests == null) ? {color:red} null : 
> nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
> at java.lang.Thread.run(Thread.java:722)
> {color:red} *2015-03-17 14:14:04,758 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, 
> bbye..*{color} {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector

2015-03-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369872#comment-14369872
 ] 

Junping Du commented on YARN-3034:
--

I have committed YARN- in. [~Naganarasimha], would you mind rebasing the patch 
and addressing my minor comments? Thanks!

> [Collector wireup] Implement RM starting its timeline collector
> ---
>
> Key: YARN-3034
> URL: https://issues.apache.org/jira/browse/YARN-3034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, 
> YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch
>
>
> Per design in YARN-2928, implement resource managers starting their own ATS 
> writers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die

2015-03-19 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3369:
---
Attachment: YARN-3369.patch

> Missing NullPointer check in AppSchedulingInfo causes RM to die 
> 
>
> Key: YARN-3369
> URL: https://issues.apache.org/jira/browse/YARN-3369
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: YARN-3369.patch
>
>
> In AppSchedulingInfo.java the method checkForDeactivation() has these 2 
> consecutive lines:
> {code}
> ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
> if (request.getNumContainers() > 0) {
> {code}
> the first line calls getResourceRequest and it can return null.
> {code}
> synchronized public ResourceRequest getResourceRequest(
> Priority priority, String resourceName) {
> Map nodeRequests = requests.get(priority);
> return  (nodeRequests == null) ? {color:red} null : 
> nodeRequests.get(resourceName);
> }
> {code}
> The second line dereferences the pointer directly without a check.
> If the pointer is null, the RM dies. 
> {quote}2015-03-17 14:14:04,757 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
> at java.lang.Thread.run(Thread.java:722)
> {color:red} *2015-03-17 14:14:04,758 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, 
> bbye..*{color} {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369868#comment-14369868
 ] 

Hadoop QA commented on YARN-3021:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705619/YARN-3021.004.patch
  against trunk revision 1ccbc29.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRM

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7024//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7024//console

This message is automatically generated.

> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails cause B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3333) rename TimelineAggregator etc. to TimelineCollector

2015-03-19 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du resolved YARN-.
--
  Resolution: Fixed
Hadoop Flags: Reviewed

> rename TimelineAggregator etc. to TimelineCollector
> ---
>
> Key: YARN-
> URL: https://issues.apache.org/jira/browse/YARN-
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN--unit-tests-fixes.patch, YARN-.001.patch, 
> YARN-.002.patch
>
>
> Per discussions on YARN-2928, let's rename TimelineAggregator, etc. to 
> TimelineCollector, etc.
> There are also several minor issues on the current branch, which can be fixed 
> as part of this:
> - fixing some imports
> - missing license in TestTimelineServerClientIntegration.java
> - whitespaces
> - missing direct dependency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3333) rename TimelineAggregator etc. to TimelineCollector

2015-03-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369865#comment-14369865
 ] 

Junping Du commented on YARN-:
--

Thanks [~sjlee0] for the patch! I have committed it to YARN-2928.

> rename TimelineAggregator etc. to TimelineCollector
> ---
>
> Key: YARN-
> URL: https://issues.apache.org/jira/browse/YARN-
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN--unit-tests-fixes.patch, YARN-.001.patch, 
> YARN-.002.patch
>
>
> Per discussions on YARN-2928, let's rename TimelineAggregator, etc. to 
> TimelineCollector, etc.
> There are also several minor issues on the current branch, which can be fixed 
> as part of this:
> - fixing some imports
> - missing license in TestTimelineServerClientIntegration.java
> - whitespaces
> - missing direct dependency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3039) [Collector wireup] Implement timeline app-level collector service discovery

2015-03-19 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3039:
--
Summary: [Collector wireup] Implement timeline app-level collector service 
discovery  (was: [Aggregator wireup] Implement ATS app-appgregator service 
discovery)

> [Collector wireup] Implement timeline app-level collector service discovery
> ---
>
> Key: YARN-3039
> URL: https://issues.apache.org/jira/browse/YARN-3039
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Junping Du
> Fix For: YARN-2928
>
> Attachments: Service Binding for applicationaggregator of ATS 
> (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, 
> YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, 
> YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch, 
> YARN-3039-v6.patch, YARN-3039-v7.patch, YARN-3039-v8.patch, YARN-3039.9.patch
>
>
> Per design in YARN-2928, implement ATS writer service discovery. This is 
> essential for off-node clients to send writes to the right ATS writer. This 
> should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3374) Collector's web server should randomly bind an available port

2015-03-19 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3374:
--
Summary: Collector's web server should randomly bind an available port  
(was: Aggregator's web server should randomly bind an available port)

> Collector's web server should randomly bind an available port
> -
>
> Key: YARN-3374
> URL: https://issues.apache.org/jira/browse/YARN-3374
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> It's based on the configuration now. That approach won't work if we move to the 
> app-level aggregator container solution. One NM may start multiple such 
> aggregators, which cannot all bind to the same configured port.
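
For illustration, binding to port 0 lets the OS pick a free port, which the 
collector could then report back instead of relying on a fixed configured port 
(a generic Java sketch, not the actual patch):

{code}
import java.net.ServerSocket;

public class EphemeralPortExample {
  public static void main(String[] args) throws Exception {
    // Port 0 asks the OS for any available port.
    try (ServerSocket socket = new ServerSocket(0)) {
      int boundPort = socket.getLocalPort(); // the port actually chosen
      System.out.println("Collector web server could bind to port " + boundPort);
    }
  }
}
{code}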



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3359) Recover collector list in RM failed over

2015-03-19 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3359:
--
Summary: Recover collector list in RM failed over  (was: Recover aggregator 
(collector) list in RM failed over)

> Recover collector list in RM failed over
> 
>
> Key: YARN-3359
> URL: https://issues.apache.org/jira/browse/YARN-3359
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>
> Per discussion in YARN-3039, split the recovery work from RMStateStore into a 
> separate JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3210) [Source organization] Refactor timeline collector according to new code organization

2015-03-19 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3210:
--
Summary: [Source organization] Refactor timeline collector according to new 
code organization  (was: [Source organization] Refactor timeline aggregator 
according to new code organization)

> [Source organization] Refactor timeline collector according to new code 
> organization
> 
>
> Key: YARN-3210
> URL: https://issues.apache.org/jira/browse/YARN-3210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
>  Labels: refactor
> Fix For: YARN-2928
>
> Attachments: YARN-3210-022715.patch, YARN-3210-030215.patch, 
> YARN-3210-030215_1.patch, YARN-3210-030215_2.patch
>
>
> We may want to refactor the code of timeline aggregator according to the 
> discussion of YARN-3166, the code organization for timeline service v2. We 
> need to refactor the code after we reach an agreement on the aggregator part 
> of YARN-3166. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context

2015-03-19 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3040:
--
Summary: [Data Model] Make putEntities operation be aware of the app's 
context  (was: [Data Model] Implement client-side API for handling flows)

> [Data Model] Make putEntities operation be aware of the app's context
> -
>
> Key: YARN-3040
> URL: https://issues.apache.org/jira/browse/YARN-3040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3040.1.patch
>
>
> Per design in YARN-2928, implement client-side API for handling *flows*. 
> Frameworks should be able to define and pass in all attributes of flows and 
> flow runs to YARN, and they should be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.
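
As a rough illustration of the tag-based idea mentioned above (the tag key names below are invented for this sketch, not the keys the project settled on), flow attributes could be attached through ApplicationSubmissionContext#setApplicationTags:

{code:java}
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;

public class FlowTagExample {
  /**
   * Attach flow attributes to an application via YARN application tags.
   * The "flow_name"/"flow_run_id" key names are purely illustrative.
   */
  static void tagWithFlow(ApplicationSubmissionContext ctx,
      String flowName, long flowRunId) {
    Set<String> tags = new HashSet<String>();
    tags.add("flow_name:" + flowName);
    tags.add("flow_run_id:" + flowRunId);
    ctx.setApplicationTags(tags);
  }
}
{code}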



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3167) [Collector implementation] Implement the core functionality of the timeline collector

2015-03-19 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3167:
--
Summary: [Collector implementation] Implement the core functionality of the 
timeline collector  (was: [Aggregator implementation] Implement the core 
functionality of the TimelineAggregator service)

> [Collector implementation] Implement the core functionality of the timeline 
> collector
> -
>
> Key: YARN-3167
> URL: https://issues.apache.org/jira/browse/YARN-3167
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
> Attachments: RM-AM-NM-Aggregator.png, 
> Sequence_diagram_User_RM_AM_NM_Aggregator_Writer.png
>
>
> The basic skeleton of the timeline aggregator has been set up by YARN-3030. 
> We need to implement the core functionality of the base aggregator service. 
> The key things include
> - handling the requests from clients (sync or async)
> - buffering data
> - handling the aggregation logic
> - invoking the storage API
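
A minimal sketch of the buffering/async items in the list above, using plain Java types as placeholders for the real entity and storage classes:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BufferingCollectorSketch {
  /** Placeholder for the storage write API; not a YARN interface. */
  interface Storage { void write(List<String> batch); }

  private final BlockingQueue<String> buffer =
      new LinkedBlockingQueue<String>();
  private final Storage storage;
  private volatile boolean running = true;

  BufferingCollectorSketch(Storage storage) {
    this.storage = storage;
    Thread drainer = new Thread(new Runnable() {
      @Override
      public void run() {
        drainLoop();
      }
    }, "collector-drain");
    drainer.setDaemon(true);
    drainer.start();
  }

  /** Async put: enqueue and return to the caller immediately. */
  void putAsync(String entity) {
    buffer.offer(entity);
  }

  private void drainLoop() {
    List<String> batch = new ArrayList<String>();
    while (running) {
      try {
        batch.add(buffer.take());     // block until at least one entity
        buffer.drainTo(batch, 99);    // then take up to a full batch
        storage.write(batch);         // hand the batch to the storage layer
        batch.clear();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }
}
{code}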



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-03-19 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3116:
--
Summary: [Collector wireup] We need an assured way to determine if a 
container is an AM container on NM  (was: [Aggregator wireup] We need an 
assured way to determine if a container is an AM container on NM)

> [Collector wireup] We need an assured way to determine if a container is an 
> AM container on NM
> --
>
> Key: YARN-3116
> URL: https://issues.apache.org/jira/browse/YARN-3116
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> In YARN-3030, to start the per-app aggregator only for a started AM 
> container, we need to determine from the context in the NM whether the 
> container is an AM container or not (we can do it on the RM). This 
> information is missing, so we worked around it by treating the container 
> with ID "_01" as the AM container. Unfortunately, this is neither a 
> necessary nor a sufficient condition. We need a reliable way to determine 
> whether a container is an AM container on the NM. We could add a flag to the 
> container object or create an API to make the judgement. The distributed AM 
> information may also be useful to YARN-2877.
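
For reference, the workaround mentioned above boils down to an ID check along these lines (a sketch of the existing heuristic, not of any proposed fix):

{code:java}
import org.apache.hadoop.yarn.api.records.ContainerId;

public class AmContainerHeuristicSketch {
  /**
   * The workaround described above: assume the first container of an
   * application attempt is the AM container. As the issue points out,
   * this is neither a necessary nor a sufficient condition.
   */
  static boolean looksLikeAmContainer(ContainerId containerId) {
    return containerId.toString().endsWith("_000001");
  }
}
{code}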



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows

2015-03-19 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369848#comment-14369848
 ] 

Zhijie Shen commented on YARN-3040:
---

Please hold the review. Per offline discussion, for the AM and NM use case we can 
move the context info to the aggregator directly. I'll create a new patch soon.

> [Data Model] Implement client-side API for handling flows
> -
>
> Key: YARN-3040
> URL: https://issues.apache.org/jira/browse/YARN-3040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Zhijie Shen
> Attachments: YARN-3040.1.patch
>
>
> Per design in YARN-2928, implement client-side API for handling *flows*. 
> Frameworks should be able to define and pass in all attributes of flows and 
> flow runs to YARN, and they should be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3115) [Collector wireup] Work-preserving restarting of per-node timeline collector

2015-03-19 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3115:
--
Summary: [Collector wireup] Work-preserving restarting of per-node timeline 
collector  (was: [Aggregator wireup] Work-preserving restarting of per-node 
aggregator)

> [Collector wireup] Work-preserving restarting of per-node timeline collector
> 
>
> Key: YARN-3115
> URL: https://issues.apache.org/jira/browse/YARN-3115
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Junping Du
>
> YARN-3030 makes the per-node aggregator run as an aux service of the NM. It 
> holds the state of the per-app aggregators corresponding to the running AM 
> containers on that NM. When the NM is restarted in work-preserving mode, this 
> per-node aggregator state needs to be carried over across the restart as well.
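
Purely as an illustration of "carrying the state over": the set of apps with active per-app aggregators would need to be persisted and reloaded on restart. The flat-file store below is a stand-in for the NM's real recovery state store, not a proposal:

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: persists the IDs of apps with active per-app
 *  aggregators so they can be re-created after an NM restart. */
public class AggregatorRecoverySketch {
  private final Path stateFile;

  AggregatorRecoverySketch(String recoveryDir) {
    this.stateFile = Paths.get(recoveryDir, "active-app-aggregators");
  }

  /** Persist the current set of app IDs whenever it changes. */
  void save(List<String> activeAppIds) throws IOException {
    Files.write(stateFile, activeAppIds, StandardCharsets.UTF_8);
  }

  /** Reload the set on restart; empty list if nothing was saved. */
  List<String> load() throws IOException {
    if (!Files.exists(stateFile)) {
      return new ArrayList<String>();
    }
    return Files.readAllLines(stateFile, StandardCharsets.UTF_8);
  }
}
{code}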



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3087) [Collector implementation] the REST server (web server) for per-node collector does not work if it runs inside node manager

2015-03-19 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3087:
--
Summary: [Collector implementation] the REST server (web server) for 
per-node collector does not work if it runs inside node manager  (was: 
[Aggregator implementation] the REST server (web server) for per-node 
aggregator does not work if it runs inside node manager)

> [Collector implementation] the REST server (web server) for per-node 
> collector does not work if it runs inside node manager
> ---
>
> Key: YARN-3087
> URL: https://issues.apache.org/jira/browse/YARN-3087
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Fix For: YARN-2928
>
> Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch, 
> YARN-3087-022615.patch
>
>
> This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
> aggregator and the associated REST server. It runs fine as a standalone 
> process, but does not work if it runs inside the node manager due to possible 
> collisions of servlet mapping.
> Exception:
> {noformat}
> org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
> v2 not found
>   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
>   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3032) [Collector implementation] Implement timeline collector functionality to serve ATS readers' requests for live apps

2015-03-19 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3032:
--
Summary: [Collector implementation] Implement timeline collector 
functionality to serve ATS readers' requests for live apps  (was: [Aggregator 
implementation] Implement ATS writer functionality to serve ATS readers' 
requests for live apps)

> [Collector implementation] Implement timeline collector functionality to 
> serve ATS readers' requests for live apps
> --
>
> Key: YARN-3032
> URL: https://issues.apache.org/jira/browse/YARN-3032
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
>
> Per design in YARN-2928, implement the functionality in ATS writer to serve 
> data for live apps coming from ATS readers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3038) [Collector wireup] Handle timeline collector failure scenarios

2015-03-19 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3038:
--
Summary: [Collector wireup] Handle timeline collector failure scenarios  
(was: [Aggregator wireup] Handle ATS writer failure scenarios)

> [Collector wireup] Handle timeline collector failure scenarios
> --
>
> Key: YARN-3038
> URL: https://issues.apache.org/jira/browse/YARN-3038
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
>
> Per design in YARN-2928, consider various ATS writer failure scenarios, and 
> implement proper handling.
> For example, ATS writers may fail and exit due to OOM. In that case the writer 
> should be retried a certain number of times. We also need to tie fatal ATS 
> writer failures (after exhausting all retries) to the application failure, and 
> so on.
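
A minimal sketch of the bounded-retry idea described above (the retry count and backoff values are arbitrary, not project configuration):

{code:java}
/** Generic bounded-retry sketch for restarting a failed writer. */
public class WriterRetrySketch {
  /** Placeholder for whatever starts or restarts the writer. */
  interface Writer { void start() throws Exception; }

  /** Assumes maxAttempts >= 1. */
  static void startWithRetries(Writer writer, int maxAttempts)
      throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        writer.start();
        return;                        // started successfully
      } catch (Exception e) {
        last = e;
        Thread.sleep(1000L * attempt); // simple linear backoff
      }
    }
    // After exhausting retries the failure is fatal; per the issue, that
    // should then be tied to the application's failure handling.
    throw last;
  }
}
{code}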



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3031) [Storage abstraction] Create backing storage write interface for timeline collectors

2015-03-19 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3031:
--
Summary: [Storage abstraction] Create backing storage write interface for 
timeline collectors  (was: [Storage abstraction] Create backing storage write 
interface for ATS writers)

> [Storage abstraction] Create backing storage write interface for timeline 
> collectors
> 
>
> Key: YARN-3031
> URL: https://issues.apache.org/jira/browse/YARN-3031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
> Attachments: Sequence_diagram_write_interaction.2.png, 
> Sequence_diagram_write_interaction.png, YARN-3031.01.patch, 
> YARN-3031.02.patch, YARN-3031.03.patch
>
>
> Per design in YARN-2928, come up with the interface for the ATS writer to 
> write to various backing storages. The interface should be created to capture 
> the right level of abstractions so that it will enable all backing storage 
> implementations to implement it efficiently.
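
One possible shape of such a write interface, shown only to make the abstraction concrete; the actual interface is whatever the attached patches define:

{code:java}
import java.io.Closeable;
import java.io.IOException;

/** Illustrative shape only; parameter choices are assumptions, not the
 *  interface from the YARN-3031 patches. */
public interface TimelineWriteSketch extends Closeable {
  /** Write a batch of serialized entities for one application. */
  void write(String clusterId, String userId, String flowName,
      long flowRunId, String appId, byte[] serializedEntities)
      throws IOException;

  /** Flush any buffered writes to the backing storage. */
  void flush() throws IOException;
}
{code}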



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3033) [Collector wireup] Implement NM starting the standalone timeline collector daemon

2015-03-19 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3033:
--
Summary: [Collector wireup] Implement NM starting the standalone timeline 
collector daemon  (was: [Aggregator wireup] Implement NM starting the 
standalone ATS writer companion)

> [Collector wireup] Implement NM starting the standalone timeline collector 
> daemon
> -
>
> Key: YARN-3033
> URL: https://issues.apache.org/jira/browse/YARN-3033
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Attachments: MappingandlaunchingApplevelTimelineaggregators.pdf
>
>
> Per design in YARN-2928, implement node managers starting the ATS writer 
> companion. In YARN-2928, we already have an auxiliary service based solution. 
> Per discussion below, the bulk of that approach has actually been done as 
> part of YARN-3030. In this ticket we can work on the remaining tasks, for 
> example:
> # any needed change for configuration, esp. running it inside the NM (e.g. 
> the number of servlet threads)
> # set up a start script that starts the per-node aggregator as a standalone 
> daemon
> # for the standalone mode, implement a service that receives requests to set 
> up and tear down the app-level data
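
A sketch of item 2 in the list above: the kind of main() a start script could invoke to run the per-node aggregator as a standalone daemon. The PerNodeCollector type is a placeholder, not a class from the patch:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class StandaloneCollectorLauncherSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    final PerNodeCollector collector = new PerNodeCollector();
    collector.init(conf);
    collector.start();
    // Stop cleanly when the daemon process is terminated.
    Runtime.getRuntime().addShutdownHook(new Thread() {
      @Override
      public void run() {
        collector.stop();
      }
    });
  }

  /** Placeholder standing in for the real per-node collector service. */
  static class PerNodeCollector {
    void init(Configuration conf) { /* wire up web server, storage, etc. */ }
    void start() { /* begin serving app-level set-up/tear-down requests */ }
    void stop() { /* flush and shut down */ }
  }
}
{code}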



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3034) [Collector wireup] Implement RM starting its timeline collector

2015-03-19 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3034:
--
Summary: [Collector wireup] Implement RM starting its timeline collector  
(was: [Aggregator wireup] Implement RM starting its ATS writer)

> [Collector wireup] Implement RM starting its timeline collector
> ---
>
> Key: YARN-3034
> URL: https://issues.apache.org/jira/browse/YARN-3034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch, 
> YARN-3034.20150316-1.patch, YARN-3034.20150318-1.patch
>
>
> Per design in YARN-2928, implement resource managers starting their own ATS 
> writers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3378) a load test client that can replay a volume of history files

2015-03-19 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu reassigned YARN-3378:
---

Assignee: Li Lu

> a load test client that can replay a volume of history files
> 
>
> Key: YARN-3378
> URL: https://issues.apache.org/jira/browse/YARN-3378
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
>
> It might be good to create a load test client that can replay a large volume 
> of history files into the timeline service. One can envision running such a 
> load test client as a mapreduce job to generate a fair amount of load. It 
> would be useful for spot-checking correctness and, more importantly, for 
> observing performance characteristics.
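
A rough sketch of the mapreduce-based replay idea, assuming one serialized entity per input line; the replay(...) call is left as a stub rather than guessing at the timeline client API:

{code:java}
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/** Sketch of the idea only: each map call replays one line (one serialized
 *  timeline entity) into the timeline service and counts it. */
public class HistoryReplayMapperSketch
    extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    replay(line.toString());
    context.getCounter("replay", "entities").increment(1);
  }

  /** Stub: would deserialize the line and put it via the timeline client. */
  private void replay(String serializedEntity) {
    // intentionally left empty in this sketch
  }
}
{code}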



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3030) [Collector wireup] Set up timeline collector with basic request serving structure and lifecycle

2015-03-19 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3030:
--
Summary: [Collector wireup] Set up timeline collector with basic request 
serving structure and lifecycle  (was: [Aggregator wireup] Set up ATS writer 
with basic request serving structure and lifecycle)

> [Collector wireup] Set up timeline collector with basic request serving 
> structure and lifecycle
> ---
>
> Key: YARN-3030
> URL: https://issues.apache.org/jira/browse/YARN-3030
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Fix For: YARN-2928
>
> Attachments: YARN-3030.001.patch, YARN-3030.002.patch, 
> YARN-3030.003.patch, YARN-3030.004.patch
>
>
> Per design in YARN-2928, create an ATS writer as a service, and implement the 
> basic service structure including the lifecycle management.
> Also, as part of this JIRA, we should come up with the ATS client API for 
> sending requests to this ATS writer.
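
A sketch of the "writer as a service" lifecycle shape, assuming Hadoop's CompositeService base class; the class name and internals are placeholders, not the YARN-3030 code:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.CompositeService;

public class TimelineCollectorSketch extends CompositeService {

  public TimelineCollectorSketch() {
    super(TimelineCollectorSketch.class.getName());
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // read configuration, create the web server and writer here
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    // start accepting put requests from clients
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    // flush outstanding data, then shut down
    super.serviceStop();
  }
}
{code}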



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3377) TestTimelineServiceClientIntegration fails

2015-03-19 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369838#comment-14369838
 ] 

Sangjin Lee commented on YARN-3377:
---

The root cause is known. I'll post a patch once YARN- is resolved.

> TestTimelineServiceClientIntegration fails
> --
>
> Key: YARN-3377
> URL: https://issues.apache.org/jira/browse/YARN-3377
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Minor
>
> TestTimelineServiceClientIntegration fails. It appears we are getting 500 
> from the timeline collector. This appears to be mostly an issue with the test 
> itself.
> {noformat}
> ---
> Test set: 
> org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 33.503 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration
> testPutEntities(org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration)
>   Time elapsed: 32.606 sec  <<< ERROR!
> org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response 
> from the timeline server.
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:457)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:391)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:368)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:342)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration.testPutEntities(TestTimelineServiceClientIntegration.java:74)
> {noformat}
> The relevant piece from the server side:
> {noformat}
> Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.PackagesResourceConfig init
> INFO: Scanning for root resource and provider classes in the packages:
>   org.apache.hadoop.yarn.server.timelineservice.collector
>   org.apache.hadoop.yarn.webapp
>   org.apache.hadoop.yarn.webapp
> Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig 
> logClasses
> INFO: Root resource classes found:
>   class org.apache.hadoop.yarn.webapp.MyTestWebService
>   class 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorWebService
> Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig 
> logClasses
> INFO: Provider classes found:
>   class org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider
>   class org.apache.hadoop.yarn.webapp.GenericExceptionHandler
>   class org.apache.hadoop.yarn.webapp.MyTestJAXBContextResolver
> Mar 19, 2015 10:48:30 AM 
> com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
> INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
> Mar 19, 2015 10:48:31 AM 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 
> resolve
> SEVERE: null
> java.lang.IllegalAccessException: Class 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 can 
> not access a member of class 
> org.apache.hadoop.yarn.webapp.MyTestWebService$MyInfo with modifiers "public"
>   at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:95)
>   at java.lang.Class.newInstance0(Class.java:366)
>   at java.lang.Class.newInstance(Class.java:325)
>   at 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467)
>   at 
> com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181)
>   at 
> com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81)
>   at 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518)
>   at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124)
>   at 
> com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
>   at 
> com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120)
>   at 
> com.sun.jersey.server.impl.wadl.WadlMethodFactory$WadlOptionsMethodDispatcher.dispatch(WadlMethodFactory.java:98)
>   at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandP
