[jira] [Commented] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757503#comment-13757503 ] Konstantin Boudnik commented on YARN-696: - #1 - good one: makes code way more readable. Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Attachments: YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff Within the YARN Resource Manager REST API, the GET call that returns all applications can be filtered by a single State query parameter (http://<rm http address:port>/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed). If no state parameter is specified, all states are returned; however, if a subset of states is required, multiple REST calls are needed (a maximum of 7). The proposal is to allow multiple states to be specified in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
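For illustration, a minimal Java client for the proposed call might look like the sketch below. The comma-separated {{states}} parameter, the {{rmhost:8088}} address and the raw JSON printing are assumptions made for this example, not the exact API of the attached patch.

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class MultiStateAppsQuery {
  public static void main(String[] args) throws Exception {
    // Hypothetical single call asking for FINISHED and KILLED apps together,
    // instead of issuing one REST call per state.
    URL url = new URL("http://rmhost:8088/ws/v1/cluster/apps?states=FINISHED,KILLED");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // raw JSON listing of the matching applications
      }
    } finally {
      in.close();
      conn.disconnect();
    }
  }
}
{code}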
[jira] [Moved] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur moved HDFS-5161 to YARN-1142: Component/s: (was: test) Fix Version/s: (was: 2.1.1-beta) 2.1.1-beta Target Version/s: 2.1.1-beta (was: 2.1.1-beta) Affects Version/s: (was: 2.1.0-beta) 2.1.0-beta Key: YARN-1142 (was: HDFS-5161) Project: Hadoop YARN (was: Hadoop HDFS) MiniYARNCluster web ui does not work properly - Key: YARN-1142 URL: https://issues.apache.org/jira/browse/YARN-1142 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.1.1-beta When going to the RM http port, the NM web ui is displayed. It seems there is a singleton somewhere that breaks things when RM NMs run in the same process. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757597#comment-13757597 ] Steve Loughran commented on YARN-1001: -- As long as everyone is happy with the URL structure, yes. If it were just one app type at a time I'd push for having the app in the URL. One thing I'd like to see is tests to verify that it works for app types containing the following characters: space, ?, &, ',', % and \. There are no restrictions on the type of an app (unless someone wants to add those restrictions right now, *which may not be a bad idea*), so you need to make sure the unusual characters make it all the way to this API and back. YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Attachments: YARN-1001.1.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-842) Resource Manager Node Manager UI's doesn't work with IE
[ https://issues.apache.org/jira/browse/YARN-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757599#comment-13757599 ] J.Andreina commented on YARN-842: - Unable to view the jobs in IE9 both at the RM and JHS UI. Currently using the hadoop-2.1.0-beta version. IE version: 9.0.8112.16421. Attached screenshots of the RM and JHS UI. Resource Manager Node Manager UI's doesn't work with IE - Key: YARN-842 URL: https://issues.apache.org/jira/browse/YARN-842 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Devaraj K Assignee: Devaraj K
{code:xml}
Webpage error details

User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)
Timestamp: Mon, 17 Jun 2013 12:06:03 UTC

Message: 'JSON' is undefined
Line: 41
Char: 218
Code: 0
URI: http://10.18.40.24:8088/cluster/apps
{code}
RM and NM UIs are not working with IE, showing the above error for every link on the UI. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1143) Restrict the names that apps and types can have
Steve Loughran created YARN-1143: Summary: Restrict the names that apps and types can have Key: YARN-1143 URL: https://issues.apache.org/jira/browse/YARN-1143 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Steve Loughran Priority: Minor YARN-1001 is an example of a RESTy API to the RM's list of apps and app types -and it shows that we may want to add some restrictions on the characters allowed in an app name or type (or at least forbid some) -before it is too late. If we don't do that, then tests should verify that you can have apps with high-unicode names as well as other troublesome characters -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
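A restriction like the one proposed could be enforced with a single pattern check at submission time. The sketch below is purely hypothetical — the allowed character set and length limit are placeholders, not anything agreed on this JIRA.

{code:java}
import java.util.regex.Pattern;

public final class AppTypeValidator {
  // Hypothetical rule: letters, digits, '.', '_' and '-' only, 1-64 characters.
  // The real restriction (if any) would be decided on YARN-1143.
  private static final Pattern ALLOWED = Pattern.compile("^[A-Za-z0-9._-]{1,64}$");

  private AppTypeValidator() {
  }

  public static boolean isValidNameOrType(String value) {
    return value != null && ALLOWED.matcher(value).matches();
  }
}
{code}

Anything rejected by such a check (high-unicode names, '&', '%', spaces) is exactly what tests would need to exercise if no restriction is added.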
[jira] [Assigned] (YARN-842) Resource Manager Node Manager UI's doesn't work with IE
[ https://issues.apache.org/jira/browse/YARN-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned YARN-842: -- Assignee: (was: Devaraj K) Resource Manager Node Manager UI's doesn't work with IE - Key: YARN-842 URL: https://issues.apache.org/jira/browse/YARN-842 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Devaraj K
{code:xml}
Webpage error details

User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)
Timestamp: Mon, 17 Jun 2013 12:06:03 UTC

Message: 'JSON' is undefined
Line: 41
Char: 218
Code: 0
URI: http://10.18.40.24:8088/cluster/apps
{code}
RM and NM UIs are not working with IE, showing the above error for every link on the UI. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1074) Clean up YARN CLI app list to show only running apps.
[ https://issues.apache.org/jira/browse/YARN-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757652#comment-13757652 ] Hudson commented on YARN-1074: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #322 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/322/]) YARN-1124. Modified YARN CLI application list to display new and submitted applications together with running apps by default, following up YARN-1074. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519869) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java Clean up YARN CLI app list to show only running apps. - Key: YARN-1074 URL: https://issues.apache.org/jira/browse/YARN-1074 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1074.1.patch, YARN-1074.2.patch, YARN-1074.3.patch, YARN-1074.4.patch, YARN-1074.5.patch, YARN-1074.6.patch, YARN-1074.7.patch, YARN-1074.8.patch Once a user brings up the YARN daemons and runs jobs, the jobs stay in the output returned by $ yarn application -list even after they have already completed. We want the YARN command line to clean up this list. Specifically, we want to remove applications with FINISHED state (not Final-State) or KILLED state from the result.
{code}
[user1@host1 ~]$ yarn application -list
Total Applications:150
                Application-Id    Application-Name    Application-Type    User    Queue       State    Final-State    Progress    Tracking-URL
application_1374638600275_0109           Sleep job           MAPREDUCE   user1  default      KILLED         KILLED        100%    host1:54059
application_1374638600275_0121           Sleep job           MAPREDUCE   user1  default    FINISHED      SUCCEEDED        100%    host1:19888/jobhistory/job/job_1374638600275_0121
application_1374638600275_0020           Sleep job           MAPREDUCE   user1  default    FINISHED      SUCCEEDED        100%    host1:19888/jobhistory/job/job_1374638600275_0020
application_1374638600275_0038           Sleep job           MAPREDUCE   user1  default
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1124) By default yarn application -list should display all the applications in a state other than FINISHED / FAILED
[ https://issues.apache.org/jira/browse/YARN-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757654#comment-13757654 ] Hudson commented on YARN-1124: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #322 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/322/]) YARN-1124. Modified YARN CLI application list to display new and submitted applications together with running apps by default, following up YARN-1074. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519869) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java By default yarn application -list should display all the applications in a state other than FINISHED / FAILED - Key: YARN-1124 URL: https://issues.apache.org/jira/browse/YARN-1124 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-1124.1.patch Today we list only applications in the RUNNING state by default for yarn application -list. Instead we should show all the applications which are either submitted, accepted or running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
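The same default set can also be obtained programmatically. The following is only a sketch using the public YarnClient/ApplicationReport API and filtering on the client side; it mirrors the new/submitted/accepted/running default described above rather than using any new CLI option.

{code:java}
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListActiveApps {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();
    try {
      // "Active" here means not yet FINISHED/FAILED/KILLED, matching the
      // default set discussed in YARN-1124.
      EnumSet<YarnApplicationState> active = EnumSet.of(
          YarnApplicationState.NEW, YarnApplicationState.SUBMITTED,
          YarnApplicationState.ACCEPTED, YarnApplicationState.RUNNING);
      for (ApplicationReport report : yarnClient.getApplications()) {
        if (active.contains(report.getYarnApplicationState())) {
          System.out.println(report.getApplicationId() + "\t"
              + report.getName() + "\t" + report.getYarnApplicationState());
        }
      }
    } finally {
      yarnClient.stop();
    }
  }
}
{code}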
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757710#comment-13757710 ] Junping Du commented on YARN-311: - Thanks for the review! [~tucu00]
bq. If we make totalCapability volatile then we don't need to use a read/write lock.
Yes. Making it volatile sounds better, as locking the whole object is not necessary. Will update the patch soon.
bq. Does this mean that if the node is restarted we lose the capacity correction done thru the RM admin API for that node?
Yes and no. It is correct that this patch does not guarantee that the capacity correction persists through an NM restart, but the other JIRA (YARN-998) under the same umbrella will address the persistence issue. My current thinking is that we can cache a mapping in the RM of NodeID -> updatedResource, which is updated by the RM admin call, and the NM's registration after restart will check whether an updated resource is there before registering the node's resource. Does that make sense to you? Maybe we can discuss more options in YARN-998. Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.patch As the first step, we go for resource change on the RM side and expose admin APIs (admin protocol, CLI, REST and JMX API). In this JIRA we will only contain changes in the scheduler. For design details, please refer to the proposal and discussions in the parent JIRA: YARN-291. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1074) Clean up YARN CLI app list to show only running apps.
[ https://issues.apache.org/jira/browse/YARN-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757749#comment-13757749 ] Hudson commented on YARN-1074: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1512 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1512/]) YARN-1124. Modified YARN CLI application list to display new and submitted applications together with running apps by default, following up YARN-1074. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519869) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java Clean up YARN CLI app list to show only running apps. - Key: YARN-1074 URL: https://issues.apache.org/jira/browse/YARN-1074 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1074.1.patch, YARN-1074.2.patch, YARN-1074.3.patch, YARN-1074.4.patch, YARN-1074.5.patch, YARN-1074.6.patch, YARN-1074.7.patch, YARN-1074.8.patch Once a user brings up the YARN daemons and runs jobs, the jobs stay in the output returned by $ yarn application -list even after they have already completed. We want the YARN command line to clean up this list. Specifically, we want to remove applications with FINISHED state (not Final-State) or KILLED state from the result.
{code}
[user1@host1 ~]$ yarn application -list
Total Applications:150
                Application-Id    Application-Name    Application-Type    User    Queue       State    Final-State    Progress    Tracking-URL
application_1374638600275_0109           Sleep job           MAPREDUCE   user1  default      KILLED         KILLED        100%    host1:54059
application_1374638600275_0121           Sleep job           MAPREDUCE   user1  default    FINISHED      SUCCEEDED        100%    host1:19888/jobhistory/job/job_1374638600275_0121
application_1374638600275_0020           Sleep job           MAPREDUCE   user1  default    FINISHED      SUCCEEDED        100%    host1:19888/jobhistory/job/job_1374638600275_0020
application_1374638600275_0038           Sleep job           MAPREDUCE   user1  default
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-311: Attachment: YARN-311-v6.2.patch Address recent comments from Alejandro on replacing read/write lock with volatile on setCapability in RMNodeImpl. Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch As the first step, we go for resource change on RM side and expose admin APIs (admin protocol, CLI, REST and JMX API). In this jira, we will only contain changes in scheduler. For design details, please refer proposal and discussions in parent JIRA: YARN-291. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
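As a simplified illustration of the volatile-versus-lock point above (not the actual RMNodeImpl code), replacing a read/write lock with a volatile reference for a single field looks like this:

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

// Sketch only: a volatile reference makes a single-field read/write safe
// without taking a read/write lock around it.
public class NodeCapabilityHolder {
  private volatile Resource totalCapability;

  public Resource getTotalCapability() {
    return totalCapability;            // always sees the latest published reference
  }

  public void setTotalCapability(Resource capability) {
    this.totalCapability = capability; // single reference write, visible to all readers
  }
}
{code}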
[jira] [Commented] (YARN-1124) By default yarn application -list should display all the applications in a state other than FINISHED / FAILED
[ https://issues.apache.org/jira/browse/YARN-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757751#comment-13757751 ] Hudson commented on YARN-1124: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1512 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1512/]) YARN-1124. Modified YARN CLI application list to display new and submitted applications together with running apps by default, following up YARN-1074. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1519869) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java By default yarn application -list should display all the applications in a state other than FINISHED / FAILED - Key: YARN-1124 URL: https://issues.apache.org/jira/browse/YARN-1124 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-1124.1.patch Today we are just listing application in RUNNING state by default for yarn application -list. Instead we should show all the applications which are either submitted/accepted/running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (YARN-1106) The RM should point the tracking url to the RM app page if its empty
[ https://issues.apache.org/jira/browse/YARN-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757787#comment-13757787 ] Thomas Graves edited comment on YARN-1106 at 9/4/13 2:03 PM: - Commented on that jira, I don't see how that fixes the tracking url being empty issue. Once we have the generic history server that fixes at least some of the cases, but as you say in the other jira there are a bunch of corner cases. was (Author: tgraves): Commented on that jira, I don't see how that fixing the tracking url being empty issue. Once we have the generic history server that fixes at least some of the cases, but as you say in the other jira there are a bunch of corner cases. The RM should point the tracking url to the RM app page if its empty Key: YARN-1106 URL: https://issues.apache.org/jira/browse/YARN-1106 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9 Reporter: Thomas Graves Assignee: Thomas Graves It would be nice if the Resourcemanager set the tracking url to the RM app page if the application master doesn't pass one or passes the empty string. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757791#comment-13757791 ] Hadoop QA commented on YARN-311:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601377/YARN-311-v6.2.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1829//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1829//console
This message is automatically generated. Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch As the first step, we go for resource change on the RM side and expose admin APIs (admin protocol, CLI, REST and JMX API). In this JIRA we will only contain changes in the scheduler. For design details, please refer to the proposal and discussions in the parent JIRA: YARN-291. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevor Lorimer updated YARN-696: Attachment: YARN-696.diff No problem Zhijie, they are great comments, thanks. I have applied the changes and broken the lines at 80 characters where I can. Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Attachments: YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff Within the YARN Resource Manager REST API, the GET call that returns all applications can be filtered by a single State query parameter (http://<rm http address:port>/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed). If no state parameter is specified, all states are returned; however, if a subset of states is required, multiple REST calls are needed (a maximum of 7). The proposal is to allow multiple states to be specified in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1106) The RM should point the tracking url to the RM app page if its empty
[ https://issues.apache.org/jira/browse/YARN-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757787#comment-13757787 ] Thomas Graves commented on YARN-1106: - Commented on that jira, I don't see how that fixing the tracking url being empty issue. Once we have the generic history server that fixes at least some of the cases, but as you say in the other jira there are a bunch of corner cases. The RM should point the tracking url to the RM app page if its empty Key: YARN-1106 URL: https://issues.apache.org/jira/browse/YARN-1106 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9 Reporter: Thomas Graves Assignee: Thomas Graves It would be nice if the Resourcemanager set the tracking url to the RM app page if the application master doesn't pass one or passes the empty string. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1140) Tracking URL is broken in a lots of corner cases, and can be the AM page or the application page depending on the situation
[ https://issues.apache.org/jira/browse/YARN-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757788#comment-13757788 ] Thomas Graves commented on YARN-1140: - I understand always linking to the per-app page to try to make it more consistent, but at the same time I don't like that power users will have to click one more time. I also don't see how this solves the issue of the tracking url being a bad link, unless you are also proposing to handle that better on the app page? If an app finishes and doesn't set the history link (for instance a non-mapreduce app), or crashes before it can set it, the tracking url link is still going to go to a bad page. Tracking URL is broken in a lots of corner cases, and can be the AM page or the application page depending on the situation --- Key: YARN-1140 URL: https://issues.apache.org/jira/browse/YARN-1140 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Today, there are so many corner cases, specifically when the AM fails to start, when users will see that the tracking URL is broken or redirected to the per-app page. I am thinking of removing the tracking URL completely from the landing web-page and always forcing users to first jump on to the application page. That way, there is consistency and there will always be one page that users can go to for their app information and then subsequently navigate to the AM page if all went well. Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1124) By default yarn application -list should display all the applications in a state other than FINISHED / FAILED
[ https://issues.apache.org/jira/browse/YARN-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757811#comment-13757811 ] Hudson commented on YARN-1124: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1539 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1539/]) YARN-1124. Modified YARN CLI application list to display new and submitted applications together with running apps by default, following up YARN-1074. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1519869) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java By default yarn application -list should display all the applications in a state other than FINISHED / FAILED - Key: YARN-1124 URL: https://issues.apache.org/jira/browse/YARN-1124 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-1124.1.patch Today we are just listing application in RUNNING state by default for yarn application -list. Instead we should show all the applications which are either submitted/accepted/running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1074) Clean up YARN CLI app list to show only running apps.
[ https://issues.apache.org/jira/browse/YARN-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757809#comment-13757809 ] Hudson commented on YARN-1074: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1539 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1539/]) YARN-1124. Modified YARN CLI application list to display new and submitted applications together with running apps by default, following up YARN-1074. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519869) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java Clean up YARN CLI app list to show only running apps. - Key: YARN-1074 URL: https://issues.apache.org/jira/browse/YARN-1074 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1074.1.patch, YARN-1074.2.patch, YARN-1074.3.patch, YARN-1074.4.patch, YARN-1074.5.patch, YARN-1074.6.patch, YARN-1074.7.patch, YARN-1074.8.patch Once a user brings up the YARN daemons and runs jobs, the jobs stay in the output returned by $ yarn application -list even after they have already completed. We want the YARN command line to clean up this list. Specifically, we want to remove applications with FINISHED state (not Final-State) or KILLED state from the result.
{code}
[user1@host1 ~]$ yarn application -list
Total Applications:150
                Application-Id    Application-Name    Application-Type    User    Queue       State    Final-State    Progress    Tracking-URL
application_1374638600275_0109           Sleep job           MAPREDUCE   user1  default      KILLED         KILLED        100%    host1:54059
application_1374638600275_0121           Sleep job           MAPREDUCE   user1  default    FINISHED      SUCCEEDED        100%    host1:19888/jobhistory/job/job_1374638600275_0121
application_1374638600275_0020           Sleep job           MAPREDUCE   user1  default    FINISHED      SUCCEEDED        100%    host1:19888/jobhistory/job/job_1374638600275_0020
application_1374638600275_0038           Sleep job           MAPREDUCE   user1  default
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1106) The RM should point the tracking url to the RM app page if its empty
[ https://issues.apache.org/jira/browse/YARN-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-1106: Attachment: YARN-1106.patch The RM should point the tracking url to the RM app page if its empty Key: YARN-1106 URL: https://issues.apache.org/jira/browse/YARN-1106 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1106.patch It would be nice if the Resourcemanager set the tracking url to the RM app page if the application master doesn't pass one or passes the empty string. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
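The behaviour asked for in the summary can be pictured as a small fallback check when the AM's tracking URL is recorded. This is only a sketch — the class and method names are invented here and the attached patch may implement it differently:

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Illustration only, not the YARN-1106 patch: if the AM registers with a
// null or empty tracking URL, fall back to the RM's own per-app page.
public class TrackingUrlDefaulter {
  public static String trackingUrlOrAppPage(String trackingUrlFromAM,
      ApplicationId appId, String rmWebAppAddress) {
    if (trackingUrlFromAM == null || trackingUrlFromAM.trim().isEmpty()) {
      return "http://" + rmWebAppAddress + "/cluster/app/" + appId;
    }
    return trackingUrlFromAM;
  }
}
{code}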
[jira] [Commented] (YARN-1106) The RM should point the tracking url to the RM app page if its empty
[ https://issues.apache.org/jira/browse/YARN-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757866#comment-13757866 ] Hadoop QA commented on YARN-1106: -
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601390/YARN-1106.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1831//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1831//console
This message is automatically generated. The RM should point the tracking url to the RM app page if its empty Key: YARN-1106 URL: https://issues.apache.org/jira/browse/YARN-1106 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1106.patch It would be nice if the Resourcemanager set the tracking url to the RM app page if the application master doesn't pass one or passes the empty string. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757839#comment-13757839 ] Jason Lowe commented on YARN-540: - Sorry for arriving late, but why wouldn't we want to implement choice (1) above? (i.e.: block until store confirms app state is removed). From an AM's perspective, that's the simplest solution. Returning control to the AM early from the unregister is inviting the AM to do bad things wrt. a potential restart (e.g.: MR AM will remove its staging directory, effectively preventing the restart from succeeding and leading the RM to believe the app failed). The unregister call is a terminal call in the AM-RM protocol, so I think it's appropriate for that to not return until the app truly is unregistered. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
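Choice (1) described above amounts to making the store removal synchronous before acknowledging the unregister. The interfaces below are stand-ins invented for this sketch; they are not the real RMStateStore API:

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Sketch of the "block until removed" idea, with made-up interfaces.
public class BlockingUnregister {
  /** Minimal stand-in for the RM state store used in this illustration. */
  public interface AppStateStore {
    /** Blocks until the application's saved state has really been removed. */
    void removeApplicationStateSync(ApplicationId appId) throws Exception;
  }

  private final AppStateStore stateStore;

  public BlockingUnregister(AppStateStore stateStore) {
    this.stateStore = stateStore;
  }

  public void finishApplicationMaster(ApplicationId appId) throws Exception {
    // Returning before this completes invites the AM to clean up (e.g. delete
    // its staging dir) while the RM could still relaunch the app after a restart.
    stateStore.removeApplicationStateSync(appId);
  }
}
{code}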
[jira] [Commented] (YARN-707) Add user info in the YARN ClientToken
[ https://issues.apache.org/jira/browse/YARN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757904#comment-13757904 ] Daryn Sharp commented on YARN-707: -- Ug, the RM and AM are abusing the same secret manager impl. The RM wants the secret key to be generated, whereas the AM really wants to verify it. 2.x fixed this. Add user info in the YARN ClientToken - Key: YARN-707 URL: https://issues.apache.org/jira/browse/YARN-707 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Jason Lowe Priority: Blocker Fix For: 3.0.0, 2.1.1-beta Attachments: YARN-707-20130822.txt, YARN-707-20130827.txt, YARN-707-20130828-2.txt, YARN-707-20130828.txt, YARN-707-20130829.txt, YARN-707-20130830.branch-0.23.txt If user info is present in the client token then it can be used to do limited authz in the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-707) Add user info in the YARN ClientToken
[ https://issues.apache.org/jira/browse/YARN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757909#comment-13757909 ] Jason Lowe commented on YARN-707: - bq. Ug, the RM and AM are abusing the same secret manager impl. The RM wants the secret key to be generated, whereas the AM really wants to verify it. 2.x fixed this. Right, this condition as well as the fact that the RM leaks keys in the secret manager for each app (no way to remove them) is not new with this patch as it was already pre-existing in 0.23. IMO those issues should be fixed in another JIRA since they're not introduced by this change. Add user info in the YARN ClientToken - Key: YARN-707 URL: https://issues.apache.org/jira/browse/YARN-707 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Jason Lowe Priority: Blocker Fix For: 3.0.0, 2.1.1-beta Attachments: YARN-707-20130822.txt, YARN-707-20130827.txt, YARN-707-20130828-2.txt, YARN-707-20130828.txt, YARN-707-20130829.txt, YARN-707-20130830.branch-0.23.txt If user info is present in the client token then it can be used to do limited authz in the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-707) Add user info in the YARN ClientToken
[ https://issues.apache.org/jira/browse/YARN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757890#comment-13757890 ] Daryn Sharp commented on YARN-707: -- Still reviewing, but an initial observation is {{ClientToAMSecretManager#getMasterKey}} is fabricating a new secret key if there is no pre-existing key for the appId. This should be an error condition. The secret manager knows the secret key for the specific app so there's no need to ever generate a secret key, right? Else I can flood the AM with invalid appIds to make it go OOM from generating secret keys for invalid appIds. Add user info in the YARN ClientToken - Key: YARN-707 URL: https://issues.apache.org/jira/browse/YARN-707 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Jason Lowe Priority: Blocker Fix For: 3.0.0, 2.1.1-beta Attachments: YARN-707-20130822.txt, YARN-707-20130827.txt, YARN-707-20130828-2.txt, YARN-707-20130828.txt, YARN-707-20130829.txt, YARN-707-20130830.branch-0.23.txt If user info is present in the client token then it can be used to do limited authz in the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
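The point about {{getMasterKey}} can be illustrated with a lookup that refuses to mint keys for unknown applications. This is illustrative only and is not the actual ClientToAMSecretManager code:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.SecretKey;
import org.apache.hadoop.security.token.SecretManager.InvalidToken;

// Sketch: on the AM side, a lookup for an unknown appId should fail rather
// than fabricate a new key.
class StrictKeyLookup {
  private final Map<String, SecretKey> masterKeys =
      new ConcurrentHashMap<String, SecretKey>();

  SecretKey getMasterKey(String appId) throws InvalidToken {
    SecretKey key = masterKeys.get(appId);
    if (key == null) {
      // Refuse unknown appIds instead of generating a key, so a flood of
      // bogus appIds cannot grow the map without bound.
      throw new InvalidToken("No master key registered for " + appId);
    }
    return key;
  }
}
{code}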
[jira] [Created] (YARN-1144) Unmanaged AMs registering a tracking URI should not be proxy-fied
Alejandro Abdelnur created YARN-1144: Summary: Unmanaged AMs registering a tracking URI should not be proxy-fied Key: YARN-1144 URL: https://issues.apache.org/jira/browse/YARN-1144 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.1.1-beta Unmanaged AMs do not run in the cluster, their tracking URL should not be proxy-fied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757925#comment-13757925 ] Srimanth Gunturi commented on YARN-1001: Ambari's primary use-case is to show in YARN UI the app-type/state distribution as a time-series graph. For this we would make periodic calls to {{/ws/v1/cluster/appscount}}. Apart from that we need similar information for MR2 UI, where a call to {{/ws/v1/cluster/appscount?types=mapreduce}} would be made. For now having these calls should suffice. I am hoping that these calls include both the current/real-time app-type counts, as well as historical information (atleast until last RM restart)? YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Attachments: YARN-1001.1.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
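The periodic-poll use case described above could look roughly like the sketch below. The {{/ws/v1/cluster/appscount}} path is quoted from the comment and is still only a proposal; the host, interval and raw JSON handling are assumptions made for the example:

{code:java}
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Scanner;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AppCountPoller {
  public static void main(String[] args) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    // Poll the proposed statistics endpoint once a minute, the way a
    // dashboard plotting a time series would.
    scheduler.scheduleAtFixedRate(new Runnable() {
      public void run() {
        try {
          URL url = new URL("http://rmhost:8088/ws/v1/cluster/appscount?types=mapreduce");
          HttpURLConnection conn = (HttpURLConnection) url.openConnection();
          conn.setRequestProperty("Accept", "application/json");
          InputStream in = conn.getInputStream();
          try {
            String body = new Scanner(in, "UTF-8").useDelimiter("\\A").next();
            System.out.println(System.currentTimeMillis() + " " + body);
          } finally {
            in.close();
            conn.disconnect();
          }
        } catch (Exception e) {
          e.printStackTrace();
        }
      }
    }, 0, 60, TimeUnit.SECONDS);
  }
}
{code}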
[jira] [Moved] (YARN-1145) Potential file handler leak in JobHistoryServer web ui.
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe moved MAPREDUCE-5486 to YARN-1145: - Component/s: (was: jobhistoryserver) Assignee: (was: Rohith Sharma K S) Target Version/s: 2.1.1-beta (was: 2.1.1-beta) Affects Version/s: (was: 2.1.1-beta) (was: 2.0.5-alpha) 2.1.1-beta 2.0.5-alpha Key: YARN-1145 (was: MAPREDUCE-5486) Project: Hadoop YARN (was: Hadoop Map/Reduce) Potential file handler leak in JobHistoryServer web ui. --- Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 2.1.1-beta Reporter: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch If there is any problem getting aggregated logs for rendering on the web UI, the LogReader is not closed. Because the reader is not closed, many connections are left in CLOSE_WAIT state.
hadoopuser@hadoopuser: jps
*27909* JobHistoryServer
The DataNode port is 50010. When grepped for the DataNode port, many connections from the JHS are in CLOSE_WAIT:
hadoopuser@hadoopuser: netstat -tanlp | grep 50010
tcp 0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java
tcp 1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-707) Add user info in the YARN ClientToken
[ https://issues.apache.org/jira/browse/YARN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757926#comment-13757926 ] Daryn Sharp commented on YARN-707: -- Minor:
# {{ClientToAMTokenIdentifier#getUser()}} doesn't do a null check on the client name (because it can't be null) but should perhaps still check isEmpty()?
# Is {{ResourceManager#clientToAMSecretManager}} still needed now that it's in the context?
# Now that the client token is generated in {{RMAppAttemptImpl}} - should it contain the attemptId, not the appId?
Add user info in the YARN ClientToken - Key: YARN-707 URL: https://issues.apache.org/jira/browse/YARN-707 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Jason Lowe Priority: Blocker Fix For: 3.0.0, 2.1.1-beta Attachments: YARN-707-20130822.txt, YARN-707-20130827.txt, YARN-707-20130828-2.txt, YARN-707-20130828.txt, YARN-707-20130829.txt, YARN-707-20130830.branch-0.23.txt If user info is present in the client token then it can be used to do limited authz in the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-1145) Potential file handler leak in JobHistoryServer web ui.
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-1145: Assignee: Rohith Sharma K S Potential file handler leak in JobHistoryServer web ui. --- Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 2.1.1-beta Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch If there is any problem getting aggregated logs for rendering on the web UI, the LogReader is not closed. Because the reader is not closed, many connections are left in CLOSE_WAIT state.
hadoopuser@hadoopuser: jps
*27909* JobHistoryServer
The DataNode port is 50010. When grepped for the DataNode port, many connections from the JHS are in CLOSE_WAIT:
hadoopuser@hadoopuser: netstat -tanlp | grep 50010
tcp 0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java
tcp 1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757824#comment-13757824 ] Hadoop QA commented on YARN-696:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601382/YARN-696.diff against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager and hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1830//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1830//console
This message is automatically generated. Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Attachments: YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff Within the YARN Resource Manager REST API, the GET call that returns all applications can be filtered by a single State query parameter (http://<rm http address:port>/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed). If no state parameter is specified, all states are returned; however, if a subset of states is required, multiple REST calls are needed (a maximum of 7). The proposal is to allow multiple states to be specified in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1145) Potential file handler leak in JobHistoryServer web ui.
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757932#comment-13757932 ] Hadoop QA commented on YARN-1145: -
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12600541/MAPREDUCE-5486.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1832//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1832//console
This message is automatically generated. Potential file handler leak in JobHistoryServer web ui. --- Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 2.1.1-beta Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch If there is any problem getting aggregated logs for rendering on the web UI, the LogReader is not closed. Because the reader is not closed, many connections are left in CLOSE_WAIT state.
hadoopuser@hadoopuser: jps
*27909* JobHistoryServer
The DataNode port is 50010. When grepped for the DataNode port, many connections from the JHS are in CLOSE_WAIT:
hadoopuser@hadoopuser: netstat -tanlp | grep 50010
tcp 0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java
tcp 1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757973#comment-13757973 ] Zhijie Shen commented on YARN-1001: --- [~srimanth.gunturi], for /ws/v1/cluster/appscount, does Ambari want to specify multiple states and multiple application-types in the params, and get the per application-type and state count for every combination of one application-type and one state? Or is Ambari fine with making multiple calls, each of which specifies just one (or zero) application-type and state? The two cases make a big difference to the results: the former needs to return a table containing the counts for all application-type and state combinations, while the latter simply returns one number.
bq. I am hoping that these calls include both the current/real-time app-type counts, as well as historical information (at least until last RM restart)?
A count at every constant time interval? Would you please say more about the requirement?
bq. There are no restrictions on the type of an app (unless someone wants to add those restrictions right now, which may not be a bad idea), so you need to make sure the unusual characters make it all the way to this API and back
Sounds like a good idea. YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Attachments: YARN-1001.1.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1146) RM DTSM and RMStateStore mismanage sequence number
[ https://issues.apache.org/jira/browse/YARN-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757983#comment-13757983 ] Daryn Sharp commented on YARN-1146: --- Note that bug #2 will not self-correct if the following sequence occurs:
# Issue token 1, 2, 3, 4 (seq=4)
# Renew token 2 (seq=2)
# Cancel token 3, 4 (seq=2)
# Stop RM
# Start RM (seq=2) and will issue token 3 and 4 again
The issue is _probably_ benign given the current implementation, but is a bug if anything relies on the sequence number. RM DTSM and RMStateStore mismanage sequence number -- Key: YARN-1146 URL: https://issues.apache.org/jira/browse/YARN-1146 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp {{RMDelegationTokenSecretManager}} implements {{storeNewToken}} and {{updateStoredToken}} (renew) to pass the token and its sequence number to {{RMStateStore#storeRMDelegationTokenAndSequenceNumber}}. There are two problems:
# The assumption is that new tokens will be synchronously stored in-order. With an async secret manager this may not hold true and the state's sequence number may be incorrect.
# A token renewal will reset the state's sequence number to _that token's_ sequence number.
Bug #2 is generally masked. Creating a new token (with the first caveat) will bump the state's sequence number back up. Restoring the dtsm will first set the state's stored sequence number, then re-add all the tokens which will update the sequence number if the token's sequence number is greater than the dtsm's current sequence number. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
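One way to picture a fix for bug #2 (illustrative only, not a committed patch) is to treat the stored sequence number as a high-water mark so that a renewal can never move it backwards:

{code:java}
// Sketch: keep the stored sequence number as a high-water mark. Store and
// renew events may arrive out of order with an async store, so never
// overwrite the number with a smaller value.
class SequenceNumberTracker {
  private int latestSequenceNumber = 0;

  synchronized void onTokenStoredOrRenewed(int tokenSequenceNumber) {
    latestSequenceNumber = Math.max(latestSequenceNumber, tokenSequenceNumber);
  }

  synchronized int getLatestSequenceNumber() {
    return latestSequenceNumber;
  }
}
{code}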
[jira] [Updated] (YARN-1145) Potential file handle leak in aggregated logs web ui
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1145: - Target Version/s: 0.23.10, 2.1.1-beta (was: 2.1.1-beta) Affects Version/s: 0.23.9 Summary: Potential file handle leak in aggregated logs web ui (was: Potential file handler leak in JobHistoryServer web ui.) +1 lgtm, will commit this shortly. I noticed this affects 0.23 as well. Potential file handle leak in aggregated logs web ui Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 0.23.9, 2.1.1-beta Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch Any problem in getting aggregated logs for rendering on web ui, then LogReader is not closed. Now, it reader is not closed which causing many connections in close_wait state. hadoopuser@hadoopuser: jps *27909* JobHistoryServer DataNode port is 50010. When greped with DataNode port, many connections are in CLOSE_WAIT from JHS. hadoopuser@hadoopuser: netstat -tanlp |grep 50010 tcp0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java tcp1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-696) Enable multiple states to to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757991#comment-13757991 ] Zhijie Shen commented on YARN-696: -- Thanks, Trevor. +1, the patch looks good to me. Enable multiple states to to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Attachments: YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff, YARN-696.diff Within the YARN Resource Manager REST API the GET call which returns all Applications can be filtered by a single State query parameter (http://rm http address:port/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed), if no state parameter is specified all states are returned, however if a sub-set of states is required then multiple REST calls are required (max. of 7). The proposal is to be able to specify multiple states in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1146) RM DTSM and RMStateStore mismanage sequence number
Daryn Sharp created YARN-1146: - Summary: RM DTSM and RMStateStore mismanage sequence number Key: YARN-1146 URL: https://issues.apache.org/jira/browse/YARN-1146 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp {{RMDelegationTokenSecretManager}} implements {{storeNewToken}} and {{updateStoredToken}} (renew) to pass the token and its sequence number to {{RMStateStore#storeRMDelegationTokenAndSequenceNumber}}. There are two problems: # The assumption is that new tokens will be synchronously stored in-order. With an async secret manager this may not hold true and the state's sequence number may be incorrect. # A token renewal will reset the state's sequence number to _that token's_ sequence number. Bug #2 is generally masked. Creating a new token (with the first caveat) will bump the state's sequence number back up. Restoring the dtsm will first set the state's stored sequence number, then re-add all the tokens which will update the sequence number if the token's sequence number is greater than the dtsm's current sequence number. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757994#comment-13757994 ] Bikas Saha commented on YARN-1063: -- Will this be used in secure and non-secure clusters? I dont think I fully understood the privileges of the launcher. By that do we mean the TaskTracker/NodeManager or the winutils process that is launched by the TT/NM. If its the TT/NM then do we end up having a long-running Hadoop service with elevated privileges? Winutils needs ability to create task as domain user Key: YARN-1063 URL: https://issues.apache.org/jira/browse/YARN-1063 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: trunk-win Environment: Windows Reporter: Kyle Leckie Labels: security Fix For: trunk-win Attachments: YARN-1063.patch h1. Summary: Securing a Hadoop cluster requires constructing some form of security boundary around the processes executed in YARN containers. Isolation based on Windows user isolation seems most feasible. This approach is similar to the approach taken by the existing LinuxContainerExecutor. The current patch to winutils.exe adds the ability to create a process as a domain user. h1. Alternative Methods considered: h2. Process rights limited by security token restriction: On Windows access decisions are made by examining the security token of a process. It is possible to spawn a process with a restricted security token. Any of the rights granted by SIDs of the default token may be restricted. It is possible to see this in action by examining the security tone of a sandboxed process launch be a web browser. Typically the launched process will have a fully restricted token and need to access machine resources through a dedicated broker process that enforces a custom security policy. This broker process mechanism would break compatibility with the typical Hadoop container process. The Container process must be able to utilize standard function calls for disk and network IO. I performed some work looking at ways to ACL the local files to the specific launched without granting rights to other processes launched on the same machine but found this to be an overly complex solution. h2. Relying on APP containers: Recent versions of windows have the ability to launch processes within an isolated container. Application containers are supported for execution of WinRT based executables. This method was ruled out due to the lack of official support for standard windows APIs. At some point in the future windows may support functionality similar to BSD jails or Linux containers, at that point support for containers should be added. h1. Create As User Feature Description: h2. Usage: A new sub command was added to the set of task commands. Here is the syntax: winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE] Some notes: * The username specified is in the format of user@domain * The machine executing this command must be joined to the domain of the user specified * The domain controller must allow the account executing the command access to the user information. For this join the account to the predefined group labeled Pre-Windows 2000 Compatible Access * The account running the command must have several rights on the local machine. 
These can be managed manually using secpol.msc: ** Act as part of the operating system - SE_TCB_NAME ** Replace a process-level token - SE_ASSIGNPRIMARYTOKEN_NAME ** Adjust memory quotas for a process - SE_INCREASE_QUOTA_NAME * The launched process will not have rights to the desktop so will not be able to display any information or create UI. * The launched process will have no network credentials. Any access of network resources that requires domain authentication will fail. h2. Implementation: Winutils performs the following steps: # Enable the required privileges for the current process. # Register as a trusted process with the Local Security Authority (LSA). # Create a new logon for the user passed on the command line. # Load/Create a profile on the local machine for the new logon. # Create a new environment for the new logon. # Launch the new process in a job with the task name specified and using the created logon. # Wait for the JOB to exit. h2. Future work: The following work was scoped out of this check-in: * Support for non-domain users or machines that are not domain joined. * Support for privilege isolation by running the task launcher in a high privilege service with access over an ACLed named pipe. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
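For readers unfamiliar with the syntax given in the usage section above, a concrete invocation might look like the following. The task name, account, and command line are made-up values, and the machine must already satisfy the domain-join and privilege requirements listed above:
{noformat}
# Hypothetical invocation following the documented syntax (task name, account and command are illustrative)
winutils task createAsUser task_001 jobuser@EXAMPLE.COM "cmd /c echo hello"
{noformat}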
[jira] [Created] (YARN-1147) Add end-to-end tests for HA
Karthik Kambatla created YARN-1147: -- Summary: Add end-to-end tests for HA Key: YARN-1147 URL: https://issues.apache.org/jira/browse/YARN-1147 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Fix For: 2.3.0 While individual sub-tasks add tests for the code they include, it will be handy to write end-to-end tests for HA including some stress testing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1145) Potential file handle leak in aggregated logs web ui
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758007#comment-13758007 ] Vinod Kumar Vavilapalli commented on YARN-1145: --- Reader.close() calls BCFile.Reader.close(), which isn't doing anything. Am I missing something? Potential file handle leak in aggregated logs web ui Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 0.23.9, 2.1.1-beta Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch Any problem in getting aggregated logs for rendering on web ui, then LogReader is not closed. Now, it reader is not closed which causing many connections in close_wait state. hadoopuser@hadoopuser: jps *27909* JobHistoryServer DataNode port is 50010. When greped with DataNode port, many connections are in CLOSE_WAIT from JHS. hadoopuser@hadoopuser: netstat -tanlp |grep 50010 tcp0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java tcp1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1140) Tracking URL is broken in a lots of corner cases, and can be the AM page or the application page depending on the situation
[ https://issues.apache.org/jira/browse/YARN-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758019#comment-13758019 ] Vinod Kumar Vavilapalli commented on YARN-1140: --- bq. I understand always linking to the per app page to try to make it more consistent but at the same time I don't like that power users will have to click one more time. Yeah, but it's a trade off for consistency. I've seen people struggling when failures happen and that is a bigger pain than clicking through one more link. bq. I also don't see how this solves the issue with the tracking url being a bad link, unless you are also proposing to handle that better on the app page? if an app finishes and doesn't set the history link (for instance a non-mapreduce app) or crashes before they can set it, the tracking url link is still going to to go a bad page. We should do a better job fixing such bad links, so yeah +1 for YARN-1106 and the likes. But even without that, this is still fine. Without my proposed change, users will hit bad links and then have no clue of what happened. With the change, they'll land up on the app-page, learn *something* about their apps, and then click the bad link. Net-net, they get more info than they do now. Tracking URL is broken in a lots of corner cases, and can be the AM page or the application page depending on the situation --- Key: YARN-1140 URL: https://issues.apache.org/jira/browse/YARN-1140 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Today, there are so many corner cases, specifically when the AM fails to start, when users will see that the tracking URL is broken or redirected to the per-app page. I am thinking of removing the tracking URL completely from the landing web-page and always force users to first jump on to the application-page. That way, there is consistency and there will always be one page that users can go to for their app information and then subsequently navigate to the AM page if all went well. Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758037#comment-13758037 ] Bikas Saha commented on YARN-540: - 1) or 2) are basically the same thing. 1) will block the unregister call until it succeeds. 2) requires the AM to keep looping on unregister until it succeeds. 2) just enables the RM to make the store operation asynchronously and prevent RPC threads from getting blocked. The core issue is that the RM can crash before removing the app from the store. Thus when it restarts it thinks that the app is still running and tries to re-launch it. This is the core issue in this jira and should be a rare event. The MR app master sleeps for 5s before unregistering with the RM and reports success meanwhile to the client. This exacerbates the above rare issue and makes it possible to repro it more often. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-957) Capacity Scheduler tries to reserve the memory more than what node manager reports.
[ https://issues.apache.org/jira/browse/YARN-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758044#comment-13758044 ] Omkar Vinit Joshi commented on YARN-957: Thanks vinod. addressed the comments. bq. Use Resource.newInstance instead of RecordFactory. Fixed bq. The Log message in LeafQueue should be at WARN level fixed bq. The test looks good, but let's not have hard-coded waits like the following in the test Yes changed it. Capacity Scheduler tries to reserve the memory more than what node manager reports. --- Key: YARN-957 URL: https://issues.apache.org/jira/browse/YARN-957 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: YARN-957-20130730.1.patch, YARN-957-20130730.2.patch, YARN-957-20130730.3.patch, YARN-957-20130731.1.patch, YARN-957-20130830.1.patch, YARN-957-20130904.1.patch I have 2 node managers. * one with 1024 MB memory.(nm1) * second with 2048 MB memory.(nm2) I am submitting simple map reduce application with 1 mapper and one reducer with 1024mb each. The steps to reproduce this are * stop nm2 with 2048MB memory.( This I am doing to make sure that this node's heartbeat doesn't reach RM first). * now submit application. As soon as it receives first node's (nm1) heartbeat it will try to reserve memory for AM-container (2048MB). However it has only 1024MB of memory. * now start nm2 with 2048 MB memory. It hangs forever... Ideally this has two potential issues. * It should not try to reserve memory on a node manager which is never going to give requested memory. i.e. Current max capability of node manager is 1024MB but 2048MB is reserved on it. But it still does that. * Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB available memory. In this case if the original request was made without any locality then scheduler should unreserve memory on nm1 and allocate requested 2048MB container on nm2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
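For context on the "Use Resource.newInstance instead of RecordFactory" review comment above, a minimal before/after sketch follows. The values are illustrative and the surrounding class exists only for this example:
{noformat}
// Minimal sketch of the change the review comment asks for (illustrative values only).
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.factories.RecordFactory;
import org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider;

public class ResourceNewInstanceSketch {
  public static void main(String[] args) {
    // old style: go through the RecordFactory and set fields one by one
    RecordFactory recordFactory = RecordFactoryProvider.getRecordFactory(null);
    Resource viaFactory = recordFactory.newRecordInstance(Resource.class);
    viaFactory.setMemory(2048);
    viaFactory.setVirtualCores(1);

    // preferred: the static factory, memory in MB plus vCores in one call
    Resource viaNewInstance = Resource.newInstance(2048, 1);

    System.out.println(viaFactory + " vs " + viaNewInstance);
  }
}
{noformat}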
[jira] [Updated] (YARN-957) Capacity Scheduler tries to reserve the memory more than what node manager reports.
[ https://issues.apache.org/jira/browse/YARN-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-957: --- Attachment: YARN-957-20130904.1.patch Capacity Scheduler tries to reserve the memory more than what node manager reports. --- Key: YARN-957 URL: https://issues.apache.org/jira/browse/YARN-957 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: YARN-957-20130730.1.patch, YARN-957-20130730.2.patch, YARN-957-20130730.3.patch, YARN-957-20130731.1.patch, YARN-957-20130830.1.patch, YARN-957-20130904.1.patch I have 2 node managers. * one with 1024 MB memory.(nm1) * second with 2048 MB memory.(nm2) I am submitting simple map reduce application with 1 mapper and one reducer with 1024mb each. The steps to reproduce this are * stop nm2 with 2048MB memory.( This I am doing to make sure that this node's heartbeat doesn't reach RM first). * now submit application. As soon as it receives first node's (nm1) heartbeat it will try to reserve memory for AM-container (2048MB). However it has only 1024MB of memory. * now start nm2 with 2048 MB memory. It hangs forever... Ideally this has two potential issues. * It should not try to reserve memory on a node manager which is never going to give requested memory. i.e. Current max capability of node manager is 1024MB but 2048MB is reserved on it. But it still does that. * Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB available memory. In this case if the original request was made without any locality then scheduler should unreserve memory on nm1 and allocate requested 2048MB container on nm2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1145) Potential file handle leak in aggregated logs web ui
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758047#comment-13758047 ] Jason Lowe commented on YARN-1145: -- I think the logic behind calling close on the TFile.Reader is consistency -- if an object has a close() method, probably prudent to call as it may not always do nothing in the future. The real fix with this patch is in AggregatedLogsBlock which will call close() on the LogReader which will in turn close the data stream and release the associated socket. Potential file handle leak in aggregated logs web ui Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 0.23.9, 2.1.1-beta Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch Any problem in getting aggregated logs for rendering on web ui, then LogReader is not closed. Now, it reader is not closed which causing many connections in close_wait state. hadoopuser@hadoopuser: jps *27909* JobHistoryServer DataNode port is 50010. When greped with DataNode port, many connections are in CLOSE_WAIT from JHS. hadoopuser@hadoopuser: netstat -tanlp |grep 50010 tcp0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java tcp1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
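The fix pattern Jason describes, closing the LogReader so the underlying stream and its socket are released even when rendering fails, amounts to the usual try/finally idiom. A simplified sketch, assuming a conf and remote log path are in scope; renderLogsFrom() stands in for the real AggregatedLogsBlock rendering code:
{noformat}
// Simplified sketch of the fix pattern described above, not the actual AggregatedLogsBlock code.
AggregatedLogFormat.LogReader reader = null;
try {
  reader = new AggregatedLogFormat.LogReader(conf, remoteAppLogFile);
  renderLogsFrom(reader);                  // any failure here used to leak the stream
} finally {
  if (reader != null) {
    reader.close();                        // releases the underlying data stream and socket
  }
}
{noformat}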
[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758096#comment-13758096 ] Jason Lowe commented on YARN-540: - Yes, I realize that 1) and 2) are at a high level accomplishing the same thing. However 2) requires cooperation from the AM which is user code and therefore harder to control while 1) does not. There is the issue of RPC threads getting blocked which may necessitate 2), but otherwise 1) would be preferable since it requires less cooperation/coordination with the AMs. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-957) Capacity Scheduler tries to reserve the memory more than what node manager reports.
[ https://issues.apache.org/jira/browse/YARN-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758115#comment-13758115 ] Hadoop QA commented on YARN-957: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601417/YARN-957-20130904.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1833//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1833//console This message is automatically generated. Capacity Scheduler tries to reserve the memory more than what node manager reports. --- Key: YARN-957 URL: https://issues.apache.org/jira/browse/YARN-957 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: YARN-957-20130730.1.patch, YARN-957-20130730.2.patch, YARN-957-20130730.3.patch, YARN-957-20130731.1.patch, YARN-957-20130830.1.patch, YARN-957-20130904.1.patch I have 2 node managers. * one with 1024 MB memory.(nm1) * second with 2048 MB memory.(nm2) I am submitting simple map reduce application with 1 mapper and one reducer with 1024mb each. The steps to reproduce this are * stop nm2 with 2048MB memory.( This I am doing to make sure that this node's heartbeat doesn't reach RM first). * now submit application. As soon as it receives first node's (nm1) heartbeat it will try to reserve memory for AM-container (2048MB). However it has only 1024MB of memory. * now start nm2 with 2048 MB memory. It hangs forever... Ideally this has two potential issues. * It should not try to reserve memory on a node manager which is never going to give requested memory. i.e. Current max capability of node manager is 1024MB but 2048MB is reserved on it. But it still does that. * Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB available memory. In this case if the original request was made without any locality then scheduler should unreserve memory on nm1 and allocate requested 2048MB container on nm2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-707) Add user info in the YARN ClientToken
[ https://issues.apache.org/jira/browse/YARN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758209#comment-13758209 ] Jason Lowe commented on YARN-707: - Thanks for the review, Daryn. bq. ClientToAMTokenIdentifier#getUser() doesn't do a null check on the client name (because it can't be null) but should perhaps still check isEmpty()? Will fix that. bq. Is ResourceManager#clientToAMSecretManager still needed now that it's in the context? Technically no, but all the other pieces of the context are also fields of ResourceManager so it's consistent with those. bq. Now that the client token is generated in RMAppAttemptImpl - should it contain the attemptId, not the appId? The original client tokens in 0.23 were per-app and not per-app-attempt, and I didn't want to change that association as part of this change. Add user info in the YARN ClientToken - Key: YARN-707 URL: https://issues.apache.org/jira/browse/YARN-707 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Jason Lowe Priority: Blocker Fix For: 3.0.0, 2.1.1-beta Attachments: YARN-707-20130822.txt, YARN-707-20130827.txt, YARN-707-20130828-2.txt, YARN-707-20130828.txt, YARN-707-20130829.txt, YARN-707-20130830.branch-0.23.txt If user info is present in the client token then it can be used to do limited authz in the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
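A rough sketch of the getUser() guard being discussed above. This is simplified and assumes the identifier stores the client name in a field called clientName; the exact structure of the real patch may differ:
{noformat}
// Simplified sketch of the isEmpty() guard discussed above (not the exact patch).
@Override
public UserGroupInformation getUser() {
  String client = this.clientName.toString();
  if (client.isEmpty()) {          // the field cannot be null, but it may be empty
    return null;
  }
  return UserGroupInformation.createRemoteUser(client);
}
{noformat}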
[jira] [Updated] (YARN-707) Add user info in the YARN ClientToken
[ https://issues.apache.org/jira/browse/YARN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-707: Attachment: YARN-707-20130904.branch-0.23.txt Updated patch for branch-0.23 to add isEmpty() check on client token username. Add user info in the YARN ClientToken - Key: YARN-707 URL: https://issues.apache.org/jira/browse/YARN-707 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Jason Lowe Priority: Blocker Fix For: 3.0.0, 2.1.1-beta Attachments: YARN-707-20130822.txt, YARN-707-20130827.txt, YARN-707-20130828-2.txt, YARN-707-20130828.txt, YARN-707-20130829.txt, YARN-707-20130830.branch-0.23.txt, YARN-707-20130904.branch-0.23.txt If user info is present in the client token then it can be used to do limited authz in the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1065) NM should provide AuxillaryService data to the container
[ https://issues.apache.org/jira/browse/YARN-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758231#comment-13758231 ] Hadoop QA commented on YARN-1065: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601436/YARN-1065.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1834//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1834//console This message is automatically generated. NM should provide AuxillaryService data to the container Key: YARN-1065 URL: https://issues.apache.org/jira/browse/YARN-1065 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1065.1.patch, YARN-1065.2.patch, YARN-1065.3.patch, YARN-1065.4.patch, YARN-1065.5.patch, YARN-1065.6.patch, YARN-1065.7.patch, YARN-1065.8.patch Start container returns auxillary service data to the AM but does not provide the same information to the task itself. It could add that information to the container env with key=service_name and value=service_data. This allows the container to start using the service without having to depend on the AM to send the info to it indirectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1001) YARN should provide per application-type and state statistics
[ https://issues.apache.org/jira/browse/YARN-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758182#comment-13758182 ] Srimanth Gunturi commented on YARN-1001: [~zjshen], we are expecting {{/ws/v1/cluster/appscount}} to provide all app-types/state-counts in 1 call. We are expecting {{/ws/v1/cluster/appscount?types=mapreduce}} to provide all mapreduce state-counts in 1 call. Apart from that, we need {{/ws/v1/cluster/appscount}} information pushed to Ganglia. Or else Ambari will not be able to show various graphs which are important. We currently populate {{/etc/hadoop/conf/hadoop-metrics2.properties}} file telling RM to push to Ganglia (resourcemanager.sink.ganglia.servers). YARN should provide per application-type and state statistics - Key: YARN-1001 URL: https://issues.apache.org/jira/browse/YARN-1001 Project: Hadoop YARN Issue Type: Task Components: api Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Attachments: YARN-1001.1.patch In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
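For reference, the kind of hadoop-metrics2.properties stanza Srimanth refers to looks roughly like the following. The host and port are placeholders, and the sink class shown is the Ganglia 3.1 sink shipped with the Hadoop metrics2 framework:
{noformat}
# Illustrative snippet of /etc/hadoop/conf/hadoop-metrics2.properties (placeholder host/port)
resourcemanager.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
resourcemanager.sink.ganglia.servers=ganglia-host.example.com:8649
resourcemanager.sink.ganglia.period=10
{noformat}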
[jira] [Commented] (YARN-1146) RM DTSM and RMStateStore mismanage sequence number
[ https://issues.apache.org/jira/browse/YARN-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758184#comment-13758184 ] Daryn Sharp commented on YARN-1146: --- [~vinodkv] I'm desynch'ing the ADTSM on HADOOP-9930. Is it OK for me to exacerbate this seq number handling? RM DTSM and RMStateStore mismanage sequence number -- Key: YARN-1146 URL: https://issues.apache.org/jira/browse/YARN-1146 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp {{RMDelegationTokenSecretManager}} implements {{storeNewToken}} and {{updateStoredToken}} (renew) to pass the token and its sequence number to {{RMStateStore#storeRMDelegationTokenAndSequenceNumber}}. There are two problems: # The assumption is that new tokens will be synchronously stored in-order. With an async secret manager this may not hold true and the state's sequence number may be incorrect. # A token renewal will reset the state's sequence number to _that token's_ sequence number. Bug #2 is generally masked. Creating a new token (with the first caveat) will bump the state's sequence number back up. Restoring the dtsm will first set the state's stored sequence number, then re-add all the tokens which will update the sequence number if the token's sequence number is greater than the dtsm's current sequence number. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1065) NM should provide AuxillaryService data to the container
[ https://issues.apache.org/jira/browse/YARN-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1065: Attachment: YARN-1065.8.patch create function getPrefixServiceName in AuxiliaryServiceHelper to eliminate duplicate code NM should provide AuxillaryService data to the container Key: YARN-1065 URL: https://issues.apache.org/jira/browse/YARN-1065 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1065.1.patch, YARN-1065.2.patch, YARN-1065.3.patch, YARN-1065.4.patch, YARN-1065.5.patch, YARN-1065.6.patch, YARN-1065.7.patch, YARN-1065.8.patch Start container returns auxillary service data to the AM but does not provide the same information to the task itself. It could add that information to the container env with key=service_name and value=service_data. This allows the container to start using the service without having to depend on the AM to send the info to it indirectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
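For context on the helper being refactored above: the feature passes auxiliary-service data to the container through its environment, and a launched container would read it back roughly as follows. The service name is illustrative, and this assumes the AuxiliaryServiceHelper.getServiceDataFromEnv(String, Map) signature introduced by this patch:
{noformat}
// Rough sketch of how a launched container could read auxiliary service data from its
// environment via AuxiliaryServiceHelper (the service name is illustrative).
import java.nio.ByteBuffer;
import org.apache.hadoop.yarn.util.AuxiliaryServiceHelper;

public class ReadAuxServiceData {
  public static void main(String[] args) {
    ByteBuffer shuffleMeta =
        AuxiliaryServiceHelper.getServiceDataFromEnv("mapreduce_shuffle", System.getenv());
    System.out.println(shuffleMeta == null ? "no data" : shuffleMeta.remaining() + " bytes");
  }
}
{noformat}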
[jira] [Updated] (YARN-957) Capacity Scheduler tries to reserve the memory more than what node manager reports.
[ https://issues.apache.org/jira/browse/YARN-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-957: - Attachment: YARN-957-20130904.2.patch Same patch with trailing white spaces removed. Will commit when Jenkins says okay. Capacity Scheduler tries to reserve the memory more than what node manager reports. --- Key: YARN-957 URL: https://issues.apache.org/jira/browse/YARN-957 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: YARN-957-20130730.1.patch, YARN-957-20130730.2.patch, YARN-957-20130730.3.patch, YARN-957-20130731.1.patch, YARN-957-20130830.1.patch, YARN-957-20130904.1.patch, YARN-957-20130904.2.patch I have 2 node managers. * one with 1024 MB memory.(nm1) * second with 2048 MB memory.(nm2) I am submitting simple map reduce application with 1 mapper and one reducer with 1024mb each. The steps to reproduce this are * stop nm2 with 2048MB memory.( This I am doing to make sure that this node's heartbeat doesn't reach RM first). * now submit application. As soon as it receives first node's (nm1) heartbeat it will try to reserve memory for AM-container (2048MB). However it has only 1024MB of memory. * now start nm2 with 2048 MB memory. It hangs forever... Ideally this has two potential issues. * It should not try to reserve memory on a node manager which is never going to give requested memory. i.e. Current max capability of node manager is 1024MB but 2048MB is reserved on it. But it still does that. * Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB available memory. In this case if the original request was made without any locality then scheduler should unreserve memory on nm1 and allocate requested 2048MB container on nm2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1119) Add ClusterMetrics checks to tho TestRMNodeTransitions tests
[ https://issues.apache.org/jira/browse/YARN-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1119: Attachment: YARN-1119-v1-b23.patch Patch posted for branch 0.23. Add ClusterMetrics checks to tho TestRMNodeTransitions tests Key: YARN-1119 URL: https://issues.apache.org/jira/browse/YARN-1119 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Affects Versions: 3.0.0, 0.23.9, 2.0.6-alpha Reporter: Robert Parker Attachments: YARN-1119-v1-b23.patch YARN-1101 identified an issue where UNHEALTHY nodes could double decrement the active nodes. We should add checks for RUNNING node transitions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
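The kind of check being added is along these lines: drive a RUNNING node through a transition and assert the ClusterMetrics counters moved exactly once. This is a hypothetical fragment for illustration, not the attached patch, and it assumes a RUNNING RMNodeImpl named node is already set up in the test:
{noformat}
// Hypothetical example of a ClusterMetrics assertion of the kind described above.
int activeBefore = ClusterMetrics.getMetrics().getNumActiveNMs();
node.handle(new RMNodeEvent(node.getNodeID(), RMNodeEventType.EXPIRE));   // RUNNING -> LOST
Assert.assertEquals(activeBefore - 1, ClusterMetrics.getMetrics().getNumActiveNMs());
Assert.assertEquals(1, ClusterMetrics.getMetrics().getNumLostNMs());
{noformat}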
[jira] [Commented] (YARN-957) Capacity Scheduler tries to reserve the memory more than what node manager reports.
[ https://issues.apache.org/jira/browse/YARN-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758268#comment-13758268 ] Hadoop QA commented on YARN-957: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601443/YARN-957-20130904.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1835//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1835//console This message is automatically generated. Capacity Scheduler tries to reserve the memory more than what node manager reports. --- Key: YARN-957 URL: https://issues.apache.org/jira/browse/YARN-957 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: YARN-957-20130730.1.patch, YARN-957-20130730.2.patch, YARN-957-20130730.3.patch, YARN-957-20130731.1.patch, YARN-957-20130830.1.patch, YARN-957-20130904.1.patch, YARN-957-20130904.2.patch I have 2 node managers. * one with 1024 MB memory.(nm1) * second with 2048 MB memory.(nm2) I am submitting simple map reduce application with 1 mapper and one reducer with 1024mb each. The steps to reproduce this are * stop nm2 with 2048MB memory.( This I am doing to make sure that this node's heartbeat doesn't reach RM first). * now submit application. As soon as it receives first node's (nm1) heartbeat it will try to reserve memory for AM-container (2048MB). However it has only 1024MB of memory. * now start nm2 with 2048 MB memory. It hangs forever... Ideally this has two potential issues. * It should not try to reserve memory on a node manager which is never going to give requested memory. i.e. Current max capability of node manager is 1024MB but 2048MB is reserved on it. But it still does that. * Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB available memory. In this case if the original request was made without any locality then scheduler should unreserve memory on nm1 and allocate requested 2048MB container on nm2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1068: --- Summary: Add admin support for HA operations (was: Implement YarnHAAdmin for HA specific admin operations) Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Implement YarnHAAdmin along the lines of DFSHAAdmin for HA-specific admin operations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1068: --- Target Version/s: 2.3.0 (was: 2.1.1-beta) Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla To transitionTo{Active,Standby} etc. we should support admin operations the same way DFS does. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
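For comparison, HDFS already exposes these operations through a CLI; something analogous for the RM might eventually look like the second line below. The YARN command name and RM identifiers are purely speculative at this point, only the HDFS line is an existing command:
{noformat}
# Existing HDFS analogue
hdfs haadmin -transitionToActive nn1

# Speculative YARN counterpart (command and RM ids are placeholders, not an existing CLI)
yarn rmadmin -transitionToActive rm1
{noformat}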
[jira] [Commented] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758303#comment-13758303 ] Omkar Vinit Joshi commented on YARN-1107: - Thanks vinod.. bq. In DelegationTokenRenewer, you are breaking the following assumption, which was put in via YARN-280 yeah fixed it. bq. Leave a comment in DelegationTokenRenewer.serviceStart() as to what we are really doing w.r.t pendingTokenForRenewal. Yes added one. bq. Not just in the test-code, can you move the token-short-circuit setting from ClientRMService into DelegationTokenRenewer? fixed. moved the code from ClientRMService to DelegationTokenRenewer. Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch If secure RM with recovery enabled is restarted while oozie jobs are running rm fails to come up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1148) NM should only return requested auxillary service data to the AM
Xuan Gong created YARN-1148: --- Summary: NM should only return requested auxillary service data to the AM Key: YARN-1148 URL: https://issues.apache.org/jira/browse/YARN-1148 Project: Hadoop YARN Issue Type: Task Reporter: Xuan Gong Right now, StartContainer returns all auxiliary service data to the AM. The AM can set a request through ContainerLaunchContext, and the NM should only return the requested auxiliary service data to the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-1148) NM should only return requested auxillary service data to the AM
[ https://issues.apache.org/jira/browse/YARN-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1148: --- Assignee: Xuan Gong NM should only return requested auxillary service data to the AM Key: YARN-1148 URL: https://issues.apache.org/jira/browse/YARN-1148 Project: Hadoop YARN Issue Type: Task Reporter: Xuan Gong Assignee: Xuan Gong Right now, Start container returns all auxillary service data to the AM. AM can set request through ContainerLauchContext, and NM should only return the request auxillary service data to AM -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1065) NM should provide AuxillaryService data to the container
[ https://issues.apache.org/jira/browse/YARN-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758348#comment-13758348 ] Hudson commented on YARN-1065: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4367 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4367/]) YARN-1065. NM should provide AuxillaryService data to the container (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1520135) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AuxiliaryServiceHelper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainersLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/DummyContainerManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java NM should provide AuxillaryService data to the container Key: YARN-1065 URL: https://issues.apache.org/jira/browse/YARN-1065 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1065.1.patch, YARN-1065.2.patch, YARN-1065.3.patch, YARN-1065.4.patch, YARN-1065.5.patch, YARN-1065.6.patch, YARN-1065.7.patch, YARN-1065.8.patch Start container returns auxillary service data to the AM but does not provide the same information to the task itself. It could add that information to the container env with key=service_name and value=service_data. This allows the container to start using the service without having to depend on the AM to send the info to it indirectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-707) Add user info in the YARN ClientToken
[ https://issues.apache.org/jira/browse/YARN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758375#comment-13758375 ] Daryn Sharp commented on YARN-707: -- +1 Looks good enough to me. Add user info in the YARN ClientToken - Key: YARN-707 URL: https://issues.apache.org/jira/browse/YARN-707 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Jason Lowe Priority: Blocker Fix For: 3.0.0, 2.1.1-beta Attachments: YARN-707-20130822.txt, YARN-707-20130827.txt, YARN-707-20130828-2.txt, YARN-707-20130828.txt, YARN-707-20130829.txt, YARN-707-20130830.branch-0.23.txt, YARN-707-20130904.branch-0.23.txt If user info is present in the client token then it can be used to do limited authz in the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1098) Separate out RM services into Always On and Active
[ https://issues.apache.org/jira/browse/YARN-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758308#comment-13758308 ] Bikas Saha commented on YARN-1098: -- Not a big fan of anonymous classes. We should probably create an ActiveServices that extends CompositeService. We can later add transitionToActive() and transitionToStandby() method to this object. Dispatcher can actually also go into ActiveServices for now. We can move it into the main service later on because it looks like that the HAProtocol service will be the only always on service to start with. Jenkins is not happy with the patch. Separate out RM services into Always On and Active -- Key: YARN-1098 URL: https://issues.apache.org/jira/browse/YARN-1098 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1098-1.patch, yarn-1098-approach.patch, yarn-1098-approach.patch From discussion on YARN-1027, it makes sense to separate out services that are stateful and stateless. The stateless services can run perennially irrespective of whether the RM is in Active/Standby state, while the stateful services need to be started on transitionToActive() and completely shutdown on transitionToStandby(). The external-facing stateless services should respond to the client/AM/NM requests depending on whether the RM is Active/Standby. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
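A minimal sketch of the structure Bikas suggests, assuming the eventual patch follows the CompositeService pattern; the names and contents below are illustrative only:
{noformat}
// Minimal sketch of the suggested structure (illustrative, not the eventual patch).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.CompositeService;

public class ResourceManager extends CompositeService {

  public ResourceManager() {
    super("ResourceManager");
  }

  /** Services that should only run while this RM is Active. */
  public static class ActiveServices extends CompositeService {
    ActiveServices() {
      super("RMActiveServices");
    }

    @Override
    protected void serviceInit(Configuration conf) throws Exception {
      // addService(dispatcher); addService(scheduler); ... stateful services are added here
      super.serviceInit(conf);
    }
  }

  // Later, transitionToActive() would start an ActiveServices instance and
  // transitionToStandby() would stop it, while always-on services keep running.
}
{noformat}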
[jira] [Updated] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1107: Attachment: YARN-1107.20130904.1.patch Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch If secure RM with recovery enabled is restarted while oozie jobs are running rm fails to come up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
Ramya Sunil created YARN-1149: - Summary: NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Fix For: 2.1.1-beta When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-890: -- Assignee: Xuan Gong The roundup for memory values on resource manager UI is misleading -- Key: YARN-890 URL: https://issues.apache.org/jira/browse/YARN-890 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Trupti Dhavle Assignee: Xuan Gong Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png From the yarn-site.xml, I see following values-
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
However the resourcemanager UI shows total memory as 5MB -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1107) Job submitted with Delegation token in secured environment causes RM to fail during RM restart
[ https://issues.apache.org/jira/browse/YARN-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758419#comment-13758419 ] Hadoop QA commented on YARN-1107: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601476/YARN-1107.20130904.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1836//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1836//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1836//console This message is automatically generated. Job submitted with Delegation token in secured environment causes RM to fail during RM restart -- Key: YARN-1107 URL: https://issues.apache.org/jira/browse/YARN-1107 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: rm.log, YARN-1107.20130828.1.patch, YARN-1107.20130829.1.patch, YARN-1107.20130904.1.patch If secure RM with recovery enabled is restarted while oozie jobs are running rm fails to come up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-707) Add user info in the YARN ClientToken
[ https://issues.apache.org/jira/browse/YARN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-707: Fix Version/s: 0.23.10 I committed this to branch-0.23. Add user info in the YARN ClientToken - Key: YARN-707 URL: https://issues.apache.org/jira/browse/YARN-707 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Jason Lowe Priority: Blocker Fix For: 3.0.0, 0.23.10, 2.1.1-beta Attachments: YARN-707-20130822.txt, YARN-707-20130827.txt, YARN-707-20130828-2.txt, YARN-707-20130828.txt, YARN-707-20130829.txt, YARN-707-20130830.branch-0.23.txt, YARN-707-20130904.branch-0.23.txt If user info is present in the client token then it can be used to do limited authz in the AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1068: --- Attachment: yarn-1068-prelim.patch I am uploading a preliminary patch that adds admin support for HA operations, for feedback on the approach. The patch is very much along the lines of the HDFS admin implementation and reuses the common code. Outline: # RMHAProtocolService starts an RPC server for HA commands. # yarn rmhaadmin command invokes RMHAdminCLI which extends HAAdmin. I haven't figured out how to use the ClientRMProxy while using HAAdmin yet. Would love to hear any thoughts/inputs on that. Pending tasks: (1) yarn-site, (2) RPC server instantiation through YARNRPC calls like in AdminService. Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1068-prelim.patch To transitionTo{Active,Standby} etc. we should support admin operations the same way DFS does. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
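For illustration, here is a rough sketch of what the RMHAdminCLI outlined above could look like. It assumes HAAdmin exposes resolveTarget() for subclasses (as the HDFS admin CLI does); the class body is a placeholder and none of it is taken from the attached patch.

{code:java}
import org.apache.hadoop.ha.HAAdmin;
import org.apache.hadoop.ha.HAServiceTarget;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: reuse HAAdmin's command parsing (-transitionToActive,
// -transitionToStandby, -getServiceState) and only teach it how to resolve
// an RM id. The resolveTarget body is a placeholder, not the real patch.
public class RMHAdminCLI extends HAAdmin {

  @Override
  protected HAServiceTarget resolveTarget(String rmId) {
    // Would map an RM id from yarn-site.xml to its HA service address;
    // intentionally left unimplemented in this sketch.
    throw new UnsupportedOperationException("sketch only: cannot resolve " + rmId);
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new YarnConfiguration(), new RMHAdminCLI(), args));
  }
}
{code}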
[jira] [Commented] (YARN-1134) Add support for zipping/unzipping logs while in transit for the NM logs web-service
[ https://issues.apache.org/jira/browse/YARN-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758494#comment-13758494 ] Chris Nauroth commented on YARN-1134: - Is the intent to serve actual compressed files (i.e. it has a .gz extension), or is the intent to layer compression over the HTTP transfer (i.e. the Transfer-Encoding: gzip HTTP header)? The original comment about how it will take a long time to download makes me think the latter is appropriate. Add support for zipping/unzipping logs while in transit for the NM logs web-service --- Key: YARN-1134 URL: https://issues.apache.org/jira/browse/YARN-1134 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli As [~zjshen] pointed out at [YARN-649|https://issues.apache.org/jira/browse/YARN-649?focusedCommentId=13698415page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13698415], {quote} For the long running applications, they may have a big log file, such that it will take a long time to download the log file via the RESTful API. Consequently, the HTTP connection may time out before downloading a complete log file. Maybe it is good to zip the log file before sending it, and unzip it after receiving it. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1134) Add support for zipping/unzipping logs while in transit for the NM logs web-service
[ https://issues.apache.org/jira/browse/YARN-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758497#comment-13758497 ] Chris Nauroth commented on YARN-1134: - Also, if the latter is appropriate, then you may want to test the existing code by sending an HTTP request with the header Accept-Encoding: gzip to see if the response comes back compressed. Many web servers support this out of the box, though I'm not sure about Jetty specifically. Add support for zipping/unzipping logs while in transit for the NM logs web-service --- Key: YARN-1134 URL: https://issues.apache.org/jira/browse/YARN-1134 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli As [~zjshen] pointed out at [YARN-649|https://issues.apache.org/jira/browse/YARN-649?focusedCommentId=13698415page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13698415], {quote} For the long running applications, they may have a big log file, such that it will take a long time to download the log file via the RESTful API. Consequently, the HTTP connection may time out before downloading a complete log file. Maybe it is good to zip the log file before sending it, and unzip it after receiving it. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
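One quick way to try what is suggested above, sketched with plain java.net; the NM host, port, and log path are placeholders, not taken from the JIRA.

{code:java}
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

public class GzipLogProbe {
  public static void main(String[] args) throws Exception {
    // Placeholder URL: point this at any NM logs web-service endpoint.
    URL url = new URL("http://nm-host:8042/node/containerlogs/container_x/user/syslog");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept-Encoding", "gzip");

    // If the server (or a filter in front of it) compressed the body, this is "gzip".
    String encoding = conn.getContentEncoding();
    System.out.println("Content-Encoding: " + encoding);

    InputStream body = "gzip".equals(encoding)
        ? new GZIPInputStream(conn.getInputStream())
        : conn.getInputStream();
    body.close();
  }
}
{code}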
[jira] [Commented] (YARN-1098) Separate out RM services into Always On and Active
[ https://issues.apache.org/jira/browse/YARN-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758499#comment-13758499 ] Karthik Kambatla commented on YARN-1098: Thanks Bikas. Will create an ActiveServices inner class and move everything there. TestRMRestart#testAppAttemptTokensRestoredOnRMRestart is flakey with the patch - will investigate further. Separate out RM services into Always On and Active -- Key: YARN-1098 URL: https://issues.apache.org/jira/browse/YARN-1098 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1098-1.patch, yarn-1098-approach.patch, yarn-1098-approach.patch From discussion on YARN-1027, it makes sense to separate out services that are stateful and stateless. The stateless services can run perennially irrespective of whether the RM is in Active/Standby state, while the stateful services need to be started on transitionToActive() and completely shutdown on transitionToStandby(). The external-facing stateless services should respond to the client/AM/NM requests depending on whether the RM is Active/Standby. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
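For reference, a minimal sketch of the separation being discussed, assuming the ActiveServices name from the comment above; the wiring is illustrative, not the attached patch.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.CompositeService;

// Sketch: the stateful services live in an inner composite that only runs while
// the RM is Active; the Always On services stay in the outer service.
public class HaAwareRmSketch extends CompositeService {

  class ActiveServices extends CompositeService {
    ActiveServices() {
      super("RMActiveServices");
      // scheduler, ApplicationMasterService, resource tracker, etc. would be
      // addService()'d here in the real ResourceManager
    }
  }

  private ActiveServices activeServices;

  public HaAwareRmSketch() {
    super("HaAwareRmSketch");
  }

  synchronized void transitionToActive() {
    activeServices = new ActiveServices();
    activeServices.init(new Configuration());
    activeServices.start();
  }

  synchronized void transitionToStandby() {
    if (activeServices != null) {
      activeServices.stop();
      activeServices = null;
    }
  }
}
{code}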
[jira] [Commented] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758585#comment-13758585 ] Xuan Gong commented on YARN-890: When we get totalMemory from ClusterMetricInfo, it has already been rounded up, so we use clusterResource from the ResourceScheduler to get the real cluster memory. The roundup for memory values on resource manager UI is misleading -- Key: YARN-890 URL: https://issues.apache.org/jira/browse/YARN-890 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Trupti Dhavle Assignee: Xuan Gong Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, YARN-890.1.patch From the yarn-site.xml, I see following values-
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
However the resourcemanager UI shows total memory as 5MB -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
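The rounded-up total the reporter sees (presumably 5 GB rather than 5 MB) is consistent with rounding 4192 MB up to the next multiple of the 1024 MB minimum allocation; the arithmetic below is an illustration of that assumption, not something taken from the patch.

{code:java}
public class RoundupExample {
  public static void main(String[] args) {
    int nodeMemoryMb = 4192;   // yarn.nodemanager.resource.memory-mb
    int minAllocMb = 1024;     // yarn.scheduler.minimum-allocation-mb

    // Round the node capacity up to the next multiple of the minimum allocation.
    int roundedMb = ((nodeMemoryMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
    System.out.println(roundedMb + " MB, i.e. " + (roundedMb / 1024) + " GB"); // 5120 MB, 5 GB
  }
}
{code}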
[jira] [Updated] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-890: --- Attachment: YARN-890.1.patch The roundup for memory values on resource manager UI is misleading -- Key: YARN-890 URL: https://issues.apache.org/jira/browse/YARN-890 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Trupti Dhavle Assignee: Xuan Gong Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, YARN-890.1.patch From the yarn-site.xml, I see following values-
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
However the resourcemanager UI shows total memory as 5MB -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1149: --- Assignee: Xuan Gong NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.1-beta When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 
2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758611#comment-13758611 ] Jian He commented on YARN-540: -- Finally came to the conclusion that we should call removeApplicationState immediately after the attempt unregisters. Combined with MAPREDUCE-5497, this can significantly reduce the race here. Once work-preserving restart is implemented, this jira should not be a problem, as there's no notion of relaunching a new AM in work-preserving restart; the old AM will just spin and resync with the RM after the RM restarts. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-540: - Attachment: YARN-540.3.patch upload a new patch that - removeApplicationState in RMAppAttempt.AMUnregisteredTransistion and RMApp.FinalTransition - rename RMAppEventType.ATTEMPT_FINISHING to ATTEMPT_UNREGISTERED Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1070) ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
[ https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1070: -- Attachment: YARN-1070.3.patch Thanks Vinod for your review. I've updated the patch accordingly. The important change in this patch is that I removed the logic of canceling ContainerLaunch.call(); in call(), I check the container state first, return immediately if the container is not at LOCALIZED, and send CONTAINER_KILLED_ON_REQUEST if necessary. The rationale for checking the container state is that the thread of ContainerLaunch.call() is scheduled and should be executed after the container enters LOCALIZED. As this thread can run in parallel with the thread of ContainerImpl, the container is free to move on to some other state, which can be RUNNING, EXIT_WITH_FAILURE or KILLING. The first two should be triggered by the event sent from ContainerLaunch.call(), while KILLING is caused by a kill event. Therefore, when ContainerLaunch.call() is started, we check the container state. If it is KILLING, ContainerLaunch.call() can stop immediately, which is equivalent to the cancel operation that was removed from ContainersLauncher. Actually, it should be even better, because Future.cancel will not terminate call() immediately. On the other hand, if at this point the container state is still LOCALIZED, call() will move on. Then, if the container state changes to KILLING midway, we just ignore it and let call() finish as usual. It does no harm because when the container reaches KILLING, CLEANUP_CONTAINER is scheduled or has started. ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL - Key: YARN-1070 URL: https://issues.apache.org/jira/browse/YARN-1070 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: YARN-1070.1.patch, YARN-1070.2.patch, YARN-1070.3.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
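The remark about Future.cancel can be seen with plain java.util.concurrent, independent of the NM classes; the snippet below is a generic illustration, not the NM code.

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CancelSemantics {
  public static void main(String[] args) throws Exception {
    ExecutorService launcher = Executors.newSingleThreadExecutor();

    Future<Integer> launch = launcher.submit(new Callable<Integer>() {
      @Override
      public Integer call() {
        // Stand-in for ContainerLaunch.call(): once the body is running,
        // Future.cancel(true) only sets the interrupt flag; unless the code
        // checks it, the task keeps going, which is why checking the container
        // state inside call() is the more reliable guard.
        long busyUntil = System.currentTimeMillis() + 500;
        while (System.currentTimeMillis() < busyUntil) {
          // busy work that never checks Thread.interrupted()
        }
        return 0;
      }
    });

    Thread.sleep(100);                       // let call() start
    boolean cancelled = launch.cancel(true); // returns true, yet the task runs on
    System.out.println("cancel() returned " + cancelled
        + "; the task body kept running to completion");
    launcher.shutdown();
  }
}
{code}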
[jira] [Updated] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1149: Attachment: YARN-1149.1.patch Add a new AppShutDownTransition to handle ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1149.1.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 
2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1070) ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
[ https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758637#comment-13758637 ] Hadoop QA commented on YARN-1070: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601531/YARN-1070.3.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1840//console This message is automatically generated. ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL - Key: YARN-1070 URL: https://issues.apache.org/jira/browse/YARN-1070 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: YARN-1070.1.patch, YARN-1070.2.patch, YARN-1070.3.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-957) Capacity Scheduler tries to reserve the memory more than what node manager reports.
[ https://issues.apache.org/jira/browse/YARN-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758642#comment-13758642 ] Hudson commented on YARN-957: - SUCCESS: Integrated in Hadoop-trunk-Commit #4369 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4369/]) YARN-957. Fixed a bug in CapacityScheduler because of which requests that need more than a node's total capability were incorrectly allocated on that node causing apps to hang. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1520187) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java Capacity Scheduler tries to reserve the memory more than what node manager reports. --- Key: YARN-957 URL: https://issues.apache.org/jira/browse/YARN-957 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Fix For: 2.1.1-beta Attachments: YARN-957-20130730.1.patch, YARN-957-20130730.2.patch, YARN-957-20130730.3.patch, YARN-957-20130731.1.patch, YARN-957-20130830.1.patch, YARN-957-20130904.1.patch, YARN-957-20130904.2.patch I have 2 node managers. * one with 1024 MB memory.(nm1) * second with 2048 MB memory.(nm2) I am submitting simple map reduce application with 1 mapper and one reducer with 1024mb each. The steps to reproduce this are * stop nm2 with 2048MB memory.( This I am doing to make sure that this node's heartbeat doesn't reach RM first). * now submit application. As soon as it receives first node's (nm1) heartbeat it will try to reserve memory for AM-container (2048MB). However it has only 1024MB of memory. * now start nm2 with 2048 MB memory. It hangs forever... Ideally this has two potential issues. * It should not try to reserve memory on a node manager which is never going to give requested memory. i.e. Current max capability of node manager is 1024MB but 2048MB is reserved on it. But it still does that. * Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB available memory. In this case if the original request was made without any locality then scheduler should unreserve memory on nm1 and allocate requested 2048MB container on nm2. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758651#comment-13758651 ] Hadoop QA commented on YARN-540: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12601527/YARN-540.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1838//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1838//console This message is automatically generated. Race condition causing RM to potentially relaunch already unregistered AMs on RM restart Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.patch, YARN-540.patch When job succeeds and successfully call finishApplicationMaster, RM shutdown and restart-dispatcher is stopped before it can process REMOVE_APP event. The next time RM comes back, it will reload the existing state files even though the job is succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1070) ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
[ https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758666#comment-13758666 ] Vinod Kumar Vavilapalli commented on YARN-1070: --- The argument is reasonable. bq. On the other side, if at this point the container state is still LOCALIZED, call() will move on. Then, if the container state changes to KILLING in the midway, we just ignore it let call() finish as usual. It does no harm because when the container reaches KILLING, CLEANUP_CONTAINER is scheduled or is started. We do have one more check just before we launch the process. We should do the same stack-check there too. Also, as part of ContainerLaunch.cleanupContainer(), we should try to cancel the Callable. Taking a step back, this approach will work, though the code is hard to read for me. A very simple state machine should make this code a lot cleaner. ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL - Key: YARN-1070 URL: https://issues.apache.org/jira/browse/YARN-1070 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: YARN-1070.1.patch, YARN-1070.2.patch, YARN-1070.3.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13758734#comment-13758734 ] Vinod Kumar Vavilapalli commented on YARN-292: -- Thanks for the logs Nemon. Looked at the logs. We were so focused on removals that we forgot the puts. And as the logs clearly pointed out, another app was getting added at (almost) the same point of time as the get, and since this is a TreeMap (or even HashMap), there are structural changes even with a put :) The patch isn't applying anymore, can you please update? Also, can you try to write a simple test, with one thread putting lots of apps and the other trying to allocate the AM? Not a very useful test, but it can give us a little confidence. ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt Key: YARN-292 URL: https://issues.apache.org/jira/browse/YARN-292 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Devaraj K Assignee: Zhijie Shen Attachments: ArrayIndexOutOfBoundsException.log, YARN-292.1.patch, YARN-292.2.patch, YARN-292.3.patch {code:xml} 2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_01 2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525 java.lang.ArrayIndexOutOfBoundsException: 0 at java.util.Arrays$ArrayList.get(Arrays.java:3381) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
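A stripped-down shape of the test being asked for, with a plain TreeMap standing in for the scheduler's application map; everything below is illustrative and not RM code.

{code:java}
import java.util.Map;
import java.util.TreeMap;

public class TreeMapRaceSketch {
  public static void main(String[] args) throws Exception {
    final Map<Integer, String> apps = new TreeMap<Integer, String>();

    // Stand-in for app submissions: every put can rebalance the tree.
    Thread putter = new Thread(new Runnable() {
      @Override
      public void run() {
        for (int i = 0; i < 1000000; i++) {
          apps.put(i, "app-" + i);
        }
      }
    });

    // Stand-in for the allocation path: concurrent reads of the same map.
    Thread getter = new Thread(new Runnable() {
      @Override
      public void run() {
        for (int i = 0; i < 1000000; i++) {
          apps.get(i);
        }
      }
    });

    putter.start();
    getter.start();
    putter.join();
    getter.join();
    // Without external synchronization the reader may observe a half-rebalanced
    // tree; synchronizing both sides (or using a concurrent map) removes the race.
  }
}
{code}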
[jira] [Updated] (YARN-1145) Potential file handle leak in aggregated logs web ui
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-1145: Attachment: YARN-1145.patch Thank you Vinod Kumar Vavilapalli and Jason Lowe for reviewing the patch :-) I have addressed Vinod's comments and attached an updated patch. Please review the updated patch. Potential file handle leak in aggregated logs web ui Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 0.23.9, 2.1.1-beta Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch, YARN-1145.patch If there is any problem in getting the aggregated logs for rendering on the web UI, the LogReader is not closed. Because the reader is not closed, many connections are left in CLOSE_WAIT state.
hadoopuser@hadoopuser: jps
*27909* JobHistoryServer
The DataNode port is 50010. When grepping for the DataNode port, many connections are in CLOSE_WAIT from the JHS.
hadoopuser@hadoopuser: netstat -tanlp |grep 50010
tcp 0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java
tcp 1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
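The shape of the fix is the usual close-on-all-paths pattern; below is a minimal sketch, with the reader construction and the rendering step assumed rather than taken from the attached patch.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat;

public class LogReaderCloseSketch {
  // Sketch: whatever happens while rendering, close the reader so the DataNode
  // connection is not left behind in CLOSE_WAIT.
  static void renderAggregatedLog(Configuration conf, Path remoteLogFile) throws Exception {
    AggregatedLogFormat.LogReader reader = null;
    try {
      reader = new AggregatedLogFormat.LogReader(conf, remoteLogFile);
      // ... read the containers' logs and write them to the web UI response ...
    } finally {
      if (reader != null) {
        reader.close();
      }
    }
  }
}
{code}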
[jira] [Updated] (YARN-1145) Potential file handle leak in aggregated logs web ui
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-1145: Attachment: YARN-1145.1.patch Handled cleanup during reader creation. The previous patch missed this cleanup. Potential file handle leak in aggregated logs web ui Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 0.23.9, 2.1.1-beta Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch, YARN-1145.1.patch, YARN-1145.patch If there is any problem in getting the aggregated logs for rendering on the web UI, the LogReader is not closed. Because the reader is not closed, many connections are left in CLOSE_WAIT state.
hadoopuser@hadoopuser: jps
*27909* JobHistoryServer
The DataNode port is 50010. When grepping for the DataNode port, many connections are in CLOSE_WAIT from the JHS.
hadoopuser@hadoopuser: netstat -tanlp |grep 50010
tcp 0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java
tcp 1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1145) Potential file handle leak in aggregated logs web ui
[ https://issues.apache.org/jira/browse/YARN-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-1145: Attachment: YARN-1145.2.patch Please ignore YARN-1145.1.patch. All the comments have been fixed in YARN-1145.2.patch. Please consider this patch for review. Potential file handle leak in aggregated logs web ui Key: YARN-1145 URL: https://issues.apache.org/jira/browse/YARN-1145 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha, 0.23.9, 2.1.1-beta Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: MAPREDUCE-5486.patch, YARN-1145.1.patch, YARN-1145.2.patch, YARN-1145.patch If there is any problem in getting the aggregated logs for rendering on the web UI, the LogReader is not closed. Because the reader is not closed, many connections are left in CLOSE_WAIT state.
hadoopuser@hadoopuser: jps
*27909* JobHistoryServer
The DataNode port is 50010. When grepping for the DataNode port, many connections are in CLOSE_WAIT from the JHS.
hadoopuser@hadoopuser: netstat -tanlp |grep 50010
tcp 0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java
tcp 1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira