[jira] [Commented] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart

2013-03-14 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602080#comment-13602080
 ] 

Vinod Kumar Vavilapalli commented on YARN-71:
-

Some more comments:
 - In case of errors, can you say "need to be manually deleted" instead of 
just "need to be deleted"?
 - Please add tests (a rough sketch of the shape follows below) to
-- verify fileCache and NM_PRIVATE_DIR deletion
-- verify deleteHistoricalLocalDirs by rebooting the NM multiple times while a 
previous deletion is still in progress
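
For illustration only, a fragment-style sketch of the kind of restart-cleanup test being requested; startNodeManager(), stopNodeManager(), waitForDeletionToComplete() and testRootDir are hypothetical test helpers, not names from the patch:

{code}
// Hedged sketch: the helpers below stand in for whatever the patch uses to
// restart the NM and drain its deletion work in tests.
@Test(timeout = 60000)
public void testLocalDirsCleanedOnRestart() throws Exception {
  File localDir = new File(testRootDir, "nm-local-dir");
  File fileCache = new File(localDir, "filecache");
  File nmPrivate = new File(localDir, "nmPrivate");
  assertTrue(fileCache.mkdirs() && nmPrivate.mkdirs());
  // Leave stale content behind, as a previous NM run would.
  assertTrue(new File(fileCache, "stale-resource").createNewFile());

  startNodeManager(localDir);   // restart should schedule deletion of the old dirs
  waitForDeletionToComplete();  // poll until the deletion work is done

  assertFalse("filecache should be deleted on restart", fileCache.exists());
  assertFalse("nmPrivate should be deleted on restart", nmPrivate.exists());
  stopNodeManager();
}
{code}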

 Ensure/confirm that the NodeManager cleans up local-dirs on restart
 ---

 Key: YARN-71
 URL: https://issues.apache.org/jira/browse/YARN-71
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, 
 YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch


 We have to make sure that NodeManagers clean up their local files on restart.
 It may already work that way, in which case we should have tests 
 validating this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-141) NodeManager shuts down if it can't find the ResourceManager

2013-03-14 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-141.
--

Resolution: Duplicate

Duplicated by YARN-196.

 NodeManager shuts down if it can't find the ResourceManager
 ---

 Key: YARN-141
 URL: https://issues.apache.org/jira/browse/YARN-141
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ahmed Radwan
Assignee: Ahmed Radwan

 When starting YARN services, if the NodeManager is started but the 
 ResourceManager is not, the NodeManager tries 10 times (the default setting) 
 and then shuts down.
 I understand that this default setting can be changed to wait for a 
 longer period, but I think it is better to keep the NodeManager trying 
 without shutting down. This can accommodate cases where the ResourceManager is 
 late to start for any reason, and it preserves the same behavior as 
 the DataNode, which keeps trying without shutting down when it cannot find 
 the NameNode at startup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-14 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602139#comment-13602139
 ] 

Zhijie Shen commented on YARN-378:
--

{quote}
Env vars are brittle..
{quote}

Does the env method work for other applications? The merit I can see in 
embedding maxAppAttempts into the AM registration response is that the number 
can be read by the AMs of other applications in the same way. I think we can 
continue the discussion of informing the AM of maxAppAttempts in MAPREDUCE-5062.

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, 
 YARN-378_7.patch


 We should support different ApplicationMaster retry counts for different 
 clients or users. That is to say, yarn.resourcemanager.am.max-retries should 
 be settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602154#comment-13602154
 ] 

Hadoop QA commented on YARN-378:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12573698/YARN-378_7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/512//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/512//console

This message is automatically generated.

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, 
 YARN-378_7.patch


 We should support different ApplicationMaster retry counts for different 
 clients or users. That is to say, yarn.resourcemanager.am.max-retries should 
 be settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-478) fix coverage org.apache.hadoop.yarn.webapp.log

2013-03-14 Thread Aleksey Gorshkov (JIRA)
Aleksey Gorshkov created YARN-478:
-

 Summary: fix coverage org.apache.hadoop.yarn.webapp.log
 Key: YARN-478
 URL: https://issues.apache.org/jira/browse/YARN-478
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Aleksey Gorshkov
 Attachments: YARN-478-trunk.patch

fix coverage org.apache.hadoop.yarn.webapp.log

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-468) coverage fix for org.apache.hadoop.yarn.webapp.log

2013-03-14 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-468:
--

Summary: coverage fix for org.apache.hadoop.yarn.webapp.log  (was: coverage 
fix for org.apache.hadoop.yarn.server.webproxy.amfilter )

 coverage fix for org.apache.hadoop.yarn.webapp.log
 --

 Key: YARN-468
 URL: https://issues.apache.org/jira/browse/YARN-468
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Aleksey Gorshkov
 Attachments: YARN-468-trunk.patch


 coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter 
 patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-468) coverage fix for org.apache.hadoop.yarn.webapp.log

2013-03-14 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-468:
--

Description: 
coverage fix for org.apache.hadoop.yarn.webapp.log

patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23

  was:
coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter 

patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23


 coverage fix for org.apache.hadoop.yarn.webapp.log
 --

 Key: YARN-468
 URL: https://issues.apache.org/jira/browse/YARN-468
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Aleksey Gorshkov
 Attachments: YARN-468-trunk.patch


 coverage fix for org.apache.hadoop.yarn.webapp.log
 patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-468) coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter

2013-03-14 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-468:
--

Summary: coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter   
(was: coverage fix for org.apache.hadoop.yarn.webapp.log)

 coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter 
 -

 Key: YARN-468
 URL: https://issues.apache.org/jira/browse/YARN-468
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Aleksey Gorshkov
 Attachments: YARN-468-trunk.patch


 coverage fix for org.apache.hadoop.yarn.webapp.log
 patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-468) coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter

2013-03-14 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-468:
--

Description: 
coverage fix org.apache.hadoop.yarn.server.webproxy.amfilter

patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23

  was:
coverage fix for org.apache.hadoop.yarn.webapp.log

patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23


 coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter 
 -

 Key: YARN-468
 URL: https://issues.apache.org/jira/browse/YARN-468
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Aleksey Gorshkov
 Attachments: YARN-468-trunk.patch


 coverage fix org.apache.hadoop.yarn.server.webproxy.amfilter
 patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-468) coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter

2013-03-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602307#comment-13602307
 ] 

Hadoop QA commented on YARN-468:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12573308/YARN-468-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/513//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/513//console

This message is automatically generated.

 coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter 
 -

 Key: YARN-468
 URL: https://issues.apache.org/jira/browse/YARN-468
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Aleksey Gorshkov
 Attachments: YARN-468-trunk.patch


 coverage fix org.apache.hadoop.yarn.server.webproxy.amfilter
 patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-478) fix coverage org.apache.hadoop.yarn.webapp.log

2013-03-14 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-478:
--

Attachment: (was: YARN-478-trunk.patch)

 fix coverage org.apache.hadoop.yarn.webapp.log
 --

 Key: YARN-478
 URL: https://issues.apache.org/jira/browse/YARN-478
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Aleksey Gorshkov
 Attachments: YARN-478-trunk.patch


 fix coverage org.apache.hadoop.yarn.webapp.log
 one patch for trunk, branch-2, branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-478) fix coverage org.apache.hadoop.yarn.webapp.log

2013-03-14 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-478:
--

Attachment: YARN-478-trunk.patch

 fix coverage org.apache.hadoop.yarn.webapp.log
 --

 Key: YARN-478
 URL: https://issues.apache.org/jira/browse/YARN-478
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Aleksey Gorshkov
 Attachments: YARN-478-trunk.patch


 fix coverage org.apache.hadoop.yarn.webapp.log
 one patch for trunk, branch-2, branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-478) fix coverage org.apache.hadoop.yarn.webapp.log

2013-03-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602319#comment-13602319
 ] 

Hadoop QA commented on YARN-478:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12573718/YARN-478-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/514//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/514//console

This message is automatically generated.

 fix coverage org.apache.hadoop.yarn.webapp.log
 --

 Key: YARN-478
 URL: https://issues.apache.org/jira/browse/YARN-478
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Aleksey Gorshkov
 Attachments: YARN-478-trunk.patch


 fix coverage org.apache.hadoop.yarn.webapp.log
 one patch for trunk, branch-2, branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-468) coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter

2013-03-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602326#comment-13602326
 ] 

Hudson commented on YARN-468:
-

Integrated in Hadoop-trunk-Commit #3469 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3469/])
YARN-468. coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter 
(Aleksey Gorshkov via bobby) (Revision 1456458)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1456458
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/amfilter/TestAmFilter.java


 coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter 
 -

 Key: YARN-468
 URL: https://issues.apache.org/jira/browse/YARN-468
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Aleksey Gorshkov
 Fix For: 3.0.0, 0.23.7, 2.0.5-beta

 Attachments: YARN-468-trunk.patch


 coverage fix org.apache.hadoop.yarn.server.webproxy.amfilter
 patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-14 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602333#comment-13602333
 ] 

Robert Joseph Evans commented on YARN-378:
--

Using environment variables works for other applications too.  That is the 
only way to get some pieces of critical information that are needed for 
registration with the RM.

On Windows there are limits 
(http://msdn.microsoft.com/en-us/library/windows/desktop/ms682653%28v=vs.85%29.aspx), 
but they should not cause too much of an issue on Windows Server 2008 and above.

I would prefer for us to only return the information to the AM one way, either 
through thrift or through the environment variable, just so there is less 
confusion, but I am not adamant about it.



 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, 
 YARN-378_7.patch


 We should support different ApplicationMaster retry counts for different 
 clients or users. That is to say, yarn.resourcemanager.am.max-retries should 
 be settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-14 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602341#comment-13602341
 ] 

Robert Joseph Evans commented on YARN-378:
--

Looking at the code, I too am fine with renaming retries to attempts.  But we 
need to mark this JIRA as an incompatible change or put in a deprecated config 
mapping.  We are early enough in YARN that deprecating it seems like a waste.

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, 
 YARN-378_7.patch


 We should support different ApplicationMaster retry counts for different 
 clients or users. That is to say, yarn.resourcemanager.am.max-retries should 
 be settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-237) Refreshing the RM page forgets how many rows I had in my Datatables

2013-03-14 Thread jian he (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602435#comment-13602435
 ] 

jian he commented on YARN-237:
--

Thanks, Robert !

 Refreshing the RM page forgets how many rows I had in my Datatables
 ---

 Key: YARN-237
 URL: https://issues.apache.org/jira/browse/YARN-237
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.0.2-alpha, 0.23.4, 3.0.0
Reporter: Ravi Prakash
Assignee: jian he
  Labels: usability
 Fix For: 3.0.0, 2.0.5-beta

 Attachments: YARN-237.patch, YARN-237.v2.patch, YARN-237.v3.patch, 
 YARN-237.v4.patch


 If I choose 100 rows and then refresh the page, DataTables goes back to 
 showing me 20 rows.
 This user preference should be stored in a cookie.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-237) Refreshing the RM page forgets how many rows I had in my Datatables

2013-03-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602437#comment-13602437
 ] 

Hudson commented on YARN-237:
-

Integrated in Hadoop-trunk-Commit #3471 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3471/])
YARN-237. Refreshing the RM page forgets how many rows I had in my 
Datatables (jian he via bobby) (Revision 1456536)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1456536
Files : 
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/CountersBlock.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/webapp/HsTasksBlock.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/webapp/HsTasksPage.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java


 Refreshing the RM page forgets how many rows I had in my Datatables
 ---

 Key: YARN-237
 URL: https://issues.apache.org/jira/browse/YARN-237
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.0.2-alpha, 0.23.4, 3.0.0
Reporter: Ravi Prakash
Assignee: jian he
  Labels: usability
 Fix For: 3.0.0, 2.0.5-beta

 Attachments: YARN-237.patch, YARN-237.v2.patch, YARN-237.v3.patch, 
 YARN-237.v4.patch


 If I choose 100 rows and then refresh the page, DataTables goes back to 
 showing me 20 rows.
 This user preference should be stored in a cookie.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-14 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602481#comment-13602481
 ] 

Bikas Saha commented on YARN-378:
-

Env vars are brittle from an API point of view. Windows supports such use cases 
fine. The point is that, for application developers, the information should come 
from the API, and not from a combination of the API and the env. The env requires 
an agent on the other side to set it, apart from the info coming from the API 
itself. Here it works because the agent on the other side happens to be the NM, 
which is in our control.
To summarize, let's agree to keep this in the API as it exists in the patch. For 
the MR AM's sake, we could additionally add the information in the env as well.

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, 
 YARN-378_7.patch


 We should support different ApplicationMaster retry counts for different 
 clients or users. That is to say, yarn.resourcemanager.am.max-retries should 
 be settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-14 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602516#comment-13602516
 ] 

Vinod Kumar Vavilapalli commented on YARN-378:
--

Bikas, as of today, env is also part of the API; see the env vars in the public 
class ApplicationConstants.

The correct way to avoid env vars, if at all, is to pass in another named 
file/resource before container launch, so that AMs/containers can load it for 
initial settings. We need that anyway, so let's continue to put it in the env for 
now (and not introduce multiple ways of access), and fix it (if need be) 
separately.
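
(For illustration, a minimal sketch of how an AM-side helper could read such a value from the launch environment with a config fallback; the env name "MAX_APP_ATTEMPTS", the config key string, and the default of 2 are assumptions here, not taken from the patch:)

{code}
// Sketch only: "MAX_APP_ATTEMPTS" is an assumed env name exported at container
// launch; fall back to the (assumed) renamed config key if it is absent.
public final class MaxAttemptsLookup {
  public static int maxAppAttempts(org.apache.hadoop.conf.Configuration conf) {
    String fromEnv = System.getenv("MAX_APP_ATTEMPTS");
    if (fromEnv != null && !fromEnv.isEmpty()) {
      return Integer.parseInt(fromEnv.trim());
    }
    return conf.getInt("yarn.resourcemanager.am.max-attempts", 2);
  }
}
{code}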

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, 
 YARN-378_7.patch


 We should support different ApplicationMaster retry counts for different 
 clients or users. That is to say, yarn.resourcemanager.am.max-retries should 
 be settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-14 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602542#comment-13602542
 ] 

Vinod Kumar Vavilapalli commented on YARN-378:
--

bq. But we need to mark this JIRA as an incompatible change or put in a 
deprecated config mapping. We are early enough in YARN that deprecating it 
seems like a waste.
+1. Unfortunately the YARN JIRA setup is messed up, so I cannot set the incompatible 
field for now; I will file an INFRA ticket. I will put this in the INCOMPATIBLE 
section of CHANGES.txt.

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, 
 YARN-378_7.patch


 We should support different ApplicationMaster retry counts for different 
 clients or users. That is to say, yarn.resourcemanager.am.max-retries should 
 be settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-226) Log aggregation should not assume an AppMaster will have containerId 1

2013-03-14 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602614#comment-13602614
 ] 

Eli Reisman commented on YARN-226:
--

Hmm, that stirs up some trouble. Giraph tasks may or may not need to have
contiguous IDs, but we will need a task 0 for at least one of the reserved
containers (so container 2 right now is the one) in order to bootstrap our
master election process. I am translating container IDs into Giraph
task IDs right now by just subtracting two from the container ID! It works in
all my tests, but the reservation thing could kick in on big asks (a
1000-container ask, etc.)? Is that what you're saying? How big can the ask be?

Perhaps I can move this bootstrap stuff from Giraph into my app master if
this is a big problem. Good to know, thanks!
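
(A tiny sketch of the mapping described above and of where it breaks: the subtract-two offset is from the comment, while the guard and method name are illustration only:)

{code}
// Sketch of the current Giraph-side mapping: container 2 -> task 0,
// container 3 -> task 1, and so on. With reservations the AM (and therefore
// the first worker) may get a much larger container id, so the fixed offset
// of 2 is not safe in general.
static int giraphTaskId(long workerContainerIdNum, long amContainerIdNum) {
  if (amContainerIdNum != 1) {
    throw new IllegalStateException(
        "AM is not container 1; the fixed-offset mapping would be wrong");
  }
  return (int) (workerContainerIdNum - 2);
}
{code}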






 Log aggregation should not assume an AppMaster will have containerId 1
 --

 Key: YARN-226
 URL: https://issues.apache.org/jira/browse/YARN-226
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth

 In case of reservations, etc., AppMasters may not get container id 1. We 
 likely need additional info in the CLC / tokens indicating whether a 
 container is an AM or not.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-226) Log aggregation should not assume an AppMaster will have containerId 1

2013-03-14 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602661#comment-13602661
 ] 

Robert Joseph Evans commented on YARN-226:
--

Big means the amount of memory/CPU relative to the minimum allocation size.  For 
example, you ask for a 4 GB container with a min allocation size of 500 MB.

 Log aggregation should not assume an AppMaster will have containerId 1
 --

 Key: YARN-226
 URL: https://issues.apache.org/jira/browse/YARN-226
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth

 In case of reservations, etc., AppMasters may not get container id 1. We 
 likely need additional info in the CLC / tokens indicating whether a 
 container is an AM or not.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-226) Log aggregation should not assume an AppMaster will have containerId 1

2013-03-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602667#comment-13602667
 ] 

Hitesh Shah commented on YARN-226:
--

@Sid, never mind - reservations could effectively increment the id and never 
assign 1 to anything. @Eli, this will occur on clusters when the AM resource 
ask is greater than a single slot and multiple scheduling cycles are required 
before a large enough free slot is available to launch the AM.
 

 Log aggregation should not assume an AppMaster will have containerId 1
 --

 Key: YARN-226
 URL: https://issues.apache.org/jira/browse/YARN-226
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth

 In case of reservations, etc., AppMasters may not get container id 1. We 
 likely need additional info in the CLC / tokens indicating whether a 
 container is an AM or not.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-14 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-378:
-

Attachment: YARN-378_8.patch

The patch is now restricted to the scope of YARN only. In addition, 
maxAppRetries is set in the environment when launching the AM. The global 
maxAppRetries is validated when the RM is initialized; if it is non-positive, the 
RM will crash. Note that MRAppMaster and TestStagingCleanup reference 
YarnConfiguration.RM_AM_MAX_RETRIES, which has been changed to 
YarnConfiguration.RM_AM_MAX_ATTEMPTS. Therefore, the mapreduce build will be 
temporarily broken.
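
(A hedged, fragment-style sketch of the RM-side validation described above; YarnConfiguration.RM_AM_MAX_ATTEMPTS is the key named in the comment, while the exception type and the default of 2 are assumptions:)

{code}
// Sketch only: validate the global max-attempts value at RM initialization and
// fail fast if it is non-positive, as described above.
int globalMaxAppAttempts = conf.getInt(
    YarnConfiguration.RM_AM_MAX_ATTEMPTS, 2 /* assumed default */);
if (globalMaxAppAttempts <= 0) {
  throw new IllegalArgumentException("Invalid global max attempts: "
      + globalMaxAppAttempts + "; " + YarnConfiguration.RM_AM_MAX_ATTEMPTS
      + " must be a positive integer");
}
{code}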

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, 
 YARN-378_7.patch, YARN-378_8.patch


 We should support different ApplicationMaster retry counts for different 
 clients or users. That is to say, yarn.resourcemanager.am.max-retries should 
 be settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602716#comment-13602716
 ] 

Hadoop QA commented on YARN-378:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12573761/YARN-378_8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/515//console

This message is automatically generated.

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, 
 YARN-378_7.patch, YARN-378_8.patch


 We should support different ApplicationMaster retry counts for different 
 clients or users. That is to say, yarn.resourcemanager.am.max-retries should 
 be settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-196) Nodemanager if started before starting Resource manager is getting shutdown.But if both RM and NM are started and then after if RM is going down,NM is retrying for the RM

2013-03-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602719#comment-13602719
 ] 

Hitesh Shah commented on YARN-196:
--

@Xuan, the latest patch looks good. I am addressing a very minor nit and uploading it. 
Will commit as soon as Jenkins gives a +1.

 Nodemanager if started before starting Resource manager is getting 
 shutdown.But if both RM and NM are started and then after if RM is going 
 down,NM is retrying for the RM.
 ---

 Key: YARN-196
 URL: https://issues.apache.org/jira/browse/YARN-196
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.0.0-alpha
Reporter: Ramgopal N
Assignee: Xuan Gong
 Attachments: MAPREDUCE-3676.patch, YARN-196.10.patch, 
 YARN-196.11.patch, YARN-196.12.patch, YARN-196.1.patch, YARN-196.2.patch, 
 YARN-196.3.patch, YARN-196.4.patch, YARN-196.5.patch, YARN-196.6.patch, 
 YARN-196.7.patch, YARN-196.8.patch, YARN-196.9.patch


 If NM is started before starting the RM, NM shuts down with the 
 following error
 {code}
 ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting 
 services org.apache.hadoop.yarn.server.nodemanager.NodeManager
 org.apache.avro.AvroRuntimeException: 
 java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
   at 
 org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
 Caused by: java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
   ... 3 more
 Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: 
 Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on 
 connection exception: java.net.ConnectException: Connection refused; For more 
 details see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
   at $Proxy23.registerNodeManager(Unknown Source)
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
   ... 5 more
 Caused by: java.net.ConnectException: Call From 
 HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection 
 exception: java.net.ConnectException: Connection refused; For more details 
 see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
   at org.apache.hadoop.ipc.Client.call(Client.java:1141)
   at org.apache.hadoop.ipc.Client.call(Client.java:1100)
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
   ... 7 more
 Caused by: java.net.ConnectException: Connection refused
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659)
   at 
 org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469)
   at 
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563)
   at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211)
   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247)
   at org.apache.hadoop.ipc.Client.call(Client.java:1117)
   ... 9 more
 2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: 
 AsyncDispatcher thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
   at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
   at 
 

[jira] [Updated] (YARN-196) Nodemanager if started before starting Resource manager is getting shutdown.But if both RM and NM are started and then after if RM is going down,NM is retrying for the RM.

2013-03-14 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-196:
-

Attachment: YARN-196.12.1.patch

 Nodemanager if started before starting Resource manager is getting 
 shutdown.But if both RM and NM are started and then after if RM is going 
 down,NM is retrying for the RM.
 ---

 Key: YARN-196
 URL: https://issues.apache.org/jira/browse/YARN-196
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.0.0-alpha
Reporter: Ramgopal N
Assignee: Xuan Gong
 Attachments: MAPREDUCE-3676.patch, YARN-196.10.patch, 
 YARN-196.11.patch, YARN-196.12.1.patch, YARN-196.12.patch, YARN-196.1.patch, 
 YARN-196.2.patch, YARN-196.3.patch, YARN-196.4.patch, YARN-196.5.patch, 
 YARN-196.6.patch, YARN-196.7.patch, YARN-196.8.patch, YARN-196.9.patch


 If NM is started before starting the RM, NM shuts down with the 
 following error
 {code}
 ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting 
 services org.apache.hadoop.yarn.server.nodemanager.NodeManager
 org.apache.avro.AvroRuntimeException: 
 java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
   at 
 org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
 Caused by: java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
   ... 3 more
 Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: 
 Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on 
 connection exception: java.net.ConnectException: Connection refused; For more 
 details see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
   at $Proxy23.registerNodeManager(Unknown Source)
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
   ... 5 more
 Caused by: java.net.ConnectException: Call From 
 HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection 
 exception: java.net.ConnectException: Connection refused; For more details 
 see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
   at org.apache.hadoop.ipc.Client.call(Client.java:1141)
   at org.apache.hadoop.ipc.Client.call(Client.java:1100)
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
   ... 7 more
 Caused by: java.net.ConnectException: Connection refused
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659)
   at 
 org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469)
   at 
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563)
   at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211)
   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247)
   at org.apache.hadoop.ipc.Client.call(Client.java:1117)
   ... 9 more
 2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: 
 AsyncDispatcher thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
   at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
   at java.lang.Thread.run(Thread.java:619)
 2012-01-16 

[jira] [Updated] (YARN-473) Capacity Scheduler webpage and REST API not showing correct number of pending applications

2013-03-14 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-473:
-

Labels: usability  (was: )

 Capacity Scheduler webpage and REST API not showing correct number of pending 
 applications
 --

 Key: YARN-473
 URL: https://issues.apache.org/jira/browse/YARN-473
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
  Labels: usability

 The Capacity Scheduler REST API 
 (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API)
  is not returning the correct number of pending applications.  
 numPendingApplications is almost always zero, even if there are dozens of 
 pending apps.
 In investigating this, I discovered that the Resource Manager's Scheduler 
 webpage is also showing an incorrect but different number of pending 
 applications.  For example, the cluster I'm looking at right now 
 has 15 applications in the ACCEPTED state, but the Cluster Metrics table near 
 the top of the page says there are only 2 pending apps.  The REST API says 
 there are zero pending apps.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-196) Nodemanager if started before starting Resource manager is getting shutdown.But if both RM and NM are started and then after if RM is going down,NM is retrying for the RM

2013-03-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602767#comment-13602767
 ] 

Hadoop QA commented on YARN-196:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12573764/YARN-196.12.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/516//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/516//console

This message is automatically generated.

 Nodemanager if started before starting Resource manager is getting 
 shutdown.But if both RM and NM are started and then after if RM is going 
 down,NM is retrying for the RM.
 ---

 Key: YARN-196
 URL: https://issues.apache.org/jira/browse/YARN-196
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.0.0-alpha
Reporter: Ramgopal N
Assignee: Xuan Gong
 Attachments: MAPREDUCE-3676.patch, YARN-196.10.patch, 
 YARN-196.11.patch, YARN-196.12.1.patch, YARN-196.12.patch, YARN-196.1.patch, 
 YARN-196.2.patch, YARN-196.3.patch, YARN-196.4.patch, YARN-196.5.patch, 
 YARN-196.6.patch, YARN-196.7.patch, YARN-196.8.patch, YARN-196.9.patch


 If NM is started before starting the RM, NM shuts down with the 
 following error
 {code}
 ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting 
 services org.apache.hadoop.yarn.server.nodemanager.NodeManager
 org.apache.avro.AvroRuntimeException: 
 java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
   at 
 org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
 Caused by: java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
   ... 3 more
 Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: 
 Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on 
 connection exception: java.net.ConnectException: Connection refused; For more 
 details see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
   at $Proxy23.registerNodeManager(Unknown Source)
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
   ... 5 more
 Caused by: java.net.ConnectException: Call From 
 HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection 
 exception: java.net.ConnectException: Connection refused; For more details 
 see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
   at org.apache.hadoop.ipc.Client.call(Client.java:1141)
   at org.apache.hadoop.ipc.Client.call(Client.java:1100)
   at 
 

[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-14 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602788#comment-13602788
 ] 

Bikas Saha commented on YARN-378:
-

I am in favor of setting the value in the env in addition to the API. I want it in 
the API to encourage other app developers to do the desired thing and obtain 
such (and other) information from the RM upon registration. This is different 
from the use case of the application attempt id, where we need something before 
contacting the RM.

I also took a quick look at the MR AM code. It's currently reading the value 
from config, and the only use is setting the isLastAMRetry value. The 
isLastAMRetry value is later used during job shutdown. Job shutdown happens 
after services.start(), so it should not be a terribly large change to get and 
use the retry value after registration; registration happens during 
services.start().
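
(A hedged, fragment-style sketch of the registration-time alternative described above; getMaxAppAttempts() is a hypothetical accessor standing in for whatever the patch adds to the registration response, and amRmProtocol, registrationRequest and appAttemptId are assumed context variables:)

{code}
// Sketch only: obtain the value from the RM at registration time instead of
// from config, then derive isLastAMRetry from the current attempt id.
RegisterApplicationMasterResponse response =
    amRmProtocol.registerApplicationMaster(registrationRequest);
int maxAppAttempts = response.getMaxAppAttempts();  // hypothetical accessor
boolean isLastAMRetry = appAttemptId.getAttemptId() >= maxAppAttempts;
// isLastAMRetry is then consulted during job shutdown, after services.start().
{code}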

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, 
 YARN-378_7.patch, YARN-378_8.patch


 We should support different ApplicationMaster retry counts for different 
 clients or users. That is to say, yarn.resourcemanager.am.max-retries should 
 be settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-03-14 Thread Hitesh Shah (JIRA)
Hitesh Shah created YARN-479:


 Summary: NM retry behavior for connection to RM should be similar 
for lost heartbeats
 Key: YARN-479
 URL: https://issues.apache.org/jira/browse/YARN-479
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah


Regardless of whether the connection is lost at startup or at an intermediate 
point, the NM's retry behavior toward the RM should follow the same flow. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602817#comment-13602817
 ] 

Hadoop QA commented on YARN-378:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12573777/YARN-378_9.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/517//console

This message is automatically generated.

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability
 Attachments: YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, 
 YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, 
 YARN-378_7.patch, YARN-378_8.patch, YARN-378_9.patch


 We should support different ApplicationMaster retry counts for different 
 clients or users. That is to say, yarn.resourcemanager.am.max-retries should 
 be settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-474) CapacityScheduler does not activate applications when configuration is refreshed

2013-03-14 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-474:
-

Assignee: Zhijie Shen

 CapacityScheduler does not activate applications when configuration is 
 refreshed
 

 Key: YARN-474
 URL: https://issues.apache.org/jira/browse/YARN-474
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Hitesh Shah
Assignee: Zhijie Shen

 Submit 3 applications to a cluster where capacity scheduler limits allow only 
 1 running application. Modify the capacity scheduler config to increase the 
 value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke a 
 queue refresh. 
 The 2 applications not yet in the running state do not get launched even 
 though the limits have been increased.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-275) Make NodeManagers to NOT blindly heartbeat irrespective of whether previous heartbeat is processed or not.

2013-03-14 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah resolved YARN-275.
--

Resolution: Duplicate

 Make NodeManagers to NOT blindly heartbeat irrespective of whether previous 
 heartbeat is processed or not.
 --

 Key: YARN-275
 URL: https://issues.apache.org/jira/browse/YARN-275
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong
 Attachments: Prototype.txt, YARN-270.1.patch


 Update heartbeat info on the RMNode side, and have the CS read the info 
 directly from each RMNode.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-477) When default container executor fails right away, at the CLI launching our App Master, Client doesn't always get the signal to kill the job

2013-03-14 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-477:
-

Assignee: Zhijie Shen

 When default container executor fails right away, at the CLI launching our 
 App Master, Client doesn't always get the signal to kill the job
 ---

 Key: YARN-477
 URL: https://issues.apache.org/jira/browse/YARN-477
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Eli Reisman
Assignee: Zhijie Shen

 I have been porting Giraph to YARN (GIRAPH-13 is the issue), and when I launch 
 my App Master, if the container command line runs it successfully, any 
 failure in the App Master or my launched Giraph Tasks is promptly reported to 
 the Client and ends my job run. However, if the command line sent to the App 
 Master container fails to launch it at all, the error exit code is not 
 propagated. My client hangs with the job at containersUsed == 1 and state == 
 ACCEPTED for as long as you want to sit and wait before CTRL-C'ing your way 
 out.
 Disclaimer: this could be my fault, but I wanted to throw it out there in 
 case it's not. I am also (when this happens) not getting error logs, since the 
 App Master never launched, so I really have no visibility into why it failed 
 to launch. I am sure it's not launching, but the client IS sending the app 
 request, getting a container for my AM, and I see the command line run on the 
 container in my logs. That's all.
 Thanks! If this is a dup or won't-fix for some reason, let me know, and sorry 
 for wasting your time!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-477) When default container executor fails right away, at the CLI launching our App Master, Client doesn't always get the signal to kill the job

2013-03-14 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602892#comment-13602892
 ] 

Eli Reisman commented on YARN-477:
--

Duh. Too many issues fixed all at once; they are all running together in my 
mind. OK, going over this again: this is happening during my integration tests 
with MiniYARNCluster, not on the real cluster.

So perhaps the real YARN implementation does handle propagating the error to the 
client and RM (etc.) when the command line the client uses to launch the 
container for the AM fails. I think it's the MiniYARNCluster that is not 
handling this situation correctly.

Again, the issue is:

The Client starts fine. It creates the AMContainerSpec stuff and requests an AM 
container. This request includes the shell command to launch our AM in the 
container. The container shows up as being granted and provisioned by the RM, 
but from there the client hangs waiting for job success/failure, saying it has 
1 container used the whole time (the failed AM container). What seems to be 
happening is that the shell script fails to launch the AM in its container, so 
the container just sits there forever. Let's check this in MiniYARNCluster and see.

I will try to break the Giraph MiniYARNCluster test again, recreate some 
decent log traces leading up to the event, and post them here. Thanks!

 When default container executor fails right away, at the CLI launching our 
 App Master, Client doesn't always get the signal to kill the job
 ---

 Key: YARN-477
 URL: https://issues.apache.org/jira/browse/YARN-477
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Eli Reisman
Assignee: Zhijie Shen

 I have been porting Giraph to YARN (GIRAPH-13 is the issue), and when I launch 
 my App Master, if the container command line runs it successfully, any 
 failure in the App Master or my launched Giraph Tasks is promptly reported to 
 the Client and ends my job run. However, if the command line sent to the App 
 Master container fails to launch it at all, the error exit code is not 
 propagated. My client hangs with the job at containersUsed == 1 and state == 
 ACCEPTED for as long as you want to sit and wait before CTRL-C'ing your way 
 out.
 Disclaimer: this could be my fault, but I wanted to throw it out there in 
 case it's not. I am also (when this happens) not getting error logs, since the 
 App Master never launched, so I really have no visibility into why it failed 
 to launch. I am sure it's not launching, but the client IS sending the app 
 request, getting a container for my AM, and I see the command line run on the 
 container in my logs. That's all.
 Thanks! If this is a dup or won't-fix for some reason, let me know, and sorry 
 for wasting your time!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-477) When default container executor fails right away, at the CLI launching our App Master, Client doesn't always get the signal to kill the job

2013-03-14 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602895#comment-13602895
 ] 

Eli Reisman commented on YARN-477:
--

The NodeManager log for the MiniYARNCluster DID get a report for the App Master 
that could only have come from the shell command failing:

{code}
Exception in thread main java.lang.NoClassDefFoundError: 
org/apache/giraph/yarn/GiraphApplicationMaster
Caused by: java.lang.ClassNotFoundException: 
org.apache.giraph.yarn.GiraphApplicationMaster
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
{code}

So that's good. But I don't think it's propagating this to the MiniYARNCluster's 
RM or my Client. From my Client's end, the logs are endless heartbeat messages 
with a -1000 exitCode until I ctrl-c out of the test suite.

FYI, this is not a priority or blocker for my Giraph on YARN; it all works now 
(including the test), in case I wasn't clear. But it should probably get 
investigated/fixed soon if I've really found something here ;)



 When default container executor fails right away, at the CLI launching our 
 App Master, Client doesn't always get the signal to kill the job
 ---

 Key: YARN-477
 URL: https://issues.apache.org/jira/browse/YARN-477
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Eli Reisman
Assignee: Zhijie Shen

 I have been porting Giraph to YARN (GIRAPH-13 is the issue), and when I launch 
 my App Master, if the container command line runs it successfully, any 
 failure in the App Master or my launched Giraph Tasks is promptly reported to 
 the Client and ends my job run. However, if the command line sent to the App 
 Master container fails to launch it at all, the error exit code is not 
 propagated. My client hangs with the job at containersUsed == 1 and state == 
 ACCEPTED for as long as you want to sit and wait before CTRL-C'ing your way 
 out.
 Disclaimer: this could be my fault, but I wanted to throw it out there in 
 case it's not. I am also (when this happens) not getting error logs, since the 
 App Master never launched, so I really have no visibility into why it failed 
 to launch. I am sure it's not launching, but the client IS sending the app 
 request, getting a container for my AM, and I see the command line run on the 
 container in my logs. That's all.
 Thanks! If this is a dup or won't-fix for some reason, let me know, and sorry 
 for wasting your time!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-109) .tmp file is not deleted for localized archives

2013-03-14 Thread omkar vinit joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602898#comment-13602898
 ] 

omkar vinit joshi commented on YARN-109:


[~mayank_bansal] Are you still looking into this issue? Otherwise, I would like 
to take it over.

 .tmp file is not deleted for localized archives
 ---

 Key: YARN-109
 URL: https://issues.apache.org/jira/browse/YARN-109
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Jason Lowe
Assignee: Mayank Bansal

 When archives are localized they are initially created as a .tmp file and 
 unpacked from that file.  However the .tmp file is not deleted afterwards.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-477) MiniYARNCluster: When container executor script fails to launch App Master, NM logs error, but Client doesn't get signaled to kill the job

2013-03-14 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated YARN-477:
-

Summary: MiniYARNCluster: When container executor script fails to launch 
App Master, NM logs error, but Client doesn't get signaled to kill the job  
(was: When default container executor fails right away, at the CLI launching 
our App Master, Client doesn't always get the signal to kill the job)

 MiniYARNCluster: When container executor script fails to launch App Master, 
 NM logs error, but Client doesn't get signaled to kill the job
 --

 Key: YARN-477
 URL: https://issues.apache.org/jira/browse/YARN-477
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Eli Reisman
Assignee: Zhijie Shen

 I have been porting Giraph to YARN (GIRAPH-13 is the issue), and when I launch 
 my App Master, if the container command line runs it successfully, any 
 failure in the App Master or my launched Giraph Tasks is promptly reported to 
 the Client and ends my job run. However, if the command line sent to the App 
 Master container fails to launch it at all, the error exit code is not 
 propagated. My client hangs with the job at containersUsed == 1 and state == 
 ACCEPTED for as long as you want to sit and wait before CTRL-C'ing your way 
 out.
 Disclaimer: this could be my fault, but I wanted to throw it out there in 
 case it's not. I am also (when this happens) not getting error logs, since the 
 App Master never launched, so I really have no visibility into why it failed 
 to launch. I am sure it's not launching, but the client IS sending the app 
 request, getting a container for my AM, and I see the command line run on the 
 container in my logs. That's all.
 Thanks! If this is a dup or won't-fix for some reason, let me know, and sorry 
 for wasting your time!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart

2013-03-14 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602904#comment-13602904
 ] 

Xuan Gong commented on YARN-71:
---

Uploaded a new patch:
1. Moved the timestamp out, so all local dirs use the same timestamp.
2. Rewrote the rename-and-deletion block into two new functions: 
renameLocalDir() to rename the dirs and deleteLocalDir() to delete them (a 
simplified sketch of this pattern follows below).
3. Changed the unit test to cover:
   a. verifying the correct user is used by the DeletionService
   b. verifying fileCache and NM_PRIVATE_DIR deletion
4. Used Records instead of RecordFactory.
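
To make the rename-then-delete flow concrete, here is a small, self-contained 
sketch of the pattern described above. It is not the patch itself: it uses plain 
java.nio and a bare ExecutorService as a stand-in for the NM's DeletionService, 
and all class and method names besides renameLocalDir()/deleteLocalDir() are 
illustrative.

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Stream;

public class LocalDirCleanupSketch {
  private static final String DEL_SUFFIX = "_DEL_";
  private final ExecutorService deletionService = Executors.newSingleThreadExecutor();

  /** Rename a local dir with the shared timestamp so it is clearly marked stale. */
  Path renameLocalDir(Path dir, long sharedTimestamp) throws IOException {
    Path renamed = dir.resolveSibling(dir.getFileName() + DEL_SUFFIX + sharedTimestamp);
    return Files.move(dir, renamed);
  }

  /** Hand the renamed dir to the async deletion stand-in; failures need manual cleanup. */
  void deleteLocalDir(Path renamedDir) {
    deletionService.submit(() -> {
      try (Stream<Path> walk = Files.walk(renamedDir)) {
        // delete children before parents
        walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
      } catch (IOException e) {
        System.err.println(renamedDir + " could not be removed; it needs to be manually deleted");
      }
    });
  }

  /** On NM restart: rename everything first with one timestamp, then schedule deletions. */
  void cleanupOnRestart(List<Path> localDirs) throws IOException {
    long ts = System.currentTimeMillis();   // one shared timestamp for all local dirs
    for (Path dir : localDirs) {
      if (Files.exists(dir)) {
        deleteLocalDir(renameLocalDir(dir, ts));
      }
    }
  }
}
{code}

Renaming everything first with one shared timestamp keeps stale directories 
clearly marked even if the NM crashes again before the asynchronous deletion 
finishes.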

 Ensure/confirm that the NodeManager cleans up local-dirs on restart
 ---

 Key: YARN-71
 URL: https://issues.apache.org/jira/browse/YARN-71
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, 
 YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch, 
 YARN-71.8.patch


 We have to make sure that NodeManagers clean up their local files on restart.
 It may already be working like that, in which case we should have tests 
 validating this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart

2013-03-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602909#comment-13602909
 ] 

Hadoop QA commented on YARN-71:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12573803/YARN-71.8.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/518//console

This message is automatically generated.

 Ensure/confirm that the NodeManager cleans up local-dirs on restart
 ---

 Key: YARN-71
 URL: https://issues.apache.org/jira/browse/YARN-71
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, 
 YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch, 
 YARN-71.8.patch


 We have to make sure that NodeManagers clean up their local files on restart.
 It may already be working like that, in which case we should have tests 
 validating this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart

2013-03-14 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-71:
--

Attachment: YARN-71.9.patch

 Ensure/confirm that the NodeManager cleans up local-dirs on restart
 ---

 Key: YARN-71
 URL: https://issues.apache.org/jira/browse/YARN-71
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, 
 YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch, 
 YARN-71.8.patch, YARN-71.9.patch


 We have to make sure that NodeManagers clean up their local files on restart.
 It may already be working like that, in which case we should have tests 
 validating this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart

2013-03-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602916#comment-13602916
 ] 

Hadoop QA commented on YARN-71:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12573806/YARN-71.9.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/519//console

This message is automatically generated.

 Ensure/confirm that the NodeManager cleans up local-dirs on restart
 ---

 Key: YARN-71
 URL: https://issues.apache.org/jira/browse/YARN-71
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, 
 YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch, 
 YARN-71.8.patch, YARN-71.9.patch


 We have to make sure that NodeManagers clean up their local files on restart.
 It may already be working like that, in which case we should have tests 
 validating this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-71) Ensure/confirm that the NodeManager cleans up local-dirs on restart

2013-03-14 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602972#comment-13602972
 ] 

Xuan Gong commented on YARN-71:
---

Tested the patch on a single-node cluster running CentOS 6:
1. configured LinuxContainerExecutor
2. started the namenode, datanode, resourcemanager, and nodemanager
3. ran a pi example
4. manually killed the nodemanager
5. found the local files under the local dirs which need to be deleted
6. restarted the nodemanager
7. verified the local files have been deleted.

 Ensure/confirm that the NodeManager cleans up local-dirs on restart
 ---

 Key: YARN-71
 URL: https://issues.apache.org/jira/browse/YARN-71
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-71.1.patch, YARN-71.2.patch, YARN-71.3.patch, 
 YARN.71.4.patch, YARN-71.5.patch, YARN-71.6.patch, YARN-71.7.patch, 
 YARN-71.8.patch, YARN-71.9.patch


 We have to make sure that NodeManagers clean up their local files on restart.
 It may already be working like that, in which case we should have tests 
 validating this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers

2013-03-14 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13603164#comment-13603164
 ] 

Bikas Saha commented on YARN-417:
-

What is the deadlock here? It's late at night and I can't see it :P
Is it related to the exception being thrown when stop() is called from the 
handler thread? Is this guaranteed bad behavior, so that we need to throw a 
runtime exception immediately?
I think we need to call client.stop() after the heartbeat thread has stopped. 
Otherwise, the heartbeat thread can call client.allocate() in between the 
current client.stop() and keepRunning = false, right? (A sketch of this 
ordering follows the quoted block below.)
{code}
+  /**
+   * Tells the heartbeat and handler threads to stop and waits for them to
+   * terminate.  Calling this method from the callback handler thread would 
cause
+   * deadlock, and thus should be avoided.
+   */
+  @Override
+  public void stop() {
+if (Thread.currentThread() == handlerThread) {
+  throw new YarnException("Cannot call stop from callback handler thread!");
+}
+client.stop();
+keepRunning = false;
+try {
+  heartbeatThread.join();
{code}
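
A hedged sketch of that ordering, reduced to a self-contained threading skeleton 
(the class, field, and interface names are stand-ins, not the actual 
AMRMClientAsync):

{code}
public class AsyncClientStopSketch {
  /** Stand-in for the synchronous client wrapped by the async one. */
  interface StoppableClient { void stop(); }

  private final Thread heartbeatThread;
  private final Thread handlerThread;
  private final StoppableClient client;
  private volatile boolean keepRunning = true;   // checked by the heartbeat loop

  AsyncClientStopSketch(Thread heartbeat, Thread handler, StoppableClient client) {
    this.heartbeatThread = heartbeat;
    this.handlerThread = handler;
    this.client = client;
  }

  boolean isRunning() { return keepRunning; }

  public void stop() {
    if (Thread.currentThread() == handlerThread) {
      // joining the handler thread from itself would never return, so refuse early
      throw new IllegalStateException("Cannot call stop from callback handler thread!");
    }
    keepRunning = false;                         // stop issuing new heartbeats
    try {
      heartbeatThread.join();                    // no allocate() calls after this returns
      handlerThread.interrupt();
      handlerThread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    client.stop();                               // safe only once the heartbeat loop has exited
  }
}
{code}

Joining the heartbeat thread before client.stop() is what closes the window: once 
the join returns, no further allocate() can race against a stopped client.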

Didn't quite get the assert inside the loop. Perhaps you meant 
takeCompletedContainers()? (See the sketch after the quoted block below.)
{code}
+// wait for the allocated containers from the first heartbeat's response
+while (callbackHandler.takeAllocatedContainers() == null) {
+  Assert.assertEquals(null, callbackHandler.takeAllocatedContainers());
+  Thread.sleep(10);
+}
{code}
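
One possible reading of that question, as a self-contained sketch; the handler 
interface is a stand-in for the test's callback handler, and checking 
takeCompletedContainers() inside the loop is only the guess raised above:

{code}
import java.util.List;

public class AllocationWaitSketch {
  /** Minimal stand-in for the test's callback handler. */
  interface Handler<C> {
    List<C> takeAllocatedContainers();
    List<C> takeCompletedContainers();
  }

  /** Take the allocation once per iteration and keep it, so nothing is swallowed. */
  static <C> List<C> waitForFirstAllocation(Handler<C> handler) throws InterruptedException {
    List<C> allocated;
    while ((allocated = handler.takeAllocatedContainers()) == null) {
      if (handler.takeCompletedContainers() != null) {
        throw new AssertionError("containers completed before any were allocated");
      }
      Thread.sleep(10);
    }
    return allocated;
  }
}
{code}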

I think updating progress needs to be its own callback, since it's possible 
that no container allocations or completions happen for a long time, and thus 
the heartbeats would show no progress to the RM.
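
A tiny sketch of what such a callback might look like; the interface name and 
shape are assumptions, not the existing handler API:

{code}
// Illustrative only: a dedicated progress callback the heartbeat loop would query
// before every allocate() call, so the RM always sees fresh progress even when no
// containers have been allocated or completed for a long time.
public interface ProgressCallback {
  /** Current application progress in the range [0.0f, 1.0f]. */
  float getProgress();
}
{code}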

 Add a poller that allows the AM to receive notifications when it is assigned 
 containers
 ---

 Key: YARN-417
 URL: https://issues.apache.org/jira/browse/YARN-417
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, applications
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, 
 YARN-417-1.patch, YARN-417-2.patch, YARN-417-3.patch, YARN-417-4.patch, 
 YARN-417-4.patch, YARN-417.patch, YarnAppMaster.java, 
 YarnAppMasterListener.java


 Writing AMs would be easier for some if they did not have to handle 
 heartbeating to the RM on their own.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira