[jira] [Commented] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number

2013-05-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647425#comment-13647425
 ] 

Hudson commented on YARN-618:
-

Integrated in Hadoop-Yarn-trunk #201 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/201/])
YARN-618. Modified RM_INVALID_IDENTIFIER to be -1 instead of zero. 
Contributed by Jian He. (Revision 1478230)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1478230
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerConstants.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java
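
The change itself is a one-line constant update; a minimal sketch of the revised constant (the exact modifiers and Javadoc in ResourceManagerConstants.java may differ):

{code}
public interface ResourceManagerConstants {

  // -1 instead of 0: tests routinely use 0 as a cluster timestamp, so 0
  // could collide with the "invalid RM identifier" sentinel value.
  long RM_INVALID_IDENTIFIER = -1;
}
{code}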


 Modify RM_INVALID_IDENTIFIER to a -ve number
 -

 Key: YARN-618
 URL: https://issues.apache.org/jira/browse/YARN-618
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.0.5-beta

 Attachments: YARN-618.1.patch, YARN-618.2.patch, 
 YARN-618.3-branch-2.patch, YARN-618.3.patch, YARN-618.patch


 RM_INVALID_IDENTIFIER set to 0 doesn't sound right, as many tests set it to 0. 
 Probably a negative number is what we want.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number

2013-05-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647489#comment-13647489
 ] 

Hudson commented on YARN-618:
-

Integrated in Hadoop-Hdfs-trunk #1390 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1390/])
YARN-618. Modified RM_INVALID_IDENTIFIER to be -1 instead of zero. 
Contributed by Jian He. (Revision 1478230)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1478230
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerConstants.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java


 Modify RM_INVALID_IDENTIFIER to a -ve number
 -

 Key: YARN-618
 URL: https://issues.apache.org/jira/browse/YARN-618
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.0.5-beta

 Attachments: YARN-618.1.patch, YARN-618.2.patch, 
 YARN-618.3-branch-2.patch, YARN-618.3.patch, YARN-618.patch


 RM_INVALID_IDENTIFIER set to 0 doesn't sound right, as many tests set it to 0. 
 Probably a negative number is what we want.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number

2013-05-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647544#comment-13647544
 ] 

Hudson commented on YARN-618:
-

Integrated in Hadoop-Mapreduce-trunk #1417 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1417/])
YARN-618. Modified RM_INVALID_IDENTIFIER to be -1 instead of zero. 
Contributed by Jian He. (Revision 1478230)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1478230
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerConstants.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestEventFlow.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java


 Modify RM_INVALID_IDENTIFIER to a -ve number
 -

 Key: YARN-618
 URL: https://issues.apache.org/jira/browse/YARN-618
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.0.5-beta

 Attachments: YARN-618.1.patch, YARN-618.2.patch, 
 YARN-618.3-branch-2.patch, YARN-618.3.patch, YARN-618.patch


 RM_INVALID_IDENTIFIER set to 0 doesn't sound right, as many tests set it to 0. 
 Probably a negative number is what we want.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-621) RM triggers web auth failure before first job

2013-05-02 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647687#comment-13647687
 ] 

Allen Wittenauer commented on YARN-621:
---

Some more details:

* With the exception of fixing the broken start-dfs.sh, this is a pure Apache 
2.0.4 deploy on RHEL 6.3.

* We configure logical nics and bind services to them.  In order to work around 
HADOOP-9520, we have hard coded all the service names in the configuration.

* After the first job, authentication works and the system works as expected.  
If that job rolls off the page, the replay errors return.

* Same realm or cross-realm does not appear to make a difference.

* The only stack trace I'm able to find is the one generated by the filter for 
the replay error itself.

* Hit this in both Firefox and Safari (which also triggers HADOOP-9521).

 RM triggers web auth failure before first job
 -

 Key: YARN-621
 URL: https://issues.apache.org/jira/browse/YARN-621
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Allen Wittenauer
Assignee: Vinod Kumar Vavilapalli
Priority: Critical

 On a secure YARN setup, before the first job is executed, going to the web 
 interface of the resource manager triggers authentication errors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-02 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated YARN-614:
-

Attachment: YARN-614-1.patch

 Retry attempts automatically for hardware failures or YARN issues and set 
 default app retries to 1
 --

 Key: YARN-614
 URL: https://issues.apache.org/jira/browse/YARN-614
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bikas Saha
 Attachments: YARN-614-0.patch, YARN-614-1.patch


 Attempts can fail due to a large number of user errors and they should not be 
 retried unnecessarily. The only reason YARN should retry an attempt is when 
 the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
 errors are the hardware errors that come to mind.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-02 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647750#comment-13647750
 ] 

Chris Riccomini commented on YARN-614:
--

Added a new patch. Resolves 1 (switched justFinishedContainers to a map for O(1) 
container status lookup) and 3 (added a shouldIgnoreFailures method) in my list 
above.

Bikas, I think we should leave recovery for another ticket.

Do you want me to update RMAppManager.recover() to have the same if 
(app.attempts.size() - app.ignoredFailures >= app.maxAppAttempts) logic as 
RMAppImpl.AttemptFailedTransition?

 Retry attempts automatically for hardware failures or YARN issues and set 
 default app retries to 1
 --

 Key: YARN-614
 URL: https://issues.apache.org/jira/browse/YARN-614
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bikas Saha
 Attachments: YARN-614-0.patch, YARN-614-1.patch


 Attempts can fail due to a large number of user errors and they should not be 
 retried unnecessarily. The only reason YARN should retry an attempt is when 
 the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
 errors are the hardware errors that come to mind.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-636) Restore clientToken for app attempt after RM restart

2013-05-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647769#comment-13647769
 ] 

Bikas Saha commented on YARN-636:
-

If this is being fully covered in YARN-582 then please resolve this as 
duplicate.

 Restore clientToken for app attempt after RM restart
 

 Key: YARN-636
 URL: https://issues.apache.org/jira/browse/YARN-636
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-02 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647770#comment-13647770
 ] 

Chris Riccomini commented on YARN-614:
--

Hey Bikas,

Looking into the recovery stuff a bit more. As far as I can tell (still 
wrapping my head around this stuff), the RMApp's RECOVER transition moves from 
NEW to SUBMITTED right now. This transition is triggered by 
RMAppManager.recover() -> RMAppManager.submitApplication(), which sends the 
RECOVER event. The submitApplication call happens directly before 
appImpl.recover() in RMAppManager:

{code}
if (shouldRecover) {
  LOG.info("Recovering application " + appState.getAppId());
  submitApplication(appState.getApplicationSubmissionContext(),
      appState.getSubmitTime(), true);
  // re-populate attempt information in application
  RMAppImpl appImpl = (RMAppImpl) rmContext.getRMApps().get(
      appState.getAppId());
  appImpl.recover(state);
}
{code}

This means that the RECOVER transition (StartAppAttemptTransition) happens 
before we have any state in the RMAppImpl. As a result, we can't add any logic 
to StartAppAttemptTransition to determine whether we should transition to 
FAILED, since the attempts variable will still be empty at that point. 
I think this means that we can't do your second suggestion ("Another solution 
could be to make the RMApp go from NEW to FAILED in the recover transition 
based on failed counts etc.").

Am I understanding this correctly?
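
For reference, a hypothetical sketch of the limit check being discussed (names follow this thread's patch excerpts, not necessarily the committed code):

{code}
// Hypothetical helper mirroring the AttemptFailedTransition logic:
// only failures that were the AM's fault count toward the limit.
private boolean isAttemptLimitReached() {
  return this.attempts.size() - this.ignoredFailures >= this.maxAppAttempts;
}
{code}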

 Retry attempts automatically for hardware failures or YARN issues and set 
 default app retries to 1
 --

 Key: YARN-614
 URL: https://issues.apache.org/jira/browse/YARN-614
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bikas Saha
 Attachments: YARN-614-0.patch, YARN-614-1.patch


 Attempts can fail due to a large number of user errors and they should not be 
 retried unnecessarily. The only reason YARN should retry an attempt is when 
 the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
 errors are the hardware errors that come to mind.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-641) Make AMLauncher in RM Use NMClient

2013-05-02 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned YARN-641:
--

Assignee: Chris Nauroth  (was: Zhijie Shen)

 Make AMLauncher in RM Use NMClient
 --

 Key: YARN-641
 URL: https://issues.apache.org/jira/browse/YARN-641
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Chris Nauroth

 YARN-422 adds NMClient. RM's AMLauncher is responsible for the interactions 
 with an application's AM container. AMLauncher should also replace the raw 
 ContainerManager proxy with NMClient.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-641) Make AMLauncher in RM Use NMClient

2013-05-02 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned YARN-641:
--

Assignee: Zhijie Shen  (was: Chris Nauroth)

Accidentally assigned this to myself.  Giving it back to Zhijie.

 Make AMLauncher in RM Use NMClient
 --

 Key: YARN-641
 URL: https://issues.apache.org/jira/browse/YARN-641
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 YARN-422 adds NMClient. RM's AMLauncher is responsible for the interactions 
 with an application's AM container. AMLauncher should also replace the raw 
 ContainerManager proxy with NMClient.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-638) Add RMDelegationTokens back to DelegationTokenSecretManager after RM Restart

2013-05-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647914#comment-13647914
 ] 

Jian He commented on YARN-638:
--

Simply adding RMDelegationTokens back to the DelegationTokenSecretManager is not 
enough. We also need to store the master keys, since the renewToken method uses 
the token's corresponding master key to generate a new password and to verify 
that the client is renewing the token with the correct password.
The current solution for restoring RMDelegationTokens is to add a separate 
RMDelegationSecretManagerStore to the RMStateStore. It saves the token and the 
master key whenever they are generated, and removes that state when the token 
expires or the key is rolled over.
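
A hypothetical sketch of what such a separate store could look like (the method names are illustrative, not the committed RMStateStore API):

{code}
import org.apache.hadoop.security.token.delegation.DelegationKey;
import org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier;

// Illustrative interface for the RMDelegationSecretManagerStore described
// above: persist tokens and master keys as they are generated, and remove
// them on expiry / key roll-over.
public interface RMDelegationSecretManagerStore {
  void storeToken(RMDelegationTokenIdentifier id, long renewDate)
      throws Exception;
  void removeToken(RMDelegationTokenIdentifier id) throws Exception;
  void storeMasterKey(DelegationKey key) throws Exception;
  void removeMasterKey(DelegationKey key) throws Exception;
}
{code}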

 Add RMDelegationTokens back to DelegationTokenSecretManager after RM Restart
 

 Key: YARN-638
 URL: https://issues.apache.org/jira/browse/YARN-638
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-638.1.patch


 This was missed in YARN-581. After RM restart, RMDelegationTokens need to be 
 added back both in the DelegationTokenRenewer (addressed in YARN-581) and in 
 the delegationTokenSecretManager.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-638) Restore RMDelegationTokens after RM Restart

2013-05-02 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-638:
-

Summary: Restore RMDelegationTokens after RM Restart  (was: Add 
RMDelegationTokens back to DelegationTokenSecretManager after RM Restart)

 Restore RMDelegationTokens after RM Restart
 ---

 Key: YARN-638
 URL: https://issues.apache.org/jira/browse/YARN-638
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-638.1.patch


 This was missed in YARN-581. After RM restart, RMDelegationTokens need to be 
 added back both in the DelegationTokenRenewer (addressed in YARN-581) and in 
 the delegationTokenSecretManager.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-581) Test and verify that app delegation tokens are added to tokenNewer after RM restart

2013-05-02 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-581:
-

Summary: Test and verify that app delegation tokens are added to tokenNewer 
after RM restart  (was: Test and verify that app delegation tokens are restored 
after RM restart)

 Test and verify that app delegation tokens are added to tokenNewer after RM 
 restart
 ---

 Key: YARN-581
 URL: https://issues.apache.org/jira/browse/YARN-581
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Jian He
 Fix For: 2.0.5-beta

 Attachments: YARN-581.1.patch, YARN-581.2.patch


 The code already saves the delegation tokens in AppSubmissionContext. Upon 
 restart the AppSubmissionContext is used to submit the application again and 
 so restores the delegation tokens. This jira tracks testing and verifying 
 this functionality in a secure setup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-581) Test and verify that app delegation tokens are added to tokenNewer after RM restart

2013-05-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647927#comment-13647927
 ] 

Jian He commented on YARN-581:
--

Changed the title; this patch handles restoring delegationTokens in tokenNewer. 
Restoring delegationTokens in the delegationTokenSecretManager is addressed in 
YARN-638.

 Test and verify that app delegation tokens are added to tokenNewer after RM 
 restart
 ---

 Key: YARN-581
 URL: https://issues.apache.org/jira/browse/YARN-581
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Jian He
 Fix For: 2.0.5-beta

 Attachments: YARN-581.1.patch, YARN-581.2.patch


 The code already saves the delegation tokens in AppSubmissionContext. Upon 
 restart the AppSubmissionContext is used to submit the application again and 
 so restores the delegation tokens. This jira tracks testing and verifying 
 this functionality in a secure setup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-636) Restore clientToken for app attempt after RM restart

2013-05-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647941#comment-13647941
 ] 

Jian He commented on YARN-636:
--

Currently, this is not covered in YARN-582; I have separated it into its own patch.

 Restore clientToken for app attempt after RM restart
 

 Key: YARN-636
 URL: https://issues.apache.org/jira/browse/YARN-636
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-581) Test and verify that app delegation tokens are added to tokenRenewer after RM restart

2013-05-02 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-581:
-

Summary: Test and verify that app delegation tokens are added to 
tokenRenewer after RM restart  (was: Test and verify that app delegation tokens 
are added to tokenNewer after RM restart)

 Test and verify that app delegation tokens are added to tokenRenewer after RM 
 restart
 -

 Key: YARN-581
 URL: https://issues.apache.org/jira/browse/YARN-581
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Jian He
 Fix For: 2.0.5-beta

 Attachments: YARN-581.1.patch, YARN-581.2.patch


 The code already saves the delegation tokens in AppSubmissionContext. Upon 
 restart the AppSubmissionContext is used to submit the application again and 
 so restores the delegation tokens. This jira tracks testing and verifying 
 this functionality in a secure setup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-629) Make YarnRemoteException not be rooted at IOException

2013-05-02 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-629:
---

Attachment: YARN-629.3.patch

 Make YarnRemoteException not be rooted at IOException
 -

 Key: YARN-629
 URL: https://issues.apache.org/jira/browse/YARN-629
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-629.1.patch, YARN-629.2.patch, YARN-629.3.patch


 After HADOOP-9343, it should be possible for YarnException to not be rooted 
 at IOException

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-629) Make YarnRemoteException not be rooted at IOException

2013-05-02 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648021#comment-13648021
 ] 

Xuan Gong commented on YARN-629:


Dropped the test code that verifies NMNotYetReadyException and 
InvalidContainerException; it needs to be added back after YARN-142.

 Make YarnRemoteException not be rooted at IOException
 -

 Key: YARN-629
 URL: https://issues.apache.org/jira/browse/YARN-629
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-629.1.patch, YARN-629.2.patch, YARN-629.3.patch


 After HADOOP-9343, it should be possible for YarnException to not be rooted 
 at IOException

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-629) Make YarnRemoteException not be rooted at IOException

2013-05-02 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648022#comment-13648022
 ] 

Xuan Gong commented on YARN-629:


Uploaded a new patch; all the MR changes are done in MAPREDUCE-5204.

 Make YarnRemoteException not be rooted at IOException
 -

 Key: YARN-629
 URL: https://issues.apache.org/jira/browse/YARN-629
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-629.1.patch, YARN-629.2.patch, YARN-629.3.patch


 After HADOOP-9343, it should be possible for YarnException to not be rooted 
 at IOException

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-326) Add multi-resource scheduling to the fair scheduler

2013-05-02 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-326:


Attachment: YARN-326-2.patch

 Add multi-resource scheduling to the fair scheduler
 ---

 Key: YARN-326
 URL: https://issues.apache.org/jira/browse/YARN-326
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: FairSchedulerDRFDesignDoc-1.pdf, 
 FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, 
 YARN-326-2.patch, YARN-326.patch, YARN-326.patch


 With YARN-2 in, the capacity scheduler has the ability to schedule based on 
 multiple resources, using dominant resource fairness.  The fair scheduler 
 should be able to do multiple resource scheduling as well, also using 
 dominant resource fairness.
 More details to come on how the corner cases with fair scheduler configs such 
 as min and max resources will be handled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-326) Add multi-resource scheduling to the fair scheduler

2013-05-02 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-326:


Attachment: (was: YARN-326-2.patch)

 Add multi-resource scheduling to the fair scheduler
 ---

 Key: YARN-326
 URL: https://issues.apache.org/jira/browse/YARN-326
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: FairSchedulerDRFDesignDoc-1.pdf, 
 FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, 
 YARN-326-2.patch, YARN-326.patch, YARN-326.patch


 With YARN-2 in, the capacity scheduler has the ability to schedule based on 
 multiple resources, using dominant resource fairness.  The fair scheduler 
 should be able to do multiple resource scheduling as well, also using 
 dominant resource fairness.
 More details to come on how the corner cases with fair scheduler configs such 
 as min and max resources will be handled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler

2013-05-02 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648025#comment-13648025
 ] 

Sandy Ryza commented on YARN-326:
-

Uploaded a patch that fixes a couple of bugs, includes more tests, and supports 
min and max share CPU configurations.  If it passes Jenkins, it's ready for 
review.

 Add multi-resource scheduling to the fair scheduler
 ---

 Key: YARN-326
 URL: https://issues.apache.org/jira/browse/YARN-326
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: FairSchedulerDRFDesignDoc-1.pdf, 
 FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, 
 YARN-326-2.patch, YARN-326.patch, YARN-326.patch


 With YARN-2 in, the capacity scheduler has the ability to schedule based on 
 multiple resources, using dominant resource fairness.  The fair scheduler 
 should be able to do multiple resource scheduling as well, also using 
 dominant resource fairness.
 More details to come on how the corner cases with fair scheduler configs such 
 as min and max resources will be handled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-582) Restore appToken for app attempt after RM restart

2013-05-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648037#comment-13648037
 ] 

Bikas Saha commented on YARN-582:
-

Is the null check necessary? The underlying protobuf handles null properly.
{code}
  ByteBuffer appAttemptTokens = attemptState.getAppAttemptTokens();
  if(appAttemptTokens != null){
attemptStateData.setAppAttemptTokens(appAttemptTokens);
  }
{code}

New public method necessary? RMAppAttemptImpl.recoverAppAttemptTokens()

Looks like all changes in RMAppImpl are unnecessary.

Bug in existing testDelegationTokenRestoredOnRMrestart(). The assert check 
should be made for rm1 and also for rm2. Right?
{code}
// start new RM
MockRM rm2 = new TestSecurityMockRM(conf, memStore);
rm2.start();

// verify tokens are properly populated back to DelegationTokenRenewer
Assert.assertEquals(tokenSet, rm1.getRMContext()
  .getDelegationTokenRenewer().getDelegationTokens());
{code}
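
i.e., the second assertion should presumably run against the restarted RM (sketch):

{code}
// Sketch: verify the tokens on the restarted RM as well, not just rm1.
Assert.assertEquals(tokenSet, rm2.getRMContext()
  .getDelegationTokenRenewer().getDelegationTokens());
{code}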

 Restore appToken for app attempt after RM restart
 -

 Key: YARN-582
 URL: https://issues.apache.org/jira/browse/YARN-582
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-582.1.patch, YARN-582.2.patch, YARN-582.3.patch


 These need to be saved and restored on a per app attempt basis. This is 
 required only when work preserving restart is implemented for secure 
 clusters. In non-preserving restart app attempts are killed and so this does 
 not matter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-582) Restore appToken for app attempt after RM restart

2013-05-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648040#comment-13648040
 ] 

Bikas Saha commented on YARN-582:
-

Also, we would be better off storing Credentials in RMAppAttemptImpl and only 
converting to a ByteBuffer inside the RMStateStore. Currently, because of this, 
AMLauncher ends up converting the ByteBuffer back to Credentials.
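
For context, the round trip being criticized is the standard Credentials/ByteBuffer conversion (illustrative helpers, not AMLauncher's actual code):

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.io.DataInputByteBuffer;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.security.Credentials;

class CredentialsBufferUtil {
  // Store side: serialize Credentials into a ByteBuffer.
  static ByteBuffer toBuffer(Credentials credentials) throws IOException {
    DataOutputBuffer dob = new DataOutputBuffer();
    credentials.writeTokenStorageToStream(dob);
    return ByteBuffer.wrap(dob.getData(), 0, dob.getLength());
  }

  // Launcher side: deserialize the ByteBuffer back into Credentials; this
  // is the extra conversion the comment above wants to avoid.
  static Credentials fromBuffer(ByteBuffer buffer) throws IOException {
    DataInputByteBuffer dib = new DataInputByteBuffer();
    dib.reset(buffer);
    Credentials credentials = new Credentials();
    credentials.readTokenStorageStream(dib);
    return credentials;
  }
}
{code}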

 Restore appToken for app attempt after RM restart
 -

 Key: YARN-582
 URL: https://issues.apache.org/jira/browse/YARN-582
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-582.1.patch, YARN-582.2.patch, YARN-582.3.patch


 These need to be saved and restored on a per app attempt basis. This is 
 required only when work preserving restart is implemented for secure 
 clusters. In non-preserving restart app attempts are killed and so this does 
 not matter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-642) Fix up RMWebServices#getNodes

2013-05-02 Thread Sandy Ryza (JIRA)
Sandy Ryza created YARN-642:
---

 Summary: Fix up RMWebServices#getNodes
 Key: YARN-642
 URL: https://issues.apache.org/jira/browse/YARN-642
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api, resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza


The code behind the /nodes RM REST API is unnecessarily muddled, logs the same 
misspelled INFO message repeatedly, and does not return unhealthy nodes, even 
when asked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-643) WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition

2013-05-02 Thread Jian He (JIRA)
Jian He created YARN-643:


 Summary: WHY appToken is removed both in BaseFinalTransition and 
AMUnregisteredTransition AND clientToken is removed in FinalTransition and not 
BaseFinalTransition
 Key: YARN-643
 URL: https://issues.apache.org/jira/browse/YARN-643
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM

2013-05-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648075#comment-13648075
 ] 

Bikas Saha commented on YARN-513:
-



This will not work, since different protocols use different ports on the RM; 
rmAddress cannot be passed into the method. Also, for the failover case, the 
rmAddress needs to be determined internally. Based on the protocol, we need to 
find the correct address etc. from conf and create the correct proxy object.
{code}
+  public static <T> T createRMProxy(final Configuration conf,
+  final Class<T> protocol, final InetSocketAddress rmAddress)
{code}

Can this code be written in the form if (waitForever) {} else {}? It may be 
simpler.
{code}
+RetryPolicy retryPolicy =
+(waitForEver) ? RetryPolicies.RETRY_FOREVER :
+RetryPolicies.retryUpToMaximumTimeWithFixedSleep(rmConnectWaitMS,
+rmConnectionRetryIntervalMS,
+TimeUnit.MILLISECONDS);
+Map<Class<? extends Exception>, RetryPolicy> exceptionToPolicyMap =
+new HashMap<Class<? extends Exception>, RetryPolicy>();
+exceptionToPolicyMap.put(java.net.ConnectException.class, retryPolicy);
+exceptionToPolicyMap.put(java.io.EOFException.class, retryPolicy);
+return (waitForEver) ? RetryPolicies.RETRY_FOREVER :
+RetryPolicies.retryByException(
+retryPolicy, exceptionToPolicyMap);
{code}


The same retryPolicy is being passed into the exception map and as the default 
value. What's the use of the exception map then?
{code}
+RetryPolicies.retryByException(
+retryPolicy, exceptionToPolicyMap);
{code}

Any way to keep diagnostic error messages?

I think if we don't rename NMStatusUpdater.getRMClient to createRMProxy then we 
don't need LocalRMProxy, and most of the test code changes will also disappear.
{code}
-  protected ResourceTracker getRMClient() {
-Configuration conf = getConfig();
-YarnRPC rpc = YarnRPC.create(conf);
-return (ResourceTracker) rpc.getProxy(ResourceTracker.class, rmAddress,
-conf);
+  @VisibleForTesting
+  protected ResourceTracker createRMProxy(Configuration conf)
+  throws IOException {
+return RMProxy.createRMProxy(conf, ResourceTracker.class, rmAddress);
   }
{code}
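
A hypothetical sketch of the per-protocol address lookup described above (the config keys are standard YarnConfiguration ones, but the dispatch itself is illustrative, not the committed RMProxy code):

{code}
import java.net.InetSocketAddress;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.api.ResourceTracker;

// Illustrative: resolve the RM address from conf based on the protocol,
// instead of having callers pass rmAddress in.
class RMAddressResolver {
  static InetSocketAddress getRMAddress(YarnConfiguration conf,
      Class<?> protocol) {
    if (protocol == ResourceTracker.class) {
      return conf.getSocketAddr(
          YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS,
          YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_ADDRESS,
          YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_PORT);
    }
    // ... similar branches for the client- and AM-facing protocols.
    throw new IllegalArgumentException("Unsupported protocol " + protocol);
  }
}
{code}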

 Create common proxy client for communicating with RM
 

 Key: YARN-513
 URL: https://issues.apache.org/jira/browse/YARN-513
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, 
 YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch


 When the RM is restarting, the NM, AM and Clients should wait for some time 
 for the RM to come back up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-642) Fix up RMWebServices#getNodes

2013-05-02 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-642:


Labels: incompatible  (was: )

 Fix up RMWebServices#getNodes
 -

 Key: YARN-642
 URL: https://issues.apache.org/jira/browse/YARN-642
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api, resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
  Labels: incompatible

 The code behind the /nodes RM REST API is unnecessarily muddled, logs the 
 same misspelled INFO message repeatedly, and does not return unhealthy nodes, 
 even when asked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer

2013-05-02 Thread Omkar Vinit Joshi (JIRA)
Omkar Vinit Joshi created YARN-644:
--

 Summary: Basic null check is not performed on passed in arguments 
before using them in ContainerManagerImpl.startContainer
 Key: YARN-644
 URL: https://issues.apache.org/jira/browse/YARN-644
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Priority: Minor


I see that validation/null checks are not performed on passed-in parameters. 

Ex. tokenId.getContainerID().getApplicationAttemptId() inside 
ContainerManagerImpl.authorizeRequest()

I guess we should add these checks.
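
An illustrative guard for the case described (hypothetical; the real authorizeRequest would likely surface this through the RPC layer's exception handling):

{code}
// Hypothetical null check before the token fields are dereferenced.
if (tokenId == null || tokenId.getContainerID() == null
    || tokenId.getContainerID().getApplicationAttemptId() == null) {
  throw new IllegalArgumentException(
      "startContainer called with an incomplete container token");
}
{code}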

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-617) In unsecure mode, AM can fake resource requirements

2013-05-02 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-617:
---

Attachment: YARN-617.20130502.patch

Updating existing test cases.

 In unsecure mode, AM can fake resource requirements 
 -

 Key: YARN-617
 URL: https://issues.apache.org/jira/browse/YARN-617
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Omkar Vinit Joshi
Priority: Minor
 Attachments: YARN-617.20130501.1.patch, YARN-617.20130501.patch, 
 YARN-617.20130502.patch


 Without security, it is impossible to completely avoid AMs faking resources. 
 We can at the least make it as difficult as possible by using the same 
 container tokens and the RM-NM shared key mechanism over unauthenticated 
 RM-NM channel.
 At the minimum, this will avoid accidental bugs in AMs in unsecure mode.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-645) Move RMDelegationTokenSecretManager from yarn-server-common to yarn-server-resourcemanager

2013-05-02 Thread Jian He (JIRA)
Jian He created YARN-645:


 Summary: Move RMDelegationTokenSecretManager from 
yarn-server-common to yarn-server-resourcemanager
 Key: YARN-645
 URL: https://issues.apache.org/jira/browse/YARN-645
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He


RMDelegationTokenSecretManager is specific to the resource manager and should 
not belong to server-common.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-645) Move RMDelegationTokenSecretManager from yarn-server-common to yarn-server-resourcemanager

2013-05-02 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-645:
-

Attachment: YARN-645.patch

This patch moves RMDelegationTokenSecretManager.

 Move RMDelegationTokenSecretManager from yarn-server-common to 
 yarn-server-resourcemanager
 --

 Key: YARN-645
 URL: https://issues.apache.org/jira/browse/YARN-645
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-645.patch


 RMDelegationTokenSecretManager is specific to the resource manager and should 
 not belong to server-common.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1

2013-05-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648197#comment-13648197
 ] 

Bikas Saha commented on YARN-614:
-

Unfortunately that is not how it happens. The RECOVER event is enqueued but not 
sent since the event dispatcher starts after recovery is completed. So by the 
time RECOVER is reached all the state has been populated. The catch is that 
currently recovery stores the attempt state before launching the attempt. At 
that time it does not have the container status because that is obtained when 
the attempt finishes. So I am of the opinion of leaving recovery for a 
different jira.

Now for the patch itself.
Minor style thing. All this code could go inside app.updateFailureCount() and 
let it do whatever it wants because the app has access to all the data and 
more. That method can evolve separately without bloating the transition method.
We need to check if the justFinished containers would always have an entry 
for the master container. Especially the case where the node is lost because it 
went down.
{code}
+// If the failure was not the AM's fault (e.g. node lost, or disk
+// failure), then increment ignored failures, so we don't count the
+// failure when determining whether to restart the app or not.
+RMAppAttempt appMasterAttempt = app.attempts.get(app.currentAttempt
+.getAppAttemptId());
+Container appMasterContainer = appMasterAttempt.getMasterContainer();
+ContainerStatus status = appMasterAttempt.getJustFinishedContainers()
+.get(appMasterContainer.getId());
+
+app.updateFailureCount(status.getExitStatus());
{code}

I am assuming aborted implies node lost in the patch. We need to make sure that 
aborted is not being used as a generic catch all. Else we may need to add a new 
specific exit status NODE_LOST for the specific case.
{code}
+  private boolean shouldCountFailureToAttemptLimit(int 
masterContainerExitStatus) {
+return masterContainerExitStatus != ContainerExitStatus.DISKS_FAILED
+&& masterContainerExitStatus != ContainerExitStatus.ABORTED;
+  }
{code}

I am not in favor of changing the List to a Map. The search is performed only 
once, at the end of the life of the attempt, and only if it has failed. So I am 
not sure perf is an issue here if we iterate once through this list. A List is 
cheaper memory-wise and also maintains the order of completion of containers as 
received by the RM. It's cheap for the ApplicationMasterService to pull when it 
populates the allocate response. This code probably won't compile because 
ApplicationMasterService expects a list and not a map.
{code}
-  private final List<ContainerStatus> justFinishedContainers =
-new ArrayList<ContainerStatus>();
+  private final Map<ContainerId, ContainerStatus> justFinishedContainers =
+new HashMap<ContainerId, ContainerStatus>();
{code}

Not quite sure why this method needs to be public. If it's private then it need 
not be part of the RMApp interface, and thus MockAsm and MockRMApp need not 
change.
{code}
   @Override
+  public int getIgnoredFailures() {
+this.readLock.lock();
+
+try {
+  return this.ignoredFailures;
+} finally {
+  this.readLock.unlock();
+}
+  }
+
{code}
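
A sketch of the consolidation suggested above (hypothetical method body; field and accessor names follow the patch excerpts in this thread, and the null check covers the lost-node case raised earlier):

{code}
// Hypothetical consolidated helper on RMAppImpl: decide internally whether
// the just-finished AM container's failure should be ignored.
private void updateFailureCount() {
  RMAppAttempt attempt = this.currentAttempt;
  ContainerStatus status = attempt.getJustFinishedContainers()
      .get(attempt.getMasterContainer().getId());
  // status may be null, e.g. when the node was lost before reporting;
  // treat a missing status as not the AM's fault.
  if (status == null
      || !shouldCountFailureToAttemptLimit(status.getExitStatus())) {
    this.ignoredFailures++;
  }
}
{code}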

 Retry attempts automatically for hardware failures or YARN issues and set 
 default app retries to 1
 --

 Key: YARN-614
 URL: https://issues.apache.org/jira/browse/YARN-614
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bikas Saha
 Attachments: YARN-614-0.patch, YARN-614-1.patch


 Attempts can fail due to a large number of user errors and they should not be 
 retried unnecessarily. The only reason YARN should retry an attempt is when 
 the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
 errors are the hardware errors that come to mind.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM

2013-05-02 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648204#comment-13648204
 ] 

Vinod Kumar Vavilapalli commented on YARN-513:
--

While Bikas continues to review the patch, I just wanted to say that the patch 
*and* the code overall is so much cleaner now, thanks!

Will look at the patch again once these comments are addressed.

 Create common proxy client for communicating with RM
 

 Key: YARN-513
 URL: https://issues.apache.org/jira/browse/YARN-513
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, 
 YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch


 When the RM is restarting, the NM, AM and Clients should wait for some time 
 for the RM to come back up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-645) Move RMDelegationTokenSecretManager from yarn-server-common to yarn-server-resourcemanager

2013-05-02 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-645:
-

Attachment: YARN-645.1.patch

 Move RMDelegationTokenSecretManager from yarn-server-common to 
 yarn-server-resourcemanager
 --

 Key: YARN-645
 URL: https://issues.apache.org/jira/browse/YARN-645
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-645.1.patch, YARN-645.patch


 RMDelegationTokenSecretManager is specific to the resource manager and should 
 not belong to server-common.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira