[jira] [Commented] (YARN-422) Add AM-NM client library
[ https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645307#comment-13645307 ] Vinod Kumar Vavilapalli commented on YARN-422: -- bq. 1. Semantically, it is a bit strange that the RM uses AMNMClient. Agreed. Maybe we should just call it NMClient? bq. 2. Technically, hadoop-yarn-client has a dependency on hadoop-yarn-server-resourcemanager in test scope. If we want to use AMNMClient in AMLauncher, hadoop-yarn-server-resourcemanager needs to add a dependency on hadoop-yarn-client, forming a circular dependency. The dependencies are per scope, so there is no circular dependency in either test scope or non-test scope. Is this patch ready for review, or is it just a definition file? It doesn't seem ready. In any case, I think we need either separate call-backs for a failure on startContainer() and a failure on stopContainer(), or maybe just one call-back with the original event type. Add AM-NM client library Key: YARN-422 URL: https://issues.apache.org/jira/browse/YARN-422 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: AMNMClient_Defination.txt, AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf Create a simple wrapper over the AM-NM container protocol to hide the details of the protocol implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
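The two call-back designs Vinod proposes could be sketched as below. All names here (NMClientCallbackV1/V2, NMEventType, the method names) are illustrative assumptions for discussion, not the actual YARN-422 API.

```java
// Hypothetical sketch of the two call-back styles discussed above.
// All names are illustrative assumptions, not the actual YARN-422 API.

// Option 1: separate call-backs for each failed operation.
interface NMClientCallbackV1 {
    void onStartContainerFailed(String containerId, Throwable cause);
    void onStopContainerFailed(String containerId, Throwable cause);
}

// Option 2: one call-back that carries the original event type.
enum NMEventType { START_CONTAINER, STOP_CONTAINER }

interface NMClientCallbackV2 {
    void onFailure(NMEventType type, String containerId, Throwable cause);
}

public class CallbackSketch {
    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        // A trivial option-2 handler that records which operation failed.
        NMClientCallbackV2 cb = (type, id, cause) ->
                log.append(type).append(':').append(id);
        cb.onFailure(NMEventType.START_CONTAINER, "container_01",
                new RuntimeException("NM unreachable"));
        System.out.println(log); // START_CONTAINER:container_01
    }
}
```

Option 2 keeps the callback surface small as more NM operations are added, at the cost of a switch on the event type inside the handler.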
[jira] [Commented] (YARN-599) Refactoring submitApplication in ClientRMService and RMAppManager
[ https://issues.apache.org/jira/browse/YARN-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645451#comment-13645451 ] Hudson commented on YARN-599: - Integrated in Hadoop-Yarn-trunk #199 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/199/]) YARN-599. Refactoring submitApplication in ClientRMService and RMAppManager to separate out various validation checks depending on whether they rely on RM configuration or not. Contributed by Zhijie Shen. (Revision 1477478) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1477478 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManagerEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManagerSubmitEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java Refactoring submitApplication in ClientRMService and RMAppManager - Key: YARN-599 URL: https://issues.apache.org/jira/browse/YARN-599 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.0.5-beta Attachments: YARN-599.1.patch, YARN-599.2.patch Currently, ClientRMService#submitApplication calls RMAppManager#handle, which consequently calls RMAppManager#submitApplication directly, though the code looks like it is scheduling an APP_SUBMIT event. In addition, the validation code that runs before creating an RMApp instance is not well organized. Ideally, the dynamic validation, which depends on the RM's configuration, should be put in RMAppManager#submitApplication. RMAppManager#submitApplication is called by ClientRMService#submitApplication and RMAppManager#recover. Since the configuration may be changed after the RM restarts, the validation needs to be done again even in recovery mode. Therefore, resource request validation, which is based on the min/max resource limits, should be moved from ClientRMService#submitApplication to RMAppManager#submitApplication. On the other hand, the static validation, which is independent of the RM's configuration, should be put in ClientRMService#submitApplication, because it only needs to be done once, during the first submission. Furthermore, the try-catch flow in RMAppManager#submitApplication has a flaw: the method is not synchronized. If two application submissions with the same application ID enter the function, and one progresses to the completion of RMApp instantiation while the other progresses to the completion of putting the RMApp instance into rmContext, the slower submission will cause an exception due to the duplicate application ID. However, with the current code flow, that exception will cause the RMApp instance already in rmContext (belonging to the faster submission) to be rejected.
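The duplicate-ID race described above comes from a non-atomic check-then-insert. One way to close it is to make the check and the insert a single atomic step; a minimal sketch follows, where the names (AppRegistry, register, the rmContext map) are hypothetical and stand in for the real RMAppManager/rmContext code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Minimal sketch of closing the duplicate-submission race described above.
// Names (AppRegistry, register, rmContext) are hypothetical; this is not
// the actual RMAppManager code, just the atomic check-and-insert idea.
public class AppRegistry {
    private final ConcurrentMap<String, Object> rmContext = new ConcurrentHashMap<>();

    // putIfAbsent makes the duplicate check and the insert one atomic step,
    // so the slower of two submissions with the same ID is rejected without
    // disturbing the RMApp the faster submission already registered.
    public boolean register(String appId, Object rmApp) {
        return rmContext.putIfAbsent(appId, rmApp) == null;
    }

    public static void main(String[] args) {
        AppRegistry reg = new AppRegistry();
        System.out.println(reg.register("application_0001", new Object())); // true
        System.out.println(reg.register("application_0001", new Object())); // false: duplicate rejected
    }
}
```

With this shape, only the losing submission observes the failure; the winner's entry in the map is never touched, which avoids the bug where the already-registered RMApp gets rejected.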
[jira] [Commented] (YARN-506) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
[ https://issues.apache.org/jira/browse/YARN-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645454#comment-13645454 ] Hudson commented on YARN-506: - Integrated in Hadoop-Yarn-trunk #199 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/199/]) YARN-506. Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute. Contributed by Ivan Mitic. (Revision 1477408) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1477408 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute Key: YARN-506 URL: https://issues.apache.org/jira/browse/YARN-506 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Fix For: 3.0.0 Attachments: YARN-506.commonfileutils.2.patch, YARN-506.commonfileutils.patch Move to the common utils described in HADOOP-9413 that work well cross-platform.
[jira] [Commented] (YARN-599) Refactoring submitApplication in ClientRMService and RMAppManager
[ https://issues.apache.org/jira/browse/YARN-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645550#comment-13645550 ] Hudson commented on YARN-599: - Integrated in Hadoop-Hdfs-trunk #1388 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1388/]) YARN-599. Refactoring submitApplication in ClientRMService and RMAppManager to separate out various validation checks depending on whether they rely on RM configuration or not. Contributed by Zhijie Shen. (Revision 1477478) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1477478
[jira] [Commented] (YARN-506) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
[ https://issues.apache.org/jira/browse/YARN-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645553#comment-13645553 ] Hudson commented on YARN-506: - Integrated in Hadoop-Hdfs-trunk #1388 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1388/]) YARN-506. Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute. Contributed by Ivan Mitic. (Revision 1477408) Result = FAILURE suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1477408
[jira] [Commented] (YARN-506) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
[ https://issues.apache.org/jira/browse/YARN-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645592#comment-13645592 ] Hudson commented on YARN-506: - Integrated in Hadoop-Mapreduce-trunk #1415 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1415/]) YARN-506. Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute. Contributed by Ivan Mitic. (Revision 1477408) Result = FAILURE suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1477408
[jira] [Updated] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated YARN-614: - Attachment: YARN-614-0.patch Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1 -- Key: YARN-614 URL: https://issues.apache.org/jira/browse/YARN-614 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Attachments: YARN-614-0.patch Attempts can fail due to a large number of user errors and they should not be retried unnecessarily. The only reason YARN should retry an attempt is when the hardware fails or YARN has an error. NM failing, lost NM and NM disk errors are the hardware errors that come to mind.
[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645741#comment-13645741 ] Chris Riccomini commented on YARN-614: -- I've taken an initial stab at this and am looking for feedback. I added an ignoredFailures variable to RMAppImpl, which keeps a count of AM failures that should be ignored when figuring out whether to retry the AM with a new app attempt. Right now, the failures that are ignored are DISK_FAILURE and ABORTED. Since the ignoredFailures variable is completely derivable from the app attempt state, I simply start ignoredFailures at 0 and increment it whenever a failure happens that should be ignored. When recover() is called on an app, we recover all attempts (and all of their justFinishedContainers), and then update the ignoredFailures variable accordingly. Potential areas for improvement:
1. Switch RMAppAttemptImpl to keep a map of ContainerId to ContainerStatus, so we can do an O(1) lookup instead of traversing the justFinishedContainers list every time we want the master container's status.
2. Add tests.
3. Add a shouldIgnoreFailure method in RMAppImpl, and move the DISK_FAILURE and ABORTED checks there.
Any other thoughts?
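The ignored-failure counting described in the comment above can be sketched as follows. DISK_FAILURE and ABORTED come straight from the comment; the class and method names and the retry arithmetic are illustrative assumptions, not the real RMAppImpl code.

```java
import java.util.List;

// Sketch of the ignored-failure counting described above. DISK_FAILURE and
// ABORTED mirror the statuses the comment says are ignored; everything else
// (names, retry math) is an illustrative assumption, not the real RMAppImpl.
public class RetrySketch {
    enum ExitStatus { DISK_FAILURE, ABORTED, USER_ERROR }

    // The shouldIgnoreFailure helper proposed as improvement 3 above.
    static boolean shouldIgnoreFailure(ExitStatus status) {
        return status == ExitStatus.DISK_FAILURE || status == ExitStatus.ABORTED;
    }

    // Retry only while the failures that actually count stay under the limit.
    static boolean shouldRetry(List<ExitStatus> failures, int maxAttempts) {
        long counted = failures.stream()
                .filter(f -> !shouldIgnoreFailure(f))
                .count();
        return counted < maxAttempts;
    }

    public static void main(String[] args) {
        // Hardware-style failures are ignored, so a retry is still allowed...
        System.out.println(shouldRetry(
                List.of(ExitStatus.DISK_FAILURE, ExitStatus.ABORTED), 1)); // true
        // ...but a single user error exhausts a retry budget of 1.
        System.out.println(shouldRetry(
                List.of(ExitStatus.DISK_FAILURE, ExitStatus.USER_ERROR), 1)); // false
    }
}
```

Deriving the count from the attempt history (rather than persisting a separate counter) matches the recover() approach in the comment: after restart, replaying the attempts reproduces the same count.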
[jira] [Created] (YARN-625) Move unwrapAndThrowException from YarnRemoteExceptionPBImpl to RPCUtil
Siddharth Seth created YARN-625: --- Summary: Move unwrapAndThrowException from YarnRemoteExceptionPBImpl to RPCUtil Key: YARN-625 URL: https://issues.apache.org/jira/browse/YARN-625 Project: Hadoop YARN Issue Type: Task Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Assignee: Siddharth Seth
[jira] [Resolved] (YARN-579) Make ApplicationToken part of Container's token list to help RM-restart
[ https://issues.apache.org/jira/browse/YARN-579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-579. -- Resolution: Fixed I can consistently reproduce this on branch-2, but trunk is fine. Not sure whether this was indeed caused by this ticket itself. Still investigating. Closing this and continuing the investigation on YARN-626. Make ApplicationToken part of Container's token list to help RM-restart --- Key: YARN-579 URL: https://issues.apache.org/jira/browse/YARN-579 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.4-alpha Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Fix For: 2.0.5-beta Attachments: YARN-579-20130422.1.txt, YARN-579-20130422.1_YARNChanges.txt The Container is already persisted to help RM restart. Instead of explicitly setting the ApplicationToken in the AM's env, if we change it to be in the Container, we can avoid the env and also help restart.
[jira] [Updated] (YARN-626) Apps fail in secure cluster setup on branch-2
[ https://issues.apache.org/jira/browse/YARN-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-626: - Description: Found at YARN-579 by [~daryn]. Need to investigate if it was caused by YARN-579 itself or something else. Secure setup on trunk passes though. was: Found at YARN-579 by [~aw]. Need to investigate if it was caused by YARN-579 itself or something else. Secure setup on trunk passes though. Daryn posted logs at YARN-579, see [here|https://issues.apache.org/jira/browse/YARN-579?focusedCommentId=13644790page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13644790]. Apps fail in secure cluster setup on branch-2 - Key: YARN-626 URL: https://issues.apache.org/jira/browse/YARN-626 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Found at YARN-579 by [~daryn]. Need to investigate if it was caused by YARN-579 itself or something else. Secure setup on trunk passes though.
[jira] [Created] (YARN-627) Simplify YarnRemoteException
Siddharth Seth created YARN-627: --- Summary: Simplify YarnRemoteException Key: YARN-627 URL: https://issues.apache.org/jira/browse/YARN-627 Project: Hadoop YARN Issue Type: Task Reporter: Siddharth Seth Assignee: Siddharth Seth This does not need to be a PB-backed record.
[jira] [Commented] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645817#comment-13645817 ] Daryn Sharp commented on YARN-617: -- If a token is available, the RPC client will attempt SASL DIGEST-MD5 regardless of the client's security conf. An RPC server also enables SASL DIGEST-MD5 if a secret manager is active. Isn't this sufficient to allow container tokens to always be used for authentication? In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Minor Without security, it is impossible to completely prevent AMs from faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared-key mechanism over the unauthenticated RM-NM channel. At a minimum, this will avoid accidental bugs in AMs in unsecure mode.
[jira] [Created] (YARN-628) Fix YarnException unwrapping
Siddharth Seth created YARN-628: --- Summary: Fix YarnException unwrapping Key: YARN-628 URL: https://issues.apache.org/jira/browse/YARN-628 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Unwrapping of YarnRemoteExceptions (currently in YarnRemoteExceptionPBImpl; in RPCUtil post YARN-625) is broken, and often ends up throwing UndeclaredThrowableException. This needs to be fixed.
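The UndeclaredThrowableException mentioned above is the standard symptom when a dynamic proxy (the mechanism RPC stubs are built on) rethrows a checked exception that the interface method never declared. The following is a self-contained illustration of that failure mode and the unwrapping it forces; the Protocol interface and messages are made up for the example, not YARN code.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;

// Illustration of why UndeclaredThrowableException shows up: a dynamic proxy
// rethrows a checked exception that the interface method does not declare.
// The Protocol interface and messages here are illustrative only.
public class UnwrapSketch {
    interface Protocol { String ping(); } // declares no checked exceptions

    public static void main(String[] args) {
        InvocationHandler handler = (proxy, method, a) -> {
            throw new Exception("remote failure"); // checked but undeclared
        };
        Protocol p = (Protocol) Proxy.newProxyInstance(
                Protocol.class.getClassLoader(),
                new Class<?>[]{Protocol.class}, handler);
        try {
            p.ping();
        } catch (UndeclaredThrowableException e) {
            // Unwrapping code must dig the real cause out of the wrapper.
            System.out.println(e.getCause().getMessage()); // remote failure
        }
    }
}
```

Declaring the checked exception on the interface method (or translating it to a declared type before rethrowing) avoids the wrapper entirely, which is the direction the YARN-142 subtasks take.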
[jira] [Commented] (YARN-625) Move unwrapAndThrowException from YarnRemoteExceptionPBImpl to RPCUtil
[ https://issues.apache.org/jira/browse/YARN-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645827#comment-13645827 ] Hadoop QA commented on YARN-625: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581203/YARN-625.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/845//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/845//console This message is automatically generated. 
Move unwrapAndThrowException from YarnRemoteExceptionPBImpl to RPCUtil -- Key: YARN-625 URL: https://issues.apache.org/jira/browse/YARN-625 Project: Hadoop YARN Issue Type: Task Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: YARN-625.txt
[jira] [Commented] (YARN-142) Change YARN APIs to throw IOException
[ https://issues.apache.org/jira/browse/YARN-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645833#comment-13645833 ] Xuan Gong commented on YARN-142: Split it into several sub-tasks:
1. Make YarnRemoteException not be rooted at IOException
2. Change AMRMProtocol api to throw IOException and YarnRemoteException
3. Change ClientRMProtocol api to throw IOException and YarnRemoteException
4. Change ContainerManager api to throw IOException and YarnRemoteException
5. Change RMAdminProtocol api to throw IOException and YarnRemoteException
Change YARN APIs to throw IOException - Key: YARN-142 URL: https://issues.apache.org/jira/browse/YARN-142 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-142.1.patch, YARN-142.2.patch, YARN-142.3.patch, YARN-142.4.patch Ref: MAPREDUCE-4067 All YARN APIs currently throw YarnRemoteException. 1) This cannot be extended in its current form. 2) The RPC layer can throw IOExceptions. These end up showing up as UndeclaredThrowableExceptions.
[jira] [Created] (YARN-629) Make YarnRemoteException not be rooted at IOException
Xuan Gong created YARN-629: -- Summary: Make YarnRemoteException not be rooted at IOException Key: YARN-629 URL: https://issues.apache.org/jira/browse/YARN-629 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException
[jira] [Created] (YARN-632) Change ContainerManager api to throw IOException and YarnRemoteException
Xuan Gong created YARN-632: -- Summary: Change ContainerManager api to throw IOException and YarnRemoteException Key: YARN-632 URL: https://issues.apache.org/jira/browse/YARN-632 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong
[jira] [Created] (YARN-633) Change RMAdminProtocol api to throw IOException and YarnRemoteException
Xuan Gong created YARN-633: -- Summary: Change RMAdminProtocol api to throw IOException and YarnRemoteException Key: YARN-633 URL: https://issues.apache.org/jira/browse/YARN-633 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong
[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container
[ https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645852#comment-13645852 ] Daryn Sharp commented on YARN-613: -- I assumed that was the implementation. Does a global AM secret degrade the security of YARN by allowing one rogue node to begin fabricating tokens? Create NM proxy per NM instead of per container --- Key: YARN-613 URL: https://issues.apache.org/jira/browse/YARN-613 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Vinod Kumar Vavilapalli Currently a new NM proxy has to be created per container, since the secure authentication uses a container token from the container.
[jira] [Created] (YARN-634) Introduce a SerializedException
Siddharth Seth created YARN-634: --- Summary: Introduce a SerializedException Key: YARN-634 URL: https://issues.apache.org/jira/browse/YARN-634 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth LocalizationProtocol sends an exception over the wire. This currently uses YarnRemoteException. Post YARN-627, this needs to be changed and a new serialized exception is required.
[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container
[ https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645890#comment-13645890 ] Vinod Kumar Vavilapalli commented on YARN-613: -- bq. I assumed that was the implementation. Does a global AM secret degrade the security of yarn by allowing one rogue node to begin fabricating tokens? NMs are trusted. They are kerberos authenticated, and we also have the service level authorization to enforce only some principals. Is that not enough? The better argument perhaps is crunching through a lot of AMTokens to figure out the key, but we rollover keys every so often to avoid that case. Create NM proxy per NM instead of per container --- Key: YARN-613 URL: https://issues.apache.org/jira/browse/YARN-613 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Vinod Kumar Vavilapalli Currently a new NM proxy has to be created per container since the secure authentication is using a containertoken from the container. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-635) Rename YarnRemoteException to YarnException
Xuan Gong created YARN-635: -- Summary: Rename YarnRemoteException to YarnException Key: YARN-635 URL: https://issues.apache.org/jira/browse/YARN-635 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong
[jira] [Commented] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645896#comment-13645896 ] Vinod Kumar Vavilapalli commented on YARN-617: -- bq. If a token is available, the RPC client will attempt SASL DIGEST-MD5 regardless of the client's security conf. Isn't this sufficient to allow container tokens to always be used for authentication? Agreed, I should have been clearer. At YARN-613, we are trying to change the auth to use AMTokens, and authorization will continue to be via ContainerTokens. In that sense, yes, we don't need this separation, but YARN-613 will do that anyway, so we may as well do it here. bq. An RPC server also enables SASL DIGEST-MD5 if a secret manager is active. Off topic, but this is what I guessed is the reason underlying YARN-626; do you know when this got merged into branch-2? In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Minor Without security, it is impossible to completely avoid AMs faking resources. We can at the least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over an unauthenticated RM-NM channel. At the minimum, this will avoid accidental bugs in AMs in unsecure mode.
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645991#comment-13645991 ] Robert Joseph Evans commented on YARN-528: -- The approach seems OK to me, but I would rather have the impl be an even thinner wrapper.
{code}
private ApplicationIdProto proto = null;
private ApplicationIdProto.Builder builder = null;

ApplicationIdPBImpl(ApplicationIdProto proto) {
  this.proto = proto;
}

public ApplicationIdPBImpl() {
  this.builder = ApplicationIdProto.newBuilder();
}

public ApplicationIdProto getProto() {
  assert (proto != null);
  return proto;
}

@Override
public int getId() {
  assert (proto != null);
  return proto.getId();
}

@Override
protected void setId(int id) {
  assert (builder != null);
  builder.setId(id);
}

@Override
public long getClusterTimestamp() {
  assert (proto != null);
  return proto.getClusterTimestamp();
}

@Override
protected void setClusterTimestamp(long clusterTimestamp) {
  assert (builder != null);
  builder.setClusterTimestamp(clusterTimestamp);
}

@Override
protected void build() {
  assert (builder != null);
  proto = builder.build();
  builder = null;
}
{code}
Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: y528_AppIdPart_01_Refactor.txt, y528_AppIdPart_02_AppIdChanges.txt, y528_AppIdPart_03_fixUsage.txt, y528_ApplicationIdComplete_WIP.txt, YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error-prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that.
It makes the various ID objects immutable. If this patch is well received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and YARN APIs will not be able to be changed.
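The build-then-freeze pattern in the snippet above can be exercised with a self-contained sketch. Since ApplicationIdProto is protobuf-generated, FakeProto below is a hypothetical stand-in so the example runs without protobuf; the PBImpl mirrors the structure of the code in the comment, not the actual Hadoop class.

```java
// Self-contained sketch of the "thin wrapper + freeze on build()" idea.
public class ImmutableIdDemo {

  // Minimal stand-in for a protobuf message and its builder.
  static final class FakeProto {
    final int id;
    final long ts;
    FakeProto(int id, long ts) { this.id = id; this.ts = ts; }

    static final class Builder {
      int id;
      long ts;
      Builder setId(int id) { this.id = id; return this; }
      Builder setClusterTimestamp(long ts) { this.ts = ts; return this; }
      FakeProto build() { return new FakeProto(id, ts); }
    }
  }

  // Mirrors the proposed ApplicationIdPBImpl: writes go through the builder,
  // reads require a built proto; build() freezes the object.
  static final class ApplicationIdPBImpl {
    private FakeProto proto = null;
    private FakeProto.Builder builder = new FakeProto.Builder();

    int getId() { assert proto != null; return proto.id; }
    long getClusterTimestamp() { assert proto != null; return proto.ts; }
    void setId(int id) { assert builder != null; builder.setId(id); }
    void setClusterTimestamp(long ts) { assert builder != null; builder.setClusterTimestamp(ts); }
    void build() { assert builder != null; proto = builder.build(); builder = null; }
  }

  public static void main(String[] args) {
    ApplicationIdPBImpl appId = new ApplicationIdPBImpl();
    appId.setId(42);
    appId.setClusterTimestamp(1234L);
    appId.build();                      // freezes: further set* trips the assert (with -ea)
    System.out.println(appId.getId()); // prints "42"
  }
}
```

Reads before build() and writes after build() both trip the asserts (when assertions are enabled), which is the immutability guarantee the JIRA is after.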
[jira] [Updated] (YARN-582) Restore appToken and clientToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-582: - Attachment: YARN-582.2.patch This patch restores the appToken; the clientToken will be addressed in another JIRA. Restore appToken and clientToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch, YARN-582.2.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter.
[jira] [Updated] (YARN-582) Restore appToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-582: - Summary: Restore appToken for app attempt after RM restart (was: Restore appToken and clientToken for app attempt after RM restart) Restore appToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch, YARN-582.2.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-636) Restore clientToken for app attempt after RM restart
Jian He created YARN-636: Summary: Restore clientToken for app attempt after RM restart Key: YARN-636 URL: https://issues.apache.org/jira/browse/YARN-636 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646048#comment-13646048 ] Bikas Saha commented on YARN-513: - I like RMProxy. Do we see this being useful to user code? If yes, then we should put it in the yarn client package. Why is this called defaultPolicy?
{code}
+ private final RetryPolicy defaultPolicy;
{code}
Can these be inferred from the Protocol type? Else we cannot use this for other protocols.
{code}
+ this.rmAddress = conf.getSocketAddr(
+     YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS,
+     YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_ADDRESS,
+     YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_PORT);
{code}
With this the code will start looking like NameNodeProxies with its create* methods. I looked at the NNProxies class. It looks like we need an almost exact version of that for the RM. The main difference being that we don't have HA yet and RM protocols run on different ports. The neat thing about NNProxies is that they create a transparent proxy using java.lang.reflect.Proxy that ends up calling realImpl.invoke(method) internally when the client calls proxy.method(). Thus it looks better than our approach, where our client explicitly calls proxy.invoke(method). Perhaps we should try to change our impl to something similar to NNProxies. What do folks think? Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.
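The transparent-proxy idea discussed above can be sketched with plain JDK reflection. This is a minimal, self-contained illustration, not the actual RMProxy or NameNodeProxies code; EchoProtocol, createRetryProxy, and the retry count are hypothetical stand-ins.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical stand-in for a YARN RM protocol interface (not a real Hadoop class).
interface EchoProtocol {
  String echo(String msg) throws Exception;
}

public class RetryProxyDemo {

  // Build a transparent proxy in the NameNodeProxies style: the caller just
  // invokes proxy.echo(...), and the InvocationHandler retries the real
  // implementation behind the scenes before rethrowing the last failure.
  @SuppressWarnings("unchecked")
  static <T> T createRetryProxy(Class<T> protocol, T realImpl, int maxRetries) {
    InvocationHandler handler = (proxyObj, method, args) -> {
      Throwable last = null;
      for (int attempt = 0; attempt <= maxRetries; attempt++) {
        try {
          return method.invoke(realImpl, args);   // delegate to the real impl
        } catch (InvocationTargetException e) {
          last = e.getCause();                    // unwrap and retry
        }
      }
      throw last;
    };
    return (T) Proxy.newProxyInstance(
        protocol.getClassLoader(), new Class<?>[] { protocol }, handler);
  }

  public static void main(String[] args) throws Exception {
    // A flaky implementation that fails twice before succeeding, standing in
    // for an RM that is restarting.
    final int[] calls = {0};
    EchoProtocol flaky = msg -> {
      if (++calls[0] < 3) throw new RuntimeException("RM not up yet");
      return "echo:" + msg;
    };
    EchoProtocol proxy = createRetryProxy(EchoProtocol.class, flaky, 5);
    System.out.println(proxy.echo("hi"));  // prints "echo:hi" after two hidden retries
  }
}
```

The point of the comment above is exactly this inversion: the retry policy lives inside the InvocationHandler, so callers use the plain protocol interface instead of calling proxy.invoke(method) explicitly.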
[jira] [Updated] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-618: - Attachment: YARN-618.2.patch New patch addresses the last comments. Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.1.patch, YARN-618.2.patch, YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesn't sound right as many tests set it to 0. Probably a -ve number is what we want.
[jira] [Commented] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646094#comment-13646094 ] Hadoop QA commented on YARN-618: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581248/YARN-618.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/846//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/846//console This message is automatically generated. Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.1.patch, YARN-618.2.patch, YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesnt sound right as many tests set it to 0. Probably a -ve number is what we want. -- This message is automatically generated by JIRA. 
[jira] [Commented] (YARN-582) Restore appToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646095#comment-13646095 ] Hadoop QA commented on YARN-582: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581243/YARN-582.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/847//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/847//console This message is automatically generated. Restore appToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch, YARN-582.2.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. 
In non-preserving restart app attempts are killed and so this does not matter.
[jira] [Moved] (YARN-637) FS: maxAssign is not honored
[ https://issues.apache.org/jira/browse/YARN-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza moved MAPREDUCE-5200 to YARN-637: Component/s: (was: scheduler) scheduler Target Version/s: (was: 2.0.5-beta) Affects Version/s: (was: 2.0.4-alpha) 2.0.4-alpha Key: YARN-637 (was: MAPREDUCE-5200) Project: Hadoop YARN (was: Hadoop Map/Reduce) FS: maxAssign is not honored Key: YARN-637 URL: https://issues.apache.org/jira/browse/YARN-637 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.4-alpha Reporter: Karthik Kambatla Assignee: Karthik Kambatla maxAssign limits the number of containers that can be assigned in a single heartbeat. Currently, FS doesn't keep track of number of assigned containers to check this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-629) Make YarnRemoteException not be rooted at IOException
[ https://issues.apache.org/jira/browse/YARN-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646171#comment-13646171 ] Xuan Gong commented on YARN-629:
1. Use extends Exception instead of extends IOException in YarnRemoteException.java.
2. Change YarnRemoteExceptionPBImpl::unwrapAndThrowException() to explicitly return a YarnRemoteException object.
3. Throw both YarnRemoteException and IOException instead of only IOException, because YarnRemoteException is no longer a subclass of IOException.
Make YarnRemoteException not be rooted at IOException - Key: YARN-629 URL: https://issues.apache.org/jira/browse/YARN-629 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException
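A minimal sketch of what changes 1 and 3 imply for callers, using a stand-in class rather than the real Hadoop YarnRemoteException: once the class extends Exception instead of IOException, a single throws IOException clause no longer covers it, so both must be declared.

```java
// Stand-in mirroring the proposed change: extends Exception, not IOException.
// Illustrative only; the method name and bodies are hypothetical.
class YarnRemoteException extends Exception {
  public YarnRemoteException(String message) { super(message); }
}

public class ThrowsDemo {
  // Per item 3 above: since YarnRemoteException is no longer an IOException,
  // a method that can raise either must now declare both.
  static void startContainer(boolean ioFailure)
      throws YarnRemoteException, java.io.IOException {
    if (ioFailure) throw new java.io.IOException("wire error");
    throw new YarnRemoteException("yarn-side error");
  }

  public static void main(String[] args) {
    // The two exception hierarchies are now disjoint.
    boolean isIOE = new YarnRemoteException("x") instanceof java.io.IOException;
    System.out.println(isIOE);  // prints "false"
  }
}
```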
[jira] [Updated] (YARN-629) Make YarnRemoteException not be rooted at IOException
[ https://issues.apache.org/jira/browse/YARN-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-629: --- Attachment: YARN-629.1.patch Make YarnRemoteException not be rooted at IOException - Key: YARN-629 URL: https://issues.apache.org/jira/browse/YARN-629 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-629.1.patch After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646188#comment-13646188 ] Bikas Saha commented on YARN-614: - Agreed about a method that encapsulates whether an RMAppAttempt failed with an error we want to ignore. Perhaps rename it to countFailureToAttemptLimit() or something like that. We can then add hysteresis logic later on for perpetual apps for which we want to count failures only in the last hour, say. I am afraid allowing appattempt.size to exceed maxAttempts might break code somewhere else that did not expect this to happen. Need to check thoroughly for this. The recovery code won't work since, right now, the RM does not recover attempts where appattempts.size() >= maxattempts, e.g. the above case. Look at RMAppManager.recover(). One solution could be to move the check from finishAttempt() to createAttempt(). finishAttempt() always enqueues a new attempt; the new attempt creation checks if one can still be created based on failed count etc. Another solution could be to make the RMApp go from NEW to FAILED in the recover transition based on failed counts etc. Having said that, recovery won't work because the master container is saved before launching the attempt and as such does not have the exit status populated in it. We could leave recovery for a different JIRA and focus on the regular code path in this one perhaps. Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1 -- Key: YARN-614 URL: https://issues.apache.org/jira/browse/YARN-614 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Attachments: YARN-614-0.patch Attempts can fail due to a large number of user errors and they should not be retried unnecessarily. The only reason YARN should retry an attempt is when the hardware fails or YARN has an error. NM failing, lost NM and NM disk errors are the hardware errors that come to mind.
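The idea discussed above, counting only some failure causes toward the attempt limit and checking at attempt-creation time rather than in finishAttempt(), can be sketched as below. The enum values and method names are illustrative stand-ins, not the actual RMAppAttempt API.

```java
import java.util.List;

// Sketch: hardware/YARN-side failures are "free" retries that do not count
// toward maxAttempts, so the attempt history may legitimately grow larger
// than maxAttempts (the concern raised in the comment above).
public class AttemptLimitDemo {

  enum FailureCause { USER_ERROR, NM_LOST, DISK_ERROR, PREEMPTED }

  // Analogue of the proposed countFailureToAttemptLimit() predicate.
  static boolean countsTowardAttemptLimit(FailureCause cause) {
    switch (cause) {
      case NM_LOST:
      case DISK_ERROR:
      case PREEMPTED:
        return false;  // hardware/YARN issue: retry without penalty
      default:
        return true;   // user error: counts against the app
    }
  }

  // Check done at attempt-creation time, not at finish time.
  static boolean mayCreateNewAttempt(List<FailureCause> failures, int maxAttempts) {
    int counted = 0;
    for (FailureCause c : failures) {
      if (countsTowardAttemptLimit(c)) counted++;
    }
    return counted < maxAttempts;
  }

  public static void main(String[] args) {
    // Two failed attempts, but only one counts: a new attempt is still allowed
    // even though history size equals maxAttempts.
    List<FailureCause> history =
        List.of(FailureCause.NM_LOST, FailureCause.USER_ERROR);
    System.out.println(mayCreateNewAttempt(history, 2));  // prints "true"
  }
}
```

The main() case shows exactly why appattempt.size can exceed maxAttempts under this scheme, which is the code-breakage risk the comment warns about.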
[jira] [Commented] (YARN-582) Restore appToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646282#comment-13646282 ] Bikas Saha commented on YARN-582: - Why is this inside the try instead of where the existing fields are set? Isn't it safe to pass null to setAppAttemptTokens?
{code}
 try {
+  if (appAttemptTokens != null) {
+    attemptStateData.setAppAttemptTokens(appAttemptTokens);
+  }
{code}
Why create a new token secret manager for every generateTokens() call?
{code}
+ private ByteBuffer generateTokens(ApplicationAttemptId attemptId,
+     Configuration conf) {
+   ApplicationTokenSecretManager appTokenMgr =
+       new ApplicationTokenSecretManager(conf);
+   ApplicationTokenIdentifier appTokenId =
+       new ApplicationTokenIdentifier(attemptId);
{code}
This check should be performed after restart also.
{code}
+// assert application Token is saved
+Assert.assertEquals(attempt1Token, attemptState.getAppAttemptTokens());
{code}
This check should be performed before restart also since we changed this code path.
{code}
+// assert ApplicationTokenSecretManager has the password populated
+Assert.assertTrue(rm2.getApplicationTokenSecretManager().hasPassword(
+    newAttempt.getAppAttemptId()));
{code}
This is wrong because the new app should be creating its own tokens.
{code}
 }
+app.createNewAttempt(true);
+break;
+ case RECOVER:
+RMAppAttempt attempt = app.createNewAttempt(true);
+
+// reuse the appToken from previous attempt
+if (UserGroupInformation.isSecurityEnabled()) {
+  ApplicationAttemptId previousAttempt =
+      Records.newRecord(ApplicationAttemptId.class);
+  previousAttempt.setApplicationId(app.getApplicationId());
+  previousAttempt.setAttemptId(app.getAppAttempts().size() - 1);
+  ApplicationState appState = app.getRMState().getApplicationState()
+      .get(app.getApplicationId());
+  ApplicationAttemptState attemptState =
+      appState.getAttempt(previousAttempt);
+  assert attemptState != null;
+  ((RMAppAttemptImpl) attempt).recoverAppAttemptTokens(attemptState
+      .getAppAttemptTokens());
+}
+break;
+ default:
{code}
Hence this is wrong:
{code}
+// assert the new Attempt id is the same as the desired new attempt id
+Assert.assertEquals(desiredNewAttemptId, newAttempt.getAppAttemptId());
+
+// assert new attempt reuses previous attempt tokens
+Assert.assertEquals(attempt1Token, newAttempt.getAppAttemptTokens());
{code}
Need to check for securityEnabled when recovering tokens and populating the secret manager? Can we move token creation from the constructor of RMAppAttemptImpl to AttemptStartedTransition? That way we will not end up creating new tokens in the constructor and overriding them in recover(). Also, in recover(), let's just populate the tokens but not add them to the secret managers. Later, in work-preserving restart, we need to create a NEW-RUNNING transition in which the restored tokens will be added to the secret manager. Things to check: 1) Why is this code in FinalTransition and not BaseFinalTransition?
{code}
// Unregister from the ClientTokenSecretManager
if (UserGroupInformation.isSecurityEnabled()) {
  appAttempt.rmContext.getClientToAMTokenSecretManager()
      .unRegisterApplication(appAttempt.getAppAttemptId());
}
{code}
2) Why is this duplicated in both BaseFinalTransition and AMUnregisteredTransition?
{code}
// Remove the AppAttempt from the ApplicationTokenSecretManager
appAttempt.rmContext.getApplicationTokenSecretManager()
    .applicationMasterFinished(appAttemptId);
{code}
Restore appToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch, YARN-582.2.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter.
[jira] [Created] (YARN-638) Add DelegationTokens back to DelegationTokenSecretManager after RM Restart
Jian He created YARN-638: Summary: Add DelegationTokens back to DelegationTokenSecretManager after RM Restart Key: YARN-638 URL: https://issues.apache.org/jira/browse/YARN-638 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He This is missed in YARN-581. After RM restart, delegation tokens need to be added both in DelegationTokenRenewer (addressed in YARN-581), and delegationTokenSecretManager
[jira] [Updated] (YARN-638) Add DelegationTokens back to DelegationTokenSecretManager after RM Restart
[ https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-638: - Attachment: YARN-638.1.patch This patch adds delegationTokens to DelegationTokenSecretManager after RM restarts, and adds test cases for that Add DelegationTokens back to DelegationTokenSecretManager after RM Restart -- Key: YARN-638 URL: https://issues.apache.org/jira/browse/YARN-638 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-638.1.patch This is missed in YARN-581. After RM restart, delegation tokens need to be added both in DelegationTokenRenewer (addressed in YARN-581), and delegationTokenSecretManager
[jira] [Updated] (YARN-629) Make YarnRemoteException not be rooted at IOException
[ https://issues.apache.org/jira/browse/YARN-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-629: --- Attachment: YARN-629.2.patch Fix testcase failures: org.apache.hadoop.mapred.TestClientServiceDelegate org.apache.hadoop.mapreduce.TestMRJobClient. Another four test case failures can be tracked at MAPREDUCE-5193 Make YarnRemoteException not be rooted at IOException - Key: YARN-629 URL: https://issues.apache.org/jira/browse/YARN-629 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-629.1.patch, YARN-629.2.patch After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-618: - Attachment: YARN-618.3.patch Addresses the last comments. Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.1.patch, YARN-618.2.patch, YARN-618.3.patch, YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesn't sound right as many tests set it to 0. Probably a -ve number is what we want.
[jira] [Commented] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646337#comment-13646337 ] Hadoop QA commented on YARN-618: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581325/YARN-618.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/851//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/851//console This message is automatically generated. Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.1.patch, YARN-618.2.patch, YARN-618.3.patch, YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesnt sound right as many tests set it to 0. Probably a -ve number is what we want. -- This message is automatically generated by JIRA. 
[jira] [Commented] (YARN-638) Add DelegationTokens back to DelegationTokenSecretManager after RM Restart
[ https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646343#comment-13646343 ]

Hadoop QA commented on YARN-638:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581322/YARN-638.1.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common and hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/849//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/849//console

This message is automatically generated.

Add DelegationTokens back to DelegationTokenSecretManager after RM Restart
--------------------------------------------------------------------------

Key: YARN-638
URL: https://issues.apache.org/jira/browse/YARN-638
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
Attachments: YARN-638.1.patch

This was missed in YARN-581. After RM restart, delegation tokens need to be added both to the DelegationTokenRenewer (addressed in YARN-581) and to the DelegationTokenSecretManager.
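The two-step recovery described above can be sketched as follows. This is an illustrative model only, under the assumption that recovered tokens are keyed by sequence number: the class and method names are hypothetical and deliberately do not mirror the real YARN or Hadoop-common classes. The point it shows is why re-populating the secret manager is the missing half: without it, tokens issued before the restart can no longer authenticate, even though the renewer (YARN-581) keeps them from expiring.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a delegation-token secret manager. */
class DelegationSecretManager {
    // sequence number -> expiry time; only tokens present here authenticate.
    private final Map<Integer, Long> activeTokens = new HashMap<>();

    void addPersistedToken(int sequenceNumber, long expiryTime) {
        // Re-adding recovered tokens is what lets tokens issued before
        // the restart keep authenticating afterwards.
        activeTokens.put(sequenceNumber, expiryTime);
    }

    boolean canAuthenticate(int sequenceNumber) {
        return activeTokens.containsKey(sequenceNumber);
    }
}

class RecoveryFlow {
    /** On RM restart, each persisted token must reach the secret manager. */
    static DelegationSecretManager recover(Map<Integer, Long> persistedTokens) {
        DelegationSecretManager secretManager = new DelegationSecretManager();
        for (Map.Entry<Integer, Long> e : persistedTokens.entrySet()) {
            // YARN-581 re-registered tokens with the renewer; this issue
            // adds the second step: the secret manager itself.
            secretManager.addPersistedToken(e.getKey(), e.getValue());
        }
        return secretManager;
    }
}
```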
[jira] [Commented] (YARN-629) Make YarnRemoteException not be rooted at IOException
[ https://issues.apache.org/jira/browse/YARN-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646361#comment-13646361 ]

Hadoop QA commented on YARN-629:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581324/YARN-629.2.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 24 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common, hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests, and hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy:
org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat
org.apache.hadoop.mapred.TestFileInputFormat
org.apache.hadoop.mapred.lib.TestDelegatingInputFormat
org.apache.hadoop.mapreduce.lib.input.TestDelegatingInputFormat
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/850//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/850//console

This message is automatically generated.

Make YarnRemoteException not be rooted at IOException
-----------------------------------------------------

Key: YARN-629
URL: https://issues.apache.org/jira/browse/YARN-629
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-629.1.patch, YARN-629.2.patch

After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException.
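The motivation behind YARN-629 is that when the YARN-level exception inherits from IOException, a broad `catch (IOException e)` in caller code silently absorbs YARN failures along with genuine transport errors. Re-rooting it at plain Exception forces callers to handle the two cases separately. The following sketch illustrates the distinction with a hypothetical exception class (`YarnLikeException` is not the real YARN type):

```java
import java.io.IOException;

// Hypothetical stand-in: rooted at Exception rather than IOException,
// so it can no longer be caught accidentally by catch (IOException e).
class YarnLikeException extends Exception {
    YarnLikeException(String message) { super(message); }
}

class Caller {
    static String classify(Exception e) {
        if (e instanceof YarnLikeException) {
            return "yarn"; // YARN-level failure (e.g. an invalid request)
        }
        if (e instanceof IOException) {
            return "io";   // genuine transport or filesystem failure
        }
        return "other";
    }
}
```

Before such a change, the first `instanceof IOException` test would have matched both cases, which is exactly the ambiguity the issue aims to remove.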