[jira] [Commented] (YARN-422) Add AM-NM client library
[ https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645307#comment-13645307 ] Vinod Kumar Vavilapalli commented on YARN-422: -- bq. 1. Semantically, it is a bit strange that the RM uses AMNMClient. Agreed. Maybe we should just call it NMClient? bq. 2. Technically, hadoop-yarn-client has a dependency on hadoop-yarn-server-resourcemanager in test scope. If we want to use AMNMClient in AMLauncher, hadoop-yarn-server-resourcemanager needs to add a dependency on hadoop-yarn-client, forming a circular dependency. The dependencies are per scope, so there is no circular dependency in either test scope or non-test scope. Is this patch ready for review, or is it just a definition file? It doesn't seem ready. In any case, I think we need either separate call-backs for a failure on startContainer() and a failure on stopContainer(), or maybe just one call-back with the original event type. Add AM-NM client library Key: YARN-422 URL: https://issues.apache.org/jira/browse/YARN-422 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: AMNMClient_Defination.txt, AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf Create a simple wrapper over the AM-NM container protocol to hide the details of the protocol implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
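The two call-back designs Vinod proposes could be sketched as below. All names here (NMClientCallbackV1/V2, NMEventType, the method names) are illustrative assumptions for discussion, not the actual YARN-422 API.

```java
// Hypothetical sketch of the two call-back styles discussed above.
// All names are illustrative assumptions, not the actual YARN-422 API.

// Option 1: separate call-backs for each failed operation.
interface NMClientCallbackV1 {
    void onStartContainerFailed(String containerId, Throwable cause);
    void onStopContainerFailed(String containerId, Throwable cause);
}

// Option 2: one call-back that carries the original event type.
enum NMEventType { START_CONTAINER, STOP_CONTAINER }

interface NMClientCallbackV2 {
    void onFailure(NMEventType type, String containerId, Throwable cause);
}

public class CallbackSketch {
    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        // A trivial option-2 handler that records which operation failed.
        NMClientCallbackV2 cb = (type, id, cause) ->
                log.append(type).append(':').append(id);
        cb.onFailure(NMEventType.START_CONTAINER, "container_01",
                new RuntimeException("NM unreachable"));
        System.out.println(log); // START_CONTAINER:container_01
    }
}
```

Option 2 keeps the callback surface small as more NM operations are added, at the cost of a switch on the event type inside the handler.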
[jira] [Commented] (YARN-599) Refactoring submitApplication in ClientRMService and RMAppManager
[ https://issues.apache.org/jira/browse/YARN-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645451#comment-13645451 ] Hudson commented on YARN-599: - Integrated in Hadoop-Yarn-trunk #199 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/199/]) YARN-599. Refactoring submitApplication in ClientRMService and RMAppManager to separate out various validation checks depending on whether they rely on RM configuration or not. Contributed by Zhijie Shen. (Revision 1477478) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1477478 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManagerEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManagerSubmitEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java Refactoring submitApplication in ClientRMService and RMAppManager - Key: YARN-599 URL: https://issues.apache.org/jira/browse/YARN-599 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.0.5-beta Attachments: YARN-599.1.patch, YARN-599.2.patch Currently, ClientRMService#submitApplication calls RMAppManager#handle, which consequently calls RMAppManager#submitApplication directly, though the code looks like it is scheduling an APP_SUBMIT event. In addition, the validation code that runs before creating an RMApp instance is not well organized. Ideally, the dynamic validation, which depends on the RM's configuration, should be put in RMAppManager#submitApplication. RMAppManager#submitApplication is called by ClientRMService#submitApplication and RMAppManager#recover. Since the configuration may be changed after the RM restarts, the validation needs to be done again even in recovery mode. Therefore, resource request validation, which is based on the min/max resource limits, should be moved from ClientRMService#submitApplication to RMAppManager#submitApplication. On the other hand, the static validation, which is independent of the RM's configuration, should be put in ClientRMService#submitApplication, because it only needs to be done once, during the first submission. Furthermore, the try-catch flow in RMAppManager#submitApplication has a flaw: the method is not synchronized. If two application submissions with the same application ID enter the function, and one progresses to the completion of RMApp instantiation while the other progresses to the completion of putting the RMApp instance into rmContext, the slower submission will cause an exception due to the duplicate application ID. However, with the current code flow, that exception will cause the RMApp instance already in rmContext (belonging to the faster submission) to be rejected.
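The duplicate-ID race described above comes from a non-atomic check-then-insert. One way to close it is to make the check and the insert a single atomic step; a minimal sketch follows, where the names (AppRegistry, register, the rmContext map) are hypothetical and stand in for the real RMAppManager/rmContext code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Minimal sketch of closing the duplicate-submission race described above.
// Names (AppRegistry, register, rmContext) are hypothetical; this is not
// the actual RMAppManager code, just the atomic check-and-insert idea.
public class AppRegistry {
    private final ConcurrentMap<String, Object> rmContext = new ConcurrentHashMap<>();

    // putIfAbsent makes the duplicate check and the insert one atomic step,
    // so the slower of two submissions with the same ID is rejected without
    // disturbing the RMApp the faster submission already registered.
    public boolean register(String appId, Object rmApp) {
        return rmContext.putIfAbsent(appId, rmApp) == null;
    }

    public static void main(String[] args) {
        AppRegistry reg = new AppRegistry();
        System.out.println(reg.register("application_0001", new Object())); // true
        System.out.println(reg.register("application_0001", new Object())); // false: duplicate rejected
    }
}
```

With this shape, only the losing submission observes the failure; the winner's entry in the map is never touched, which avoids the bug where the already-registered RMApp gets rejected.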
[jira] [Commented] (YARN-506) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
[ https://issues.apache.org/jira/browse/YARN-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645454#comment-13645454 ] Hudson commented on YARN-506: - Integrated in Hadoop-Yarn-trunk #199 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/199/]) YARN-506. Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute. Contributed by Ivan Mitic. (Revision 1477408) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1477408 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute Key: YARN-506 URL: https://issues.apache.org/jira/browse/YARN-506 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Fix For: 3.0.0 Attachments: YARN-506.commonfileutils.2.patch, YARN-506.commonfileutils.patch Move to the common utils described in HADOOP-9413 that work well cross-platform.
[jira] [Commented] (YARN-599) Refactoring submitApplication in ClientRMService and RMAppManager
[ https://issues.apache.org/jira/browse/YARN-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645550#comment-13645550 ] Hudson commented on YARN-599: - Integrated in Hadoop-Hdfs-trunk #1388 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1388/]) YARN-599. Refactoring submitApplication in ClientRMService and RMAppManager to separate out various validation checks depending on whether they rely on RM configuration or not. Contributed by Zhijie Shen. (Revision 1477478) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1477478
[jira] [Commented] (YARN-506) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
[ https://issues.apache.org/jira/browse/YARN-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645553#comment-13645553 ] Hudson commented on YARN-506: - Integrated in Hadoop-Hdfs-trunk #1388 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1388/]) YARN-506. Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute. Contributed by Ivan Mitic. (Revision 1477408) Result = FAILURE suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1477408
[jira] [Commented] (YARN-506) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
[ https://issues.apache.org/jira/browse/YARN-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645592#comment-13645592 ] Hudson commented on YARN-506: - Integrated in Hadoop-Mapreduce-trunk #1415 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1415/]) YARN-506. Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute. Contributed by Ivan Mitic. (Revision 1477408) Result = FAILURE suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1477408
[jira] [Updated] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated YARN-614: - Attachment: YARN-614-0.patch Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1 -- Key: YARN-614 URL: https://issues.apache.org/jira/browse/YARN-614 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Attachments: YARN-614-0.patch Attempts can fail due to a large number of user errors and they should not be retried unnecessarily. The only reason YARN should retry an attempt is when the hardware fails or YARN has an error. NM failing, lost NM and NM disk errors are the hardware errors that come to mind.
[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645741#comment-13645741 ] Chris Riccomini commented on YARN-614: -- I've taken an initial stab at this and am looking for feedback. I added an ignoredFailures variable to RMAppImpl, which keeps a count of AM failures that should be ignored when figuring out whether to retry the AM with a new app attempt. Right now, the failures that are ignored are DISK_FAILURE and ABORTED. Since the ignoredFailures variable is completely derivable from the app attempt state, I simply start ignoredFailures at 0 and increment it whenever a failure happens that should be ignored. When recover() is called on an app, we recover all attempts (and all of their justFinishedContainers), and then update the ignoredFailures variable accordingly. Potential areas for improvement:
1. Switch RMAppAttemptImpl to keep a map of ContainerId to ContainerStatus, so we can do an O(1) lookup instead of traversing the justFinishedContainers list every time we want the master container's status.
2. Add tests.
3. Add a shouldIgnoreFailure method in RMAppImpl, and move the DISK_FAILURE and ABORTED checks there.
Any other thoughts?
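The ignored-failure counting described in the comment above can be sketched as follows. DISK_FAILURE and ABORTED come straight from the comment; the class and method names and the retry arithmetic are illustrative assumptions, not the real RMAppImpl code.

```java
import java.util.List;

// Sketch of the ignored-failure counting described above. DISK_FAILURE and
// ABORTED mirror the statuses the comment says are ignored; everything else
// (names, retry math) is an illustrative assumption, not the real RMAppImpl.
public class RetrySketch {
    enum ExitStatus { DISK_FAILURE, ABORTED, USER_ERROR }

    // The shouldIgnoreFailure helper proposed as improvement 3 above.
    static boolean shouldIgnoreFailure(ExitStatus status) {
        return status == ExitStatus.DISK_FAILURE || status == ExitStatus.ABORTED;
    }

    // Retry only while the failures that actually count stay under the limit.
    static boolean shouldRetry(List<ExitStatus> failures, int maxAttempts) {
        long counted = failures.stream()
                .filter(f -> !shouldIgnoreFailure(f))
                .count();
        return counted < maxAttempts;
    }

    public static void main(String[] args) {
        // Hardware-style failures are ignored, so a retry is still allowed...
        System.out.println(shouldRetry(
                List.of(ExitStatus.DISK_FAILURE, ExitStatus.ABORTED), 1)); // true
        // ...but a single user error exhausts a retry budget of 1.
        System.out.println(shouldRetry(
                List.of(ExitStatus.DISK_FAILURE, ExitStatus.USER_ERROR), 1)); // false
    }
}
```

Deriving the count from the attempt history (rather than persisting a separate counter) matches the recover() approach in the comment: after restart, replaying the attempts reproduces the same count.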
[jira] [Created] (YARN-625) Move unwrapAndThrowException from YarnRemoteExceptionPBImpl to RPCUtil
Siddharth Seth created YARN-625: --- Summary: Move unwrapAndThrowException from YarnRemoteExceptionPBImpl to RPCUtil Key: YARN-625 URL: https://issues.apache.org/jira/browse/YARN-625 Project: Hadoop YARN Issue Type: Task Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Assignee: Siddharth Seth
[jira] [Resolved] (YARN-579) Make ApplicationToken part of Container's token list to help RM-restart
[ https://issues.apache.org/jira/browse/YARN-579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-579. -- Resolution: Fixed I can consistently reproduce this on branch-2, but trunk is fine. Not sure whether this was indeed caused by this ticket itself. Still investigating. Closing this and continuing the investigation on YARN-626. Make ApplicationToken part of Container's token list to help RM-restart --- Key: YARN-579 URL: https://issues.apache.org/jira/browse/YARN-579 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.4-alpha Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Fix For: 2.0.5-beta Attachments: YARN-579-20130422.1.txt, YARN-579-20130422.1_YARNChanges.txt The Container is already persisted to help RM restart. Instead of explicitly setting the ApplicationToken in the AM's env, if we change it to be in the Container, we can avoid the env and also help restart.
[jira] [Updated] (YARN-626) Apps fail in secure cluster setup on branch-2
[ https://issues.apache.org/jira/browse/YARN-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-626: - Description: Found at YARN-579 by [~daryn]. Need to investigate if it was caused by YARN-579 itself or something else. Secure setup on trunk passes though. was: Found at YARN-579 by [~aw]. Need to investigate if it was caused by YARN-579 itself or something else. Secure setup on trunk passes though. Daryn posted logs at YARN-579, see [here|https://issues.apache.org/jira/browse/YARN-579?focusedCommentId=13644790page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13644790]. Apps fail in secure cluster setup on branch-2 - Key: YARN-626 URL: https://issues.apache.org/jira/browse/YARN-626 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Found at YARN-579 by [~daryn]. Need to investigate if it was caused by YARN-579 itself or something else. Secure setup on trunk passes though.
[jira] [Created] (YARN-627) Simplify YarnRemoteException
Siddharth Seth created YARN-627: --- Summary: Simplify YarnRemoteException Key: YARN-627 URL: https://issues.apache.org/jira/browse/YARN-627 Project: Hadoop YARN Issue Type: Task Reporter: Siddharth Seth Assignee: Siddharth Seth This does not need to be a PB-backed record.
[jira] [Commented] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645817#comment-13645817 ] Daryn Sharp commented on YARN-617: -- If a token is available, the RPC client will attempt SASL DIGEST-MD5 regardless of the client's security conf. An RPC server also enables SASL DIGEST-MD5 if a secret manager is active. Isn't this sufficient to allow container tokens to always be used for authentication? In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Minor Without security, it is impossible to completely prevent AMs from faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared-key mechanism over the unauthenticated RM-NM channel. At a minimum, this will avoid accidental bugs in AMs in unsecure mode.
[jira] [Created] (YARN-628) Fix YarnException unwrapping
Siddharth Seth created YARN-628: --- Summary: Fix YarnException unwrapping Key: YARN-628 URL: https://issues.apache.org/jira/browse/YARN-628 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Unwrapping of YarnRemoteExceptions (currently in YarnRemoteExceptionPBImpl; in RPCUtil post YARN-625) is broken, and often ends up throwing UndeclaredThrowableException. This needs to be fixed.
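The UndeclaredThrowableException mentioned above is the standard symptom when a dynamic proxy (the mechanism RPC stubs are built on) rethrows a checked exception that the interface method never declared. The following is a self-contained illustration of that failure mode and the unwrapping it forces; the Protocol interface and messages are made up for the example, not YARN code.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;

// Illustration of why UndeclaredThrowableException shows up: a dynamic proxy
// rethrows a checked exception that the interface method does not declare.
// The Protocol interface and messages here are illustrative only.
public class UnwrapSketch {
    interface Protocol { String ping(); } // declares no checked exceptions

    public static void main(String[] args) {
        InvocationHandler handler = (proxy, method, a) -> {
            throw new Exception("remote failure"); // checked but undeclared
        };
        Protocol p = (Protocol) Proxy.newProxyInstance(
                Protocol.class.getClassLoader(),
                new Class<?>[]{Protocol.class}, handler);
        try {
            p.ping();
        } catch (UndeclaredThrowableException e) {
            // Unwrapping code must dig the real cause out of the wrapper.
            System.out.println(e.getCause().getMessage()); // remote failure
        }
    }
}
```

Declaring the checked exception on the interface method (or translating it to a declared type before rethrowing) avoids the wrapper entirely, which is the direction the YARN-142 subtasks take.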
[jira] [Commented] (YARN-625) Move unwrapAndThrowException from YarnRemoteExceptionPBImpl to RPCUtil
[ https://issues.apache.org/jira/browse/YARN-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645827#comment-13645827 ] Hadoop QA commented on YARN-625: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581203/YARN-625.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/845//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/845//console This message is automatically generated. 
Move unwrapAndThrowException from YarnRemoteExceptionPBImpl to RPCUtil -- Key: YARN-625 URL: https://issues.apache.org/jira/browse/YARN-625 Project: Hadoop YARN Issue Type: Task Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: YARN-625.txt
[jira] [Commented] (YARN-142) Change YARN APIs to throw IOException
[ https://issues.apache.org/jira/browse/YARN-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645833#comment-13645833 ] Xuan Gong commented on YARN-142: Split it into several sub-tasks:
1. Make YarnRemoteException not be rooted at IOException
2. Change AMRMProtocol api to throw IOException and YarnRemoteException
3. Change ClientRMProtocol api to throw IOException and YarnRemoteException
4. Change ContainerManager api to throw IOException and YarnRemoteException
5. Change RMAdminProtocol api to throw IOException and YarnRemoteException
Change YARN APIs to throw IOException - Key: YARN-142 URL: https://issues.apache.org/jira/browse/YARN-142 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-142.1.patch, YARN-142.2.patch, YARN-142.3.patch, YARN-142.4.patch Ref: MAPREDUCE-4067 All YARN APIs currently throw YarnRemoteException. 1) This cannot be extended in its current form. 2) The RPC layer can throw IOExceptions. These end up showing up as UndeclaredThrowableExceptions.
[jira] [Created] (YARN-629) Make YarnRemoteException not be rooted at IOException
Xuan Gong created YARN-629: -- Summary: Make YarnRemoteException not be rooted at IOException Key: YARN-629 URL: https://issues.apache.org/jira/browse/YARN-629 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException
[jira] [Created] (YARN-632) Change ContainerManager api to throw IOException and YarnRemoteException
Xuan Gong created YARN-632: -- Summary: Change ContainerManager api to throw IOException and YarnRemoteException Key: YARN-632 URL: https://issues.apache.org/jira/browse/YARN-632 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong
[jira] [Created] (YARN-633) Change RMAdminProtocol api to throw IOException and YarnRemoteException
Xuan Gong created YARN-633: -- Summary: Change RMAdminProtocol api to throw IOException and YarnRemoteException Key: YARN-633 URL: https://issues.apache.org/jira/browse/YARN-633 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong
[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container
[ https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645852#comment-13645852 ] Daryn Sharp commented on YARN-613: -- I assumed that was the implementation. Does a global AM secret degrade the security of YARN by allowing one rogue node to begin fabricating tokens? Create NM proxy per NM instead of per container --- Key: YARN-613 URL: https://issues.apache.org/jira/browse/YARN-613 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Vinod Kumar Vavilapalli Currently a new NM proxy has to be created per container, since the secure authentication uses a container token from the container.
[jira] [Created] (YARN-634) Introduce a SerializedException
Siddharth Seth created YARN-634: --- Summary: Introduce a SerializedException Key: YARN-634 URL: https://issues.apache.org/jira/browse/YARN-634 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth LocalizationProtocol sends an exception over the wire. This currently uses YarnRemoteException. Post YARN-627, this needs to be changed and a new serialized exception is required.
[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container
[ https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645890#comment-13645890 ] Vinod Kumar Vavilapalli commented on YARN-613: -- bq. I assumed that was the implementation. Does a global AM secret degrade the security of yarn by allowing one rogue node to begin fabricating tokens? NMs are trusted. They are kerberos authenticated, and we also have the service level authorization to enforce only some principals. Is that not enough? The better argument perhaps is crunching through a lot of AMTokens to figure out the key, but we rollover keys every so often to avoid that case. Create NM proxy per NM instead of per container --- Key: YARN-613 URL: https://issues.apache.org/jira/browse/YARN-613 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Vinod Kumar Vavilapalli Currently a new NM proxy has to be created per container since the secure authentication is using a containertoken from the container. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-635) Rename YarnRemoteException to YarnException
Xuan Gong created YARN-635: -- Summary: Rename YarnRemoteException to YarnException Key: YARN-635 URL: https://issues.apache.org/jira/browse/YARN-635 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong
[jira] [Commented] (YARN-617) In unsecure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645896#comment-13645896 ] Vinod Kumar Vavilapalli commented on YARN-617: -- bq. If a token is available, the RPC client will attempt SASL DIGEST-MD5 regardless of the client's security conf. Isn't this sufficient to allow container tokens to always be used for authentication? Agreed, I should have been clearer. At YARN-613, we are trying to change the auth to use AMTokens, and authorization will continue to be via ContainerTokens. In that sense, yes, we don't need this separation, but YARN-613 will do that anyway, so we may as well do it here. bq. An RPC server also enables SASL DIGEST-MD5 if a secret manager is active. Off topic, but this is what I guessed is the reason underlying YARN-626; do you know when this got merged into branch-2? In unsecure mode, AM can fake resource requirements - Key: YARN-617 URL: https://issues.apache.org/jira/browse/YARN-617 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Minor Without security, it is impossible to completely avoid AMs faking resources. We can at the least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over an unauthenticated RM-NM channel. At the minimum, this will avoid accidental bugs in AMs in unsecure mode.
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645991#comment-13645991 ] Robert Joseph Evans commented on YARN-528: -- The approach seems OK to me, but I would rather have the impl be an even thinner wrapper.
{code}
private ApplicationIdProto proto = null;
private ApplicationIdProto.Builder builder = null;

ApplicationIdPBImpl(ApplicationIdProto proto) {
  this.proto = proto;
}

public ApplicationIdPBImpl() {
  this.builder = ApplicationIdProto.newBuilder();
}

public ApplicationIdProto getProto() {
  assert (proto != null);
  return proto;
}

@Override
public int getId() {
  assert (proto != null);
  return proto.getId();
}

@Override
protected void setId(int id) {
  assert (builder != null);
  builder.setId(id);
}

@Override
public long getClusterTimestamp() {
  assert (proto != null);
  return proto.getClusterTimestamp();
}

@Override
protected void setClusterTimestamp(long clusterTimestamp) {
  assert (builder != null);
  builder.setClusterTimestamp(clusterTimestamp);
}

@Override
protected void build() {
  assert (builder != null);
  proto = builder.build();
  builder = null;
}
{code}
Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: y528_AppIdPart_01_Refactor.txt, y528_AppIdPart_02_AppIdChanges.txt, y528_AppIdPart_03_fixUsage.txt, y528_ApplicationIdComplete_WIP.txt, YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error-prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that.
It makes the various ID objects immutable. If this patch is well received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and YARN APIs will not be able to be changed.
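The build-then-freeze pattern in the snippet above can be exercised with a self-contained sketch. Since ApplicationIdProto is protobuf-generated, FakeProto below is a hypothetical stand-in so the example runs without protobuf; the PBImpl mirrors the structure of the code in the comment, not the actual Hadoop class.

```java
// Self-contained sketch of the "thin wrapper + freeze on build()" idea.
public class ImmutableIdDemo {

  // Minimal stand-in for a protobuf message and its builder.
  static final class FakeProto {
    final int id;
    final long ts;
    FakeProto(int id, long ts) { this.id = id; this.ts = ts; }

    static final class Builder {
      int id;
      long ts;
      Builder setId(int id) { this.id = id; return this; }
      Builder setClusterTimestamp(long ts) { this.ts = ts; return this; }
      FakeProto build() { return new FakeProto(id, ts); }
    }
  }

  // Mirrors the proposed ApplicationIdPBImpl: writes go through the builder,
  // reads require a built proto; build() freezes the object.
  static final class ApplicationIdPBImpl {
    private FakeProto proto = null;
    private FakeProto.Builder builder = new FakeProto.Builder();

    int getId() { assert proto != null; return proto.id; }
    long getClusterTimestamp() { assert proto != null; return proto.ts; }
    void setId(int id) { assert builder != null; builder.setId(id); }
    void setClusterTimestamp(long ts) { assert builder != null; builder.setClusterTimestamp(ts); }
    void build() { assert builder != null; proto = builder.build(); builder = null; }
  }

  public static void main(String[] args) {
    ApplicationIdPBImpl appId = new ApplicationIdPBImpl();
    appId.setId(42);
    appId.setClusterTimestamp(1234L);
    appId.build();                      // freezes: further set* trips the assert (with -ea)
    System.out.println(appId.getId()); // prints "42"
  }
}
```

Reads before build() and writes after build() both trip the asserts (when assertions are enabled), which is the immutability guarantee the JIRA is after.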
[jira] [Updated] (YARN-582) Restore appToken and clientToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-582: - Attachment: YARN-582.2.patch This patch restores the appToken; the clientToken will be addressed in another JIRA. Restore appToken and clientToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch, YARN-582.2.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter.
[jira] [Updated] (YARN-582) Restore appToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-582: - Summary: Restore appToken for app attempt after RM restart (was: Restore appToken and clientToken for app attempt after RM restart) Restore appToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch, YARN-582.2.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-636) Restore clientToken for app attempt after RM restart
Jian He created YARN-636: Summary: Restore clientToken for app attempt after RM restart Key: YARN-636 URL: https://issues.apache.org/jira/browse/YARN-636 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646048#comment-13646048 ] Bikas Saha commented on YARN-513: - I like RMProxy. Do we see this being useful to user code? If yes, then we should put it in the yarn client package. Why is this called defaultPolicy?
{code}
+ private final RetryPolicy defaultPolicy;
{code}
Can these be inferred from the Protocol type? Else we cannot use this for other protocols.
{code}
+ this.rmAddress = conf.getSocketAddr(
+     YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS,
+     YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_ADDRESS,
+     YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_PORT);
{code}
With this the code will start looking like NameNodeProxies with its create* methods. I looked at the NNProxies class. It looks like we need an almost exact version of that for the RM. The main difference being that we don't have HA yet and RM protocols run on different ports. The neat thing about NNProxies is that they create a transparent proxy using java.lang.reflect.Proxy that ends up calling realImpl.invoke(method) internally when the client calls proxy.method(). Thus it looks better than our approach, where our client explicitly calls proxy.invoke(method). Perhaps we should try to change our impl to something similar to NNProxies. What do folks think? Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.
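The transparent-proxy idea discussed above can be sketched with plain JDK reflection. This is a minimal, self-contained illustration, not the actual RMProxy or NameNodeProxies code; EchoProtocol, createRetryProxy, and the retry count are hypothetical stand-ins.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical stand-in for a YARN RM protocol interface (not a real Hadoop class).
interface EchoProtocol {
  String echo(String msg) throws Exception;
}

public class RetryProxyDemo {

  // Build a transparent proxy in the NameNodeProxies style: the caller just
  // invokes proxy.echo(...), and the InvocationHandler retries the real
  // implementation behind the scenes before rethrowing the last failure.
  @SuppressWarnings("unchecked")
  static <T> T createRetryProxy(Class<T> protocol, T realImpl, int maxRetries) {
    InvocationHandler handler = (proxyObj, method, args) -> {
      Throwable last = null;
      for (int attempt = 0; attempt <= maxRetries; attempt++) {
        try {
          return method.invoke(realImpl, args);   // delegate to the real impl
        } catch (InvocationTargetException e) {
          last = e.getCause();                    // unwrap and retry
        }
      }
      throw last;
    };
    return (T) Proxy.newProxyInstance(
        protocol.getClassLoader(), new Class<?>[] { protocol }, handler);
  }

  public static void main(String[] args) throws Exception {
    // A flaky implementation that fails twice before succeeding, standing in
    // for an RM that is restarting.
    final int[] calls = {0};
    EchoProtocol flaky = msg -> {
      if (++calls[0] < 3) throw new RuntimeException("RM not up yet");
      return "echo:" + msg;
    };
    EchoProtocol proxy = createRetryProxy(EchoProtocol.class, flaky, 5);
    System.out.println(proxy.echo("hi"));  // prints "echo:hi" after two hidden retries
  }
}
```

The point of the comment above is exactly this inversion: the retry policy lives inside the InvocationHandler, so callers use the plain protocol interface instead of calling proxy.invoke(method) explicitly.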
[jira] [Updated] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-618: - Attachment: YARN-618.2.patch New patch addresses the last comments. Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.1.patch, YARN-618.2.patch, YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesn't sound right as many tests set it to 0. Probably a -ve number is what we want.
[jira] [Commented] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646094#comment-13646094 ] Hadoop QA commented on YARN-618: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581248/YARN-618.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/846//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/846//console This message is automatically generated. Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.1.patch, YARN-618.2.patch, YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesnt sound right as many tests set it to 0. Probably a -ve number is what we want. -- This message is automatically generated by JIRA. 
[jira] [Commented] (YARN-582) Restore appToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646095#comment-13646095 ] Hadoop QA commented on YARN-582: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581243/YARN-582.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/847//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/847//console This message is automatically generated. Restore appToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch, YARN-582.2.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. 
In non-preserving restart app attempts are killed and so this does not matter.
[jira] [Moved] (YARN-637) FS: maxAssign is not honored
[ https://issues.apache.org/jira/browse/YARN-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza moved MAPREDUCE-5200 to YARN-637: Component/s: (was: scheduler) scheduler Target Version/s: (was: 2.0.5-beta) Affects Version/s: (was: 2.0.4-alpha) 2.0.4-alpha Key: YARN-637 (was: MAPREDUCE-5200) Project: Hadoop YARN (was: Hadoop Map/Reduce) FS: maxAssign is not honored Key: YARN-637 URL: https://issues.apache.org/jira/browse/YARN-637 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.4-alpha Reporter: Karthik Kambatla Assignee: Karthik Kambatla maxAssign limits the number of containers that can be assigned in a single heartbeat. Currently, FS doesn't keep track of number of assigned containers to check this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-629) Make YarnRemoteException not be rooted at IOException
[ https://issues.apache.org/jira/browse/YARN-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646171#comment-13646171 ] Xuan Gong commented on YARN-629:
1. Use extends Exception instead of extends IOException in YarnRemoteException.java.
2. Change YarnRemoteExceptionPBImpl::unwrapAndThrowException() to explicitly return a YarnRemoteException object.
3. Throw both YarnRemoteException and IOException instead of only IOException, because YarnRemoteException is no longer a subclass of IOException.
Make YarnRemoteException not be rooted at IOException - Key: YARN-629 URL: https://issues.apache.org/jira/browse/YARN-629 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException
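A minimal sketch of what changes 1 and 3 imply for callers, using a stand-in class rather than the real Hadoop YarnRemoteException: once the class extends Exception instead of IOException, a single throws IOException clause no longer covers it, so both must be declared.

```java
// Stand-in mirroring the proposed change: extends Exception, not IOException.
// Illustrative only; the method name and bodies are hypothetical.
class YarnRemoteException extends Exception {
  public YarnRemoteException(String message) { super(message); }
}

public class ThrowsDemo {
  // Per item 3 above: since YarnRemoteException is no longer an IOException,
  // a method that can raise either must now declare both.
  static void startContainer(boolean ioFailure)
      throws YarnRemoteException, java.io.IOException {
    if (ioFailure) throw new java.io.IOException("wire error");
    throw new YarnRemoteException("yarn-side error");
  }

  public static void main(String[] args) {
    // The two exception hierarchies are now disjoint.
    boolean isIOE = new YarnRemoteException("x") instanceof java.io.IOException;
    System.out.println(isIOE);  // prints "false"
  }
}
```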
[jira] [Updated] (YARN-629) Make YarnRemoteException not be rooted at IOException
[ https://issues.apache.org/jira/browse/YARN-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-629: --- Attachment: YARN-629.1.patch Make YarnRemoteException not be rooted at IOException - Key: YARN-629 URL: https://issues.apache.org/jira/browse/YARN-629 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-629.1.patch After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646188#comment-13646188 ] Bikas Saha commented on YARN-614: - Agreed about a method that encapsulates whether an RMAppAttempt failed with an error we want to ignore. Perhaps rename it to countFailureToAttemptLimit() or something like that. We can then add hysteresis logic later on for perpetual apps for which we want to count failures only in the last hour, say. I am afraid allowing appattempt.size to exceed maxAttempts might break code somewhere else that did not expect this to happen. Need to check thoroughly for this. The recovery code won't work since, right now, the RM does not recover attempts where appattempts.size() >= maxattempts, e.g. the above case. Look at RMAppManager.recover(). One solution could be to move the check from finishAttempt() to createAttempt(). finishAttempt() always enqueues a new attempt; the new attempt creation checks if one can still be created based on failed count etc. Another solution could be to make the RMApp go from NEW to FAILED in the recover transition based on failed counts etc. Having said that, recovery won't work because the master container is saved before launching the attempt and as such does not have the exit status populated in it. We could leave recovery for a different JIRA and focus on the regular code path in this one perhaps. Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1 -- Key: YARN-614 URL: https://issues.apache.org/jira/browse/YARN-614 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Attachments: YARN-614-0.patch Attempts can fail due to a large number of user errors and they should not be retried unnecessarily. The only reason YARN should retry an attempt is when the hardware fails or YARN has an error. NM failing, lost NM and NM disk errors are the hardware errors that come to mind.
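The idea discussed above, counting only some failure causes toward the attempt limit and checking at attempt-creation time rather than in finishAttempt(), can be sketched as below. The enum values and method names are illustrative stand-ins, not the actual RMAppAttempt API.

```java
import java.util.List;

// Sketch: hardware/YARN-side failures are "free" retries that do not count
// toward maxAttempts, so the attempt history may legitimately grow larger
// than maxAttempts (the concern raised in the comment above).
public class AttemptLimitDemo {

  enum FailureCause { USER_ERROR, NM_LOST, DISK_ERROR, PREEMPTED }

  // Analogue of the proposed countFailureToAttemptLimit() predicate.
  static boolean countsTowardAttemptLimit(FailureCause cause) {
    switch (cause) {
      case NM_LOST:
      case DISK_ERROR:
      case PREEMPTED:
        return false;  // hardware/YARN issue: retry without penalty
      default:
        return true;   // user error: counts against the app
    }
  }

  // Check done at attempt-creation time, not at finish time.
  static boolean mayCreateNewAttempt(List<FailureCause> failures, int maxAttempts) {
    int counted = 0;
    for (FailureCause c : failures) {
      if (countsTowardAttemptLimit(c)) counted++;
    }
    return counted < maxAttempts;
  }

  public static void main(String[] args) {
    // Two failed attempts, but only one counts: a new attempt is still allowed
    // even though history size equals maxAttempts.
    List<FailureCause> history =
        List.of(FailureCause.NM_LOST, FailureCause.USER_ERROR);
    System.out.println(mayCreateNewAttempt(history, 2));  // prints "true"
  }
}
```

The main() case shows exactly why appattempt.size can exceed maxAttempts under this scheme, which is the code-breakage risk the comment warns about.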
[jira] [Commented] (YARN-582) Restore appToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646282#comment-13646282 ] Bikas Saha commented on YARN-582: - Why is this inside the try instead of where the existing fields are set? Isn't it safe to pass null to setAppAttemptTokens?
{code}
 try {
+  if (appAttemptTokens != null) {
+    attemptStateData.setAppAttemptTokens(appAttemptTokens);
+  }
{code}
Why create a new token secret manager for every generateTokens() call?
{code}
+ private ByteBuffer generateTokens(ApplicationAttemptId attemptId,
+     Configuration conf) {
+   ApplicationTokenSecretManager appTokenMgr =
+       new ApplicationTokenSecretManager(conf);
+   ApplicationTokenIdentifier appTokenId =
+       new ApplicationTokenIdentifier(attemptId);
{code}
This check should be performed after restart also.
{code}
+// assert application Token is saved
+Assert.assertEquals(attempt1Token, attemptState.getAppAttemptTokens());
{code}
This check should be performed before restart also since we changed this code path.
{code}
+// assert ApplicationTokenSecretManager has the password populated
+Assert.assertTrue(rm2.getApplicationTokenSecretManager().hasPassword(
+    newAttempt.getAppAttemptId()));
{code}
This is wrong because the new app should be creating its own tokens.
{code}
 }
+app.createNewAttempt(true);
+break;
+ case RECOVER:
+RMAppAttempt attempt = app.createNewAttempt(true);
+
+// reuse the appToken from previous attempt
+if (UserGroupInformation.isSecurityEnabled()) {
+  ApplicationAttemptId previousAttempt =
+      Records.newRecord(ApplicationAttemptId.class);
+  previousAttempt.setApplicationId(app.getApplicationId());
+  previousAttempt.setAttemptId(app.getAppAttempts().size() - 1);
+  ApplicationState appState = app.getRMState().getApplicationState()
+      .get(app.getApplicationId());
+  ApplicationAttemptState attemptState =
+      appState.getAttempt(previousAttempt);
+  assert attemptState != null;
+  ((RMAppAttemptImpl) attempt).recoverAppAttemptTokens(attemptState
+      .getAppAttemptTokens());
+}
+break;
+ default:
{code}
Hence this is wrong:
{code}
+// assert the new Attempt id is the same as the desired new attempt id
+Assert.assertEquals(desiredNewAttemptId, newAttempt.getAppAttemptId());
+
+// assert new attempt reuses previous attempt tokens
+Assert.assertEquals(attempt1Token, newAttempt.getAppAttemptTokens());
{code}
Need to check for securityEnabled when recovering tokens and populating the secret manager? Can we move token creation from the constructor of RMAppAttemptImpl to AttemptStartedTransition? That way we will not end up creating new tokens in the constructor and overriding them in recover(). Also, in recover(), let's just populate the tokens but not add them to the secret managers. Later, in work-preserving restart, we need to create a NEW-RUNNING transition in which the restored tokens will be added to the secret manager. Things to check: 1) Why is this code in FinalTransition and not BaseFinalTransition?
{code}
// Unregister from the ClientTokenSecretManager
if (UserGroupInformation.isSecurityEnabled()) {
  appAttempt.rmContext.getClientToAMTokenSecretManager()
      .unRegisterApplication(appAttempt.getAppAttemptId());
}
{code}
2) Why is this duplicated in both BaseFinalTransition and AMUnregisteredTransition?
{code}
// Remove the AppAttempt from the ApplicationTokenSecretManager
appAttempt.rmContext.getApplicationTokenSecretManager()
    .applicationMasterFinished(appAttemptId);
{code}
Restore appToken for app attempt after RM restart - Key: YARN-582 URL: https://issues.apache.org/jira/browse/YARN-582 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-582.1.patch, YARN-582.2.patch These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter.
[jira] [Created] (YARN-638) Add DelegationTokens back to DelegationTokenSecretManager after RM Restart
Jian He created YARN-638: Summary: Add DelegationTokens back to DelegationTokenSecretManager after RM Restart Key: YARN-638 URL: https://issues.apache.org/jira/browse/YARN-638 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He This is missed in YARN-581. After RM restart, delegation tokens need to be added both in DelegationTokenRenewer (addressed in YARN-581), and delegationTokenSecretManager
[jira] [Updated] (YARN-638) Add DelegationTokens back to DelegationTokenSecretManager after RM Restart
[ https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-638: - Attachment: YARN-638.1.patch This patch adds delegationTokens to DelegationTokenSecretManager after RM restarts, and adds test cases for that Add DelegationTokens back to DelegationTokenSecretManager after RM Restart -- Key: YARN-638 URL: https://issues.apache.org/jira/browse/YARN-638 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-638.1.patch This is missed in YARN-581. After RM restart, delegation tokens need to be added both in DelegationTokenRenewer (addressed in YARN-581), and delegationTokenSecretManager
[jira] [Updated] (YARN-629) Make YarnRemoteException not be rooted at IOException
[ https://issues.apache.org/jira/browse/YARN-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-629: --- Attachment: YARN-629.2.patch Fix testcase failures: org.apache.hadoop.mapred.TestClientServiceDelegate org.apache.hadoop.mapreduce.TestMRJobClient. Another four test case failures can be tracked at MAPREDUCE-5193 Make YarnRemoteException not be rooted at IOException - Key: YARN-629 URL: https://issues.apache.org/jira/browse/YARN-629 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-629.1.patch, YARN-629.2.patch After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-618: - Attachment: YARN-618.3.patch Addresses the last comments. Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.1.patch, YARN-618.2.patch, YARN-618.3.patch, YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesn't sound right as many tests set it to 0. Probably a -ve number is what we want.
[jira] [Commented] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646337#comment-13646337 ] Hadoop QA commented on YARN-618: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581325/YARN-618.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/851//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/851//console This message is automatically generated. Modify RM_INVALID_IDENTIFIER to a -ve number - Key: YARN-618 URL: https://issues.apache.org/jira/browse/YARN-618 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-618.1.patch, YARN-618.2.patch, YARN-618.3.patch, YARN-618.patch RM_INVALID_IDENTIFIER set to 0 doesnt sound right as many tests set it to 0. Probably a -ve number is what we want. -- This message is automatically generated by JIRA. 
[jira] [Commented] (YARN-638) Add DelegationTokens back to DelegationTokenSecretManager after RM Restart
[ https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646343#comment-13646343 ]

Hadoop QA commented on YARN-638:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581322/YARN-638.1.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common and hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/849//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/849//console

This message is automatically generated.

Add DelegationTokens back to DelegationTokenSecretManager after RM Restart
--------------------------------------------------------------------------

Key: YARN-638
URL: https://issues.apache.org/jira/browse/YARN-638
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
Attachments: YARN-638.1.patch

This was missed in YARN-581. After RM restart, delegation tokens need to be added both to the DelegationTokenRenewer (addressed in YARN-581) and to the DelegationTokenSecretManager.
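The two-step recovery described above can be sketched as follows. This is an illustrative model only, under the assumption that recovered tokens are keyed by sequence number: the class and method names are hypothetical and deliberately do not mirror the real YARN or Hadoop-common classes. The point it shows is why re-populating the secret manager is the missing half: without it, tokens issued before the restart can no longer authenticate, even though the renewer (YARN-581) keeps them from expiring.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a delegation-token secret manager. */
class DelegationSecretManager {
    // sequence number -> expiry time; only tokens present here authenticate.
    private final Map<Integer, Long> activeTokens = new HashMap<>();

    void addPersistedToken(int sequenceNumber, long expiryTime) {
        // Re-adding recovered tokens is what lets tokens issued before
        // the restart keep authenticating afterwards.
        activeTokens.put(sequenceNumber, expiryTime);
    }

    boolean canAuthenticate(int sequenceNumber) {
        return activeTokens.containsKey(sequenceNumber);
    }
}

class RecoveryFlow {
    /** On RM restart, each persisted token must reach the secret manager. */
    static DelegationSecretManager recover(Map<Integer, Long> persistedTokens) {
        DelegationSecretManager secretManager = new DelegationSecretManager();
        for (Map.Entry<Integer, Long> e : persistedTokens.entrySet()) {
            // YARN-581 re-registered tokens with the renewer; this issue
            // adds the second step: the secret manager itself.
            secretManager.addPersistedToken(e.getKey(), e.getValue());
        }
        return secretManager;
    }
}
```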
[jira] [Commented] (YARN-629) Make YarnRemoteException not be rooted at IOException
[ https://issues.apache.org/jira/browse/YARN-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646361#comment-13646361 ]

Hadoop QA commented on YARN-629:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581324/YARN-629.2.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 24 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common, hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests, and hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy:
org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat
org.apache.hadoop.mapred.TestFileInputFormat
org.apache.hadoop.mapred.lib.TestDelegatingInputFormat
org.apache.hadoop.mapreduce.lib.input.TestDelegatingInputFormat
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/850//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/850//console

This message is automatically generated.

Make YarnRemoteException not be rooted at IOException
-----------------------------------------------------

Key: YARN-629
URL: https://issues.apache.org/jira/browse/YARN-629
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-629.1.patch, YARN-629.2.patch

After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException.
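The motivation behind YARN-629 is that when the YARN-level exception inherits from IOException, a broad `catch (IOException e)` in caller code silently absorbs YARN failures along with genuine transport errors. Re-rooting it at plain Exception forces callers to handle the two cases separately. The following sketch illustrates the distinction with a hypothetical exception class (`YarnLikeException` is not the real YARN type):

```java
import java.io.IOException;

// Hypothetical stand-in: rooted at Exception rather than IOException,
// so it can no longer be caught accidentally by catch (IOException e).
class YarnLikeException extends Exception {
    YarnLikeException(String message) { super(message); }
}

class Caller {
    static String classify(Exception e) {
        if (e instanceof YarnLikeException) {
            return "yarn"; // YARN-level failure (e.g. an invalid request)
        }
        if (e instanceof IOException) {
            return "io";   // genuine transport or filesystem failure
        }
        return "other";
    }
}
```

Before such a change, the first `instanceof IOException` test would have matched both cases, which is exactly the ambiguity the issue aims to remove.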