[jira] [Commented] (YARN-684) ContainerManager.startContainer needs to only have ContainerTokenIdentifier instead of the whole Container
[ https://issues.apache.org/jira/browse/YARN-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671328#comment-13671328 ] Hudson commented on YARN-684: - Integrated in Hadoop-Yarn-trunk #226 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/226/]) MAPREDUCE-5286. Change MapReduce to use ContainerTokenIdentifier instead of the entire Container in the startContainer call - YARN-684. Contributed by Vinod Kumar Vavilapalli. (Revision 1488087) YARN-684. ContainerManager.startContainer should use ContainerTokenIdentifier instead of the entire Container. Contributed by Vinod Kumar Vavilapalli. (Revision 1488085) Result = SUCCESS sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1488087 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/launcher/TestContainerLauncher.java sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1488085 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/StartContainerRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StartContainerRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/NMClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestContainerLaunchRPC.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRPC.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationContainerInitEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java *
[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671572#comment-13671572 ] Hadoop QA commented on YARN-530: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585627/YARN-530-012.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1048//console This message is automatically generated. Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.4-alpha Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117changes.pdf, YARN-530-005.patch, YARN-530-008.patch, YARN-530-009.patch, YARN-530-010.patch, YARN-530-011.patch, YARN-530-012.patch, YARN-530-2.patch, YARN-530-3.patch, YARN-530.4.patch, YARN-530.patch # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-692) Creating NMToken master key on RM and sharing it with NM as a part of RM-NM heartbeat.
[ https://issues.apache.org/jira/browse/YARN-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671631#comment-13671631 ] Hadoop QA commented on YARN-692: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585639/YARN-692.20130531.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1049//console This message is automatically generated. Creating NMToken master key on RM and sharing it with NM as a part of RM-NM heartbeat. -- Key: YARN-692 URL: https://issues.apache.org/jira/browse/YARN-692 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: YARN-692.20130530.1.patch, YARN-692.20130530.2.patch, YARN-692.20130531.patch This is related to YARN-613. Here we will be implementing NMToken generation on the RM side and sharing it with the NM during the RM-NM heartbeat. As a part of this JIRA the master key will only be made available to the NM, but there will be no validation done until AM-NM communication is fixed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-733) TestNMClient fails occasionally
[ https://issues.apache.org/jira/browse/YARN-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-733: - Attachment: YARN-733.1.patch In the patch: 1. Update the tests to wait until the expected container status occurs 2. In NMClientImpl, add a piece of javadoc to describe that startContainer/stopContainer returning doesn't mean the container is actually started/stopped; there can be a transient container status. Have run the test tens of times, and no failure occurred. TestNMClient fails occasionally --- Key: YARN-733 URL: https://issues.apache.org/jira/browse/YARN-733 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-733.1.patch The problem happens at:
{code}
// getContainerStatus can be called after stopContainer
try {
  ContainerStatus status = nmClient.getContainerStatus(
      container.getId(), container.getNodeId(), container.getContainerToken());
  assertEquals(container.getId(), status.getContainerId());
  assertEquals(ContainerState.RUNNING, status.getState());
  assertTrue("" + i, status.getDiagnostics().contains(
      "Container killed by the ApplicationMaster."));
  assertEquals(-1000, status.getExitStatus());
} catch (YarnRemoteException e) {
  fail("Exception is not expected");
}
{code}
NMClientImpl#stopContainer returns, but the container hasn't been stopped immediately. ContainerManagerImpl implements stopContainer in async style. Therefore, the container's status is in transition. NMClientImpl#getContainerStatus immediately after stopContainer will get either the RUNNING status or the COMPLETE one. There will be a similar problem wrt NMClientImpl#startContainer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
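Since start/stop are asynchronous on the NM side, a test (or any NMClient caller) that needs a particular state has to poll rather than assert immediately. Below is a minimal sketch of such a wait loop; it reuses the getContainerStatus signature from the snippet above, but the helper itself (waitForContainerState) is hypothetical and not part of the patch.
{code}
// Poll the NM until it reports the expected state or the timeout expires;
// startContainer/stopContainer only enqueue the transition, so the very next
// getContainerStatus call may still see the old state.
private ContainerStatus waitForContainerState(NMClient nmClient, Container container,
    ContainerState expected, long timeoutMs) throws Exception {
  long deadline = System.currentTimeMillis() + timeoutMs;
  ContainerStatus status = nmClient.getContainerStatus(
      container.getId(), container.getNodeId(), container.getContainerToken());
  while (status.getState() != expected && System.currentTimeMillis() < deadline) {
    Thread.sleep(100);
    status = nmClient.getContainerStatus(
        container.getId(), container.getNodeId(), container.getContainerToken());
  }
  return status;
}
{code}
A test can then assert on the returned status (state, diagnostics, exit status) without racing the NM's state machine.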
[jira] [Commented] (YARN-737) ContainerManagerImpl can directly throw NMNotYetReadyException and InvalidContainerException after YARN-142
[ https://issues.apache.org/jira/browse/YARN-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671707#comment-13671707 ] Jian He commented on YARN-737: -- bq. Why are we throwing RuntimeException instead of Exception? any reason? These two exceptions actually extend YarnRemoteException, which extends Exception; what do you mean? ContainerManagerImpl can directly throw NMNotYetReadyException and InvalidContainerException after YARN-142 Key: YARN-737 URL: https://issues.apache.org/jira/browse/YARN-737 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-737.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671715#comment-13671715 ] Trevor Lorimer commented on YARN-696: - Patch submitted, built against trunk and unit test included. Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Priority: Trivial Attachments: 0001-YARN-696.patch Within the YARN Resource Manager REST API the GET call which returns all Applications can be filtered by a single State query parameter (http://<rm http address:port>/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed). If no state parameter is specified all states are returned; however, if a sub-set of states is required then multiple REST calls are required (max. of 7). The proposal is to be able to specify multiple states in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
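For illustration, this is the kind of call the proposal enables; the comma-separated states query parameter is the feature being proposed here (the existing API only accepts a single state value), and the host/port are placeholders.
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class AppsByStates {
  public static void main(String[] args) throws Exception {
    // One request for two states instead of two separate requests.
    URL url = new URL("http://rmhost:8088/ws/v1/cluster/apps?states=RUNNING,FINISHED");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);   // JSON listing of the matching applications
      }
    } finally {
      in.close();
      conn.disconnect();
    }
  }
}
{code}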
[jira] [Commented] (YARN-733) TestNMClient fails occasionally
[ https://issues.apache.org/jira/browse/YARN-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671712#comment-13671712 ] Omkar Vinit Joshi commented on YARN-733: [~zjshen] small nit: bq. may still need some time to make the container actually started or stopped because of its asynchronous - suggest: may still need some time to either start or stop the container because of its asynchronous I hope we are not doing getContainerStatus after the Application is finished, in which case we won't have tokens at the NM side for authentication. TestNMClient fails occasionally --- Key: YARN-733 URL: https://issues.apache.org/jira/browse/YARN-733 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-733.1.patch The problem happens at:
{code}
// getContainerStatus can be called after stopContainer
try {
  ContainerStatus status = nmClient.getContainerStatus(
      container.getId(), container.getNodeId(), container.getContainerToken());
  assertEquals(container.getId(), status.getContainerId());
  assertEquals(ContainerState.RUNNING, status.getState());
  assertTrue("" + i, status.getDiagnostics().contains(
      "Container killed by the ApplicationMaster."));
  assertEquals(-1000, status.getExitStatus());
} catch (YarnRemoteException e) {
  fail("Exception is not expected");
}
{code}
NMClientImpl#stopContainer returns, but the container hasn't been stopped immediately. ContainerManagerImpl implements stopContainer in async style. Therefore, the container's status is in transition. NMClientImpl#getContainerStatus immediately after stopContainer will get either the RUNNING status or the COMPLETE one. There will be a similar problem wrt NMClientImpl#startContainer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671738#comment-13671738 ] Sandy Ryza commented on YARN-326: - Attached a rebased patch Add multi-resource scheduling to the fair scheduler --- Key: YARN-326 URL: https://issues.apache.org/jira/browse/YARN-326 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: FairSchedulerDRFDesignDoc-1.pdf, FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, YARN-326-2.patch, YARN-326-3.patch, YARN-326-4.patch, YARN-326-5.patch, YARN-326-6.patch, YARN-326.patch, YARN-326.patch With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-735) Make ApplicationAttemptID, ContainerID, NodeID immutable
[ https://issues.apache.org/jira/browse/YARN-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671739#comment-13671739 ] Siddharth Seth commented on YARN-735: - bq. with a local copy, getAppId can save from every time calling convertFromProtoFormat I think Fair enough. Can the setter be simplified a bit? After a build(), can we unset the builder and disallow any subsequent sets? Also, in ApplicationId, the applicationId / appAttemptId instances could be initialized as part of the proto constructor itself, which would allow simpler getter code. Make ApplicationAttemptID, ContainerID, NodeID immutable Key: YARN-735 URL: https://issues.apache.org/jira/browse/YARN-735 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-735.1.patch, YARN-735.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
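To make the suggestion concrete, here is a rough sketch of what an immutable PBImpl looks like when the child id is converted once in the proto constructor; the class and proto names mirror the YARN ones, but the exact shapes are assumptions rather than the committed patch.
{code}
public class ApplicationAttemptIdPBImpl extends ApplicationAttemptId {
  private final ApplicationAttemptIdProto proto;
  private final ApplicationId applicationId;   // built eagerly in the constructor

  public ApplicationAttemptIdPBImpl(ApplicationAttemptIdProto proto) {
    this.proto = proto;
    // Convert once here, so the getter needs no lazy caching or
    // convertFromProtoFormat call on every invocation.
    this.applicationId = new ApplicationIdPBImpl(proto.getApplicationId());
  }

  @Override
  public ApplicationId getApplicationId() {
    return applicationId;
  }

  @Override
  public int getAttemptId() {
    return proto.getAttemptId();
  }

  public ApplicationAttemptIdProto getProto() {
    return proto;   // no builder to unset: the instance is never mutated
  }
}
{code}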
[jira] [Commented] (YARN-710) Add to ser/deser methods to RecordFactory
[ https://issues.apache.org/jira/browse/YARN-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671758#comment-13671758 ] Siddharth Seth commented on YARN-710: - bq. On the 3rd one, that would require assuming that I could remove 'PBImpl' from the class, add 'Proto' and then I get the proto implementation. Plus package switching. While the 'PBImpl' postfix is used as a convention already, the 'Proto' and the package are not used as convention, thus I'd leave it as it is. Is it possible to just use the return type on the getProto method, instead of creating an instance? (Message message = getProto(newRecordInstance(clazz));) Otherwise, patch looks good to me. Add to ser/deser methods to RecordFactory - Key: YARN-710 URL: https://issues.apache.org/jira/browse/YARN-710 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-710.patch, YARN-710.patch In order to do things like AM failover and checkpointing I need to serialize app IDs, app attempt IDs, containers and/or IDs, resource requests, etc. Because we are wrapping/hiding the PB implementation from the APIs, we are hiding the built-in PB ser/deser capabilities. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
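The serialization half of the idea is straightforward once the underlying proto is reachable; a rough sketch, assuming the existing ProtoBase/getProto() convention (the helper class and method name here are made up, not the API added by the patch):
{code}
import com.google.protobuf.Message;
import org.apache.hadoop.yarn.api.records.ProtoBase;

public final class RecordSerDe {
  private RecordSerDe() {}

  /** Every *PBImpl record wraps a protobuf Message; its wire format is the serialized form. */
  public static byte[] toBytes(ProtoBase<? extends Message> record) {
    return record.getProto().toByteArray();
  }
}
{code}
Deserialization is the harder direction, since it needs a way to get from the record interface back to the concrete PBImpl class and its proto parser, which is exactly what the RecordFactory changes in this JIRA are about.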
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671762#comment-13671762 ] Hadoop QA commented on YARN-326: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585656/YARN-326-6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1053//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1053//console This message is automatically generated. Add multi-resource scheduling to the fair scheduler --- Key: YARN-326 URL: https://issues.apache.org/jira/browse/YARN-326 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: FairSchedulerDRFDesignDoc-1.pdf, FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, YARN-326-2.patch, YARN-326-3.patch, YARN-326-4.patch, YARN-326-5.patch, YARN-326-6.patch, YARN-326.patch, YARN-326.patch With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-743) AMRMClient should have a separate setProgress instead of sending progress as part of allocate
Vinod Kumar Vavilapalli created YARN-743: Summary: AMRMClient should have a separate setProgress instead of sending progress as part of allocate Key: YARN-743 URL: https://issues.apache.org/jira/browse/YARN-743 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Progress updates are independent of allocations and so should be set explicitly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671805#comment-13671805 ] Alejandro Abdelnur commented on YARN-326: - LGTM, 3 nits: * the parseResourceConfigValue(String v) should require ###mb and ###vcores present in the config, else fail. As currently the config is only a ### (for mb), we have to mark this as an incompat change. * the DRFPolicy#compare() does not need to do a Math.signum(s1.start - s2.start), it can be just s1.start - s2.start. * the DRFPolicy#compare() should not use the name of the job to determine order; in the unlikely case 2 jobs are started at the same time, the return should be zero. Things should work just fine. Add multi-resource scheduling to the fair scheduler --- Key: YARN-326 URL: https://issues.apache.org/jira/browse/YARN-326 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: FairSchedulerDRFDesignDoc-1.pdf, FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, YARN-326-2.patch, YARN-326-3.patch, YARN-326-4.patch, YARN-326-5.patch, YARN-326-6.patch, YARN-326.patch, YARN-326.patch With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
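On the start-time tie-break nit, a sketch of the comparator shape being suggested (Schedulable and getStartTime approximate the fair-scheduler classes; the real DRF comparator also weighs dominant shares before falling back to start time). If the start times are long values, the raw subtraction s1.start - s2.start can overflow when narrowed to int, so an explicit comparison is the safer way to drop Math.signum:
{code}
import java.util.Comparator;

class StartTimeTieBreaker implements Comparator<Schedulable> {
  @Override
  public int compare(Schedulable s1, Schedulable s2) {
    long t1 = s1.getStartTime();
    long t2 = s2.getStartTime();
    // No job-name comparison: identical start times simply return 0, which is
    // fine because Collections.sort is stable and preserves the existing order.
    return t1 < t2 ? -1 : (t1 > t2 ? 1 : 0);
  }
}
{code}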
[jira] [Updated] (YARN-735) Make ApplicationAttemptID, ContainerID, NodeID immutable
[ https://issues.apache.org/jira/browse/YARN-735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-735: - Attachment: YARN-735.2.patch New patch fixed above comments Make ApplicationAttemptID, ContainerID, NodeID immutable Key: YARN-735 URL: https://issues.apache.org/jira/browse/YARN-735 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-735.1.patch, YARN-735.2.patch, YARN-735.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-694) AM uses the AMNMToken to authenticate all communication with NM. NM remembers and updates token across RM restart
[ https://issues.apache.org/jira/browse/YARN-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671819#comment-13671819 ] Omkar Vinit Joshi commented on YARN-694: Need to fix NodeId check as a part of this at NM side. (YARN-739) AM uses the AMNMToken to authenticate all communication with NM. NM remembers and updates token across RM restart - Key: YARN-694 URL: https://issues.apache.org/jira/browse/YARN-694 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi AM uses the AMNMToken to authenticate all the AM-NM communication. NM will validate AMNMToken in below manner * If AMNMToken is using current or previous master key then the AMNMToken is valid. In this case it will update its cache with this key corresponding to appId. * If AMNMToken is using the master key which is present in NM's cache corresponding to AM's appId then it will be validated based on this. * If AMNMToken is invalid then NM will reject AM calls. Modification for ContainerToken * At present RPC validates AM-NM communication based on ContainerToken. It will be replaced with AMNMToken. Also now onwards AM will use AMNMToken per NM (replacing earlier behavior of ContainerToken per container per NM). * startContainer in case of Secured environment is using ContainerToken from UGI YARN-617; however after this it will use it from the payload (Container). * ContainerToken will exist and it will only be used to validate the AM's container start request. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-694) AM uses the NMToken to authenticate all communication with NM. NM remembers and updates token across RM restart
[ https://issues.apache.org/jira/browse/YARN-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-694: --- Summary: AM uses the NMToken to authenticate all communication with NM. NM remembers and updates token across RM restart (was: AM uses the AMNMToken to authenticate all communication with NM. NM remembers and updates token across RM restart) AM uses the NMToken to authenticate all communication with NM. NM remembers and updates token across RM restart --- Key: YARN-694 URL: https://issues.apache.org/jira/browse/YARN-694 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi AM uses the AMNMToken to authenticate all the AM-NM communication. NM will validate AMNMToken in below manner * If AMNMToken is using current or previous master key then the AMNMToken is valid. In this case it will update its cache with this key corresponding to appId. * If AMNMToken is using the master key which is present in NM's cache corresponding to AM's appId then it will be validated based on this. * If AMNMToken is invalid then NM will reject AM calls. Modification for ContainerToken * At present RPC validates AM-NM communication based on ContainerToken. It will be replaced with AMNMToken. Also now onwards AM will use AMNMToken per NM (replacing earlier behavior of ContainerToken per container per NM). * startContainer in case of Secured environment is using ContainerToken from UGI YARN-617; however after this it will use it from the payload (Container). * ContainerToken will exist and it will only be used to validate the AM's container start request. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-694) AM uses the NMToken to authenticate all communication with NM. NM remembers and updates token across RM restart
[ https://issues.apache.org/jira/browse/YARN-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-694: --- Description: AM uses the NMToken to authenticate all the AM-NM communication. NM will validate NMToken in below manner * If NMToken is using current or previous master key then the NMToken is valid. In this case it will update its cache with this key corresponding to appId. * If NMToken is using the master key which is present in NM's cache corresponding to AM's appId then it will be validated based on this. * If NMToken is invalid then NM will reject AM calls. Modification for ContainerToken * At present RPC validates AM-NM communication based on ContainerToken. It will be replaced with NMToken. Also now onwards AM will use NMToken per NM (replacing earlier behavior of ContainerToken per container per NM). * startContainer in case of Secured environment is using ContainerToken from UGI YARN-617; however after this it will use it from the payload (Container). * ContainerToken will exist and it will only be used to validate the AM's container start request. was: AM uses the AMNMToken to authenticate all the AM-NM communication. NM will validate AMNMToken in below manner * If AMNMToken is using current or previous master key then the AMNMToken is valid. In this case it will update its cache with this key corresponding to appId. * If AMNMToken is using the master key which is present in NM's cache corresponding to AM's appId then it will be validated based on this. * If AMNMToken is invalid then NM will reject AM calls. Modification for ContainerToken * At present RPC validates AM-NM communication based on ContainerToken. It will be replaced with AMNMToken. Also now onwards AM will use AMNMToken per NM (replacing earlier behavior of ContainerToken per container per NM). * startContainer in case of Secured environment is using ContainerToken from UGI YARN-617; however after this it will use it from the payload (Container). * ContainerToken will exist and it will only be used to validate the AM's container start request. AM uses the NMToken to authenticate all communication with NM. NM remembers and updates token across RM restart --- Key: YARN-694 URL: https://issues.apache.org/jira/browse/YARN-694 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi AM uses the NMToken to authenticate all the AM-NM communication. NM will validate NMToken in below manner * If NMToken is using current or previous master key then the NMToken is valid. In this case it will update its cache with this key corresponding to appId. * If NMToken is using the master key which is present in NM's cache corresponding to AM's appId then it will be validated based on this. * If NMToken is invalid then NM will reject AM calls. Modification for ContainerToken * At present RPC validates AM-NM communication based on ContainerToken. It will be replaced with NMToken. Also now onwards AM will use NMToken per NM (replacing earlier behavior of ContainerToken per container per NM). * startContainer in case of Secured environment is using ContainerToken from UGI YARN-617; however after this it will use it from the payload (Container). * ContainerToken will exist and it will only be used to validate the AM's container start request. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
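The validation order in the description reduces to a small amount of bookkeeping on the NM. Below is a self-contained sketch with the token and master-key types collapsed to plain ids; the class and method names are illustrative, not the actual YARN secret-manager API.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NMTokenValidationSketch {
  private int currentKeyId;
  private int previousKeyId;
  // master key id remembered per application from the last accepted NMToken
  private final Map<String, Integer> appToKeyId = new ConcurrentHashMap<String, Integer>();

  boolean isValid(String appId, int tokenKeyId) {
    // 1. Signed with the current or previous master key: accept, and remember
    //    this key for the application so it stays valid after future key rolls.
    if (tokenKeyId == currentKeyId || tokenKeyId == previousKeyId) {
      appToKeyId.put(appId, tokenKeyId);
      return true;
    }
    // 2. Otherwise accept only if it matches the key already cached for this app.
    Integer cached = appToKeyId.get(appId);
    if (cached != null && cached == tokenKeyId) {
      return true;
    }
    // 3. Anything else: reject the AM call.
    return false;
  }
}
{code}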
[jira] [Commented] (YARN-735) Make ApplicationAttemptID, ContainerID, NodeID immutable
[ https://issues.apache.org/jira/browse/YARN-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671828#comment-13671828 ] Hadoop QA commented on YARN-735: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585665/YARN-735.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 41 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1054//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1054//console This message is automatically generated. Make ApplicationAttemptID, ContainerID, NodeID immutable Key: YARN-735 URL: https://issues.apache.org/jira/browse/YARN-735 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-735.1.patch, YARN-735.2.patch, YARN-735.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671833#comment-13671833 ] Hadoop QA commented on YARN-117: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585628/YARN-117-012.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 25 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1055//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1055//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1055//console This message is automatically generated. Enhance YARN service model -- Key: YARN-117 URL: https://issues.apache.org/jira/browse/YARN-117 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117-007.patch, YARN-117-008.patch, YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, YARN-117-012.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch Having played the YARN service model, there are some issues that I've identified based on past work and initial use. This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs. h2. state model prevents stopped state being entered if you could not successfully start the service. In the current lifecycle you cannot stop a service unless it was successfully started, but * {{init()}} may acquire resources that need to be explicitly released * if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources. *Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non null. 
Before anyone points out that the {{stop()}} operations assume that all fields are valid; and if called before a {{start()}} they will NPE; MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix for this. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than stopped. MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take up; this can be done with issues linked to this one. h2. AbstractService doesn't prevent duplicate state change requests. The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this. This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner
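A minimal sketch of the lifecycle pattern this points toward: the base class owns the state checks, the public methods are final, and stop() is legal from any state, including a service that never fully started. The serviceStart/serviceStop hook names follow the direction discussed here but are assumptions about the final API.
{code}
public abstract class SketchAbstractService {
  public enum STATE { NOTINITED, INITED, STARTED, STOPPED }
  private volatile STATE state = STATE.NOTINITED;

  public final synchronized void start() {
    if (state == STATE.STARTED) {
      return;                       // duplicate start() is a no-op, not an error
    }
    serviceStart();
    state = STATE.STARTED;
  }

  public final synchronized void stop() {
    if (state == STATE.STOPPED) {
      return;                       // stop() may be called from any state, once
    }
    state = STATE.STOPPED;
    serviceStop();                  // implementations must cope with null fields
  }

  protected void serviceStart() {}
  protected void serviceStop() {}
}
{code}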
[jira] [Assigned] (YARN-720) container-log4j.properties should not refer to mapreduce properties
[ https://issues.apache.org/jira/browse/YARN-720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-720: Assignee: Zhijie Shen container-log4j.properties should not refer to mapreduce properties --- Key: YARN-720 URL: https://issues.apache.org/jira/browse/YARN-720 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Zhijie Shen This refers to yarn.app.mapreduce.container.log.dir and yarn.app.mapreduce.container.log.filesize. This should either be moved into the MR codebase. Alternately the parameters should be renamed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-326: Labels: incompatible (was: ) Add multi-resource scheduling to the fair scheduler --- Key: YARN-326 URL: https://issues.apache.org/jira/browse/YARN-326 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Labels: incompatible Attachments: FairSchedulerDRFDesignDoc-1.pdf, FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, YARN-326-2.patch, YARN-326-3.patch, YARN-326-4.patch, YARN-326-5.patch, YARN-326-6.patch, YARN-326.patch, YARN-326.patch With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-744) Locking not correct in org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(AllocateRequest request)
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha moved MAPREDUCE-3899 to YARN-744: Component/s: (was: resourcemanager) (was: mrv2) resourcemanager Fix Version/s: (was: 0.23.0) Affects Version/s: (was: 0.23.0) Key: YARN-744 (was: MAPREDUCE-3899) Project: Hadoop YARN (was: Hadoop Map/Reduce) Locking not correct in org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(AllocateRequest request) --- Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Bikas Saha Attachments: MAPREDUCE-3899-branch-0.23.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
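A self-contained sketch of the race described above, with the YARN record types collapsed to strings; the field names (responseMap, attemptLocks) are illustrative, not the actual ApplicationMasterService code.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class AllocateLockSketch {
  private final Map<String, String> responseMap = new ConcurrentHashMap<String, String>();
  private final Map<String, Object> attemptLocks = new ConcurrentHashMap<String, Object>();

  void registerAttempt(String appAttemptId) {
    responseMap.put(appAttemptId, "initial-response");
    attemptLocks.put(appAttemptId, new Object());
  }

  // Broken: the lock is held on the *old* response object, but a new object is
  // put into the map, so the next caller locks a different object and two
  // threads can be in the critical section for the same attempt at once.
  String allocateBroken(String appAttemptId) {
    String lastResponse = responseMap.get(appAttemptId);
    synchronized (lastResponse) {
      String newResponse = "response-" + System.nanoTime();
      responseMap.put(appAttemptId, newResponse);
      return newResponse;
    }
  }

  // Suggested direction from the JIRA: serialize on something stable per
  // attempt (the ApplicationAttemptId key, or a dedicated lock object) that
  // does not change when the response is replaced.
  String allocateFixed(String appAttemptId) {
    Object lock = attemptLocks.get(appAttemptId);
    synchronized (lock) {
      String newResponse = "response-" + System.nanoTime();
      responseMap.put(appAttemptId, newResponse);
      return newResponse;
    }
  }
}
{code}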
[jira] [Updated] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-326: Attachment: YARN-326-7.patch Add multi-resource scheduling to the fair scheduler --- Key: YARN-326 URL: https://issues.apache.org/jira/browse/YARN-326 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Labels: incompatible Attachments: FairSchedulerDRFDesignDoc-1.pdf, FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, YARN-326-2.patch, YARN-326-3.patch, YARN-326-4.patch, YARN-326-5.patch, YARN-326-6.patch, YARN-326-7.patch, YARN-326.patch, YARN-326.patch With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671862#comment-13671862 ] Sandy Ryza commented on YARN-326: - Uploaded a patch that addresses Alejandro's comments Add multi-resource scheduling to the fair scheduler --- Key: YARN-326 URL: https://issues.apache.org/jira/browse/YARN-326 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Labels: incompatible Attachments: FairSchedulerDRFDesignDoc-1.pdf, FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, YARN-326-2.patch, YARN-326-3.patch, YARN-326-4.patch, YARN-326-5.patch, YARN-326-6.patch, YARN-326-7.patch, YARN-326.patch, YARN-326.patch With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-569: --- Attachment: YARN-569.3.patch CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Attachments: 3queues.pdf, CapScheduler_with_preemption.pdf, preemption.2.patch, YARN-569.1.patch, YARN-569.2.patch, YARN-569.3.patch, YARN-569.patch, YARN-569.patch There is a tension between the fast-pace reactive role of the CapacityScheduler, which needs to respond quickly to applications resource requests, and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. To this purpose we opted instead of hacking the delicate mechanisms of the CapacityScheduler directly to add support for preemption by means of a Capacity Monitor, which can be run optionally as a separate service (much like the NMLivelinessMonitor). The capacity monitor (similarly to equivalent functionalities in the fairness scheduler) operates running on intervals (e.g., every 3 seconds), observe the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine if preemption is needed, and how best to edit the current schedule to improve capacity, and generates events that produce four possible actions: # Container de-reservations # Resource-based preemptions # Container-based preemptions # Container killing The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. Note that due to the lag in the effect of these actions the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not trying to tightly and consistently micromanage container allocations. - Preemption policy (ProportionalCapacityPreemptionPolicy): - Preemption policies are by design pluggable, in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: # it gathers from the scheduler the state of the queues, in particular, their current capacity, guaranteed capacity and pending requests (*) # if there are pending requests from queues that are under capacity it computes a new ideal balanced state (**) # it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round) # it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order) # it remove reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exits # (if not enough) it issues preemptions for containers from the same applications (reverse chronological order, last assigned container first) again until necessary or until no containers except the AM container are left, # (if not enough) it moves onto unreserve and preempt from the next application. # containers that have been asked to preempt are tracked across executions. 
If a container is among the ones to be preempted for more than a certain time, the container is moved into the list of containers to be forcibly killed. Notes: (*) at the moment, in order to avoid double-counting of the requests, we only look at the ANY part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not any. (**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among queues (that want some) as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue, and the function runs to a fixed point. Tunables of the ProportionalCapacityPreemptionPolicy: # observe-only mode (i.e., log the actions it would take, but behave as read-only) # how frequently to run the policy # how long to wait between preemption and kill of a container # which fraction of the containers I would like to obtain should I preempt (has to do with the natural rate at which containers are returned) # deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect balance by some small % we ignore it) # overall amount of
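The monitor half of the design is simple to picture; below is a rough sketch of the observe/compute/preempt loop. The interface and method names approximate the patch (e.g. ProportionalCapacityPreemptionPolicy would implement the edit-policy hook) and are not the committed API.
{code}
class CapacityMonitorSketch implements Runnable {
  /** The pluggable piece, e.g. ProportionalCapacityPreemptionPolicy. */
  interface EditPolicy {
    void editSchedule();   // observe queues, compute ideal balance, emit preemption events
  }

  private final EditPolicy policy;
  private final long intervalMs;   // e.g. 3000ms

  CapacityMonitorSketch(EditPolicy policy, long intervalMs) {
    this.policy = policy;
    this.intervalMs = intervalMs;
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      // Each round: observe capacities and pending demand, compute the ideal
      // balanced state, then de-reserve, preempt or kill as configured.
      policy.editSchedule();
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
{code}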
[jira] [Commented] (YARN-735) Make ApplicationAttemptID, ContainerID, NodeID immutable
[ https://issues.apache.org/jira/browse/YARN-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671909#comment-13671909 ] Siddharth Seth commented on YARN-735: - +1. Committing this. Thanks Jian. Make ApplicationAttemptID, ContainerID, NodeID immutable Key: YARN-735 URL: https://issues.apache.org/jira/browse/YARN-735 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-735.1.patch, YARN-735.2.patch, YARN-735.2.patch, YARN-735.3.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671916#comment-13671916 ] Hadoop QA commented on YARN-569: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585682/YARN-569.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1058//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1058//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1058//console This message is automatically generated. CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Attachments: 3queues.pdf, CapScheduler_with_preemption.pdf, preemption.2.patch, YARN-569.1.patch, YARN-569.2.patch, YARN-569.3.patch, YARN-569.patch, YARN-569.patch There is a tension between the fast-pace reactive role of the CapacityScheduler, which needs to respond quickly to applications resource requests, and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. To this purpose we opted instead of hacking the delicate mechanisms of the CapacityScheduler directly to add support for preemption by means of a Capacity Monitor, which can be run optionally as a separate service (much like the NMLivelinessMonitor). The capacity monitor (similarly to equivalent functionalities in the fairness scheduler) operates running on intervals (e.g., every 3 seconds), observe the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine if preemption is needed, and how best to edit the current schedule to improve capacity, and generates events that produce four possible actions: # Container de-reservations # Resource-based preemptions # Container-based preemptions # Container killing The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. Note that due to the lag in the effect of these actions the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not trying to tightly and consistently micromanage container allocations. 
- Preemption policy (ProportionalCapacityPreemptionPolicy): - Preemption policies are by design pluggable, in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: # it gathers from the scheduler the state of the queues, in particular, their current capacity, guaranteed capacity and pending requests (*) # if there are pending requests from queues that are under capacity it computes a new ideal balanced state (**) # it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round) # it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order) # it remove reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exits # (if not enough) it issues
[jira] [Resolved] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth resolved YARN-528. - Resolution: Fixed Fix Version/s: 2.1.0-beta Closing since all the sub-jiras are done. Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.4-alpha Reporter: Robert Joseph Evans Assignee: Siddharth Seth Fix For: 2.1.0-beta Attachments: y528_AppIdPart_01_Refactor.txt, y528_AppIdPart_02_AppIdChanges.txt, y528_AppIdPart_03_fixUsage.txt, y528_ApplicationIdComplete_WIP.txt, YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just, makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is wel received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reopened YARN-528: -- Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.4-alpha Reporter: Robert Joseph Evans Assignee: Siddharth Seth Fix For: 2.1.0-beta Attachments: y528_AppIdPart_01_Refactor.txt, y528_AppIdPart_02_AppIdChanges.txt, y528_AppIdPart_03_fixUsage.txt, y528_ApplicationIdComplete_WIP.txt, YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error-prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is well received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and the YARN APIs can no longer be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-528. -- Resolution: Duplicate Fix Version/s: (was: 2.1.0-beta) Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.4-alpha Reporter: Robert Joseph Evans Assignee: Siddharth Seth Attachments: y528_AppIdPart_01_Refactor.txt, y528_AppIdPart_02_AppIdChanges.txt, y528_AppIdPart_03_fixUsage.txt, y528_ApplicationIdComplete_WIP.txt, YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error-prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is well received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and the YARN APIs can no longer be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-735) Make ApplicationAttemptID, ContainerID, NodeID immutable
[ https://issues.apache.org/jira/browse/YARN-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671926#comment-13671926 ] Hudson commented on YARN-735: - Integrated in Hadoop-trunk-Commit #3824 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3824/]) YARN-735. Make ApplicationAttemptId, ContainerId and NodeId immutable. Contributed by Jian He. (Revision 1488439) Result = SUCCESS sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1488439 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/local/LocalContainerAllocator.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRAppBenchmark.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MockJobs.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRuntimeEstimators.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestStagingCleanup.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/webapp/TestAMWebApp.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/webapp/TestAMWebServices.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/webapp/TestAMWebServicesAttempts.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/webapp/TestAMWebServicesJobConf.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/webapp/TestAMWebServicesJobs.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/webapp/TestAMWebServicesTasks.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHSWebApp.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServices.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServicesAttempts.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServicesJobConf.java * 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServicesJobs.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServicesTasks.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeId.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationAttemptIdPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationIdPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerIdPBImpl.java *
[jira] [Commented] (YARN-717) Copy BuilderUtil methods into token-related records
[ https://issues.apache.org/jira/browse/YARN-717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671950#comment-13671950 ] Hadoop QA commented on YARN-717: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585689/YARN-717.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 28 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 4 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1060//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1060//console This message is automatically generated. Copy BuilderUtil methods into token-related records --- Key: YARN-717 URL: https://issues.apache.org/jira/browse/YARN-717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-717.1.patch, YARN-717.2.patch, YARN-717.3.patch This is separated from YARN-711, as after changing yarn.api.token from an interface to an abstract class, e.g. ClientTokenPBImpl would have to extend two classes, both TokenPBImpl and the ClientToken abstract class, which is not allowed in Java. We may remove the ClientToken/ContainerToken/DelegationToken interfaces and just use the common Token interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
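The single-inheritance constraint mentioned above can be illustrated with a small, self-contained sketch. The class names mirror the ones in the comment, but the bodies below are placeholders, not the actual YARN records.
{code:java}
// A Java class can only extend one superclass, so a protobuf-backed impl
// cannot inherit both the shared TokenPBImpl and a ClientToken abstract class.
abstract class Token {                         // common abstract record
  public abstract byte[] getIdentifier();
}

abstract class ClientToken extends Token { }   // token sub-type as an abstract class

class TokenPBImpl extends Token {              // shared protobuf-backed implementation
  private byte[] identifier = new byte[0];
  @Override
  public byte[] getIdentifier() { return identifier; }
}

// Does not compile: multiple class inheritance is not allowed in Java.
// class ClientTokenPBImpl extends TokenPBImpl, ClientToken { }
//
// Hence the proposal: drop the ClientToken/ContainerToken/DelegationToken
// sub-types and use the single common Token record everywhere.
{code}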
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671954#comment-13671954 ] Vinod Kumar Vavilapalli commented on YARN-713: -- That, and we should do a sweep of YARN RM (and NM too?) to see where else we depend on DNS.. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Priority: Critical Attachments: YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-733) TestNMClient fails occasionally
[ https://issues.apache.org/jira/browse/YARN-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-733: - Attachment: YARN-733.2.patch Fix the javadoc and refactor the test. Thanks, Omkar and Vinod, for your review. TestNMClient fails occasionally --- Key: YARN-733 URL: https://issues.apache.org/jira/browse/YARN-733 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-733.1.patch, YARN-733.2.patch The problem happens at:
{code}
// getContainerStatus can be called after stopContainer
try {
  ContainerStatus status = nmClient.getContainerStatus(
      container.getId(), container.getNodeId(),
      container.getContainerToken());
  assertEquals(container.getId(), status.getContainerId());
  assertEquals(ContainerState.RUNNING, status.getState());
  assertTrue("" + i, status.getDiagnostics().contains(
      "Container killed by the ApplicationMaster."));
  assertEquals(-1000, status.getExitStatus());
} catch (YarnRemoteException e) {
  fail("Exception is not expected");
}
{code}
NMClientImpl#stopContainer returns, but the container hasn't been stopped immediately. ContainerManagerImpl implements stopContainer in an asynchronous style, so the container's status is in transition. Calling NMClientImpl#getContainerStatus immediately after stopContainer will get either the RUNNING status or the COMPLETE one. There will be a similar problem with NMClientImpl#startContainer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
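Because stopContainer is asynchronous on the NodeManager side, one way to make the assertion deterministic is to poll the container status until it leaves RUNNING instead of asserting immediately. The helper below is a hypothetical sketch of that approach, meant to sit inside the existing test class and reuse the nmClient, Container and record types already in scope there; its name, retry count and sleep interval are assumptions, and it is not necessarily what YARN-733.2.patch does.
{code:java}
// Hypothetical test helper: wait for the container to reach COMPLETE after
// stopContainer(), since the NM processes the stop request asynchronously.
ContainerStatus waitForCompletion(Container container, int maxRetries, long sleepMs)
    throws Exception {
  ContainerStatus status = nmClient.getContainerStatus(
      container.getId(), container.getNodeId(), container.getContainerToken());
  int retries = 0;
  while (status.getState() != ContainerState.COMPLETE && retries++ < maxRetries) {
    Thread.sleep(sleepMs);   // stopContainer has returned, but the NM is still transitioning
    status = nmClient.getContainerStatus(
        container.getId(), container.getNodeId(), container.getContainerToken());
  }
  return status;             // caller asserts on COMPLETE, exit status and diagnostics
}
{code}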
[jira] [Commented] (YARN-733) TestNMClient fails occasionally
[ https://issues.apache.org/jira/browse/YARN-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671964#comment-13671964 ] Hadoop QA commented on YARN-733: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585696/YARN-733.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1061//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1061//console This message is automatically generated. TestNMClient fails occasionally --- Key: YARN-733 URL: https://issues.apache.org/jira/browse/YARN-733 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-733.1.patch, YARN-733.2.patch The problem happens at:
{code}
// getContainerStatus can be called after stopContainer
try {
  ContainerStatus status = nmClient.getContainerStatus(
      container.getId(), container.getNodeId(),
      container.getContainerToken());
  assertEquals(container.getId(), status.getContainerId());
  assertEquals(ContainerState.RUNNING, status.getState());
  assertTrue("" + i, status.getDiagnostics().contains(
      "Container killed by the ApplicationMaster."));
  assertEquals(-1000, status.getExitStatus());
} catch (YarnRemoteException e) {
  fail("Exception is not expected");
}
{code}
NMClientImpl#stopContainer returns, but the container hasn't been stopped immediately. ContainerManagerImpl implements stopContainer in an asynchronous style, so the container's status is in transition. Calling NMClientImpl#getContainerStatus immediately after stopContainer will get either the RUNNING status or the COMPLETE one. There will be a similar problem with NMClientImpl#startContainer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated YARN-713: - Attachment: YARN-713.patch In the attached patch, the exception is handled in RMContainerTokenSecretManager#createContainerToken by returning null. The null values are supposed to trigger a retry, as in FifoScheduler#assignContainer:
{code:java}
if (containerToken == null) {
  return i; // Try again later.
}
{code}
Regarding the sweep of the RM to find other places where a DNS failure should be handled properly, I guess a cleaner approach is to directly throw UnknownHostException instead of hiding it in an IllegalArgumentException, which is also semantically confusing. This however would result in widespread changes all over the project, as each user of SecurityUtil must either handle the exception or declare it to be caught by its callers. If this approach is fine with you guys, I can give it a go. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Priority: Critical Attachments: YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
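For reference, a minimal sketch of the handling described in the comment: catch the wrapped DNS failure inside RMContainerTokenSecretManager#createContainerToken and return null so the scheduler retries later instead of the exception escaping into the dispatcher. The helper buildToken, the LOG field and the exact parameter list are assumptions for illustration, not the actual patch.
{code:java}
// Hypothetical sketch of the "return null and retry" handling; buildToken and
// LOG are placeholders for whatever the real method uses internally.
public Token createContainerToken(ContainerId containerId, NodeId nodeId,
                                  String appSubmitter, Resource capability) {
  try {
    // Building the token service name may resolve the NM host via DNS.
    return buildToken(containerId, nodeId, appSubmitter, capability);
  } catch (IllegalArgumentException e) {
    // SecurityUtil wraps the UnknownHostException; treat it as transient.
    LOG.warn("Could not create container token for " + containerId + " on "
        + nodeId + ", DNS might be unavailable", e);
    return null;  // callers such as FifoScheduler#assignContainer try again later
  }
}
{code}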