[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618417#comment-13618417 ] Hadoop QA commented on YARN-309:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576294/YARN-309-20130331.txt against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/632//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/632//console

This message is automatically generated.
Make RM provide heartbeat interval to NM Key: YARN-309 URL: https://issues.apache.org/jira/browse/YARN-309 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-309.1.patch, YARN-309-20130331.txt, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch, YARN-309.5.patch, YARN-309.6.patch, YARN-309.7.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618430#comment-13618430 ] Vinod Kumar Vavilapalli commented on YARN-475: -- +1 this looks good. No tests needed. Checking this in. Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment --- Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Hitesh Shah Attachments: YARN-475.1.patch AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id.
[jira] [Assigned] (YARN-157) The option shell_command and shell_script have conflict
[ https://issues.apache.org/jira/browse/YARN-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-157: Assignee: rainy Yu Rainy, assigning this to you. The option shell_command and shell_script have conflict --- Key: YARN-157 URL: https://issues.apache.org/jira/browse/YARN-157 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.0.1-alpha Reporter: Li Ming Assignee: rainy Yu Labels: patch Attachments: hadoop_yarn.patch DistributedShell has an option, shell_script, to let the user specify a shell script that will be executed in containers. But the shell_command option is mandatory, so if both options are set, every container executor ends with exitCode=1. This is because DistributedShell executes the shell_command and shell_script together. For example, if shell_command is 'date', the final command executed in the container is date `ExecShellScript.sh`, so the date command treats the output of ExecShellScript.sh as its parameter, and there is an error. To solve this, DistributedShell should not use the value of the shell_command option when the shell_script option is set, and the shell_command option also should not be mandatory.
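The failure mode described above can be sketched in a few lines of Java; this is an illustrative reconstruction of the option handling, not DistributedShell's actual code (class and method names here are hypothetical):

```java
// Illustrative sketch of the shell_command/shell_script conflict.
public class ShellCommandSketch {

    // Current behavior: command and script are concatenated into one
    // launch command, so the script's output becomes an argument to the
    // command (e.g. "date `ExecShellScript.sh`") and the command fails.
    static String buggyLaunchCommand(String shellCommand, String shellScript) {
        return shellCommand + " `" + shellScript + "`";
    }

    // Proposed behavior: when a script is given, run it alone; when it
    // is not, fall back to shell_command (which is no longer mandatory).
    static String fixedLaunchCommand(String shellCommand, String shellScript) {
        return (shellScript != null && !shellScript.isEmpty())
            ? shellScript
            : shellCommand;
    }

    public static void main(String[] args) {
        System.out.println(buggyLaunchCommand("date", "ExecShellScript.sh"));
        System.out.println(fixedLaunchCommand("date", "ExecShellScript.sh"));
    }
}
```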
[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-412: --- Assignee: Roger Hoover (was: Arun C Murthy) FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Assignee: Roger Hoover Priority: Minor Labels: patch Attachments: YARN-412.patch, YARN-412.patch, YARN-412.patch In the FifoScheduler, the assignNodeLocalContainers method checks whether data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. This seems to be incorrect, as it should be checking the hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are keyed by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234). In the CapacityScheduler, it's done using the hostname. See LeafQueue.assignNodeLocalContainers, line 1129: application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler: even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it's rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node local.
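The key mismatch is easy to demonstrate with a plain map lookup; a minimal sketch with simplified types (not the scheduler's actual data structures):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the hostname vs. nodeAddress key mismatch.
public class LocalityLookupSketch {

    // Strip the command port from a "hostname:port" node address.
    static String hostNameOf(String nodeAddress) {
        return nodeAddress.substring(0, nodeAddress.indexOf(':'));
    }

    public static void main(String[] args) {
        // Outstanding requests are keyed by hostname.
        Map<String, Integer> requestsByHost = new HashMap<>();
        requestsByHost.put("host1.foo.com", 3);

        // Node addresses carry the command port as well.
        String nodeAddress = "host1.foo.com:1234";

        // Buggy lookup (keyed by nodeAddress): never matches.
        System.out.println(requestsByHost.get(nodeAddress));          // null
        // Correct lookup (keyed by hostname): matches.
        System.out.println(requestsByHost.get(hostNameOf(nodeAddress))); // 3
    }
}
```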
[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618433#comment-13618433 ] Arun C Murthy commented on YARN-412: [~theduderog] - No, you deserve credit for finding and fixing this. Thanks! FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Assignee: Roger Hoover Priority: Minor Labels: patch Attachments: YARN-412.patch, YARN-412.patch, YARN-412.patch
[jira] [Commented] (YARN-291) Dynamic resource configuration on NM
[ https://issues.apache.org/jira/browse/YARN-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618443#comment-13618443 ] Arun C Murthy commented on YARN-291: I commented on one of the sub-tasks, but I'm not comfortable with JMX based tricks. We fundamentally need an explicit manner to inform the RM of changes to a node, and that needs to flow down to schedulers etc. Dynamic resource configuration on NM Key: YARN-291 URL: https://issues.apache.org/jira/browse/YARN-291 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Labels: features Attachments: Elastic Resources for YARN-v0.2.pdf, YARN-291-AddClientRMProtocolToSetNodeResource-03.patch, YARN-291-all-v1.patch, YARN-291-core-HeartBeatAndScheduler-01.patch, YARN-291-JMXInterfaceOnNM-02.patch, YARN-291-OnlyUpdateWhenResourceChange-01-fix.patch, YARN-291-YARNClientCommandline-04.patch The current Hadoop YARN resource management logic assumes per-node resources are static during the lifetime of the NM process. Allowing run-time configuration of per-node resources will give us finer granularity of resource elasticity. This allows Hadoop workloads to coexist with other workloads on the same hardware efficiently, whether or not the environment is virtualized. For more background and design details, please refer to HADOOP-9165.
[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618444#comment-13618444 ] Hudson commented on YARN-475: - Integrated in Hadoop-trunk-Commit #3541 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3541/]) YARN-475. Remove a unused constant in the public API - ApplicationConstants.AM_APP_ATTEMPT_ID_ENV. Contributed by Hitesh Shah. (Revision 1463033) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463033 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment --- Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Hitesh Shah Fix For: 2.0.5-beta Attachments: YARN-475.1.patch
[jira] [Updated] (YARN-447) applicationComparator improvement for CS
[ https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-447: --- Assignee: nemon lou applicationComparator improvement for CS Key: YARN-447 URL: https://issues.apache.org/jira/browse/YARN-447 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.3-alpha Reporter: nemon lou Assignee: nemon lou Priority: Minor Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch Now the compare code is: return a1.getApplicationId().getId() - a2.getApplicationId().getId(); It will be replaced with: return a1.getApplicationId().compareTo(a2.getApplicationId()); This brings some benefits: 1) it leaves the applicationId compare logic to the ApplicationId class; 2) in a future HA mode, the cluster timestamp may change, and the ApplicationId class already takes care of this condition.
[jira] [Commented] (YARN-447) applicationComparator improvement for CS
[ https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618447#comment-13618447 ] Arun C Murthy commented on YARN-447: +1, lgtm! applicationComparator improvement for CS Key: YARN-447 URL: https://issues.apache.org/jira/browse/YARN-447 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.3-alpha Reporter: nemon lou Assignee: nemon lou Priority: Minor Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch
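Why compareTo is the safer choice can be sketched with a simplified stand-in for ApplicationId (the real class lives in org.apache.hadoop.yarn.api.records; this AppId is illustrative only):

```java
// Simplified stand-in for YARN's ApplicationId (illustrative only).
public class ComparatorSketch {

    static final class AppId implements Comparable<AppId> {
        final long clusterTimestamp;
        final int id;

        AppId(long clusterTimestamp, int id) {
            this.clusterTimestamp = clusterTimestamp;
            this.id = id;
        }

        @Override
        public int compareTo(AppId other) {
            // Order by cluster timestamp first, then by the per-cluster
            // counter; subtraction on ids alone would ignore the timestamp
            // (and can overflow for large differences).
            int byTimestamp = Long.compare(clusterTimestamp, other.clusterTimestamp);
            return byTimestamp != 0 ? byTimestamp : Integer.compare(id, other.id);
        }
    }

    public static void main(String[] args) {
        // After an RM restart the timestamp changes and the counter resets.
        AppId beforeRestart = new AppId(1000L, 5);
        AppId afterRestart = new AppId(2000L, 1);
        // Id subtraction (1 - 5 < 0) would order the newer app first;
        // compareTo orders by timestamp and gets it right.
        System.out.println(beforeRestart.compareTo(afterRestart) < 0); // true
    }
}
```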
[jira] [Commented] (YARN-444) Move special container exit codes from YarnConfiguration to API
[ https://issues.apache.org/jira/browse/YARN-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618470#comment-13618470 ] Bikas Saha commented on YARN-444: - Patch looks mostly good. I wonder why the plural has been used in the name of the class when that's not the common pattern. The comments in ContainerStatus.java mix the plural and singular. Move special container exit codes from YarnConfiguration to API --- Key: YARN-444 URL: https://issues.apache.org/jira/browse/YARN-444 Project: Hadoop YARN Issue Type: Sub-task Components: api, applications/distributed-shell Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-444.patch YarnConfiguration currently contains the special container exit codes INVALID_CONTAINER_EXIT_STATUS = -1000, ABORTED_CONTAINER_EXIT_STATUS = -100, and DISKS_FAILED = -101. These are not really related to configuration, and YarnConfiguration should not become a place to put miscellaneous constants. Per discussion on YARN-417, appmaster writers need to be able to provide special handling for them, so it might make sense to move these to their own user-facing class.
[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality
[ https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618475#comment-13618475 ] Bikas Saha commented on YARN-392: - How about calling it disableLocalityRelaxation, as that's what it is basically doing? When specified on a node it would mean do not relax locality to rack or ANY. Potentially, we could also say that when specified on a rack it would mean do not relax locality to ANY. Do you think it could be used to specify either exact nodes or exact racks? In that case, we would need to check that if this flag is set then either nodes or racks, but not both, are specified. I am not sure how to make sense of 2 different asks to the RM (at the same priority) that say 1) allocate at specific node A (i.e. do not relax locality to rackA) and 2) allocate at specific rack rackA (i.e. do not relax locality to ANY), where node A is contained in rackA. Make it possible to schedule to specific nodes without dropping locality Key: YARN-392 URL: https://issues.apache.org/jira/browse/YARN-392 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Sandy Ryza Attachments: YARN-392-1.patch, YARN-392.patch Currently it's not possible to specify scheduling requests for specific nodes and nowhere else. The RM automatically relaxes locality to rack and * and assigns non-specified machines to the app.
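The proposed semantics could be sketched as follows; the flag name disableLocalityRelaxation comes from the discussion above, and all types here are hypothetical, not YARN's actual API:

```java
// Illustrative sketch of the proposed locality-relaxation semantics.
public class LocalitySketch {

    enum Level { NODE, RACK, ANY }

    // Broadest level the scheduler may fall back to for a request.
    static Level broadestAllowed(Level requested, boolean disableLocalityRelaxation) {
        if (!disableLocalityRelaxation) {
            return Level.ANY;   // default: relax node -> rack -> ANY
        }
        return requested;       // flag set: stay at the requested level
    }

    public static void main(String[] args) {
        // Node-level ask with the flag: never relaxed to rack or ANY.
        System.out.println(broadestAllowed(Level.NODE, true));  // NODE
        // Rack-level ask with the flag: never relaxed to ANY.
        System.out.println(broadestAllowed(Level.RACK, true));  // RACK
        // Without the flag, any ask may relax all the way to ANY.
        System.out.println(broadestAllowed(Level.NODE, false)); // ANY
    }
}
```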
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618486#comment-13618486 ] Xuan Gong commented on YARN-309: 1. Fixed the error where the testcase org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater never finished. Because of it, we would get a binding error when running the testcase org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot. 2. Added the NodeHeartBeatResponse.setNextHeartBeatInterval() call to all overrides of NodeHeartbeatResponse nodeHeartbeat(). Make RM provide heartbeat interval to NM Key: YARN-309 URL: https://issues.apache.org/jira/browse/YARN-309 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-309.1.patch, YARN-309-20130331.txt, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch, YARN-309.5.patch, YARN-309.6.patch, YARN-309.7.patch, YARN-309.9.patch
[jira] [Updated] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-309: --- Attachment: YARN-309.9.patch Make RM provide heartbeat interval to NM Key: YARN-309 URL: https://issues.apache.org/jira/browse/YARN-309 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-309.1.patch, YARN-309-20130331.txt, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch, YARN-309.5.patch, YARN-309.6.patch, YARN-309.7.patch, YARN-309.9.patch
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618487#comment-13618487 ] Bikas Saha commented on YARN-193: - This and others like it are back-incompatible but might be ok since we are still in alpha {code} - public static final int DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_CORES = 32; + public static final int DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES = 32; {code} It should be "disabled". Same for other places. {code} +maximum allocation is disable.</description> {code} This and other places, a LOG in the catch would be good. Also, I am not warming up to the idea of having to put a try catch around every validate. {code} + // sanity check + try { + SchedulerUtils.validateResourceRequests(ask, + rScheduler.getMaximumResourceCapability()); + } catch (InvalidResourceRequestException e) { + RPCUtil.getRemoteException(e); + } {code} Incorrect log message. {code} + try { + SchedulerUtils.validateResourceRequest(amReq, + scheduler.getMaximumResourceCapability()); + } catch (InvalidResourceRequestException e) { + LOG.info("RM App submission failed in normalize AM Resource Request " + + "for application with id " + applicationId + " : " + + e.getMessage()); {code} Also, in this method, why are we throwing an exception in the inner block and catching it in the outer block? Why is the inner try catch needed (instead of catching the exception in the outer catch)? On the same note, why can this validation not be done in ClientRMService just like it's been done in ApplicationMasterService? That maintains symmetry and is easier to understand/correlate. It will also work when RMAppManager.handle() is not called synchronously from ClientRMService. Where are we testing that normalize is being set to the next higher multiple of min but not more than the max (for the DRF case)? Or that checking against max is disabled by setting the max allowed to -1. I am sorry if I have missed it.
Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch
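The normalization rule under review can be sketched as follows; this is an illustrative reconstruction (round the ask up to the next multiple of the minimum, cap at the maximum, with -1 disabling the max check), not YARN's actual SchedulerUtils code:

```java
// Illustrative sketch of resource normalization with a max cap.
public class NormalizeSketch {

    // Round the ask up to the next multiple of min; cap at max unless the
    // max check is disabled by passing -1.
    static int normalizeMemory(int requested, int min, int max) {
        int roundedUp = ((requested + min - 1) / min) * min;
        return (max == -1) ? roundedUp : Math.min(roundedUp, max);
    }

    public static void main(String[] args) {
        System.out.println(normalizeMemory(3500, 1024, 8192)); // 4096
        System.out.println(normalizeMemory(9000, 1024, 8192)); // 8192 (capped)
        System.out.println(normalizeMemory(9000, 1024, -1));   // 9216 (max disabled)
    }
}
```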
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618488#comment-13618488 ] Xuan Gong commented on YARN-309: Set the initial value for nextHeartBeatInterval back to 1000L instead of 0L. Tested the patch on a single running cluster. Make RM provide heartbeat interval to NM Key: YARN-309 URL: https://issues.apache.org/jira/browse/YARN-309 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-309.1.patch, YARN-309-20130331.txt, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch, YARN-309.5.patch, YARN-309.6.patch, YARN-309.7.patch, YARN-309.9.patch
[jira] [Commented] (YARN-382) SchedulerUtils improve way normalizeRequest sets the resource capabilities
[ https://issues.apache.org/jira/browse/YARN-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618489#comment-13618489 ] Bikas Saha commented on YARN-382: - Given that YARN-193 is only fixing validation and this copying is still needed temporarily, can you please re-post a rebased patch with your original fix? Could you also please leave a comment saying this code can be removed once YARN-486 is completed, since then there is no need to copy anything from Container to CLC. SchedulerUtils improve way normalizeRequest sets the resource capabilities -- Key: YARN-382 URL: https://issues.apache.org/jira/browse/YARN-382 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Zhijie Shen Attachments: YARN-382_1.patch, YARN-382_demo.patch In YARN-370, we changed it from setting the capability to directly setting memory and cores: -ask.setCapability(normalized); +ask.getCapability().setMemory(normalized.getMemory()); +ask.getCapability().setVirtualCores(normalized.getVirtualCores()); We did this because it directly sets the values in the original resource object passed in when the AM gets allocated; without it the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details. I think we should find a better way of doing this long term: one, so we don't have to keep adding things there when new resources are added; two, because it's a bit confusing as to what it's doing and prone to someone accidentally breaking it in the future again. Something closer to what Arun suggested in YARN-370 would be better, but we need to make sure all the places work and get some more testing on it before putting it in.
[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618491#comment-13618491 ] Bikas Saha commented on YARN-486: - Once this change is made there is no need to copy the amContainer.resource in ASC as done in YARN-382. Linking the jiras. Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land - Key: YARN-486 URL: https://issues.apache.org/jira/browse/YARN-486 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Bikas Saha Currently, id, resource request, etc. need to be copied over from Container to ContainerLaunchContext. This can be brittle. Also it leads to duplication of information (such as Resource from CLC and Resource from Container, and Container.tokens). Sending Container directly to startContainer solves these problems. It also makes CLC clean by only having stuff in it that is set by the client/AM.
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618493#comment-13618493 ] Bikas Saha commented on YARN-193: - Also, do we really need to create a new Resource object every time we call normalize? This should be a different jira though.
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618502#comment-13618502 ] Hadoop QA commented on YARN-309:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576303/YARN-309.9.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/633//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/633//console

This message is automatically generated.
Make RM provide heartbeat interval to NM Key: YARN-309 URL: https://issues.apache.org/jira/browse/YARN-309 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-309.1.patch, YARN-309-20130331.txt, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch, YARN-309.5.patch, YARN-309.6.patch, YARN-309.7.patch, YARN-309.9.patch
[jira] [Updated] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-309: --- Attachment: YARN-309.10.patch Make RM provide heartbeat interval to NM Key: YARN-309 URL: https://issues.apache.org/jira/browse/YARN-309 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-309.10.patch, YARN-309.1.patch, YARN-309-20130331.txt, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch, YARN-309.5.patch, YARN-309.6.patch, YARN-309.7.patch, YARN-309.9.patch
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618519#comment-13618519 ] Hadoop QA commented on YARN-309:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576309/YARN-309.10.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/634//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/634//console

This message is automatically generated.
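The mechanism under review — the RM handing the next heartbeat interval back to the NM in each heartbeat response, instead of the NM hard-coding it — can be sketched as below. All class and method names here are illustrative assumptions, not the API committed by this patch.

```java
public class HeartbeatIntervalSketch {

  // Stand-in for the NM-side view of the NM-RM protocol; the real protocol
  // carries much more than the interval in its heartbeat response.
  interface ResourceTrackerStub {
    long heartbeat();  // RM returns the next heartbeat interval in milliseconds
  }

  // NM heartbeat loop: send a heartbeat, then wait however long the RM said to.
  static int runHeartbeats(ResourceTrackerStub rm, int count) throws InterruptedException {
    int beats = 0;
    for (int i = 0; i < count; i++) {
      long nextIntervalMs = rm.heartbeat();
      beats++;
      Thread.sleep(nextIntervalMs);
    }
    return beats;
  }

  public static void main(String[] args) throws InterruptedException {
    // The RM picks the interval centrally, e.g. from its own configuration;
    // 10 ms here only so the demo finishes quickly.
    ResourceTrackerStub rm = () -> 10L;
    System.out.println("beats=" + runHeartbeats(rm, 3));
  }
}
```

The point of the design is that tuning the interval becomes a single RM-side setting rather than a per-node NM config change.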
[jira] [Commented] (YARN-291) Dynamic resource configuration on NM
[ https://issues.apache.org/jira/browse/YARN-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618522#comment-13618522 ]

Luke Lu commented on YARN-291:
------------------------------

Resource scheduling is fundamentally centralized at the RM. The global resource view is currently bootstrapped via the node registration process, which is more of a historical artifact of convenience, since the resource view can also be constructed directly on the RM from an inventory database. If you already have such a database, reconstructing the resource view by modifying per-node config explicitly and propagating partial views to the RM is a roundabout, inconvenient, and inefficient approach.

For a practical example: if an OLTP workload (say HBase) shares the same hardware with YARN and there is a load surge on HBase, we need to stop scheduling tasks/containers immediately on the relevant (potentially all) nodes. The current patch (JMX is just used as a portable protocol for an external management client to communicate with the RM) can take effect immediately and most efficiently. If we instead modified each nodemanager's config and let the NM-RM protocol propagate the change to the RM, we would waste resources (CPU and network bandwidth) contacting (potentially) all the nodemanagers, and would either cause unnecessary scheduling delays if the propagation rides on the regular heartbeat, or DDoS the RM if (potentially) all the NMs need to re-register out-of-band.

This is not about JMX-based tricks. It is about changing the global resource view directly where the scheduler is, versus the Rube Goldbergish way of changing NM config individually and propagating the changes to the RM to reconstruct the resource view. IMO, the direct way is better because the NM doesn't really care about what resource it actually has.
Dynamic resource configuration on NM
Key: YARN-291
URL: https://issues.apache.org/jira/browse/YARN-291
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
Labels: features
Attachments: Elastic Resources for YARN-v0.2.pdf, YARN-291-AddClientRMProtocolToSetNodeResource-03.patch, YARN-291-all-v1.patch, YARN-291-core-HeartBeatAndScheduler-01.patch, YARN-291-JMXInterfaceOnNM-02.patch, YARN-291-OnlyUpdateWhenResourceChange-01-fix.patch, YARN-291-YARNClientCommandline-04.patch

The current Hadoop YARN resource management logic assumes that per-node resources are static during the lifetime of the NM process. Allowing run-time configuration of per-node resources gives us finer-grained resource elasticity. This lets Hadoop workloads coexist efficiently with other workloads on the same hardware, whether or not the environment is virtualized. For more background and design details, please refer to HADOOP-9165.
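The "change the global resource view directly at the RM" argument above can be sketched as follows: an admin-facing call mutates the RM's per-node resource bookkeeping in one step, instead of editing each NM's config and waiting for heartbeats to propagate the change. All names here are illustrative, not the API of the attached patches.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RmResourceViewSketch {
  // The RM's global resource view: nodeId -> schedulable memory in MB.
  private final Map<String, Integer> nodeMemoryMb = new ConcurrentHashMap<>();

  // Bootstrap path: node registration populates the view (the current behavior).
  void registerNode(String nodeId, int memoryMb) {
    nodeMemoryMb.put(nodeId, memoryMb);
  }

  // Direct update path (e.g. driven by an inventory database or an external
  // management client): takes effect on the next scheduling cycle, with no
  // per-NM round trip and no re-registration storm.
  void setNodeResource(String nodeId, int memoryMb) {
    nodeMemoryMb.replace(nodeId, memoryMb);
  }

  int schedulableMemory(String nodeId) {
    return nodeMemoryMb.getOrDefault(nodeId, 0);
  }

  public static void main(String[] args) {
    RmResourceViewSketch rm = new RmResourceViewSketch();
    rm.registerNode("node1:8041", 8192);
    // HBase load surge: shrink YARN's share on node1 immediately.
    rm.setNodeResource("node1:8041", 2048);
    System.out.println(rm.schedulableMemory("node1:8041"));  // prints 2048
  }
}
```

The design choice being argued is exactly this: one write at the scheduler's own data structure versus N config edits plus N heartbeats to reach the same state.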
[jira] [Updated] (YARN-457) Setting updated nodes from null to null causes NPE in AMResponsePBImpl
[ https://issues.apache.org/jira/browse/YARN-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenji Kikushima updated YARN-457:
---------------------------------
    Attachment: YARN-457.patch

Setting updated nodes from null to null causes NPE in AMResponsePBImpl
Key: YARN-457
URL: https://issues.apache.org/jira/browse/YARN-457
Project: Hadoop YARN
Issue Type: Bug
Components: api
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Priority: Minor
Labels: Newbie
Attachments: YARN-457.patch

{code}
if (updatedNodes == null) {
  this.updatedNodes.clear();
  return;
}
{code}

If this.updatedNodes is still null, the call to clear() throws a NullPointerException.
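A minimal null-safe rewrite of the setter described above might look as follows. The field and method names mirror the snippet in the issue, but this is an illustrative sketch, not the actual AMResponsePBImpl source or the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

public class UpdatedNodesSketch {
  // Element type simplified for illustration; the real field holds NodeReports.
  private List<String> updatedNodes;

  public void setUpdatedNodes(List<String> updatedNodes) {
    if (updatedNodes == null) {
      // The original code called this.updatedNodes.clear() unconditionally,
      // which throws NPE when the field was never initialized. Guard first.
      if (this.updatedNodes != null) {
        this.updatedNodes.clear();
      }
      return;
    }
    this.updatedNodes = new ArrayList<>(updatedNodes);
  }

  public List<String> getUpdatedNodes() {
    return updatedNodes;
  }

  public static void main(String[] args) {
    UpdatedNodesSketch r = new UpdatedNodesSketch();
    r.setUpdatedNodes(null);  // previously: NullPointerException
    System.out.println("no NPE");
  }
}
```

The key point is that setting null-to-null becomes a no-op instead of dereferencing an uninitialized field.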
[jira] [Commented] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618525#comment-13618525 ]

Luke Lu commented on YARN-18:
-----------------------------

bq. I see a lot of factories and then no changes to actually support node-group..

YARN-18 is explicitly for making the topology pluggable, based on previous review requests. That is why there are few node-group-specific changes here; those are implemented in YARN-19.

bq. we need to abstract the notion of topology in the Scheduler without having to make changes every time we need to add another layer in the topology.

This is a goal of the JIRA. You should be able to add another layer without having to modify scheduler code after this patch is committed. I'm not sure that we could make arbitrary topologies pluggable without scheduler changes; however, it should work for a common class of topologies, e.g., hierarchical network topologies.

Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology
Key: YARN-18
URL: https://issues.apache.org/jira/browse/YARN-18
Project: Hadoop YARN
Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Junping Du
Assignee: Junping Du
Labels: features
Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch

There are several classes in YARN's container assignment and task scheduling algorithms that relate to data locality, which were updated to give preference to running a container on other locality levels besides node-local and rack-local (like nodegroup-local).
The proposal is to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup.
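The "add a layer without modifying scheduler code" claim above can be sketched as a pluggable list of locality levels that a scheduler relaxes through in order. This is an illustrative assumption about the shape of the abstraction, not the interfaces in the YARN-18 patches.

```java
import java.util.Arrays;
import java.util.List;

public class TopologySketch {
  // A topology plugin is just an ordered list of locality levels, closest first.
  interface TopologyPlugin {
    List<String> localityLevels();
  }

  static class DefaultTopology implements TopologyPlugin {
    public List<String> localityLevels() {
      return Arrays.asList("node-local", "rack-local", "off-switch");
    }
  }

  // Adding a layer (here, node groups) means plugging in a new implementation;
  // the scheduler-style loop in main() needs no change.
  static class NodeGroupTopology implements TopologyPlugin {
    public List<String> localityLevels() {
      return Arrays.asList("node-local", "nodegroup-local", "rack-local", "off-switch");
    }
  }

  public static void main(String[] args) {
    TopologyPlugin plugin = new NodeGroupTopology();  // chosen via configuration
    // Locality relaxation: try each level in order, closest first.
    for (String level : plugin.localityLevels()) {
      System.out.println(level);
    }
  }
}
```

With this shape, a hierarchical topology of any depth is one new plugin class, matching the stated goal of the JIRA for the common (hierarchical) case.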