[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117.5.patch

Enhance YARN service model
--
Key: YARN-117
URL: https://issues.apache.org/jira/browse/YARN-117
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Steve Loughran
Assignee: Steve Loughran
Attachments: YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, YARN-117.patch

Having played with the YARN service model, there are some issues that I've identified based on past work and initial use. This JIRA issue is an overall one to cover them, with solutions pushed out to separate JIRAs.

h2. State model prevents the stopped state being entered if the service could not be started successfully

In the current lifecycle you cannot stop a service unless it was successfully started, but
* {{init()}} may acquire resources that need to be explicitly released
* if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources

*Fix:* make {{stop()}} a valid state transition from all states, and require the implementations to be able to stop safely without requiring all fields to be non-null. Before anyone points out that {{stop()}} implementations assume all fields are valid and will NPE if called before {{start()}}: MAPREDUCE-3431 shows that this problem arises today, and MAPREDUCE-3502 is a fix for it. It is independent of the rest of the issues in this doc, but it will aid making {{stop()}} execute from all states other than stopped. MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take-up; this can be done with issues linked to this one.

h2. AbstractService doesn't prevent duplicate state change requests

The {{ensureState()}} checks that verify whether a state transition is allowed from the current state are performed in the base {{AbstractService}} class, yet subclasses tend to call this *after* their own {{init()}}, {{start()}} and {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this.

This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} and {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods: {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}}, something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers).

h2. AbstractService state change doesn't defend against race conditions

There are no concurrency locks on the state transitions. Whatever fix is added for wrong-state calls should also prevent re-entrancy, such as {{stop()}} being called from two threads.

h2. Static methods to choreograph lifecycle operations

Helper methods to move things through lifecycles: init-then-start is common, stop-if-service-is-not-null is another. Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception. These could go into a class {{ServiceOps}} in the same package. They can be used by those services that wrap other services, and help manage more robust shutdowns.

h2. State transition failures are something that registered service listeners may wish to be informed of

When a state transition fails, a {{RuntimeException}} can be thrown, and the service listeners are not informed as the notification point isn't reached. They may wish to know this, especially for management and diagnostics.

*Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service, Service.State targetedState, RuntimeException e)}} that is invoked from the (final) state change methods in the {{AbstractService}} class (once they delegate to their inner {{innerStart()}}, {{innerStop()}} methods); make it a no-op in the existing implementations of the interface.

h2. Service listener failures not handled

Is this an error or not? Log-and-ignore may not be what is desired.

*Proposed:* during {{stop()}}, any exception raised by a listener is caught and discarded, to increase the likelihood of a better shutdown, but do not add try-catch clauses to the other state changes.

h2. Support static listeners for all AbstractServices

Add support to {{AbstractService}} that allows callers to register listeners
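The final-method pattern from HADOOP-3128 described above can be sketched as follows. {{innerInit()}}/{{innerStart()}}/{{innerStop()}} are the hook names from the proposal; everything else ({{SketchService}}, the state enum) is a simplified illustration, not Hadoop's actual {{AbstractService}}:

```java
// A minimal sketch of the proposed lifecycle: final public methods perform the
// state checking, subclasses override the protected inner hooks.
abstract class SketchService {
    enum State { NOTINITED, INITED, STARTED, STOPPED }

    private State state = State.NOTINITED;

    // init()/start() are final: the state check cannot be skipped or
    // reordered by a subclass, and nothing happens if the check fails.
    public final synchronized void init() {
        if (state != State.NOTINITED) {
            throw new IllegalStateException("cannot init from " + state);
        }
        innerInit();
        state = State.INITED;
    }

    public final synchronized void start() {
        if (state != State.INITED) {
            throw new IllegalStateException("cannot start from " + state);
        }
        innerStart();
        state = State.STARTED;
    }

    // stop() is valid from every state, so it can release resources even
    // after a failed init() or start(); it is a no-op when already stopped,
    // which together with synchronized also guards against re-entrancy.
    public final synchronized void stop() {
        if (state == State.STOPPED) {
            return;
        }
        innerStop();
        state = State.STOPPED;
    }

    public final synchronized State getState() { return state; }

    // Subclasses override these; innerStop() must tolerate null fields.
    protected void innerInit() {}
    protected void innerStart() {}
    protected void innerStop() {}
}
```

With this shape, {{stop()}} is legal from every state and idempotent, which also addresses the duplicate-request and race-condition concerns via the synchronized final methods.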
[jira] [Commented] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631364#comment-13631364 ] Steve Loughran commented on YARN-117: - Updated patch, in which {{TestNodeStatusUpdater.testNMConnectionToRM()}} should not fail irrespective of how long the NM's {{init()}} process takes. Until now the custom {{NodeStatusUpdater}} set its clock in the constructor, which was called during the NM's {{init()}}, but the test only set its own clock after {{init()}} and before {{start()}}. As a result, if the init took too long, the test would fail saying the RM took too long, when the delay logic was actually working. The fix is for the custom {{NodeStatusUpdater}} to set its {{waitStartTime}} in its {{innerStart()}}, so only when the service is started. This leaves a narrower gap between the test's measured start time and the updater's, though the time taken to start a couple of services before the updater is started could still be troublesome. Even so, retaining the time measurement logic in the test (rather than just probing the updater to verify it triggered) is essential to be confident that {{NodeManager.start()}} doesn't complete until the connection has been established.
[jira] [Commented] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631369#comment-13631369 ] Hadoop QA commented on YARN-117: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578645/YARN-117.5.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/733//console This message is automatically generated.
[jira] [Updated] (YARN-471) NM does not validate the resource capabilities before it registers with RM
[ https://issues.apache.org/jira/browse/YARN-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-471: - Attachment: YARN-471.1.patch NM does not validate the resource capabilities before it registers with RM -- Key: YARN-471 URL: https://issues.apache.org/jira/browse/YARN-471 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Labels: usability Attachments: YARN-471.1.patch Today, an NM could register with -1 memory and -1 cpu with the RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-471) NM does not validate the resource capabilities before it registers with RM
[ https://issues.apache.org/jira/browse/YARN-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah reassigned YARN-471: Assignee: Hitesh Shah
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631386#comment-13631386 ] Hitesh Shah commented on YARN-541: -- @Krishna, could you provide more information: What scheduler are you using? Could you attach the application logs as well as the RM's logs? getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri I am running an application that was written against, and working well with, hadoop-2.0.0-alpha, but when I run the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse sometimes does not return all the allocated containers. For example, I request 10 containers and this method sometimes gives me only 9, and when I looked at the log of the Resource Manager, the 10th container was also allocated. It happens only sometimes, randomly, and works fine all other times. If I send one more request for the remaining container to the RM after it failed to give them the first time (and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is that even though the RM's log says all 10 requested containers were allocated, getAllocatedContainers() surprisingly returned only 9. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore
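Whether or not this turns out to be a scheduler bug, the usual defensive pattern on the AM side is to treat allocation as asynchronous: a single {{allocate()}} heartbeat may legitimately return only part of a request, with the remainder arriving on later heartbeats. A sketch of the accumulate-until-satisfied loop ({{Allocator}} and {{awaitContainers}} are illustrative names, not YARN classes):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy stand-in for the AM-RM allocate() heartbeat; not the real YARN API.
interface Allocator {
    List<String> allocate(); // containers granted since the previous heartbeat
}

public class AllocateLoop {
    // Allocation is asynchronous: one allocate() call may return only part of
    // a request, so the AM should keep heartbeating and accumulate grants
    // until the request is satisfied or a retry limit is reached.
    static List<String> awaitContainers(Allocator am, int wanted, int maxHeartbeats) {
        List<String> granted = new ArrayList<>();
        for (int i = 0; i < maxHeartbeats && granted.size() < wanted; i++) {
            granted.addAll(am.allocate());
        }
        return granted;
    }

    public static void main(String[] args) {
        // Simulated RM: 9 containers on the first heartbeat, the 10th two beats later.
        List<List<String>> responses = new ArrayList<>();
        responses.add(Arrays.asList("c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9"));
        responses.add(Collections.<String>emptyList());
        responses.add(Arrays.asList("c10"));
        Allocator am = () -> responses.isEmpty()
                ? Collections.<String>emptyList() : responses.remove(0);
        System.out.println(awaitContainers(am, 10, 100).size());
    }
}
```

If the count still never converges after repeated heartbeats, that points at a genuine RM-side problem rather than ordinary allocation latency.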
[jira] [Updated] (YARN-523) Container localization failures aren't reported from NM to RM
[ https://issues.apache.org/jira/browse/YARN-523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-523: - Assignee: Omkar Vinit Joshi Container localization failures aren't reported from NM to RM - Key: YARN-523 URL: https://issues.apache.org/jira/browse/YARN-523 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi This is mainly a pain for crashing AMs, but once we fix this, containers can also benefit: the same fix works for both.
[jira] [Updated] (YARN-115) yarn commands shouldn't add m to the heapsize
[ https://issues.apache.org/jira/browse/YARN-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-115: - Labels: usability (was: ) yarn commands shouldn't add m to the heapsize --- Key: YARN-115 URL: https://issues.apache.org/jira/browse/YARN-115 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.3 Reporter: Thomas Graves Labels: usability The yarn commands add m to the heapsize. This is unlike the HDFS side and what the old JT/TT scripts used to do. JAVA_HEAP_MAX=-Xmx$YARN_RESOURCEMANAGER_HEAPSIZEm JAVA_HEAP_MAX=-Xmx$YARN_NODEMANAGER_HEAPSIZEm We should not be adding in the m, and should allow the user to specify units.
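The proposed behaviour (keep a megabyte default for bare numbers, but pass explicit units through untouched) could be sketched like this; {{heapOpt}} is an illustrative helper, not code from the actual yarn scripts:

```java
public class HeapOpt {
    // Append the historical default unit only when the user supplied a bare
    // number; values that already carry a unit (e.g. "2g") pass through as-is.
    static String heapOpt(String heapsize) {
        return "-Xmx" + (heapsize.matches("\\d+") ? heapsize + "m" : heapsize);
    }

    public static void main(String[] args) {
        System.out.println(heapOpt("1000")); // bare number: default unit appended
        System.out.println(heapOpt("2g"));   // explicit unit: left alone
    }
}
```

This keeps backwards compatibility with existing configs that set a plain number while unblocking users who want `-Xmx2g` or similar.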
[jira] [Commented] (YARN-358) bundle container classpath in temporary jar on all platforms, not just Windows
[ https://issues.apache.org/jira/browse/YARN-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631390#comment-13631390 ] Hitesh Shah commented on YARN-358: -- @Chris, are we just talking about the command line, or does this affect environment variables too? Given that YARN can launch any kind of application (C++/Java/script), what are the areas of concern that need to be addressed for containers to launch correctly on Windows? Should this be a YARN feature, or is it better to hand this off to the application logic to handle correct launching of a container on a particular OS type? bundle container classpath in temporary jar on all platforms, not just Windows -- Key: YARN-358 URL: https://issues.apache.org/jira/browse/YARN-358 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: trunk-win Reporter: Chris Nauroth Currently, a Windows-specific code path bundles the classpath into a temporary jar with a manifest to work around command line length limitations. This code path does not need to be Windows-specific. We can use the same approach on all platforms.
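For reference, the manifest trick the description mentions can be sketched with the standard {{java.util.jar}} API: write a jar containing nothing but a manifest whose {{Class-Path}} attribute holds the (space-separated, relative-URL) classpath entries, then launch with that single jar on the classpath. This is an illustration of the technique, not the NodeManager's actual code:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ClasspathJar {
    // Write an otherwise-empty jar whose manifest carries the classpath.
    // Launching with "java -cp classpath.jar Main" then sidesteps command
    // line length limits, since the long path list lives in the manifest.
    static File writeClasspathJar(String[] classpathEntries, File out) throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version is mandatory; without it the manifest is written empty.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // Class-Path entries are space-separated URLs relative to the jar's location.
        attrs.put(Attributes.Name.CLASS_PATH, String.join(" ", classpathEntries));
        try (JarOutputStream jos = new JarOutputStream(new FileOutputStream(out), manifest)) {
            // no entries needed: the manifest is the payload
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        File jar = writeClasspathJar(new String[] {"lib/a.jar", "lib/b.jar"},
                                     File.createTempFile("classpath", ".jar"));
        try (java.util.jar.JarFile jf = new java.util.jar.JarFile(jar)) {
            System.out.println(jf.getManifest().getMainAttributes()
                    .get(Attributes.Name.CLASS_PATH));
        }
    }
}
```

One caveat worth noting for the cross-platform discussion: {{Class-Path}} entries are interpreted relative to the jar's own location, so absolute local paths need care on both Unix and Windows.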
[jira] [Commented] (YARN-471) NM does not validate the resource capabilities before it registers with RM
[ https://issues.apache.org/jira/browse/YARN-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631392#comment-13631392 ] Hadoop QA commented on YARN-471: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578646/YARN-471.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/734//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/734//console This message is automatically generated.
[jira] [Updated] (YARN-386) [Umbrella] YARN API Changes
[ https://issues.apache.org/jira/browse/YARN-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-386: Summary: [Umbrella] YARN API Changes (was: [Umbrella] YARN API cleanup) [Umbrella] YARN API Changes --- Key: YARN-386 URL: https://issues.apache.org/jira/browse/YARN-386 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli This is the umbrella ticket to capture any and every API cleanup that we wish to do before YARN can be deemed beta/stable. Doing this API cleanup now and ASAP will help us escape the pain of supporting bad APIs in beta/stable releases.
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631399#comment-13631399 ] Bikas Saha commented on YARN-445: - Sounds like an enhancement in the NM API. Moving under YARN-386. Please unlink if that is not correct. I can see the use case this seeks to solve. I am wondering what the abstraction is in the general case; that would help us avoid changing things for every similar use case. Keeping platform neutrality would be beneficial, so that the use cases continue to work for non-Java AMs/tasks or on Windows. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.0.5-beta Reporter: Jason Lowe It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21, the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However, that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes.
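For context on the mechanics: on POSIX systems such a feature ultimately reduces to delivering a signal to the container's process, and SIGQUIT in particular makes a JVM print a thread dump without terminating it, which is the jstack case above. A toy sketch of signalling a child process by pid (illustrative only, none of this is YARN code; assumes Java 9+ for {{Process.pid()}} and a POSIX {{kill}} binary):

```java
import java.io.IOException;

public class SignalSketch {
    // Deliver a POSIX signal to a process by pid via /bin/kill.
    // (Illustrative only; a real NM tracks container pids itself and would
    // need a platform-neutral mechanism for the Windows case.)
    static int signal(long pid, String sig) throws IOException, InterruptedException {
        return new ProcessBuilder("kill", "-" + sig, Long.toString(pid))
                .start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        Process child = new ProcessBuilder("sleep", "60").start();
        // SIGQUIT on a JVM triggers a stack dump and leaves it running;
        // here we send SIGTERM so the demo child actually exits.
        int rc = signal(child.pid(), "TERM");
        child.waitFor();
        System.out.println("kill rc=" + rc + " alive=" + child.isAlive());
    }
}
```

The platform-neutrality concern in the comment is real: this sketch only works where {{kill}} exists, which is exactly why an abstract "signal container" API would need a Windows story.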
[jira] [Updated] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-445: Issue Type: Sub-task (was: New Feature) Parent: YARN-386
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631400#comment-13631400 ] Bikas Saha commented on YARN-45: My personal preference would be to not have an API that is not actionable. If the RM does not have any support for such ResourceRequest scenarios, then we can leave that out until such support does arise. Having something out there that does not work may lead to misunderstanding and confusion on the part of YARN app developers. Scheduler feedback to AM to release containers -- Key: YARN-45 URL: https://issues.apache.org/jira/browse/YARN-45 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Chris Douglas Assignee: Carlo Curino Attachments: YARN-45.patch, YARN-45.patch The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed, or reserved, to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers. [1] http://research.yahoo.com/files/yl-2012-003.pdf