[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650846#comment-13650846 ]

Hudson commented on YARN-45:

Integrated in Hadoop-Mapreduce-trunk #1418 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1418/])
YARN-45. Add protocol for schedulers to request containers back from ApplicationMasters. Contributed by Carlo Curino and Chris Douglas. (Revision 1479771)

Result = SUCCESS
cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1479771
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/AllocateResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/PreemptionContainer.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/PreemptionContract.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/PreemptionMessage.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/PreemptionResourceRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/StrictPreemptionContract.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/AllocateResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/PreemptionContainerPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/PreemptionContractPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/PreemptionMessagePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/PreemptionResourceRequestPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StrictPreemptionContractPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestAMRMClientAsync.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java

> Scheduler feedback to AM to release containers
> --
>
> Key: YARN-45
> URL: https://issues.apache.org/jira/browse/YARN-45
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Chris Douglas
> Assignee: Carlo Curino
> Fix For: 2.0.5-beta
>
> Attachments: YARN-45.1.patch, YARN-45.patch, YARN-45.patch,
> YARN-45.patch, YARN-45.patch, YARN-45.patch, YARN-45.patch,
> YARN-45_summary_of_alternatives.pdf
>
>
> The ResourceManager strikes a balance between cluster utilization and strict
> enforcement of resource invariants in the cluster. Individual allocations of
> containers must be reclaimed- or reserved- to restore the global invariants
> when cluster load shifts. In some cases, the ApplicationMaster can respond to
> fluctuations in resource availability without losing the work already
> completed by that task (MAPREDUCE-4584). Supplying it with this information
> would be helpful for overall cluster utilization [1]. To this end, we want to
> establish a protocol for the RM to ask the AM to release containers.
> [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650791#comment-13650791 ]

Hudson commented on YARN-45:

Integrated in Hadoop-Hdfs-trunk #1391 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1391/])
YARN-45. Add protocol for schedulers to request containers back from ApplicationMasters. Contributed by Carlo Curino and Chris Douglas. (Revision 1479771)

Result = FAILURE
cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1479771
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/AllocateResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/PreemptionContainer.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/PreemptionContract.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/PreemptionMessage.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/PreemptionResourceRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/StrictPreemptionContract.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/AllocateResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/PreemptionContainerPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/PreemptionContractPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/PreemptionMessagePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/PreemptionResourceRequestPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StrictPreemptionContractPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestAMRMClientAsync.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650698#comment-13650698 ]

Hudson commented on YARN-45:

Integrated in Hadoop-Yarn-trunk #202 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/202/])
YARN-45. Add protocol for schedulers to request containers back from ApplicationMasters. Contributed by Carlo Curino and Chris Douglas. (Revision 1479771)

Result = SUCCESS
cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1479771
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/AllocateResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/PreemptionContainer.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/PreemptionContract.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/PreemptionMessage.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/PreemptionResourceRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/StrictPreemptionContract.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/AllocateResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/PreemptionContainerPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/PreemptionContractPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/PreemptionMessagePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/PreemptionResourceRequestPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StrictPreemptionContractPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestAMRMClientAsync.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650568#comment-13650568 ]

Hudson commented on YARN-45:

Integrated in Hadoop-trunk-Commit #3713 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3713/])
YARN-45. Add protocol for schedulers to request containers back from ApplicationMasters. Contributed by Carlo Curino and Chris Douglas. (Revision 1479771)

Result = SUCCESS
cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1479771
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/AllocateResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/PreemptionContainer.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/PreemptionContract.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/PreemptionMessage.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/PreemptionResourceRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/StrictPreemptionContract.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/AllocateResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/PreemptionContainerPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/PreemptionContractPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/PreemptionMessagePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/PreemptionResourceRequestPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StrictPreemptionContractPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestAMRMClientAsync.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650545#comment-13650545 ]

Hadoop QA commented on YARN-45:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12582045/YARN-45.1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/881//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/881//console

This message is automatically generated.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650532#comment-13650532 ]

Bikas Saha commented on YARN-45:

If you don't mind, I will try to take a pass tomorrow morning at making some inline edits to the patch. Don't stop for me. I can always do it after the initial commit.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650527#comment-13650527 ]

Carlo Curino commented on YARN-45:

bq. Would be great if you could add a version number to your patches.

Sorry, we weren't sure of the current convention.

{quote}
- PreemptionMessage.strict should perhaps be named strictContract explicitly. You did name the setters and the getters verbosely which is good.
- You should mark all the api getters and setters to be synchronized. There are similar locking bugs in other existing records too but we are tracking them elsewhere.
- PreemptionContainer.getId() - Javadoc should refer to containers instead of Resource?
- PreemptionContract.getContainers() - Javadoc referring to "ResourceManager may also include a @link PreemptionContract that, if satisfied, may replace these" doesn't make sense to me.
{quote}

Fixed all of these; last one was a copy/paste of an older version of the code. Thanks for catching these.

[~bikassaha]: we took another attempt at the javadoc, but it's probably still not sufficient. We opened YARN-XXX to track documentation of this feature in the AM how-to, which we'll address presently.

(thanks everyone for the great feedback!)
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650246#comment-13650246 ]

Bikas Saha commented on YARN-45:

Overall, the approach looks good. Would be great if you could add a version number to your patches.

The javadoc is trying to help by giving more information. However, if I think from the perspective of someone who doesn't understand YARN, RM, scheduling and preemption, the javadoc would be hard to understand. Can we rewrite this from the perspective of the user of the API? How are they supposed to interpret this data? What do they need to do?
{code}
+  /**
+   * Get the description of containers owned by the AM, but requested back by
+   * the cluster. Note that the RM may have an inconsistent view of the
+   * resources owned by the AM. The AM may elect to ignore some or all requests.
+   *
+   * The message is a snapshot of the resources the RM wants back from the AM.
+   * While demand persists, the RM will sustain its ask. Resources requested
+   * consistently over some duration may be forcibly killed by the RM.
{code}
In general, the javadocs and class names are a little hard for me to understand, but it just might be me :)
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648986#comment-13648986 ]

Carlo Curino commented on YARN-45:

Based on all the feedback here, including discussions with [~acmurthy], [~vinodkv], [~bikassaha], [~hitesh], [~sseth], and [~tucu00], we propose the following message be added to the {{AllocateResponse}} (pseudo):

{noformat}
PreemptionMessage {
  StrictPreemptionContract {
    Set<PreemptionContainer> containers
  } strict;
  PreemptionContract {
    Set<PreemptionContainer> containers
    List<PreemptionResourceRequest> resources
  } contract;
} message
{noformat}

This has some advantages over the previous design:
# By adding {{PreemptionContainer}} and {{PreemptionResourceRequest}} (wrappers of {{ContainerId}} and {{ResourceRequest}} respectively) we can add attributes to each item later on, without breaking the protocol (e.g., [~sandyr]'s earlier suggestion of time).
# By separating strict and non-strict contracts, the RM can pull back specific containers or give the AM flexibility in satisfying the contract. It also allows the RM to simultaneously and unambiguously include requests with both constraints.
# By including the list of containers in the {{PreemptionContract}} together with the resources, AMs have a slightly more restricted search space when compared to "match all the resources that _might_ be killed, determine preferences among them". Thus, simpler AMs can mostly ignore the interpretation of {{ResourceRequest}} and just follow the RM hint.

We're updating YARN-567, YARN-568, and YARN-569 to accommodate these changes, in addition to the rest of the downstream patches.
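To make the shape of the proposed message concrete, here is a minimal Java sketch. It is not the code from the patch: the outer class `PreemptionMessageSketch`, the `containers(...)` helper, and `containersToRelease(...)` are hypothetical names, container ids are plain strings, and a resource request is reduced to a memory size and a count. It only illustrates the third point above, that a simple AM could follow the RM's container hint without interpreting the resource requests.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative stand-ins only: these are NOT the record classes from the patch.
// Container ids are plain Strings and a resource request is reduced to
// "count containers of memoryMb each"; both simplifications are assumptions.
final class PreemptionMessageSketch {

  /** Wraps a container id so per-container attributes can be added later. */
  static final class PreemptionContainer {
    final String containerId;
    PreemptionContainer(String containerId) { this.containerId = containerId; }
  }

  /** Wraps a (simplified) resource request for the same reason. */
  static final class PreemptionResourceRequest {
    final int memoryMb;
    final int count;
    PreemptionResourceRequest(int memoryMb, int count) {
      this.memoryMb = memoryMb;
      this.count = count;
    }
  }

  /** Specific containers the RM wants back, with no flexibility for the AM. */
  static final class StrictPreemptionContract {
    final Set<PreemptionContainer> containers;
    StrictPreemptionContract(Set<PreemptionContainer> containers) {
      this.containers = containers;
    }
  }

  /** Resources the RM wants back, plus a hint of containers that would satisfy it. */
  static final class PreemptionContract {
    final Set<PreemptionContainer> containers;
    final List<PreemptionResourceRequest> resources;
    PreemptionContract(Set<PreemptionContainer> containers,
                       List<PreemptionResourceRequest> resources) {
      this.containers = containers;
      this.resources = resources;
    }
  }

  /** Either part may be null here (an assumption) when the RM has no such ask. */
  static final class PreemptionMessage {
    final StrictPreemptionContract strict;
    final PreemptionContract contract;
    PreemptionMessage(StrictPreemptionContract strict, PreemptionContract contract) {
      this.strict = strict;
      this.contract = contract;
    }
  }

  /** Hypothetical helper: builds a container set from ids, preserving order. */
  static Set<PreemptionContainer> containers(String... ids) {
    Set<PreemptionContainer> out = new LinkedHashSet<PreemptionContainer>();
    for (String id : ids) {
      out.add(new PreemptionContainer(id));
    }
    return out;
  }

  /**
   * Policy of a "simple" AM: give back every strict container and, for the
   * negotiable contract, follow the RM's container hint without interpreting
   * the resource requests. Duplicate ids are returned once.
   */
  static List<String> containersToRelease(PreemptionMessage msg) {
    Set<String> ids = new LinkedHashSet<String>();
    if (msg.strict != null) {
      for (PreemptionContainer c : msg.strict.containers) {
        ids.add(c.containerId);
      }
    }
    if (msg.contract != null) {
      for (PreemptionContainer c : msg.contract.containers) {
        ids.add(c.containerId);
      }
    }
    return new ArrayList<String>(ids);
  }
}
```

A more capable AM would instead read `contract.resources` and choose its own set of containers to give up, which is the flexibility the non-strict contract is meant to allow.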
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644675#comment-13644675 ]

Chris Douglas commented on YARN-45:

bq. we could express the ResourceRequest as a multiple of the minimum allocation

+1 This is better
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644664#comment-13644664 ]

Carlo Curino commented on YARN-45:

[~acmurthy] I see your point, which was in fact reflected more clearly in our initial proposal. The only caveat is not to make this a capacity-only protocol (which you are not doing, but I wanted to reiterate that there are other use cases). I like [~bikassaha]'s and [~chris.douglas]'s spin on it (i.e., using ResourceRequest), as it gives us the immediate "capacity angle" but will eventually allow the implementations to evolve towards something richer (e.g., preempting on behalf of a specific request, which Bikas considered before) without impact on the protocols.

I think there is a slightly cleaner version of Chris's proposal: use ResourceRequest, and to represent a request that only cares about overall capacity, express the ResourceRequest as a multiple of the minimum allocation (i.e., if we want 100GB of RAM back and min_container size is 1GB, we ask for 100 x 1GB containers). This achieves Chris's proposal with a slightly prettier use of ResourceRequest. Note that there are size-matching issues (e.g., you have 1.5GB containers and I ask for 1x1GB containers), but we have very similar problems with Resource.

I would say that, as Chris pointed out, [these semantics | https://issues.apache.org/jira/browse/YARN-45?focusedCommentId=13628950&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628950] plus the use of ResourceRequest I propose here as a minor variation on Chris's take should cover Arun's and Bikas's comments (and I believe also the prior 45+ messages).

Thoughts?
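The arithmetic behind "a multiple of the minimum allocation" can be sketched as below. The class, method, and parameter names are hypothetical, and rounding up when the capacity is not an exact multiple is an assumption made for the example; the comment above raises the size-matching question without settling it.

```java
// Sketch only: the class, method, and parameter names are hypothetical.
final class CapacityAsRequestSketch {

  /**
   * Number of minimum-allocation containers that covers the capacity to be
   * reclaimed. Rounding up is an assumption, not something the thread settles.
   */
  static int minAllocationMultiples(long reclaimMb, long minAllocationMb) {
    if (minAllocationMb <= 0) {
      throw new IllegalArgumentException("minimum allocation must be positive");
    }
    if (reclaimMb <= 0) {
      return 0;
    }
    // Ceiling division without floating point.
    long count = (reclaimMb + minAllocationMb - 1) / minAllocationMb;
    return Math.toIntExact(count);
  }

  public static void main(String[] args) {
    // 100GB back with a 1GB minimum allocation: 100 x 1GB containers.
    System.out.println(minAllocationMultiples(100L * 1024, 1024));
    // Size mismatch: reclaiming 1.5GB with a 1GB minimum rounds up to 2.
    System.out.println(minAllocationMultiples(1536, 1024));
  }
}
```

The second call shows the size-matching issue in miniature: a 1.5GB ask cannot be expressed exactly in 1GB units, so whichever side rounds has to decide in which direction.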
> Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch, YARN-45.patch, > YARN-45.patch, YARN-45.patch, YARN-45_summary_of_alternatives.pdf > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
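The arithmetic behind the "multiple of the minimum allocation" idea in the comment above can be sketched as follows. This is only an illustration: the class and method names are hypothetical and are not part of the YARN API or of the patch, and memory is assumed to be the only resource dimension, measured in MB.

```java
// Hypothetical helper, not part of YARN: turns an aggregate memory ask
// into a count of minimum-size containers, and shows the size-matching
// issue when the AM only holds larger containers.
final class MinAllocationAsk {

    /** Number of minimum-size containers needed to cover at least targetMb. */
    static int containersToAsk(long targetMb, long minAllocationMb) {
        if (targetMb < 0 || minAllocationMb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Ceiling division, so the ask never falls short of the target.
        return (int) ((targetMb + minAllocationMb - 1) / minAllocationMb);
    }

    /** Memory actually given up if the AM can only release whole containers of containerMb. */
    static long releasedMb(long targetMb, long containerMb) {
        long wholeContainers = (targetMb + containerMb - 1) / containerMb;
        return wholeContainers * containerMb;
    }

    public static void main(String[] args) {
        // 100GB wanted back, 1GB minimum allocation: ask for 100 containers.
        System.out.println(containersToAsk(100 * 1024, 1024));
        // A 1 x 1GB ask against an AM holding 1.5GB containers costs a full 1.5GB.
        System.out.println(releasedMb(1024, 1536));
    }
}
```

The second call illustrates the size-matching caveat: the ask is satisfied only by over-releasing, which is the same problem a plain Resource would have.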
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644626#comment-13644626 ] Chris Douglas commented on YARN-45: --- I'm also a fan of {{ResourceRequest}}, but we're not really using all its features, yet. Similarly, {{Resource}} bakes in the fungibility of resources, which could be awkward as the RM accommodates richer requests (as in YARN-392). We could use {{ResourceRequest}}- so the API is there for extensions- but only populate the capability as an aggregate. With the convention that "\-1 containers" can mean "packed as you see fit," it expresses {{Resource}} (which we need in practice, since the priorities for requests don't always [match the preemption order|https://issues.apache.org/jira/browse/YARN-569?focusedCommentId=13638825&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13638825]), which is sufficient for the current schedulers. If we're adding the contract back with the set of containers, the [semantics|https://issues.apache.org/jira/browse/YARN-45?focusedCommentId=13628950&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628950] we discussed earlier still seem OK.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644327#comment-13644327 ] Bikas Saha commented on YARN-45: My understanding is that the containers presented in PreemptionMessage are going to be preempted by the RM some time in the near future if the RM cannot find free resources elsewhere. The AMs are not supposed to preempt the containers, but they are encouraged to checkpoint and save work. The RM can always choose not to preempt these containers, so it would be sub-optimal for the AM to kill them. If we want to add additional information besides the set of containers-to-be-preempted then I would prefer ResourceRequest (like it was in the original patch) and not Resource. Not only is that symmetric, it also allows the RM to provide additional information about where to free containers. A smarter RM could potentially ask for resources to be preempted where the under-allocated job wants them, and a smart AM could help out by choosing containers close to the desired locations. Secondly, Resource is too amorphous by itself. Asking an AM to free 50GB does not tell it whether the RM needs 10*5 or 50*1. Without that information the AM can end up freeing containers in a manner that does not help the RM meet the request of the under-allocated job, thus failing to meet quota and wasting work at the same time.
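The "10*5 or 50*1" point in the comment above can be made concrete with a small sketch. It rests on a simplifying assumption that is not stated in the thread: freed memory is only usable on the node where it was freed, so a container of a given size must fit within a single node's freed memory. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not YARN code: the same 50GB total can be useless
// or sufficient depending on how it is spread across nodes.
final class FreedShapeCheck {

    /** Builds a map in which each of `nodes` nodes has `mbEach` MB freed. */
    static Map<String, Long> uniform(int nodes, long mbEach) {
        Map<String, Long> freed = new HashMap<>();
        for (int i = 0; i < nodes; i++) {
            freed.put("node" + i, mbEach);
        }
        return freed;
    }

    /** Containers of unitMb that fit into the freed memory, with no pooling across nodes. */
    static int placeable(Map<String, Long> freedMbByNode, long unitMb) {
        int count = 0;
        for (long freedMb : freedMbByNode.values()) {
            count += (int) (freedMb / unitMb);
        }
        return count;
    }

    public static void main(String[] args) {
        long fiveGb = 5L * 1024;
        // 50 x 1GB freed on 50 nodes: no 5GB container fits anywhere.
        System.out.println(placeable(uniform(50, 1024L), fiveGb));
        // 10 x 5GB freed on 10 nodes: all ten 5GB containers fit.
        System.out.println(placeable(uniform(10, fiveGb), fiveGb));
    }
}
```

Both inputs free 50GB in total, which is why an aggregate Resource alone cannot tell the AM which of the two outcomes the RM needs.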
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643473#comment-13643473 ] Hadoop QA commented on YARN-45: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12580806/YARN-45.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 24 warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/831//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/831//console
This message is automatically generated.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643461#comment-13643461 ] Carlo Curino commented on YARN-45: -- Reposting the patch with the BuilderUtils changes included, per [~vinodkv]'s request (I missed them in the previous diff).
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643458#comment-13643458 ] Hadoop QA commented on YARN-45: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12580805/YARN-45_summary_of_alternatives.pdf against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/830//console This message is automatically generated.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643444#comment-13643444 ] Carlo Curino commented on YARN-45: -- We modified the patch to account for the most recent round of comments from [~tgraves], [~kkambatl], [~vinodkv] and [~bikassaha]. In particular:
# various javadoc fixes (including @link notation)
# PreemptMessage -> PreemptionMessage
# set/get version of PreemptionMessage (and propagate through the depending patches)
# clarified in AllocateResponse javadoc that PreemptMessage could be repeated over time.

[~bikassaha], you are right: it is likely that there will be repetitions in the ask over time (item 4 above). In fact, by design the RM will "sustain" its asks until either: 1) the need for those resources is gone, 2) the containers are released (natural or AM-initiated completion), or 3) a timeout expires and the RM force-kills the containers. The possible overlap among subsequent messages is not a big concern on the AM side given our choice of a Set-based PreemptionMessage. Duplicates are trivial to detect, and/or the AM can simply implement preemption in an idempotent way (which is what we do in our mapreduce solution).

Regarding time: in the basic implementation we have for mapreduce, the AM does not attempt complex speculation on when to preempt; it simply acts on the requests in an idempotent way as soon as they are received (this also maximizes the chance to complete a checkpoint before being killed). In our design we pushed this into an AMPreemptionPolicy, so you can easily imagine more advanced policies that track containers over time and speculate on when it is best to preempt. Adding more sophisticated "timing" information in the protocol is also something I can see being an interesting addition, but I would want to spend some more time (no pun intended) working with it before proposing a public protocol change; again, we mention this in the attached summary document.

We go into a bit more detail in the attached document, which, as [~vinodkv] asked, summarizes the conversations around the various alternatives using resource-based specification, and about adding time.
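The idempotent handling of repeated asks described in the comment above might look roughly like the sketch below. It is not the mapreduce implementation mentioned in the thread: the class is hypothetical, and container identifiers are modeled as plain strings rather than YARN container IDs.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical AM-side handler: acting on the same container twice is a no-op,
// so repeated (sustained) asks from the RM cause no extra work.
final class IdempotentPreemptionHandler {

    // Containers for which a checkpoint has already been started.
    private final Set<String> inProgress = new HashSet<>();

    /**
     * Processes one preemption message and returns only the containers
     * that were not already being handled.
     */
    Set<String> onPreemptionMessage(Set<String> requested) {
        Set<String> newlyHandled = new LinkedHashSet<>();
        for (String containerId : requested) {
            // Set.add returns false for duplicates, which makes the call idempotent.
            if (inProgress.add(containerId)) {
                newlyHandled.add(containerId);
            }
        }
        return newlyHandled;
    }

    /** Called when a container finishes, whether naturally or after a checkpoint. */
    void onContainerCompleted(String containerId) {
        inProgress.remove(containerId);
    }
}
```

Because the message is a set, a container that appears in several consecutive messages triggers a checkpoint only the first time it is seen.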
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643373#comment-13643373 ] Bikas Saha commented on YARN-45: I like PreemptionMessage or PreemptionNotification. The patch mostly looks good and I agree with Vinod's comments on a get and set. I am assuming that a container will show up repeatedly in AllocateResponse until it is either preempted or removed from the preemption list. The javadoc is not clear about this. At this point, I wonder how the client/app figures out the time left before the container is preempted. How does it differentiate between containers that are new additions to that list and older ones? It could maintain its own cache of when it first saw a container. Or the time to preempt could be passed along by the RM. Does it matter? Is it needed in any scenario?
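The client-side option raised in the comment above, a cache of when each container was first seen on the preemption list, could be sketched as below. The class is hypothetical, container identifiers are plain strings, and forgetting a container once it drops off the list is an assumption made here rather than something the thread specifies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical AM-side cache: remembers when each container first appeared
// in a preemption message, so new additions can be told apart from older ones.
final class FirstSeenTracker {

    private final Map<String, Long> firstSeenMs = new HashMap<>();

    /** Records the containers listed in the latest message, observed at nowMs. */
    void observe(Set<String> requested, long nowMs) {
        // Assumption: a container no longer listed is no longer a preemption candidate.
        firstSeenMs.keySet().retainAll(requested);
        for (String containerId : requested) {
            firstSeenMs.putIfAbsent(containerId, nowMs);
        }
    }

    /** Milliseconds since the container was first listed, or -1 if it is not listed. */
    long ageMs(String containerId, long nowMs) {
        Long seen = firstSeenMs.get(containerId);
        return seen == null ? -1L : nowMs - seen;
    }
}
```

An AM could compare `ageMs` against whatever timeout it believes the RM applies; that timeout is not carried in the protocol as discussed here, so it would have to come from configuration or convention.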
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643065#comment-13643065 ] Karthik Kambatla commented on YARN-45: -- IIUC, the compiler and IDEs understand @link: for instance, renaming the referenced class/method in Eclipse updates the link as well. I haven't used <code> before and am not sure if the same applies to it. Probably not important: while digging around, one piece of advice that I came across was to use @link for the first occurrence to create a hyperlink and @code for other instances to format text but not create a hyperlink.
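The convention mentioned in the comment above, @link for the first occurrence and @code afterwards, would look something like this in practice. The class below is hypothetical and exists only to carry the Javadoc; it is not taken from the patch.

```java
/**
 * Hypothetical example class, used only to illustrate the two Javadoc tags.
 *
 * <p>The first mention is a hyperlink: {@link java.util.Set}. Later mentions
 * of {@code Set} are formatted as code but do not create another link.
 */
final class JavadocTagExample {

    /**
     * Returns an empty {@link java.util.Set}; the returned {@code Set}
     * cannot be modified.
     */
    static java.util.Set<String> none() {
        return java.util.Collections.emptySet();
    }
}
```

Since {@link} is an actual reference, refactoring tools can update it on rename, as noted above for Eclipse; {@code} is plain formatting.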
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643061#comment-13643061 ] Carlo Curino commented on YARN-45: -- About the choice between <code> and @link: there are almost the same number of files using each (about an 8% lead for @link in the count of files using each, with a stronger lead in yarn). Any preference among the watchers?
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643045#comment-13643045 ] Carlo Curino commented on YARN-45: -- Thanks for the feedback, we will make sure these comments get reflected in the final version.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642939#comment-13642939 ] Karthik Kambatla commented on YARN-45: -- Thanks Thomas. Those are the only comments I have. Otherwise, the code part looks good to me.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642889#comment-13642889 ] Karthik Kambatla commented on YARN-45: -- Also, for the javadoc, do we prefer <code>blah</code> or {@link blah}?
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642886#comment-13642886 ] Thomas Graves commented on YARN-45: --- A couple of very minor nitpicks that we might fix before commit, which Karthik referred to: a few typos in the comments/javadoc.
AllocateResponse - comment still references resources - "description of resources and containers"
PreemptMessage - Grammar needs fixing - " * A PreemptMessage is part of the RM-AM protocol, and it is used by the RM + * specify resources that are the RM wants to reclaim from the AM."
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642163#comment-13642163 ] Chris Douglas commented on YARN-45: --- If everyone's OK with the current patch as a base, I'll commit it in the next couple days.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639595#comment-13639595 ] Karthik Kambatla commented on YARN-45: -- Barely skimmed through the patch, it looks good. Noticed a few javadoc typos we might like to fix. Will try to get in a more detailed review "soon".
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638814#comment-13638814 ] Hadoop QA commented on YARN-45: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12579975/YARN-45.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/803//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/803//console This message is automatically generated. 
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638783#comment-13638783 ] Carlo Curino commented on YARN-45: --

Updated the protocol patch (and the implementation for the capacity scheduler) to reflect the discussion in the above comments (plus various offline conversations). The current proposal is a minimal protocol change and compact policies, which capture the portion of our initial proposal on which we reached reasonable consensus.

The key changes are the following:
# Simplified the protocol modification to include only Set<ContainerId> as the vehicle to express preemption requests.
# Modified ProportionalCapacityPreemptionPolicy to select containers by reversed priority, and within each priority by reversed container id (which reflects order of allocation).
# Simplified all the "pipes" in the RM that propagated preemption decisions (so that they no longer include resource-based preemptions).

The decision is based on the following rationale: there seems to be agreement that ResourceRequest-based preemption is appealing due to its symmetry, compactness, and the flexibility it provides to the AM. However, the declarative nature of the specification makes the "tracking" over time quite tricky. In particular, both the RM and the AMs must be capable of maintaining some form of history of the resources being requested:
# for the RM, to consciously preempt containers only for the fraction of resources that have been consistently asked of the AM over time (a notion of ResourceRequest intersection should be defined),
# for the AM, to track its own preemption actions, and know when they are received by the RM (this is needed to discount the RM requests while the tasks are being checkpointed).

With [~chris.douglas] we worked out a possible set of semantics for the above and started to work on a version of the ProportionalCapacityPreemptionPolicy that reflects those. While they seem reasonable, they are likely to generate longer (speculative) discussions. So, following the spirit of [~acmurthy]'s last comment and after feedback from [~tucu00], [~bikassaha], [~vinodkv], [~sseth], [~hitesh], we propose Set<ContainerId> as an initial strategy that will allow us to:
# observe most of the benefits of preemption,
# gain experience in running schedulers leveraging preemption.
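For readers who want to see the selection order from the comment above in concrete form, here is a minimal Java sketch. It is not the code of ProportionalCapacityPreemptionPolicy: the Candidate and VictimSelector classes are hypothetical, and it assumes that a larger priority value means a less important request and that container ids grow in allocation order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a running container; not a YARN type. */
final class Candidate {
    final int priority;      // assumption: larger value = less important request
    final long containerId;  // assumption: ids grow in allocation order
    final int memoryMb;

    Candidate(int priority, long containerId, int memoryMb) {
        this.priority = priority;
        this.containerId = containerId;
        this.memoryMb = memoryMb;
    }
}

final class VictimSelector {
    /** Reversed priority first; within a priority, reversed container id. */
    static final Comparator<Candidate> PREEMPTION_ORDER =
        Comparator.comparingInt((Candidate c) -> c.priority).reversed()
            .thenComparing(
                Comparator.comparingLong((Candidate c) -> c.containerId).reversed());

    /** Walks the ordered candidates until at least targetMb is covered. */
    static List<Long> select(List<Candidate> running, int targetMb) {
        List<Candidate> ordered = new ArrayList<>(running);
        ordered.sort(PREEMPTION_ORDER);
        List<Long> victims = new ArrayList<>();
        int reclaimed = 0;
        for (Candidate c : ordered) {
            if (reclaimed >= targetMb) {
                break;
            }
            victims.add(c.containerId);
            reclaimed += c.memoryMb;
        }
        return victims;
    }
}
```

The result is a plain list of container ids, which matches the shape of a container-based preemption request: the RM names specific containers rather than an amount of resources.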
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632630#comment-13632630 ] Chris Douglas commented on YARN-45: ---

bq. ResourceRequest is not actionable in the sense that neither of the schedulers can currently send a non-empty ResourceRequest to preempt. Both only do preemption by containers though they have some plumbing to send RR's if they want to do so. So I am not quite sure what you mean by "We indeed have code that exercises the ResourceRequest version of it".

A prototype impl against MapReduce responds to {{ResourceRequest}} in the preempt message. We're currently polishing and splitting that up for review, but wanted to get consensus on the Yarn changes in case new requirements required reworking the rest. An RM impl that includes killing for {{ResourceRequest}} (or {{Resource}}) is a more invasive change, particularly because (a) the AM needs to reason about which recently finished containers are included in the message (i.e., it needs to reason about what the RM knows, so the RM needs to be consistent in what it tells the AM) and (b) the RM needs to track its previous preemption requests, timing them out in the context of existing allocations and exited containers (i.e., decisions to preempt need to incorporate subsequent information).

To get experience before proposing anything drastic, we marked this API as experimental, wrote the enforcement policy against {{ContainerID}}, and tucked it behind a pluggable interface. This way, the AM can ignore stale requests for exited containers and the RM can time out particular containers it asked for easily; every computed preemption set is bound in a namespace that sidesteps the most disruptive impl issues on both sides.

bq. By not using location we are implicitly using the "*" location right? Might as well make it explicit. Non * locations will make sense when affinity based preemptions occur.

Yes, that's exactly the intent. The policy in YARN-569 doesn't attempt to bias the preemptions to match the requests in under-capacity queues, but that's a natural policy to implement against this protocol.

{quote}
The bare-minimum requirement seems:
# RM should notify the AM that a certain amount of resources will need to be reclaimed (ala SIGTERM).
# Thus, the AM gets an opportunity to *pick* which containers it will sacrifice to satisfy the RM's requirements.
# Iff the AM doesn't act, the RM will go ahead and terminate some containers (probably the most-recently allocated ones); ala SIGKILL.

Given the above, I feel that this is a set of changes we need to be conservative about - particularly since the really simple pre-emption i.e. SIGKILL alone on RM side is trivial (from an API perspective).
{quote}

Totally agreed. The symmetry of {{ResourceRequest}} in the ask-back is attractive, but it's not a sufficient condition. To it, I'd add all the familiar attributes of using them in allocation requests (economy, expressiveness, versatility). While {{Resource}} covers the current impl, it leaves little room for related improvements, or even refinements (e.g., preferring resources requested by under-capacity queues, prioritizing types of containers, and time). The API isn't that complex, but a strict implementation would change the RM more, adding risk. To mitigate that, but still encourage applications to write against the richer type while we get experience with it, [~curino]'s formulation [above|https://issues.apache.org/jira/browse/YARN-45?focusedCommentId=13628950&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628950] seems like a decent set of semantics... We could add a new type that encodes a subset of the {{ResourceRequest}} type. It lacks symmetry, but it also allows them to evolve independently.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632571#comment-13632571 ] Arun C Murthy commented on YARN-45: ---

Sorry, I've been away for a couple of weeks due to family reasons and I'm just catching up.

The bare-minimum requirement seems:
# RM should notify the AM that a certain amount of resources will need to be reclaimed (a la SIGTERM).
# Thus, the AM gets an opportunity to *pick* which containers it will sacrifice to satisfy the RM's requirements.
# Iff the AM doesn't act, the RM will go ahead and terminate some containers (probably the most-recently allocated ones); a la SIGKILL.

Given the above, I feel that this is a set of changes we need to be conservative about, particularly since the really simple pre-emption, i.e. SIGKILL alone on the RM side, is trivial (from an API perspective). Thus, I'm concerned about jumping into a complex preemption API (ResourceRequest etc.) without having sufficient experience, i.e. doing this in the first iteration itself.

I like [~tucu00]'s initial suggestion of:
# Resource resourcesToReclaim
# Optionally, a Set<ContainerId> which the RM will preempt, i.e. SIGKILL

In fact, for the first iteration, Set<ContainerId> is something we can avoid if the semantics are clear, i.e. the RM will preempt the most-recently allocated containers. Once we have sufficient experience with this, we can then dive deeper to think about further enhancements to the API by adding features (in a compatible manner for 2.x or 3.x).

Thoughts?
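The SIGTERM-then-SIGKILL contract described above can be sketched as simple deficit accounting. This is only an illustration under stated assumptions: the TwoPhaseReclaim class is hypothetical, memory is the only resource tracked, and the map passed in is assumed to iterate in allocation order (oldest first).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of "ask first, kill later"; not RM code. */
final class TwoPhaseReclaim {
    /**
     * Called once the grace period has expired. Returns the container ids the
     * RM would kill to cover whatever the AM did not release voluntarily,
     * most recently allocated first.
     *
     * @param stillRunningMb containers still running, in allocation order
     * @param requestedMb    amount the RM asked the AM to give back
     * @param releasedByAmMb amount the AM actually released in the meantime
     */
    static List<Long> containersToKill(LinkedHashMap<Long, Integer> stillRunningMb,
                                       int requestedMb, int releasedByAmMb) {
        int deficit = Math.max(0, requestedMb - releasedByAmMb);
        List<Map.Entry<Long, Integer>> entries =
            new ArrayList<>(stillRunningMb.entrySet());
        List<Long> kills = new ArrayList<>();
        // Walk backwards so the newest allocations are sacrificed first.
        for (int i = entries.size() - 1; i >= 0 && deficit > 0; i--) {
            kills.add(entries.get(i).getKey());
            deficit -= entries.get(i).getValue();
        }
        return kills;
    }
}
```

If the AM releases enough on its own, the returned list is empty and nothing is killed, which is the case the notification phase is meant to make common.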
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632542#comment-13632542 ] Bikas Saha commented on YARN-45: I took a quick look at this patch and the others and from what I see ResourceRequest is not actionable in the sense that neither of the schedulers can currently send a non-empty ResourceRequest to preempt. Both only do preemption by containers though they have some plumbing to send RR's if they want to do so. So I am not quite sure what you mean by "We indeed have code that exercises the ResourceRequest version of it". Of course, I may have missed something. The following comment may change after a detailed review of the changes in this patch and other related patches. But as of now I agree with you that RR makes sense because essentially this request is symmetric. AM uses RR to RM for resources to schedule and RM uses RR to AM for resources to preempt. By not using location we are implicitly using the "*" location right? Might as well make it explicit. Non * locations will make sense when affinity based preemptions occur.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632531#comment-13632531 ] Alejandro Abdelnur commented on YARN-45: Got it, makes sense.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632511#comment-13632511 ] Carlo Curino commented on YARN-45: --

[~bikassaha] Sounds good, I totally agree with the general spirit. We indeed have code that exercises the ResourceRequest version of it, thus it is actionable (detailed question later).

[~tucu00]
bq. Wouldn't it be enough just to return the ContainerIds and a flag indicating that the set is strict or not? The AM can reconstruct all the resources information if it needs to.

I think it is important to have the "resource-based" version because if the RM wants a large number of containers back (e.g., 1000) and does not care which ones, it would be very wasteful to resolve them on the RM (extra code, extra compute cost), send a long detailed list, and have the AM simply aggregate the resources, ignoring the individual containers in the list, and return some other containers.

[~bikassaha] so my previous, very broad question can be rephrased more tightly, based on [~tucu00]'s review of the patch, as follows: which of the following options do you prefer?
# we reuse ResourceRequest, of which we use the number of containers and the Resource for each container (and for now do not use locality or priority, although we might in the future)
# we create a new type that carries *only* the number of containers and the Resource for each container

It comes down to the pros and cons of reusing existing types vs. the minimalistic approach you were pointing out before. I don't have much of a preference (minor leaning towards 1, but either way is fine).
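To make the second option above concrete, here is a minimal Java sketch of a type that carries only a container count and a per-container resource. ResourceAsk and AskMath are hypothetical names, not part of any YARN patch, and memory in MB stands in for the full Resource type.

```java
import java.util.List;

/** Hypothetical compact ask: "count" containers of "memoryMb" each. */
final class ResourceAsk {
    final int count;
    final int memoryMb;

    ResourceAsk(int count, int memoryMb) {
        this.count = count;
        this.memoryMb = memoryMb;
    }
}

final class AskMath {
    /** Total memory the RM wants back across all asks. */
    static long totalMb(List<ResourceAsk> asks) {
        long total = 0;
        for (ResourceAsk a : asks) {
            total += (long) a.count * a.memoryMb;
        }
        return total;
    }

    /** True once the memory already released by the AM covers every ask. */
    static boolean satisfied(List<ResourceAsk> asks, long releasedMb) {
        return releasedMb >= totalMb(asks);
    }
}
```

A request for 1000 containers of 1024 MB is a single entry here, whereas a container-based message would have to list 1000 ids. That is the compactness argument made in the comment; the trade-off is that the AM and RM must then agree on how released containers are counted against the ask.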
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631400#comment-13631400 ] Bikas Saha commented on YARN-45: My personal preference would be to not have an API that is not actionable. If the RM does not have any support for ResourceRequest scenarios, then we can leave that out for later, when such support does arise. Having something out there that does not work may lead to misunderstanding and confusion on the part of YARN app developers.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630964#comment-13630964 ] Carlo Curino commented on YARN-45: --

[~tucu00] Care to elaborate on which properties we don't care about? In general, I like the symmetry of using ResourceRequest, because it allows the RM to compactly and precisely express what resources it wants back. In particular, it allows the RM to: 1) list a large number of containers compactly, 2) express locality preferences, and 3) express priority among multiple requests. While we don't make use of 3 quite yet, it seems not bad to have.

Arguably, at least a portion of this information can indeed be reconstructed from a Set<ContainerId> + a tag, but this might force the RM to do extra work, building a potentially large list of containers, just to have the AM undo all that. Symmetrically, one can imagine using only Set<ResourceRequest>, and if the RM wants exactly one container back, it can try to constrain the request so that only the desired container matches. In this case too it is easy to provide examples in which this might be awkward to use.

We discussed this a fair bit with Chris, and it seems that the set of use cases which are important to cover is not quite fully served by Set<ContainerId> alone nor by Set<ResourceRequest>, hence the proposal including both. I would say this comes down almost to a "style" choice: we could build a protocol that is likely to accommodate most of the future uses we foresee now (likely to be more stable), or define a minimal protocol that covers just the first use case we are targeting (Set<ContainerId> would be it in this case), and evolve it whenever needed.

[~bikassaha] if I understand correctly you are driving the protocol overhaul, would you care to comment on this?

As for get*Count(), we included them to remain consistent with other messages in the YARN protocols, which had equivalent methods for each list/set in the message. I am happy to drop them if you guys think that is best.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630925#comment-13630925 ] Alejandro Abdelnur commented on YARN-45:

Comments on the patch.
* Reusing ResourceRequest means we have a bunch of properties that are not applicable to the preempt message. Wouldn't it be enough just to return the ContainerIds and a flag indicating that the set is strict or not? The AM can reconstruct all the resources information if it needs to.
* Do we need the get*Count() methods? You can get the size from the set itself, or am I missing something?
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630898#comment-13630898 ] Carlo Curino commented on YARN-45: --

As you pointed out, any decision made in the RM needs to deal with an inconsistent and evolving view of the world, and the preemption actions suffer from an inherent and significant lag. In designing policies around this, one must embrace such chaos, operate conservatively, and try to affect only macroscopic properties (hence the many built-in dampers Chris mentioned).

As for what to do with the preemption requests, I think we are quite aligned with your comments in our current implementation for the mapreduce AM/Task. Here's what we do:

1) Maps are typically short-lived, so it is often worth ignoring the preemption request and trying to "make a run for it", as checkpointing and completion times risk being comparable, and re-execution costs are low.

2) For reducers, since the state is valuable and runtimes are often longer, the AM asks the task to checkpoint. In our current implementation, once the state of the reducer has been saved to a checkpoint we exit, as continuing execution is non-trivial (in particular managing partial output of reducers). I can envision a future version that tries to continue running after having taken a checkpoint. Note that this (the task exiting) does not introduce any new race condition/complexity in either the RM or the AM, as both already handle failing/killed tasks, and the AM even has logic to kill its own reducers to free up space for maps. More importantly, this setup (in which containers exit as soon as they are done checkpointing) allows us to set rather generous "wait-before-kill" parameters, since the containers will be reclaimed as soon as the task is done checkpointing anyway. The alternative would have the RM pick a static policy for waiting, which risks being either too long (hence delaying the rebalancing by too much) or too short (which risks interrupting containers while they finish checkpointing, thus wasting work). I expect that no static solution would fare well for a broad range of AMs and job sizes.

3) When the preemption takes the form of a ResourceRequest, we pick reducers over maps (as having reducers running when the maps are killed would simply lead to wasted slot time). Looking forward in YARN's future, this is a key feature, as other applications might have evolving priorities for containers which are not exposed to the RM; hence we can't rely on the RM to guess which container is best to preempt, and delegating the choice to the AM could be invaluable.
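The three rules in the comment above can be summarized in a few lines of Java. This is a sketch of the decision logic as described, not the MapReduce AM implementation: TaskType, Reaction, RunningTask and MrPreemptionSketch are hypothetical names, and the resource-based ask is reduced to a container count for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

enum TaskType { MAP, REDUCE }

enum Reaction { KEEP_RUNNING, CHECKPOINT_THEN_EXIT }

/** Hypothetical view of a task attempt held by the AM. */
final class RunningTask {
    final String id;
    final TaskType type;

    RunningTask(String id, TaskType type) {
        this.id = id;
        this.type = type;
    }
}

final class MrPreemptionSketch {
    /** Rules 1 and 2: maps make a run for it, reducers checkpoint and exit. */
    static Reaction onContainerNamed(RunningTask task) {
        return task.type == TaskType.REDUCE
            ? Reaction.CHECKPOINT_THEN_EXIT
            : Reaction.KEEP_RUNNING;
    }

    /** Rule 3: for a resource-based ask, give up reducers before maps. */
    static List<RunningTask> pickForResourceAsk(List<RunningTask> running,
                                                int containersWanted) {
        List<RunningTask> picked = new ArrayList<>();
        for (TaskType type : new TaskType[] {TaskType.REDUCE, TaskType.MAP}) {
            for (RunningTask t : running) {
                if (picked.size() == containersWanted) {
                    return picked;
                }
                if (t.type == type) {
                    picked.add(t);
                }
            }
        }
        return picked;
    }
}
```

The point of the second method is that the choice is made with knowledge only the AM has (task type here, and potentially progress or checkpoint cost), which is the argument for delegating the selection rather than having the RM guess.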
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630893#comment-13630893 ] Chris Douglas commented on YARN-45: --- [~sandyr]: Yes, but the correct format/semantics for time are a complex discussion in themselves. To keep this easy to review and the discussion focused, we were going to file that separately. But I totally agree: for the AM to respond intelligently, the time before it's forced to give up the container is valuable input. [~bikash]: Agree almost completely. In YARN-569, the hysteresis you cite motivated several design points, including multiple dampers on actions taken by the preemption policy, out-of-band observation/enforcement, and no effort to fine-tune particular allocations. The role of preemption (to summarize what [~curino] discussed in detail in the prenominate JIRA) is to make coarse corrections around the core scheduler invariants (e.g., capacity, fairness). Rather than introducing new races or complexity, one could argue that preemption is a dual of allocation in an inconsistent environment. Your proposal matches case (1) in the above [comment|https://issues.apache.org/jira/browse/YARN-45?focusedCommentId=13628950&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628950], where the RM specifies the set of containers in jeopardy and a contract (as {{ResourceRequest}}) for avoiding the kills, should the AM have cause to pick different containers. Further, your observation that the RM has enough information in priorities, etc. to make an educated guess at those containers is spot-on. IIRC, the policy uses allocation order when selecting containers, but that should be a secondary key after priority. The disputed point, and I'm not sure we actually disagree, is the claim that the AM should never kill things in response to this message. 
To be fair, that can be implemented by just ignoring the requests, so it's orthogonal to this particular protocol, but it's certainly an important "best practice" to discuss to ensure we're capturing the right thing. Certainly there are many cases where ignoring the message is correct; most CDFs of map task execution time show that over 80% finish in less than a minute, so the AM has few reasons to pessimistically kill them. There are a few scenarios where this isn't optimal. Take the case of YARN-415, where the AM is billed cumulatively for cluster time. Assume an AM knows (a) the container will not finish (reinforcing [~sandyr]'s point about including time in the preemption message) and (b) the work done is not worth checkpointing. It can conclude that killing the container is in its best interest, because squatting on the resource could affect its ability to get containers in the future (or simply cost more). Moreover, for long-lived services and speculative container allocation/retention, the AM may actually be holding the container only as an optimization or for a future execution, so it could release it at low cost to itself. Finally, the time allowed before the RM starts killing containers can be extended if AMs typically return resources before the deadline. It's also a mechanism for the RM to advise the AM about constraints that prevent it from granting its pending requests. The AM currently kills reducers if it can't get containers to regenerate lost map output. If the scheduler values some containers more than others, the AM's response to starvation can be improved from random killing. This is a case where the current implementation acknowledges the fact that it already runs in an inconsistent environment. 
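The trade-off described in the comment above (ignore the notice, checkpoint, or release early) can be sketched as a small decision function. This is only an illustration in Python, not Hadoop code: `am_response` and every parameter name are hypothetical, and the grace period is assumed to be known to the AM even though the thread defers adding time to the message.

```python
def am_response(est_remaining_s: int, grace_s: int,
                worth_checkpointing: bool, idle: bool = False) -> str:
    """Pick an AM reaction to a preemption notice for one container.

    All inputs are assumptions of this sketch:
      est_remaining_s      AM's estimate of time until the task finishes
      grace_s              time before the RM would kill the container
      worth_checkpointing  whether the work done so far justifies saving it
      idle                 container is held speculatively / for future use
    """
    if idle:
        # Held only as an optimization: giving it back costs little.
        return "release"
    if est_remaining_s <= grace_s:
        # Likely to finish before the RM acts: ignoring the notice is fine.
        return "ignore"
    if worth_checkpointing:
        # Will not finish in time, but the work is worth preserving.
        return "checkpoint"
    # Will not finish and nothing worth saving: stop paying for the resource.
    return "release"
```

For instance, a map task expected to finish in 30 seconds with a 60-second grace period would be left alone, while a long task with little progress would be released.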
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630769#comment-13630769 ] Bikas Saha commented on YARN-45:

I like the idea of the RM giving information to the AM about actions that it might take which will affect the AM. However, I am wary of having the action taken in different places. For example, the KILL to the containers should come from the RM or the AM exclusively, but not from both. Otherwise we open ourselves up to race conditions, unnecessary kills and complex logic in the RM.

Preemption is something that, IMO, the RM needs to do at the very last moment, when there is no other way for resources to be freed up. If we decide to preempt at time T1 and then actually preempt at time T2, then the cluster conditions may have changed between T1 and T2, which may invalidate the decisions taken at T1. New resources may have freed up that reduce the number of containers to be killed. This sub-optimality is directly proportional to the length of time between T1 and T2. So ideally we want to keep T1=T2.

One can argue that things can change after the preemption which may have made the preemption unnecessary, so the above argument of T1=T2 is fallacious. However, preemption policies are usually based on deadlines, such as "the allocation of queue1 must be met within X seconds". So the RM does not have the luxury of waiting for X+1 seconds. The best it can do is to wait up to X seconds in the hope that things will work out and, at X, redistribute resources to meet the deficit.

At the same time, I can see that there is an argument that the AM knows best how to free up its resources. It will be good to remember that the AM has already informed the RM about the importance of all its containers when it made the requests at different priorities. So the RM knows the order of importance of the containers and the RM also knows the amount of time each container has been allocated.
Assuming container runtime as a proxy for container work done, this data can be used by the RM to preempt in a work-preserving manner without having to talk to the AM.

Notifying the AM is useful because it allows the AM to take actions that preserve work, such as checkpointing. However, IMO, the AM should only do checkpointing operations but not kill the containers. That should still happen at the RM as the very last option at the last moment. If the situation changes in the grace period and the containers do not need to be killed, then there is no point in the AM killing them right now. This also lets us increase the grace period to a longer time, because checkpointing and preserving work usually means persisting data in a stable store and may be slow in practical scenarios.

To summarize, I would propose an API in which the RM tells the AM about exactly which containers it might imminently preempt, with the contract being that the AM could take actions to preserve the work done in those containers. The AM can continue to run those containers until the RM actually preempts them if needed. If we really think that the choice of containers needs to be made at the AM, then the AM needs to checkpoint those containers and inform the RM about the containers it has chosen. But the final decision to send the kill must be made by the RM.
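The idea that the RM can pick containers on its own, using request priority first and time since allocation as a proxy for work done, can be sketched as a sort followed by a greedy pick. This is a minimal Python illustration under stated assumptions, not the scheduler's implementation: the `Running` record, `pick_victims`, the memory-only resource model, and the convention that a larger priority number means a less important container are all assumptions of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Running:
    """Hypothetical view of one running container, as the RM might see it."""
    container_id: str
    priority: int    # assumption: larger value = less important request
    runtime_s: int   # seconds since allocation, used as a proxy for work done
    memory_mb: int


def pick_victims(containers: List[Running], needed_mb: int) -> List[str]:
    """Greedily choose containers to preempt until needed_mb is covered.

    Least important containers go first; among equals, the one that has
    run for the shortest time (least work lost) goes first.
    """
    ordered = sorted(containers, key=lambda c: (-c.priority, c.runtime_s))
    victims: List[str] = []
    freed = 0
    for c in ordered:
        if freed >= needed_mb:
            break
        victims.append(c.container_id)
        freed += c.memory_mb
    return victims
```

Because the selection only happens when `pick_victims` is called, the RM can defer it until the deadline, which keeps the decision time and the enforcement time close together.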
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630734#comment-13630734 ] Sandy Ryza commented on YARN-45: Carlo, I'm glad that this is being proposed. Have you considered including how long the grace period is in the response?
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629806#comment-13629806 ] Carlo Curino commented on YARN-45: Note: we don't have tests, as there are no tests for the rest of the protocol buffer messages either (this would consist mostly of validating auto-generated code).
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629707#comment-13629707 ] Hadoop QA commented on YARN-45:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578339/YARN-45.patch against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/723//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/723//console

This message is automatically generated.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629691#comment-13629691 ] Hadoop QA commented on YARN-45:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578337/YARN-45.patch against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
-1 javac. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/722//console

This message is automatically generated.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629662#comment-13629662 ] Bikas Saha commented on YARN-45: Moved to sub-task of YARN-397 for scheduler API changes.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629660#comment-13629660 ] Carlo Curino commented on YARN-45: [~kkambatl], yes ResourceRequests can be used to capture locality preferences. In our first use we focus on capacity, so the RM policies are not very picky/aware of location, but we think it is good to build this into the protocol for later use (as commented above somewhere). (As for the last comment: we moved YARN-567, YARN-568, YARN-569 that will use this protocol into YARN-397, while this one is probably part of YARN-386).
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629638#comment-13629638 ] Karthik Kambatla commented on YARN-45: [~bikassaha], shouldn't this be under YARN-397?
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629635#comment-13629635 ] Karthik Kambatla commented on YARN-45: Great discussion, glad to see this coming along well. Carlo's latest comment makes sense to me. Let me know if I understand it right: the ResourceRequest part of the message can capture locality, and the AM will try to give back resources on each node according to this locality information?
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629620#comment-13629620 ] Bikas Saha commented on YARN-45: All API changes at this point are being tracked under YARN-386
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629070#comment-13629070 ] Alejandro Abdelnur commented on YARN-45: sounds good
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628950#comment-13628950 ] Carlo Curino commented on YARN-45:

Agreed on a single message, where the semantics are:
1) if both the Set and the ResourceRequest are specified, then it is as discussed: they overlap, and you have to give me back at least the resources I ask for, otherwise these containers are at risk of getting killed;
2) if only the Set is specified, it is the "stricter" semantics of "I want these containers back and nothing else";
3) if only the ResourceRequest is specified, the semantics is "please give me back this many resources", without binding which containers are at risk (this might be good for policies that do not want to think about containers unless it is really time to kill them).

Does this work for you? It seems to capture the combination of what we proposed so far.
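The three cases of the single-message semantics above can be sketched as a classification of which fields are present. This is a hedged Python illustration, not the record types added by the patch: `PreemptionNotice`, its field names, the memory-only stand-in for a ResourceRequest, and the mode labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class PreemptionNotice:
    """Hypothetical stand-in for the single preemption message."""
    containers: FrozenSet[str] = frozenset()  # containers at risk, if named
    resources_mb: Optional[int] = None        # stand-in for a ResourceRequest


def classify(notice: PreemptionNotice) -> str:
    """Map the fields that are present to the three cases in the thread."""
    has_set = bool(notice.containers)
    has_request = notice.resources_mb is not None
    if has_set and has_request:
        # Case 1: named containers are at risk unless the AM returns
        # at least the requested resources (possibly from other containers).
        return "negotiable"
    if has_set:
        # Case 2: the RM wants exactly these containers back.
        return "strict"
    if has_request:
        # Case 3: the RM wants this many resources; no containers are named.
        return "resources-only"
    return "none"
```

An AM could dispatch on the returned label: checkpoint the named containers for `"strict"`, and choose its own cheapest containers for `"negotiable"` or `"resources-only"`.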
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628938#comment-13628938 ] Alejandro Abdelnur commented on YARN-45: I'm just trying to see if we can have (at least for now) a single message type, instead of two, that satisfies the use cases. Regarding keeping the tighter semantics: if it is not difficult/complex, I'm OK with it. Thanks.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628930#comment-13628930 ] Carlo Curino commented on YARN-45:

Sorry, I read only your last comment and answered that one. Regarding your previous "larger" comment:
- What you propose is somewhat of a combination of 1 and 2 above, where we give the AM a hint about what would happen at the container level if the pressure remains. I don't have strong feelings about it; I agree it is easy to do, and maybe it is a good compromise.
- However, I want to be able to maintain the tighter semantics of 1 (in case the ResourceRequest is not specified in the message), which forces the AM to preempt exactly the set of containers I am specifying. (Now, with a very "targeted" ResourceRequest, you can in practice do something similar.) This covers use cases like the one I mentioned above.

We are posting more code in YARN-567, YARN-568 and YARN-569; check it out, as it might provide context for this conversation.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628922#comment-13628922 ] Carlo Curino commented on YARN-45: Our main focus for now is to rebalance capacity; in this sense, yes, location is not important. However, one can envision the use of preemption for other things as well, e.g., to build a monitor that tries to improve data locality by issuing (a moderate amount of) "relocations" of a container (probably riding the same checkpointing mechanics we are building for MR). This is another case where container-based preemption can turn out to be useful. (This is at the moment just speculation.)
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628893#comment-13628893 ] Alejandro Abdelnur commented on YARN-45: Forgot to add: unless I'm missing something, the location of the preemption is not important, just capacity, right?
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628889#comment-13628889 ] Alejandro Abdelnur commented on YARN-45:

Carlo, what about a small twist? A preempt message (instead of request, as there is no preempt response) would contain:

* Resources (# CPUs & # Memory): total amount of resources that may be preempted if no action is taken by the AM.
* Set<ContainerID>: list of containers that would be killed by the RM to claim the resources if no action is taken by the AM.

Computing the resources is straightforward: just aggregate the resources of the Set<ContainerID>. An AM can take action using either piece of information. If an AM releases the requested amount of resources, even if the released containers don't match the received container IDs, then the AM will not be over threshold anymore, thus getting rid of the preemption pressure fully or partially. If the AM fulfills the preemption only partially, then the RM will still kill some containers from the set. As the set is not ordered, the AM still does not know exactly which containers will be killed. So the set is just the list of containers in danger of being preempted.

I may be backtracking a bit on my previous comments: 'trading these containers for equivalent ones' seems acceptable and gives the scheduler some freedom on how to best take care of things if an AM is over limit. If an AM releases the requested amount of resources, regardless of which containers it releases, the AM won't be preempted for this preemption message. We just need to clearly spell out the behavior.

With this approach I think we don't need #1 and #2? Thoughts?
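The aggregation and the "released amount, regardless of container IDs" rule described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: `Res` and `PreemptMessageSketch` are hypothetical stand-ins (not the YARN `Resource` record or anything in the attached patch), container IDs are plain strings, and resources are reduced to memory and virtual cores.

```java
import java.util.Collection;
import java.util.Map;

/** Hypothetical stand-in for a resource vector; NOT the YARN Resource record. */
final class Res {
    final int memoryMb;
    final int vcores;

    Res(int memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    Res plus(Res other) {
        return new Res(memoryMb + other.memoryMb, vcores + other.vcores);
    }

    /** True if this vector is at least as large as the other in every dimension. */
    boolean covers(Res other) {
        return memoryMb >= other.memoryMb && vcores >= other.vcores;
    }
}

/** Hypothetical helper mirroring the two fields of the proposed preempt message. */
final class PreemptMessageSketch {
    /** Total resources at risk: the sum over the containers the RM listed. */
    static Res aggregate(Collection<String> containerIds, Map<String, Res> sizes) {
        Res total = new Res(0, 0);
        for (String id : containerIds) {
            total = total.plus(sizes.get(id)); // assumes every listed ID has a known size
        }
        return total;
    }

    /** True if what the AM released covers the requested total, whatever the IDs are. */
    static boolean satisfied(Res requested, Collection<String> released, Map<String, Res> sizes) {
        return aggregate(released, sizes).covers(requested);
    }
}
```

Under this reading, a partial release leaves `satisfied` false, and the RM would still be free to kill some of the listed containers to make up the difference.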
> Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628625#comment-13628625 ] Carlo Curino commented on YARN-45: --

Agreed. As for #1, your previous comments did make us "simplify" it as follows: we inform the AM that a Set<ContainerID> will be killed unless it preempts them (the exact same containers). We dropped the "trading these containers for equivalent ones" idea, as we agreed with your comments that it would be too funky. The rationale behind including this simple container-based preemption is twofold: a) it matches very well with what the FairScheduler does today (we simply provide a cheaper form of preemption w.r.t. the straight-up kill it used to do), and b) it allows for compact bookkeeping of "kill if no preemption happens" in a policy we wrote to add preemption to the CapacityScheduler, which seems to behave well.

As for #2, I totally agree this is important to have, and it has lots of potential, since it empowers the AM to make smart local decisions (it is well aligned with the overall spirit of YARN, I think). We will handle this both in the RM and AM in future patches, where "future" = we have the code, but it needs some polish before posting.

Cheers,
Carlo
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628503#comment-13628503 ] Alejandro Abdelnur commented on YARN-45:

My soft objection to #1 is just that it is telling me 'you are over capacity, better get rid of stuff; this is our choice of what to kill, but you could release others and you are good'. So why not just tell me the amount of capacity I should release to be safe? IMO, if an AM is going to deal with the complexity of this functionality, it should be able to map that amount to containers locally and then decide what to release.

IMO #2 is the one I care about, as it truly gives the AM the flexibility to decide which containers to get rid of based on the specified resources. Yes, I prefer this one.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628477#comment-13628477 ] Carlo Curino commented on YARN-45: --

This is still a point we are discussing and it is not fully settled, which is why it comes across as confusing and why we were soliciting opinions. Your observations, I think, are helping us frame this a bit better. We can see three possible uses of preemption:

1) A preemption policy that does not necessarily trust the AM, picks containers, lists them as a Set<ContainerID>, and gives the AM a heads up on which ones are going to die soon if they are not preempted. Note that if the AM is MapReduce this is not too bad, as we know how containers are used (maps before reducers) and so we can pick containers in a reasonable order. We have been testing a policy that does this, and it works well in our tests. This is also a perfect match with how the FairScheduler thinks about preemption.

2) A preemption policy that trusts the AM and specifies preemption as a Set<ResourceRequest>. This works well for known AMs that we know try to honor the preemption requests, and/or if we do not care about force-killing anyway and preemption requests are best-effort. We have played around with a version of this too. If I am not mistaken, this is also the case you care about the most, right?

3) A version of 2 which also enforces its preemption requests via killing if they are not satisfied within a certain period of time. This is non-trivial to build, as there is inherent ambiguity in how ResourceRequests are mapped to containers over time, so the enforcement part is hard to get right and to prove correct.

We believe that 3 might be the ideal point to tend toward, but proving its correctness is non-trivial and would require deeper surgery to the RM/Schedulers. For example, if at a subsequent moment in time I want the same amount of resources out of an AM, it is hard to decide unambiguously whether this is due to the AM not preempting as I asked (so just forcibly killing its containers is fine), or whether these are subsequent and independent requests for resources (so I should not kill but wait).

The proposed protocol, with the change that makes it a tagged union of Set<ContainerID> and Set<ResourceRequest>, seems to allow for all of the above and to be easy to explain. I will update the patch to reflect this if you agree.
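As a rough illustration of the "tagged union" shape discussed in this comment, the sketch below carries exactly one of the two sets and rejects access to the other. It is not the class from the attached patch: `PreemptRequestSketch` and its method names are hypothetical, and plain strings stand in for `ContainerID` and `ResourceRequest`.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical tagged union: a request carries either containers or resources, never both. */
final class PreemptRequestSketch {
    enum Kind { CONTAINERS, RESOURCES }

    final Kind kind;
    private final Set<String> payload; // strings stand in for ContainerID / ResourceRequest

    private PreemptRequestSketch(Kind kind, Set<String> payload) {
        this.kind = kind;
        this.payload = Collections.unmodifiableSet(new HashSet<>(payload));
    }

    /** Use 1 above: the RM names the exact containers at risk. */
    static PreemptRequestSketch ofContainers(Set<String> containerIds) {
        return new PreemptRequestSketch(Kind.CONTAINERS, containerIds);
    }

    /** Uses 2 and 3 above: the RM names an amount and lets the AM pick. */
    static PreemptRequestSketch ofResources(Set<String> resourceRequests) {
        return new PreemptRequestSketch(Kind.RESOURCES, resourceRequests);
    }

    Set<String> containers() {
        return expect(Kind.CONTAINERS);
    }

    Set<String> resources() {
        return expect(Kind.RESOURCES);
    }

    private Set<String> expect(Kind wanted) {
        if (kind != wanted) {
            throw new IllegalStateException("request carries " + kind + ", not " + wanted);
        }
        return payload;
    }
}
```

An AM that ignores preemption hints never needs to look at either accessor, which is consistent with the kill-based fallback described in the thread.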
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628358#comment-13628358 ] Alejandro Abdelnur commented on YARN-45:

Carlo, I may be missing something then. From your description I understand that a PreemptRequest could contain either a Set<ContainerID> or a Set<ResourceRequest>, but not both. If I'm correct in this assumption, then if the RM chooses to send a Set<ContainerID> we are back to square one, where the RM is deciding what to kill and is just giving a heads up.

If the idea is that the RM will send a PreemptRequest containing both a Set<ContainerID> and a Set<ResourceRequest> which are equivalent (just two ways of expressing the same amount of resources), then it seems OK. In this case, the Set<ContainerID> is just a convenience so that the AM does not have to dig through its internal data structures. But you seem to indicate this is not the case in your second-to-last paragraph.

I'd argue that the Set<ContainerID> is just an early warning; it does not delegate the choice to the AM. The fact that the AM could decide to get rid of another container in the same location and make this preemption go away seems twisted. Regardless of the convenience for the implementation, I don't think the RM should care which containers the AM chooses to release, only about the amount of resources.

My specific use case is that the AM should get an amount of resources to preempt and decide which containers are best to release.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628305#comment-13628305 ] Carlo Curino commented on YARN-45: --

Alejandro, thanks for the feedback, and yes, you are spot on. I think what you propose is akin to the Set<ResourceRequest> we have (which, if I understand correctly, is similar to the PreemptResource you describe). We plan to support this, and it does cover one set of use cases very well, i.e., when we have a "broad" request and we are OK with the AM resolving it as it sees fit. As you point out, this is good because it allows the AM to be smart about what to return, and thus more likely to avoid expensive preemptions in favor of cheap ones, or even to return a container which is not data-local in place of one that is data-local, etc.

However, this feels contrived when we know precisely what we want back from a certain AM (e.g., we want to preempt a specific container). For this purpose the Set<ContainerID>-based preemption is easier to use, and it also simplifies the bookkeeping done in the RM (in our preemption policy) to decide when to "kill" a container if the AM does not preempt it within a certain timeout. This is a good match with the FairScheduler internals, and we adapted the CapacityScheduler to leverage this too by means of a preemption monitor.

This will be clearer when we release the actual monitor (in the next few days), but the idea is that if we talk to the AM in terms of a Set<ContainerID> there is no ambiguity in detecting when the AM is ignoring us, and thus when we have to move on with container killing (e.g., to enforce capacity/fairness). On the contrary, using ResourceRequest or something like that, we might not know whether the resource I want back now is the same one I wanted in some previous iteration (hence I am being ignored by the AM) or whether they just happen to be the same or similar.

If we can devise a simple way to use a single resource-based representation for both scenarios, I would be happy to drop the Set<ContainerID>, but so far we haven't found a clean way to do it, so we provisioned for both Set<ContainerID> and/or Set<ResourceRequest> to be optionally part of a PreemptRequest. The current semantics is that these are disjoint sets of resources we want (some called out as containers, and some expressed as resources), but we don't have a strong reason for this not to be a tagged union.

Do you think the above covers the use case you have in mind, or am I missing something? (BTW, I am very curious to hear what your use case is.)
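The "kill if the AM does not preempt within a timeout" bookkeeping mentioned above is easy to state when requests name specific containers. The sketch below is an assumption about how such tracking might look, not the preemption monitor referred to in the comment: `PreemptionDeadlines`, its method names, and the single fixed grace period are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical RM-side bookkeeping for container-based preemption requests. */
final class PreemptionDeadlines {
    private final Map<String, Long> deadlineMs = new HashMap<>();
    private final long graceMs;

    PreemptionDeadlines(long graceMs) {
        this.graceMs = graceMs;
    }

    /** Record a request; repeating it for the same container keeps the original deadline. */
    void requested(String containerId, long nowMs) {
        deadlineMs.putIfAbsent(containerId, nowMs + graceMs);
    }

    /** The AM released the container (or it finished), so nothing is left to enforce. */
    void released(String containerId) {
        deadlineMs.remove(containerId);
    }

    /** Containers whose grace period has run out; they are dropped from tracking. */
    List<String> expired(long nowMs) {
        List<String> out = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = deadlineMs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= nowMs) {
                out.add(entry.getKey());
                it.remove();
            }
        }
        Collections.sort(out); // stable order for callers and tests
        return out;
    }
}
```

Because the key is a container ID, a repeated request is recognizably "the same request"; with a resource-based request there is no such key, which is the ambiguity the comment describes.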
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628247#comment-13628247 ] Alejandro Abdelnur commented on YARN-45:

Nice, the proposed functionality comes in quite handy for some stuff I'm working on. Regarding the question of how to model the PreemptRequest, have you thought about the following alternative?

* The PreemptRequest would contain only a Set<PreemptResource>.
* PreemptResource has the following properties: a String location and a Resource capability.
* The PreemptResource location can be ANY, a rack or a node.
* The PreemptResource capability is the total capacity that should be released.
* The AM, if taking the hint, would release containers that match the location and add up to the PreemptResource capability.

By doing this, you give full control to the AM to decide what to release, by grouping any containers that match the location. Thoughts?
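To make the PreemptResource alternative concrete, here is one way an AM might resolve a (location, capability) hint into containers. Everything in it is an assumption for illustration: `RunningContainer`, `ReleaseChooser`, the `"*"` spelling of ANY, the memory-only capability, and the per-container `lostWorkSec` cost are hypothetical, and the greedy cheapest-first rule is just one possible local policy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical AM-side view of a running container. */
final class RunningContainer {
    final String id;
    final String node;
    final String rack;
    final int memoryMb;
    final int lostWorkSec; // assumed application-level cost of giving this container up

    RunningContainer(String id, String node, String rack, int memoryMb, int lostWorkSec) {
        this.id = id;
        this.node = node;
        this.rack = rack;
        this.memoryMb = memoryMb;
        this.lostWorkSec = lostWorkSec;
    }
}

/** Hypothetical local policy: release the cheapest matching containers first. */
final class ReleaseChooser {
    static final String ANY = "*"; // assumed spelling of the ANY location

    static boolean matches(RunningContainer c, String location) {
        return ANY.equals(location) || c.node.equals(location) || c.rack.equals(location);
    }

    /** Pick containers at the given location until at least memoryMb would be freed. */
    static List<String> choose(List<RunningContainer> running, String location, int memoryMb) {
        List<RunningContainer> candidates = new ArrayList<>();
        for (RunningContainer c : running) {
            if (matches(c, location)) {
                candidates.add(c);
            }
        }
        candidates.sort(Comparator.comparingInt((RunningContainer c) -> c.lostWorkSec));

        List<String> chosen = new ArrayList<>();
        int freed = 0;
        for (RunningContainer c : candidates) {
            if (freed >= memoryMb) {
                break;
            }
            chosen.add(c.id);
            freed += c.memoryMb;
        }
        return chosen; // may free less than requested if too few containers match
    }
}
```

The point of the design is visible in the signature: the RM supplies only `location` and `memoryMb`, and the ordering by `lostWorkSec` stays entirely inside the AM.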
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626162#comment-13626162 ] Hadoop QA commented on YARN-45: ---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577679/YARN-45.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.
{color:red}-1 release audit{color}. The applied patch generated 2 release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/691//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/691//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/691//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/691//console

This message is automatically generated.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626139#comment-13626139 ] Carlo Curino commented on YARN-45: --

High level idea:

The philosophy behind preemption is that we give the AM a heads up about resources that are likely to be taken away, and give it the opportunity to save the state of its tasks. A separate kill-based mechanism already exists (leveraged by the FairScheduler for its preemption) to forcibly recover containers. This is our fallback if the AM does not release containers within a certain amount of time (note that, due to the quickly evolving conditions of the cluster, we might not kill a container if at a later time we realize this is not strictly needed to achieve fairness/capacity). This means an AM can be written completely ignoring preemption hints and would still work correctly (although it might waste useful work).

The goal is to allow for smart local policies in the AM, which leverage application-level understanding of the ongoing computation to face the imminent reduction of resources (e.g., by saving the state of the computation to a checkpoint, by promoting partial output, by migrating competencies to other tasks, or by trying to complete the work quickly). The goal is to spare the RM from understanding application-level optimization concerns so that it can focus on resource management issues. As a consequence we envision (among others) preemption requests that are not fully bound, allowing the AM some flexibility.

Note that the significant "lag" imposed by the heartbeat protocols between RM-AM, AM-Tasks and NM-RM forces us, in most cases, to consider preemption actions over a rather long time horizon. We can't expect to operate in a tight sub-second control loop, but rather to trigger changes in the cluster allocation on the order of tens of seconds. As a consequence preemption should be used to correct macroscopic issues that are likely to be somewhat stable over time, rather than to micro-manage container allocations.

We consider the following use cases for preemption:

1. Scheduling policies aimed at rebalancing some global property such as capacity or fairness. This allows, for example, a queue to go over capacity and the resources to be taken back as the cluster conditions change.
2. Scheduling policies that make point decisions about individual containers (e.g., preempting a container on a machine and restarting it elsewhere to improve data locality, or preempting containers on a box that is observing excessive IO).
3. Administrative actions aimed at modifying the cluster allocations without wasting work (e.g., draining a machine or a rack before taking it offline for maintenance, manually reducing allocations for a job, etc.).

Use cases 1 and 3 can be implemented by picking containers at the RM, or by expressing a "broad" request for a certain amount of resources (we reuse the ResourceRequest for this, in a way that is symmetric to the AM request) and letting the AM bind this to specific containers. Use case 2 is more likely to be implemented using ContainerIDs.

Protocol change proposal:

Our proposal consists of extending the ResourceResponse with a PreemptRequest message (further extensible in the future) that contains a Set<ContainerID> and a Set<ResourceRequest>. The current semantics is that these two sets are non-overlapping (i.e., if I ask for a specific container and a ResourceRequest, the AM is supposed to satisfy both). Once again, as we never rely on the AM to "enforce" preemption but have a kill-based fallback, the AM implementation is not required to understand the preemption requests (nor even to acknowledge receiving them). This makes for a simple upgrade story, and one could run a mix of preemption-aware and non-preemption-aware AMs on the same cluster.

A current open question we would like input on is whether to make the PreemptRequest a union type that carries either set (but not both together), or whether to allow, as we do in the attached patch, both to co-exist in the same PreemptRequest. We do not have a current need for the "both" use case, but maybe others do. Thoughts?

Coming up next:

We are cleaning up further patches to the FairScheduler, CapacityScheduler and ApplicationMasterService leveraging this AM-RM protocol, and changes to the MapReduce AM that implement work-saving preemption via checkpointing for Shuffle and Reducers (while for Mappers we are currently "making a run for it", given the commonly short runtime of maps). The other patches will be posted soon.

> Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Com