[jira] [Commented] (YARN-441) Clean up unused collection methods in various APIs
[ https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629839#comment-13629839 ] Hadoop QA commented on YARN-441: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578368/YARN-441.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warning. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/725//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/725//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/725//console This message is automatically generated. > Clean up unused collection methods in various APIs > -- > > Key: YARN-441 > URL: https://issues.apache.org/jira/browse/YARN-441 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Xuan Gong > Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch, > YARN-441.4.patch > > > There's a bunch of unused methods like getAskCount() and getAsk(index) in > AllocateRequest, and other interfaces. These should be removed. > In YARN, found them in the following; MR will have its own set: > AllocateRequest > StartContainerResponse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.
[ https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629837#comment-13629837 ] Xuan Gong commented on YARN-561: org.apache.hadoop.yarn.api.records.Container has a ContainerId and a NodeId (from which the address and port can be obtained), which are enough for a container to talk to its local NM. And by YARN-486, we have already added org.apache.hadoop.yarn.api.records.Container to ContainerImpl, so it will get that information now. > Nodemanager should set some key information into the environment of every > container that it launches. > - > > Key: YARN-561 > URL: https://issues.apache.org/jira/browse/YARN-561 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Xuan Gong > Labels: usability > > Information such as containerId, nodemanager hostname, nodemanager port is > not set in the environment when any container is launched. > For an AM, the RM does all of this for it but for a container launched by an > application, all of the above need to be set by the ApplicationMaster. > At the minimum, container id would be a useful piece of information. If the > container wishes to talk to its local NM, the nodemanager related information > would also come in handy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
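As a sketch of what the comment above implies (not the committed change): the fields carried by the Container record are sufficient for the NM to expose the key information in a launched container's environment. The environment variable names below are hypothetical, made up for illustration only.

{code}
import java.util.Map;
import org.apache.hadoop.yarn.api.records.Container;

// Hypothetical helper, not NM code: pushes the fields available on the
// Container record (held by ContainerImpl after YARN-486) into a container's
// launch environment. The environment keys are invented for this sketch.
public final class ContainerEnvSketch {
  static void addContainerInfo(Container container, Map<String, String> env) {
    env.put("CONTAINER_ID", container.getId().toString());
    env.put("NM_HOST", container.getNodeId().getHost());
    env.put("NM_PORT", String.valueOf(container.getNodeId().getPort()));
  }
}
{code}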
[jira] [Commented] (YARN-457) Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl
[ https://issues.apache.org/jira/browse/YARN-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629827#comment-13629827 ] Xuan Gong commented on YARN-457: We also need to add this.updatedNodes.clear() before we actually add all the updatedNodes. > Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl > > > Key: YARN-457 > URL: https://issues.apache.org/jira/browse/YARN-457 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Kenji Kikushima >Priority: Minor > Labels: Newbie > Attachments: YARN-457-2.patch, YARN-457-3.patch, YARN-457.patch > > > {code} > if (updatedNodes == null) { > this.updatedNodes.clear(); > return; > } > {code} > If this.updatedNodes is already null, a NullPointerException is thrown. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
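A minimal sketch of the combined fix being discussed, using a stand-in class rather than the actual AllocateResponsePBImpl patch: guard the null case before touching the backing list, and clear the list before addAll as the comment above suggests.

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.yarn.api.records.NodeReport;

// Stand-in for the PBImpl setter, assuming a lazily initialized backing list.
class UpdatedNodesHolderSketch {
  private List<NodeReport> updatedNodes;

  public void setUpdatedNodes(final List<NodeReport> nodes) {
    if (nodes == null) {
      if (this.updatedNodes != null) {
        this.updatedNodes.clear(); // no NPE when the backing list was never built
      }
      return;
    }
    if (this.updatedNodes == null) {
      this.updatedNodes = new ArrayList<NodeReport>();
    }
    this.updatedNodes.clear();     // the clear() Xuan Gong suggests adding
    this.updatedNodes.addAll(nodes);
  }
}
{code}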
[jira] [Assigned] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.
[ https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-561: -- Assignee: Xuan Gong (was: Omkar Vinit Joshi) > Nodemanager should set some key information into the environment of every > container that it launches. > - > > Key: YARN-561 > URL: https://issues.apache.org/jira/browse/YARN-561 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Xuan Gong > Labels: usability > > Information such as containerId, nodemanager hostname, nodemanager port is > not set in the environment when any container is launched. > For an AM, the RM does all of this for it but for a container launched by an > application, all of the above need to be set by the ApplicationMaster. > At the minimum, container id would be a useful piece of information. If the > container wishes to talk to its local NM, the nodemanager related information > would also come in handy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-441) Clean up unused collection methods in various APIs
[ https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-441: --- Attachment: YARN-441.4.patch Created a new patch based on the self-review comments on patch 3. > Clean up unused collection methods in various APIs > -- > > Key: YARN-441 > URL: https://issues.apache.org/jira/browse/YARN-441 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Xuan Gong > Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch, > YARN-441.4.patch > > > There's a bunch of unused methods like getAskCount() and getAsk(index) in > AllocateRequest, and other interfaces. These should be removed. > In YARN, found them in the following; MR will have its own set: > AllocateRequest > StartContainerResponse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-441) Clean up unused collection methods in various APIs
[ https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629819#comment-13629819 ] Xuan Gong commented on YARN-441: Patch 3 self-review: 1. For each record API, we should only have a getter and a setter; we can keep the getter and setter that return or take the whole list. 2. For the methods that get, set, or remove one item from the list, or that addAll, removeAll, or clear the whole list, callers can simply get the whole list first and then perform those get, set, remove, or clear actions on it. So those methods can be removed. > Clean up unused collection methods in various APIs > -- > > Key: YARN-441 > URL: https://issues.apache.org/jira/browse/YARN-441 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Xuan Gong > Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch > > > There's a bunch of unused methods like getAskCount() and getAsk(index) in > AllocateRequest, and other interfaces. These should be removed. > In YARN, found them in the following; MR will have its own set: > AllocateRequest > StartContainerResponse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
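To make the convention concrete, here is a hedged sketch of how callers migrate once the per-item helpers are gone; it assumes the whole-list accessor getAskList() is among those retained on AllocateRequest, per the review comment above.

{code}
import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Illustrative only: the same operations, expressed against the retained
// whole-list getter instead of the removed per-item helpers.
class AskListMigrationSketch {
  static void inspectAsks(AllocateRequest request) {
    List<ResourceRequest> asks = request.getAskList();
    int askCount = asks.size();              // replaces getAskCount()
    if (askCount > 0) {
      ResourceRequest first = asks.get(0);   // replaces getAsk(0)
      System.out.println(first);
    }
  }
}
{code}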
[jira] [Commented] (YARN-514) Delayed store operations should not result in RM unavailability for app submission
[ https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629818#comment-13629818 ] Hadoop QA commented on YARN-514: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578361/YARN-514.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/724//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/724//console This message is automatically generated. > Delayed store operations should not result in RM unavailability for app > submission > -- > > Key: YARN-514 > URL: https://issues.apache.org/jira/browse/YARN-514 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Zhijie Shen > Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch, > YARN-514.4.patch > > > Currently, app submission is the only store operation performed synchronously > because the app must be stored before the request returns with success. This > makes the RM susceptible to blocking all client threads on slow store > operations, resulting in RM being perceived as unavailable by clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629806#comment-13629806 ] Carlo Curino commented on YARN-45: -- Note: we don't have tests as there are no tests for the rest of the protocolbuffer messages either (this would mostly consist of validating auto-generated code). > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-514) Delayed store operations should not result in RM unavailability for app submission
[ https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-514: - Attachment: YARN-514.4.patch Fix the incorrect indents. > Delayed store operations should not result in RM unavailability for app > submission > -- > > Key: YARN-514 > URL: https://issues.apache.org/jira/browse/YARN-514 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Zhijie Shen > Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch, > YARN-514.4.patch > > > Currently, app submission is the only store operation performed synchronously > because the app must be stored before the request returns with success. This > makes the RM susceptible to blocking all client threads on slow store > operations, resulting in RM being perceived as unavailable by clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-514) Delayed store operations should not result in RM unavailability for app submission
[ https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629782#comment-13629782 ] Zhijie Shen commented on YARN-514: -- @Bikas, the enum values in the proto need to be changed because YarnApplicationStateProto will be used by the application report. MR may also need it when converting from Yarn state to MR state. > Delayed store operations should not result in RM unavailability for app > submission > -- > > Key: YARN-514 > URL: https://issues.apache.org/jira/browse/YARN-514 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Zhijie Shen > Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch > > > Currently, app submission is the only store operation performed synchronously > because the app must be stored before the request returns with success. This > makes the RM susceptible to blocking all client threads on slow store > operations, resulting in RM being perceived as unavailable by clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-514) Delayed store operations should not result in RM unavailability for app submission
[ https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629745#comment-13629745 ] Bikas Saha commented on YARN-514: - For MAPREDUCE-5140 please check for uses of both NEW and SUBMITTED in order to find out places where NEW_SAVING would need to be handled. > Delayed store operations should not result in RM unavailability for app > submission > -- > > Key: YARN-514 > URL: https://issues.apache.org/jira/browse/YARN-514 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Zhijie Shen > Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch > > > Currently, app submission is the only store operation performed synchronously > because the app must be stored before the request returns with success. This > makes the RM susceptible to blocking all client threads on slow store > operations, resulting in RM being perceived as unavailable by clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
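A hedged sketch of what handling NEW_SAVING alongside NEW and SUBMITTED could look like on the MR side. The enums here are local stand-ins, not the committed TypeConverter code; the actual change belongs to MAPREDUCE-5140.

{code}
// Local stand-ins only: the point is that the new NEW_SAVING state must be
// handled wherever NEW and SUBMITTED already are.
class StateConversionSketch {
  enum YarnAppState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }
  enum MRJobState { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

  static MRJobState fromYarn(YarnAppState state) {
    switch (state) {
      case NEW:
      case NEW_SAVING:   // the new case, added alongside NEW and SUBMITTED
      case SUBMITTED:
      case ACCEPTED:
        return MRJobState.PREP;
      case RUNNING:
        return MRJobState.RUNNING;
      case FINISHED:
        return MRJobState.SUCCEEDED;
      case FAILED:
        return MRJobState.FAILED;
      default:
        return MRJobState.KILLED;
    }
  }
}
{code}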
[jira] [Commented] (YARN-514) Delayed store operations should not result in RM unavailability for app submission
[ https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629742#comment-13629742 ] Bikas Saha commented on YARN-514: - Looks good overall. Minor tab issues in the patch. I don't think we want to change the enum values in the proto. Please prepare a MAPREDUCE-side patch for MAPREDUCE-5140. These need to go in together. > Delayed store operations should not result in RM unavailability for app > submission > -- > > Key: YARN-514 > URL: https://issues.apache.org/jira/browse/YARN-514 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Zhijie Shen > Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch > > > Currently, app submission is the only store operation performed synchronously > because the app must be stored before the request returns with success. This > makes the RM susceptible to blocking all client threads on slow store > operations, resulting in RM being perceived as unavailable by clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-482) FS: Extend SchedulingMode to intermediate queues
[ https://issues.apache.org/jira/browse/YARN-482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-482: -- Attachment: yarn-482.patch Here is a preliminary patch that # Renames SchedulingMode to SchedulingPolicy, as policy seems to be a more apt name # Extends setting SchedulingPolicy to intermediate queues # Fixes the previously broken assignContainer() hierarchy to include intermediate queues > FS: Extend SchedulingMode to intermediate queues > > > Key: YARN-482 > URL: https://issues.apache.org/jira/browse/YARN-482 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-482.patch > > > FS allows setting {{SchedulingMode}} for leaf queues. Extending this to > non-leaf queues allows using different kinds of fairness: e.g., root can have > three child queues - fair-mem, drf-cpu-mem, drf-cpu-disk-mem taking different > numbers of resources into account. In turn, this allows users to trade off > scheduling latency against the sophistication of the scheduling mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
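A hedged sketch, with stand-in types rather than the FairScheduler sources, of what extending SchedulingPolicy to intermediate queues enables: each parent queue orders its children by its own policy's comparator before delegating assignment, so different subtrees can mix, for example, fair-share and DRF orderings.

{code}
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Stand-in parent queue: the policy is now settable per queue, including
// non-leaf queues, and drives the order in which children are tried.
class ParentQueueSketch {
  interface Queue { boolean assignContainer(); }
  interface SchedulingPolicy { Comparator<Queue> getComparator(); }

  private final SchedulingPolicy policy;
  private final List<Queue> children;

  ParentQueueSketch(SchedulingPolicy policy, List<Queue> children) {
    this.policy = policy;
    this.children = children;
  }

  boolean assignContainer() {
    // Most deserving child (per this queue's own policy) gets the first try.
    Collections.sort(children, policy.getComparator());
    for (Queue child : children) {
      if (child.assignContainer()) {
        return true;
      }
    }
    return false;
  }
}
{code}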
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629707#comment-13629707 ] Hadoop QA commented on YARN-45: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578339/YARN-45.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/723//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/723//console This message is automatically generated. > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-45: - Attachment: YARN-45.patch > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-45: - Attachment: (was: YARN-45.patch) > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-45: - Attachment: YARN-45.patch > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629691#comment-13629691 ] Hadoop QA commented on YARN-45: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578337/YARN-45.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/722//console This message is automatically generated. > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-45: - Attachment: (was: YARN-45.patch) > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-45: - Attachment: YARN-45.patch > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629662#comment-13629662 ] Bikas Saha commented on YARN-45: Moved to sub-task of YARN-397 for scheduler API changes. > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-45: --- Issue Type: Sub-task (was: Improvement) Parent: YARN-397 > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-45: --- Issue Type: Improvement (was: Sub-task) Parent: (was: YARN-386) > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629660#comment-13629660 ] Carlo Curino commented on YARN-45: -- [~kkambatl], yes ResourceRequests can be used to capture locality preferences. In our first use we focus on capacity, so the RM policies are not very picky/aware of location, but we think it is good to build this into the protocol for later use (as commented above somewhere). (As for the last comment: we moved YARN-567, YARN-568, YARN-569 that will use this protocol into YARN-397, while this one is probably part of YARN-386). > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
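A hedged sketch of how an AM might consume such a message, using local stand-in types (the real record names in the patch may differ): a strict part lists containers the RM will reclaim regardless, while a negotiable, ResourceRequest-shaped part, which per the comment above can encode locality, lets the AM choose which equivalent containers to hand back after saving their work.

{code}
import java.util.List;

// Stand-in types only; this is not the proposed protocol's actual API.
class PreemptionHandlerSketch {
  interface ContainerRef { void saveStateAndRelease(); }
  interface ResourceAsk { List<ContainerRef> pickVictims(); } // AM's own choice
  interface PreemptionMessage {
    List<ContainerRef> strictContainers();  // will be reclaimed regardless
    List<ResourceAsk> negotiableAsks();     // AM decides what to give back
  }

  void onPreemption(PreemptionMessage msg) {
    for (ContainerRef c : msg.strictContainers()) {
      c.saveStateAndRelease();              // no choice: checkpoint and let go
    }
    for (ResourceAsk ask : msg.negotiableAsks()) {
      for (ContainerRef victim : ask.pickVictims()) {
        victim.saveStateAndRelease();       // work-preserving hand-back
      }
    }
  }
}
{code}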
[jira] [Updated] (YARN-568) FairScheduler: support for work-preserving preemption
[ https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-568: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-397 > FairScheduler: support for work-preserving preemption > -- > > Key: YARN-568 > URL: https://issues.apache.org/jira/browse/YARN-568 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: fair.patch > > > In the attached patch, we modified the FairScheduler to substitute its > preemption-by-killing with a work-preserving version of preemption (followed > by killing if the AMs do not respond quickly enough). This should allow us to > run preemption checking more often, but kill less often (proper tuning to be > investigated). Depends on YARN-567 and YARN-45, and is related to YARN-569. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-569: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-397 > CapacityScheduler: support for preemption (using a capacity monitor) > > > Key: YARN-569 > URL: https://issues.apache.org/jira/browse/YARN-569 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: 3queues.pdf, capacity.patch, > CapScheduler_with_preemption.pdf > > > There is a tension between the fast-paced reactive role of the > CapacityScheduler, which needs to respond quickly to applications' resource > requests and node updates, and the more introspective, time-based > considerations needed to observe and correct for capacity balance. To this > purpose, instead of hacking the delicate mechanisms of the CapacityScheduler > directly, we opted to add support for preemption by means of a "Capacity > Monitor", which can optionally be run as a separate service (much like the > NMLivelinessMonitor). > The capacity monitor (similar to equivalent functionality in the fairness > scheduler) runs on intervals (e.g., every 3 seconds), observes the state of > the assignment of resources to queues from the capacity scheduler, performs > off-line computation to determine whether preemption is needed and how best > to "edit" the current schedule to improve capacity, and generates events that > produce four possible actions: > # Container de-reservations > # Resource-based preemptions > # Container-based preemptions > # Container killing > The actions listed above are progressively more costly, and it is up to the > policy to use them as desired to achieve the rebalancing goals. > Note that due to the "lag" in the effect of these actions the policy should > operate at the macroscopic level (e.g., preempt tens of containers from a > queue) and not try to tightly and consistently micromanage container > allocations. > - Preemption policy (ProportionalCapacityPreemptionPolicy): > - > Preemption policies are by design pluggable; in the following we present an > initial policy (ProportionalCapacityPreemptionPolicy) we have been > experimenting with. The ProportionalCapacityPreemptionPolicy behaves as > follows: > # it gathers from the scheduler the state of the queues, in particular, their > current capacity, guaranteed capacity and pending requests (*) > # if there are pending requests from queues that are under capacity it > computes a new ideal balanced state (**) > # it computes the set of preemptions needed to repair the current schedule > and achieve capacity balance (accounting for natural completion rates, and > respecting bounds on the amount of preemption we allow for each round) > # it selects which applications to preempt from each over-capacity queue (the > last one in the FIFO order) > # it removes reservations from the most recently assigned app until the > amount of resources to reclaim is obtained, or until no more reservations > exist > # (if not enough) it issues preemptions for containers from the same > applications (reverse chronological order, last assigned container first), > again until the target is met or until no containers except the AM container > are left > # (if not enough) it moves on to unreserve and preempt from the next > application > # containers that have been asked to be preempted are tracked across > executions. If a container is among the ones to be preempted for more than a > certain time, it is moved into the list of containers to be forcibly killed. > Notes: > (*) at the moment, in order to avoid double-counting of the requests, we only > look at the "ANY" part of pending resource requests, which means we might not > preempt on behalf of AMs that ask only for specific locations but not any. > (**) The ideal balanced state is one in which each queue has at least its > guaranteed capacity, and the spare capacity is distributed among queues (that > want some) as a weighted fair share, where the weighting is based on the > guaranteed capacity of a queue, and the function runs to a fixed point. > Tunables of the ProportionalCapacityPreemptionPolicy: > # observe-only mode (i.e., log the actions it would take, but behave as > read-only) > # how frequently to run the policy > # how long to wait between preemption and kill of a container > # which fraction of the containers I would like to obtain should I preempt > (has to do with the natural rate at which containers are returned) > # deadzone size, i.e., what % of over-capacity should I ignore (if we are off > perfect balance by some small %, we ignore it)
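As a concrete illustration of step 3 and the deadzone and per-round tunables above, here is a hedged sketch with made-up names (not the capacity.patch code) that derives per-queue reclaim targets from current and ideal assignments:

{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: given current and ideal per-queue assignments (assumed
// to share the same key set), compute how much to reclaim from each
// over-capacity queue this round, bounded by a per-round cap and a deadzone
// that ignores small imbalances.
class PreemptionTargetsSketch {
  static Map<String, Long> toReclaim(Map<String, Long> current,
                                     Map<String, Long> ideal,
                                     double maxPerRound,   // e.g. 0.1 = reclaim 10% of the gap per round
                                     double deadzone) {    // e.g. 0.05 = ignore 5% overage
    Map<String, Long> targets = new HashMap<String, Long>();
    for (Map.Entry<String, Long> e : current.entrySet()) {
      long cur = e.getValue();
      long ide = ideal.get(e.getKey());
      if (cur > ide * (1 + deadzone)) {      // outside the deadzone
        long gap = cur - ide;
        targets.put(e.getKey(), (long) (gap * maxPerRound));
      }
    }
    return targets;
  }
}
{code}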
[jira] [Updated] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-567: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-397 > RM changes to support preemption for FairScheduler and CapacityScheduler > > > Key: YARN-567 > URL: https://issues.apache.org/jira/browse/YARN-567 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: common.patch > > > A common tradeoff in scheduling jobs is between keeping the cluster busy and > enforcing capacity/fairness properties. FairScheduler and CapacityScheduler > take opposite stances on how to achieve this. > The FairScheduler leverages task-killing to quickly reclaim resources from > currently running jobs and redistribute them among new jobs, thus keeping > the cluster busy but wasting useful work. The CapacityScheduler is typically > tuned to limit the portion of the cluster used by each queue so that the > likelihood of violating capacity is low, thus never wasting work, but risking > keeping the cluster underutilized or having jobs wait to obtain their > rightful capacity. > By introducing the notion of work-preserving preemption we can remove this > tradeoff. This requires a protocol for preemption (YARN-45), and > ApplicationMasters that can respond to preemption efficiently (e.g., by > saving their intermediate state; this will be posted for MapReduce in a > separate JIRA soon), together with a scheduler that can issue preemption > requests (discussed in the separate JIRAs YARN-568 and YARN-569). > The changes we track with this JIRA are common to FairScheduler and > CapacityScheduler, and are mostly the propagation of preemption decisions > through the ApplicationMasterService. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629638#comment-13629638 ] Karthik Kambatla commented on YARN-45: -- [~bikassaha], shouldn't this be under YARN-397? > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629635#comment-13629635 ] Karthik Kambatla commented on YARN-45: -- Great discussion, glad to see this coming along well. Carlo's latest comment makes sense to me. Let me know if I understand it right: the ResourceRequest part of the message can capture locality, and the AM will try to give back resources on each node as per this locality information? > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629620#comment-13629620 ] Bikas Saha commented on YARN-45: All API changes at this point are being tracked under YARN-386 > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-45: --- Issue Type: Sub-task (was: Improvement) Parent: YARN-386 > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629590#comment-13629590 ] Hadoop QA commented on YARN-547: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578317/yarn-547-20130411.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/721//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/721//console This message is automatically generated. > New resource localization is tried even when Localized Resource is in > DOWNLOADING state > --- > > Key: YARN-547 > URL: https://issues.apache.org/jira/browse/YARN-547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch > > > At present, when multiple containers request a localized resource: > 1) If the resource is not present, it is first created and resource > localization starts (the LocalizedResource is in the DOWNLOADING state). > 2) If multiple ResourceRequestEvents come in while in this state, > ResourceLocalizationEvents are fired for all of them. > Most of the time this does not result in a duplicate resource download, but > there is a race condition. > Location: ResourceLocalizationService.addResource, where the request is added > to "attempts" when an entry already exists. > The root cause is the presence of FetchResourceTransition on receiving a > ResourceRequestEvent in the DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
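A hedged sketch of the fix direction the issue points at, using a stand-in for the NM's LocalizedResource state machine rather than the actual patch: while the resource is DOWNLOADING, a new request should only register the waiting container, not fire another FetchResourceTransition.

{code}
import java.util.HashSet;
import java.util.Set;

// Stand-in state machine, not the NM sources: fetch exactly once, and in
// DOWNLOADING merely record the new requester (removing the transition that
// caused the duplicate-download race).
class LocalizedResourceSketch {
  enum State { INIT, DOWNLOADING, LOCALIZED }

  private State state = State.INIT;
  private final Set<String> waitingContainers = new HashSet<String>();

  synchronized void handleRequest(String containerId) {
    waitingContainers.add(containerId); // always remember who is waiting
    if (state == State.INIT) {
      state = State.DOWNLOADING;
      startDownload();                  // fired exactly once
    }
    // In DOWNLOADING: do nothing else; no second fetch is triggered.
  }

  private void startDownload() {
    // hand off to a localizer thread (omitted in this sketch)
  }
}
{code}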
[jira] [Updated] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-547: --- Attachment: (was: yarn-547-20130411.1.patch) > New resource localization is tried even when Localized Resource is in > DOWNLOADING state > --- > > Key: YARN-547 > URL: https://issues.apache.org/jira/browse/YARN-547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch > > > At present, when multiple containers request a localized resource: > 1) If the resource is not present, it is first created and resource > localization starts (the LocalizedResource is in the DOWNLOADING state). > 2) If multiple ResourceRequestEvents come in while in this state, > ResourceLocalizationEvents are fired for all of them. > Most of the time this does not result in a duplicate resource download, but > there is a race condition. > Location: ResourceLocalizationService.addResource, where the request is added > to "attempts" when an entry already exists. > The root cause is the presence of FetchResourceTransition on receiving a > ResourceRequestEvent in the DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-547: --- Attachment: yarn-547-20130411.1.patch > New resource localization is tried even when Localized Resource is in > DOWNLOADING state > --- > > Key: YARN-547 > URL: https://issues.apache.org/jira/browse/YARN-547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch > > > At present when multiple containers try to request a localized resource > 1) If the resource is not present then first it is created and Resource > Localization starts ( LocalizedResource is in DOWNLOADING state) > 2) Now if in this state multiple ResourceRequestEvents come in then > ResourceLocalizationEvents are fired for all of them. > Most of the times it is not resulting into a duplicate resource download but > there is a race condition present there. > Location : ResourceLocalizationService.addResource .. addition of the request > into "attempts" in case of an event already exists. > The root cause for this is the presence of FetchResourceTransition on > receiving ResourceRequestEvent in DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629569#comment-13629569 ] Omkar Vinit Joshi commented on YARN-547: The failed test is actually exercising transitions that are now invalid. Fixing it. > New resource localization is tried even when Localized Resource is in > DOWNLOADING state > --- > > Key: YARN-547 > URL: https://issues.apache.org/jira/browse/YARN-547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch > > > At present when multiple containers try to request a localized resource > 1) If the resource is not present then first it is created and Resource > Localization starts ( LocalizedResource is in DOWNLOADING state) > 2) Now if in this state multiple ResourceRequestEvents come in then > ResourceLocalizationEvents are fired for all of them. > Most of the times it is not resulting into a duplicate resource download but > there is a race condition present there. > Location : ResourceLocalizationService.addResource .. addition of the request > into "attempts" in case of an event already exists. > The root cause for this is the presence of FetchResourceTransition on > receiving ResourceRequestEvent in DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-319) Submit a job to a queue that not allowed in fairScheduler, client will hold forever.
[ https://issues.apache.org/jira/browse/YARN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629549#comment-13629549 ] Hudson commented on YARN-319: - Integrated in Hadoop-trunk-Commit #3603 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3603/]) Fixing CHANGES.txt entry for YARN-319. (Revision 1467133) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467133 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt > Submit a job to a queue that not allowed in fairScheduler, client will hold > forever. > > > Key: YARN-319 > URL: https://issues.apache.org/jira/browse/YARN-319 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.0.2-alpha >Reporter: shenhong >Assignee: shenhong > Fix For: 2.0.5-beta > > Attachments: YARN-319-1.patch, YARN-319-2.patch, YARN-319-3.patch, > YARN-319.patch > > > RM use fairScheduler, when client submit a job to a queue, but the queue do > not allow the user to submit job it, in this case, client will hold forever. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629514#comment-13629514 ] Hadoop QA commented on YARN-486: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578308/YARN-486.6.branch2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/720//console This message is automatically generated. > Change startContainer NM API to accept Container as a parameter and make > ContainerLaunchContext user land > - > > Key: YARN-486 > URL: https://issues.apache.org/jira/browse/YARN-486 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-486.1.patch, YARN-486-20130410.txt, > YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch, YARN-486.5.patch, > YARN-486.6.branch2.patch, YARN-486.6.patch > > > Currently, id, resource request etc need to be copied over from Container to > ContainerLaunchContext. This can be brittle. Also it leads to duplication of > information (such as Resource from CLC and Resource from Container and > Container.tokens). Sending Container directly to startContainer solves these > problems. It also makes CLC clean by only having stuff in it that it set by > the client/AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-486: --- Attachment: YARN-486.6.branch2.patch Patch for branch-2 > Change startContainer NM API to accept Container as a parameter and make > ContainerLaunchContext user land > - > > Key: YARN-486 > URL: https://issues.apache.org/jira/browse/YARN-486 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-486.1.patch, YARN-486-20130410.txt, > YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch, YARN-486.5.patch, > YARN-486.6.branch2.patch, YARN-486.6.patch > > > Currently, id, resource request etc need to be copied over from Container to > ContainerLaunchContext. This can be brittle. Also it leads to duplication of > information (such as Resource from CLC and Resource from Container and > Container.tokens). Sending Container directly to startContainer solves these > problems. It also makes CLC clean by only having stuff in it that it set by > the client/AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629510#comment-13629510 ] Hadoop QA commented on YARN-547: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578295/yarn-547-20130411.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalizedResource {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/719//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/719//console This message is automatically generated. > New resource localization is tried even when Localized Resource is in > DOWNLOADING state > --- > > Key: YARN-547 > URL: https://issues.apache.org/jira/browse/YARN-547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-547-20130411.patch > > > At present when multiple containers try to request a localized resource > 1) If the resource is not present then first it is created and Resource > Localization starts ( LocalizedResource is in DOWNLOADING state) > 2) Now if in this state multiple ResourceRequestEvents come in then > ResourceLocalizationEvents are fired for all of them. > Most of the times it is not resulting into a duplicate resource download but > there is a race condition present there. > Location : ResourceLocalizationService.addResource .. addition of the request > into "attempts" in case of an event already exists. > The root cause for this is the presence of FetchResourceTransition on > receiving ResourceRequestEvent in DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-544) Failed resource localization might introduce a race condition.
[ https://issues.apache.org/jira/browse/YARN-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-544. -- Resolution: Duplicate Thanks for the update, Omkar. Closing this as duplicate. > Failed resource localization might introduce a race condition. > -- > > Key: YARN-544 > URL: https://issues.apache.org/jira/browse/YARN-544 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > > When resource localization fails [Public localizer / > LocalizerRunner(Private)] it sends ContainerResourceFailedEvent to the > containers which then sends ResourceReleaseEvent to the failed resource. In > the end when LocalizedResource's ref count drops to 0 its state is changed > from DOWNLOADING to INIT. > Now if a Resource gets ResourceRequestEvent in between > ContainerResourceFailedEvent and last ResourceReleaseEvent then for that > resource ref count will not drop to 0 and the container which sent the > ResourceRequestEvent will keep waiting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-537) Waiting containers are not informed if private localization for a resource fails.
[ https://issues.apache.org/jira/browse/YARN-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-537. -- Resolution: Duplicate Fixed as part of YARN-539. Closing as duplicate. > Waiting containers are not informed if private localization for a resource > fails. > - > > Key: YARN-537 > URL: https://issues.apache.org/jira/browse/YARN-537 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > > In ResourceLocalizationService.LocalizerRunner.update() if localization fails > then all the other waiting containers are not informed only the initiator is > informed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-542) Change the default global AM max-attempts value to be not one
[ https://issues.apache.org/jira/browse/YARN-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629480#comment-13629480 ] Hadoop QA commented on YARN-542: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578289/YARN-542.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/718//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/718//console This message is automatically generated. > Change the default global AM max-attempts value to be not one > - > > Key: YARN-542 > URL: https://issues.apache.org/jira/browse/YARN-542 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > Attachments: YARN-542.1.patch > > > Today, the global AM max-attempts is set to 1 which is a bad choice. AM > max-attempts accounts for both AM level failures as well as container crashes > due to localization issue, lost nodes etc. To account for AM crashes due to > problems that are not caused by user code, mainly lost nodes, we want to give > AMs some retires. > I propose we change it to atleast two. Can change it to 4 to match other > retry-configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629476#comment-13629476 ] Vinod Kumar Vavilapalli commented on YARN-486: -- I merged YARN-319 into branch-2. But YARN-488 won't be merged yet because it is a WINDOWS only change, so can you upload a patch for branch-2? Tx. > Change startContainer NM API to accept Container as a parameter and make > ContainerLaunchContext user land > - > > Key: YARN-486 > URL: https://issues.apache.org/jira/browse/YARN-486 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-486.1.patch, YARN-486-20130410.txt, > YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch, YARN-486.5.patch, > YARN-486.6.patch > > > Currently, id, resource request etc need to be copied over from Container to > ContainerLaunchContext. This can be brittle. Also it leads to duplication of > information (such as Resource from CLC and Resource from Container and > Container.tokens). Sending Container directly to startContainer solves these > problems. It also makes CLC clean by only having stuff in it that it set by > the client/AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629475#comment-13629475 ] Xuan Gong commented on YARN-486: Another issue is YARN-488, which has not been committed to branch-2 either. In TestContainerManagerSecurity:submitAndRegisterApplication it makes the following change:
{code}
 ContainerLaunchContext amContainer = BuilderUtils
     .newContainerLaunchContext(null, "testUser", BuilderUtils
         .newResource(1024, 1), Collections.emptyMap(),
-        new HashMap(), Arrays.asList("sleep", "100"),
+        new HashMap(), cmd,
         new HashMap(), null, new HashMap());
{code}
> Change startContainer NM API to accept Container as a parameter and make > ContainerLaunchContext user land > - > > Key: YARN-486 > URL: https://issues.apache.org/jira/browse/YARN-486 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-486.1.patch, YARN-486-20130410.txt, > YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch, YARN-486.5.patch, > YARN-486.6.patch > > > Currently, id, resource request etc need to be copied over from Container to > ContainerLaunchContext. This can be brittle. Also it leads to duplication of > information (such as Resource from CLC and Resource from Container and > Container.tokens). Sending Container directly to startContainer solves these > problems. It also makes CLC clean by only having stuff in it that it set by > the client/AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-547: --- Attachment: yarn-547-20130411.patch > New resource localization is tried even when Localized Resource is in > DOWNLOADING state > --- > > Key: YARN-547 > URL: https://issues.apache.org/jira/browse/YARN-547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-547-20130411.patch > > > At present when multiple containers try to request a localized resource > 1) If the resource is not present then first it is created and Resource > Localization starts ( LocalizedResource is in DOWNLOADING state) > 2) Now if in this state multiple ResourceRequestEvents come in then > ResourceLocalizationEvents are fired for all of them. > Most of the times it is not resulting into a duplicate resource download but > there is a race condition present there. > Location : ResourceLocalizationService.addResource .. addition of the request > into "attempts" in case of an event already exists. > The root cause for this is the presence of FetchResourceTransition on > receiving ResourceRequestEvent in DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629467#comment-13629467 ] Omkar Vinit Joshi commented on YARN-547: Fix details:
* Underlying problem: the resource was getting requested again on a ResourceRequestEvent even when it was already in the DOWNLOADING state.
* Solution: fixed the unwanted transition; a ResourceRequestEvent in the DOWNLOADING state now just adds the container to the waiting queue.
* Tests: making sure that the resource never moves back to the INIT state, even when the requesting container releases it before localization completes. A ResourceReleaseEvent in the DOWNLOADING state now just updates the container list (refs).
> New resource localization is tried even when Localized Resource is in > DOWNLOADING state > --- > > Key: YARN-547 > URL: https://issues.apache.org/jira/browse/YARN-547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > > At present when multiple containers try to request a localized resource > 1) If the resource is not present then first it is created and Resource > Localization starts ( LocalizedResource is in DOWNLOADING state) > 2) Now if in this state multiple ResourceRequestEvents come in then > ResourceLocalizationEvents are fired for all of them. > Most of the times it is not resulting into a duplicate resource download but > there is a race condition present there. > Location : ResourceLocalizationService.addResource .. addition of the request > into "attempts" in case of an event already exists. > The root cause for this is the presence of FetchResourceTransition on > receiving ResourceRequestEvent in DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
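A minimal sketch of the revised DOWNLOADING-state handling described in the comment above. The class, method, and container-id names here are illustrative stand-ins, assuming a simplified model; the real LocalizedResource in the NM's localizer package is driven by Hadoop's state-machine framework rather than a hand-rolled enum.
{code}
import java.util.HashSet;
import java.util.Set;

// Simplified stand-in for the NM's LocalizedResource state handling.
class LocalizedResourceSketch {
  enum ResourceState { INIT, DOWNLOADING, LOCALIZED, FAILED }

  private ResourceState state = ResourceState.INIT;
  private final Set<String> waitingContainers = new HashSet<>();

  // REQUEST event: only the first request kicks off a download; later
  // requests arriving in DOWNLOADING merely register the container as waiting.
  synchronized void onRequest(String containerId) {
    waitingContainers.add(containerId);
    if (state == ResourceState.INIT) {
      state = ResourceState.DOWNLOADING;
      startDownload(); // fired exactly once per resource
    }
    // state == DOWNLOADING: no second fetch is triggered (the YARN-547 fix)
  }

  // RELEASE event while DOWNLOADING: just update the container list;
  // the resource never falls back to INIT (per YARN-539).
  synchronized void onRelease(String containerId) {
    waitingContainers.remove(containerId);
  }

  private void startDownload() { /* hand off to the localizer thread */ }
}
{code}
The key point is that the download is started only on the INIT-to-DOWNLOADING edge, so concurrent requesters can never trigger a duplicate fetch.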
[jira] [Updated] (YARN-319) Submit a job to a queue that not allowed in fairScheduler, client will hold forever.
[ https://issues.apache.org/jira/browse/YARN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-319: - Fix Version/s: (was: 2.0.3-alpha) 2.0.5-beta Even though the fix version was set to 2.0.3, this was never merged into branch-2 at all. I have just merged it for 2.0.5-beta and am changing the fix version accordingly. > Submit a job to a queue that not allowed in fairScheduler, client will hold > forever. > > > Key: YARN-319 > URL: https://issues.apache.org/jira/browse/YARN-319 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.0.2-alpha >Reporter: shenhong >Assignee: shenhong > Fix For: 2.0.5-beta > > Attachments: YARN-319-1.patch, YARN-319-2.patch, YARN-319-3.patch, > YARN-319.patch > > > RM use fairScheduler, when client submit a job to a queue, but the queue do > not allow the user to submit job it, in this case, client will hold forever. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-542) Change the default global AM max-attempts value to be not one
[ https://issues.apache.org/jira/browse/YARN-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-542: - Attachment: YARN-542.1.patch I've drafted a patch, which includes the following modifications:
1. Change the default value of yarn.resourcemanager.am.max-attempts from 1 to 2.
2. In the test cases where more than one attempt is set, YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS is used instead of hard-coded values.
3. Assert that the configured maxAttempts is > 1 wherever the difference between one attempt and more than one matters.
> Change the default global AM max-attempts value to be not one > - > > Key: YARN-542 > URL: https://issues.apache.org/jira/browse/YARN-542 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > Attachments: YARN-542.1.patch > > > Today, the global AM max-attempts is set to 1 which is a bad choice. AM > max-attempts accounts for both AM level failures as well as container crashes > due to localization issue, lost nodes etc. To account for AM crashes due to > problems that are not caused by user code, mainly lost nodes, we want to give > AMs some retires. > I propose we change it to atleast two. Can change it to 4 to match other > retry-configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
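A hedged illustration of what the new default means for code reading the setting. The configuration key is quoted from the patch notes above; the constant names below are local stand-ins rather than the real YarnConfiguration fields.
{code}
import org.apache.hadoop.conf.Configuration;

public class AmMaxAttemptsExample {
  // Key taken from the patch notes above; constant names are local stand-ins.
  static final String RM_AM_MAX_ATTEMPTS = "yarn.resourcemanager.am.max-attempts";
  static final int DEFAULT_RM_AM_MAX_ATTEMPTS = 2; // was 1 before YARN-542

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    int maxAttempts = conf.getInt(RM_AM_MAX_ATTEMPTS, DEFAULT_RM_AM_MAX_ATTEMPTS);
    // Mirrors point 3 above: retry behaviour only differs when this exceeds 1.
    System.out.println("AM max attempts in effect: " + maxAttempts);
  }
}
{code}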
[jira] [Commented] (YARN-441) Clean up unused collection methods in various APIs
[ https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629414#comment-13629414 ] Hadoop QA commented on YARN-441: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578280/YARN-441.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/717//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/717//console This message is automatically generated. > Clean up unused collection methods in various APIs > -- > > Key: YARN-441 > URL: https://issues.apache.org/jira/browse/YARN-441 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Xuan Gong > Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch > > > There's a bunch of unused methods like getAskCount() and getAsk(index) in > AllocateRequest, and other interfaces. These should be removed. > In YARN, found them in. MR will have it's own set. > AllocateRequest > StartContaienrResponse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-441) Clean up unused collection methods in various APIs
[ https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629377#comment-13629377 ] Xuan Gong commented on YARN-441: Added void setServiceResponse(String key, ByteBuffer value) back to the StartContainerResponse interface. > Clean up unused collection methods in various APIs > -- > > Key: YARN-441 > URL: https://issues.apache.org/jira/browse/YARN-441 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Xuan Gong > Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch > > > There's a bunch of unused methods like getAskCount() and getAsk(index) in > AllocateRequest, and other interfaces. These should be removed. > In YARN, found them in. MR will have it's own set. > AllocateRequest > StartContaienrResponse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
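For context, a sketch of the relevant slice of the interface, reconstructed only from the signature quoted above; the real StartContainerResponse is a PB-backed record with more methods.
{code}
import java.nio.ByteBuffer;

// Sketch reconstructed from the quoted signature; the real interface is PB-backed.
interface StartContainerResponseSketch {
  // Restored by the .3 patch; stores a per-service response entry.
  void setServiceResponse(String key, ByteBuffer value);
}
{code}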
[jira] [Updated] (YARN-441) Clean up unused collection methods in various APIs
[ https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-441: --- Attachment: YARN-441.3.patch > Clean up unused collection methods in various APIs > -- > > Key: YARN-441 > URL: https://issues.apache.org/jira/browse/YARN-441 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Xuan Gong > Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch > > > There's a bunch of unused methods like getAskCount() and getAsk(index) in > AllocateRequest, and other interfaces. These should be removed. > In YARN, found them in. MR will have it's own set. > AllocateRequest > StartContaienrResponse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629335#comment-13629335 ] Xuan Gong commented on YARN-486: Cannot merge into branch-2: the test case TestFairScheduler:testNotAllowSubmitApplication, introduced by YARN-319, does not exist in that branch; it looks like that patch was never submitted to branch-2. > Change startContainer NM API to accept Container as a parameter and make > ContainerLaunchContext user land > - > > Key: YARN-486 > URL: https://issues.apache.org/jira/browse/YARN-486 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-486.1.patch, YARN-486-20130410.txt, > YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch, YARN-486.5.patch, > YARN-486.6.patch > > > Currently, id, resource request etc need to be copied over from Container to > ContainerLaunchContext. This can be brittle. Also it leads to duplication of > information (such as Resource from CLC and Resource from Container and > Container.tokens). Sending Container directly to startContainer solves these > problems. It also makes CLC clean by only having stuff in it that it set by > the client/AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-542) Change the default global AM max-attempts value to be not one
[ https://issues.apache.org/jira/browse/YARN-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-542: - Description: Today, the global AM max-attempts is set to 1 which is a bad choice. AM max-attempts accounts for both AM level failures as well as container crashes due to localization issue, lost nodes etc. To account for AM crashes due to problems that are not caused by user code, mainly lost nodes, we want to give AMs some retires. I propose we change it to atleast two. Can change it to 4 to match other retry-configs. was: Today, the AM max-retries is set to 1 which is a bad choice. AM max-retries accounts for both AM level failures as well as container crashes due to localization issue, lost nodes etc. To account for AM crashes due to problems that are not caused by user code, mainly lost nodes, we want to give AMs some retires. I propose we change it to atleast two. Can change it to 4 to match other retry-configs. > Change the default global AM max-attempts value to be not one > - > > Key: YARN-542 > URL: https://issues.apache.org/jira/browse/YARN-542 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > > Today, the global AM max-attempts is set to 1 which is a bad choice. AM > max-attempts accounts for both AM level failures as well as container crashes > due to localization issue, lost nodes etc. To account for AM crashes due to > problems that are not caused by user code, mainly lost nodes, we want to give > AMs some retires. > I propose we change it to atleast two. Can change it to 4 to match other > retry-configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-542) Change the default global AM max-attempts value to be not one
[ https://issues.apache.org/jira/browse/YARN-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-542: - Summary: Change the default global AM max-attempts value to be not one (was: Change the default AM retry value to be not one) > Change the default global AM max-attempts value to be not one > - > > Key: YARN-542 > URL: https://issues.apache.org/jira/browse/YARN-542 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > > Today, the AM max-retries is set to 1 which is a bad choice. AM max-retries > accounts for both AM level failures as well as container crashes due to > localization issue, lost nodes etc. To account for AM crashes due to problems > that are not caused by user code, mainly lost nodes, we want to give AMs some > retires. > I propose we change it to atleast two. Can change it to 4 to match other > retry-configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-563) Add application type to ApplicationReport
[ https://issues.apache.org/jira/browse/YARN-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629204#comment-13629204 ] Hitesh Shah commented on YARN-563: -- +1 on the suggestion. If you are working on this, a few comments:
- applicationType should also be part of ApplicationSubmissionContext
- the command-line tool to list applications (bin/yarn tool) should support filtering based on type
- type should be a string
> Add application type to ApplicationReport > -- > > Key: YARN-563 > URL: https://issues.apache.org/jira/browse/YARN-563 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Thomas Weise > > This field is needed to distinguish different types of applications (app > master implementations). For example, we may run applications of type XYZ in > a cluster alongside MR and would like to filter applications by type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
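A minimal sketch of the type-based filtering being asked for in the CLI point above. The AppReport record below is a stand-in for the real ApplicationReport, which carries many more fields.
{code}
import java.util.List;
import java.util.stream.Collectors;

class AppTypeFilterSketch {
  // Stand-in for the real ApplicationReport; only the fields needed here.
  record AppReport(String applicationId, String applicationType) {}

  // The kind of filtering the CLI bullet above asks for, e.g. "list only XYZ apps".
  static List<AppReport> filterByType(List<AppReport> reports, String type) {
    return reports.stream()
        .filter(r -> type.equalsIgnoreCase(r.applicationType()))
        .collect(Collectors.toList());
  }
}
{code}
Keeping the type a plain string, per the last point, means new application frameworks need no enum change on the YARN side.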
[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629265#comment-13629265 ] Hudson commented on YARN-486: - Integrated in Hadoop-trunk-Commit #3596 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3596/]) YARN-486. Changed NM's startContainer API to accept Container record given by RM as a direct parameter instead of as part of the ContainerLaunchContext record. Contributed by Xuan Gong. MAPREDUCE-5139. Update MR AM to use the modified startContainer API after YARN-486. Contributed by Xuan Gong. (Revision 1467063) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467063 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerRemoteLaunchEvent.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttemptContainerRequest.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/launcher/TestContainerLauncher.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/StartContainerRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StartContainerRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerLaunchContextPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestContainerLaunchRPC.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRPC.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.j
[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629256#comment-13629256 ] Vinod Kumar Vavilapalli commented on YARN-486: -- I committed this to trunk, it isn't merging into branch-2 though, can you please check? > Change startContainer NM API to accept Container as a parameter and make > ContainerLaunchContext user land > - > > Key: YARN-486 > URL: https://issues.apache.org/jira/browse/YARN-486 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-486.1.patch, YARN-486-20130410.txt, > YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch, YARN-486.5.patch, > YARN-486.6.patch > > > Currently, id, resource request etc need to be copied over from Container to > ContainerLaunchContext. This can be brittle. Also it leads to duplication of > information (such as Resource from CLC and Resource from Container and > Container.tokens). Sending Container directly to startContainer solves these > problems. It also makes CLC clean by only having stuff in it that it set by > the client/AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-457) Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl
[ https://issues.apache.org/jira/browse/YARN-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629231#comment-13629231 ] Xuan Gong commented on YARN-457: First of all, I think the changes will be in AllocateResponsePBImpl; there is no AMResponsePBImpl anymore. Could you update to the latest trunk version, please? I think we need to change the whole setUpdatedNodes function definition; only changing the if block is not enough. The whole change may look like this:
{code}
if (updatedNodes == null) {
  return;
}
initLocalNewNodeReportList();
this.updatedNodes.addAll(updatedNodes); // addAll, not add: the parameter is a list of reports
{code}
This implements setUpdatedNodes the same way we implement setAllocatedContainers() in AllocateResponsePBImpl. > Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl > > > Key: YARN-457 > URL: https://issues.apache.org/jira/browse/YARN-457 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Kenji Kikushima >Priority: Minor > Labels: Newbie > Attachments: YARN-457-2.patch, YARN-457-3.patch, YARN-457.patch > > > {code} > if (updatedNodes == null) { > this.updatedNodes.clear(); > return; > } > {code} > If updatedNodes is already null, a NullPointerException is thrown. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
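To make the failure mode concrete, a toy reproduction of the NPE from the issue description together with the guarded version suggested above. The class and field names are illustrative, not the real PB implementation.
{code}
import java.util.ArrayList;
import java.util.List;

class UpdatedNodesSketch {
  private List<String> updatedNodes; // starts null, like the PB impl's local field

  // Buggy shape from the issue description: clear() on a null field NPEs.
  void setUpdatedNodesBuggy(List<String> nodes) {
    if (nodes == null) {
      this.updatedNodes.clear(); // NPE when updatedNodes is still null
      return;
    }
  }

  // Guarded shape matching the suggestion above: bail out, lazily init, copy.
  void setUpdatedNodes(List<String> nodes) {
    if (nodes == null) {
      return;
    }
    if (this.updatedNodes == null) {
      this.updatedNodes = new ArrayList<>(); // initLocalNewNodeReportList() analogue
    }
    this.updatedNodes.addAll(nodes);
  }
}
{code}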
[jira] [Assigned] (YARN-559) Make all YARN API and libraries available through an api jar
[ https://issues.apache.org/jira/browse/YARN-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy reassigned YARN-559: -- Assignee: Vinod Kumar Vavilapalli > Make all YARN API and libraries available through an api jar > > > Key: YARN-559 > URL: https://issues.apache.org/jira/browse/YARN-559 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Vinod Kumar Vavilapalli > > This should be the dependency for interacting with YARN and would prevent > unnecessary leakage of other internal stuff. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629147#comment-13629147 ] Omkar Vinit Joshi commented on YARN-547: There are a couple of invalid transitions for LocalizedResource now. Updating them as part of this patch:
* From the INIT state
** INIT to INIT on a RELEASE event. This is not possible now, as a new resource is created in the INIT state on a REQUEST event and immediately moved to the DOWNLOADING state. With the [yarn-539|https://issues.apache.org/jira/browse/YARN-539] fix, the resource will never move back from the LOCALIZED or DOWNLOADING state to INIT.
** INIT to LOCALIZED on a LOCALIZED event. This too is impossible now.
* From the DOWNLOADING state
** DOWNLOADING to DOWNLOADING on a REQUEST event. Updating the transition: earlier it was starting one more localization; now it just adds the requesting container to the LocalizedResource container list.
* From the LOCALIZED state
** A resource will never get a LOCALIZED event in the LOCALIZED state, so removing that transition. Earlier this was possible because there were multiple downloads of the same resource; now it is not.
> New resource localization is tried even when Localized Resource is in > DOWNLOADING state > --- > > Key: YARN-547 > URL: https://issues.apache.org/jira/browse/YARN-547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > > At present when multiple containers try to request a localized resource > 1) If the resource is not present then first it is created and Resource > Localization starts ( LocalizedResource is in DOWNLOADING state) > 2) Now if in this state multiple ResourceRequestEvents come in then > ResourceLocalizationEvents are fired for all of them. > Most of the times it is not resulting into a duplicate resource download but > there is a race condition present there. > Location : ResourceLocalizationService.addResource .. addition of the request > into "attempts" in case of an event already exists. > The root cause for this is the presence of FetchResourceTransition on > receiving ResourceRequestEvent in DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
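The surviving transitions, summarized as a standalone sketch reconstructed only from the bullets above. The real table is richer and is built with a StateMachineFactory inside LocalizedResource; state and event names here are simplified stand-ins.
{code}
import java.util.Map;

class LocalizedResourceTransitions {
  enum State { INIT, DOWNLOADING, LOCALIZED }
  enum Event { REQUEST, RELEASE, LOCALIZED_DONE }

  // Valid (state, event) pairs after the patch; anything absent is invalid.
  // The bullets above are exactly the entries deleted from this table:
  // (INIT, RELEASE), (INIT, LOCALIZED_DONE) and (LOCALIZED, LOCALIZED_DONE).
  static final Map<State, Map<Event, State>> VALID = Map.of(
      State.INIT, Map.of(Event.REQUEST, State.DOWNLOADING),
      State.DOWNLOADING, Map.of(
          Event.REQUEST, State.DOWNLOADING,      // just enqueue the container
          Event.RELEASE, State.DOWNLOADING,      // just drop the container ref
          Event.LOCALIZED_DONE, State.LOCALIZED),
      State.LOCALIZED, Map.of(
          Event.REQUEST, State.LOCALIZED,
          Event.RELEASE, State.LOCALIZED));
}
{code}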
[jira] [Updated] (YARN-563) Add application type to ApplicationReport
[ https://issues.apache.org/jira/browse/YARN-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-563: Issue Type: Sub-task (was: Improvement) Parent: YARN-386 > Add application type to ApplicationReport > -- > > Key: YARN-563 > URL: https://issues.apache.org/jira/browse/YARN-563 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Thomas Weise > > This field is needed to distinguish different types of applications (app > master implementations). For example, we may run applications of type XYZ in > a cluster alongside MR and would like to filter applications by type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629070#comment-13629070 ] Alejandro Abdelnur commented on YARN-45: sounds good > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628950#comment-13628950 ] Carlo Curino commented on YARN-45: -- Agreed on a single message, where the semantics is:
1) if both the Set and the ResourceRequest are specified, then it is as stated (they overlap, and you have to give me back at least the resources I ask for, otherwise these containers are at risk of being killed)
2) if only the Set is specified, it is the stricter semantics of "I want these containers back and nothing else"
3) if only the ResourceRequest is specified, the semantics is "please give me back this many resources" without binding which containers are at risk (this might be good for policies that do not want to think about containers unless it is really time to kill them).
Does this work for you? It seems to capture the combination of what we have proposed so far. > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
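A sketch of how the three cases could live in one message, under the assumption that containers are identified by id strings and the resource ask is reduced to a single memory quantity; none of these names come from an actual patch, and the real protocol would carry a ResourceRequest rather than a megabyte count.
{code}
import java.util.Set;

// Sketch of the single preemption message discussed above; fields are illustrative.
record PreemptionMessageSketch(Set<String> containers, Integer requestedMemMb) {

  // Case 1: both set -> free at least this much; these containers are at risk.
  // Case 2: only containers -> strict "I want exactly these back".
  // Case 3: only the ask -> free this much from any containers you choose.
  String semantics() {
    boolean hasContainers = containers != null && !containers.isEmpty();
    boolean hasAsk = requestedMemMb != null;
    if (hasContainers && hasAsk) {
      return "release >= " + requestedMemMb + "MB, else " + containers + " may be killed";
    } else if (hasContainers) {
      return "release exactly " + containers;
    } else if (hasAsk) {
      return "release >= " + requestedMemMb + "MB from any containers";
    }
    return "no-op";
  }
}
{code}
One message with optional fields keeps the AM protocol simple while still letting container-agnostic policies use case 3 alone.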
[jira] [Commented] (YARN-539) LocalizedResources are leaked in memory in case resource localization fails
[ https://issues.apache.org/jira/browse/YARN-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628945#comment-13628945 ] Hudson commented on YARN-539: - Integrated in Hadoop-Mapreduce-trunk #1396 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1396/]) YARN-539. Addressed memory leak of LocalResource objects NM when a resource localization fails. Contributed by Omkar Vinit Joshi. (Revision 1466756) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466756 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceFailedLocalizationEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java > LocalizedResources are leaked in memory in case resource localization fails > --- > > Key: YARN-539 > URL: https://issues.apache.org/jira/browse/YARN-539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Fix For: 2.0.5-beta > > Attachments: yarn-539-20130410.1.patch, yarn-539-20130410.2.patch, > yarn-539-20130410.patch > > > If resource localization fails then resource remains in memory and is > 1) Either cleaned up when next time cache cleanup runs and there is space > crunch. (If sufficient space in cache is available then it will remain in > memory). > 2) reused if LocalizationRequest comes again for the same resource. > I think when resource localization fails then that event should be sent to > LocalResourceTracker which will then remove it from its cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-487) TestDiskFailures fails on Windows due to path mishandling
[ https://issues.apache.org/jira/browse/YARN-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628949#comment-13628949 ] Hudson commented on YARN-487: - Integrated in Hadoop-Mapreduce-trunk #1396 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1396/]) YARN-487. Modify path manipulation in LocalDirsHandlerService to let TestDiskFailures pass on Windows. Contributed by Chris Nauroth. (Revision 1466746) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466746 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java > TestDiskFailures fails on Windows due to path mishandling > - > > Key: YARN-487 > URL: https://issues.apache.org/jira/browse/YARN-487 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 3.0.0 > > Attachments: YARN-487.1.patch > > > {{TestDiskFailures#testDirFailuresOnStartup}} fails due to insertion of an > extra leading '/' on the path within {{LocalDirsHandlerService}} when running > on Windows. The test assertions also fail to account for the fact that > {{Path}} normalizes '\' to '/'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
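The issue description pins the failure on string-based path construction. The snippet below illustrates that class of bug (it is not the actual patch): unconditionally prefixing '/' produces a malformed path on Windows, and test assertions must compare normalized forms because {{Path}} normalizes '\' to '/'. The directory name is made up for the example.
{code:java}
// Illustration of the Windows path pitfall described above; not the patch.
import org.apache.hadoop.fs.Path;

public class WindowsPathExample {
  public static void main(String[] args) {
    String localDir = "C:/hadoop/nm-local-dir"; // hypothetical NM local dir

    // Bug pattern: an unconditional '/' prefix yields "/C:/hadoop/nm-local-dir",
    // i.e., an extra leading slash on a Windows drive-qualified path.
    Path broken = new Path("/" + localDir);

    // Safer: hand the raw directory to Path and let it normalize separators
    // ('\' becomes '/'), which assertions must also account for.
    Path ok = new Path(localDir);

    System.out.println(broken + " vs " + ok);
  }
}
{code}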
[jira] [Commented] (YARN-495) Change NM behavior of reboot to resync
[ https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628944#comment-13628944 ] Hudson commented on YARN-495: - Integrated in Hadoop-Mapreduce-trunk #1396 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1396/]) YARN-495. Changed NM reboot behaviour to be a simple resync - kill all containers and re-register with RM. Contributed by Jian He. (Revision 1466752) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466752 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/NodeAction.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManagerEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerReboot.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java > Change NM behavior of reboot to resync > -- > > Key: YARN-495 > URL: https://issues.apache.org/jira/browse/YARN-495 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Fix For: 2.0.5-beta > > Attachments: YARN-495.1.patch, YARN-495.2.patch, YARN-495.3.patch, > YARN-495.4.patch, YARN-495.5.patch, YARN-495.6.patch > > > When a reboot command is sent from the RM, the node manager doesn't clean up the > containers while it's stopping. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
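Per the commit message above, YARN-495 replaces the NM "reboot" with a resync: kill all containers, then re-register with the RM. The following is a hypothetical condensation of that control flow; the real logic lives in NodeStatusUpdaterImpl and NodeManager with different signatures and event plumbing, and the enum here merely mirrors the NodeAction record touched by the patch.
{code:java}
// Hypothetical condensation of the resync flow; not the patch's actual code.
enum NodeAction { NORMAL, RESYNC, SHUTDOWN }

class NodeManagerSketch {
  void onHeartbeatResponse(NodeAction action) {
    switch (action) {
      case RESYNC:
        // Instead of a full reboot: kill all running containers, then
        // re-register with the ResourceManager and resume heartbeats.
        killAllContainers();
        registerWithRM();
        break;
      case SHUTDOWN:
        stop();
        break;
      default:
        break; // NORMAL: nothing to do
    }
  }

  void killAllContainers() { /* clean up ContainerManager state */ }
  void registerWithRM()    { /* ResourceTracker.registerNodeManager(...) */ }
  void stop()              { /* service shutdown */ }
}
{code}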
[jira] [Updated] (YARN-568) FairScheduler: support for work-preserving preemption
[ https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-568: -- Assignee: Carlo Curino > FairScheduler: support for work-preserving preemption > -- > > Key: YARN-568 > URL: https://issues.apache.org/jira/browse/YARN-568 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: fair.patch > > > In the attached patch, we modified the FairScheduler to substitute its > preemption-by-killing with a work-preserving version of preemption (followed > by killing if the AMs do not respond quickly enough). This should allow us to > run preemption checking more often but kill less often (proper tuning to be > investigated). Depends on YARN-567 and YARN-45; related to YARN-569. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-569: -- Assignee: Carlo Curino > CapacityScheduler: support for preemption (using a capacity monitor) > > > Key: YARN-569 > URL: https://issues.apache.org/jira/browse/YARN-569 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: 3queues.pdf, capacity.patch, > CapScheduler_with_preemption.pdf > > > There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. > To this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a "Capacity Monitor", which can optionally be run as a separate service (much like the NMLivelinessMonitor). > The capacity monitor (similar to equivalent functionality in the fair scheduler) runs at intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine whether preemption is needed and how best to "edit" the current schedule to improve capacity, and generates events that produce four possible actions: > # Container de-reservations > # Resource-based preemptions > # Container-based preemptions > # Container killing > The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. > Note that, due to the "lag" in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations. > - Preemption policy (ProportionalCapacityPreemptionPolicy): > - > Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: > # it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*) > # if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**) > # it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round) > # it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order) > # it removes reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exist > # (if not enough) it issues preemptions for containers from the same application (reverse chronological order, last assigned container first), again until the target is met or until no containers except the AM container are left > # (if not enough) it moves on to unreserve and preempt from the next application > # containers that have been asked to preempt are tracked across executions. > If a container remains among those to be preempted for more than a certain time, it is moved into the list of containers to be forcibly killed. > Notes: > (*) at the moment, in order to avoid double-counting of requests, we only look at the "ANY" part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not ANY. > (**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among the queues that want some as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue, and the function runs to a fixed point. > Tunables of the ProportionalCapacityPreemptionPolicy: > # observe-only mode (i.e., log the actions it would take, but behave as read-only) > # how frequently to run the policy > # how long to wait between preemption and kill of a container > # which fraction of the containers I would like to obtain should I preempt (this has to do with the natural rate at which containers are returned) > # deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect balance by some small %, we ignore it) > # overall amount of preemption we can afford for each run of the policy (in terms of total cluster capacity)
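To make the numbered policy steps above concrete, here is a compressed, hypothetical rendering of one policy round in Java. Every name here (QueueSnapshot, PreemptionRound, the helper methods) is invented for illustration; the real ProportionalCapacityPreemptionPolicy in the attached patch operates on scheduler internals and events not shown, and the ideal-assignment computation is reduced to a placeholder.
{code:java}
// Compressed, hypothetical sketch of one preemption round; not the patch.
import java.util.*;

class QueueSnapshot {
  String name;
  double used, guaranteed, pending;            // step 1: state from the scheduler
  Deque<String> appsFifo = new ArrayDeque<>(); // applications in FIFO order
}

class PreemptionRound {
  void run(List<QueueSnapshot> queues, double maxPreemptPerRound) {
    Map<String, Double> ideal = computeIdealAssignment(queues);  // step 2
    for (QueueSnapshot q : queues) {
      double toReclaim = Math.min(q.used - ideal.get(q.name),
                                  maxPreemptPerRound);           // step 3: bounded
      while (toReclaim > 0 && !q.appsFifo.isEmpty()) {
        String app = q.appsFifo.peekLast();                      // step 4: last in FIFO
        toReclaim -= unreserve(app, toReclaim);                  // step 5
        toReclaim -= preemptYoungestContainers(app, toReclaim);  // step 6
        if (toReclaim > 0) q.appsFifo.pollLast();                // step 7: next app
      }
    }
    // step 8: containers asked to preempt are tracked across rounds and
    // force-killed if they persist beyond a configured timeout.
  }

  Map<String, Double> computeIdealAssignment(List<QueueSnapshot> qs) {
    // Weighted fair redistribution of spare capacity (see note ** above).
    Map<String, Double> m = new HashMap<>();
    for (QueueSnapshot q : qs) m.put(q.name, q.guaranteed);
    return m; // placeholder: the real version iterates to a fixed point
  }

  double unreserve(String app, double need) { return 0; }                 // stub
  double preemptYoungestContainers(String app, double need) { return 0; } // stub
}
{code}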
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628938#comment-13628938 ] Alejandro Abdelnur commented on YARN-45: I'm just trying to see if we can have (at least for now) a single message type, instead of two, that satisfies the use cases. Regarding keeping the tighter semantics: if it is not difficult/complex, I'm OK with it. Thanks. > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628937#comment-13628937 ] Carlo Curino commented on YARN-569: --- - Comments on the attached graphs -- The attached graph highlights the need for preemption by means of an example designed to highlight it. We run 2 sort jobs over 128GB of data on a 10-node cluster, starting the first job in queue B (20% guaranteed capacity) and the second job 400 sec later in queue A (80% guaranteed capacity). We compare three scenarios: # Default CapacityScheduler with A and B having maximum capacity set to 100%: cluster utilization is high, and B runs fast since it can use the entire cluster when A is not around, but A needs to wait very long (almost 20 min) before obtaining all of its guaranteed capacity (and over 250 secs to get any container besides the AM). # Default CapacityScheduler with A and B having maximum capacity set to 80% and 20% respectively: A obtains its guaranteed resources immediately, but cluster utilization is very low, and jobs in B take over 2X longer since they cannot use spare overcapacity. # CapacityScheduler + preemption: A and B are configured as in 1), but we preempt containers. We obtain both high utilization and short runtimes for B (comparable to scenario 1), and prompt resources to A (within 30 sec). (An illustrative configuration for this two-queue setup is sketched below.) The second attached graph shows a scenario with 3 queues A, B, C with 40%, 20%, 40% guaranteed capacity. We show more "internals" of the policy by plotting instantaneous resource utilization as above, total pending requests, guaranteed capacity, ideal assignment of memory, ideal preemption, and actual preemption. Things to note: # The idealized memory assignment and instantaneous resource utilization are very close to each other, i.e., the combination of CapacityScheduler+Preemption tightly follows the ideal distribution of resources # When only one job is running it gets 100% of the cluster; when B and A are running they get 33% and 66% respectively (which is a fair overcapacity assignment from their 20% and 40% guaranteed capacities); when all three jobs are running (and they want at least their capacity's worth of resources) they obtain their guaranteed capacity. # actual preemption is a fraction of ideal preemption; this is because we account for natural completion of tasks (with a configurable parameter) # in this experiment we do not bound the total amount of preemption per round (i.e., the parameter is set to 1.0) > CapacityScheduler: support for preemption (using a capacity monitor) > > > Key: YARN-569 > URL: https://issues.apache.org/jira/browse/YARN-569 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Carlo Curino > Attachments: 3queues.pdf, capacity.patch, > CapScheduler_with_preemption.pdf > > > There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. To this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a "Capacity Monitor", which can optionally be run as a separate service (much like the > NMLivelinessMonitor). 
> The capacity monitor (similar to equivalent functionality in the fair scheduler) runs at intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine whether preemption is needed and how best to "edit" the current schedule to improve capacity, and generates events that produce four possible actions: > # Container de-reservations > # Resource-based preemptions > # Container-based preemptions > # Container killing > The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. > Note that, due to the "lag" in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations. > - Preemption policy (ProportionalCapacityPreemptionPolicy): > - > Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: > # it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*)
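For reference, the two-queue setup used in the first experiment (80%/20% guaranteed capacity, with and without the maximum-capacity caps) can be expressed with the CapacityScheduler's usual configuration keys. The snippet below is illustrative only; the values mirror the scenarios described in the comment, not a configuration shipped with the patch.
{code:java}
// Illustrative queue setup for the two-queue experiment, expressed via
// Hadoop's Configuration API; the yarn.scheduler.capacity.* keys follow the
// CapacityScheduler's usual property naming.
import org.apache.hadoop.conf.Configuration;

public class TwoQueueSetup {
  public static Configuration build() {
    Configuration conf = new Configuration();
    conf.set("yarn.scheduler.capacity.root.queues", "A,B");
    conf.set("yarn.scheduler.capacity.root.A.capacity", "80"); // guaranteed 80%
    conf.set("yarn.scheduler.capacity.root.B.capacity", "20"); // guaranteed 20%
    // Scenarios 1 and 3: queues may grow into spare capacity up to 100%.
    // Scenario 2 would instead cap A at 80 and B at 20.
    conf.set("yarn.scheduler.capacity.root.A.maximum-capacity", "100");
    conf.set("yarn.scheduler.capacity.root.B.maximum-capacity", "100");
    return conf;
  }
}
{code}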
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-569: -- Attachment: capacity.patch > CapacityScheduler: support for preemption (using a capacity monitor) > > > Key: YARN-569 > URL: https://issues.apache.org/jira/browse/YARN-569 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Carlo Curino > Attachments: 3queues.pdf, capacity.patch, > CapScheduler_with_preemption.pdf > > > There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. To this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a "Capacity Monitor", which can optionally be run as a separate service (much like the NMLivelinessMonitor). > The capacity monitor (similar to equivalent functionality in the fair scheduler) runs at intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine whether preemption is needed and how best to "edit" the current schedule to improve capacity, and generates events that produce four possible actions: > # Container de-reservations > # Resource-based preemptions > # Container-based preemptions > # Container killing > The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. > Note that, due to the "lag" in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations. > - Preemption policy (ProportionalCapacityPreemptionPolicy): > - > Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: > # it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*) > # if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**) > # it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round) > # it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order) > # it removes reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exist > # (if not enough) it issues preemptions for containers from the same application (reverse chronological order, last assigned container first), again until the target is met or until no containers except the AM container are left > # (if not enough) it moves on to unreserve and preempt from the next application > # containers that have been asked to preempt are tracked across executions. 
> If a container remains among those to be preempted for more than a certain time, it is moved into the list of containers to be forcibly killed. > Notes: > (*) at the moment, in order to avoid double-counting of requests, we only look at the "ANY" part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not ANY. > (**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among the queues that want some as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue, and the function runs to a fixed point. > Tunables of the ProportionalCapacityPreemptionPolicy: > # observe-only mode (i.e., log the actions it would take, but behave as read-only) > # how frequently to run the policy > # how long to wait between preemption and kill of a container > # which fraction of the containers I would like to obtain should I preempt (this has to do with the natural rate at which containers are returned) > # deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect balance by some small %, we ignore it) > # overall amount of preemption we can afford for each run of the policy (in terms of total cluster capacity)
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-569: -- Attachment: 3queues.pdf CapScheduler_with_preemption.pdf > CapacityScheduler: support for preemption (using a capacity monitor) > > > Key: YARN-569 > URL: https://issues.apache.org/jira/browse/YARN-569 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Carlo Curino > Attachments: 3queues.pdf, capacity.patch, > CapScheduler_with_preemption.pdf > > > There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. To this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a "Capacity Monitor", which can optionally be run as a separate service (much like the NMLivelinessMonitor). > The capacity monitor (similar to equivalent functionality in the fair scheduler) runs at intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine whether preemption is needed and how best to "edit" the current schedule to improve capacity, and generates events that produce four possible actions: > # Container de-reservations > # Resource-based preemptions > # Container-based preemptions > # Container killing > The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. > Note that, due to the "lag" in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations. > - Preemption policy (ProportionalCapacityPreemptionPolicy): > - > Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: > # it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*) > # if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**) > # it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round) > # it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order) > # it removes reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exist > # (if not enough) it issues preemptions for containers from the same application (reverse chronological order, last assigned container first), again until the target is met or until no containers except the AM container are left > # (if not enough) it moves on to unreserve and preempt from the next application > # containers that have been asked to preempt are tracked across executions. 
> If a container remains among those to be preempted for more than a certain time, it is moved into the list of containers to be forcibly killed. > Notes: > (*) at the moment, in order to avoid double-counting of requests, we only look at the "ANY" part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not ANY. > (**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among the queues that want some as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue, and the function runs to a fixed point. > Tunables of the ProportionalCapacityPreemptionPolicy: > # observe-only mode (i.e., log the actions it would take, but behave as read-only) > # how frequently to run the policy > # how long to wait between preemption and kill of a container > # which fraction of the containers I would like to obtain should I preempt (this has to do with the natural rate at which containers are returned) > # deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect balance by some small %, we ignore it) > # overall amount of preemption we can afford for each run of the policy (in terms of total cluster capacity)
[jira] [Updated] (YARN-568) FairScheduler: support for work-preserving preemption
[ https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-568: -- Attachment: fair.patch > FairScheduler: support for work-preserving preemption > -- > > Key: YARN-568 > URL: https://issues.apache.org/jira/browse/YARN-568 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Carlo Curino > Attachments: fair.patch > > > In the attached patch, we modified the FairScheduler to substitute its > preemption-by-killing with a work-preserving version of preemption (followed > by killing if the AMs do not respond quickly enough). This should allow us to > run preemption checking more often but kill less often (proper tuning to be > investigated). Depends on YARN-567 and YARN-45; related to YARN-569. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
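The work-preserving behavior described above (ask first, kill only if the AM does not respond quickly enough) boils down to a two-phase scheme. The sketch below is a hypothetical illustration of that scheme; all class and method names are invented, and the actual patch wires this into the FairScheduler's preemption check rather than a standalone class.
{code:java}
// Hypothetical two-phase preemptor: request first, kill after a grace period.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WorkPreservingPreemptor {
  private final long killGraceMs;
  private final Map<String, Long> askedAt = new ConcurrentHashMap<>();

  WorkPreservingPreemptor(long killGraceMs) { this.killGraceMs = killGraceMs; }

  // Phase 1: record the request and notify the AM (e.g., via the allocate
  // response), giving it a chance to checkpoint and release the container.
  void requestPreemption(String containerId) {
    askedAt.putIfAbsent(containerId, System.currentTimeMillis());
    notifyAM(containerId);
  }

  // Phase 2: on each scheduling round, kill containers whose grace expired.
  void enforce(long now) {
    askedAt.forEach((cid, t) -> {
      if (now - t > killGraceMs) {
        kill(cid);
        askedAt.remove(cid);
      }
    });
  }

  void notifyAM(String containerId) { /* surface in the AM heartbeat */ }
  void kill(String containerId)     { /* fall back to killing */ }
}
{code}
Because preemption requests are now cheap for well-behaved AMs, the checker can run more aggressively while killing remains a last resort, which is exactly the trade-off the description calls out for tuning.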
[jira] [Updated] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-567: -- Attachment: common.patch > RM changes to support preemption for FairScheduler and CapacityScheduler > > > Key: YARN-567 > URL: https://issues.apache.org/jira/browse/YARN-567 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: common.patch > > > A common tradeoff in scheduling jobs is between keeping the cluster busy and > enforcing capacity/fairness properties; the FairScheduler and the CapacityScheduler > take opposite stances on how to achieve this. > The FairScheduler leverages task-killing to quickly reclaim resources from > currently running jobs and redistribute them among new jobs, thus keeping > the cluster busy but wasting useful work. The CapacityScheduler is typically > tuned > to limit the portion of the cluster used by each queue so that the likelihood > of violating capacity is low, thus never wasting work but risking keeping > the cluster underutilized or having jobs wait to obtain their rightful > capacity. > By introducing the notion of work-preserving preemption we can remove this > tradeoff. This requires a protocol for preemption (YARN-45), and > ApplicationMasters that can respond to preemption efficiently (e.g., by > saving their intermediate state; this will be posted for MapReduce in a > separate JIRA soon), together with a scheduler that can issue preemption > requests (discussed in the separate JIRAs YARN-568 and YARN-569). > The changes we track with this JIRA are common to the FairScheduler and > CapacityScheduler, and are mostly the propagation of preemption decisions through > the ApplicationMasterService. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
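As the description says, the common RM-side change is mostly propagating preemption decisions to AMs through the ApplicationMasterService. One plausible shape for such a payload, combining the strict and flexible semantics debated in YARN-45, is sketched below; these are not the patch's actual classes, just an illustration of what an allocate response could carry.
{code:java}
// Hypothetical payload shape for preemption feedback to an AM; not the patch.
import java.util.List;

class PreemptionMessageSketch {
  // Strict semantics: exactly these containers will be reclaimed, so the AM
  // should checkpoint and vacate them specifically.
  List<String> strictContainerIds;

  // Flexible semantics: the AM may choose which containers to give back, as
  // long as the described resources are freed before the deadline.
  List<ResourceAsk> negotiable;

  static class ResourceAsk {
    int memoryMb;   // amount of memory to release
    int containers; // number of containers of this size
  }
}
{code}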
[jira] [Updated] (YARN-568) FairScheduler: support for work-preserving preemption
[ https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-568: -- Description: In the attached patch, we modified the FairScheduler to substitute its preemption-by-killing with a work-preserving version of preemption (followed by killing if the AMs do not respond quickly enough). This should allow us to run preemption checking more often but kill less often (proper tuning to be investigated). Depends on YARN-567 and YARN-45; related to YARN-569. was: In the attached patch, we modified the FairScheduler to substitute its preemption-by-killing with a work-preserving version of preemption (followed by killing if the AMs do not respond quickly enough). This should allow us to run preemption checking more often but kill less often (proper tuning to be investigated). Depends on YARN-567 and YARN-45. > FairScheduler: support for work-preserving preemption > -- > > Key: YARN-568 > URL: https://issues.apache.org/jira/browse/YARN-568 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Carlo Curino > > In the attached patch, we modified the FairScheduler to substitute its > preemption-by-killing with a work-preserving version of preemption (followed > by killing if the AMs do not respond quickly enough). This should allow us to > run preemption checking more often but kill less often (proper tuning to be > investigated). Depends on YARN-567 and YARN-45; related to YARN-569. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-569: -- Description: There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. To this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a "Capacity Monitor", which can optionally be run as a separate service (much like the NMLivelinessMonitor). The capacity monitor (similar to equivalent functionality in the fair scheduler) runs at intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine whether preemption is needed and how best to "edit" the current schedule to improve capacity, and generates events that produce four possible actions: # Container de-reservations # Resource-based preemptions # Container-based preemptions # Container killing The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. Note that, due to the "lag" in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations. - Preemption policy (ProportionalCapacityPreemptionPolicy): - Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: # it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*) # if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**) # it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round) # it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order) # it removes reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exist # (if not enough) it issues preemptions for containers from the same application (reverse chronological order, last assigned container first), again until the target is met or until no containers except the AM container are left # (if not enough) it moves on to unreserve and preempt from the next application. # containers that have been asked to preempt are tracked across executions. If a container remains among those to be preempted for more than a certain time, it is moved into the list of containers to be forcibly killed. Notes: (*) at the moment, in order to avoid double-counting of requests, we only look at the "ANY" part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not ANY. 
(**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among the queues that want some as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue, and the function runs to a fixed point. Tunables of the ProportionalCapacityPreemptionPolicy: # observe-only mode (i.e., log the actions it would take, but behave as read-only) # how frequently to run the policy # how long to wait between preemption and kill of a container # which fraction of the containers I would like to obtain should I preempt (this has to do with the natural rate at which containers are returned) # deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect balance by some small %, we ignore it) # overall amount of preemption we can afford for each run of the policy (in terms of total cluster capacity) In our current experiments this set of tunables seems to be a good start for shaping the preemption action properly. More sophisticated preemption policies could take into account different types of applications running, job priorities, cost of preemption, integral of capacity imbalance. This is very much a control-theory kind of problem, and some of the lessons on designing and tuning controllers are likely to apply. Generality: The monitor-based scheduler edit and the preemption mechanisms we introduced here are designed to be more general than enforcing capacity/fairness, i
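The tunables enumerated above map naturally onto configuration knobs. The sketch below shows one possible encoding using Hadoop's Configuration API; the key names and default values are invented for illustration and are not taken from the patch.
{code:java}
// One possible encoding of the policy tunables; names/defaults are invented.
import org.apache.hadoop.conf.Configuration;

class PreemptionPolicyConfig {
  final boolean observeOnly;       // log actions but behave as read-only
  final long monitorIntervalMs;    // how frequently to run the policy
  final long waitBeforeKillMs;     // grace between preemption and kill
  final float naturalTermination;  // fraction expected to complete anyway
  final float deadzone;            // % of over-capacity to ignore
  final float maxPerRound;         // bound on preemption per policy run

  PreemptionPolicyConfig(Configuration conf) {
    observeOnly        = conf.getBoolean("preemption.observe-only", false);
    monitorIntervalMs  = conf.getLong("preemption.interval-ms", 3000L);
    waitBeforeKillMs   = conf.getLong("preemption.wait-before-kill-ms", 15000L);
    naturalTermination = conf.getFloat("preemption.natural-termination-factor", 0.2f);
    deadzone           = conf.getFloat("preemption.deadzone", 0.05f);
    maxPerRound        = conf.getFloat("preemption.max-per-round", 0.1f);
  }
}
{code}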
[jira] [Updated] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-567: -- Description: A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties; the FairScheduler and the CapacityScheduler take opposite stances on how to achieve this. The FairScheduler leverages task-killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy but wasting useful work. The CapacityScheduler is typically tuned to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work but risking keeping the cluster underutilized or having jobs wait to obtain their rightful capacity. By introducing the notion of work-preserving preemption we can remove this tradeoff. This requires a protocol for preemption (YARN-45), and ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), together with a scheduler that can issue preemption requests (discussed in the separate JIRAs YARN-568 and YARN-569). The changes we track with this JIRA are common to the FairScheduler and CapacityScheduler, and are mostly the propagation of preemption decisions through the ApplicationMasterService. was: A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties; the FairScheduler and the CapacityScheduler take opposite stances on how to achieve this. The FairScheduler leverages task-killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy but wasting useful work. The CapacityScheduler is typically tuned to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work but risking keeping the cluster underutilized or having jobs wait to obtain their rightful capacity. By introducing the notion of work-preserving preemption we can remove this tradeoff. This requires a protocol for preemption (YARN-45), and ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), together with a scheduler that can issue preemption requests (discussed in separate JIRAs). The changes we track with this JIRA are common to the FairScheduler and CapacityScheduler, and are mostly the propagation of preemption decisions through the ApplicationMasterService. > RM changes to support preemption for FairScheduler and CapacityScheduler > > > Key: YARN-567 > URL: https://issues.apache.org/jira/browse/YARN-567 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > > A common tradeoff in scheduling jobs is between keeping the cluster busy and > enforcing capacity/fairness properties; the FairScheduler and the CapacityScheduler > take opposite stances on how to achieve this. > The FairScheduler leverages task-killing to quickly reclaim resources from > currently running jobs and redistribute them among new jobs, thus keeping > the cluster busy but wasting useful work. 
The CapacityScheduler is typically > tuned > to limit the portion of the cluster used by each queue so that the likelihood > of violating capacity is low, thus never wasting work but risking keeping > the cluster underutilized or having jobs wait to obtain their rightful > capacity. > By introducing the notion of work-preserving preemption we can remove this > tradeoff. This requires a protocol for preemption (YARN-45), and > ApplicationMasters that can respond to preemption efficiently (e.g., by > saving their intermediate state; this will be posted for MapReduce in a > separate JIRA soon), together with a scheduler that can issue preemption > requests (discussed in the separate JIRAs YARN-568 and YARN-569). > The changes we track with this JIRA are common to the FairScheduler and > CapacityScheduler, and are mostly the propagation of preemption decisions through > the ApplicationMasterService. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
Carlo Curino created YARN-569: - Summary: CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Carlo Curino There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. To this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a "Capacity Monitor", which can optionally be run as a separate service (much like the NMLivelinessMonitor). The capacity monitor (similar to equivalent functionality in the fair scheduler) runs at intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine whether preemption is needed and how best to "edit" the current schedule to improve capacity, and generates events that produce four possible actions: # Container de-reservations # Resource-based preemptions # Container-based preemptions # Container killing The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. Note that, due to the "lag" in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations. - Preemption policy (ProportionalCapacityPreemptionPolicy): - Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: # it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*) # if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**) # it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round) # it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order) # it removes reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exist # (if not enough) it issues preemptions for containers from the same application (reverse chronological order, last assigned container first), again until the target is met or until no containers except the AM container are left # (if not enough) it moves on to unreserve and preempt from the next application. # containers that have been asked to preempt are tracked across executions. If a container remains among those to be preempted for more than a certain time, it is moved into the list of containers to be forcibly killed. Notes: (*) at the moment, in order to avoid double-counting of requests, we only look at the "ANY" part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not ANY. 
(**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among the queues that want some as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue, and the function runs to a fixed point. Tunables of the ProportionalCapacityPreemptionPolicy: # observe-only mode (i.e., log the actions it would take, but behave as read-only) # how frequently to run the policy # how long to wait between preemption and kill of a container # which fraction of the containers I would like to obtain should I preempt (this has to do with the natural rate at which containers are returned) # deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect balance by some small %, we ignore it) # overall amount of preemption we can afford for each run of the policy (in terms of total cluster capacity) In our current experiments this set of tunables seems to be a good start for shaping the preemption action properly. More sophisticated preemption policies could take into account different types of applications running, job priorities, cost of preemption, integral of capacity imbalance. This is very much a control-theory kind of problem, and some of the lessons on designing and tuning controllers are likely to apply.
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628930#comment-13628930 ] Carlo Curino commented on YARN-45: -- Sorry, I read only your last comment and answered that... Regarding your previous "larger" comment: - what you propose is somewhat of a combination of 1 and 2 above, where we give the AM a hint about what would happen at the container level if the pressure remains. I don't have strong feelings about it; I agree it is easy to do, and maybe it is a good compromise. - however, I want to be able to maintain the tighter semantics of 1 (in case the ResourceRequest is not specified in the message), which forces the AM to preempt exactly the set of containers I am specifying (though with a very "targeted" ResourceRequest you can in practice do something similar). This covers use cases like the one I mentioned above. We are posting more code in YARN-567, YARN-568 and YARN-569; check it out, it might provide context for this conversation. > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-568) FairScheduler: support for work-preserving preemption
Carlo Curino created YARN-568: - Summary: FairScheduler: support for work-preserving preemption Key: YARN-568 URL: https://issues.apache.org/jira/browse/YARN-568 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Carlo Curino In the attached patch, we modified the FairScheduler to substitute its preemption-by-killing with a work-preserving version of preemption (followed by killing if the AMs do not respond quickly enough). This should allow us to run preemption checking more often but kill less often (proper tuning to be investigated). Depends on YARN-567 and YARN-45. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628922#comment-13628922 ] Carlo Curino commented on YARN-45: -- Our main focus for now is to rebalance capacity; in this sense, yes, location is not important. However, one can envision the use of preemption also for other things, e.g., to build a monitor that tries to improve data-locality by issuing (a moderate amount of) "relocations" of containers (probably riding the same checkpointing mechanics we are building for MR). This is another case where container-based preemption can turn out to be useful. (This is at the moment just speculation.) > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler
Carlo Curino created YARN-567: - Summary: RM changes to support preemption for FairScheduler and CapacityScheduler Key: YARN-567 URL: https://issues.apache.org/jira/browse/YARN-567 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties; the FairScheduler and the CapacityScheduler take opposite stances on how to achieve this. The FairScheduler leverages task-killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy but wasting useful work. The CapacityScheduler is typically tuned to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work but risking keeping the cluster underutilized or having jobs wait to obtain their rightful capacity. By introducing the notion of work-preserving preemption we can remove this tradeoff. This requires a protocol for preemption (YARN-45), and ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), together with a scheduler that can issue preemption requests (discussed in separate JIRAs). The changes we track with this JIRA are common to the FairScheduler and CapacityScheduler, and are mostly the propagation of preemption decisions through the ApplicationMasterService. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-487) TestDiskFailures fails on Windows due to path mishandling
[ https://issues.apache.org/jira/browse/YARN-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628902#comment-13628902 ] Hudson commented on YARN-487: - Integrated in Hadoop-Hdfs-trunk #1369 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1369/]) YARN-487. Modify path manipulation in LocalDirsHandlerService to let TestDiskFailures pass on Windows. Contributed by Chris Nauroth. (Revision 1466746) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466746 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java > TestDiskFailures fails on Windows due to path mishandling > - > > Key: YARN-487 > URL: https://issues.apache.org/jira/browse/YARN-487 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 3.0.0 > > Attachments: YARN-487.1.patch > > > {{TestDiskFailures#testDirFailuresOnStartup}} fails due to insertion of an > extra leading '/' on the path within {{LocalDirsHandlerService}} when running > on Windows. The test assertions also fail to account for the fact that > {{Path}} normalizes '\' to '/'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-539) LocalizedResources are leaked in memory in case resource localization fails
[ https://issues.apache.org/jira/browse/YARN-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628898#comment-13628898 ] Hudson commented on YARN-539: - Integrated in Hadoop-Hdfs-trunk #1369 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1369/]) YARN-539. Addressed memory leak of LocalResource objects in the NM when a resource localization fails. Contributed by Omkar Vinit Joshi. (Revision 1466756) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466756 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceFailedLocalizationEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java > LocalizedResources are leaked in memory in case resource localization fails > --- > > Key: YARN-539 > URL: https://issues.apache.org/jira/browse/YARN-539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Fix For: 2.0.5-beta > > Attachments: yarn-539-20130410.1.patch, yarn-539-20130410.2.patch, > yarn-539-20130410.patch > > > If resource localization fails, the resource remains in memory and is > 1) either cleaned up the next time cache cleanup runs and there is a space crunch (if sufficient space is available in the cache, it will remain in memory), or > 2) reused if a LocalizationRequest comes again for the same resource. > I think that when resource localization fails, that event should be sent to the LocalResourcesTracker, which will then remove the resource from its cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-495) Change NM behavior of reboot to resync
[ https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628897#comment-13628897 ] Hudson commented on YARN-495: - Integrated in Hadoop-Hdfs-trunk #1369 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1369/]) YARN-495. Changed NM reboot behaviour to be a simple resync - kill all containers and re-register with RM. Contributed by Jian He. (Revision 1466752) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466752 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/NodeAction.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManagerEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerReboot.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java > Change NM behavior of reboot to resync > -- > > Key: YARN-495 > URL: https://issues.apache.org/jira/browse/YARN-495 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Fix For: 2.0.5-beta > > Attachments: YARN-495.1.patch, YARN-495.2.patch, YARN-495.3.patch, > YARN-495.4.patch, YARN-495.5.patch, YARN-495.6.patch > > > When a reboot command is sent from the RM, the node manager doesn't clean up the > containers while it's stopping. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
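A rough sketch of the resync flow this change describes (NodeAction comes from the commit's file list; the RESYNC value and all method names below are assumptions inferred from the commit message): instead of rebooting the whole process, the NM reacts to a resync action from the RM by killing its containers and re-registering.
{code:java}
// Illustrative only; not the real NM or NodeStatusUpdater signatures.
enum NodeAction { NORMAL, RESYNC, SHUTDOWN }

class NodeManagerResyncSketch {
  void onHeartbeatResponse(NodeAction action) {
    switch (action) {
      case RESYNC:
        cleanupContainers(); // kill all running containers first...
        registerWithRM();    // ...then re-register, instead of a full reboot
        break;
      case SHUTDOWN:
        stop();
        break;
      default:
        break; // NORMAL: keep heartbeating as usual
    }
  }

  void cleanupContainers() { /* signal containers and wait for teardown */ }
  void registerWithRM()    { /* re-run the NM registration handshake */ }
  void stop()              { /* orderly NM shutdown */ }
}
{code}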
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628893#comment-13628893 ] Alejandro Abdelnur commented on YARN-45: Forgot to add: unless I'm missing something, the location of the preemption is not important, just the capacity, right? > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed, or reserved, to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628889#comment-13628889 ] Alejandro Abdelnur commented on YARN-45: Carlo, what about a small twist? A preempt message (instead of request, as there is no preempt response) would contain: * Resources (# of CPUs & amount of memory): the total amount of resources that may be preempted if no action is taken by the AM. * Set: the list of containers that would be killed by the RM to claim those resources if no action is taken by the AM. Computing the resources is straightforward: just aggregate the resources of the Set. An AM can take action using either piece of information. If an AM releases the requested amount of resources, even if they don't match the received container IDs, then the AM will not be over threshold anymore, thus getting rid of the preemption pressure fully or partially. If the AM fulfills the preemption only partially, then the RM will still kill some containers from the set. As the set is not ordered, the AM still does not know exactly which containers will be killed; the set is just the list of containers in danger of being preempted. I may be backtracking a bit on my previous comments: 'trading these containers for equivalent ones' seems acceptable and gives the scheduler some freedom on how to best take care of things if an AM is over limit. If an AM releases the requested amount of resources, regardless of which containers it releases, the AM won't be preempted for this preemption message. We just need to clearly spell out the behavior. With this approach I think we don't need #1 and #2? Thoughts? > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed, or reserved, to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
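The proposed message shape can be sketched with hypothetical type and field names (the comment only fixes the two pieces of content, an aggregate resource amount and an unordered set of at-risk containers; YARN's real records are not spelled out here):
{code:java}
import java.util.Set;

// Hypothetical shapes for the proposal above, not YARN's actual API records.
final class ResourceAmount {
  final int virtualCores;
  final int memoryMb;
  ResourceAmount(int virtualCores, int memoryMb) {
    this.virtualCores = virtualCores;
    this.memoryMb = memoryMb;
  }
}

final class PreemptMessage {
  // Total amount the RM will reclaim if the AM takes no action.
  final ResourceAmount total;
  // Unordered set of containers the RM would kill; which ones actually die
  // is not promised, so this is only a "containers in danger" hint.
  final Set<Long> containersAtRisk;
  PreemptMessage(ResourceAmount total, Set<Long> containersAtRisk) {
    this.total = total;
    this.containersAtRisk = containersAtRisk;
  }
}
{code}
The unordered set is the key design choice: it marks containers in danger without promising which ones die, which is what lets the AM trade them for equivalent ones by releasing any containers adding up to the total.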
[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628868#comment-13628868 ] Hadoop QA commented on YARN-427: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578200/YARN-427-trunk-b.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/716//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/716//console This message is automatically generated. > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-2-a.patch, > YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, > YARN-427-trunk-b.patch, YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628860#comment-13628860 ] Aleksey Gorshkov commented on YARN-427: --- Patches updated: YARN-427-trunk-b.patch for trunk, YARN-427-branch-2-b.patch for branch-2, YARN-427-branch-0.23-b.patch for branch-0.23. > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-2-a.patch, > YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, > YARN-427-trunk-b.patch, YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-427: -- Attachment: YARN-427-trunk-b.patch > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-2-a.patch, > YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, > YARN-427-trunk-b.patch, YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-427: -- Attachment: YARN-427-branch-2-b.patch YARN-427-branch-0.23-b.patch > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-2-a.patch, > YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, > YARN-427-trunk-b.patch, YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628850#comment-13628850 ] Hadoop QA commented on YARN-427: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578196/YARN-427-trunk-b.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/715//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/715//console This message is automatically generated. > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, > YARN-427-trunk-a.patch, YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-427: -- Attachment: (was: YARN-427-branch-2-b.patch) > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, > YARN-427-trunk-a.patch, YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-427: -- Attachment: (was: YARN-427-trunk-b.patch) > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, > YARN-427-trunk-a.patch, YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-427: -- Attachment: YARN-427-trunk-b.patch > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2-b.patch, > YARN-427-branch-2.patch, YARN-427-trunk-a.patch, YARN-427-trunk-b.patch, > YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-427: -- Attachment: YARN-427-branch-2-b.patch > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2-b.patch, > YARN-427-branch-2.patch, YARN-427-trunk-a.patch, YARN-427-trunk-b.patch, > YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-487) TestDiskFailures fails on Windows due to path mishandling
[ https://issues.apache.org/jira/browse/YARN-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628830#comment-13628830 ] Hudson commented on YARN-487: - Integrated in Hadoop-Yarn-trunk #180 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/180/]) YARN-487. Modify path manipulation in LocalDirsHandlerService to let TestDiskFailures pass on Windows. Contributed by Chris Nauroth. (Revision 1466746) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466746 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java > TestDiskFailures fails on Windows due to path mishandling > - > > Key: YARN-487 > URL: https://issues.apache.org/jira/browse/YARN-487 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 3.0.0 > > Attachments: YARN-487.1.patch > > > {{TestDiskFailures#testDirFailuresOnStartup}} fails due to insertion of an > extra leading '/' on the path within {{LocalDirsHandlerService}} when running > on Windows. The test assertions also fail to account for the fact that > {{Path}} normalizes '\' to '/'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
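The two pitfalls the report describes can be demonstrated with plain strings (a hedged sketch, not the actual LocalDirsHandlerService or org.apache.hadoop.fs.Path code):
{code:java}
public class WindowsPathPitfalls {
  public static void main(String[] args) {
    String localDir = "C:\\yarn\\local";

    // Pitfall 1: blindly prefixing "/" yields "/C:\yarn\local" on Windows,
    // the kind of extra leading '/' the report attributes to the path
    // manipulation in LocalDirsHandlerService.
    String naive = "/" + localDir;

    // Pitfall 2: per the report, Path normalizes '\' to '/', so a test
    // asserting against the raw backslash string fails; assertions must
    // compare against the normalized form instead.
    String normalized = localDir.replace('\\', '/');

    System.out.println(naive);       // /C:\yarn\local
    System.out.println(normalized);  // C:/yarn/local
  }
}
{code}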
[jira] [Commented] (YARN-539) LocalizedResources are leaked in memory in case resource localization fails
[ https://issues.apache.org/jira/browse/YARN-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628826#comment-13628826 ] Hudson commented on YARN-539: - Integrated in Hadoop-Yarn-trunk #180 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/180/]) YARN-539. Addressed memory leak of LocalResource objects in the NM when a resource localization fails. Contributed by Omkar Vinit Joshi. (Revision 1466756) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466756 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceFailedLocalizationEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java > LocalizedResources are leaked in memory in case resource localization fails > --- > > Key: YARN-539 > URL: https://issues.apache.org/jira/browse/YARN-539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Fix For: 2.0.5-beta > > Attachments: yarn-539-20130410.1.patch, yarn-539-20130410.2.patch, > yarn-539-20130410.patch > > > If resource localization fails, the resource remains in memory and is > 1) either cleaned up the next time cache cleanup runs and there is a space > crunch (if sufficient space is available in the cache, it will remain in > memory), or > 2) reused if a LocalizationRequest comes again for the same resource. > I think that when resource localization fails, that event should be sent to the > LocalResourcesTracker, which will then remove the resource from its cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-495) Change NM behavior of reboot to resync
[ https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628825#comment-13628825 ] Hudson commented on YARN-495: - Integrated in Hadoop-Yarn-trunk #180 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/180/]) YARN-495. Changed NM reboot behaviour to be a simple resync - kill all containers and re-register with RM. Contributed by Jian He. (Revision 1466752) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466752 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/NodeAction.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManagerEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerReboot.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java > Change NM behavior of reboot to resync > -- > > Key: YARN-495 > URL: https://issues.apache.org/jira/browse/YARN-495 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Fix For: 2.0.5-beta > > Attachments: YARN-495.1.patch, YARN-495.2.patch, YARN-495.3.patch, > YARN-495.4.patch, YARN-495.5.patch, YARN-495.6.patch > > > When a reboot command is sent from the RM, the node manager doesn't clean up the > containers while it's stopping. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira