[jira] [Updated] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1680: -- Assignee: Chen He (was: Craig Welch) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Chen He Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks were killed), so the MRAppMaster blacklisted it. All reducer tasks are now running in the cluster. The MRAppMaster does not preempt the reducers because the headroom used in the reducer-preemption calculation still includes the blacklisted node's memory. This makes jobs hang forever: the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResources it returns still counts the whole cluster's free memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
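The core of the requested fix can be sketched as follows. This is an illustrative model only, not the actual YARN code: `HeadroomSketch`, `Node`, and `headroom` are hypothetical names, and real headroom also accounts for queue limits.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the headroom reported to an AM should count only the
// free memory of nodes that the AM has NOT blacklisted.
public class HeadroomSketch {
    // Minimal stand-in for a NodeManager: name plus free memory in MB.
    public record Node(String name, long freeMb) {}

    public static long headroom(List<Node> nodes, Set<String> blacklisted) {
        long free = 0;
        for (Node n : nodes) {
            if (!blacklisted.contains(n.name())) {
                free += n.freeMb(); // only nodes the AM can actually be scheduled on
            }
        }
        return free;
    }

    public static void main(String[] args) {
        // 4 NMs of 8GB; 29GB in use leaves 3GB free, 2GB of it on blacklisted NM-4.
        List<Node> cluster = List.of(new Node("NM-1", 1024), new Node("NM-2", 0),
                                     new Node("NM-3", 0), new Node("NM-4", 2048));
        System.out.println(headroom(cluster, Set.of()));        // naive view: 3072
        System.out.println(headroom(cluster, Set.of("NM-4")));  // usable view: 1024
    }
}
```

With the naive view the AM believes it has room to start reducers and never preempts; with the blacklist-aware view the shortfall is visible and preemption can kick in.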
[jira] [Updated] (YARN-3320) Support a Priority OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3320: -- Assignee: Wangda Tan (was: Craig Welch) Support a Priority OrderingPolicy - Key: YARN-3320 URL: https://issues.apache.org/jira/browse/YARN-3320 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Wangda Tan When [YARN-2004] is complete, bring relevant logic into the OrderingPolicy framework -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584266#comment-14584266 ] Craig Welch commented on YARN-1680: --- [~airbots], unfortunately, I'm having no more luck seeing this through than you have had! I have gone ahead and handed this back to you; if you don't believe you'll have time to work on it, you might want to see if [~leftnoteasy] is interested in picking it up. Thanks.
[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1198: -- Assignee: Wangda Tan (was: Craig Welch) Capacity Scheduler headroom calculation does not work as expected - Key: YARN-1198 URL: https://issues.apache.org/jira/browse/YARN-1198 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Wangda Tan Attachments: YARN-1198.1.patch, YARN-1198.10.patch, YARN-1198.11-with-1857.patch, YARN-1198.11.patch, YARN-1198.12-with-1857.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch Today headroom calculation (for the app) takes place only when:
* A new node is added to or removed from the cluster
* A new container is getting assigned to the application
However there are potentially a lot of situations which are not considered for this calculation:
* If a container finishes, then the headroom for that application will change and the AM should be notified accordingly.
* If a single user has submitted multiple applications (app1 and app2) to the same queue, then:
** If app1's container finishes, then not only app1's but also app2's AM should be notified about the change in headroom.
** Similarly, if a container is assigned to either application (app1/app2), then both AMs should be notified about their headroom.
** To simplify the whole communication process it is ideal to keep headroom per user per LeafQueue so that everyone gets the same picture (apps belonging to the same user and submitted in the same queue).
* If a new user submits an application to the queue, then all applications submitted by all users in that queue should be notified of the headroom change.
* Also, today headroom is an absolute number (I think it should be normalized, but then that would not be backward compatible).
* Also, when an admin refreshes the queue, headroom has to be updated.
These are all potential bugs in headroom calculations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
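The per-user, per-LeafQueue bookkeeping proposed in the list above can be sketched roughly like this. All names (`UserHeadroomSketch`, `usageChanged`, `headroomFor`) are illustrative, not YARN APIs, and a real implementation would also handle the queue-refresh and new-user cases listed above:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: every usage-changing event updates one number per user, so
// all of that user's apps in the queue see the same headroom picture.
public class UserHeadroomSketch {
    private final long queueLimitMb;
    private final Map<String, Long> usedMbByUser = new HashMap<>();

    public UserHeadroomSketch(long queueLimitMb) {
        this.queueLimitMb = queueLimitMb;
    }

    // Called on container assignment; per the bug report, this should also
    // fire on container finish, with a negative delta.
    public void usageChanged(String user, long deltaMb) {
        usedMbByUser.merge(user, deltaMb, Long::sum);
    }

    // One shared value per user: app1 and app2 of the same user always agree.
    public long headroomFor(String user) {
        return queueLimitMb - usedMbByUser.getOrDefault(user, 0L);
    }
}
```

Keeping a single value per user means a container finishing in app1 immediately changes the headroom that app2's AM sees on its next heartbeat, which is the notification gap the issue describes.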
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584271#comment-14584271 ] Craig Welch commented on YARN-1039: --- I'll go back to my earlier assertion: I think it's not really duration we are concerned with here (that is covered in various ways in other places), but rather the notion of an application type, a batch or a service, with the defining characteristic being the potential for continuous operation (service) versus a unit of work which will run to completion (batch); an enumeration of service and batch makes sense to me. In any case, [~vinodkv], there still seems to be enough diversity of opinion here to require some ongoing discussion/reconciliation, so I will leave this in your capable hands. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter, long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot-priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3510) Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness
[ https://issues.apache.org/jira/browse/YARN-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3510: -- Assignee: Wangda Tan (was: Craig Welch) Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness Key: YARN-3510 URL: https://issues.apache.org/jira/browse/YARN-3510 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Craig Welch Assignee: Wangda Tan Attachments: YARN-3510.2.patch, YARN-3510.3.patch, YARN-3510.5.patch, YARN-3510.6.patch The ProportionalCapacityPreemptionPolicy preempts as many containers from applications as it can during its preemption run. For FIFO this makes sense, as it is preempting in reverse order, therefore maintaining the primacy of the oldest. For fair ordering this does not have the desired effect; instead, it should preempt a number of containers from each application which maintains a fair balance, or close to a fair balance, between them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1039: -- Assignee: Vinod Kumar Vavilapalli (was: Craig Welch)
[jira] [Commented] (YARN-3510) Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness
[ https://issues.apache.org/jira/browse/YARN-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571405#comment-14571405 ] Craig Welch commented on YARN-3510: --- [~leftnoteasy], I think [~sunilg] was referring to container priorities, not application priorities. [~sunilg], container priorities are still taken into account with respect to the ordering of container preemption within an application, just as they are today. So typically an application would not have higher-priority containers preempted until after any lower-priority ones had been, and it would only be in cases where there was a wide discrepancy in usage between applications and a need for a significant preemption to rebalance queues that I would expect any given application to end up giving up high-priority containers. That's not actually a new behavior for capacity scheduler preemption; the existing logic already works this way. Put another way, the approach will tend to avoid preempting high-priority containers as a rule, but it could happen as you describe in some cases.
[jira] [Commented] (YARN-3510) Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness
[ https://issues.apache.org/jira/browse/YARN-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570039#comment-14570039 ] Craig Welch commented on YARN-3510: --- [~leftnoteasy] and I had some offline discussion. The patch currently here is simply meant to keep from unbalancing whatever allocation process is active by, generally, keeping relative usage between applications the same. It doesn't attempt to actively re-allocate in a way which achieves the overall allocation policy, i.e., as if all the applications had started at once (this is a more complex proposition, obviously). There's a desire to have this because, among other things, sometime down the road we may do preemption just among users/applications in a queue, and it will be necessary for the preemption to actively work toward the allocation goals to do that, rather than just maintain current levels. This will add some medium-level complexity to the current patch; the deltas from the current approach are: Since the effect of preemption on ordering for fairness doesn't occur until the container is released, and we want to consider it right away, we will need to retain info about pending preemption on the app resources for comparison (it will be a deduction from usage for ordering purposes, as if the preemption had already happened). The preemptEvenly loop will need to reorder the app which was preempted after each preemption and then restart the iteration over apps (not necessarily over all apps; again, just until the first preemption).
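The "deduction from usage" idea in the comment above can be sketched like this. These are hypothetical names (`PendingPreemptionSketch`, `effectiveUsage`, `nextVictim`), not YARN classes; the point is only that containers already marked for preemption reduce an app's usage for ordering before they are actually released:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: pending preemption is subtracted from usage before the fair
// comparison, so ordering reacts immediately rather than waiting for release.
public class PendingPreemptionSketch {
    public static final class App {
        public final String id;
        public long usedMb;
        public long markedForPreemptionMb; // not yet released, already claimed back

        public App(String id, long usedMb) {
            this.id = id;
            this.usedMb = usedMb;
        }

        public long effectiveUsage() {
            return usedMb - markedForPreemptionMb;
        }
    }

    // Most-over-usage first: the app with the highest effective usage is the
    // next preemption candidate.
    public static App nextVictim(List<App> apps) {
        List<App> sorted = new ArrayList<>(apps);
        sorted.sort(Comparator.comparingLong((App a) -> a.effectiveUsage()).reversed());
        return sorted.get(0);
    }
}
```

Without the deduction, the same app would keep being selected every round until its containers were actually released, defeating the even spread the patch is after.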
[jira] [Updated] (YARN-3510) Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness
[ https://issues.apache.org/jira/browse/YARN-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3510: -- Attachment: YARN-3510.2.patch
[jira] [Updated] (YARN-3510) Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness
[ https://issues.apache.org/jira/browse/YARN-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3510: -- Attachment: YARN-3510.3.patch Removed some unnecessary changes to other preemption tests introduced while exploring behavior.
[jira] [Updated] (YARN-3510) Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness
[ https://issues.apache.org/jira/browse/YARN-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3510: -- Attachment: YARN-3510.5.patch Fixed a couple of small things that were missed, and improved the documentation of the new configuration parameter.
[jira] [Commented] (YARN-3510) Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness
[ https://issues.apache.org/jira/browse/YARN-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563983#comment-14563983 ] Craig Welch commented on YARN-3510: --- The attached patch supports an optional configuration option for the preemption policy, {code} yarn.resourcemanager.monitor.capacity.preemption.preempt_evenly{code}, which (when set to true) causes the policy to preempt only one live container per application per round, and to do multiple rounds until the desired resources are obtained (or no further progress is occurring), so that preemption should generally maintain existing relative usage between apps. This is in contrast to the default behavior (when unset or set to false, equivalent to the existing behavior), which is to take as much as possible from each app in order of the preemption iterator. The default works well for the FIFO case, but will unbalance usage between apps in the fair case.
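The round-based behavior described in the comment above can be sketched as follows. This is a minimal illustration of the idea, not the actual ProportionalCapacityPreemptionPolicy code; `PreemptEvenlySketch` and its container model are assumptions:

```java
import java.util.Deque;
import java.util.List;

// Sketch: at most one live container per application per round, rounds
// repeating until the target is met or a round reclaims nothing.
public class PreemptEvenlySketch {
    // Each deque holds one app's container sizes in MB, newest last.
    public static long preemptEvenly(List<Deque<Long>> apps, long targetMb) {
        long reclaimed = 0;
        boolean progress = true;
        while (reclaimed < targetMb && progress) {
            progress = false;                 // detect a no-progress round
            for (Deque<Long> app : apps) {
                if (reclaimed >= targetMb) break;
                Long c = app.pollLast();      // newest container of this app
                if (c != null) {
                    reclaimed += c;
                    progress = true;
                }
            }
        }
        return reclaimed;
    }
}
```

Because each round takes at most one container per app, no single application is drained before the others contribute, which is exactly the "maintain existing relative usage" property the option is meant to provide.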
[jira] [Updated] (YARN-3510) Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness
[ https://issues.apache.org/jira/browse/YARN-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3510: -- Attachment: YARN-3510.6.patch Ran all of the tests listed as failing or timing out on my box with the patch and they all pass; it must be a build server issue or something of that nature. Clicking on the findbugs link indicates that there are no findbugs issues (0 listed); is there something wrong with the feedback process? Fixed all of the checkstyle issues except one which I don't think is important.
[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560210#comment-14560210 ] Craig Welch commented on YARN-3626: --- The checkstyle is insignificant, the rest is all good. On Windows localized resources are not moved to the front of the classpath when they should be -- Key: YARN-3626 URL: https://issues.apache.org/jira/browse/YARN-3626 Project: Hadoop YARN Issue Type: Bug Components: yarn Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.7.1 Attachments: YARN-3626.0.patch, YARN-3626.11.patch, YARN-3626.14.patch, YARN-3626.15.patch, YARN-3626.16.patch, YARN-3626.4.patch, YARN-3626.6.patch, YARN-3626.9.patch In response to the mapreduce.job.user.classpath.first setting, the classpath is ordered differently so that localized resources will appear before system classpath resources when tasks execute. On Windows this does not work because the localized resources are not linked into their final location when the classpath jar is created. To compensate for that, localized jar resources are added directly to the classpath generated for the jar rather than being discovered from the localized directories. Unfortunately, they are always appended to the classpath, and so are never preferred over system resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
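The ordering bug in the description above reduces to where the localized entries land in the classpath. A minimal sketch, under the assumption of a hypothetical `buildClasspath` helper (not the actual MapReduce code, which writes the list into a classpath jar's manifest):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: with user-classpath-first requested, localized jars must be
// prepended to the classpath, not appended.
public class ClasspathOrderSketch {
    public static List<String> buildClasspath(List<String> systemEntries,
                                              List<String> localizedJars,
                                              boolean userClasspathFirst) {
        List<String> cp = new ArrayList<>();
        if (userClasspathFirst) {
            cp.addAll(localizedJars);   // user jars come first, so they win lookup
            cp.addAll(systemEntries);
        } else {
            cp.addAll(systemEntries);   // default: system resources first
            cp.addAll(localizedJars);
        }
        return cp;
    }
}
```

The bug is the Windows path always taking the `else` branch in effect: localized jars were appended regardless of the setting, so class lookup always hit system resources first.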
[jira] [Updated] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3626: -- Attachment: YARN-3626.15.patch
[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14559935#comment-14559935 ] Craig Welch commented on YARN-3626: --- [~vinodkv], indicated it's private and Windows-specific; all for doing something better overall long term, aka [YARN-3685]. [~cnauroth], switched to the simpler valueOf.
[jira] [Updated] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3626: -- Attachment: YARN-3626.16.patch Sure, annotated also.
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556814#comment-14556814 ] Craig Welch commented on YARN-3632: --- bq. {code} if (application.updateResourceRequests(ask)) { } {code} No, I want to avoid any possible interleaving of locks between the application and the queue, getting the ordering policy locks the queue briefly and this should not happen inside an application lock. bq. {code} updateDemandForQueue {code} The demand is being updated for that queue, I think the naming is clear enough. Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when it is allocated to or a container is recovered from the application. Some ordering policies may also need to reorder when demand changes if that is part of the ordering comparison, this needs to be made available (and used by the fairorderingpolicy when sizebasedweight is true) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
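The lock-ordering point in the comment above can be sketched like this. `LockOrderSketch` and its members are illustrative stand-ins, not the actual scheduler classes; the shape of the fix is what matters: finish the application-side update under the app lock, release it, then touch the queue's ordering policy (which takes the queue lock), so the two locks are never held in an interleaving order:

```java
// Sketch: app lock and queue lock are taken strictly one after the other,
// never nested, eliminating the interleaving the comment warns against.
public class LockOrderSketch {
    private final Object appLock = new Object();
    private final Object queueLock = new Object();
    private boolean reordered;

    private boolean updateResourceRequests() {
        return true; // stand-in: demand changed
    }

    public void allocate() {
        boolean demandChanged;
        synchronized (appLock) {            // app lock only
            demandChanged = updateResourceRequests();
        }                                   // released before any queue work
        if (demandChanged) {
            synchronized (queueLock) {      // queue lock only, never inside appLock
                reordered = true;           // stand-in for reordering the app
            }
        }
    }

    public boolean wasReordered() {
        return reordered;
    }
}
```

Putting the `updateResourceRequests` call inside an `if` while already holding the app lock would be exactly the nesting the comment rejects: the queue lock would then be acquired inside the app lock on this path, while other paths acquire them in the opposite order.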
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556767#comment-14556767 ] Craig Welch commented on YARN-3632: --- bq. 1) ... done bq. 2) ... done bq. 3) ... done
[jira] [Updated] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3632: -- Attachment: YARN-3632.7.patch
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556922#comment-14556922 ] Craig Welch commented on YARN-3632: --- BTW, the whitespace and checkstyle look to be unimportant, the javac unrelated, and TestNodeLabelContainerAllocation passes fine for me with the patch so it is also unrelated.
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555302#comment-14555302 ] Craig Welch commented on YARN-3632: --- Just uploaded a patch which addresses the comments. It holds off on reordering entities until just before iteration to avoid the unnecessary repeated reordering detailed above. bq. this null check is not needed, if it can never be null; It can be null if the asks are empty (in which case, we don't want to queue for reordering, and don't :-) )
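The deferral described in that comment can be sketched roughly as follows. This is a minimal illustration, not the actual YARN classes; all names here are made up for the example. Demand updates only mark the collection as needing a reorder, and the sort happens once, just before the assignment iteration, instead of on every ask update.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch of deferring reorders until just before iteration.
// In the real scheduler the entities live in a sorted structure; here a
// plain list plus a lazy sort keeps the idea visible.
class DeferredOrderingPolicy<T> {
    private final Comparator<T> comparator;
    private final List<T> entities = new ArrayList<>();
    private boolean reorderPending = false;

    DeferredOrderingPolicy(Comparator<T> comparator) {
        this.comparator = comparator;
    }

    void addSchedulableEntity(T entity) {
        entities.add(entity);
        reorderPending = true;
    }

    // Called when an entity's demand changes: only record that a reorder
    // is needed, avoiding repeated re-sorting on every update.
    void demandUpdated(T entity) {
        reorderPending = true;
    }

    // Sort once, immediately before handing out the iteration order.
    Iterator<T> getAssignmentIterator() {
        if (reorderPending) {
            entities.sort(comparator);
            reorderPending = false;
        }
        return entities.iterator();
    }
}
```

With a demand-aware comparator, an application whose demand grows moves to the front of the next iteration without the collection being touched in between.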
[jira] [Updated] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3632: -- Attachment: YARN-3632.6.patch
[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552686#comment-14552686 ] Craig Welch commented on YARN-3626: --- Checkstyle looks insignificant. [~cnauroth], [~vinodkv], I've changed the approach to use the environment instead of configuration as suggested; can one of you review, please? On Windows localized resources are not moved to the front of the classpath when they should be -- Key: YARN-3626 URL: https://issues.apache.org/jira/browse/YARN-3626 Project: Hadoop YARN Issue Type: Bug Components: yarn Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.7.1 Attachments: YARN-3626.0.patch, YARN-3626.11.patch, YARN-3626.14.patch, YARN-3626.4.patch, YARN-3626.6.patch, YARN-3626.9.patch In response to the mapreduce.job.user.classpath.first setting, the classpath is ordered differently so that localized resources will appear before system classpath resources when tasks execute. On Windows this does not work because the localized resources are not linked into their final location when the classpath jar is created. To compensate for that, localized jar resources are added directly to the classpath generated for the jar rather than being discovered from the localized directories. Unfortunately, they are always appended to the classpath, and so are never preferred over system resources.
[jira] [Commented] (YARN-3681) yarn cmd says could not find main class 'queue' in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552700#comment-14552700 ] Craig Welch commented on YARN-3681: --- [~varun_saxena], the patch you had doesn't apply properly for me; I've uploaded a patch which does the same thing, applies cleanly, and which I've had the opportunity to test. @xgong, can you take a look at this one (.0.patch)? Thanks. yarn cmd says could not find main class 'queue' in windows Key: YARN-3681 URL: https://issues.apache.org/jira/browse/YARN-3681 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows Only Reporter: Sumana Sathish Assignee: Varun Saxena Priority: Blocker Labels: windows, yarn-client Attachments: YARN-3681.0.patch, YARN-3681.01.patch, yarncmd.png Attached a screenshot of the command prompt on Windows running the yarn queue command.
[jira] [Updated] (YARN-3681) yarn cmd says could not find main class 'queue' in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3681: -- Attachment: YARN-3681.0.patch
[jira] [Updated] (YARN-3681) yarn cmd says could not find main class 'queue' in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3681: -- Attachment: YARN-3681.branch-2.0.patch Here is one for branch-2
[jira] [Updated] (YARN-3681) yarn cmd says could not find main class 'queue' in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3681: -- Attachment: YARN-3681.1.patch Oh the irony, neither did my own. Updated to one which does.
[jira] [Updated] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3626: -- Attachment: YARN-3626.14.patch
[jira] [Commented] (YARN-3681) yarn cmd says could not find main class 'queue' in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551749#comment-14551749 ] Craig Welch commented on YARN-3681: --- Tested my own version of this patch yesterday which does the same thing and works, so +1 LGTM
[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546374#comment-14546374 ] Craig Welch commented on YARN-3626: --- Right, going back to [~cnauroth], [~vinodkv]: we chatted and you asserted that the original approach can't work, but it seemed to work, and it's not entirely clear to me why it shouldn't...
[jira] [Updated] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3632: -- Attachment: YARN-3632.5.patch
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546271#comment-14546271 ] Craig Welch commented on YARN-3632: --- One-line change to address the missing-whitespace issue. Again, the javac and findbugs results don't appear to have anything to do with the patch.
[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546288#comment-14546288 ] Craig Welch commented on YARN-3626: --- [~cnauroth], [~vvasudev] - This patch goes back to the original approach I passed by you offline - the fix itself is the same, but it uses the classpath instead of configuration to determine when the behavior should change. Your thoughts?
[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546370#comment-14546370 ] Craig Welch commented on YARN-3626: --- bq. Can't we do the above? We definitely cannot insert mapreduce incantations like job.jar into YARN. That's why I took the config-based approach, which apparently is invalid... but it also worked, which is quite confusing. I'm going to go back and validate our reasoning for believing it shouldn't. bq. Can't we do the above? We definitely cannot insert mapreduce incantations like job.jar into YARN. I suppose we can if it would work. It needs to be something which can be propagated from Oozie, which adds additional complexity. Ideally, we need something MRApps can set based on the presence of the mapred configuration so that it propagates through. Do we have an example of this being done elsewhere?
[jira] [Updated] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3626: -- Attachment: YARN-3626.9.patch In that case, here's a patch which goes back to the original approach used during troubleshooting, which uses the classpath itself to communicate the difference. It only touches other code to revert parts of the earlier patch that are no longer needed; the actual change, done this way, is solely in ContainerLaunch.java, and it makes the conditional determination based on the classpath differences already present due to the manipulation earlier in the chain (in this case, by mapreduce due to user.classpath.first).
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546403#comment-14546403 ] Craig Welch commented on YARN-3632: --- findbugs and javac appear to be irrelevant...
[jira] [Updated] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3626: -- Attachment: YARN-3626.11.patch Now using the environment to pass the configuration.
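The environment-based approach adopted here can be sketched as below. This is a hedged illustration only: the flag name CLASSPATH_PREPEND_DISTCACHE, the method, and its class are stand-ins chosen for the example, not necessarily the identifiers the patch itself uses. The idea is simply that a variable in the container's environment decides whether localized (user) entries are prepended to or appended after the system classpath.

```java
import java.util.*;

// Hedged sketch of using the container environment to decide classpath
// ordering: when a flag variable is set, localized (user) entries are
// placed ahead of the system classpath instead of being appended.
class ClasspathOrdering {
    static String buildClasspath(Map<String, String> env,
                                 List<String> systemEntries,
                                 List<String> localizedEntries) {
        boolean prependLocalized = Boolean.parseBoolean(
            env.getOrDefault("CLASSPATH_PREPEND_DISTCACHE", "false"));
        List<String> ordered = new ArrayList<>();
        if (prependLocalized) {
            ordered.addAll(localizedEntries);  // user resources win
            ordered.addAll(systemEntries);
        } else {
            ordered.addAll(systemEntries);     // default: system first
            ordered.addAll(localizedEntries);
        }
        return String.join(";", ordered);      // ';' separator, as on Windows
    }
}
```

Passing the signal through the environment (rather than configuration) means it survives the hand-off from the submitting framework to the NodeManager, which is what the review comments above asked for.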
[jira] [Updated] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3632: -- Attachment: YARN-3632.4.patch
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546110#comment-14546110 ] Craig Welch commented on YARN-3632: --- Uploaded patch to address checkstyle and whitespace concerns and to move queue acquisition from the app inside the app synchronization. The javac error is unrelated, for an untouched class. {code} [WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java:[2171,32] [unchecked] Possible heap pollution from parameterized vararg type E [WARNING] where E is a type-variable: {code} Findbugs also appears unrelated; no changes appear to call into the area where the concern lies (it looks similar to the javac location, as though it missed a commit in the comparison?).
[jira] [Updated] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3632: -- Attachment: YARN-3632.1.patch
[jira] [Updated] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3632: -- Attachment: YARN-3632.3.patch
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544851#comment-14544851 ] Craig Welch commented on YARN-3632: --- Now with testing
[jira] [Updated] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3626: -- Attachment: YARN-3626.6.patch Fix broken unit tests
[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542559#comment-14542559 ] Craig Welch commented on YARN-3626: --- Checkstyle looks unimportant, everything else OK
[jira] [Commented] (YARN-3306) [Umbrella] Proposing per-queue Policy driven scheduling in YARN
[ https://issues.apache.org/jira/browse/YARN-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542746#comment-14542746 ] Craig Welch commented on YARN-3306: --- I don't think it's really necessary at this point. We've been careful to keep the integration points narrow and controllable with configuration; as long as we continue to do that, I don't think we need the separation of a branch.

[Umbrella] Proposing per-queue Policy driven scheduling in YARN --- Key: YARN-3306 URL: https://issues.apache.org/jira/browse/YARN-3306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: PerQueuePolicydrivenschedulinginYARN.pdf

Scheduling layout in Apache Hadoop YARN today is very coarse grained. This proposal aims at converting today's rigid scheduling in YARN to a per-queue policy driven architecture. We propose the creation of a common policy framework and implement a common set of policies that administrators can pick and choose per queue:
- Make scheduling policies configurable per queue
- Initially, we limit ourselves to a new type of scheduling policy that determines the ordering of applications within the leaf queue
- In the near future, we will also pursue parent queue level policies and potential algorithm reuse through a separate type of policies that control resource limits per queue, user, application etc.
[jira] [Commented] (YARN-3306) [Umbrella] Proposing per-queue Policy driven scheduling in YARN
[ https://issues.apache.org/jira/browse/YARN-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542829#comment-14542829 ] Craig Welch commented on YARN-3306: ---

bq. Integration points being narrow is probably more a reason to work on a branch

I don't see why - it seems to me to suggest the opposite: that we are able to achieve isolation of the functionality in the main codebase without risk.

bq. The code looks quite isolated

Another reason we don't need to work in a branch. We're trying to approach this iteratively, building specific, narrow functionalities to completion and then making them available for use and feedback; this will be difficult if it's all isolated away in a branch. The approach so far works well for that process - much better than doing all the work in isolation and then bringing a much larger change into the main codebase all at once. As far as I can tell, separating this out into a branch is a net negative: there is overhead to doing so, and it runs contrary to the iterative approach we're trying to take, without providing any clear benefit.
[jira] [Updated] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3626: -- Attachment: YARN-3626.4.patch With testing
[jira] [Updated] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3632: -- Attachment: YARN-3632.0.patch

Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Bug Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3632.0.patch

At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when it is allocated to or a container is recovered from the application. Some ordering policies may also need to reorder when demand changes, if that is part of the ordering comparison; this needs to be made available (and used by the FairOrderingPolicy when sizeBasedWeight is true).
[jira] [Created] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
Craig Welch created YARN-3632: - Summary: Ordering policy should be allowed to reorder an application when demand changes Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Bug Reporter: Craig Welch Assignee: Craig Welch
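The need to reorder when demand changes can be illustrated generically: a comparator-ordered collection cannot tolerate a comparison key being mutated in place. A hedged sketch of that principle, assuming a simple TreeSet-backed policy with hypothetical names (this is not the actual OrderingPolicy API):

```java
import java.util.Comparator;
import java.util.TreeSet;

public class ReorderOnDemand {
    public static class App {
        public final String id;
        public long demand;
        public App(String id, long demand) { this.id = id; this.demand = demand; }
    }

    // Apps ordered by demand, with id as a tiebreaker for a stable total order.
    private final TreeSet<App> apps = new TreeSet<>(
        Comparator.comparingLong((App a) -> a.demand).thenComparing(a -> a.id));

    public void add(App a) { apps.add(a); }

    // Mutating a comparison key while the element sits in the TreeSet would
    // corrupt its ordering, so the policy must remove, mutate, and re-insert.
    public void updateDemand(App a, long newDemand) {
        apps.remove(a);
        a.demand = newDemand;
        apps.add(a);
    }

    public App first() { return apps.first(); }
}
```

This is the same remove/re-insert hook that already exists for allocation and container recovery; the issue asks for it to fire on demand changes too.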
[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538849#comment-14538849 ] Craig Welch commented on YARN-3626: --- To resolve this, the situation should be detected and, when applicable, localized resources should be put at the beginning of the classpath rather than the end.
[jira] [Created] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
Craig Welch created YARN-3626: - Summary: On Windows localized resources are not moved to the front of the classpath when they should be Key: YARN-3626 URL: https://issues.apache.org/jira/browse/YARN-3626 Project: Hadoop YARN Issue Type: Bug Components: yarn Environment: Windows Reporter: Craig Welch Assignee: Craig Welch
[jira] [Updated] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3626: -- Attachment: YARN-3626.0.patch The attached patch propagates the conditional as a YARN configuration option and moves localized resources to the front of the classpath when appropriate.
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533684#comment-14533684 ] Craig Welch commented on YARN-1680: ---

bq. This requires when a node doing heartbeat with changed available resource, all apps blacklisted the node need to be notified

Well, that's not quite so. From what we were talking about, it means that the blacklist deduction can't be a fixed amount; it needs to be calculated by looking at the unused resource of the blacklisted nodes during headroom calculation. The rest of the above proposal for detecting changes, etc., works, but instead of a static deduction value we would need a reference to the app's blacklisted nodes and would look at their unused resources during the app's headroom calculation. So there is that cost, but it's not related to the heartbeat or a notification as such.

bq. headroom for app could be under estimated

I think, generally, we should not take an approach which will underestimate/underutilize if we have MAPREDUCE-6302 to fall back on. If we don't, then we might want to do it only if we decide not to do the accurate calculation in some cases based on limits (see immediately below), but not as a matter of course.

bq. Only do accurate headroom calculation when there're not too much blacklisted nodes as well as apps with blacklisted nodes.

I think if we put a limit on it, it should be a purely local decision: only do the calculation up to x blacklisted nodes for an app, which we would expect to rarely be an issue. There is a potential for performance issues here, but we don't really know how great a concern it is.

bq. MAPREDUCE-6302 is targeting to preempt reducer even if we reported inaccurate headroom for apps. I think the approach looks good to me

I think that may work as a fallback option for MR, assuming it works out without issue, if we decide not to do the proper headroom calculation in some cases. But that's MR specific, so it won't help non-MR apps, and it has the issues I brought up before with performance degradation vs the proper headroom calculation. For these reasons I don't think it's a substitute for fixing this issue overall; it may be a fallback option if we limit the cases where we do the proper adjustment.

bq. Move headroom calculation to application side, I think now we cannot do it at least for now...Application will only receive updated NodeReport from when node changes heathy status instead of regular heartbeat

Well, in some sense that works OK for this, because we really only need to know about those changes in node status wrt the blacklist to detect recalculation changes with the approach proposed above. The problem is that we will also need a way to query for current usage per node while doing the calculation; I don't know if an efficient call for that exists (it would ideally be a batch call for N nodes, where we would ask for all the blacklisted nodes at once). There is also the broader issue that we don't seem to have a single entry point client-side for doing this right now, so we would need to touch a few points to add a library/something of that nature to do this, and AMs we may not be aware of/that are not part of the core would potentially have to do some integration to get this.

availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
-- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Craig Welch Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch

There are 4 NodeManagers with 8GB each; total cluster capacity is 32GB. Cluster slow start is set to 1. A job is running whose reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks got killed), so the MRAppMaster blacklisted it. All reducer tasks are running in the cluster now. The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, headroom considers blacklisted nodes' memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but returns an availableResource that considers the whole cluster's free memory).
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531635#comment-14531635 ] Craig Welch commented on YARN-1680: --- [~leftnoteasy]

bq. Actually I think this statement may not true, assume we compute an accurate headroom for app, but that doesn't mean the app can get as much resource as we compute...you may not be able to get it after hours.

This would only occur if other applications were allocated those resources, in which case the headroom will drop and the application will be made aware of it via headroom updates. The scenario you propose as a counter example is inaccurate. It is the case that accurate headroom (including a fix for the blacklist issue here) will result in faster overall job completion than the reactionary approach with allocation failure.

[~vinodkv]

bq. OTOH, blacklisting / hard-locality are app-decisions. From the platform's perspective, those nodes, free or otherwise, are actually available for apps to use

Not quite so, as the scheduler respects the blacklist and doesn't allocate containers to an app when it would run counter to the app's blacklisting.

That said, so far the discussion regarding the proposal has largely been about where the activity should live. Let's put that aside for a moment and concentrate on the approach itself. With API additions / additional library work / etc. it should be possible to do the same thing outside the scheduler as within. Whether and what to do in or out of the scheduler still needs to be settled, of course, but a decision on how the headroom will be adjusted is needed in any case, and is needed before putting together the change wherever it ends up living. So, where app headroom is finalized == in the scheduler OR in a library available to/used by AMs (if externalized, obviously APIs to get whatever info is not yet available outside the scheduler will need to be added):

- Retain a node/rack blacklist where app headroom is finalized (already the case)
- Add a last change timestamp or incrementing counter to track node addition/removal at the cluster level (which is what exists for cluster black/white listing afaict), updated when those events occur
- Add a last change timestamp/counter to where app headroom is finalized to track blacklist changes
- Have last updated values where app headroom is finalized to track the above two last change values, updated when blacklist values are recalculated
- On headroom calculation, where app headroom is finalized checks if it has any entries in the blacklist or if it has a blacklist deduction value in its ResourceUsage entry (see below), to determine if the blacklist must be taken into account
- If the blacklist must be taken into account, check the last updated values for both cluster and app blacklist changes; if and only if either is stale (last updated != last change), recalculate the blacklist deduction
- When calculating the blacklist deduction, use Chen He's basic logic from existing patches. Place the deduction value into where app headroom is finalized.
- NodeLabels could be taken into account as well: only blacklist entries which match the node label expression used by the application would be added to the deduction, if a label expression is in play
- Whenever the headroom is generated where app headroom is finalized, perform the blacklist value deduction

availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
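The staleness-checked deduction proposed in this thread (change counters for cluster and app blacklist, with the deduction recalculated only when either is stale) can be sketched roughly as follows. All names are hypothetical; this is an illustration of the proposal, not the scheduler code:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class BlacklistHeadroom {
    // Cluster-side view: unused resource (e.g. MB) per live node.
    private final Map<String, Long> unusedByNode = new HashMap<>();
    private long clusterChangeCounter = 0;   // bumped on node add/remove/update

    // Per-app state.
    private final Set<String> blacklist = new HashSet<>();
    private long blacklistChangeCounter = 0; // bumped when the app's blacklist changes
    private long lastSeenCluster = -1, lastSeenBlacklist = -1;
    private long cachedDeduction = 0;

    public void nodeChanged(String node, long unused) {
        unusedByNode.put(node, unused);
        clusterChangeCounter++;
    }

    public void blacklistNode(String node) {
        blacklist.add(node);
        blacklistChangeCounter++;
    }

    // Recalculate the deduction only when one of the change counters is stale,
    // summing unused resource over blacklisted nodes that actually exist.
    public long headroom(long clusterHeadroom) {
        if (blacklist.isEmpty()) {
            return clusterHeadroom;
        }
        if (lastSeenCluster != clusterChangeCounter
                || lastSeenBlacklist != blacklistChangeCounter) {
            long d = 0;
            for (String n : blacklist) {
                d += unusedByNode.getOrDefault(n, 0L);
            }
            cachedDeduction = d;
            lastSeenCluster = clusterChangeCounter;
            lastSeenBlacklist = blacklistChangeCounter;
        }
        return Math.max(0, clusterHeadroom - cachedDeduction);
    }
}
```

Note the caveat raised later in the thread: unused resource on a node also changes with every heartbeat, so a purely counter-driven cache is an approximation of the accurate calculation.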
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529644#comment-14529644 ] Craig Welch commented on YARN-1680: ---

bq. Please leave out the head-room concerns w.r.t node-labels. IIRC, we had tickets at YARN-796 tracking that. It is very likely a completely different solution, so.

I'm not sure that's so - there is already a process of calculating headroom for labels associated with an application, and the above is an extension of that to blacklisted nodes to handle label cases. If we leave it out, then the solution won't work for node-labels, and it can be made to do so, so that would be a loss.

bq. When I said node-labels above, I meant partitions. Clearly the problem and the corresponding solution will likely be very similar for node-constraints (one type of node-labels). After all, blacklisting is a type of (anti) node-constraint.

It could be modeled that way, but then it will be qualitatively different from the solution for non-label cases, which is not a good thing...

bq. There is no notion of a cluster-level blacklisting in YARN. We have notions of unhealthy/lost/decommissioned nodes in a cluster.

This is what I am referring to when I say:

bq. addition/removal at the cluster level

I'm not suggesting/referring to anything other than nodes entering/leaving the cluster.

bq. Coming to the app-level blacklisting, clearly, the solution proposed is better than dead-locks. But blindly reducing the resources corresponding to blacklisted nodes will result in under-utilization (sometimes massively) and over-conservative scheduling requests by apps.

So, that's the point of the recommended approach. The idea is to detect when it is necessary to recalculate the impact of the blacklisting on app headroom - which is when either the blacklisting from the app has changed or the node composition of the cluster has changed (each of which should be relatively infrequent, certainly in relation to headroom calculation) - and at that time to accurately calculate the impact by only adding the resource value of blacklisted nodes which actually exist into the deduction. It isn't blindly reducing resources, it's doing it accurately, and it should prevent both deadlocks and under-utilization.

bq. One way to resolve this is to get the apps (or optionally in the AMRMClient library) to deduct the resource unusable on blacklisted nodes

It could be moved into the AMs or the client library, but then they would have to do the same sort of thing, and the logic would need to be duplicated amongst the AMs or would only be available to those which use the library (do they all?). It's worth considering if it can be made to cover them all via the library, but I'm not sure this isn't something which should be handled as part of the headroom calculation in the RM, as it is meant to provide this accurately and is otherwise aware of the blacklist. Which suggested to me that we already have the blacklist for the application in the RM/available to the scheduler (I'm not sure why that wasn't obvious to me before...), which does appear to be the case and which therefore drops out concerns about adding it - it's already there...

availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[jira] [Commented] (YARN-2421) CapacityScheduler still allocates containers to an app in the FINISHING state
[ https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529488#comment-14529488 ] Craig Welch commented on YARN-2421: --- Hi [~lichangleo], thanks for working on this fix. Can you resolve the javac warning and run the TestRMContainerImpl test locally with the patch to verify the patch is not the cause? It seems to be persistently failing.

CapacityScheduler still allocates containers to an app in the FINISHING state - Key: YARN-2421 URL: https://issues.apache.org/jira/browse/YARN-2421 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.1 Reporter: Thomas Graves Assignee: Chang Li Attachments: yarn2421.patch, yarn2421.patch, yarn2421.patch

I saw an instance of a bad application master where it unregistered with the RM but then continued to call into allocate. The RMAppAttempt went to the FINISHING state, but the capacity scheduler kept allocating it containers. We should probably have the capacity scheduler check that the application isn't in one of the terminal states before giving it containers.
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529738#comment-14529738 ] Craig Welch commented on YARN-1680: ---

bq. I think we should stop adding such application-specific logic into RM, application can have very varied resource request, for example

On the whole I think that's a reasonable perspective, but I'm not sure this is the right place to draw the line in the sand. It isn't clear to me that this will be particularly costly, and the deadlock issues are quite real. Further, it seems to me that node-label specific calculations are very much of the same cloth as this / the same type of problem and cost, and they are in the scheduler. So why not this also? And if this doesn't belong in the scheduler, I'd suggest that node-label specific headroom logic probably doesn't belong there either.

bq. In short term, treat the headroom just a hint, like what Karthik Kambatla mentioned

I think that's a nice idea, but it doesn't make up for having accurate headroom. It may keep these cases from leading to deadlock, but there will be a cost: the job will be slowed as it reacts after the fact to allocation failures, so job completion will slow. Better than a deadlock, but not as good as if it had received accurate headroom and could have avoided the reactionary delay. There may be other issues with that change as well; I'm not sure it should be undertaken lightly or that we should take a dependency on it to solve this issue.

bq. In longer term, support headroom calculation in client-side utils, maybe AMRMClient is a good place. or at least divide the headroom calculation between the scheduler and elsewhere.

This brings back the earlier question: do we have a good place to do this which is shared among the AM implementations, so that we don't have duplicated logic? I'm still skeptical that this is really the right time and place to begin making that transition - but assuming we want to at some point, it's worth seeing if we have a good place to do it. Is it AMRMClient (is this the same as the AMRMClient library for purposes of this discussion)?
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529224#comment-14529224 ] Craig Welch commented on YARN-1680: --- I've been looking over [~airbots] prior patches, the discussion, etc.; this was what I was going to suggest as an approach. As I mentioned before, I think that accuracy will unfortunately require holding on to the blacklist in the scheduler app. I think this is OK because these should be relatively small, but it is still a drawback. We could impose a limit on size as a mitigating factor, but that could affect accuracy in some cases as well. In any event, this is the approach I'm suggesting:

- Retain a node/rack blacklist in the scheduler application based on additions/removals from the application master
- Add a last change timestamp or incrementing counter to track node addition/removal at the cluster level (which is what exists for cluster black/white listing afaict), updated when those events occur
- Add a last change timestamp/counter to the application to track blacklist changes
- Have last updated values on the application to track the above two last change values, updated when blacklist values are recalculated
- On headroom calculation, the app checks if it has any entries in the blacklist or if it has a blacklist deduction value in its ResourceUsage entry (see below), to determine if the blacklist must be taken into account
- If the blacklist must be taken into account, check the last updated values for both cluster and app blacklist changes; if and only if either is stale (last updated != last change), recalculate the blacklist deduction
- When calculating the blacklist deduction, use [~airbots] basic logic from existing patches. Place the deduction value into a new enumeration index type in ResourceUsage.
- NodeLabels could be taken into account as well: there is some logic about label(s) of interest on the application; in addition to a no-label value which is generally applicable, a value for the label(s) of interest could be generated
- Whenever the headroom is handed out by the provider, add a step which applies the proper blacklist deduction if present

Thoughts on the approach?
[jira] [Commented] (YARN-3165) Possible inconsistent queue state when queue reinitialization failed
[ https://issues.apache.org/jira/browse/YARN-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523807#comment-14523807 ] Craig Welch commented on YARN-3165: --- So, given that the old queue actually takes on the new values and is thereby changed during the process, how would we do a rollback? Is an option to instead have a two-stage commit, where the validation occurs in one pass and then the taking on of the changes occurs in a second pass? Possible inconsistent queue state when queue reinitialization failed Key: YARN-3165 URL: https://issues.apache.org/jira/browse/YARN-3165 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He This came up in a discussion with [~chris.douglas]. If queue reinitialization fails in the middle, it is possible that queues are left in an inconsistent state - some queues are already updated, but some are not. One example is the below code in LeafQueue:
{code}
if (newMax.getMemory() < oldMax.getMemory()
    || newMax.getVirtualCores() < oldMax.getVirtualCores()) {
  throw new IOException(
      "Trying to reinitialize " + getQueuePath()
          + " the maximum allocation size can not be decreased!"
          + " Current setting: " + oldMax
          + ", trying to set it to: " + newMax);
}
{code}
If the exception is thrown here, the previous queues are already updated, but later queues are not. So we should make queue reinitialization transactional. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
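The two-stage commit floated in the comment above could look roughly like the following sketch. SimpleQueue and its single maxMemory field are placeholders, not the real CapacityScheduler classes; the point is only the validate-everything-then-apply-everything shape:

```java
import java.io.IOException;
import java.util.List;

// Placeholder queue: pass 1 checks without mutating, pass 2 mutates without failing.
class SimpleQueue {
  int maxMemory;
  SimpleQueue(int maxMemory) { this.maxMemory = maxMemory; }

  // Pass 1: check only, mutate nothing.
  void validate(int newMax) throws IOException {
    if (newMax < maxMemory) {
      throw new IOException("maximum allocation size can not be decreased");
    }
  }

  // Pass 2: runs only after every queue validated, so it cannot fail midway.
  void apply(int newMax) { maxMemory = newMax; }
}

class TwoPhaseReinit {
  static void reinitialize(List<SimpleQueue> queues, int newMax) throws IOException {
    for (SimpleQueue q : queues) q.validate(newMax); // all-or-nothing check
    for (SimpleQueue q : queues) q.apply(newMax);    // then take on the changes
  }
}
```

If any validation throws, no queue has been touched, which avoids the half-updated state described in the issue without needing a rollback.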
[jira] [Commented] (YARN-3211) Do not use zero as the beginning number for commands for LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523884#comment-14523884 ] Craig Welch commented on YARN-3211: --- Can you add comments in the code explaining why the enum starts at 1, so that it is not confusing to others who look at the code down the line? Can you add a test for the change? What will happen after this change if the parsing encounters a non-numeric string? Do not use zero as the beginning number for commands for LinuxContainerExecutor --- Key: YARN-3211 URL: https://issues.apache.org/jira/browse/YARN-3211 Project: Hadoop YARN Issue Type: Bug Reporter: Liang-Chi Hsieh Priority: Minor Attachments: YARN-3211.patch Currently, the implementation of LinuxContainerExecutor and container-executor uses numbers as its commands, beginning from zero (INITIALIZE_CONTAINER). LinuxContainerExecutor passes the numeric command as a command-line parameter when it runs container-executor, and container-executor calls atoi() to parse the command string to an integer. However, atoi() returns zero when it cannot parse the string as an integer. So if you give a non-numeric command, container-executor still accepts it and runs the INITIALIZE_CONTAINER command. I think this is wrong and we should not use zero as the beginning number of the commands. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
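To illustrate the point (and the review questions above), here is a hedged Java sketch of an enum-plus-validation scheme; the enum values and parse helper are illustrative, not the actual LinuxContainerExecutor code. Starting codes at 1 means atoi()'s 0-on-garbage result can never alias a real command, and explicit parsing lets non-numeric input be rejected rather than silently executed:

```java
// Illustrative command enum. Codes deliberately start at 1: C's atoi()
// returns 0 for any unparseable string, so reserving 0 keeps garbage input
// from being mistaken for the first command.
enum ExecutorCommand {
  INITIALIZE_CONTAINER(1),
  LAUNCH_CONTAINER(2),
  SIGNAL_CONTAINER(3);

  final int code;
  ExecutorCommand(int code) { this.code = code; }

  // Returns null for unknown or non-numeric input instead of silently
  // mapping it onto a valid command.
  static ExecutorCommand fromString(String s) {
    int v;
    try {
      v = Integer.parseInt(s.trim());
    } catch (NumberFormatException e) {
      return null; // non-numeric: reject explicitly
    }
    for (ExecutorCommand c : values()) {
      if (c.code == v) return c;
    }
    return null; // numeric but out of range, including 0
  }
}
```

A unit test for the change would then assert that "0" and non-numeric strings both map to an error rather than to INITIALIZE_CONTAINER.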
[jira] [Commented] (YARN-3126) FairScheduler: queue's usedResource is always more than the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523764#comment-14523764 ] Craig Welch commented on YARN-3126: --- Hi [~Xia Hu], thanks for putting together a patch for this. Could you add some unit tests to verify the fix? FairScheduler: queue's usedResource is always more than the maxResource limit - Key: YARN-3126 URL: https://issues.apache.org/jira/browse/YARN-3126 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.3.0 Environment: hadoop2.3.0. fair scheduler. spark 1.1.0. Reporter: Xia Hu Labels: assignContainer, fairscheduler, resources Fix For: trunk-win Attachments: resourcelimit-02.patch, resourcelimit.patch When submitting a Spark application (both spark-on-yarn-cluster and spark-on-yarn-client mode), the queue's usedResources assigned by the FairScheduler can end up higher than the queue's maxResources limit. Reading the FairScheduler code, I believe this happens because the requested resources are not checked when assigning a container. Here is the detail: 1. Choose a queue. In this process, it checks whether the queue's usedResource is bigger than its max, with assignContainerPreCheck. 2. Then choose an app in that queue. 3. Then choose a container. And here is the problem: there is no check whether this container would push the queue's used resources over its max limit. If a queue's usedResource is 13G and the maxResource limit is 16G, then a container asking for 4G of resources may still be assigned successfully. This problem always shows up with Spark applications, because we can ask for different container resources in different applications. By the way, I have already applied the patch from YARN-2083. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
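The missing check in step 3 amounts to verifying that current usage plus the container's request still fits under the queue maximum before the assignment is made. A minimal sketch, with SimpleResource standing in for YARN's Resource (real code would compare all resource dimensions, e.g. via Resources utility methods):

```java
// Placeholder for YARN's Resource, memory-only for brevity.
class SimpleResource {
  final int memoryMb;
  SimpleResource(int memoryMb) { this.memoryMb = memoryMb; }
}

class QueueLimitCheck {
  // true if assigning `requested` on top of `used` stays within `max`
  static boolean fitsUnderMax(SimpleResource used, SimpleResource requested,
                              SimpleResource max) {
    return used.memoryMb + requested.memoryMb <= max.memoryMb;
  }
}
```

With the numbers from the description: used 13G, max 16G, request 4G fails the check, while a 3G request passes, which is exactly the behavior assignContainerPreCheck alone cannot guarantee.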
[jira] [Commented] (YARN-3188) yarn application --list should list all the applications ( Not only submitted,accepted and running)
[ https://issues.apache.org/jira/browse/YARN-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523814#comment-14523814 ] Craig Welch commented on YARN-3188: --- I think the consensus is that this is working as we would want it to. Any objection to resolving as such? yarn application --list should list all the applications ( Not only submitted,accepted and running) --- Key: YARN-3188 URL: https://issues.apache.org/jira/browse/YARN-3188 Project: Hadoop YARN Issue Type: Bug Components: applications, client Reporter: Anushri Assignee: Anushri Priority: Minor By default, yarn application --list should list all the applications, since we are not giving the -appstate option. Currently it gives output like the following: {noformat} [hdfs@host194 bin]$ ./yarn application -list 15/02/12 19:33:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 15/02/12 19:33:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1 Application-Id Application-NameApplication-Type User Queue State Final-State ProgressTracking-URL application_1422888408992_15010 grep-search MAPREDUCE hdfs defaultACCEPTED UNDEFINED 0% N/A [ {noformat} *Can somebody please assign this issue to me..?* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3193) When visit standby RM webui, it will redirect to the active RM webui slowly.
[ https://issues.apache.org/jira/browse/YARN-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523824#comment-14523824 ] Craig Welch commented on YARN-3193: --- I believe the redirect intentionally takes a moment to allow the user to see that they did not hit the active RM, which is useful to them. In any case, it does not take very long, and the length of time can be adjusted if desired. I think this can be resolved; [~Japol] if you strongly feel otherwise, please feel free to reopen and explain. When visit standby RM webui, it will redirect to the active RM webui slowly. Key: YARN-3193 URL: https://issues.apache.org/jira/browse/YARN-3193 Project: Hadoop YARN Issue Type: Improvement Components: webapp Reporter: Japs_123 Priority: Minor When visiting the standby RM web UI, it redirects to the active RM web UI, but this redirect is very slow, which gives the client a bad experience. I have tried visiting the standby NameNode, and it shows the web page to the client quickly. So, can we improve this experience in YARN like HDFS? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3193) When visit standby RM webui, it will redirect to the active RM webui slowly.
[ https://issues.apache.org/jira/browse/YARN-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch resolved YARN-3193. --- Resolution: Won't Fix When visit standby RM webui, it will redirect to the active RM webui slowly. Key: YARN-3193 URL: https://issues.apache.org/jira/browse/YARN-3193 Project: Hadoop YARN Issue Type: Improvement Components: webapp Reporter: Japs_123 Priority: Minor when visit the standby RM web ui, it will redirect to the active RM web ui. but this redirect is very slow which give client bad experience. I have try to visit standby namenode, it directly show the web to client quickly. So, can we improve this experience with YARN like HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3553) TreeSet is not a nice container for organizing schedulableEntities.
[ https://issues.apache.org/jira/browse/YARN-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523757#comment-14523757 ] Craig Welch commented on YARN-3553: --- [~xinxianyin], this pattern is important for these implementations for efficiency reasons - changes which affect ordering occur much less frequently than the applications must be available in the proper order for allocation (especially when allocating on heartbeat, which is typical). By iteratively re-sorting individual elements only when needed, we avoid frequently re-sorting all applications in the queue, which can be quite expensive at the frequency it would occur. Values are cached with a pretty simple update lifecycle to avoid the issues you are concerned about. Finally, this is an implementation-specific choice; other implementations of ordering policies are free to use other data structures / sorting frequencies, although the efficiency concern this approach avoids would apply to any non-iterative approach. TreeSet is not a nice container for organizing schedulableEntities. --- Key: YARN-3553 URL: https://issues.apache.org/jira/browse/YARN-3553 Project: Hadoop YARN Issue Type: Wish Components: scheduler Reporter: Xianyin Xin In a TreeSet, an element is identified by the comparator, not by object reference. If any of the *attributes used for comparing two elements* of a specific element is modified by other methods, the TreeSet ends up in an un-sorted state, and cannot become sorted again unless we reconstruct another TreeSet with the elements. To avoid this, one must be *very careful* when modifying the attributes (such as increasing or decreasing the used capacity of a schedulableEntity) of an object. 
An example is in AbstractComparatorOrderingPolicy.java, line 63:
{code}
protected void reorderSchedulableEntity(S schedulableEntity) {
  //remove, update comparable data, and reinsert to update position in order
  schedulableEntities.remove(schedulableEntity);
  updateSchedulingResourceUsage(
      schedulableEntity.getSchedulingResourceUsage());
  schedulableEntities.add(schedulableEntity);
}
{code}
This method removes the schedulableEntity first and then reinserts it so as to reorder the set; the changes to the schedulableEntity must happen between those two operations. But since the comparator of the class is not explicit, we don't know which attributes of the schedulableEntity were changed. If we change the schedulableEntity outside the method and only then inform the orderingPolicy of the change, the operation schedulableEntities.remove(schedulableEntity) will not work correctly, since an element of a TreeSet is identified by the comparator. Any implementing class of this abstract class should override this method, but few do. Another choice is to modify a schedulableEntity manually, but then we mustn't forget to reorder the set, and must remember the order: remove, modify the attributes (used for comparing), insert; or use an iterator to mark the schedulableEntity so that we can remove and reinsert it correctly. YARN-897 is an example where we fell into the trap. If the comparator becomes more complex in the future, e.g., if we consider other types of resources in the comparator, such traps will multiply and be dispersed everywhere, which makes it easy for a TreeSet to end up in an un-sorted state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
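The trap described above is easy to reproduce in a self-contained example: mutate a compared attribute in place, and remove() can fail to find the element because TreeSet navigates by comparator, not by reference. (The Entity class below is illustrative, not a YARN type; with this particular insertion order the mutated element is not the tree root, so the lookup walks the wrong way.)

```java
import java.util.Comparator;
import java.util.TreeSet;

// Demonstrates the TreeSet hazard: after an in-place mutation of a compared
// field, the set searches where the *new* value would sort, misses the node
// holding the element, and remove() returns false.
class TreeSetTrap {
  static class Entity {
    final String id;
    int usage; // compared attribute, mutable
    Entity(String id, int usage) { this.id = id; this.usage = usage; }
  }

  static boolean removeAfterInPlaceMutation() {
    TreeSet<Entity> set = new TreeSet<>(
        Comparator.comparingInt((Entity e) -> e.usage)
            .thenComparing((Entity e) -> e.id));
    Entity victim = new Entity("a", 1);
    set.add(victim);
    set.add(new Entity("b", 2));
    set.add(new Entity("c", 3));
    victim.usage = 9;          // mutated WITHOUT remove/reinsert
    return set.remove(victim); // lookup heads toward usage==9: not found
  }
}
```

The remove/update/reinsert pattern in reorderSchedulableEntity avoids exactly this, because the removal happens while the element still compares the old way.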
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523794#comment-14523794 ] Craig Welch commented on YARN-3153: --- I think {code} yarn.scheduler.capacity.maximum-am-capacity-per-queue {code} should just be {code} yarn.scheduler.capacity.maximum-am-capacity {code} I don't know that we need the inheritance part - I think that the default per cluster + override per queue is sufficient. I think there will be some complexity around handling while both are present - which takes precedence if both are there? Instead of just taking a setting with fallback to the default, I think we'll need to check whether either is specified, with the new setting taking priority, and if neither is specified then fall back to the default (which is effectively the same either way...) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio -- Key: YARN-3153 URL: https://issues.apache.org/jira/browse/YARN-3153 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical The existing Capacity Scheduler can limit the AM resources used within a queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, but it is actually used as a ratio; the implementation assumes the input will be \[0,1\]. So a user can currently specify it up to 100, which lets AMs use 100x the queue capacity. We should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
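The precedence rule discussed above (queue-specific setting first, then the cluster-wide setting, then a hard-coded default) could be sketched like this. The cluster-wide key is the one named in the issue; the per-queue key shape is an assumption, since its exact name was still under discussion:

```java
import java.util.Map;

// Illustrative config lookup, not CapacityScheduler code: a queue-level
// override takes priority, then the cluster-wide setting, then the default.
class AmLimitConfig {
  static final float DEFAULT_MAX_AM_PERCENT = 0.1f;
  static final String CLUSTER_KEY =
      "yarn.scheduler.capacity.maximum-am-resource-percent";

  static float maxAmResourcePercent(Map<String, String> conf, String queuePath) {
    // Hypothetical per-queue property name (an assumption for illustration).
    String perQueue = conf.get(
        "yarn.scheduler.capacity." + queuePath + ".maximum-am-resource-percent");
    if (perQueue != null) {
      return Float.parseFloat(perQueue);    // queue override wins
    }
    String clusterWide = conf.get(CLUSTER_KEY);
    if (clusterWide != null) {
      return Float.parseFloat(clusterWide); // cluster-wide setting next
    }
    return DEFAULT_MAX_AM_PERCENT;          // hard default last
  }
}
```

The returned value would then be validated/clamped into \[0,1\] to fix the ratio-vs-percentage bug the issue describes.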
[jira] [Resolved] (YARN-3188) yarn application --list should list all the applications ( Not only submitted,accepted and running)
[ https://issues.apache.org/jira/browse/YARN-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch resolved YARN-3188. --- Resolution: Won't Fix [~anushri] I'm going to go ahead and resolve this as the consensus appears to be that the behavior is best as it is. If you feel strongly otherwise feel free to reopen and explain. yarn application --list should list all the applications ( Not only submitted,accepted and running) --- Key: YARN-3188 URL: https://issues.apache.org/jira/browse/YARN-3188 Project: Hadoop YARN Issue Type: Bug Components: applications, client Reporter: Anushri Assignee: Anushri Priority: Minor By default yarn application --list should list all the applications since we are not giving -appstate option. Currently it is giving like following.. {noformat} [hdfs@host194 bin]$ ./yarn application -list 15/02/12 19:33:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 15/02/12 19:33:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1 Application-Id Application-NameApplication-Type User Queue State Final-State ProgressTracking-URL application_1422888408992_15010 grep-search MAPREDUCE hdfs defaultACCEPTED UNDEFINED 0% N/A [ {noformat} *Can somebody please assign this issue to me..?* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3121) FairScheduler preemption metrics
[ https://issues.apache.org/jira/browse/YARN-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523720#comment-14523720 ] Craig Welch commented on YARN-3121: --- Hi [~adhoot], does it make sense to share some common metrics infrastructure with the capacity scheduler on this? Some metrics along these lines were recently added in [YARN-3293]; perhaps the additional ones you have here can be added there as well, populated for the FairScheduler too, with some of the implementation shared? FairScheduler preemption metrics Key: YARN-3121 URL: https://issues.apache.org/jira/browse/YARN-3121 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: yARN-3121.prelim.patch, yARN-3121.prelim.patch Add FSQueueMetrics for preemption-related information -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2848) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit
[ https://issues.apache.org/jira/browse/YARN-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch resolved YARN-2848. --- Resolution: Fixed (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit -- Key: YARN-2848 URL: https://issues.apache.org/jira/browse/YARN-2848 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Likely solutions to [YARN-1680] (properly handling node and rack blacklisting with cluster-level node additions and removals) will entail managing an application-level slice of the cluster resource available to the application, for use in accurately calculating the application headroom and user limit. There is an assumption that events which impact this resource will occur less frequently than the need to calculate headroom, userlimit, etc (which is a valid assumption given that occurs per-allocation heartbeat). Given that, the application should (with assistance from cluster-level code...) detect changes to the composition of the cluster (node addition, removal) and, when those have occurred, calculate an application-specific cluster resource by comparing cluster nodes to its own blacklist (both rack and individual node). I think it makes sense to include nodelabel considerations in this calculation, as it will be efficient to do both at the same time, and the single resource value reflecting both constraints could then be used for efficient, frequent headroom and userlimit calculations while remaining highly accurate. The application would need to be made aware of nodelabel changes it is interested in (the application or removal of labels of interest to the application to/from nodes). 
For this purpose, the application submission's nodelabel expression would be used to determine the nodelabel impact on the resource used to calculate userlimit and headroom. (Cases where the application elected to request resources not using the application-level label expression are out of scope for this - but for the common use case of an application which uses a particular expression throughout, userlimit and headroom would be accurate.) This could also provide an overall mechanism for handling application-specific resource constraints which might be added in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2848) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit
[ https://issues.apache.org/jira/browse/YARN-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523629#comment-14523629 ] Craig Welch commented on YARN-2848: --- The ResourceUsage functionality added in [YARN-3356] [YARN-3099] and [YARN-3092] is effectively an implementation of the approach suggested here, was also used for [YARN-3463]. Given that, I'm going to close this one. While it's not yet been used to address the blacklist issue with headroom [YARN-1680], that should be handled there in any case. (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit -- Key: YARN-2848 URL: https://issues.apache.org/jira/browse/YARN-2848 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Likely solutions to [YARN-1680] (properly handling node and rack blacklisting with cluster level node additions and removals) will entail managing an application-level slice of the cluster resource available to the application for use in accurately calculating the application headroom and user limit. There is an assumption that events which impact this resource will occur less frequently than the need to calculate headroom, userlimit, etc (which is a valid assumption given that occurs per-allocation heartbeat). Given that, the application should (with assistance from cluster-level code...) detect changes to the composition of the cluster (node addition, removal) and when those have occurred, calculate an application specific cluster resource by comparing cluster nodes to it's own blacklist (both rack and individual node). I think it makes sense to include nodelabel considerations into this calculation as it will be efficient to do both at the same time and the single resource value reflecting both constraints could then be used for efficient frequent headroom and userlimit calculations while remaining highly accurate. 
The application would need to be made aware of nodelabel changes it is interested in (the application or removal of labels of interest to the application to/from nodes). For this purpose, the application submissions's nodelabel expression would be used to determine the nodelabel impact on the resource used to calculate userlimit and headroom (Cases where the application elected to request resources not using the application level label expression are out of scope for this - but for the common usecase of an application which uses a particular expression throughout, userlimit and headroom would be accurate) This could also provide an overall mechanism for handling application-specific resource constraints which might be added in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3257) FairScheduler: MaxAm may be set too low preventing apps from starting
[ https://issues.apache.org/jira/browse/YARN-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523935#comment-14523935 ] Craig Welch commented on YARN-3257: --- The implementation looks correct to me functionally, but I wonder if this should be something deferred to the policy? I realize on a practical level this might mean some duplicate code, but I wonder if, from a policy/contract perspective, it properly should be up to the policy to support or not support this exception to the rule logic? This might mean expanding the signature on the policy side to include the number of currently running apps as I don't know that the policy is otherwise aware of that. I don't feel strongly that this needs to be the approach, I just wanted to throw it out there for consideration. FairScheduler: MaxAm may be set too low preventing apps from starting - Key: YARN-3257 URL: https://issues.apache.org/jira/browse/YARN-3257 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3257.001.patch In YARN-2637 CapacityScheduler#LeafQueue does not enforce max am share if the limit prevents the first application from starting. This would be good to add to FSLeafQueue as well -- This message was sent by Atlassian JIRA (v6.3.4#6332)
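The exception-to-the-rule logic under discussion is essentially "enforce the max AM share unless the queue would otherwise run nothing at all". A minimal sketch, with illustrative names and memory-only accounting (the real check would also cover vcores, and whether this lives in the queue or the policy is exactly the open question above):

```java
// Illustrative sketch: the AM share limit never blocks the first application.
class AmShareCheck {
  static boolean canRunAm(int amMemoryMb, int usedAmMemoryMb,
                          int queueMaxMemoryMb, float maxAmShare,
                          int numRunningApps) {
    if (numRunningApps == 0) {
      return true; // the limit must not prevent the first app from starting
    }
    // otherwise enforce the share limit normally
    return usedAmMemoryMb + amMemoryMb <= queueMaxMemoryMb * maxAmShare;
  }
}
```

Pushing this into the policy, as the comment suggests, would indeed require the policy signature to carry numRunningApps (or equivalent), since that is the only input here the policy does not already have.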
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522133#comment-14522133 ] Craig Welch commented on YARN-1680: --- Hi [~airbots], any luck on this? Do you mind if I take it on again? availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Chen He Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A running job's reducer tasks occupied 29GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks got killed), and MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are now running in the cluster. MRAppMaster does not preempt the reducers because the headroom used for the reducer-preemption calculation still counts the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but returns an availableResource that still counts the whole cluster's free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509641#comment-14509641 ] Craig Welch commented on YARN-3319: --- Yes, it's configured in the capacity scheduler configuration with something like this:
{code}
<property>
  <name>(yarn-queue-prefix).ordering-policy.fair.enable-size-based-weight</name>
  <value>true</value>
</property>
{code}
Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.8.0 Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, YARN-3319.72.patch, YARN-3319.73.patch, YARN-3319.74.patch, YARN-3319.75.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight-style adjustment. Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2). In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
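The sizeBasedWeight adjustment quoted in the description divides an application's usage value by log2(1 + demand), so larger applications compare as if they were using proportionally less. A worked sketch (illustrative helper, not the actual FairOrderingPolicy code):

```java
// Sketch of the sizeBasedWeight formula from the issue description:
// adjusted usage = used / (log1p(demand) / log(2)) = used / log2(1 + demand).
class SizeBasedWeight {
  static double adjustedUsage(double usedMemory, double memoryDemand) {
    double weight = Math.log1p(memoryDemand) / Math.log(2); // log2(1 + demand)
    return usedMemory / weight;
  }
}
```

For example, with a demand of 1023 MB the divisor is log2(1024) = 10, so an application using 100 MB compares as if it were using only 10, offsetting the policy's natural preference for small applications.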
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507701#comment-14507701 ] Craig Welch commented on YARN-3319: --- bq. Some minor comments about configuration part by index: 1) done 2) done 3) done - see below bq. Do you think is it better to make property in queue-name.ordering-policy.policy-name.property-key?... Now that there is no proper composition, only one policy can be active at a time, and it shouldn't be necessary to namespace config items this way. At the same time, I could see us getting back to proper composition at some point, where this would be helpful. I've implemented it as a prefix convention in the policy instead of constraining the contents of the map in the capacity scheduler configuration. This is because we still support passing a class name as the policy type, which would make the configurations for class-name-based items unwieldy. It would also allow us to have shared configuration items between policies if we do end up with proper composition again. The end result of the configuration was as you suggested. 4) done 5) done bq. FairOrderingPolicy: all 3 done bq. Findbugs warning? I failed to stage the change, so it didn't make it into the patch; it should be there now. Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, YARN-3319.72.patch, YARN-3319.73.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. 
The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.74.patch Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, YARN-3319.72.patch, YARN-3319.73.patch, YARN-3319.74.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507917#comment-14507917 ] Craig Welch commented on YARN-3319: --- The failed tests pass on my box with the patch, unrelated. The checkstyle is referring to ResourceLimits, which the patch doesn't change... poking around in the build artifacts there are some exceptions in some of the checkstyle stuff, I'm not sure it's actually working correctly Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, YARN-3319.72.patch, YARN-3319.73.patch, YARN-3319.74.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.75.patch trying something wrt checkstyle
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.73.patch another findbugs
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.72.patch Fix a findbugs
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.70.patch Up to date with the same patch index on [YARN-3463]; still one todo, which is to add the fair - FairPolicyClassName translation in the scheduler configuration. Will finalize after the [YARN-3463] commit.
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.71.patch With fair config support and sizeBasedWeight config support.
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3463: -- Attachment: YARN-3463.70.patch Ok, getInfo it is, attached Integrate OrderingPolicy Framework with CapacityScheduler - Key: YARN-3463 URL: https://issues.apache.org/jira/browse/YARN-3463 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3463.50.patch, YARN-3463.61.patch, YARN-3463.64.patch, YARN-3463.65.patch, YARN-3463.66.patch, YARN-3463.67.patch, YARN-3463.68.patch, YARN-3463.69.patch, YARN-3463.70.patch Integrate the OrderingPolicy Framework with the CapacityScheduler
[jira] [Created] (YARN-3510) Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness
Craig Welch created YARN-3510: - Summary: Create an extension of ProportionalCapacityPreemptionPolicy which preempts a number of containers from each application in a way which respects fairness Key: YARN-3510 URL: https://issues.apache.org/jira/browse/YARN-3510 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Craig Welch Assignee: Craig Welch The ProportionalCapacityPreemptionPolicy preempts as many containers from applications as it can during its preemption run. For FIFO this makes sense, as it is preempting in reverse order and therefore maintaining the primacy of the oldest. For fair ordering this does not have the desired effect; instead, it should preempt a number of containers from each application which maintains a fair balance (or close to a fair balance) between them.
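One way to read the fairness-respecting preemption described above is: rather than draining applications in reverse order, repeatedly take a container from whichever application currently holds the most, until the target is reached. This is a hypothetical sketch only (the class and method names are not from YARN-3510, which may solve it differently):

```java
import java.util.Arrays;

// Hypothetical sketch of fairness-respecting preemption: take containers
// from the applications furthest above the rest, keeping them close to
// a fair balance, instead of fully draining the newest apps first.
public class FairPreemptionSketch {

    // used[i] = containers currently held by app i; toPreempt = total to reclaim.
    // Returns how many containers to preempt from each app.
    static int[] preemptCounts(int[] used, int toPreempt) {
        int[] take = new int[used.length];
        int[] remaining = used.clone();
        for (int i = 0; i < toPreempt; i++) {
            // Find the app currently holding the most containers.
            int max = 0;
            for (int j = 1; j < remaining.length; j++) {
                if (remaining[j] > remaining[max]) max = j;
            }
            if (remaining[max] == 0) break; // nothing left to preempt
            remaining[max]--;
            take[max]++;
        }
        return take;
    }

    public static void main(String[] args) {
        // Apps holding 10, 6, 2 containers; reclaim 6 -> apps end at 5, 5, 2.
        System.out.println(Arrays.toString(preemptCounts(new int[]{10, 6, 2}, 6))); // prints [5, 1, 0]
    }
}
```

A real policy would work in Resource terms rather than container counts and would respect weights and minimum shares, but the shape of the selection loop is the same.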
[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502207#comment-14502207 ] Craig Welch commented on YARN-3463: --- bq. ... I think we can just initialize this.comparator and this.schedulableEntities inside FifoOrderingPolicy constructor and remove the setComparator method Done bq. this should be inside the {removed} ... Done bq. getStatusMessage - getInfo ? Originally, it was getInfo - https://issues.apache.org/jira/browse/YARN-3318?focusedCommentId=14494396&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14494396 I have to say I prefer getInfo to getStatusMessage myself, as getStatusMessage suggests to me a transient nature which may change (metrics, etc.), whereas this is information about policy type and configuration which is effectively static; it is just generic info and isn't particularly transient. If you feel strongly that it should be getInfo, let me know and I'll change it back.
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3463: -- Attachment: YARN-3463.69.patch
[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499204#comment-14499204 ] Craig Welch commented on YARN-3463: --- bq. how about change it to Map&lt;String, String&gt; to explicitly pass option_key=value pairs to configure OrderingPolicy Signature changed; will add configuration to pass in sizeBasedWeight as part of the FairOrderingPolicy patch, as that's where it would belong... bq. you can suppress them to avoid javac warning Suppressed.
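The configure-by-Map signature discussed here can be sketched roughly as below; the class name and the configuration key are hypothetical illustrations, not the committed YARN API:

```java
import java.util.Map;

// Hypothetical sketch of configuring an ordering policy via option_key=value
// pairs, as discussed in the review; the key name is made up for illustration.
public class OrderingPolicyConfigSketch {

    private boolean sizeBasedWeight = false; // default false, per the JIRA description

    // Receives option_key=value pairs, e.g. parsed from the scheduler configuration.
    public void configure(Map<String, String> conf) {
        String v = conf.get("ordering-policy.size-based-weight");
        if (v != null) {
            sizeBasedWeight = Boolean.parseBoolean(v);
        }
    }

    public boolean isSizeBasedWeight() {
        return sizeBasedWeight;
    }

    public static void main(String[] args) {
        OrderingPolicyConfigSketch policy = new OrderingPolicyConfigSketch();
        policy.configure(Map.of("ordering-policy.size-based-weight", "true"));
        System.out.println(policy.isSizeBasedWeight()); // prints true
    }
}
```

The point of the Map signature is that behavior-modifying options like sizeBasedWeight travel with the policy selection rather than requiring new top-level scheduler config keys.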
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3463: -- Attachment: YARN-3463.68.patch
[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499205#comment-14499205 ] Craig Welch commented on YARN-3463: --- btw, the tests pass on my box with the change; the failures are not related to the patch.
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3463: -- Attachment: YARN-3463.67.patch rm unneeded capached change
[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498492#comment-14498492 ] Craig Welch commented on YARN-3463: --- bq. should we make pendingApplications to use customized comparator No, applications still start as they did before; there is no reason to change it. bq. can we make it not use generic type now for simpler No, this comes from the interface definition, and it's needed to enable another scheduler, say the FairScheduler, to use the same code with its derived application types of choice. bq. I think we may carefully add ORDERING_POLICY_CONFIG, since this will be a public config. I understand the reason to add the policy_config is to support policy=fair, config=fair+fifo use case So this configuration is not for defining the policy (that is ORDERING_POLICY, which is where you would have fair, fifo, etc.); this is for configuration elements which may modify the behavior of the policy, such as sizeBasedWeight, and it's needed for that purpose. bq. Why change disposableLeafQueue.getApplications().size() I think this is left over from earlier changes and is no longer needed; will remove. bq. Suppress generic warning? Javac warning? I'm actually not getting any there; I think because this is used for mocking.
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3463: -- Attachment: YARN-3463.64.patch rebased to current trunk
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3463: -- Attachment: YARN-3463.65.patch Suppress orderingPolicy from appearing in web service responses; it is still shown on the web UI.
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3463: -- Attachment: YARN-3463.66.patch Fix build warnings; the tests all pass on my box.
[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494924#comment-14494924 ] Craig Welch commented on YARN-3463: --- I did, but it has the right stuff in it :-) I'll fix it in a minute. I looked, and it appears that the REST API will show the ordering policy; the UI and the REST API are joined this way. I'll see if it can be suppressed in a reasonable way. If it can't, though, I think we need to leave it in - it's simply not acceptable to leave users blind/unable to determine what policy is in effect.
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3463: -- Attachment: YARN-3643.61.patch Get the patch name right.
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3463: -- Attachment: (was: YARN-3319.61.patch)
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3463: -- Attachment: (was: YARN-3643.58.patch)
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3463: -- Attachment: YARN-3463.61.patch Again with the patch name