[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644208#comment-13644208 ] Andrew Ferguson commented on YARN-326: -- [~sandyr] bingo. that was exactly the concern I alluded to before. glad we found it while thinking about the design. :-) [~kkambatl] yup, that's the idea -- fractional min-share, which would be interpreted as a fraction of the dominant resource (which wouldn't be pre-specified, so the queue's dominant resource could adapt based on the jobs submitted) ... I wrote my example a bit quickly, sorry! let me know if something's still not clear. the new plan sounds like a good approach. I like it. Add multi-resource scheduling to the fair scheduler --- Key: YARN-326 URL: https://issues.apache.org/jira/browse/YARN-326 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: FairSchedulerDRFDesignDoc.pdf, YARN-326.patch, YARN-326.patch With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644209#comment-13644209 ] Andrew Ferguson commented on YARN-326: -- ps -- I forgot to include a pointer to the newest paper in the DRF line of work: http://www.cs.berkeley.edu/~matei/papers/2013/eurosys_choosy.pdf
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643070#comment-13643070 ] Andrew Ferguson commented on YARN-326: -- hey Sandy, sure, I certainly see the appeal of the absolute values approach -- like I said, it's a design tradeoff. however, one point of DRF is that we can sensibly consider fractions of multidimensional resource vectors, since the fraction is defined as the fraction of the cluster consumed by the most dominant resource. having single-dimensional fractions like this is nice because we can then a) weight them, and b) calculate max-min fairness as in the one-dimensional (e.g., memory-only) case. consider the history and geology departments you introduced above. let's say our policy is that each queue gets equal weight (since the departments went in on the purchase of the cluster 50/50), and that each queue should be guaranteed a minimum of 1/4 of the cluster (so that a queue fresh with jobs ramps up to 1/4 of the cluster quickly). in your proposal, since the departments have different shaped demands (one for high-memory, the other for high-CPU), we would configure their minimum share vectors based on these different shaped demands. this would work fine as long as the departments continued to submit resource requests which had these same, pre-configured shapes. however, if we establish the minimums using fractions, then the departments can easily change between different shaped jobs, and still have the minimums work out for them sensibly. does this make sense? let's be concrete: 10 nodes, each with 8 CPUs and 64 GB of RAM. say history usually submits jobs for (1 CPU, 16 GB) and geology for (2 CPU, 8 GB). with your proposal, we might define history's minimum allocation to be (10 CPU, 160 GB) (1/4 of the dominant resource) and geology's to be (20 CPU, 80 GB) (again, 1/4 of the dominant resource). if either department changed the shape of their requests, they wouldn't get full use of their minimum. so, what if we listed the minimums as simply 1/4 * cluster size, but not considering DRF? i.e., giving (20 CPU and 160 GB) as the minimum allocation to each? well, if the departments continued to submit the different shaped jobs (1 CPU, 16 GB) and (2 CPU, 8 GB), the design described would continue to see the queues as being below their minimum allocation, even after the bottleneck resource fully consumed its amount of the minimum allocation. in the extreme case, I highly suspect a job could get *more* than its DRF-based fair share, simply by having one of its non-dominant resources remain below the amount listed in its minimum share. (can you see this? if not, I'll work out an example) the beauty of the fractions approach, in my mind, is that it will apply no matter which resource is the bottleneck resource. hope this example is clear. sorry I haven't had time to look at your code -- this is just based on my reading of your design doc. perhaps all is well and good in the code itself. :-) cheers, Andrew
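The arithmetic in the example above is easy to check mechanically. Here is a minimal sketch (illustrative Python, not from any attached patch) that derives the (10 CPU, 160 GB) and (20 CPU, 80 GB) minimum-share vectors from a 1/4 fractional min-share:

```python
# 10 nodes x (8 CPU, 64 GB), as in the example above.
CLUSTER = {"cpu": 80, "mem_gb": 640}

def dominant_share(used, cluster):
    """Fraction of the cluster consumed by the most-used resource (DRF)."""
    return max(used[r] / cluster[r] for r in cluster)

def vector_for_min_share(job_shape, fraction, cluster):
    """Smallest multiple of job_shape whose dominant share reaches `fraction`."""
    per_job = dominant_share(job_shape, cluster)
    n_jobs = fraction / per_job
    return {r: job_shape[r] * n_jobs for r in job_shape}

history_job = {"cpu": 1, "mem_gb": 16}   # memory-heavy
geology_job = {"cpu": 2, "mem_gb": 8}    # cpu-heavy

# A 1/4 fractional min-share translates into a different vector per queue:
print(vector_for_min_share(history_job, 0.25, CLUSTER))  # {'cpu': 10.0, 'mem_gb': 160.0}
print(vector_for_min_share(geology_job, 0.25, CLUSTER))  # {'cpu': 20.0, 'mem_gb': 80.0}
```

Note that both vectors have the same dominant share (0.25), which is exactly why the fractional formulation keeps working when a queue's job shapes change.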
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642286#comment-13642286 ] Andrew Ferguson commented on YARN-326: -- hi Sandy, I'm wondering if you want minimum and maximum shares to actually be fractions of the cluster, rather than resource vectors? that would fit more with the fairness aspect of the FairScheduler, but it's completely a design decision. for example, what happens if the sum of the minimum shares for each queue exceeds the size of the cluster? (or the size of the cluster during a failure?) or, if my queue has been given a minimum share of (2 CPU, 240 GB RAM) -- because I was originally using tasks with high memory -- what happens if I decide to switch to using tasks with high CPU and low memory? I think a minimum share of 1/8 might make more sense, since it would allow the queue's users to request the resources as they see fit. anyway, just a thought. cheers, Andrew
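One possible answer to the oversubscription question above (illustrative only; not necessarily what the FairScheduler would do) is that fractional min shares can simply be rescaled when they sum past the whole cluster, e.g. after node failures:

```python
# Hypothetical policy sketch: if the configured fractional min shares sum
# past 1.0, scale them down proportionally so they remain satisfiable.
def settle_min_shares(fractions):
    total = sum(fractions.values())
    if total <= 1.0:
        return dict(fractions)
    return {q: f / total for q, f in fractions.items()}

# three queues configured at 1/2 each: oversubscribed, so each settles at ~1/3
print(settle_min_shares({"history": 0.5, "geology": 0.5, "physics": 0.5}))
```

Doing the same with per-queue resource *vectors* is messier, because each queue's vector has a different shape and there is no single scalar to renormalize by.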
[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574187#comment-13574187 ] Andrew Ferguson commented on YARN-3: [~acmurthy] thanks for the merge Arun! Add support for CPU isolation/monitoring of containers -- Key: YARN-3 URL: https://issues.apache.org/jira/browse/YARN-3 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: mapreduce-4334-design-doc.txt, mapreduce-4334-design-doc-v2.txt, MAPREDUCE-4334-executor-v1.patch, MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre2-with_cpu.patch, MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-pre3-with_cpu.patch, MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, YARN-3-lce_only-v1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535111#comment-13535111 ] Andrew Ferguson commented on YARN-3: [~vinodkv] you bet! I will fix these today. thanks, Andrew
[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ferguson updated YARN-147: - Attachment: YARN-147-v8.patch Add support for CPU isolation/monitoring of containers -- Key: YARN-147 URL: https://issues.apache.org/jira/browse/YARN-147 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, YARN-147-v4.patch, YARN-147-v5.patch, YARN-147-v6.patch, YARN-147-v8.patch, YARN-3.patch This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not show the SUBMIT PATCH button. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508411#comment-13508411 ] Andrew Ferguson commented on YARN-3: hi everyone, sorry for the delay on this patch -- the east coast hurricane and other events set me behind schedule. I have attached a new version of this work to YARN-147 (v8); it is based on the latest version of trunk. as always, you can see my github tree for exact changes: https://github.com/adferguson/hadoop-common/ this patch has been tested (and confirmed to work) as follows: - default executor, no cgroups - Linux executor, no cgroups - Linux executor, with cgroups - Linux executor, mount cgroups automatically - Linux executor, cgroups already mounted and asked to mount - error condition: cgroups already mounted and cannot write to cgroup - error condition: asked to mount cgroups, but cannot mount. both error conditions result in the NodeManager halting, as we have discussed above. [~bikassaha], to answer your first question: mountCgroups is a function in LinuxContainerExecutor because that class is simply a Java wrapper for the functions provided by the LCE. [~bikassaha], to answer your second question: if we use cgroups to limit CPU and there is only one container running on the machine, the current design will allow the container to access all of the CPU resources until other tasks start running (a work-conserving design). this design uses the CPU weights feature of cgroups, rather than the CPU bandwidth feature (or the entirely separate cpusets controller), which would limit the bandwidth (a non-work-conserving design). thank you, Andrew
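The work-conserving vs. non-work-conserving distinction above can be modeled with two small functions (an illustrative Python sketch, not the patch's code, which configures the kernel's cgroup cpu controller rather than computing allocations itself):

```python
def weights_allocation(weights, runnable, total_cpus):
    """cpu.shares-style weights: runnable containers split the machine in
    proportion to their weights, so a lone container gets everything
    (work-conserving)."""
    total_w = sum(weights[c] for c in runnable)
    return {c: total_cpus * weights[c] / total_w for c in runnable}

def bandwidth_limit(quota_us, period_us):
    """CFS bandwidth-style quota: a hard cap in CPUs, enforced even when
    the rest of the machine is idle (non-work-conserving)."""
    return quota_us / period_us

weights = {"a": 1024, "b": 1024}

print(weights_allocation(weights, {"a"}, 8))       # only "a" runs: it gets all 8 CPUs
print(weights_allocation(weights, {"a", "b"}, 8))  # both run: 4.0 CPUs each
print(bandwidth_limit(200_000, 100_000))           # hard cap at 2.0 CPUs, idle or not
```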
[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483266#comment-13483266 ] Andrew Ferguson commented on YARN-3: (replying to comments on YARN-147 here instead, as per [~acmurthy]'s request) thanks for catching that bug [~sseth]! I've updated my git repo [1], and will post a new patch after addressing the review from [~vinodkone]. I successfully tested it quite a bit with and without cgroups back in the summer, but it seems the patch has shifted enough since the testing that I should do it again. [1] https://github.com/adferguson/hadoop-common/commits/adf-yarn-147
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483268#comment-13483268 ] Andrew Ferguson commented on YARN-147: -- hi [~acmurthy], I've started posting replies on YARN-3 instead. the LCE bug is fixed and I'll post a new patch after addressing [~vinodkv]'s comments. thanks!
[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483412#comment-13483412 ] Andrew Ferguson commented on YARN-3: thanks for the review [~vinodkv]. I'll post an updated patch on YARN-147. there's a lot of food for thought here (design questions), so here are some comments: bq. yarn.nodemanager.linux-container-executor.cgroups.mount has different defaults in code and in yarn-default.xml yeah -- personally, I think the default should be false, since it's not clear what a sensible default mount path is. I had changed the line in the code in response to Tucu's comment [1], but I'm changing it back to false since true doesn't seem sensible to me. if anyone in the community has a sensible default mount path, then we can surely change the default to true in both the code and yarn-default.xml :-/ bq. Can you explain this? Is this sleep necessary. Depending on its importance, we'll need to fix the following Id check, AMs don't always have ID equaling one. the sleep is necessary, as sometimes the LCE reports that the container has exited even though the AM process has not terminated. hence, because the process is still running, we can't remove the cgroup yet; therefore, the code sleeps briefly. since the AM doesn't always have the ID of 1, what do you suggest I do to determine whether the container has the AM or not? if there isn't a good rule, the code can just always sleep before removing the cgroup. bq. container-executor.c: If a mount-point is already mounted, mount gives a EBUSY error, mount_cgroup() will need to be fixed to support remounts (for e.g. on NM restarts). We could unmount cgroup fs on shutdown but that isn't always guaranteed. great catch! thanks! I've made this non-fatal. now, the NM will attempt to re-mount the cgroup, will print a message that it can't do that because it's mounted, and everything will work, because it will simply work as in the case where the cluster admin has already mounted the cgroups. bq. Not sure of the benefit of configurable yarn.nodemanager.linux-container-executor.cgroups.mount-path. Couldn't NM just always mount to a path that it creates and owns? Similar comment for the hierarchy-prefix. for the hierarchy-prefix, this needs to be configurable since, in the scenario where the admin creates the cgroups in advance, the NM doesn't have privileges to create its own hierarchy. for the mount-path, this is a good question. Linux distributions mount the cgroup controllers in various locations, so I thought it was better to keep it configurable, since I figured it would be confusing if the OS had already mounted some of the cgroup controllers on /cgroup/ or /sys/fs/cgroup/, and then the NM started mounting additional controllers in /path/nm/owns/cgroup/. bq. CgroupsLCEResourcesHandler is swallowing exceptions and errors in multiple places - updateCgroup() and createCgroup(). In the later, if cgroups are enabled, and we can't create the file, it is a critical error? I'm fine either way. what would people prefer to see? is it better to launch a container even if we can't enforce the limits? or is it better to prevent the container from launching? happy to make the necessary quick change. bq. Make ResourcesHandler top level. I'd like to merge the ContainersMonitor functionality with this so as to monitor/enforce memory limits also. ContainersMonitor is top-level, we should make ResourcesHandler also top-level so that other platforms don't need to create this type-hierarchy all over again when they wish to implement some or all of this functionality. if I'm reading this correctly, yes, that is what I first wanted to do when I started this patch (see discussions at the top of this YARN-3 thread, the early patches for MAPREDUCE-4334, and the current YARN-4). however, it seems we have decided to go another way. thank you, Andrew [1] https://issues.apache.org/jira/browse/YARN-147?focusedCommentId=13470926&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13470926
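The non-fatal remount behavior described above amounts to treating EBUSY as "already mounted, proceed as if the admin had mounted it". A sketch of that control flow (illustrative Python; the real logic lives in container-executor.c, and the mount call here is injected so it is testable without root):

```python
import errno

def mount_cgroup_nonfatal(do_mount, controller, path):
    """Attempt to mount a cgroup controller; tolerate an existing mount."""
    try:
        do_mount(controller, path)
        return "mounted"
    except OSError as e:
        if e.errno == errno.EBUSY:
            # Already mounted (e.g. after an NM restart, or pre-mounted by
            # the admin): log and continue, exactly as in the configuration
            # where the cluster admin mounted the cgroups in advance.
            return "already-mounted"
        raise  # any other mount failure remains fatal

def busy_mount(controller, path):
    # stand-in for mount(2) returning EBUSY on an existing mount point
    raise OSError(errno.EBUSY, "mount point busy")

print(mount_cgroup_nonfatal(busy_mount, "cpu", "/sys/fs/cgroup/cpu"))
```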
[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ferguson updated YARN-147: - Attachment: YARN-147-v6.patch updated as per reviews on comments here and on YARN-3.
[jira] [Commented] (YARN-2) Enhance CS to schedule accounting for both memory and cpu cores
[ https://issues.apache.org/jira/browse/YARN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481081#comment-13481081 ] Andrew Ferguson commented on YARN-2: hi Arun, this patch is looking GREAT! in particular, the ResourceCalculator class is super useful -- I really like it. :-) my version, without it, is definitely much harder to follow... before some specific feedback, I want to say that I agree that cores should be floats/fractional-units, for three reasons: # they make sense for long-running services, which may require little CPU, but should be available on each node, with the ease of having been scheduled by YARN. # this gives us a fine-grained knob for implementing dynamic re-adjustment one day; i.e., I may want to increase an executing job's weight by 10%, or decrease it by 15%, etc. # the publicly released traces of resource requests and usage in Google's cluster (to my knowledge, the only traces of their kind) include fractional amounts for CPU; having fractional CPU requests in YARN may make it easier to translate insights from that dataset into making better resource requests in a YARN cluster. ok, here are some specific comments on the patch: * *YarnConfiguration.java*: duplicate import of {{com.google.common.base.Joiner}} * *DefaultContainer.java*: {{divideAndCeil}} explicitly uses the two-argument form of {{createResource}} to create a resource with 0 cores, whereas other Resources created in this calculator create resources with 1 core. this seems counter-intuitive to me, as {{divideAndCeil}} tends to result in an _overestimate_ of resource consumption, rather than an _underestimate_. either way, perhaps a comment would be helpful, as it is the only time this method is used this way in the memory-only comparator * *MultiResourceCalculator.java*: in {{compare()}}, you are looking to order the resources by how dominant they are, and then compare by most-dominant resource, second most-dominant, etc. ... I think the boolean flag to {{getResourceAsValue()}} doesn't make this clear. with the flag, the question in my mind would be "wait, why would I want the non-dominant resource?". simply having a boolean flag also makes extending to three or more resources less clean. I implemented this by treating each resource request as a vector, normalizing by clusterResources, and then sorting the components by dominance. * *MultiResourceCalculator.java*, *DefaultCalculator.java*, *Resources.java*: for the {{multiplyAndNormalizeUp}} and {{multiplyAndNormalizeDown}} methods, consider renaming the third argument to "stepping" instead of "factor", as it's not a factor used for the multiplication; rather, it's a unit of discretization to round to ("stepping" may not be the best word, but perhaps it's closer). just a thought... * *CSQueueUtils.java*: extra spaces in front of {{@Lock(CSQueue.class)}} * *CapacityScheduler.java*: in the {{allocate()}} method, there's a call to normalize the request (after a comment about sanity checks). currently, it only normalizes the memory; I think the patch should also normalize the number of CPUs requested, no? * *LeafQueue.java*: in {{assignReservedContainer}}, consider changing {{Resources.divide}} to {{Resources.ratio}} when calculating {{potentialNewCapacity}} (and the current capacity). While both calls should give the same result, {{ratio}} has fewer floating-point operations, and, better yet, is semantically what is meant in this case -- we're calculating the ratio between (used + requested) and available. Frankly, this is perhaps something to take a closer look at (as [~vinodkv] pointed out): whether both {{divide}} and {{ratio}} are needed, and if so, which should be used in each case. Also, both *ContainerTokenIdentifier.java* and *BuilderUtils.java* assume that memory is the only resource; I'm not certain they should be updated, but I wanted to mention them just in case. Oh, and should *yarn-default.xml* be updated with values for {{yarn.scheduler.minimum-allocation-cores}} and {{yarn.scheduler.maximum-allocation-cores}} ? Hope this helps, Arun! depending on how the discussion of integral vs fractional cores shakes out, I think this patch is good to go. cheers, Andrew
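The alternative comparison Andrew describes for {{compare()}} -- normalize each usage vector by the cluster, sort the components by dominance, and compare lexicographically -- can be sketched as follows (illustrative Python, not the Java from either patch; the three-resource cluster is hypothetical):

```python
def dominance_key(used, cluster):
    """Normalized per-resource shares, most dominant first. Comparing two
    such keys lexicographically compares by dominant share, then by the
    second most-dominant, etc. -- and extends cleanly past two resources."""
    return sorted((used[r] / cluster[r] for r in cluster), reverse=True)

cluster = {"cpu": 100, "mem_gb": 1000, "disk": 50}
a = {"cpu": 30, "mem_gb": 100, "disk": 5}   # dominant: cpu, at 0.30
b = {"cpu": 10, "mem_gb": 250, "disk": 5}   # dominant: memory, at 0.25

print(dominance_key(a, cluster))  # [0.3, 0.1, 0.1]
print(dominance_key(b, cluster))  # [0.25, 0.1, 0.1]
# "a" has the larger dominant share, so DRF would allocate to "b" next.
print(dominance_key(a, cluster) > dominance_key(b, cluster))  # True
```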
[jira] [Commented] (YARN-2) Enhance CS to schedule accounting for both memory and cpu cores
[ https://issues.apache.org/jira/browse/YARN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481082#comment-13481082 ] Andrew Ferguson commented on YARN-2: oops, quick typo fix: by *DefaultContainer.java*, I meant *DefaultCalculator.java* .. thanks! Enhance CS to schedule accounting for both memory and cpu cores --- Key: YARN-2 URL: https://issues.apache.org/jira/browse/YARN-2 Project: Hadoop YARN Issue Type: New Feature Components: capacityscheduler, scheduler Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 2.0.3-alpha Attachments: MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, YARN-2-help.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ferguson updated YARN-147: - Attachment: YARN-147-v3.patch update native code per review by Colin
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479387#comment-13479387 ] Andrew Ferguson commented on YARN-147: -- hi Colin, thanks for looking at the native code. since the changes were pretty extensive, would you mind taking a careful look again? if it's easier for you, the incremental changes can be seen here: https://github.com/adferguson/hadoop-common/commits/adf-yarn-147 I hope I've faithfully implemented the new key-value API you suggested -- let me know if that's not the case. If the mount fails, I let the exception bubble all the way up to stop the NodeManager, as Tucu suggested before about a different error. The one thing I did not do is change the open / write / close methods to fopen / fprintf / fclose, as the rest of the native code does not use those methods. Which would you prefer to see: adjust my patch to use fopen, etc., or fix my use of open, etc.? Yes, I totally agree that it would be better if main.c used getopt_long; it definitely smells like another JIRA to me. :-) thanks! Andrew
[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ferguson updated YARN-147: - Attachment: YARN-147-v4.patch small fix in two places: don't log and re-throw the same exception -- construct new exceptions with better context, and use the previous one as the cause. thanks Tucu for pointing this out!
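The fix described above can be sketched as follows. This is a minimal illustration of the wrap-with-cause pattern, not code from the actual patch; the class and method names (MountExample, mountCgroupController, initializeController) are hypothetical:

```java
import java.io.IOException;

public class MountExample {
    // Hypothetical low-level mount step that fails.
    static void mountCgroupController(String controller) throws IOException {
        throw new IOException("mount failed: " + controller);
    }

    // Instead of logging and re-throwing the same exception, wrap it in a
    // new one that adds context and carries the original as the cause, so
    // the full chain bubbles up to whatever stops the NodeManager.
    static void initializeController(String controller) throws IOException {
        try {
            mountCgroupController(controller);
        } catch (IOException e) {
            throw new IOException(
                "Failed to initialize cgroup controller '" + controller + "'", e);
        }
    }
}
```

The caller's stack trace then shows both the higher-level context and the underlying error, which is what distinguishes this from a plain log-and-rethrow.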
[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ferguson updated YARN-147: - Attachment: YARN-147-v2.patch thanks for the additional comments, Tucu! I've updated the patch as per your review. hopefully I have done everything correctly. btw, I have this patch in github: https://github.com/adferguson/hadoop-common/tree/adf-yarn-147 you can see the changes for this patch in the most recent commit. thanks! Andrew
[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ferguson updated YARN-147: - Attachment: YARN-147-v1.patch updated patch as per Tucu's review
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475909#comment-13475909 ] Andrew Ferguson commented on YARN-147: -- hi Tucu, thanks very much for opening this new jira and reviewing the patch. I've uploaded a new version which addresses most of your comments. answers to the questions in your review: .bq cgroupMountPath, if there is no default we should fail if not set, can't we have a sensible default? I've added a check to fail if not set. as far as I can tell, there isn't a single default path for cgroups -- some distributions use /sys/fs/cgroup, some use /cgroup, others, /cgroups. I've even seen /mnt/cgroup (Debian perhaps?); these also vary across releases of the same distro. :-( .bq default value for cgroupPrefix has '/', here will produce a '//' in the path yes, I made that choice deliberately. I wanted to convey that cgroupPrefix can be a path (which is why I kept the '/') and when I use it, I also added a '/' in case the user did not put a '/' at the right place in the prefix. my understanding is that on Unix, '//' in a path is interpreted as '/', no? .bq If the filereader cannot be open/read, is this acceptable or should stop execution by throwing exception? eh, we could go either way here, but I think it's reasonable to not throw the exception. if the file can't be read, then the map from cgroup controller to path isn't built, and we already have existing checks which skip controllers which can't be found in the path (say, if the file can be read correctly, but the CPU controller isn't mounted). ok, great. I'm going to mark this as patch available and see if the findbugs warning has gone away (I can't seem to get it to run locally). thanks!! 
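The controller-to-path map discussed above can be sketched roughly as follows. This is a simplified illustration of parsing /proc/mounts-style text, not the code in the patch; the class name CgroupMountParser is hypothetical, and for brevity every mount option (including non-controller options like rw or relatime) lands in the map, which is harmless since lookups only use controller names:

```java
import java.util.HashMap;
import java.util.Map;

public class CgroupMountParser {
    // Each /proc/mounts line has the form:
    //   device mountpoint fstype options dump pass
    // For lines whose fstype is "cgroup", map each mount option (which, for
    // cgroup mounts, includes the controller names: cpu, memory, cpuacct, ...)
    // to the mount point. If this parsing ever yields nothing, downstream
    // checks simply skip controllers that were never found.
    static Map<String, String> parse(String mountsText) {
        Map<String, String> controllerPaths = new HashMap<>();
        for (String line : mountsText.split("\n")) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 4 || !fields[2].equals("cgroup")) {
                continue;
            }
            for (String option : fields[3].split(",")) {
                controllerPaths.put(option, fields[1]);
            }
        }
        return controllerPaths;
    }
}
```

This also shows why a hard-coded default mount path is awkward: the mount point for the cpu controller is whatever the distribution chose (/sys/fs/cgroup/cpu, /cgroup/cpu, ...), so discovering it from the mounts table, or requiring explicit configuration, is the safer route.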
Andrew