[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler

2013-04-28 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644208#comment-13644208
 ] 

Andrew Ferguson commented on YARN-326:
--

[~sandyr] bingo. that was exactly the concern I alluded to before. glad we 
found it while thinking about the design. :-)

[~kkambatl] yup, that's the idea -- fractional min-share, which would be 
interpreted as a fraction of the dominant resource (which wouldn't be 
pre-specified, so the queue's dominant resource could adapt based on the jobs 
submitted) ... I wrote my example a bit quickly, sorry! let me know if 
something's still not clear.

the new plan sounds like a good approach. I like it.



 Add multi-resource scheduling to the fair scheduler
 ---

 Key: YARN-326
 URL: https://issues.apache.org/jira/browse/YARN-326
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: FairSchedulerDRFDesignDoc.pdf, YARN-326.patch, 
 YARN-326.patch


 With YARN-2 in, the capacity scheduler has the ability to schedule based on 
 multiple resources, using dominant resource fairness.  The fair scheduler 
 should be able to do multiple resource scheduling as well, also using 
 dominant resource fairness.
 More details to come on how the corner cases with fair scheduler configs such 
 as min and max resources will be handled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler

2013-04-28 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644209#comment-13644209
 ] 

Andrew Ferguson commented on YARN-326:
--

ps -- I forgot to include a pointer to the newest paper in the DRF line of 
work: http://www.cs.berkeley.edu/~matei/papers/2013/eurosys_choosy.pdf



[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler

2013-04-26 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643070#comment-13643070
 ] 

Andrew Ferguson commented on YARN-326:
--

hey Sandy,

sure, I certainly see the appeal of the absolute values approach -- like I 
said, it's a design tradeoff.

however, one point of DRF is that we can sensibly consider fractions of 
multidimensional resource vectors, since the fraction is defined as the 
fraction of the cluster consumed by the most dominant resource. having 
single-dimensional fractions like this is nice because we can then a) weight 
them, and b) calculate max-min fairness as in the one-dimensional (eg, memory) 
case.

consider the history and geology departments you introduced above. let's say 
our policy is that each queue gets equal weight (since the departments went 
in on the purchase of the cluster 50/50), and that each queue should be 
guaranteed a minimum of 1/4 of the cluster (so that a queue fresh with jobs 
ramps up to 1/4 of the cluster quickly).

in your proposal, since the departments have different shaped demands (one for 
high-memory, the other for high-cpu), we would configure their minimum share 
vectors based on these different shaped demands. this would work fine as long 
as the departments continued to submit resource requests which had these same, 
pre-configured shapes.

however, if we establish the minimums using fractions, then the departments can 
easily change between different shaped jobs, and still have the minimums work 
out for them sensibly. does this make sense?

let's be concrete:

10 nodes with 8 CPUs and 64 GB of RAM

say history usually submits jobs for (1 CPU, 16 GB) and geology for (2 CPU, 8 
GB). with your proposal, we might define history's minimum allocation to be (10 
CPU, 160 GB) (1/4 of the dominant resource) and geology's to be (20 CPU, 80 GB) 
(again, 1/4 of the dominant resource). if either department changed the shape 
of its requests, it wouldn't get full use of its minimum.

so, what if we listed the minimums as simply 1/4 * cluster size, but without 
considering DRF? ie, giving (20 CPU, 160 GB) as the minimum allocation to 
each? well, if the departments continued to submit the different shaped jobs (1 
CPU, 16 GB) and (2 CPU, 8 GB), the design described would continue to see the 
queues as being below their minimum allocation, even after the bottleneck 
resource had fully consumed its amount of the minimum allocation. in the 
extreme case, I strongly suspect a job could get *more* than its DRF-based fair 
share, simply by having one of its non-dominant resources remain below the 
amount listed in its minimum share. (can you see this? if not, I'll work out an 
example)

the beauty of the fractions approach, in my mind, is that it will apply no 
matter which resource is the bottleneck resource.


hope this example is clear. sorry I haven't had time to look at your code -- 
this is just based on my reading of your design doc. perhaps all is well and 
good in the code itself. :-)


cheers,
Andrew



[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler

2013-04-25 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642286#comment-13642286
 ] 

Andrew Ferguson commented on YARN-326:
--

hi Sandy,

I'm wondering if you want minimum and maximum shares to actually be fractions 
of the cluster, rather than resource vectors? that would fit more with the 
fairness aspect of the FairScheduler, but it's completely a design decision.

for example, what happens if the sum of the minimum shares for each queue 
exceeds the size of the cluster? (or the size of the cluster during a failure?)

or, if my queue has been given a minimum share of (2 CPU, 240 GB RAM) -- 
because I was originally running high-memory tasks -- what happens if I decide 
to switch to high-CPU, low-memory tasks? I think a minimum share of 1/8 might 
make more sense, since it would allow the queue's users to request resources 
as they see fit.
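to sketch the concern quickly (hypothetical numbers, not from the design doc):

```python
# sketch: a resource-vector minimum share is only fully usable at the
# job shape it was provisioned for. numbers are hypothetical.

def usable_fractions(min_share, task_shape):
    # how many whole tasks of this shape fit inside the minimum share,
    # and what fraction of each reserved resource they actually use
    tasks = min(min_share[r] // task_shape[r] for r in min_share)
    return {r: tasks * task_shape[r] / min_share[r] for r in min_share}

min_share = {"cpu": 2, "memory": 240}  # provisioned for high-memory tasks

# high-memory tasks (1 CPU, 120 GB) consume the whole reservation:
full = usable_fractions(min_share, {"cpu": 1, "memory": 120})
assert full == {"cpu": 1.0, "memory": 1.0}

# high-CPU tasks (1 CPU, 2 GB) strand nearly all the reserved memory:
fracs = usable_fractions(min_share, {"cpu": 1, "memory": 2})
assert fracs["cpu"] == 1.0 and fracs["memory"] < 0.02
```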


anyway, just a thought.


cheers,
Andrew

 Add multi-resource scheduling to the fair scheduler
 ---

 Key: YARN-326
 URL: https://issues.apache.org/jira/browse/YARN-326
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: FairSchedulerDRFDesignDoc.pdf, YARN-326.patch


 With YARN-2 in, the capacity scheduler has the ability to schedule based on 
 multiple resources, using dominant resource fairness.  The fair scheduler 
 should be able to do multiple resource scheduling as well, also using 
 dominant resource fairness.
 More details to come on how the corner cases with fair scheduler configs such 
 as min and max resources will be handled.



[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers

2013-02-07 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574187#comment-13574187
 ] 

Andrew Ferguson commented on YARN-3:


[~acmurthy] thanks for the merge Arun!

 Add support for CPU isolation/monitoring of containers
 --

 Key: YARN-3
 URL: https://issues.apache.org/jira/browse/YARN-3
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Andrew Ferguson
 Fix For: 2.0.3-alpha

 Attachments: mapreduce-4334-design-doc.txt, 
 mapreduce-4334-design-doc-v2.txt, MAPREDUCE-4334-executor-v1.patch, 
 MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, 
 MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, 
 MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre2-with_cpu.patch, 
 MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-pre3-with_cpu.patch, 
 MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, YARN-3-lce_only-v1.patch






[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers

2012-12-18 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535111#comment-13535111
 ] 

Andrew Ferguson commented on YARN-3:


[~vinodkv] you bet! I will fix these today.

thanks,
Andrew



[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-12-02 Thread Andrew Ferguson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated YARN-147:
-

Attachment: YARN-147-v8.patch

 Add support for CPU isolation/monitoring of containers
 --

 Key: YARN-147
 URL: https://issues.apache.org/jira/browse/YARN-147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Andrew Ferguson
 Fix For: 2.0.3-alpha

 Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, 
 YARN-147-v4.patch, YARN-147-v5.patch, YARN-147-v6.patch, YARN-147-v8.patch, 
 YARN-3.patch


 This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not 
 show the SUBMIT PATCH button.



[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers

2012-12-02 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508411#comment-13508411
 ] 

Andrew Ferguson commented on YARN-3:


hi everyone, sorry for the delay on this patch -- the east coast hurricane and 
other events set me behind schedule.

I have attached a new version of this work to YARN-147 (v8); it is based on the 
latest version of trunk. as always, you can see my github tree for exact 
changes: https://github.com/adferguson/hadoop-common/

this patch has been tested (and confirmed to work) as follows:
- default executor, no cgroups
- Linux executor, no cgroups
- Linux executor, with cgroups
- Linux executor, mount cgroups automatically
- Linux executor, cgroups already mounted & asked to mount
- error condition: cgroups already mounted & cannot write to cgroup
- error condition: asked to mount cgroups, but cannot mount

both error conditions result in the NodeManager halting, as we have discussed 
above.


[~bikassaha], to answer your first question: mountCgroups is a function in 
LinuxContainerExecutor because that class is simply a Java wrapper for the 
functions provided by the LCE.

[~bikassaha], to answer your second question: if we use cgroups to limit CPU 
and there is only one container running on the machine, the current design will 
allow the container to access all of the CPU resources until other tasks start 
running (a work-conserving design). this design uses the CPU weights feature 
of cgroups, rather than the CPU bandwidth feature (or the entirely separate 
cpusets controller), which would cap usage even when the machine is otherwise 
idle (a non-work-conserving design).
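for anyone following along, the contrast between the two cgroup mechanisms can 
be sketched as follows -- the file-name conventions are from the cgroup v1 
"cpu" controller, and the per-vcore weight is an assumption for illustration, 
not necessarily the patch's value:

```python
# contrast of the two cgroup CPU mechanisms mentioned above.
# constants follow cgroup v1 "cpu" controller conventions; the
# per-vcore weight is an assumption, not taken from the patch.

CPU_SHARES_PER_VCORE = 1024   # cpu.shares unit (relative weight)
CFS_PERIOD_US = 100_000       # default cpu.cfs_period_us

def cpu_shares(vcores):
    # weights only matter relative to other running cgroups, so a lone
    # container can use every idle core (work-conserving)
    return vcores * CPU_SHARES_PER_VCORE

def cfs_quota_us(vcores):
    # bandwidth: a hard cap of vcores * period of runtime per period,
    # enforced even on an otherwise idle machine (non-work-conserving)
    return vcores * CFS_PERIOD_US

assert cpu_shares(2) == 2048
assert cfs_quota_us(2) == 200_000
```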


thank you,
Andrew



[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers

2012-10-24 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483266#comment-13483266
 ] 

Andrew Ferguson commented on YARN-3:


(replying to comments on YARN-147 here instead as per [~acmurthy]'s request)

thanks for catching that bug [~sseth]! I've updated my git repo [1], and will 
post a new patch after addressing the review from [~vinodkone]. I successfully 
tested it quite a bit with and without cgroups back in the summer, but it seems 
the patch has shifted enough since the testing that I should do it again.

[1] https://github.com/adferguson/hadoop-common/commits/adf-yarn-147



[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-24 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483268#comment-13483268
 ] 

Andrew Ferguson commented on YARN-147:
--

hi [~acmurthy], I've started posting replies on YARN-3 instead. the LCE bug is 
fixed and I'll post a new patch after addressing [~vinodkv]'s comments. thanks!

 Add support for CPU isolation/monitoring of containers
 --

 Key: YARN-147
 URL: https://issues.apache.org/jira/browse/YARN-147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Andrew Ferguson
 Fix For: 2.0.3-alpha

 Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, 
 YARN-147-v4.patch, YARN-147-v5.patch, YARN-3.patch


 This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not 
 show the SUBMIT PATCH button.



[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers

2012-10-24 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483412#comment-13483412
 ] 

Andrew Ferguson commented on YARN-3:


thanks for the review [~vinodkv]. I'll post an updated patch on YARN-147. 
there's a lot of food for thought here (design questions), so here are some 
comments:

bq. yarn.nodemanager.linux-container-executor.cgroups.mount has different 
defaults in code and in yarn-default.xml

yeah -- personally, I think the default should be false since it's not clear 
what a sensible default mount path is. I had changed the line in the code in 
response to Tucu's comment [1], but I'm changing it back to false since true 
doesn't seem sensible to me. if anyone in the community has a sensible default 
mount path, then we can surely change the default to true in both the code and 
yarn-default.xml :-/

bq. Can you explain this? Is this sleep necessary. Depending on its importance, 
we'll need to fix the following Id check, AMs don't always have ID equaling one.

the sleep is necessary because the LCE sometimes reports that the container has 
exited even though the AM process has not yet terminated. since the process is 
still running, we can't remove the cgroup yet, so the code sleeps briefly.

since the AM doesn't always have the ID of 1, what do you suggest I do to 
determine whether the container has the AM or not? if there isn't a good rule, 
the code can just always sleep before removing the cgroup.

bq. container-executor.c: If a mount-point is already mounted, mount gives a 
EBUSY error, mount_cgroup() will need to be fixed to support remounts (for e.g. 
on NM restarts). We could unmount cgroup fs on shutdown but that isn't always 
guaranteed.

great catch! thanks! I've made this non-fatal. now, the NM will attempt to 
re-mount the cgroup, will print a message that it can't do so because it's 
already mounted, and everything will proceed as in the case where the cluster 
admin has already mounted the cgroups.

bq. Not sure of the benefit of configurable 
yarn.nodemanager.linux-container-executor.cgroups.mount-path. Couldn't NM just 
always mount to a path that it creates and owns? Similar comment for the 
hierarchy-prefix.

for the hierarchy-prefix, this needs to be configurable since, in the scenario 
where the admin creates the cgroups in advance, the NM doesn't have privileges 
to create its own hierarchy.

for the mount-path, this is a good question. Linux distributions mount the 
cgroup controllers in various locations, so I thought it was better to keep it 
configurable, since I figured it would be confusing if the OS had already 
mounted some of the cgroup controllers on /cgroup/ or /sys/fs/cgroup/, and then 
the NM started mounting additional controllers in /path/nm/owns/cgroup/.

bq. CgroupsLCEResourcesHandler is swallowing exceptions and errors in multiple 
places - updateCgroup() and createCgroup(). In the later, if cgroups are 
enabled, and we can't create the file, it is a critical error?

I'm fine either way. what would people prefer to see? is it better to launch a 
container even if we can't enforce the limits? or is it better to prevent the 
container from launching? happy to make the necessary quick change.

bq. Make ResourcesHandler top level. I'd like to merge the ContainersMonitor 
functionality with this so as to monitor/enforce memory limits also. 
ContainersMinotor is top-level, we should make ResourcesHandler also top-level 
so that other platforms don't need to create this type-hierarchy all over again 
when they wish to implement some or all of this functionality.

if I'm reading this correctly, yes, that is what I first wanted to do when I 
started this patch (see discussions at the top of this YARN-3 thread, the early 
patches for MAPREDUCE-4334, and the current YARN-4). however, it seems we have 
decided to go another way.



thank you,
Andrew


[1] 
https://issues.apache.org/jira/browse/YARN-147?focusedCommentId=13470926&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13470926

 Add support for CPU isolation/monitoring of containers
 --

 Key: YARN-3
 URL: https://issues.apache.org/jira/browse/YARN-3
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Andrew Ferguson
 Attachments: mapreduce-4334-design-doc.txt, 
 mapreduce-4334-design-doc-v2.txt, MAPREDUCE-4334-executor-v1.patch, 
 MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, 
 MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, 
 MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre2-with_cpu.patch, 
 MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-pre3-with_cpu.patch, 
 MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, 

[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-24 Thread Andrew Ferguson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated YARN-147:
-

Attachment: YARN-147-v6.patch

updated as per reviews on comments here and on YARN-3.

 Add support for CPU isolation/monitoring of containers
 --

 Key: YARN-147
 URL: https://issues.apache.org/jira/browse/YARN-147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Andrew Ferguson
 Fix For: 2.0.3-alpha

 Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, 
 YARN-147-v4.patch, YARN-147-v5.patch, YARN-147-v6.patch, YARN-3.patch


 This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not 
 show the SUBMIT PATCH button.



[jira] [Commented] (YARN-2) Enhance CS to schedule accounting for both memory and cpu cores

2012-10-21 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481081#comment-13481081
 ] 

Andrew Ferguson commented on YARN-2:


hi Arun,

this patch is looking GREAT! in particular, the ResourceCalculator class is 
super useful -- I really like it. :-)  my version, without it, is definitely 
much harder to follow...

before some specific feedback, I want to say that I agree that cores should be 
floats/fractional-units for three reasons:
# they make sense for long-running services, which may require little CPU, but 
should be available on each node, with the ease of having been scheduled by 
YARN.
# this gives us a fine-grained knob for implementing dynamic re-adjustment one 
day; ie, I may want to increase an executing job's weight by 10%, or decrease 
by 15%, etc.
# the publicly released traces of resource requests & usage in Google's cluster 
(to my knowledge, the only traces of their kind) include fractional amounts for 
CPU; having fractional CPU requests in YARN may make it easier to translate 
insights from that dataset into making better resource requests in a YARN 
cluster.

ok, here are some specific comments on the patch:
* *YarnConfiguration.java*: duplicate import of 
{{com.google.common.base.Joiner}}

* *DefaultContainer.java*: {{divideAndCeil}} explicitly uses the two-argument 
form of {{createResource}} to create a resource with 0 cores, whereas other 
Resources created in this calculator create resources with 1 core. this seems 
counter-intuitive to me, as {{divideAndCeil}} tends to result in an 
_overestimate_ of resource consumption, rather than an _underestimate_. either 
way, perhaps a comment would be helpful, as it is the only time this method is 
used this way in the memory-only comparator

* *MultiResourceCalculator.java*: in {{compare()}}, you are looking to order 
the resources by how dominant they are, and then compare by most-dominant 
resource, second most-dominant, etc. ... I think the boolean flag to 
{{getResourceAsValue()}} doesn't make this clear. with the flag, the question 
in my mind would be "wait, why would I want the non-dominant resource?". also, 
a boolean flag makes extending to three or more resources less clean. I 
implemented this by treating each resource request as a vector, normalizing by 
clusterResources, and then sorting the components by dominance.

* *MultiResourceCalculator.java*, *DefaultCalculator.java*, *Resources.java*: 
for the {{multiplyAndNormalizeUp}} and {{multiplyAndNormalizeDown}} methods, 
consider renaming the third argument to "stepping" instead of "factor", as it's 
not a factor used for the multiplication; rather, it's a unit of discretization 
to round to ("stepping" may not be the best word, but perhaps it's closer). 
just a thought...

* *CSQueueUtils.java*: extra spaces in front of {{@Lock(CSQueue.class)}}

* *CapacityScheduler.java*: in the {{allocate()}} method, there's a call to 
normalize the request (after a comment about sanity checks). currently, it only 
normalizes the memory; I think the patch should also normalize the number of 
CPUs requested, no?

* *LeafQueue.java*: in {{assignReservedContainer}} consider changing 
{{Resources.divide}} to {{Resources.ratio}} when calculating 
{{potentialNewCapacity}} (and the current capacity). While both calls should 
give the same result, {{ratio}} has fewer floating-point operations, and, 
better yet, is semantically what is meant in this case -- we're calculating the 
ratio between (used + requested) and available. Frankly, this is perhaps 
something to take a closer look at (as [~vinodkv] pointed out): whether both 
{{divide}} and {{ratio}} are needed, and if so, which should be used in each 
case.
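regarding the {{compare()}} point above, the vector-based ordering I described 
can be sketched like this (my own illustration, not code from either patch):

```python
# sketch of comparing requests by dominance-sorted share vectors,
# which generalizes past two resources without a boolean flag.

def shares_by_dominance(usage, cluster):
    # normalize by cluster capacity; most dominant share comes first
    return sorted((usage[r] / cluster[r] for r in cluster), reverse=True)

def compare(a, b, cluster):
    # lexicographic comparison: dominant shares first, ties broken by
    # the next most dominant resource, and so on
    sa, sb = shares_by_dominance(a, cluster), shares_by_dominance(b, cluster)
    return (sa > sb) - (sa < sb)

cluster = {"cpu": 80, "memory": 640}
a = {"cpu": 10, "memory": 160}   # dominant share 0.25 (memory)
b = {"cpu": 16, "memory": 64}    # dominant share 0.20 (cpu)
assert compare(a, b, cluster) > 0   # a is more dominant than b
assert compare(a, a, cluster) == 0
```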


Also, both *ContainerTokenIdentifier.java* and *BuilderUtils.java* assume that 
memory is the only resource; I'm not certain they should be updated, but I 
wanted to mention them just in case.

Oh, and should *yarn-default.xml* be updated with values for 
{{yarn.scheduler.minimum-allocation-cores}} and 
{{yarn.scheduler.maximum-allocation-cores}}?


Hope this helps, Arun!  depending on how the discussion of integral vs 
fractional cores shakes out, I think this patch is good to go.


cheers,
Andrew

 Enhance CS to schedule accounting for both memory and cpu cores
 ---

 Key: YARN-2
 URL: https://issues.apache.org/jira/browse/YARN-2
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: capacityscheduler, scheduler
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 2.0.3-alpha

 Attachments: MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, 
 MAPREDUCE-4327.patch, MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, 
 MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, YARN-2-help.patch, 
 

[jira] [Commented] (YARN-2) Enhance CS to schedule accounting for both memory and cpu cores

2012-10-21 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481082#comment-13481082
 ] 

Andrew Ferguson commented on YARN-2:


oops, quick typo fix: by *DefaultContainer.java*, I meant 
*DefaultCalculator.java* .. thanks!

 Enhance CS to schedule accounting for both memory and cpu cores
 ---

 Key: YARN-2
 URL: https://issues.apache.org/jira/browse/YARN-2
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: capacityscheduler, scheduler
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 2.0.3-alpha

 Attachments: MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, 
 MAPREDUCE-4327.patch, MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, 
 MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, YARN-2-help.patch, 
 YARN-2.patch, YARN-2.patch, YARN-2.patch


 With YARN being a general purpose system, it would be useful for several 
 applications (MPI et al) to specify not just memory but also CPU (cores) for 
 their resource requirements. Thus, it would be useful to the 
 CapacityScheduler to account for both.



[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-18 Thread Andrew Ferguson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated YARN-147:
-

Attachment: YARN-147-v3.patch

update native code per review by Colin

 Add support for CPU isolation/monitoring of containers
 --

 Key: YARN-147
 URL: https://issues.apache.org/jira/browse/YARN-147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Andrew Ferguson
 Fix For: 2.0.3-alpha

 Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, 
 YARN-3.patch


 This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not 
 show the SUBMIT PATCH button.



[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-18 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479387#comment-13479387
 ] 

Andrew Ferguson commented on YARN-147:
--

hi Colin,

thanks for looking at the native code. since the changes were pretty extensive, 
would you mind taking a careful look again? if it's easier for you, the 
incremental changes can be seen here:
https://github.com/adferguson/hadoop-common/commits/adf-yarn-147

I hope I've faithfully implemented the new key-value API you suggested -- let 
me know if that's not the case.

If the mount fails, I let the exception bubble all the way up to stop the 
NodeManager, as Tucu suggested before about a different error.

The one thing I did not do is change the open / write / close methods to fopen 
/ fprintf / fclose, as the rest of the native code does not use those methods. 
Which would you prefer to see: adjust my patch to use fopen, etc., or fix my 
use of open, etc.?

Yes, I totally agree that it would be better if main.c used getopt_long; it 
definitely smells like another JIRA to me. :-)


thanks!
Andrew



[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-18 Thread Andrew Ferguson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated YARN-147:
-

Attachment: YARN-147-v4.patch

small fix in two places: don't log & re-throw the same exception -- construct 
new exceptions with better context, and use the previous one as the cause.

thanks Tucu for pointing this out!
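The fix described above (construct a new exception with better context, keeping the previous one as the cause) can be sketched roughly as follows; `MountHelper`, `mountCgroup`, and `doMount` are hypothetical names for illustration, not YARN's actual API:

```java
// Sketch of the "wrap, don't log-and-rethrow" pattern discussed above.
// All class/method names here are hypothetical stand-ins.
import java.io.IOException;

public class MountHelper {
  static void mountCgroup(String controller, String path) throws IOException {
    try {
      doMount(controller, path);          // low-level operation that may fail
    } catch (IOException e) {
      // New exception with better context; the original is kept as the cause,
      // so the full stack trace survives without double-logging.
      throw new IOException("Failed to mount cgroup controller '" + controller
          + "' at " + path, e);
    }
  }

  private static void doMount(String controller, String path) throws IOException {
    throw new IOException("mount(2) failed");   // stand-in failure for the sketch
  }

  public static void main(String[] args) {
    try {
      mountCgroup("cpu", "/sys/fs/cgroup/cpu");
    } catch (IOException e) {
      // prints: Failed to mount cgroup controller 'cpu' at /sys/fs/cgroup/cpu (cause: mount(2) failed)
      System.out.println(e.getMessage() + " (cause: " + e.getCause().getMessage() + ")");
    }
  }
}
```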

 Add support for CPU isolation/monitoring of containers
 --

 Key: YARN-147
 URL: https://issues.apache.org/jira/browse/YARN-147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Andrew Ferguson
 Fix For: 2.0.3-alpha

 Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, 
 YARN-147-v4.patch, YARN-3.patch


 This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not 
 show the SUBMIT PATCH button.



[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-16 Thread Andrew Ferguson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated YARN-147:
-

Attachment: YARN-147-v2.patch

thanks for the additional comments, Tucu!  I've updated the patch as per your 
review. hopefully I have done everything correctly.

btw, I have this patch in github:
https://github.com/adferguson/hadoop-common/tree/adf-yarn-147
you can see the changes for this patch in the most recent commit.


thanks!
Andrew

 Add support for CPU isolation/monitoring of containers
 --

 Key: YARN-147
 URL: https://issues.apache.org/jira/browse/YARN-147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Andrew Ferguson
 Fix For: 2.0.3-alpha

 Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-3.patch


 This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not 
 show the SUBMIT PATCH button.



[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-14 Thread Andrew Ferguson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated YARN-147:
-

Attachment: YARN-147-v1.patch

updated patch as per Tucu's review

 Add support for CPU isolation/monitoring of containers
 --

 Key: YARN-147
 URL: https://issues.apache.org/jira/browse/YARN-147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Andrew Ferguson
 Fix For: 2.0.3-alpha

 Attachments: YARN-147-v1.patch, YARN-3.patch


 This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not 
 show the SUBMIT PATCH button.



[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers

2012-10-14 Thread Andrew Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475909#comment-13475909
 ] 

Andrew Ferguson commented on YARN-147:
--

hi Tucu,

thanks very much for opening this new jira and reviewing the patch. I've 
uploaded a new version which addresses most of your comments.

answers to the questions in your review:
bq. cgroupMountPath, if there is no default we should fail if not set, can't we 
have a sensible default?

I've added a check to fail if not set. as far as I can tell, there isn't a 
single default path for cgroups -- some distributions use /sys/fs/cgroup, 
some use /cgroup, others, /cgroups. I've even seen /mnt/cgroup (Debian 
perhaps?); these also vary across releases of the same distro. :-(
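Since the mount point varies across distros, one option is to discover it at runtime by scanning `/proc/mounts`. A minimal sketch of that idea (the `CgroupMounts` class and `controllerPaths` method are hypothetical names, not YARN's actual handler):

```java
// Hedged sketch: map cgroup controllers (cpu, memory, ...) to their mount
// points by parsing /proc/mounts-style lines, instead of hard-coding a path.
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class CgroupMounts {
  // Each /proc/mounts line: device mountpoint fstype options dump pass
  static Map<String, String> controllerPaths(Iterable<String> mountLines) {
    Map<String, String> paths = new HashMap<>();
    for (String line : mountLines) {
      String[] f = line.trim().split("\\s+");
      if (f.length < 4 || !"cgroup".equals(f[2])) continue;  // cgroup mounts only
      for (String opt : f[3].split(",")) {                   // options list controllers
        if (opt.equals("cpu") || opt.equals("cpuacct") || opt.equals("memory")) {
          paths.put(opt, f[1]);                              // controller -> mount point
        }
      }
    }
    return paths;
  }

  public static void main(String[] args) {
    Iterable<String> sample = Arrays.asList(
        "cgroup /sys/fs/cgroup/cpu cgroup rw,cpu 0 0",
        "cgroup /sys/fs/cgroup/memory cgroup rw,memory 0 0");
    System.out.println(controllerPaths(sample));
  }
}
```

In real use the lines would come from reading `/proc/mounts`; the sample input here just keeps the sketch self-contained.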

bq. default value for cgroupPrefix has '/', here will produce a '//' in the path

yes, I made that choice deliberately. I wanted to convey that cgroupPrefix can 
be a path (which is why I kept the '/') and when I use it, I also added a '/' 
in case the user did not put a '/' at the right place in the prefix. my 
understanding is that on Unix, '//' in a path is interpreted as '/', no?
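That understanding matches POSIX: repeated slashes in a pathname are treated as a single slash (only a *leading* "//" is implementation-defined). `java.io.File` also normalizes duplicates on Unix, which is a quick way to sanity-check the claim:

```java
// Quick check that a naive join producing '//' still yields the intended path;
// "/hadoop-yarn/" and "container_1" are illustrative values only.
import java.io.File;

public class SlashDemo {
  public static void main(String[] args) {
    String prefix = "/hadoop-yarn/";             // prefix already ends with '/'
    String path = prefix + "/" + "container_1";  // defensive join adds a second '/'
    // On Unix, File collapses the duplicate separator.
    System.out.println(new File(path).getPath());  // prints: /hadoop-yarn/container_1
  }
}
```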

bq. If the file reader cannot be opened/read, is this acceptable, or should we 
stop execution by throwing an exception?

eh, we could go either way here, but I think it's reasonable not to throw the 
exception. if the file can't be read, then the map from cgroup controller to 
path isn't built, and we already have checks that skip controllers which can't 
be found in the path (say, if the file can be read correctly, but the CPU 
controller isn't mounted).


ok, great. I'm going to mark this as patch available and see if the findbugs 
warning has gone away (I can't seem to get it to run locally).


thanks!!
Andrew

