[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644208#comment-13644208 ] Andrew Ferguson commented on YARN-326: -- [~sandyr] bingo. that was exactly the concern I alluded to before. glad we found it while thinking about the design. :-) [~kkambatl] yup, that's the idea -- fractional min-share, which would be interpreted as a fraction of the dominant resource (which wouldn't be pre-specified, so the queue's dominant resource could adapt based on the jobs submitted) ... I wrote my example a bit quickly, sorry! let me know if something's still not clear. the new plan sounds like a good approach. I like it. Add multi-resource scheduling to the fair scheduler --- Key: YARN-326 URL: https://issues.apache.org/jira/browse/YARN-326 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: FairSchedulerDRFDesignDoc.pdf, YARN-326.patch, YARN-326.patch With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644209#comment-13644209 ] Andrew Ferguson commented on YARN-326: -- ps -- I forgot to include a pointer to the newest paper in the DRF line of work: http://www.cs.berkeley.edu/~matei/papers/2013/eurosys_choosy.pdf
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643070#comment-13643070 ] Andrew Ferguson commented on YARN-326: -- hey Sandy, sure, I certainly see the appeal of the absolute values approach -- like I said, it's a design tradeoff. however, one point of DRF is that we can sensibly consider fractions of multidimensional resource vectors, since the fraction is defined as the fraction of the cluster consumed by the most dominant resource. having single-dimensional fractions like this is nice because we can then a) weight them, and b) calculate max-min fairness as in the one-dimensional (e.g., memory-only) case. consider the history and geology departments you introduced above. let's say our policy is that each queue gets equal weight (since the departments went in on the purchase of the cluster 50/50), and that each queue should be guaranteed a minimum of 1/4 of the cluster (so that a queue fresh with jobs ramps up to 1/4 of the cluster quickly). in your proposal, since the departments have different shaped demands (one for high-memory, the other for high-CPU), we would configure their minimum share vectors based on these different shaped demands. this would work fine as long as the departments continued to submit resource requests which had these same, pre-configured shapes. however, if we establish the minimums using fractions, then the departments can easily change between different shaped jobs, and still have the minimums work out for them sensibly. does this make sense? let's be concrete: 10 nodes, each with 8 CPUs and 64 GB of RAM. say history usually submits jobs for (1 CPU, 16 GB) and geology for (2 CPU, 8 GB). with your proposal, we might define history's minimum allocation to be (10 CPU, 160 GB) (1/4 of the dominant resource) and geology's to be (20 CPU, 80 GB) (again, 1/4 of the dominant resource). if either department changed the shape of their requests, they wouldn't get full use of their minimum. so, what if we listed the minimums as simply 1/4 * cluster size, but not considering DRF? i.e., giving (20 CPU and 160 GB) as the minimum allocation to each? well, if the departments continued to submit the different shaped jobs (1 CPU, 16 GB) and (2 CPU, 8 GB), the design described would continue to see the queues as being below their minimum allocation, even after the bottleneck resource fully consumed its amount of the minimum allocation. in the extreme case, I highly suspect a job could get *more* than its DRF-based fair share, simply by having one of its non-dominant resources remain below the amount listed in its minimum share. (can you see this? if not, I'll work out an example) the beauty of the fractions approach, in my mind, is that it will apply no matter which resource is the bottleneck resource. hope this example is clear. sorry I haven't had time to look at your code -- this is just based on my reading of your design doc. perhaps all is well and good in the code itself. :-) cheers, Andrew
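The arithmetic in the example above is easy to check mechanically. Here is a minimal sketch (illustrative Python, not from any attached patch) that derives the (10 CPU, 160 GB) and (20 CPU, 80 GB) minimum-share vectors from a 1/4 fractional min-share:

```python
# 10 nodes x (8 CPU, 64 GB), as in the example above.
CLUSTER = {"cpu": 80, "mem_gb": 640}

def dominant_share(used, cluster):
    """Fraction of the cluster consumed by the most-used resource (DRF)."""
    return max(used[r] / cluster[r] for r in cluster)

def vector_for_min_share(job_shape, fraction, cluster):
    """Smallest multiple of job_shape whose dominant share reaches `fraction`."""
    per_job = dominant_share(job_shape, cluster)
    n_jobs = fraction / per_job
    return {r: job_shape[r] * n_jobs for r in job_shape}

history_job = {"cpu": 1, "mem_gb": 16}   # memory-heavy
geology_job = {"cpu": 2, "mem_gb": 8}    # cpu-heavy

# A 1/4 fractional min-share translates into a different vector per queue:
print(vector_for_min_share(history_job, 0.25, CLUSTER))  # {'cpu': 10.0, 'mem_gb': 160.0}
print(vector_for_min_share(geology_job, 0.25, CLUSTER))  # {'cpu': 20.0, 'mem_gb': 80.0}
```

Note that both vectors have the same dominant share (0.25), which is exactly why the fractional formulation keeps working when a queue's job shapes change.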
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642286#comment-13642286 ] Andrew Ferguson commented on YARN-326: -- hi Sandy, I'm wondering if you want minimum and maximum shares to actually be fractions of the cluster, rather than resource vectors? that would fit more with the fairness aspect of the FairScheduler, but it's completely a design decision. for example, what happens if the sum of the minimum shares for each queue exceeds the size of the cluster? (or the size of the cluster during a failure?) or, if my queue has been given a minimum share of (2 CPU, 240 GB RAM) -- because I was originally using tasks with high memory -- what happens if I decide to switch to using tasks with high CPU and low memory? I think a minimum share of 1/8 might make more sense, since it would allow the queue's users to request the resources as they see fit. anyway, just a thought. cheers, Andrew
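One possible answer to the oversubscription question above (illustrative only; not necessarily what the FairScheduler would do) is that fractional min shares can simply be rescaled when they sum past the whole cluster, e.g. after node failures:

```python
# Hypothetical policy sketch: if the configured fractional min shares sum
# past 1.0, scale them down proportionally so they remain satisfiable.
def settle_min_shares(fractions):
    total = sum(fractions.values())
    if total <= 1.0:
        return dict(fractions)
    return {q: f / total for q, f in fractions.items()}

# three queues configured at 1/2 each: oversubscribed, so each settles at ~1/3
print(settle_min_shares({"history": 0.5, "geology": 0.5, "physics": 0.5}))
```

Doing the same with per-queue resource *vectors* is messier, because each queue's vector has a different shape and there is no single scalar to renormalize by.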
[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574187#comment-13574187 ] Andrew Ferguson commented on YARN-3: [~acmurthy] thanks for the merge Arun! Add support for CPU isolation/monitoring of containers -- Key: YARN-3 URL: https://issues.apache.org/jira/browse/YARN-3 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: mapreduce-4334-design-doc.txt, mapreduce-4334-design-doc-v2.txt, MAPREDUCE-4334-executor-v1.patch, MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre2-with_cpu.patch, MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-pre3-with_cpu.patch, MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, YARN-3-lce_only-v1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535111#comment-13535111 ] Andrew Ferguson commented on YARN-3: [~vinodkv] you bet! I will fix these today. thanks, Andrew
[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ferguson updated YARN-147: - Attachment: YARN-147-v8.patch Add support for CPU isolation/monitoring of containers -- Key: YARN-147 URL: https://issues.apache.org/jira/browse/YARN-147 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, YARN-147-v4.patch, YARN-147-v5.patch, YARN-147-v6.patch, YARN-147-v8.patch, YARN-3.patch This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not show the SUBMIT PATCH button. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508411#comment-13508411 ] Andrew Ferguson commented on YARN-3: hi everyone, sorry for the delay on this patch -- the east coast hurricane and other events set me behind schedule. I have attached a new version of this work to YARN-147 (v8); it is based on the latest version of trunk. as always, you can see my github tree for exact changes: https://github.com/adferguson/hadoop-common/ this patch has been tested (and confirmed to work) as follows: - default executor, no cgroups - Linux executor, no cgroups - Linux executor, with cgroups - Linux executor, mount cgroups automatically - Linux executor, cgroups already mounted and asked to mount - error condition: cgroups already mounted and cannot write to cgroup - error condition: asked to mount cgroups, but cannot mount. both error conditions result in the NodeManager halting, as we have discussed above. [~bikassaha], to answer your first question: mountCgroups is a function in LinuxContainerExecutor because that class is simply a Java wrapper for the functions provided by the LCE. [~bikassaha], to answer your second question: if we use cgroups to limit CPU and there is only one container running on the machine, the current design will allow the container to access all of the CPU resources until other tasks start running (a work-conserving design). this design uses the CPU weights feature of cgroups, rather than the CPU bandwidth feature (or the entirely separate cpusets controller), which would limit the bandwidth (a non-work-conserving design). thank you, Andrew
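The work-conserving vs. non-work-conserving distinction above can be modeled with two small functions (an illustrative Python sketch, not the patch's code, which configures the kernel's cgroup cpu controller rather than computing allocations itself):

```python
def weights_allocation(weights, runnable, total_cpus):
    """cpu.shares-style weights: runnable containers split the machine in
    proportion to their weights, so a lone container gets everything
    (work-conserving)."""
    total_w = sum(weights[c] for c in runnable)
    return {c: total_cpus * weights[c] / total_w for c in runnable}

def bandwidth_limit(quota_us, period_us):
    """CFS bandwidth-style quota: a hard cap in CPUs, enforced even when
    the rest of the machine is idle (non-work-conserving)."""
    return quota_us / period_us

weights = {"a": 1024, "b": 1024}

print(weights_allocation(weights, {"a"}, 8))       # only "a" runs: it gets all 8 CPUs
print(weights_allocation(weights, {"a", "b"}, 8))  # both run: 4.0 CPUs each
print(bandwidth_limit(200_000, 100_000))           # hard cap at 2.0 CPUs, idle or not
```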
[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483266#comment-13483266 ] Andrew Ferguson commented on YARN-3: (replying to comments on YARN-147 here instead, as per [~acmurthy]'s request) thanks for catching that bug [~sseth]! I've updated my git repo [1], and will post a new patch after addressing the review from [~vinodkone]. I successfully tested it quite a bit with and without cgroups back in the summer, but it seems the patch has shifted enough since the testing that I should do it again. [1] https://github.com/adferguson/hadoop-common/commits/adf-yarn-147
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483268#comment-13483268 ] Andrew Ferguson commented on YARN-147: -- hi [~acmurthy], I've started posting replies on YARN-3 instead. the LCE bug is fixed and I'll post a new patch after addressing [~vinodkv]'s comments. thanks!
[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483412#comment-13483412 ] Andrew Ferguson commented on YARN-3: thanks for the review [~vinodkv]. I'll post an updated patch on YARN-147. there's a lot of food for thought here (design questions), so here are some comments: bq. yarn.nodemanager.linux-container-executor.cgroups.mount has different defaults in code and in yarn-default.xml yeah -- personally, I think the default should be false, since it's not clear what a sensible default mount path is. I had changed the line in the code in response to Tucu's comment [1], but I'm changing it back to false since true doesn't seem sensible to me. if anyone in the community has a sensible default mount path, then we can surely change the default to true in both the code and yarn-default.xml :-/ bq. Can you explain this? Is this sleep necessary. Depending on its importance, we'll need to fix the following Id check, AMs don't always have ID equaling one. the sleep is necessary, as sometimes the LCE reports that the container has exited even though the AM process has not terminated. hence, because the process is still running, we can't remove the cgroup yet; therefore, the code sleeps briefly. since the AM doesn't always have the ID of 1, what do you suggest I do to determine whether the container has the AM or not? if there isn't a good rule, the code can just always sleep before removing the cgroup. bq. container-executor.c: If a mount-point is already mounted, mount gives a EBUSY error, mount_cgroup() will need to be fixed to support remounts (for e.g. on NM restarts). We could unmount cgroup fs on shutdown but that isn't always guaranteed. great catch! thanks! I've made this non-fatal. now, the NM will attempt to re-mount the cgroup, will print a message that it can't do that because it's mounted, and everything will work, because it will simply work as in the case where the cluster admin has already mounted the cgroups. bq. Not sure of the benefit of configurable yarn.nodemanager.linux-container-executor.cgroups.mount-path. Couldn't NM just always mount to a path that it creates and owns? Similar comment for the hierarchy-prefix. for the hierarchy-prefix, this needs to be configurable since, in the scenario where the admin creates the cgroups in advance, the NM doesn't have privileges to create its own hierarchy. for the mount-path, this is a good question. Linux distributions mount the cgroup controllers in various locations, so I thought it was better to keep it configurable, since I figured it would be confusing if the OS had already mounted some of the cgroup controllers on /cgroup/ or /sys/fs/cgroup/, and then the NM started mounting additional controllers in /path/nm/owns/cgroup/. bq. CgroupsLCEResourcesHandler is swallowing exceptions and errors in multiple places - updateCgroup() and createCgroup(). In the later, if cgroups are enabled, and we can't create the file, it is a critical error? I'm fine either way. what would people prefer to see? is it better to launch a container even if we can't enforce the limits? or is it better to prevent the container from launching? happy to make the necessary quick change. bq. Make ResourcesHandler top level. I'd like to merge the ContainersMonitor functionality with this so as to monitor/enforce memory limits also. ContainersMonitor is top-level, we should make ResourcesHandler also top-level so that other platforms don't need to create this type-hierarchy all over again when they wish to implement some or all of this functionality. if I'm reading this correctly, yes, that is what I first wanted to do when I started this patch (see discussions at the top of this YARN-3 thread, the early patches for MAPREDUCE-4334, and the current YARN-4). however, it seems we have decided to go another way. thank you, Andrew [1] https://issues.apache.org/jira/browse/YARN-147?focusedCommentId=13470926&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13470926
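The non-fatal remount behavior described above amounts to treating EBUSY as "already mounted, proceed as if the admin had mounted it". A sketch of that control flow (illustrative Python; the real logic lives in container-executor.c, and the mount call here is injected so it is testable without root):

```python
import errno

def mount_cgroup_nonfatal(do_mount, controller, path):
    """Attempt to mount a cgroup controller; tolerate an existing mount."""
    try:
        do_mount(controller, path)
        return "mounted"
    except OSError as e:
        if e.errno == errno.EBUSY:
            # Already mounted (e.g. after an NM restart, or pre-mounted by
            # the admin): log and continue, exactly as in the configuration
            # where the cluster admin mounted the cgroups in advance.
            return "already-mounted"
        raise  # any other mount failure remains fatal

def busy_mount(controller, path):
    # stand-in for mount(2) returning EBUSY on an existing mount point
    raise OSError(errno.EBUSY, "mount point busy")

print(mount_cgroup_nonfatal(busy_mount, "cpu", "/sys/fs/cgroup/cpu"))
```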
[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ferguson updated YARN-147: - Attachment: YARN-147-v6.patch updated as per reviews on comments here and on YARN-3.
[jira] [Commented] (YARN-2) Enhance CS to schedule accounting for both memory and cpu cores
[ https://issues.apache.org/jira/browse/YARN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481081#comment-13481081 ] Andrew Ferguson commented on YARN-2: hi Arun, this patch is looking GREAT! in particular, the ResourceCalculator class is super useful -- I really like it. :-) my version, without it, is definitely much harder to follow... before some specific feedback, I want to say that I agree that cores should be floats/fractional-units, for three reasons: # they make sense for long-running services, which may require little CPU, but should be available on each node, with the ease of having been scheduled by YARN. # this gives us a fine-grained knob for implementing dynamic re-adjustment one day; i.e., I may want to increase an executing job's weight by 10%, or decrease it by 15%, etc. # the publicly released traces of resource requests and usage in Google's cluster (to my knowledge, the only traces of their kind) include fractional amounts for CPU; having fractional CPU requests in YARN may make it easier to translate insights from that dataset into making better resource requests in a YARN cluster. ok, here are some specific comments on the patch: * *YarnConfiguration.java*: duplicate import of {{com.google.common.base.Joiner}} * *DefaultContainer.java*: {{divideAndCeil}} explicitly uses the two-argument form of {{createResource}} to create a resource with 0 cores, whereas other Resources created in this calculator create resources with 1 core. this seems counter-intuitive to me, as {{divideAndCeil}} tends to result in an _overestimate_ of resource consumption, rather than an _underestimate_. either way, perhaps a comment would be helpful, as it is the only time this method is used this way in the memory-only comparator * *MultiResourceCalculator.java*: in {{compare()}}, you are looking to order the resources by how dominant they are, and then compare by most-dominant resource, second most-dominant, etc. ... I think the boolean flag to {{getResourceAsValue()}} doesn't make this clear. with the flag, the question in my mind would be "wait, why would I want the non-dominant resource?". simply having a boolean flag also makes extending to three or more resources less clean. I implemented this by treating each resource request as a vector, normalizing by clusterResources, and then sorting the components by dominance. * *MultiResourceCalculator.java*, *DefaultCalculator.java*, *Resources.java*: for the {{multiplyAndNormalizeUp}} and {{multiplyAndNormalizeDown}} methods, consider renaming the third argument to "stepping" instead of "factor", as it's not a factor used for the multiplication; rather, it's a unit of discretization to round to ("stepping" may not be the best word, but perhaps it's closer). just a thought... * *CSQueueUtils.java*: extra spaces in front of {{@Lock(CSQueue.class)}} * *CapacityScheduler.java*: in the {{allocate()}} method, there's a call to normalize the request (after a comment about sanity checks). currently, it only normalizes the memory; I think the patch should also normalize the number of CPUs requested, no? * *LeafQueue.java*: in {{assignReservedContainer}}, consider changing {{Resources.divide}} to {{Resources.ratio}} when calculating {{potentialNewCapacity}} (and the current capacity). While both calls should give the same result, {{ratio}} has fewer floating-point operations, and, better yet, is semantically what is meant in this case -- we're calculating the ratio between (used + requested) and available. Frankly, this is perhaps something to take a closer look at (as [~vinodkv] pointed out): whether both {{divide}} and {{ratio}} are needed, and if so, which should be used in each case. Also, both *ContainerTokenIdentifier.java* and *BuilderUtils.java* assume that memory is the only resource; I'm not certain they should be updated, but I wanted to mention them just in case. Oh, and should *yarn-default.xml* be updated with values for {{yarn.scheduler.minimum-allocation-cores}} and {{yarn.scheduler.maximum-allocation-cores}} ? Hope this helps, Arun! depending on how the discussion of integral vs fractional cores shakes out, I think this patch is good to go. cheers, Andrew
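The alternative comparison Andrew describes for {{compare()}} -- normalize each usage vector by the cluster, sort the components by dominance, and compare lexicographically -- can be sketched as follows (illustrative Python, not the Java from either patch; the three-resource cluster is hypothetical):

```python
def dominance_key(used, cluster):
    """Normalized per-resource shares, most dominant first. Comparing two
    such keys lexicographically compares by dominant share, then by the
    second most-dominant, etc. -- and extends cleanly past two resources."""
    return sorted((used[r] / cluster[r] for r in cluster), reverse=True)

cluster = {"cpu": 100, "mem_gb": 1000, "disk": 50}
a = {"cpu": 30, "mem_gb": 100, "disk": 5}   # dominant: cpu, at 0.30
b = {"cpu": 10, "mem_gb": 250, "disk": 5}   # dominant: memory, at 0.25

print(dominance_key(a, cluster))  # [0.3, 0.1, 0.1]
print(dominance_key(b, cluster))  # [0.25, 0.1, 0.1]
# "a" has the larger dominant share, so DRF would allocate to "b" next.
print(dominance_key(a, cluster) > dominance_key(b, cluster))  # True
```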
[jira] [Commented] (YARN-2) Enhance CS to schedule accounting for both memory and cpu cores
[ https://issues.apache.org/jira/browse/YARN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481082#comment-13481082 ] Andrew Ferguson commented on YARN-2: oops, quick typo fix: by *DefaultContainer.java*, I meant *DefaultCalculator.java* .. thanks! Enhance CS to schedule accounting for both memory and cpu cores --- Key: YARN-2 URL: https://issues.apache.org/jira/browse/YARN-2 Project: Hadoop YARN Issue Type: New Feature Components: capacityscheduler, scheduler Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 2.0.3-alpha Attachments: MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, YARN-2-help.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ferguson updated YARN-147: - Attachment: YARN-147-v3.patch update native code per review by Colin
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479387#comment-13479387 ] Andrew Ferguson commented on YARN-147: -- hi Colin, thanks for looking at the native code. since the changes were pretty extensive, would you mind taking a careful look again? if it's easier for you, the incremental changes can be seen here: https://github.com/adferguson/hadoop-common/commits/adf-yarn-147 I hope I've faithfully implemented the new key-value API you suggested -- let me know if that's not the case. If the mount fails, I let the exception bubble all the way up to stop the NodeManager, as Tucu suggested before about a different error. The one thing I did not do is change the open / write / close methods to fopen / fprintf / fclose, as the rest of the native code does not use those methods. Which would you prefer to see: adjust my patch to use fopen, etc., or fix my use of open, etc.? Yes, I totally agree that it would be better if main.c used getopt_long; it definitely smells like another JIRA to me. :-) thanks! Andrew
[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ferguson updated YARN-147: - Attachment: YARN-147-v4.patch small fix in two places: don't log and re-throw the same exception -- construct new exceptions with better context, and use the previous one as the cause. thanks Tucu for pointing this out!
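The fix described above can be sketched as follows. This is a minimal illustration of the wrap-with-cause pattern, not code from the actual patch; the class and method names (MountExample, mountCgroupController, initializeController) are hypothetical:

```java
import java.io.IOException;

public class MountExample {
    // Hypothetical low-level mount step that fails.
    static void mountCgroupController(String controller) throws IOException {
        throw new IOException("mount failed: " + controller);
    }

    // Instead of logging and re-throwing the same exception, wrap it in a
    // new one that adds context and carries the original as the cause, so
    // the full chain bubbles up to whatever stops the NodeManager.
    static void initializeController(String controller) throws IOException {
        try {
            mountCgroupController(controller);
        } catch (IOException e) {
            throw new IOException(
                "Failed to initialize cgroup controller '" + controller + "'", e);
        }
    }
}
```

The caller's stack trace then shows both the higher-level context and the underlying error, which is what distinguishes this from a plain log-and-rethrow.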
[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ferguson updated YARN-147: - Attachment: YARN-147-v2.patch thanks for the additional comments, Tucu! I've updated the patch as per your review. hopefully I have done everything correctly. btw, I have this patch in github: https://github.com/adferguson/hadoop-common/tree/adf-yarn-147 you can see the changes for this patch in the most recent commit. thanks! Andrew
[jira] [Updated] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ferguson updated YARN-147: - Attachment: YARN-147-v1.patch updated patch as per Tucu's review
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475909#comment-13475909 ] Andrew Ferguson commented on YARN-147: -- hi Tucu, thanks very much for opening this new jira and reviewing the patch. I've uploaded a new version which addresses most of your comments. answers to the questions in your review: .bq cgroupMountPath, if there is no default we should fail if not set, can't we have a sensible default? I've added a check to fail if not set. as far as I can tell, there isn't a single default path for cgroups -- some distributions use /sys/fs/cgroup, some use /cgroup, others, /cgroups. I've even seen /mnt/cgroup (Debian perhaps?); these also vary across releases of the same distro. :-( .bq default value for cgroupPrefix has '/', here will produce a '//' in the path yes, I made that choice deliberately. I wanted to convey that cgroupPrefix can be a path (which is why I kept the '/') and when I use it, I also added a '/' in case the user did not put a '/' at the right place in the prefix. my understanding is that on Unix, '//' in a path is interpreted as '/', no? .bq If the filereader cannot be open/read, is this acceptable or should stop execution by throwing exception? eh, we could go either way here, but I think it's reasonable to not throw the exception. if the file can't be read, then the map from cgroup controller to path isn't built, and we already have existing checks which skip controllers which can't be found in the path (say, if the file can be read correctly, but the CPU controller isn't mounted). ok, great. I'm going to mark this as patch available and see if the findbugs warning has gone away (I can't seem to get it to run locally). thanks!! 
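The controller-to-path map discussed above can be sketched roughly as follows. This is a simplified illustration of parsing /proc/mounts-style text, not the code in the patch; the class name CgroupMountParser is hypothetical, and for brevity every mount option (including non-controller options like rw or relatime) lands in the map, which is harmless since lookups only use controller names:

```java
import java.util.HashMap;
import java.util.Map;

public class CgroupMountParser {
    // Each /proc/mounts line has the form:
    //   device mountpoint fstype options dump pass
    // For lines whose fstype is "cgroup", map each mount option (which, for
    // cgroup mounts, includes the controller names: cpu, memory, cpuacct, ...)
    // to the mount point. If this parsing ever yields nothing, downstream
    // checks simply skip controllers that were never found.
    static Map<String, String> parse(String mountsText) {
        Map<String, String> controllerPaths = new HashMap<>();
        for (String line : mountsText.split("\n")) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 4 || !fields[2].equals("cgroup")) {
                continue;
            }
            for (String option : fields[3].split(",")) {
                controllerPaths.put(option, fields[1]);
            }
        }
        return controllerPaths;
    }
}
```

This also shows why a hard-coded default mount path is awkward: the mount point for the cpu controller is whatever the distribution chose (/sys/fs/cgroup/cpu, /cgroup/cpu, ...), so discovering it from the mounts table, or requiring explicit configuration, is the safer route.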
Andrew