Re: Node scheduling in 2.1.x

Steve Loughran Mon, 09 Sep 2013 03:09:55 -0700

On 6 September 2013 19:02, hilfi alkaff <[email protected]> wrote:


> Thanks for all the replies. I think I have found the relevant codes that I
> would like to modify. That said, a project that I'm doing now requires
> containers to have network bandwidth as one of its resources (In
> Resource.java: it currently only models memory).
>
>
This is something that's been discussed before, IO bandwidth being the
other constraint.
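
To make the idea concrete, here is a minimal sketch of a resource record with a second dimension. It is an illustration only: SketchResource, its fields, and the Mbit/s unit are hypothetical and are not the real YARN Resource API. The point is that once a request has more than one dimension, it has to fit on every dimension at once.

```java
// Hypothetical sketch: NOT org.apache.hadoop.yarn.api.records.Resource.
// Names and units are assumptions made for illustration.
final class SketchResource {
    private final int memoryMb;
    private final int networkMbps; // assumed unit: megabits per second

    SketchResource(int memoryMb, int networkMbps) {
        if (memoryMb < 0 || networkMbps < 0) {
            throw new IllegalArgumentException("resources must be non-negative");
        }
        this.memoryMb = memoryMb;
        this.networkMbps = networkMbps;
    }

    int getMemoryMb() { return memoryMb; }

    int getNetworkMbps() { return networkMbps; }

    /** True only if this request fits within 'available' on every dimension. */
    boolean fitsIn(SketchResource available) {
        return memoryMb <= available.memoryMb
            && networkMbps <= available.networkMbps;
    }

    /** Capacity left after granting 'request'; call fitsIn first, or the constructor throws. */
    SketchResource subtract(SketchResource request) {
        return new SketchResource(memoryMb - request.memoryMb,
                                  networkMbps - request.networkMbps);
    }
}
```

A node with plenty of free memory would still reject a container under this model if its remaining bandwidth is too small.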


Limits like this would most benefit mixed workload clusters, where what you
are trying to limit is not the net & IO bandwidth a low-latency service
needs, but the load the batch jobs place on the machines. It's not so much
restrictions on the service bandwidth you want, but the ability to
(dynamically?) throttle back the bandwidth that other containers are using.
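
One common way to express a throttle whose rate can be changed while it is running is a token bucket. The sketch below is an assumption about how such a limiter could look, not anything that exists in Hadoop or YARN; the class name, method names, and the caller-supplied clock are all hypothetical. It only decides whether a write of a given size may proceed; actually enforcing that on a container (cgroups, tc, etc.) is a separate problem.

```java
// Hypothetical token-bucket sketch; not part of Hadoop or YARN.
// The caller passes in the current time so the logic is easy to test.
final class TokenBucket {
    private final long capacityBytes;   // maximum burst size
    private double bytesPerSec;         // current refill rate, adjustable at runtime
    private double available;           // tokens (bytes) currently in the bucket
    private long lastRefillNanos;

    TokenBucket(long capacityBytes, double bytesPerSec, long nowNanos) {
        this.capacityBytes = capacityBytes;
        this.bytesPerSec = bytesPerSec;
        this.available = capacityBytes; // start full
        this.lastRefillNanos = nowNanos;
    }

    /** Add tokens for the time elapsed since the last refill, capped at capacity. */
    private void refill(long nowNanos) {
        long elapsed = nowNanos - lastRefillNanos;
        if (elapsed <= 0) {
            return;
        }
        double added = (elapsed / 1e9) * bytesPerSec;
        available = Math.min(capacityBytes, available + added);
        lastRefillNanos = nowNanos;
    }

    /** Throttle back (or open up) the rate; time already elapsed is credited at the old rate. */
    synchronized void setRate(double newBytesPerSec, long nowNanos) {
        refill(nowNanos);
        bytesPerSec = newBytesPerSec;
    }

    /** Returns true and spends the tokens if 'bytes' may be sent now; otherwise false. */
    synchronized boolean tryConsume(long bytes, long nowNanos) {
        refill(nowNanos);
        if (bytes > available) {
            return false;
        }
        available -= bytes;
        return true;
    }
}
```

In real use the caller would pass System.nanoTime() as the clock, and something watching the low-latency service would call setRate to squeeze the batch containers when needed.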

However, you need to take into account that a lot of network traffic is
generated by off-host HDFS IO; throttle that back and your remote file IO
will also be restricted. Local HDFS operations will not be restricted -even
if you cgroup-limit your process- because they go through the local
Datanode.



> Since I'm planning to implement it anyway, I hope to be able to help
> Hadoop's development. However, I could not find the relevant JIRA for this.
> If you know of an existing ticket that is relevant to the aforementioned
> issue, let me know. If there is none, should I make my changes first (as
> listed http://wiki.apache.org/hadoop/HowToContribute) and get back after
> I'm done with my code?
>
>
These are pretty big changes. If you do them off on your own and turn up
with a big set of changes, it's unlikely to get in, because the changes
reach deep across the codebase and because you don't yet have a
track record of working in this area (to be fair, nobody would trust me to
dabble in the scheduler either, even though I have the commit rights).

This would have to be a collaborative development process, where even if you
do most of the coding of the feature and its test suite, you need to do it
visibly, get feedback & act on it -starting with the design.

-we're obsessive about testing, so try and come up with a design for
testing all this that would measure the effects of the throttling. You
should also set up your own test infrastructure with Jenkins doing local
tests of your branch, ideally with a pool of real/VM servers.

-Having someone act as a mentor would help. I'm not going to volunteer, not
only due to existing commitments, but because it's not an area of my
expertise.

-before undertaking a big project, try to pick a few small (existing?)
issues and go through the process of developing patches and nurturing them
in.

One thing I would like to see test-wise is something we can deploy on a
YARN cluster to generate system load: net, IO, HDFS, CPU, etc. I'm doing
something like this for Hoya [
http://www.slideshare.net/steve_l/hoya-hbase-on-yarn-20130820-hbase-hug ],
where I also want to simulate failures; if you want to help me with that
it'd be appreciated. (If the net load can be generated between peer nodes,
then you have a pretty good stress test of the network and a way of
measuring its bisection bandwidth too, though for cluster standup it's best
to use the standard unix tools for isolation of problems, and ease of
comparison with other clusters.)
