I too am interested in this custom load balancer, as I was actually just starting to look into writing one that does the same thing for my heterogeneous cluster!
Is this available somewhere?

On Sat, Dec 8, 2012 at 9:17 AM, James Chang <[email protected]> wrote:
> By the way, I saw you mentioned that you have built a "LoadBalancer".
> Could you kindly share some detailed info about it?
>
> Jean-Marc Spaggiari wrote on Saturday, December 8, 2012:
>
> > Hi,
> >
> > Here is the situation.
> >
> > I have a heterogeneous cluster with 2-core, 4-core and 8-core CPU
> > servers. The performance of these different servers allows them to
> > handle different sizes of load. So far, I have built a LoadBalancer
> > which balances the regions over those servers based on their
> > performance, and it's working quite well: the RowCounter went down
> > from 11 minutes to 6 minutes. However, I can still see tasks running
> > on some servers while accessing data on other servers, which
> > overwhelms the bandwidth and slows down the process, since some
> > 2-core servers are assigned to count rows hosted on 8-core servers.
> >
> > I'm looking for a way to "force" the tasks to run on the servers
> > where the regions are assigned.
> >
> > I first tried to reject the tasks in the Mapper setup method when
> > the data was not local, to see if the tracker would assign them to
> > another server. It doesn't: the task just fails and is mostly not
> > re-assigned. I tried IOExceptions, RuntimeExceptions and
> > InterruptedExceptions with no success.
> >
> > So now I have 3 possible options.
> >
> > The first one is to move from MapReduce to a Coprocessor EndPoint.
> > Running locally on the RegionServer, it accesses only the local
> > data, and I can manually reject everything that is not local. It
> > therefore achieves what I need, but it's not my preferred option,
> > since I would like to keep the MR features.
> >
> > The second option is to tell Hadoop where the tasks should be
> > assigned. Should that be done by HBase? By Hadoop? I don't know.
> > Where? I don't know either. I have started to look at the JobTracker
> > and JobInProgress code, but it seems it will be a big task. Also,
> > doing that means I will have to re-patch the distributed code each
> > time I upgrade the version, and I will have to redo everything when
> > I move from 1.0.x to 2.x…
> >
> > The third option is to not process the task if the data is not
> > local. I mean, in the map method, simply have an "if (!local)
> > return;" right at the beginning and just do nothing. This will not
> > work for things like RowCount, since all the entries are required,
> > but it might work for some of my use cases where I don't necessarily
> > need all the data to be processed. It will not be efficient, though:
> > the task will still scan the entire region.
> >
> > My preferred option is definitely the 2nd one, but it also seems to
> > be the most difficult one. The third one is very easy to implement:
> > it only takes about 2 lines to check whether the data is local. But
> > it doesn't work for all scenarios and is more of a dirty fix. The
> > coprocessor option might be doable too, since I already have all the
> > code for my MapReduce jobs, so it might be an acceptable option.
> >
> > I'm wondering if anyone has already faced this situation and worked
> > on something. If not, do you have any other ideas/options to
> > propose, or can someone point me to the right classes to look at to
> > implement solution 2?
> >
> > Thanks,
> >
> > JM

--
Robert Dyer
[email protected]
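For readers following along, the third option's "if (!local) return;" guard could be sketched roughly as below. This is a minimal plain-Java illustration of the idea only, not HBase API code: the `isLocal` helper, the hostnames, and the `countRows` stand-in for a map method are all hypothetical names invented for this sketch.

```java
import java.util.Arrays;
import java.util.List;

public class LocalityGuardSketch {

    // Hypothetical helper: true when the region's hosting server is the
    // same machine this task is running on. Compares short hostnames so
    // "node1" matches "node1.example.com".
    static boolean isLocal(String regionHost, String taskHost) {
        String a = regionHost.split("\\.")[0].toLowerCase();
        String b = taskHost.split("\\.")[0].toLowerCase();
        return a.equals(b);
    }

    // Stand-in for a map method: skip all work when the data is remote.
    // Note the inefficiency the email points out still applies in the
    // real setting: the task is launched and the region scanned anyway.
    static int countRows(List<String> rows, String regionHost, String taskHost) {
        if (!isLocal(regionHost, taskHost)) {
            return 0; // data hosted elsewhere: do nothing
        }
        return rows.size();
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3");
        System.out.println(countRows(rows, "node1.example.com", "node1")); // local: 3
        System.out.println(countRows(rows, "node8.example.com", "node2")); // remote: 0
    }
}
```

As the email notes, this guard trades correctness for simplicity: any job that needs every row (like RowCounter) cannot use it, because rows on non-local regions are silently dropped.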
