Re: Managing MapReduce jobs with concurrent client reads

Eric Czech Fri, 07 Sep 2012 08:29:15 -0700

Neither right now. I'm just assuming it would be a problem, since I
would definitely have to support both workloads in a hypothetical
HBase+Hadoop installation that isn't actually built yet.


Did you ever try corralling those jobs by just reducing the number of
available map/reduce tasks or did you find that that isn't a reliable
throttling mechanism?
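A rough sketch of the slot-capping idea, for discussion only. In Hadoop 1.x the per-TaskTracker caps are the `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum` properties. The Python below does not talk to Hadoop. It just simulates one node with a fixed number of slots so the effect is easy to see: a job submits more tasks than there are slots, and the extra tasks wait. `MAX_TASK_SLOTS`, `SlotLimiter`, `fake_map_task` and `run_batch_job` are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_TASK_SLOTS = 2  # hypothetical per-node slot cap (cf. tasks.maximum settings)


class SlotLimiter:
    """Let at most `slots` tasks run at once on this simulated node."""

    def __init__(self, slots):
        self._sem = threading.BoundedSemaphore(slots)
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0  # highest number of tasks observed running together

    def run(self, task):
        with self._sem:  # blocks until a slot frees up
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            try:
                return task()
            finally:
                with self._lock:
                    self.running -= 1


def fake_map_task(i):
    time.sleep(0.01)  # stand-in for scanning a region
    return i * i


def run_batch_job(num_tasks, slots=MAX_TASK_SLOTS):
    limiter = SlotLimiter(slots)
    # Submit more work than there are slots; the limiter queues the rest.
    with ThreadPoolExecutor(max_workers=max(1, num_tasks)) as pool:
        results = list(pool.map(
            lambda i: limiter.run(lambda: fake_map_task(i)), range(num_tasks)))
    return results, limiter.peak


if __name__ == "__main__":
    results, peak = run_batch_job(10)
    print(len(results), peak)
```

Note that a slot cap only bounds how many tasks run per node, not how much CPU, memory or disk I/O each task consumes, so a handful of scan-heavy tasks can still evict the block cache.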

Also, is replication to that batch cluster done via HBase replication
or some other approach?

On Thu, Sep 6, 2012 at 4:08 PM, Stack <[email protected]> wrote:
>
> On Wed, Sep 5, 2012 at 6:25 AM, Eric Czech <[email protected]> wrote:
> > Hi everyone,
> >
> > Does anyone have any recommendations on how to maintain low latency for
> > small, individual reads from HBase while MapReduce jobs are being run?  Is
> > replication a good way to handle this (i.e. run small, low-latency queries
> > against a replicated copy of the data and run the MapReduce jobs on the
> > master copy)?
>
> Is MapReduce blowing your caches, or is higher i/o sending up latency
> when you have cache misses?  Or is it using all the CPU?
>
> Depending on how it impinges, you could try corralling mapreduce
> (cgroups/jail), or go to an extreme and keep a low-latency OLTP cluster
> running well-known, well-behaved mapreduce jobs, replicating into a
> batch cluster where mapreduce is allowed free rein.  (This is what we do
> where I work.  We also cgroup mapreduce even on our batch
> cluster so a random big MR job doesn't make the pagers go off during
> sleepy time.)
>
> St.Ack
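
For anyone unfamiliar with the cgroups approach, here is a minimal sketch, assuming cgroup v1 mounted with one directory per controller under `/sys/fs/cgroup` (the mount point varies by distro). The group name `mapreduce` and all numeric values are hypothetical. The functions only build the list of file writes, so the plan can be inspected without root. `apply` performs the writes and needs root plus pre-created group directories. `blkio.weight` is only honored by the CFQ I/O scheduler.

```python
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")  # typical cgroup v1 mount point; varies by distro
GROUP = "mapreduce"  # hypothetical group for TaskTracker child JVMs


def cgroup_plan(cpu_shares=256, blkio_weight=100, mem_limit_bytes=8 * 1024**3,
                root=CGROUP_ROOT, group=GROUP):
    """Return (file, value) writes that would confine the MapReduce group.

    cpu.shares is relative: 256 is a quarter of the default 1024 weight,
    so MR loses out to default-weight processes only under CPU contention.
    """
    return [
        (root / "cpu" / group / "cpu.shares", str(cpu_shares)),
        (root / "blkio" / group / "blkio.weight", str(blkio_weight)),
        (root / "memory" / group / "memory.limit_in_bytes", str(mem_limit_bytes)),
    ]


def attach_pid_plan(pid, root=CGROUP_ROOT, group=GROUP):
    """Writes that would move one process (e.g. a task JVM) into the group.

    On older kernels cgroup.procs is read-only; write each thread ID to
    `tasks` instead.
    """
    return [(root / ctrl / group / "cgroup.procs", str(pid))
            for ctrl in ("cpu", "blkio", "memory")]


def apply(plan):
    # Requires root and existing group directories (mkdir them first).
    for path, value in plan:
        path.write_text(value)
```

Building a plan first keeps the policy (weights and limits) separate from the privileged step that writes it.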
