Thanks for the reply. Here is the info:
What is your concurrency like (how many concurrent mappers?) - We have 8
concurrent mappers running.
Where is the time being spent? In the server, in the mapper? - Most of the
time is spent calling HTable.batch(...) inside the mapper.
Why are you having scanner timeouts if you are doing big batch Gets? - We
are getting the scanner timeout from the original Scan that serves the input
records to the mapper. Scanner caching is set to 100. I think that because
the mapper takes too long (due to the batch Gets inside it) to process the
initial 100 records, fetching the next batch of scanned records throws the
exception.
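One possible mitigation (a sketch only - the property name below is the 0.90/0.92-era scanner-lease setting, so please verify it against the version you are running) is to raise the region server's scanner lease above the mapper's worst-case processing time per batch, e.g. in hbase-site.xml:

```
<!-- Hypothetical hbase-site.xml fragment: lengthen the scanner lease so a
     slow mapper does not lose its lease between next() calls. -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>900000</value> <!-- 15 min, up from the current 300000 -->
</property>
```

Alternatively, lowering the caching on the input Scan (e.g. Scan.setCaching(10) instead of 100) shrinks each batch, so the lease is renewed more often without any server-side change.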
Also, could this be happening due to concurrency? I am currently on a single
region server. When I run the test case, the batch Gets happen sequentially,
whereas from the map-reduce job the batch Gets happen concurrently against the
same region server. Could the performance degradation during map-reduce be due
to thrashing on that single region server? Thoughts?
Thanks,
Himanish
On Mon, Feb 20, 2012 at 3:39 PM, Stack <[email protected]> wrote:
> On Mon, Feb 20, 2012 at 12:04 PM, Himanish Kushary <[email protected]>
> wrote:
> > Also to add , from the map-reduce we have started seeing
> >
> > org.apache.hadoop.hbase.client.ScannerTimeoutException: 360388ms
> > passed since the last invocation, timeout is currently set to 300000
> >
> > due to the extremely high time spent on firing the batch Gets
> >
>
> What is your concurrency like (How many concurrent mappers?). Where
> is the time being spent? In the server, in the mapper? Why are you
> having scanner timeouts if you are doing big batch Gets?
>
> St.Ack
>
--
Thanks & Regards
Himanish