Re: Batch Get performance degrades from within Mapreduce

Stack Mon, 20 Feb 2012 13:38:11 -0800

On Mon, Feb 20, 2012 at 1:12 PM, Himanish Kushary <[email protected]> wrote:
> Where is the time being spent? In the server, in the mapper?  - Most of the
> time is spent calling HTable.batch(...) inside the mapper.
>


So, 100 Gets at a time?


> Why are you having scanner timeouts if you are doing big batch Gets? - We
> are getting the scanner timeout from the original Scan, which serves the input
> records to the mapper. The scanner caching is set to 100.
>
> I think that because the mapper is taking too long (because of the batch Gets
> inside it) to process the initial 100 records, fetching the next batch of
> scanned records throws the exception.
>
>


How big are these 100 rows?

How many regions on this single RegionServer?


> Also, could it be happening due to concurrency? I am currently on a single
> region server. When I run the test case the batch Gets happen sequentially,
> whereas from the map-reduce job the batch Gets happen concurrently on the same
> region server. Could this be the reason that performance degrades during
> map-reduce, due to thrashing on the same region server? Thoughts?
>

A single regionserver?  How many datanodes?

What's it look like on the machine running the regionserver?  Is it
working hard?  Seems odd that 100 gets would take longer than a
minute to complete.
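
For a rough sense of the numbers, here is a back-of-envelope sketch in plain
Java (no HBase classes involved). It assumes the scanner lease is the 60
second default of hbase.regionserver.lease.period, and that the mapper only
goes back to the regionserver for more rows once it has worked through the
whole cached batch. The per-record cost of 1000 ms is a made-up figure, and
the class and method names are mine, not anything in HBase.

```java
// Sketch only: the lease value is an assumed default and the
// per-record cost is a hypothetical measurement, not data from this thread.
final class ScannerLeaseEstimate {

    /** Time the mapper spends on one cached batch before the scanner is asked for more rows. */
    static long millisBetweenNextCalls(int scannerCaching, long millisPerRecord) {
        return (long) scannerCaching * millisPerRecord;
    }

    /**
     * Largest caching value that keeps the gap between scanner calls strictly
     * under the lease. Never returns less than 1; if even one record takes
     * longer than the lease, lowering the caching alone will not help.
     */
    static int maxSafeCaching(long leaseMillis, long millisPerRecord) {
        if (leaseMillis <= 0 || millisPerRecord <= 0) {
            throw new IllegalArgumentException("lease and per-record time must be positive");
        }
        long max = (leaseMillis - 1) / millisPerRecord;
        return (int) Math.max(1L, Math.min(max, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long leaseMillis = 60_000L;      // assumed default scanner lease
        long millisPerRecord = 1_000L;   // hypothetical cost of one map() call

        System.out.println("gap with caching=100: "
                + millisBetweenNextCalls(100, millisPerRecord) + " ms");
        System.out.println("max safe caching: "
                + maxSafeCaching(leaseMillis, millisPerRecord));
    }
}
```

If the gap comes out above the lease, either the caching on the input Scan
needs to come down or the work per record has to get cheaper; which one
depends on the answers to the questions above.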

You've checked out the performance section of the reference guide?

St.Ack
