Re: Batch Get performance degrades from within Mapreduce

Himanish Kushary Mon, 20 Feb 2012 13:54:29 -0800

Actually, it is 1200-2000 GETs at a time from inside the mapper, versus
120,000 GETs at a time from the unit test case. From the mapper a batch takes
20-30 secs, whereas the test case (the much bigger GET call) takes just 5-6 secs.
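
To put the gap in numbers, here is a rough back-of-the-envelope sketch. The
batch sizes and timings are the ones quoted above; the helper name is made up,
and pairing the smallest batch with the longest time (and vice versa) is only
an assumption to get lower and upper bounds.

```python
# Rough throughput implied by the figures in this message.
# gets_per_second is a hypothetical helper, not part of any HBase API.

def gets_per_second(num_gets, seconds):
    """Return the average number of Gets completed per second."""
    return num_gets / seconds

# Unit test case: 120,000 Gets in 5-6 secs
test_low = gets_per_second(120_000, 6)
test_high = gets_per_second(120_000, 5)

# Mapper: 1200-2000 Gets in 20-30 secs (assumed pairing for the bounds)
mapper_low = gets_per_second(1200, 30)
mapper_high = gets_per_second(2000, 20)

print(f"unit test: {test_low:.0f}-{test_high:.0f} Gets/sec")
print(f"mapper:    {mapper_low:.0f}-{mapper_high:.0f} Gets/sec")
print(f"slowdown:  {test_low / mapper_high:.0f}x to {test_high / mapper_low:.0f}x")
```

So per Get, the mapper path looks to be a couple of hundred times slower than
the test case, which is why I suspect something other than the batch size.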


The rows are pretty small (one LONG value). There are 7 regions on the single
region server for the table concerned.

While the map-reduce job is running, free memory on the regionserver (per
free -m) sometimes comes down to 100 MB, and CPU usage for HBase (per top)
ranges between 300% and 400%.

Here is the regionserver configuration :

RAM - 20G for the entire box, with 8G heap size for the
regionserver (swappiness = 0)
CPU - 8 Core
RegionServer handler count = 50
Compression - Enabled (Snappy)
Table File Max Size = 3G
Block Cache, Memstore - left at default settings
No of datanodes = 3


My main concern is: why do the same batch Gets perform so much better when
run through test cases than when run through Map-Reduce? The only explanation
that seems to make sense is the concurrency issue.

- Thanks
Himanish

On Mon, Feb 20, 2012 at 4:37 PM, Stack <[email protected]> wrote:

> On Mon, Feb 20, 2012 at 1:12 PM, Himanish Kushary <[email protected]>
> wrote:
> > Where is the time being spent? In the server, in the mapper?  - Most of
> > the time is spent in calling HTable.batch(...) inside the mapper
> >
>
> So, 100 Gets at a time?
>
>
> > Why are you having scanner timeouts if you are doing big batch Gets? - We
> > are getting the scanner timeout from the original Scan which serves the
> > input records to the mapper. The scanner caching is set to 100.
> >
> > I think because the mapper is taking too long (because of the batch Gets
> > inside it) to process the initial 100 records, the next batch of scanned
> > records throws the exception
> >
> >
>
>
> How big are these 100 rows?
>
> How many regions on this single RegionServer?
>
>
> > Also, could it be happening due to concurrency? I am currently on a single
> > region-server. When I run the test case the batch Gets happen sequentially
> > whereas from the map-reduce the batch Gets happen concurrently on the same
> > region server. Could this be the reason that during map-reduce the
> > performance degrades due to thrashing on the same region server? Thoughts?
> >
>
> A single regionserver?  How many datanodes?
>
> What's it look like on the machine running the regionserver?  Is it
> working hard?  Seems odd that 100 gets would take longer than a
> minute to complete.
>
> You've checked out the performance section of the reference guide?
>
> St.Ack
>



-- 
Thanks & Regards
Himanish
