Re: Batch Get performance degrades from within Mapreduce

Himanish Kushary Mon, 20 Feb 2012 17:42:43 -0800

Could somebody help me figure out what the difference is when running
through map-reduce? Is it just the concurrency that is causing the issue? Will
increasing the number of region servers help?


BTW, the master is also on the same server as the regionserver. Is it just an
environment issue, or is there some other configuration that may improve the
read performance from within the mapper?
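
To put the numbers from my earlier message (quoted below) side by side, here
is a rough sketch of the Get throughput in each case. The class and method
names are purely illustrative; only the Get counts and timings come from this
thread.

```java
public class GetThroughput {
    // Gets per second for a batch of `gets` that took `seconds` to complete.
    public static double getsPerSecond(long gets, double seconds) {
        return gets / seconds;
    }

    public static void main(String[] args) {
        // Unit test case: 120,000 Gets in 5-6 seconds.
        System.out.printf("test case: %.0f to %.0f gets/s%n",
                getsPerSecond(120_000, 6), getsPerSecond(120_000, 5));
        // Mapper: 1,200-2,000 Gets in 20-30 seconds.
        System.out.printf("mapper:    %.0f to %.0f gets/s%n",
                getsPerSecond(1_200, 30), getsPerSecond(2_000, 20));
    }
}
```

That works out to roughly 20,000-24,000 gets/s from the test case against
40-100 gets/s from the mapper, i.e. a slowdown of at least 200x.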

Thanks
Himanish

On Mon, Feb 20, 2012 at 4:54 PM, Himanish Kushary <[email protected]> wrote:

> Actually 1200-2000 GETs at a time from inside the mapper. From the unit test
> case, 120,000 GETs at a time. From the mapper it takes 20-30 secs, whereas from
> the test case (the much bigger GETs call) it takes just 5-6 secs.
>
> The rows are pretty small (one LONG value). There are 7 regions on the single
> region server for the table concerned.
>
> When the map-reduce job is running, free memory on the regionserver (using
> free -m) sometimes comes down to 100 MB, and CPU usage (using the top
> command) for hbase goes between 300-400%.
>
> Here is the regionserver configuration :
>
> RAM - 20G for the entire box with 8G heap size for the
> regionserver (swappiness = 0)
> CPU - 8 Core
> RegionServer handler count = 50
> Compression - Enabled (Snappy)
> Table File Max Size = 3G
> Block Cache, Memstore left at default settings
> No of datanodes = 3
>
>
> My main concern is: why do the same batch Gets perform so much better
> when run through test cases than when run through Map-Reduce? Concurrency
> is probably the only explanation that makes sense.
>
> - Thanks
> Himanish
>
>
> On Mon, Feb 20, 2012 at 4:37 PM, Stack <[email protected]> wrote:
>
>> On Mon, Feb 20, 2012 at 1:12 PM, Himanish Kushary <[email protected]>
>> wrote:
>> > Where is the time being spent? In the server, in the mapper?  - Most of the
>> > time is spent calling HTable.batch(...) inside the mapper
>> >
>>
>> So, 100 Gets at a time?
>>
>>
>> > Why are you having scanner timeouts if you are doing big batch Gets? - We
>> > are getting the scanner timeout from the original Scan which serves the
>> > input records to the mapper. The scanner caching is set to 100.
>> >
>> > I think because the mapper is taking too long (because of the batch Gets
>> > inside it) to process the initial 100 records, the next batch of scanned
>> > records throws the exception.
>> >
>> >
>>
>>
>> How big are these 100 rows?
>>
>> How many regions on this single RegionServer?
>>
>>
>> > Also, could it be happening due to concurrency? I am currently on a single
>> > region server. When I run the test case the batch Gets happen sequentially,
>> > whereas from the map-reduce the batch Gets happen concurrently on the same
>> > region server. Could this be the reason that during map-reduce the
>> > performance degrades due to thrashing on the same region server? Thoughts?
>> >
>>
>> A single regionserver?  How many datanodes?
>>
>> What's it look like on the machine running the regionserver?  Is it
>> working hard?  Seems odd that 100 gets would take longer than a
>> minute to complete.
>>
>> You've checked out the performance section of the reference guide?
>>
>> St.Ack
>>
>
>
>
> --
> Thanks & Regards
> Himanish
>



-- 
Thanks & Regards
Himanish
