Re: how to do parallel scanning in map reduce using hbase as input?

Li Li Thu, 26 Jun 2014 18:07:26 -0700

I don't think splitting will help. Adding more mappers per tasktracker
will use more resources (heap memory).
BTW, how can I view the average region size?
This is what I found in the web UI:
ServerName                     Num. Stores  Num. Storefiles  Storefile Size Uncompressed  Storefile Size  Index Size  Bloom Size
mphbase1,60020,1403089429986   38           119              75857m                       75882mb         57839k      189016k
mphbase2,60020,1403089433406   37           124              92252m                       92281mb         69323k      248532k
mphbase3,60020,1403088177813   40           53               35603m                       35613mb         29471k      63010k
mphbase4,60020,1403088177030   38           118              64880m                       64898mb         50425k      167232k
mphbase5,60020,1403088177302   38           99               52233m                       52250mb         39737k      138680k
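
For a rough figure, the Storefile Size column above can be summed and
divided by the region count mentioned earlier in the thread (about 80).
This is only a back-of-the-envelope sketch: it assumes every storefile on
these servers belongs to that one table, which the thread does not
confirm, so the result is best read as an upper bound. The variable names
are made up for illustration.

```python
# Storefile sizes in MB, copied from the "Storefile Size" column above.
storefile_size_mb = {
    "mphbase1": 75882,
    "mphbase2": 92281,
    "mphbase3": 35613,
    "mphbase4": 64898,
    "mphbase5": 52250,
}

# Region count quoted in the thread ("about 80 regions").
# Assumption: all storefiles belong to that single table.
REGION_COUNT = 80

total_mb = sum(storefile_size_mb.values())
avg_region_mb = total_mb / REGION_COUNT

print(f"total storefile size: {total_mb} MB")
print(f"rough average region size: {avg_region_mb:.0f} MB")
```

The per-server totals are also quite uneven (mphbase2 holds more than
twice as much as mphbase3), so individual regions may differ a lot from
this average.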


On Thu, Jun 26, 2014 at 9:11 PM, Ted Yu <[email protected]> wrote:
> 80 regions over 5 nodes - that's 16 per server.
>
> How big is the average region size?
> Have you considered splitting the existing regions?
>
> Cheers
>
> On Jun 26, 2014, at 12:34 AM, Li Li <[email protected]> wrote:
>
>> My table has about 700 million rows and about 80 regions. Each task
>> tracker is configured to run 4 mappers and 4 reducers at the same time.
>> The Hadoop/HBase cluster has 5 nodes, so 20 mappers run at the same
>> time. The mapper stage takes more than an hour to finish.
>> The HBase cluster's load is very low, about 2,000 requests per second.
>> I think one mapper per region is too few. How can I run more than
>> one mapper per region so that the job takes full advantage of the
>> computing resources?
