Be aware that the contract for a scan is to return all rows sorted by rowkey, 
hence it cannot scan regions in parallel by default. I have not played much 
with HBase and MapReduce, but if order is not important you can split the scan 
into multiple scans.
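To make the idea concrete, here is a minimal sketch of carving the rowkey space into contiguous sub-ranges so each one can be scanned and counted independently in parallel. This is plain Python rather than the HBase Java client, the function name is my own, and it assumes rowkeys can be compared as fixed-width big-endian byte strings; in practice you would more likely split on the table's actual region boundaries.

```python
# Hypothetical helper: split a rowkey range [start, stop) into n contiguous
# sub-ranges. Each (start, stop) pair could then drive one independent Scan
# whose partial counts are summed at the end.
# Assumes keys compare as fixed-width big-endian byte strings.

def split_key_range(start: bytes, stop: bytes, n: int, width: int = 8):
    """Return n (start, stop) byte-key pairs covering [start, stop)."""
    # Pad to a fixed width so keys map cleanly onto integers.
    lo = int.from_bytes(start.ljust(width, b"\x00"), "big")
    hi = int.from_bytes(stop.ljust(width, b"\xff"), "big")
    step = (hi - lo) // n
    bounds = [lo + i * step for i in range(n)] + [hi]
    return [
        (bounds[i].to_bytes(width, "big"), bounds[i + 1].to_bytes(width, "big"))
        for i in range(n)
    ]

# Example: four sub-ranges over the full 8-byte key space.
for lo_key, hi_key in split_key_range(b"", b"\xff" * 8, 4):
    print(lo_key.hex(), "->", hi_key.hex())
```

Note this even-split assumption only balances work if keys are roughly uniformly distributed; skewed keys would leave some scans much larger than others.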


----- Original Message -----
From: Tom Goren <[email protected]>
To: [email protected]
Cc: 
Sent: Sunday, October 9, 2011 8:07 AM
Subject: Re: speeding up rowcount

lol - i just ran a rowcount via mapreduce, and it took 6 hours for 7.5
million rows...

On Sun, Oct 9, 2011 at 7:50 AM, Rita <[email protected]> wrote:

> Hi,
>
> I have been doing a rowcount via mapreduce and it's taking about 4-5 hours
> to count 500 million rows in a table. I was wondering if there are any
> MapReduce tunings I can do so it will go much faster.
>
> I have a 10-node cluster, each node with 8 CPUs and 64GB of memory. Any
> tuning advice would be much appreciated.
>
>
> --
> --- Get your facts first, then you can distort them as you please.--
>
