Opened, https://issues.apache.org/jira/browse/HBASE-4702
Please edit to your liking.

On Sun, Oct 9, 2011 at 9:05 PM, Himanshu Vashishtha <[email protected]> wrote:

> MapReduce support in HBase inherently provides parallelism such that
> each Region is given to one mapper.
>
> Himanshu
>
> On Sun, Oct 9, 2011 at 6:44 PM, lars hofhansl <[email protected]> wrote:
> > Be aware that the contract for a scan is to return all rows sorted by
> > rowkey, hence it cannot scan regions in parallel by default. I have not
> > played much with HBase and MapReduce, but if order is not important you
> > can split the scan into multiple scans.
> >
> >
> > ----- Original Message -----
> > From: Tom Goren <[email protected]>
> > To: [email protected]
> > Cc:
> > Sent: Sunday, October 9, 2011 8:07 AM
> > Subject: Re: speeding up rowcount
> >
> > lol - i just ran a rowcount via mapreduce, and it took 6 hours for 7.5
> > million rows...
> >
> > On Sun, Oct 9, 2011 at 7:50 AM, Rita <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I have been doing a rowcount via mapreduce and it's taking about 4-5
> >> hours to count 500 million rows in a table. I was wondering if there
> >> are any map reduce tunings I can do so it will go much faster.
> >>
> >> I have a 10 node cluster, each node with 8 CPUs and 64GB of memory.
> >> Any tuning advice would be much appreciated.
> >>
> >>
> >> --
> >> --- Get your facts first, then you can distort them as you please.--
> >>
> >
>

--
--- Get your facts first, then you can distort them as you please.--
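For anyone reading this thread later, here is a rough sketch of the "split the scan into multiple scans" idea Lars mentions. It does not use the HBase API: a sorted in-memory list of row keys stands in for the table, and the names `split_ranges`, `count_range` and `parallel_row_count` are made up for illustration. On a real cluster the split points would presumably be the region boundaries, which is what the MapReduce integration gives you with one mapper per region.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def split_ranges(boundaries):
    """Turn sorted split points into half-open (start, stop) key ranges.

    None marks an open end, so the first range starts at the beginning
    of the table and the last range runs to the end.
    """
    edges = [None] + list(boundaries) + [None]
    return list(zip(edges[:-1], edges[1:]))


def count_range(sorted_keys, start, stop):
    """Count keys in [start, stop) by walking them, like a bounded scan."""
    lo = 0 if start is None else bisect_left(sorted_keys, start)
    hi = len(sorted_keys) if stop is None else bisect_left(sorted_keys, stop)
    return sum(1 for _ in sorted_keys[lo:hi])


def parallel_row_count(sorted_keys, boundaries, workers=4):
    """Count each range independently, then add the partial counts.

    Order across ranges is not preserved, which is fine for a count.
    """
    ranges = split_ranges(boundaries)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(lambda r: count_range(sorted_keys, *r), ranges)
        return sum(partial)


if __name__ == "__main__":
    # Hypothetical table of 1000 rows, split at three made-up boundaries.
    keys = [f"row{i:04d}" for i in range(1000)]
    print(parallel_row_count(keys, ["row0250", "row0500", "row0750"]))
```

Because the ranges are half-open and share their boundaries, every row falls into exactly one range, so the partial counts can simply be summed.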
