Re: speeding up rowcount

Rita Sat, 29 Oct 2011 07:30:10 -0700

Opened, https://issues.apache.org/jira/browse/HBASE-4702



Please edit to your liking.


On Sun, Oct 9, 2011 at 9:05 PM, Himanshu Vashishtha <[email protected]> wrote:

> MapReduce support in HBase inherently provides parallelism such that
> each Region is given to one mapper.
>
> Himanshu
>
> On Sun, Oct 9, 2011 at 6:44 PM, lars hofhansl <[email protected]> wrote:
> > Be aware that the contract for a scan is to return all rows sorted by
> > rowkey, hence it cannot scan regions in parallel by default. I have not
> > played much with HBase and MapReduce, but if order is not important you
> > can split the scan into multiple scans.
> >
> >
> > ----- Original Message -----
> > From: Tom Goren <[email protected]>
> > To: [email protected]
> > Cc:
> > Sent: Sunday, October 9, 2011 8:07 AM
> > Subject: Re: speeding up rowcount
> >
> > lol - i just ran a rowcount via mapreduce, and it took 6 hours for 7.5
> > million rows...
> >
> > On Sun, Oct 9, 2011 at 7:50 AM, Rita <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I have been doing a rowcount via mapreduce and it's taking about 4-5 hours
> >> to count 500 million rows in a table. I was wondering if there are any
> >> MapReduce tunings I can do so it will go much faster.
> >>
> >> I have a 10-node cluster, each node with 8 CPUs and 64 GB of memory. Any
> >> tuning advice would be much appreciated.
> >>
> >>
> >> --
> >> --- Get your facts first, then you can distort them as you please.--
> >>
> >
> >
>



-- 
--- Get your facts first, then you can distort them as you please.--
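
The suggestion quoted above, splitting one ordered scan into several
independent scans when row order does not matter, can be sketched roughly as
follows. This is only an illustration and not HBase code: the table is stood
in for by a plain Python list of row keys, and the names count_range,
parallel_row_count and split_points are hypothetical. In a real cluster the
split points would presumably be the region boundaries, and each sub-scan
would run against its own region.

```python
from concurrent.futures import ThreadPoolExecutor


def count_range(rows, start, stop):
    """Count keys with start <= key < stop; stop=None means no upper bound.

    Stands in for one independent scan over a single key range.
    """
    count = 0
    for key in rows:
        if key >= start and (stop is None or key < stop):
            count += 1
    return count


def parallel_row_count(rows, split_points, workers=4):
    """Sum the counts of several non-overlapping key ranges.

    A count does not depend on row order, so the partial results can be
    computed independently and simply added together.
    """
    # Boundaries "" .. s1 .. s2 .. None cover the whole key space exactly once.
    bounds = [""] + sorted(split_points) + [None]
    ranges = list(zip(bounds[:-1], bounds[1:]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(lambda r: count_range(rows, r[0], r[1]), ranges)
        return sum(partial_counts)


if __name__ == "__main__":
    # Hypothetical data: 1000 zero-padded row keys and three split points.
    rows = [f"row{i:04d}" for i in range(1000)]
    splits = ["row0250", "row0500", "row0750"]
    print(parallel_row_count(rows, splits))
```

Threads over an in-memory list will not run faster in Python; the sketch only
shows the structure: disjoint key ranges, one count per range, and a final sum.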
