Excellent question.
There seems to be a bug affecting RowCounter.
In TableInputFormat:
if (conf.get(SCAN_CACHEDROWS) != null) {
  scan.setCaching(Integer.parseInt(conf.get(SCAN_CACHEDROWS)));
}
But I don't see SCAN_CACHEDROWS in either TableMapReduceUtil or RowCounter.
Mind filing a bug?
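In the meantime, one possible workaround (a sketch, not verified against your version) is to raise the client-side default, hbase.client.scanner.caching, which the scanner should fall back to when no per-Scan caching is set. In hbase-site.xml on the machine submitting the job:

```xml
<!-- Illustrative value only: larger caching means fewer RPC round trips
     per scanner, at the cost of more client/regionserver memory. -->
<property>
  <name>hbase.client.scanner.caching</name>
  <value>1000</value>
</property>
```

If your version of RowCounter parses generic options, you may also be able to pass the same property with -Dhbase.client.scanner.caching=1000 on the command line instead of editing the config file.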
On Sun, Oct 9, 2011 at 8:30 AM, Rita <[email protected]> wrote:
> Thanks for the responses.
>
> Where do I set the high Scan cache values?
>
>
> On Sun, Oct 9, 2011 at 11:19 AM, Himanshu Vashishtha <
> [email protected]> wrote:
>
> > Since a MapReduce job runs as a separate process, try a high Scan cache
> > value.
> >
> > http://hbase.apache.org/book.html#perf.hbase.client.caching
> >
> > Himanshu
> >
> > On Sun, Oct 9, 2011 at 9:09 AM, Ted Yu <[email protected]> wrote:
> > > I guess your hbase.hregion.max.filesize is quite high.
> > > If possible, lower its value so that you have smaller regions.
> > >
> > > On Sun, Oct 9, 2011 at 7:50 AM, Rita <[email protected]> wrote:
> > >
> > >> Hi,
> > >>
> > >> I have been doing a rowcount via MapReduce and it's taking about 4-5
> > >> hours to count 500 million rows in a table. I was wondering if there
> > >> are any MapReduce tunings I can do so it will go much faster.
> > >>
> > >> I have a 10-node cluster, each node with 8 CPUs and 64 GB of memory.
> > >> Any tuning advice would be much appreciated.
> > >>
> > >>
> > >> --
> > >> --- Get your facts first, then you can distort them as you please.--
> > >>
> > >
> >
>
>
>
>