Since RowCounter uses FirstKeyOnlyFilter, could we have a default Scan cache value of 500 or so?
Himanshu

On Sun, Oct 9, 2011 at 9:44 AM, Ted Yu <[email protected]> wrote:
> Excellent question.
> There seems to be a bug for RowCounter.
>
> In TableInputFormat:
>     if (conf.get(SCAN_CACHEDROWS) != null) {
>       scan.setCaching(Integer.parseInt(conf.get(SCAN_CACHEDROWS)));
>     }
> But I don't see SCAN_CACHEDROWS in either TableMapReduceUtil or RowCounter.
>
> Mind filing a bug?
>
> On Sun, Oct 9, 2011 at 8:30 AM, Rita <[email protected]> wrote:
> > Thanks for the responses.
> >
> > Where do I set the high Scan cache values?
> >
> > On Sun, Oct 9, 2011 at 11:19 AM, Himanshu Vashishtha <[email protected]> wrote:
> > > Since MapReduce is a separate process, try with a high Scan cache value.
> > >
> > > http://hbase.apache.org/book.html#perf.hbase.client.caching
> > >
> > > Himanshu
> > >
> > > On Sun, Oct 9, 2011 at 9:09 AM, Ted Yu <[email protected]> wrote:
> > > > I guess your hbase.hregion.max.filesize is quite high.
> > > > If possible, lower its value so that you have smaller regions.
> > > >
> > > > On Sun, Oct 9, 2011 at 7:50 AM, Rita <[email protected]> wrote:
> > > > > Hi,
> > > > >
> > > > > I have been doing a row count via MapReduce and it's taking about
> > > > > 4-5 hours to count 500 million rows in a table. I was wondering if
> > > > > there are any MapReduce tunings I can do so it will go much faster.
> > > > >
> > > > > I have a 10-node cluster, each node with 8 CPUs and 64 GB of memory.
> > > > > Any tuning advice would be much appreciated.
> > > > >
> > > > > --
> > > > > --- Get your facts first, then you can distort them as you please.--
> >
> > --
> > --- Get your facts first, then you can distort them as you please.--
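Since TableInputFormat reads SCAN_CACHEDROWS (the "hbase.mapreduce.scan.cachedrows" key) from the job configuration, a possible workaround while the bug stands would be to set that property on the command line; RowCounter runs through ToolRunner, so -D properties should land in the job conf. A sketch, not verified against a live cluster, with "mytable" as a placeholder table name:

```shell
# Workaround sketch: force scanner caching for the RowCounter job
# by setting the key TableInputFormat reads ("hbase.mapreduce.scan.cachedrows").
# "mytable" is a placeholder; substitute your table name.
hbase org.apache.hadoop.hbase.mapreduce.RowCounter \
    -Dhbase.mapreduce.scan.cachedrows=500 \
    mytable
```

Alternatively, raising the client-wide default hbase.client.scanner.caching in hbase-site.xml on the submitting node should have a similar effect, at the cost of applying to every scan rather than just this job.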
