Since RowCounter uses FirstKeyOnlyFilter, could we have a default Scan cache value of 500 or so?
Himanshu

On Sun, Oct 9, 2011 at 9:44 AM, Ted Yu <[email protected]> wrote:
> Excellent question.
> There seems to be a bug for RowCounter.
>
> In TableInputFormat:
>     if (conf.get(SCAN_CACHEDROWS) != null) {
>       scan.setCaching(Integer.parseInt(conf.get(SCAN_CACHEDROWS)));
>     }
> But I don't see SCAN_CACHEDROWS in either TableMapReduceUtil or RowCounter.
>
> Mind filing a bug?
>
> On Sun, Oct 9, 2011 at 8:30 AM, Rita <[email protected]> wrote:
> > Thanks for the responses.
> >
> > Where do I set the high Scan cache values?
> >
> > On Sun, Oct 9, 2011 at 11:19 AM, Himanshu Vashishtha <[email protected]> wrote:
> > > Since MapReduce is a separate process, try with a high Scan cache value.
> > >
> > > http://hbase.apache.org/book.html#perf.hbase.client.caching
> > >
> > > Himanshu
> > >
> > > On Sun, Oct 9, 2011 at 9:09 AM, Ted Yu <[email protected]> wrote:
> > > > I guess your hbase.hregion.max.filesize is quite high.
> > > > If possible, lower its value so that you have smaller regions.
> > > >
> > > > On Sun, Oct 9, 2011 at 7:50 AM, Rita <[email protected]> wrote:
> > > > > Hi,
> > > > >
> > > > > I have been doing a row count via MapReduce and it's taking about
> > > > > 4-5 hours to count 500 million rows in a table. I was wondering if
> > > > > there are any MapReduce tunings I can do so it will go much faster.
> > > > >
> > > > > I have a 10-node cluster, each node with 8 CPUs and 64 GB of memory.
> > > > > Any tuning advice would be much appreciated.
> > > > >
> > > > > --
> > > > > --- Get your facts first, then you can distort them as you please.--
> >
> > --
> > --- Get your facts first, then you can distort them as you please.--
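Since TableInputFormat reads SCAN_CACHEDROWS (the "hbase.mapreduce.scan.cachedrows" key) from the job configuration, a possible workaround while the bug stands would be to set that property on the command line; RowCounter runs through ToolRunner, so -D properties should land in the job conf. A sketch, not verified against a live cluster, with "mytable" as a placeholder table name:

```shell
# Workaround sketch: force scanner caching for the RowCounter job
# by setting the key TableInputFormat reads ("hbase.mapreduce.scan.cachedrows").
# "mytable" is a placeholder; substitute your table name.
hbase org.apache.hadoop.hbase.mapreduce.RowCounter \
    -Dhbase.mapreduce.scan.cachedrows=500 \
    mytable
```

Alternatively, raising the client-wide default hbase.client.scanner.caching in hbase-site.xml on the submitting node should have a similar effect, at the cost of applying to every scan rather than just this job.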
