So, just to get it straight: the reason the scan with setBatch(1) is so much faster is that it returns only the value of the first column?
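
For concreteness, this is roughly the kind of scan being discussed (a sketch only, not the actual code from the tests; the table name and the parameter values below are placeholders, using the plain HTable client API):

// Rough sketch of the scan in question; "mytable" and the
// caching/batch values are placeholders for illustration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // placeholder table name
        Scan scan = new Scan();
        scan.setCaching(10000); // rows fetched per RPC round-trip to the RegionServer
        scan.setBatch(1);       // max columns (KeyValues) returned per Result
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // process each Result here
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}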
On Wed, Sep 12, 2012 at 5:37 PM, Doug Meil <[email protected]> wrote:

> Hi there,
>
> See this for info on the block cache in the RegionServer...
>
> http://hbase.apache.org/book.html
> 9.6.4. Block Cache
>
> ... and see this for "batching" on the scan parameter...
>
> http://hbase.apache.org/book.html#perf.reading
> 11.8.1. Scan Caching
>
>
> On 9/12/12 9:55 AM, "Amit Sela" <[email protected]> wrote:
>
> >I allocate 10GB per RegionServer.
> >An average row size is ~200 Bytes.
> >The network is 1GB.
> >
> >It would be great if anyone could elaborate on the difference between
> >Cache and Batch parameters.
> >
> >Thanks.
> >
> >On Wed, Sep 12, 2012 at 4:04 PM, Michael Segel
> ><[email protected]> wrote:
> >
> >> How much memory do you have?
> >> What's the size of the underlying row?
> >> What does your network look like? 1GBe or 10GBe?
> >>
> >> There's more to it, and I think that you'll find that YMMV on what is an
> >> optimum scan size...
> >>
> >> HTH
> >>
> >> -Mike
> >>
> >> On Sep 12, 2012, at 7:57 AM, Amit Sela <[email protected]> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I'm trying to find the sweet spot for the cache size and batch size
> >> > Scan() parameters.
> >> >
> >> > I'm scanning one table using HTable.getScanner() and iterating over
> >> > the ResultScanner retrieved.
> >> >
> >> > I did some testing and got the following results, for scanning
> >> > 1000000 rows:
> >> >
> >> >   Cache   Batch          Total execution time (sec)
> >> >   10000   -1 (default)   112
> >> >   10000   5000           110
> >> >   10000   10000          110
> >> >   10000   20000          110
> >> >
> >> >   Cache   Batch          Total execution time (sec)
> >> >   1000    -1 (default)   116
> >> >   10000   -1 (default)   110
> >> >   20000   -1 (default)   115
> >> >
> >> >   Cache   Batch          Total execution time (sec)
> >> >   5000    10             26
> >> >   20000   10             25
> >> >   50000   10             26
> >> >   5000    5              15
> >> >   20000   5              14
> >> >   50000   5              14
> >> >   1000    1              6
> >> >   5000    1              5
> >> >   10000   1              4
> >> >   20000   1              4
> >> >   50000   1              4
> >> >
> >> > I don't understand why a lower batch size gives such an improvement?
> >> >
> >> > Thanks,
> >> >
> >> > Amit.
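
For what it's worth, a rough way to see what a batch of 1 actually returns would be to count Result objects against distinct row keys while iterating the same scan (sketch only; the table name and the scan values are placeholders):

// Sketch only (not from the thread): counts Result objects, distinct row
// keys, and cells, so the effect of setBatch(1) on how rows come back is
// visible. "mytable" and the caching/batch values are placeholders.
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class BatchCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // placeholder table name
        Scan scan = new Scan();
        scan.setCaching(10000);
        scan.setBatch(1);
        ResultScanner scanner = table.getScanner(scan);
        long results = 0;
        long rows = 0;
        long cells = 0;
        byte[] lastRow = null;
        try {
            for (Result r : scanner) {
                results++;
                cells += r.size(); // number of KeyValues in this Result
                if (lastRow == null || !Arrays.equals(lastRow, r.getRow())) {
                    rows++;
                    lastRow = r.getRow();
                }
            }
        } finally {
            scanner.close();
            table.close();
        }
        // If results > rows, rows are being split across multiple Result
        // objects; if cells is far below the total column count, columns
        // are being dropped instead.
        System.out.println("results=" + results
                + " rows=" + rows + " cells=" + cells);
    }
}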
