How much memory do you have? What's the size of the underlying row? What does your network look like? 1GbE or 10GbE?
There's more to it, and I think that you'll find that YMMV on what is an optimum scan size...

HTH

-Mike

On Sep 12, 2012, at 7:57 AM, Amit Sela <[email protected]> wrote:

> Hi all,
>
> I'm trying to find the sweet spot for the cache size and batch size Scan()
> parameters.
>
> I'm scanning one table using HTable.getScanner() and iterating over the
> ResultScanner retrieved.
>
> I did some testing and got the following results:
>
> For scanning 1000000 rows.
>
> Cache   Batch          Total execution time (sec)
> 10000   -1 (default)   112
> 10000   5000           110
> 10000   10000          110
> 10000   20000          110
>
> Cache   Batch          Total execution time (sec)
> 1000    -1 (default)   116
> 10000   -1 (default)   110
> 20000   -1 (default)   115
>
> Cache   Batch          Total execution time (sec)
> 5000    10             26
> 20000   10             25
> 50000   10             26
> 5000    5              15
> 20000   5              14
> 50000   5              14
> 1000    1              6
> 5000    1              5
> 10000   1              4
> 20000   1              4
> 50000   1              4
>
> I don't understand why a lower batch size gives such an improvement?
>
> Thanks,
>
> Amit.
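For anyone reproducing the experiment above, here is a minimal sketch of the kind of scan being described, assuming the 0.94-era HBase client API; the table name "mytable" and the caching/batch values are placeholders for the knobs under discussion, not recommendations:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanTuningSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // placeholder table name
        try {
            Scan scan = new Scan();
            // Caching: number of rows fetched per RPC from the region server.
            scan.setCaching(10000);
            // Batch: max number of columns (KeyValues) returned per Result;
            // a row wider than this is split across several Result objects.
            scan.setBatch(5);

            ResultScanner scanner = table.getScanner(scan);
            try {
                long results = 0;
                for (Result r : scanner) {
                    // With a small batch, one logical row may show up as
                    // multiple Results, so this counts Results, not rows.
                    results++;
                }
                System.out.println("Results iterated: " + results);
            } finally {
                scanner.close();
            }
        } finally {
            table.close();
        }
    }
}

Note that because batch caps columns per Result rather than rows, a benchmark that counts iterated Results is not measuring the same amount of data at different batch settings, which is worth keeping in mind when comparing the timings above.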
