Hi all, I'm trying to find the sweet spot for the cache size and batch size Scan() parameters.
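For background on how the two settings interact, here is a rough Java model. Java is used because the HBase client is Java. It assumes the commonly documented semantics: `Scan.setCaching` sets how many `Result` objects each scanner `next()` RPC returns, and `Scan.setBatch` caps how many cells go into one `Result`, so a row wider than the batch is split into several `Result`s. The row width (`cellsPerRow`) is a hypothetical input, since the post doesn't say how many columns each row has. The class and method names are illustrative only and are not part of the HBase API.

```java
// Rough model of HBase scanner round trips, assuming every row has the same
// number of cells. The row width below is a hypothetical input, not a
// measurement from the tables in this post.
public class ScanRpcEstimate {

    // Number of Result objects one row is split into.
    // batch <= 0 means "no batch limit" (the default of -1): one Result per row.
    static long resultsPerRow(int cellsPerRow, int batch) {
        if (batch <= 0 || batch >= cellsPerRow) {
            return 1;
        }
        return (cellsPerRow + batch - 1) / batch; // ceil(cellsPerRow / batch)
    }

    // Caching counts Result objects per next() RPC, so the round trips needed
    // to read `rows` complete rows are ceil(totalResults / caching).
    static long rpcsForRows(long rows, int cellsPerRow, int caching, int batch) {
        long totalResults = rows * resultsPerRow(cellsPerRow, batch);
        return (totalResults + caching - 1) / caching;
    }

    // If the client loop stops after a fixed number of Result objects,
    // this is how many complete rows it has actually covered.
    static long rowsCoveredByResults(long results, int cellsPerRow, int batch) {
        return results / resultsPerRow(cellsPerRow, batch);
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;   // rows to scan
        int cellsPerRow = 20;     // hypothetical row width (assumption)
        int caching = 10_000;
        int[] batches = {-1, 20, 10, 5, 1};
        for (int batch : batches) {
            System.out.printf("batch=%d results/row=%d rpcs=%d complete rows in %d Results=%d%n",
                    batch,
                    resultsPerRow(cellsPerRow, batch),
                    rpcsForRows(rows, cellsPerRow, caching, batch),
                    rows,
                    rowsCoveredByResults(rows, cellsPerRow, batch));
        }
    }
}
```

Under this model, a smaller batch produces more `Result`s and more RPCs per complete row, not fewer. So if the iteration loop stops after counting 1,000,000 `Result`s rather than 1,000,000 distinct rows, a small batch may simply be reading fewer complete rows, which is worth ruling out before comparing the timings.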
I'm scanning one table using HTable.getScanner() and iterating over the returned ResultScanner. I did some testing and got the following results for scanning *1,000,000* rows:

Cache   Batch          Total execution time (sec)
10000   -1 (default)   112
10000   5000           110
10000   10000          110
10000   20000          110

Cache   Batch          Total execution time (sec)
1000    -1 (default)   116
10000   -1 (default)   110
20000   -1 (default)   115

Cache   Batch          Total execution time (sec)
5000    10             26
20000   10             25
50000   10             26
5000    5              15
20000   5              14
50000   5              14
1000    1              6
5000    1              5
10000   1              4
20000   1              4
50000   1              4

*Why does a lower batch size give such an improvement?*

Thanks,
Amit
