Hi, I developed a distributed scan in which I create a thread for each region. I then timed Scan vs. DistributedScan. I have disabled the block cache on my table. My cluster has 3 region servers with 2 regions each, there are 100,000 rows in total, and I execute a complete scan.
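To make the thread-per-region setup concrete, here is a minimal JDK-only sketch of the pattern. It is not the poster's code. RangeScanner, parallelCount, inMemory and sampleKeys are hypothetical names. An in-memory TreeSet stands in for the table so the sketch runs without a cluster, and the zero-padded keys 000001 to 100000 are an assumption about the key format. In a real HBase client, each task would instead run its own Scan bounded by the range's start and stop rows, using the caching value under test. Each task would also typically use its own table instance, since HTable is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRangeScanSketch {

    // Hypothetical stand-in for a scan over one region's key range [startKey, stopKey).
    // An empty key means open-ended, like an empty start/stop row in HBase.
    interface RangeScanner {
        long count(String startKey, String stopKey) throws Exception;
    }

    // Submits one task (one thread) per key range and sums the per-range row counts.
    static long parallelCount(List<String[]> ranges, RangeScanner scanner) {
        ExecutorService pool = Executors.newFixedThreadPool(ranges.size());
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String[] range : ranges) {
                Callable<Long> task = () -> scanner.count(range[0], range[1]);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // blocks until that range's scan has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scanning", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("range scan failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // In-memory "table" (hypothetical) so the sketch runs without a cluster.
    static RangeScanner inMemory(NavigableSet<String> keys) {
        return (start, stop) -> {
            NavigableSet<String> view = start.isEmpty() ? keys : keys.tailSet(start, true);
            if (!stop.isEmpty()) {
                view = view.headSet(stop, false);
            }
            return view.size();
        };
    }

    // Zero-padded row keys 000001..n (an assumed key format).
    static NavigableSet<String> sampleKeys(int n) {
        NavigableSet<String> keys = new TreeSet<>();
        for (int i = 1; i <= n; i++) {
            keys.add(String.format("%06d", i));
        }
        return keys;
    }

    public static void main(String[] args) {
        // The six key ranges listed in this post, one per region.
        List<String[]> ranges = List.of(
            new String[] {"", "016666"},
            new String[] {"016666", "033332"},
            new String[] {"033332", "049998"},
            new String[] {"049998", "066664"},
            new String[] {"066664", "083330"},
            new String[] {"083330", ""});
        System.out.println(parallelCount(ranges, inMemory(sampleKeys(100_000)))); // prints 100000
    }
}
```

The fixed pool is sized to the number of ranges, so each region gets exactly one worker thread, matching the "one thread per region" description.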
My partitions are:
-016666 -> request 16665
016666-033332 -> request 16666
033332-049998 -> request 16666
049998-066664 -> request 16666
066664-083330 -> request 16666
083330- -> request 16671

14/09/10 09:15:47 INFO hbase.HbaseScanTest: NUM ROWS 100000
14/09/10 09:15:47 INFO util.TimerUtil: SCAN PARALLEL:22089ms,Counter:2 -> Caching 10
14/09/10 09:16:04 INFO hbase.HbaseScanTest: NUM ROWS 100000
14/09/10 09:16:04 INFO util.TimerUtil: SCAN PARALLEL:16598ms,Counter:2 -> Caching 100
14/09/10 09:16:22 INFO hbase.HbaseScanTest: NUM ROWS 100000
14/09/10 09:16:22 INFO util.TimerUtil: SCAN PARALLEL:16497ms,Counter:2 -> Caching 1000
14/09/10 09:17:41 INFO hbase.HbaseScanTest: NUM ROWS 100000
14/09/10 09:17:41 INFO util.TimerUtil: SCAN NORMAL:68288ms,Counter:2 -> Caching 1
14/09/10 09:17:48 INFO hbase.HbaseScanTest: NUM ROWS 100000
14/09/10 09:17:48 INFO util.TimerUtil: SCAN NORMAL:2646ms,Counter:2 -> Caching 100
14/09/10 09:17:58 INFO hbase.HbaseScanTest: NUM ROWS 100000
14/09/10 09:17:58 INFO util.TimerUtil: SCAN NORMAL:3903ms,Counter:2 -> Caching 1000

The parallel scan performs much worse than the simple scan, and I don't know why the simple scan is so fast: it's much faster than running a "count" from the HBase shell, which doesn't look normal. The only case where the parallel scan does better is against a normal scan with caching 1. Any clue about it?
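The partition boundaries in this post fit a fixed step of 100000 / 6 = 16666 (integer division). The post does not state the key format. Assuming the row keys are the zero-padded integers 000001 to 100000, the JDK-only sketch below regenerates the same boundaries and per-range counts; split and rowsIn are hypothetical names. The first range is one row short because there is no key 000000. The last range absorbs the remainder, since 6 × 16666 = 99996.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyRangeSplitSketch {

    // Splits zero-padded integer row keys into `parts` ranges of width totalRows / parts.
    // The first start key and the last stop key are empty (open-ended), mirroring
    // HBase's empty start/stop row convention.
    static List<String[]> split(int totalRows, int parts) {
        int step = totalRows / parts;
        List<String[]> ranges = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            String start = (i == 0) ? "" : String.format("%06d", i * step);
            String stop = (i == parts - 1) ? "" : String.format("%06d", (i + 1) * step);
            ranges.add(new String[] {start, stop});
        }
        return ranges;
    }

    // Counts keys 1..totalRows (formatted "%06d") in [start, stop), comparing them as
    // strings, which for ASCII digits matches HBase's lexicographic byte order.
    static int rowsIn(String start, String stop, int totalRows) {
        int count = 0;
        for (int i = 1; i <= totalRows; i++) {
            String key = String.format("%06d", i);
            boolean afterStart = start.isEmpty() || key.compareTo(start) >= 0;
            boolean beforeStop = stop.isEmpty() || key.compareTo(stop) < 0;
            if (afterStart && beforeStop) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Prints lines in the same shape as the partition list, from
        // "-016666 -> 16665" to "083330- -> 16671".
        for (String[] r : split(100_000, 6)) {
            System.out.println(r[0] + "-" + r[1] + " -> " + rowsIn(r[0], r[1], 100_000));
        }
    }
}
```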
