Hello Guillermo,

Sounds like there may be some contention going on. How many disks per node do you have?
Can you explain further what you mean by "and I don't know why it's so fast, it's really much faster than executing a "count" from the hbase shell"? The count command in the shell uses the FirstKeyOnlyFilter and a caching of 10, which should be close to the behavior of your testing tool if it uses the same filter and the same caching settings.

cheers,
esteban.

--
Cloudera, Inc.

On Wed, Sep 10, 2014 at 1:40 AM, Guillermo Ortiz wrote:

> Hi,
>
> I developed a distributed scan: I create a thread for each region. After
> that, I measured the times of Scan vs DistributedScan.
> I have disabled the blockcache in my table. My cluster has 3 region servers
> with 2 regions each; in total there are 100,000 rows, and I execute a
> complete scan.
>
> My partitions are
> -016666 -> request 16665
> 016666-033332 -> request 16666
> 033332-049998 -> request 16666
> 049998-066664 -> request 16666
> 066664-083330 -> request 16666
> 083330- -> request 16671
>
>
> 14/09/10 09:15:47 INFO hbase.HbaseScanTest: NUM ROWS 100000
> 14/09/10 09:15:47 INFO util.TimerUtil: SCAN PARALLEL:22089ms,Counter:2 -> Caching 10
>
> 14/09/10 09:16:04 INFO hbase.HbaseScanTest: NUM ROWS 100000
> 14/09/10 09:16:04 INFO util.TimerUtil: SCAN PARALLEL:16598ms,Counter:2 -> Caching 100
>
> 14/09/10 09:16:22 INFO hbase.HbaseScanTest: NUM ROWS 100000
> 14/09/10 09:16:22 INFO util.TimerUtil: SCAN PARALLEL:16497ms,Counter:2 -> Caching 1000
>
> 14/09/10 09:17:41 INFO hbase.HbaseScanTest: NUM ROWS 100000
> 14/09/10 09:17:41 INFO util.TimerUtil: SCAN NORMAL:68288ms,Counter:2 -> Caching 1
>
> 14/09/10 09:17:48 INFO hbase.HbaseScanTest: NUM ROWS 100000
> 14/09/10 09:17:48 INFO util.TimerUtil: SCAN NORMAL:2646ms,Counter:2 -> Caching 100
>
> 14/09/10 09:17:58 INFO hbase.HbaseScanTest: NUM ROWS 100000
> 14/09/10 09:17:58 INFO util.TimerUtil: SCAN NORMAL:3903ms,Counter:2 -> Caching 1000
>
> The parallel scan performs much worse than the simple scan, and I don't know
> why it's so fast, it's really much faster than executing a "count" from the
> hbase shell, which doesn't look normal. The only case where the parallel
> scan does better is against a normal scan with caching 1.
>
> Any clue about it?
>
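To put the caching numbers above in perspective, here is a rough Java estimate of how many scanner round trips each run needs. It is a back-of-the-envelope sketch under stated assumptions, not a model of HBase internals: it assumes each scanner next() RPC returns exactly `caching` rows, it ignores scanner open/close calls and region-boundary transitions (so the figures are lower bounds), and `roundTrips` is a hypothetical helper. The 16,666 rows per region is taken approximately from the partition list.

```java
// Rough lower-bound estimate of scanner round trips for a given caching value.
// Assumption: each next() RPC returns exactly `caching` rows; scanner open/close
// calls and region-boundary transitions are ignored.
public class ScanRoundTrips {

    // Hypothetical helper: ceil(rows / caching) using integer arithmetic only.
    static long roundTrips(long rows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long totalRows = 100_000L;    // table size from the test above
        long rowsPerRegion = 16_666L; // approximate rows per region (6 regions)
        int[] cachingValues = {1, 10, 100, 1000};

        for (int caching : cachingValues) {
            // Sequential scan: one client walks all rows.
            // Parallel scan: each per-region thread walks only its own rows.
            System.out.printf("caching=%d: sequential ~%d RPCs, per-region thread ~%d RPCs%n",
                    caching, roundTrips(totalRows, caching), roundTrips(rowsPerRegion, caching));
        }
    }
}
```

One possible reading: with caching 1 the single-threaded scan needs roughly 100,000 round trips, one per row, which fits with that being the only configuration the parallel scan beat. From caching 100 upward the per-RPC overhead is already spread over many rows, so other costs, such as the disk contention asked about above, are a more plausible explanation for the remaining gap. The shell's count, at caching 10, would need on the order of 10,000 round trips by the same estimate.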
