What I mean is that I don't understand why a count takes longer than a complete scan without cache. I would expect scanning the whole table to take longer than executing a count. My other question is why a distributed scan is slower than a sequential scan. Tomorrow I'll check how many disks we have.

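One factor that may be worth ruling out when comparing these timings is the number of client-to-server round trips. The following is only a rough sketch, not a measurement: it assumes each scanner RPC returns at most `caching` rows, and it ignores region boundaries, result-size limits, and the scanner open/close calls. The class and method names are made up for illustration.

```java
public class ScanRoundTrips {
    // Estimated number of scanner "next" RPCs needed to read `rows` rows,
    // ASSUMING each RPC returns at most `caching` rows (a simplification).
    static long estimatedRpcs(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching > 0");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 100_000L; // row count reported in the quoted logs
        for (int caching : new int[] {1, 10, 100, 1000}) {
            System.out.println("caching=" + caching + " -> ~"
                    + estimatedRpcs(rows, caching) + " RPCs");
        }
    }
}
```

Under that assumption, a count with caching 10 would need roughly 10,000 round trips for 100,000 rows, while a scan with caching 100 would need roughly 1,000, which could account for part of the gap independently of the filter used.
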
On Wednesday, September 10, 2014, Esteban Gutierrez <[email protected]> wrote:

> Hello Guillermo,
>
> Sounds like some potential contention going on. How many disks per node do
> you have?
>
> Can you explain further what you mean by "and I don't know why it's so
> fast, it's really much faster than executing a "count" from the hbase shell"?
> The count command from the shell uses the FirstKeyOnlyFilter and a caching
> of 10, which should be close to the behavior of your testing tool if it's
> using the same filter and the same cache settings.
>
> cheers,
> esteban.
>
> --
> Cloudera, Inc.
>
> On Wed, Sep 10, 2014 at 1:40 AM, Guillermo Ortiz <[email protected]> wrote:
>
> > Hi,
> >
> > I developed a distributed scan; I create a thread for each region. After
> > that, I tried to compare timings for Scan vs DistributedScan.
> > I have disabled the block cache on my table. My cluster has 3 region
> > servers with 2 regions each; in total there are 100,000 rows, and I
> > execute a complete scan.
> >
> > My partitions are
> > -01666 -> request 16665
> > 016666-033332 -> request 16666
> > 033332-049998 -> request 16666
> > 049998-066664 -> request 16666
> > 066664-083330 -> request 16666
> > 083330- -> request 16671
> >
> > 14/09/10 09:15:47 INFO hbase.HbaseScanTest: NUM ROWS 100000
> > 14/09/10 09:15:47 INFO util.TimerUtil: SCAN PARALLEL:22089ms,Counter:2 -> Caching 10
> >
> > 14/09/10 09:16:04 INFO hbase.HbaseScanTest: NUM ROWS 100000
> > 14/09/10 09:16:04 INFO util.TimerUtil: SCAN PARALLEL:16598ms,Counter:2 -> Caching 100
> >
> > 14/09/10 09:16:22 INFO hbase.HbaseScanTest: NUM ROWS 100000
> > 14/09/10 09:16:22 INFO util.TimerUtil: SCAN PARALLEL:16497ms,Counter:2 -> Caching 1000
> >
> > 14/09/10 09:17:41 INFO hbase.HbaseScanTest: NUM ROWS 100000
> > 14/09/10 09:17:41 INFO util.TimerUtil: SCAN NORMAL:68288ms,Counter:2 -> Caching 1
> >
> > 14/09/10 09:17:48 INFO hbase.HbaseScanTest: NUM ROWS 100000
> > 14/09/10 09:17:48 INFO util.TimerUtil: SCAN NORMAL:2646ms,Counter:2 -> Caching 100
> >
> > 14/09/10 09:17:58 INFO hbase.HbaseScanTest: NUM ROWS 100000
> > 14/09/10 09:17:58 INFO util.TimerUtil: SCAN NORMAL:3903ms,Counter:2 -> Caching 1000
> >
> > Parallel scan works much worse than simple scan, and I don't know why it's
> > so fast, it's really much faster than executing a "count" from the hbase
> > shell, which doesn't look normal. The only time the parallel scan works
> > better is when I execute a normal scan with caching 1.
> >
> > Any clue about it?
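
For readers following the thread, the thread-per-region idea described in the original message can be sketched roughly as below. This is not the poster's code and does not use the HBase client API: an in-memory `TreeMap` stands in for the table, row keys are assumed to be the numbers 1 to 100000 zero-padded to six digits, and the first partition boundary is assumed to be `016666`. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedCount {
    // Counts keys in [start, stop); an empty string means "unbounded" on that
    // side, like the open-ended first and last partitions in the message.
    static long countRange(TreeMap<String, String> table, String start, String stop) {
        if (start.isEmpty() && stop.isEmpty()) return table.size();
        if (start.isEmpty()) return table.headMap(stop, false).size();
        if (stop.isEmpty()) return table.tailMap(start, true).size();
        return table.subMap(start, true, stop, false).size();
    }

    // Submits one task per partition and sums the per-partition counts.
    static long parallelCount(TreeMap<String, String> table, String[][] partitions)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.length);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String[] p : partitions) {
                Callable<Long> task = () -> countRange(table, p[0], p[1]);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    // Builds the stand-in table: keys 000001..100000 (an assumption).
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 100_000; i++) table.put(String.format("%06d", i), "v");
        return table;
    }

    public static void main(String[] args) throws Exception {
        String[][] partitions = {
            {"", "016666"}, {"016666", "033332"}, {"033332", "049998"},
            {"049998", "066664"}, {"066664", "083330"}, {"083330", ""}
        };
        System.out.println("rows counted: " + parallelCount(sampleTable(), partitions));
    }
}
```

Because every task here reads from local memory, the sketch only illustrates the partitioning and the fan-out/fan-in structure; it says nothing about why the real parallel scan was slower, which depends on the cluster (disks, RPC round trips, thread contention).
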
