Re: Scan vs Parallel scan.

Guillermo Ortiz Wed, 10 Sep 2014 13:41:55 -0700

What I mean is that I don't understand why a count takes more time than
a complete scan without cache. I thought it should take more time to scan
the table than to execute a count.
Another question is why a distributed scan is slower than a sequential scan.
Tomorrow I'll check how many disks we have.
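
One rough way to reason about the count-versus-scan gap is to estimate client/server round trips. The sketch below assumes each scanner call returns at most `caching` rows, which is a simplification that ignores result-size limits and region boundaries. The function name is hypothetical. With the shell's caching of 10 (as mentioned in the quoted reply below), a count over 100000 rows would need roughly ten times as many round trips as a scan with caching 100.

```python
import math

ROWS = 100_000  # row count reported in the logs ("NUM ROWS 100000")


def estimated_round_trips(rows: int, caching: int) -> int:
    """Rough number of scanner calls if each call returns up to `caching` rows.

    Assumption: one round trip per batch; real clients may add extra calls
    (opening/closing scanners, crossing region boundaries).
    """
    return math.ceil(rows / caching)


if __name__ == "__main__":
    # Caching values taken from the test runs and from the shell's count.
    for caching in (1, 10, 100, 1000):
        trips = estimated_round_trips(ROWS, caching)
        print(f"caching={caching:>4}: ~{trips} round trips")
```

This is only an estimate of call counts, not a performance model; it does not by itself explain why the parallel scan was slower.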


On Wednesday, September 10, 2014, Esteban Gutierrez <
[email protected]> wrote:

> Hello Guillermo,
>
> Sounds like some potential contention going on. How many disks per node do
> you have?
>
> Can you explain further what you mean by "and I don't know why it's so
> fast,, it's really much faster than execute an "count" from hbase shell,"?
> The count command from the shell uses the FirstKeyOnlyFilter and a caching
> of 10, which should be close to the behavior of your testing tool if it's
> using the same filter and the same cache settings.
>
> cheers,
> esteban.
>
>
>
>
> --
> Cloudera, Inc.
>
>
> On Wed, Sep 10, 2014 at 1:40 AM, Guillermo Ortiz <[email protected]>
> wrote:
>
> > Hi,
> >
> > I developed a distributed scan that creates a thread for each region.
> > After that, I tried to compare timings for Scan vs DistributedScan.
> > I have disabled the block cache on my table. My cluster has 3 region servers
> > with 2 regions each; in total there are 100,000 rows, and I execute a
> > complete scan.
> >
> > My partitions are
> > -016666 -> request 16665
> > 016666-033332 -> request 16666
> > 033332-049998 -> request 16666
> > 049998-066664 -> request 16666
> > 066664-083330 -> request 16666
> > 083330- -> request 16671
> >
> >
> > 14/09/10 09:15:47 INFO hbase.HbaseScanTest: NUM ROWS 100000
> > 14/09/10 09:15:47 INFO util.TimerUtil: SCAN PARALLEL:22089ms,Counter:2 ->
> > Caching 10
> >
> > 14/09/10 09:16:04 INFO hbase.HbaseScanTest: NUM ROWS 100000
> > 14/09/10 09:16:04 INFO util.TimerUtil: SCAN PARALLEL:16598ms,Counter:2 ->
> > Caching 100
> >
> > 14/09/10 09:16:22 INFO hbase.HbaseScanTest: NUM ROWS 100000
> > 14/09/10 09:16:22 INFO util.TimerUtil: SCAN PARALLEL:16497ms,Counter:2 ->
> > Caching 1000
> >
> > 14/09/10 09:17:41 INFO hbase.HbaseScanTest: NUM ROWS 100000
> > 14/09/10 09:17:41 INFO util.TimerUtil: SCAN NORMAL:68288ms,Counter:2 ->
> > Caching 1
> >
> > 14/09/10 09:17:48 INFO hbase.HbaseScanTest: NUM ROWS 100000
> > 14/09/10 09:17:48 INFO util.TimerUtil: SCAN NORMAL:2646ms,Counter:2 ->
> > Caching 100
> >
> > 14/09/10 09:17:58 INFO hbase.HbaseScanTest: NUM ROWS 100000
> > 14/09/10 09:17:58 INFO util.TimerUtil: SCAN NORMAL:3903ms,Counter:2 ->
> > Caching 1000
> >
> > The parallel scan performs much worse than the simple scan, and I don't
> > know why the simple scan is so fast; it's really much faster than executing
> > a "count" from the hbase shell, which doesn't look normal. The only case
> > where the parallel scan does better is against a normal scan with caching 1.
> >
> > Any clue about it?
> >
>
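
The thread-per-region design described in the original message can be sketched roughly as follows. This is a simulation over an in-memory sorted list, not HBase client code. The key format (zero-padded six-digit strings from 000001 to 100000), the first boundary being 016666 (so it lines up with the next region's start key), and the names `count_range` and `parallel_count` are all assumptions made for illustration.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# Assumption: row keys are zero-padded strings "000001" .. "100000".
# They stand in for the table; a real scan would read from region servers.
ROW_KEYS = [f"{i:06d}" for i in range(1, 100_001)]

# Region boundaries as listed in the message; "" means an open start or end.
# Assumption: the first boundary is 016666, matching the next start key.
REGIONS = [
    ("", "016666"),
    ("016666", "033332"),
    ("033332", "049998"),
    ("049998", "066664"),
    ("066664", "083330"),
    ("083330", ""),
]


def count_range(start: str, stop: str) -> int:
    """Count keys in [start, stop); stands in for one per-region scan."""
    lo = bisect_left(ROW_KEYS, start) if start else 0
    hi = bisect_left(ROW_KEYS, stop) if stop else len(ROW_KEYS)
    return hi - lo


def parallel_count(regions) -> int:
    """Run one task per region in its own thread and sum the partial counts."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        partials = pool.map(lambda r: count_range(*r), regions)
        return sum(partials)


if __name__ == "__main__":
    for start, stop in REGIONS:
        print(f"{start or '-'}..{stop or '-'}: {count_range(start, stop)} rows")
    print("total:", parallel_count(REGIONS))
```

Under these assumptions the per-region counts come out as 16665, 16666, 16666, 16666, 16666 and 16671, matching the request counts in the message and summing to 100000. The sketch shows only the structure of the approach; it says nothing about why the parallel version was slower in the measurements.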
