Re: Row count

Ankit Singhal Wed, 13 Sep 2017 03:54:12 -0700

Best is to do "SELECT COUNT(*) FROM MYTABLE" with index. As index table
will have less data so it can be read faster.
if you have time series data or your data is always incremental with some
ID then you can do incremental count with row_timestamp filters or ID filter



bq. however the result could be non-deterministic if HBase has just been
restarted..

       Results are expected to be deterministic in normal scenarios. can
you elaborate what is the difference you see after HBase restarted?

bq. SELECT SUM(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE PHYSICAL_NAME
= 'MYTABLE';

        We calculate row count till the guidePosts is found in the region
and no count will be stored for a region having a size not enough for
guidepost width or remaining region after the last guidePosts. so this
row_count should not be used against actual count.


On Wed, Sep 13, 2017 at 4:04 PM, Flavio Pompermaier <pomperma...@okkam.it>
wrote:

> Hi to all,
> I'm trying to investigate the best option to have get the row count out of
> a table.
>
> I've tried the following:
>
>
>    1. SELECT COUNT(*) FROM MYTABLE
>       1. very slow without an index, very quick with an index
>       2. however the result could be non-deterministic if HBase has just
>       been restarted..
>    2. SELECT SUM(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE
>    PHYSICAL_NAME = 'MYTABLE';
>    1. the result here is completely different from the first
>       one..323329772 vs 13376168. How is that possible?
>
> Best,
> Flavio
>

Re: Row count

Reply via email to