Best is to do "SELECT COUNT(*) FROM MYTABLE" with index. As index table will have less data so it can be read faster. if you have time series data or your data is always incremental with some ID then you can do incremental count with row_timestamp filters or ID filter
bq. however the result could be non-deterministic if HBase has just been restarted.. Results are expected to be deterministic in normal scenarios. can you elaborate what is the difference you see after HBase restarted? bq. SELECT SUM(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE PHYSICAL_NAME = 'MYTABLE'; We calculate row count till the guidePosts is found in the region and no count will be stored for a region having a size not enough for guidepost width or remaining region after the last guidePosts. so this row_count should not be used against actual count. On Wed, Sep 13, 2017 at 4:04 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote: > Hi to all, > I'm trying to investigate the best option to have get the row count out of > a table. > > I've tried the following: > > > 1. SELECT COUNT(*) FROM MYTABLE > 1. very slow without an index, very quick with an index > 2. however the result could be non-deterministic if HBase has just > been restarted.. > 2. SELECT SUM(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME = 'MYTABLE'; > 1. the result here is completely different from the first > one..323329772 vs 13376168. How is that possible? > > Best, > Flavio >