The count would change when a major compaction is done. Back in 4.7, it may have changed when a split occurs too (but this is no longer the case). I'd recommend moving to a newer version: 4.7 was release almost two years ago and is six releases back from the current 4.13 release. FWIW, we're getting ready to release a 4.13 for CDH.
On Tue, Dec 19, 2017 at 4:27 PM, Jins George <jins.geo...@aeris.net> wrote: > Thank you James and Mujtaba for the responses. > > I am OK with an estimate count . So using SYSTEM.STATS table looks good in > my case. But when I tried this, it gave me inconsistent results. for > example. > > Time 1: > select count(*) from myschema.mytable => 3474085 > select sum(guide_posts_row_count) from system.stats where physical_name = > 'myschema.device_data' => 3348090 > > Time 2 : ( Time1 + ~10 mins) > select count(*) from myschema.mytable => 3474106 > select sum(guide_posts_row_count) from system.stats where physical_name = > 'myschema.device_data' => 3348080 > > So I was expecting the stats count to go up but surprisingly, the count > went down. Is there a specific configuration or something else that I am > missing? > > I am using phoenix 4.7( on CDH), So cannot try Table sampling feature. > > Thanks, > Jins George > > > > On 12/19/2017 03:43 PM, Mujtaba Chohan wrote: > > Another alternate outside Phoenix is to use > <http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html> > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/ > mapreduce/RowCounter.html M/R. > > On Tue, Dec 19, 2017 at 3:18 PM, James Taylor <jamestay...@apache.org> > wrote: > >> If it needs to be 100% accurate, then count(*) is the only way. If your >> data is write-once data, you might be able to track the row count at the >> application level through some kind of atomic counter in a different table >> (but this will likely be brittle). If you can live with an estimate, you >> could enable statistics [1], optionally configuring Phoenix not to use >> stats for parallelization [2], and query the SYSTEM.STATS table to get an >> estimate [3]. >> >> Another interesting alternative if you want the approximate row count >> when you have a where clause would be to use the new table sampling feature >> [4]. You'd also want stats enabled for this to be more accurate too. >> >> Thanks, >> James >> >> >> [1] https://phoenix.apache.org/update_statistics.html >> [2] phoenix.use.stats.parallelization=false >> [3] select sum(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS where >> physical_name='my_schema.my_table' >> and COLUMN_FAMILY='my_first_column_family' -- necessary only if you >> have multiple column families >> [4] https://phoenix.apache.org/tablesample.html >> >> On Tue, Dec 19, 2017 at 2:57 PM, Jins George < <jins.geo...@aeris.net> >> jins.geo...@aeris.net> wrote: >> >>> Hi, >>> >>> Is there a way to get the total row count of a phoenix table without >>> running select count(*) from table ? >>> my use case is to monitor the record count in a table every x minutes, >>> so didn't want to put load on the system by running a select count(*) query. >>> >>> Thanks, >>> Jins George >>> >> >> > >