Re: Efficient way to get the row count of a table

Mujtaba Chohan Tue, 19 Dec 2017 15:43:28 -0800

Another alternate outside Phoenix is to use
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html
M/R.


On Tue, Dec 19, 2017 at 3:18 PM, James Taylor <jamestay...@apache.org>
wrote:

> If it needs to be 100% accurate, then count(*) is the only way. If your
> data is write-once data, you might be able to track the row count at the
> application level through some kind of atomic counter in a different table
> (but this will likely be brittle). If you can live with an estimate, you
> could enable statistics [1], optionally configuring Phoenix not to use
> stats for parallelization [2], and query the SYSTEM.STATS table to get an
> estimate [3].
>
> Another interesting alternative if you want the approximate row count when
> you have a where clause would be to use the new table sampling feature [4].
> You'd also want stats enabled for this to be more accurate too.
>
> Thanks,
> James
>
>
> [1] https://phoenix.apache.org/update_statistics.html
> [2] phoenix.use.stats.parallelization=false
> [3] select sum(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS where
> physical_name='my_schema.my_table'
>      and COLUMN_FAMILY='my_first_column_family' -- necessary only if you
> have multiple column families
> [4] https://phoenix.apache.org/tablesample.html
>
> On Tue, Dec 19, 2017 at 2:57 PM, Jins George <jins.geo...@aeris.net>
> wrote:
>
>> Hi,
>>
>> Is there a way to get the total row count of a phoenix table without
>> running select count(*) from table ?
>> my use case is to monitor the record count in a table every x minutes, so
>> didn't want to put load on the system by running a select count(*) query.
>>
>> Thanks,
>> Jins George
>>
>
>

Re: Efficient way to get the row count of a table

Reply via email to