This is a commonly requested feature, and it remains unimplemented
because it is actually quite hard.  Each HFile knows how many KV
entries it contains, but that number does not map in any general way
to the number of rows, or to the number of rows with a specific
column: one row can hold many KVs, and the same row can show up in
several HFiles.  Keeping track of the row count as new rows are
created is also not as easy as it seems, because a Put does not know
whether the row already exists.  Making it aware of that would
require doing a Get before every Put, which is not cheap.
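
To make the two problems concrete, here is a minimal sketch in Python.
It is a toy in-memory model and not the HBase API: ToyStore,
CountingStore, and all of their methods are hypothetical names made up
for illustration, and a real store is spread over many files on disk.

```python
class ToyStore:
    """Toy stand-in for a store of KV entries. NOT the HBase API."""

    def __init__(self):
        self.kvs = []    # append-only list of (row, column, value) entries
        self.reads = 0   # how many existence lookups have been performed

    def put(self, row, column, value):
        # Blind write: the entry is appended without checking for the row.
        self.kvs.append((row, column, value))

    def kv_count(self):
        # Cheap: the store always knows how many entries it holds.
        return len(self.kvs)

    def row_exists(self, row):
        # Stands in for a Get; in a real system this is a read, not free.
        self.reads += 1
        return any(r == row for r, _, _ in self.kvs)

    def row_count(self):
        # The only exact answer without extra bookkeeping: scan everything.
        return len({r for r, _, _ in self.kvs})


class CountingStore(ToyStore):
    """Keeps a running row counter, paying one lookup per put."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def put(self, row, column, value):
        # Read-before-write: required to know whether the row is new.
        if not self.row_exists(row):
            self.rows += 1
        super().put(row, column, value)


# Hypothetical sample data: two rows, four KV entries.
puts = [("r1", "cf:a", 1), ("r1", "cf:b", 2), ("r1", "cf:a", 3), ("r2", "cf:a", 4)]

store = ToyStore()
counting = CountingStore()
for row, column, value in puts:
    store.put(row, column, value)
    counting.put(row, column, value)

print(store.kv_count(), store.row_count())   # 4 entries, 2 rows
print(counting.rows, counting.reads)         # 2 rows, 4 lookups
```

The entry count (4) says nothing about the row count (2), and the
counter only stays correct because every put pays for a lookup first.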

-ryan

On Fri, Jun 3, 2011 at 3:20 PM, Jack Levin <[email protected]> wrote:
> I have a feature request:  There should be a native function called
> 'count' that produces a count of rows based on a specific family
> filter, that is internal to HBase and does not need to read cells off
> the disk/cache.  Just count up the rows in the most efficient way
> possible.  I realize that family definitions are part of the cells, so
> it would be nice to have an index that can somehow keep the IO/CPU
> hit to HBase low when doing a count (for example, enabling such an
> index in the table schema would be how you turn it on for a specific
> family).
>
> Best,
>
> -Jack
>
