This is a commonly requested feature, and it remains unimplemented because it is actually quite hard. Each HFile knows how many KV entries it contains, but that number does not map in any general way to the number of rows, or to the number of rows with a specific column: one row can hold many KVs, including several versions of the same column, and its KVs can be spread across multiple HFiles. Keeping track of the row count as new rows are created is also not as easy as it seems, because a Put does not know whether the row already exists. Making it aware of that would require doing a get before every put, which is not cheap.
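To make the two problems concrete, here is a toy sketch in Python. It is not HBase code: ToyStore, CountingStore, and all of their methods are hypothetical names made up for illustration, and the "store" is just an in-memory list standing in for KV entries. It only shows that the KV count differs from the row count, and that keeping a live row count forces a read on every write.

```python
# Toy model only: these classes are hypothetical and are not HBase APIs.

class ToyStore:
    """Append-only list of (row, column, sequence, value) entries, loosely like KVs."""

    def __init__(self):
        self.entries = []  # every KV ever written, including overwrites
        self.reads = 0     # number of existence lookups performed

    def put(self, row, column, value):
        # Blind write: append without checking whether the row already exists.
        self.entries.append((row, column, len(self.entries), value))

    def exists(self, row):
        # Stand-in for a get; each call counts as one read.
        self.reads += 1
        return any(r == row for r, _, _, _ in self.entries)

    def kv_count(self):
        # The cheap number: how many entries are stored.
        return len(self.entries)

    def row_count(self, column=None):
        # The expensive number: requires looking at every entry.
        return len({r for r, c, _, _ in self.entries
                    if column is None or c == column})


class CountingStore(ToyStore):
    """Keeps a running row count, at the cost of a read before every put."""

    def __init__(self):
        super().__init__()
        self.tracked_rows = 0

    def put(self, row, column, value):
        if not self.exists(row):  # get-before-put
            self.tracked_rows += 1
        super().put(row, column, value)


store = CountingStore()
store.put("row1", "f:a", "1")
store.put("row1", "f:b", "2")
store.put("row1", "f:a", "3")  # new version of an existing column
store.put("row2", "f:a", "4")

print(store.kv_count())                 # 4
print(store.row_count())                # 2
print(store.row_count("f:b"))           # 1
print(store.tracked_rows, store.reads)  # 2 4
```

Four KVs, two rows, one row with column f:b, and the running count was only correct because each of the four puts paid for a read first.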
-ryan

On Fri, Jun 3, 2011 at 3:20 PM, Jack Levin <[email protected]> wrote:
> I have a feature request: There should be a native function called
> 'count', that produces count of rows based on specific family filter,
> that is internal to HBASE and won't be required to read CELLs off the
> disk/cache. Just count up the rows in the most efficient way
> possible. I realize that family definitions are part of the cells, so
> it would be nice to have an index that somehow can produce low IO/CPU
> hit to hbase when doing a count (for example enabling an index like
> that in table schema would be how you turn it on for a specific
> family).
>
> Best,
>
> -Jack
