Re: feature request (count)

Jack Levin Fri, 03 Jun 2011 15:40:56 -0700

"Each HFile knows how many KV entries there are in it, but this does
not map in a general way to the number of rows, or the number of rows
with a specific column."


It would be nice to have an index like that; it would solve a lot of
issues for people migrating from MySQL.  I assume that without a
'count' feature, people resort to storing dataset elements in other
engines, which is not great, since you then need a non-HBase index to
be consistent and authoritative for every dataset that requires
counts.

-Jack


On Fri, Jun 3, 2011 at 3:24 PM, Ryan Rawson <[email protected]> wrote:
> This is a commonly requested feature, and it remains unimplemented
> because it is actually quite hard.  Each HFile knows how many KV
> entries there are in it, but this does not map in a general way to the
> number of rows, or the number of rows with a specific column. Keeping
> track of the row count as new rows are created is also not as easy as
> it seems - this is because a Put does not know if a row already exists
> or not.  Making it aware of that fact would require doing a get before
> a put - not cheap.
>
> -ryan
>
> On Fri, Jun 3, 2011 at 3:20 PM, Jack Levin <[email protected]> wrote:
>> I have a feature request: there should be a native function called
>> 'count' that produces a count of rows based on a specific family
>> filter, is internal to HBase, and does not have to read cells off the
>> disk/cache.  Just count up the rows in the most efficient way
>> possible.  I realize that family definitions are part of the cells, so
>> it would be nice to have an index that keeps the IO/CPU hit to HBase
>> low when doing a count (for example, enabling such an index in the
>> table schema would be how you turn it on for a specific family).
>>
>> Best,
>>
>> -Jack
>>
>
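
The two points in Ryan's reply can be illustrated with a toy in-memory
model. This is only a sketch and not HBase code: ToyTable and all of
its methods and counters are hypothetical names made up for this
example, and the model ignores deletes, versions, and compactions. It
shows that the number of stored KV entries differs from the number of
rows (overall or per family), and that keeping an exact row counter on
write needs a lookup before every put.

```python
class ToyTable:
    """Hypothetical stand-in for a table of KV entries (not the HBase API)."""

    def __init__(self):
        self.entries = []         # every KV written: (row, family, qualifier, value)
        self.blind_counter = 0    # bumped on every put, with no read
        self.checked_counter = 0  # bumped only when the row did not exist yet
        self.reads = 0            # lookups paid to keep checked_counter exact

    def put(self, row, family, qualifier, value):
        # Read-before-write: the only way this toy can tell a new row
        # from an update to an existing one.
        self.reads += 1
        is_new_row = all(r != row for r, _f, _q, _v in self.entries)

        self.entries.append((row, family, qualifier, value))
        self.blind_counter += 1
        if is_new_row:
            self.checked_counter += 1

    def kv_count(self):
        # Cheap: just the number of stored entries.
        return len(self.entries)

    def row_count(self, family=None):
        # Expensive: has to look at every entry to find distinct rows.
        rows = {r for r, f, _q, _v in self.entries
                if family is None or f == family}
        return len(rows)


table = ToyTable()
table.put("row1", "info", "name", "a")
table.put("row1", "info", "email", "b")
table.put("row1", "info", "name", "c")   # newer value for an existing cell
table.put("row2", "stats", "hits", "1")

print("KV entries:        ", table.kv_count())
print("rows:              ", table.row_count())
print("rows with 'info':  ", table.row_count("info"))
print("blind counter:     ", table.blind_counter)
print("checked counter:   ", table.checked_counter)
print("reads for counting:", table.reads)
```

In this toy, the counter that is bumped blindly on every put tracks KV
entries rather than rows, while the exact counter costs one extra read
per write, which is the trade-off described in the reply.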
