Hi David,

Have a look at coprocessors, which let you run custom code (Observers) on get/put/delete operations on a table. You can implement the counters quite easily with their help. Here is an introduction to coprocessors:
https://blogs.apache.org/hbase/entry/coprocessor_introduction
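For example, a RegionObserver along these lines could bump a counter in a separate "count" table on every put to the "user" table. This is just a rough sketch against the 0.92/0.94-era coprocessor API; the table name "count", the column family/qualifier, and the class name are placeholders I made up, and since it fires on every put it would over-count re-inserted user_ids unless you add an existence check (or use checkAndPut) on the "user" side:

import java.io.IOException;

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

// Observer attached to the "user" table: after each put it increments a
// counter cell in a separate "count" table. Caveat: it counts every put,
// including re-inserts of an existing user_id, so an existence check is
// still needed for an exact distinct count.
public class UserCountObserver extends BaseRegionObserver {

  // All names below are placeholders for illustration only.
  private static final byte[] COUNT_TABLE = Bytes.toBytes("count");
  private static final byte[] FAMILY      = Bytes.toBytes("d");
  private static final byte[] ROW         = Bytes.toBytes("total");
  private static final byte[] QUALIFIER   = Bytes.toBytes("users");

  @Override
  public void postPut(ObserverContext<RegionCoprocessorEnvironment> e,
                      Put put, WALEdit edit, boolean writeToWAL)
      throws IOException {
    // Open the counter table through the coprocessor environment and
    // atomically increment the single counter cell.
    HTableInterface countTable = e.getEnvironment().getTable(COUNT_TABLE);
    try {
      countTable.incrementColumnValue(ROW, FAMILY, QUALIFIER, 1L);
    } finally {
      countTable.close();
    }
  }
}

You would attach something like this to the "user" table only, e.g. via HTableDescriptor.addCoprocessor() or a table alter from the shell, as described in the blog post above.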
HTH,
Anil

On Wed, May 30, 2012 at 12:17 AM, David Koch <[email protected]> wrote:
> Hello,
>
> I am testing HBase for distinct counters - more concretely, counting
> unique users from a fairly large stream of user_ids. For some time to
> come the volume will be limited enough to use exact counting rather
> than approximation, but it is already too big to hold the entire set of
> user_ids in memory.
>
> For now I am basically inserting all elements from the stream into a
> "user" table which has row key "user_id" so as to enforce the unique
> constraint.
>
> My questions:
> a) Is there a way to get a quick (i.e. with small delay in a user
> interface) count of the size of the "user" table to return the number of
> users? Alternatively, is there a way to trigger an increment in
> another table (say "count") whenever a row is added to "user"? I
> guess this can be picked up eventually by the client application, but I
> don't want it to delay the actual stream processing.
>
> b) I have heard about Bloom filters in HBase but failed to understand whether
> they are used for row keys as well. Are they? How do I activate them? I
> was looking to reduce the workload of checking set membership for
> every user_id in the stream. If this is done by HBase internally, even
> better.
>
> c) Eventually, I want to store distinct users by day and then take
> unions over different days to get the total number of unique users for a
> multi-day period. Is this likely to involve MapReduce, or is there a
> more lightweight approach?
>
> Thank you,
>
> /David

--
Thanks & Regards,
Anil Gupta
