Agree with Azury.
Ted: He mentions something different from HBASE-5982. If the row count is maintained in another meta table, then getting the row count from there will be much faster than the AggregateImplementation getRowNum, I think.
For a specific use case, someone can build this using a CP. But a generic implementation might be difficult. How can we handle versioning? When a new version comes in for an existing row, we should not increment the count. TTLs would also have to be handled.

-Anoop-
________________________________________
From: Azury [[email protected]]
Sent: Wednesday, December 12, 2012 9:40 AM
To: [email protected]
Subject: Re:Re: Counter and Coprocessor Musing

Hi Ted,

I think he wants table 'meta data', not something similar to a Coprocessor, such as
long rows = table.rows();
Just probably, not sure about that.

At 2012-12-12 01:11:49,"Ted Yu" <[email protected]> wrote:
>Thanks for sharing your thoughts.
>
>Which HBase version are you currently using ?
>Have you looked at AggregateImplementation which is included in hbase jar ?
>A count operation (getRowNum) is in AggregateImplementation.
>
>It would be nice if you can tell us how much difference (in terms of
>response time) this aggregation lags your expectation.
>
>Also take a look at HBASE-5982 HBase Coprocessor Local Aggregation
>
>Cheers
>
>On Tue, Dec 11, 2012 at 6:50 AM, nicolas maillard <
>[email protected]> wrote:
>
>> Hi everyone
>>
>> While working with HBase and looking at what the tables and meta look like,
>> I have thought of a couple of things; maybe someone has insights.
>> My thoughts are around the count situation: it is a common database
>> operation to count entries for a given query, for example as a first check
>> to see if everything is written, or sometimes to get a feel for a population.
>> I was wondering 2 things:
>> - Shouldn't HBase keep in the metrics for a table its total entry count?
>> This would not take too much space and often comes in handy. Granted, with a
>> coprocessor you could easily create a table with counters for all the other
>> tables in the system, but it would be nice to have as a standard.
>>
>> - I was also wondering whether every region could know the number of entries
>> it contains. Every region already knows the start and end key of its entries.
>> For a count on a given scan this would speed up the count. Every region whose
>> start and end key are in the scan would just send back its population count,
>> and only a region that is wider than the scan would need to be scanned and
>> counted.
>>
>> Wondering if these thoughts are already implemented and if I'm missing
>> something, or whether this would not be a good idea. Alternatively, if this
>> is not a definite No for some reason, could coprocessors allow implementing
>> these thoughts? Can I, with a coprocessor, write in the metrics part, or on a
>> given scan first check if, for a region smaller than my scan, I already have
>> the count written somewhere, instead of scanning and counting?
>>
>> Thanks for any thoughts you may have
>>
>>
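
The per-region count idea from the original post can be sketched in plain Python. This is only a toy model and not HBase code: the Region class, its row_count field and the count_rows helper are all hypothetical names made up for illustration. It assumes each region keeps a cached counter that is bumped only when a genuinely new row arrives (so a new version of an existing row does not change it), and it does not model TTL expiry or deletes, which would also have to adjust the counter.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Region:
    """Toy stand-in for a region holding rows with start_key <= key < end_key."""
    start_key: str
    end_key: str
    rows: list = field(default_factory=list)  # sorted, distinct row keys
    row_count: int = 0                        # hypothetical cached counter

    def put(self, key: str) -> None:
        i = bisect_left(self.rows, key)
        # A new version of an existing row must not bump the counter.
        if i == len(self.rows) or self.rows[i] != key:
            self.rows.insert(i, key)
            self.row_count += 1

    def scan_count(self, start: str, stop: str) -> int:
        # Stands in for an actual scan over the rows in [start, stop).
        return bisect_left(self.rows, stop) - bisect_left(self.rows, start)


def count_rows(regions, scan_start, scan_stop):
    """Return (rows in [scan_start, scan_stop), number of regions scanned)."""
    total, scanned = 0, 0
    for r in regions:
        if r.end_key <= scan_start or r.start_key >= scan_stop:
            continue  # region does not overlap the scan range
        if scan_start <= r.start_key and r.end_key <= scan_stop:
            total += r.row_count  # fully inside the scan: use the cached counter
        else:
            # Region sticks out of the scan range: count only the overlapping part.
            total += r.scan_count(max(scan_start, r.start_key),
                                  min(scan_stop, r.end_key))
            scanned += 1
    return total, scanned


regions = [Region("a", "h"), Region("h", "p"), Region("p", "z")]
# "ice" is written twice to mimic a second version of the same row.
for key in ["apple", "cat", "hat", "ice", "ice", "kiwi", "pear", "sun"]:
    for r in regions:
        if r.start_key <= key < r.end_key:
            r.put(key)

result = count_rows(regions, "c", "q")
```

With a scan over ["c", "q"), only the two regions at the edges of the range need to be scanned; the middle region answers from its counter. Keeping such a counter correct under versions, deletes and TTLs is the hard part raised earlier in the thread.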
