Thanks for replying, Vlad. That was my feeling too. We will come up with a proposal pretty soon. I have a working example that we can make generic and integrate into HBase if we want...
JMS

On Wed, Mar 13, 2019 at 1:45 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:

> Hi, Jean-Marc
>
> I am not aware of an implementation of #2 in HBase. In RocksDB there is a
> Merge operator which does exactly what you need.
> It can be done in HBase as well with the help of a specialized coprocessor.
> RocksDB Merge:
> https://github.com/facebook/rocksdb/wiki/Merge-Operator
>
> -Vlad
>
>
> On Wed, Mar 13, 2019 at 6:41 AM Jean-Marc Spaggiari <
> jean-m...@spaggiari.org>
> wrote:
>
> > Hi,
> >
> > I have a quick question regarding aggregation.
> >
> > First, let me explain my understanding. I see two types of aggregation.
> >
> > The first is at the column level, like AVG(age) on a table. On the
> > server side, for each region, it will sum the ages and divide by the
> > number of rows. Fine.
> >
> > The second is at the cell level. Imagine I want a counter. I do multiple
> > puts for the exact same cell. At compaction time, or at read time, there
> > will be an aggregation that returns only the sum of all those cells.
> >
> > AggregateImplementation is an implementation of the first case. It runs
> > as a coprocessor EndPoint.
> >
> > Do we have an implementation of the second one? There can be many
> > different implementations. For counters, where we just put whatever and
> > get an incrementing number. For accumulators, where we put numbers and
> > get the sum of all the numbers we have put. For averages, where we put
> > numbers and get the average of all the puts (the cell will store
> > something like "sum|count"), etc. I looked at the existing coprocessors
> > and I don't see anything like that. Before starting to implement my own,
> > I'm wondering if there is already an existing solution.
> >
> > Thanks,
> >
> > JMS
> >
>
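
For illustration only, the cell-level aggregation discussed in this thread (counter, accumulator, and an average stored as "sum|count") can be sketched in a few lines of Python. This is a minimal in-memory model, not HBase or RocksDB code: the names MergeOperator, MergingCell, COUNTER, ACCUMULATOR and AVERAGE are hypothetical, and values are assumed to be integers encoded as strings. Each put is first lifted into a partial aggregate, and partials are combined either at read time or at compaction time.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, List, Optional


@dataclass(frozen=True)
class MergeOperator:
    """Hypothetical merge operator: how one cell folds its puts together."""
    lift: Callable[[str], str]          # raw put value -> partial aggregate
    combine: Callable[[str, str], str]  # two partial aggregates -> one


def _add(a: str, b: str) -> str:
    return str(int(a) + int(b))


def _avg_combine(a: str, b: str) -> str:
    # Partials are encoded as "sum|count", as suggested in the thread.
    sum_a, count_a = map(int, a.split("|"))
    sum_b, count_b = map(int, b.split("|"))
    return f"{sum_a + sum_b}|{count_a + count_b}"


# Counter: the payload is ignored, every put counts as one.
COUNTER = MergeOperator(lift=lambda _value: "1", combine=_add)
# Accumulator: the result is the sum of all values put.
ACCUMULATOR = MergeOperator(lift=lambda value: str(int(value)), combine=_add)
# Average: keep "sum|count" so partials can be merged again later.
AVERAGE = MergeOperator(lift=lambda value: f"{int(value)}|1", combine=_avg_combine)


class MergingCell:
    """In-memory stand-in for one cell that receives many puts."""

    def __init__(self, op: MergeOperator) -> None:
        self.op = op
        self.partials: List[str] = []

    def put(self, value: str) -> None:
        self.partials.append(self.op.lift(value))

    def compact(self) -> None:
        # "Compaction time": collapse all stored partials into a single one.
        if self.partials:
            self.partials = [reduce(self.op.combine, self.partials)]

    def get(self) -> Optional[str]:
        # "Read time": merge whatever partials are currently stored.
        if not self.partials:
            return None
        return reduce(self.op.combine, self.partials)


if __name__ == "__main__":
    avg = MergingCell(AVERAGE)
    for value in ("10", "20", "60"):
        avg.put(value)
    avg.compact()
    avg.put("10")
    print(avg.get())  # prints 100|4, i.e. an average of 25
```

Because every stored value is a partial aggregate, merging is associative: a read after compaction gives the same result as a read over the raw puts. An HBase version would presumably put the combine step inside a coprocessor hook, as Vlad suggests, but that wiring is not shown here.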