For our solution, we do some aggregation on the server via coprocessors. In general, each row has 8 columns: 7 columns that contain numbers (for summation) and 1 column that contains a HyperLogLog counter (about 700 bytes). Functionally, this solution works well and ought to scale with the number of region servers. However, individual request performance leaves a little to be desired: scanning 40,000 rows (aggregated into 3,000 rows) takes about 4 seconds.
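
A minimal sketch of the per-group merge described above: the 7 numeric columns are summed and the HyperLogLog counters are combined by taking the per-register maximum, which is the standard way to merge two HyperLogLog sketches of equal size. This is plain Java for illustration only, not the coprocessor code itself. The names `RowAggregator` and `Aggregate` are hypothetical, and the one-byte-per-register layout is an assumption (real implementations often pack registers more tightly).

```java
import java.util.Arrays;

public class RowAggregator {
    // Number of numeric columns that are summed per row (from the description above).
    static final int NUM_SUM_COLUMNS = 7;

    /** Running aggregate for one output group (hypothetical helper type). */
    static final class Aggregate {
        final long[] sums = new long[NUM_SUM_COLUMNS];
        // Assumption: one byte per HyperLogLog register.
        final byte[] hllRegisters;

        Aggregate(int registerCount) {
            this.hllRegisters = new byte[registerCount];
        }

        /** Folds one scanned row into this aggregate. */
        void add(long[] values, byte[] registers) {
            if (values.length != NUM_SUM_COLUMNS
                    || registers.length != hllRegisters.length) {
                throw new IllegalArgumentException("unexpected row shape");
            }
            // Numeric columns: plain summation.
            for (int i = 0; i < NUM_SUM_COLUMNS; i++) {
                sums[i] += values[i];
            }
            // HyperLogLog merge: keep the larger value of each register.
            for (int i = 0; i < hllRegisters.length; i++) {
                if (registers[i] > hllRegisters[i]) {
                    hllRegisters[i] = registers[i];
                }
            }
        }
    }

    public static void main(String[] args) {
        // Tiny 4-register counters keep the example readable.
        Aggregate agg = new Aggregate(4);
        agg.add(new long[] {1, 2, 3, 4, 5, 6, 7}, new byte[] {0, 3, 1, 2});
        agg.add(new long[] {10, 20, 30, 40, 50, 60, 70}, new byte[] {2, 1, 1, 5});
        System.out.println(Arrays.toString(agg.sums));         // [11, 22, 33, 44, 55, 66, 77]
        System.out.println(Arrays.toString(agg.hllRegisters)); // [2, 3, 1, 5]
    }
}
```

Under this layout, each scanned row costs 7 additions plus one pass over the counter's registers, so the counter merge is likely the larger share of the per-row work.
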
Our code is in its early stages (unoptimized), so we hope to see some significant performance improvements when we run our coprocessor under a profiler. Our benchmarks were also run on underpowered machines (only 2 GB of RAM).

Hope this helps!

--Tom

On Thu, May 3, 2012 at 6:08 AM, Pere Ferrera <[email protected]> wrote:
> Hi,
>
> Is anybody benchmarking the performance of server-side aggregations through
> co-processors in HBase? I am interested to know if HBase could potentially
> be used to calculate real-time SQL-like aggregations at a good level of
> performance (q < 200ms on high-load, big dataset scenario). Just curious to
> know before I implement my own benchmarks.
>
> Pere.
