IMO you will never get the same flexibility for ad-hoc queries that you have with Oracle. There are also numerous differences in the data modelling approach (TTLs, the need for uniformly distributed ids in order to scale query volume, etc.).
The most flexibility we have reached so far in that regard, w.r.t. aggregation queries, is with an OLAP-ish model (see the link on the HBase wiki, supported projects, HBase-Lattice). It is meant for aggregating really high-QPS real-time (RT) fact streams. The list of current limitations is huge, but it serves our purpose so far.

The most obvious benefits are that queries are fast (because of precomputed cuboids in a lattice, similar to the cuboid lattice approach in ROLAP), the incremental compilation cycle is short (one can grow and update the cube within just a few minutes after a fact is fed into the system), and compilation can be scaled horizontally for high-volume fact feeds. There is a fairly limited query language and a basic set of aggregate functions (along with some weighted time series aggregates).

The most severe limitation right now is the lack of a commonly used multidimensional query dialect such as MDX, which prevents the use of widely used exploratory pivoting UI clients such as Excel, JPivot or Tableau. So it is either custom UI integration, or custom data source providers for canned reports with tools like Pentaho and Jasper, or some RT decisioning framework that doesn't require any UI at all and can use the Java API. I also plan to enable R to run queries against it (because I personally don't believe in doing ML or analytics using Excel).

-d

On Wed, Jan 11, 2012 at 10:59 AM, kfarmer <[email protected]> wrote:
>
> I'm taking a look at moving our datastore from Oracle to HBase, and trying to
> understand how HBase could be used for ad-hoc aggregation queries across our
> data.
>
> My understanding is MapReduce is more of a batch framework, so if we want a
> query to come back to the user's request in a few seconds, that won't work
> because of the overhead of running MR and because the MR jobs write back to
> a new table. Is that correct?
>
> Instead should we be pre-aggregating data as we load into separate tables,
> and then when a user queries instead just do a scan on these pre-aggregated
> tables?
>
> Thanks.
> --
> View this message in context:
> http://old.nabble.com/HBase-for-ad-hoc-aggregate-queries-tp33123313p33123313.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
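
For illustration, the pre-aggregation idea discussed in this thread (cuboids kept up to date as facts arrive, then answered by a lookup instead of a MapReduce job) can be sketched in a few lines of Python. This is only a minimal in-memory sketch under stated assumptions, not how HBase-Lattice is implemented: the dimension names, the `Lattice` class and the fact layout are all hypothetical, and plain dictionaries stand in for the pre-aggregated HBase tables.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimension names, chosen only for this illustration.
DIMENSIONS = ("country", "device")


def cuboid_keys(fact, dimensions=DIMENSIONS):
    """Yield one (cuboid, key) pair per subset of the dimensions."""
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            yield dims, tuple(fact[d] for d in dims)


class Lattice:
    """In-memory stand-in for pre-aggregated tables, one per cuboid."""

    def __init__(self, dimensions=DIMENSIONS):
        self.dimensions = dimensions
        # cuboid (dimension names) -> key (dimension values) -> [sum, count]
        self.cuboids = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))

    def add_fact(self, fact, measure="value"):
        """Incrementally fold one fact into every cuboid it belongs to."""
        for dims, key in cuboid_keys(fact, self.dimensions):
            cell = self.cuboids[dims][key]
            cell[0] += fact[measure]
            cell[1] += 1

    def query(self, **filters):
        """Answer SUM/COUNT with a single lookup in the matching cuboid."""
        dims = tuple(d for d in self.dimensions if d in filters)
        key = tuple(filters[d] for d in dims)
        total, count = self.cuboids[dims].get(key, (0.0, 0))
        return {"sum": total, "count": count}


lattice = Lattice()
for fact in [
    {"country": "US", "device": "mobile", "value": 3.0},
    {"country": "US", "device": "desktop", "value": 5.0},
    {"country": "DE", "device": "mobile", "value": 2.0},
]:
    lattice.add_fact(fact)

print(lattice.query(country="US"))     # {'sum': 8.0, 'count': 2}
print(lattice.query(device="mobile"))  # {'sum': 5.0, 'count': 2}
print(lattice.query())                 # {'sum': 10.0, 'count': 3}
```

The trade-off in this sketch is that every fact updates 2^n cells for n dimensions, so writes get more expensive as dimensions are added, while each supported query stays a single key lookup. Only aggregates that can be updated incrementally (sum and count here) fit this scheme directly.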
