Bottom line, IMO you have to consider how your data is organized. For 90% of a
relational schema (but perhaps 10% of the volume), the move to HBase-based
solutions is not warranted. However, for the other 10% of the schema (and 90%
of the volume) you may consider HBase-based solutions, most typically for time
series data feeds.

-d

On Wed, Jan 11, 2012 at 11:48 AM, Dmitriy Lyubimov <[email protected]> wrote:
> IMO you will never get the same flexibility. There are also numerous
> differences in the data modelling approach (TTL, the requirement for
> uniformly distributed ids to scale query volume, etc.).
>
> The most flexibility we have reached so far w.r.t. aggregation queries is
> an OLAP-ish model (see the link on the HBase wiki, supported projects,
> HBase-Lattice).
>
> This is for aggregating really high-qps real-time fact streams, and the
> list of current limitations is huge, but it serves our purpose so far.
>
> The most obvious benefits are that queries are fast (because of precomputed
> cuboids in a lattice, similar to the cuboid-lattice approach in ROLAP), the
> incremental compilation cycle is short (one can grow and update the cube
> within just a few minutes of a fact being fed into the system), and
> compilation can be scaled horizontally for high-volume fact feeds. There's
> a fairly limited query language and a basic set of aggregate functions
> (along with some weighted time series aggregates as well).
>
> The most severe limitation right now is the lack of a commonly used
> multidimensional query dialect such as MDX, which prevents use of the
> widely used pivoting/exploratory UI clients such as Excel, JPivot, or
> Tableau. So it is either custom UI integration, or custom data source
> providers for canned reports with tools like Pentaho and Jasper, or some
> real-time decisioning framework that doesn't require any UI at all and can
> use the Java API. I also plan to enable R to run queries against it
> (because I personally don't believe in doing ML or analytics using Excel).
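[The precomputed-cuboid idea described above can be sketched as a toy
in-memory simulation. This is not HBase-Lattice's actual API; the dimension
names (country, browser, hour) and the CuboidLattice class are invented for
illustration. The point is only the trade: each incoming fact is folded into
one aggregate table per subset of dimensions at write time, so an aggregation
query later is a lookup rather than a scan of raw facts.]

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical dimensions; a real cube model would define these in config.
DIMENSIONS = ("country", "browser", "hour")

class CuboidLattice:
    """Maintains one aggregate table ("cuboid") per subset of dimensions."""

    def __init__(self, dimensions):
        self.dimensions = dimensions
        # One cuboid for every subset of the dimensions, including the
        # empty subset (the grand total).
        self.cuboids = {
            subset: defaultdict(int)
            for r in range(len(dimensions) + 1)
            for subset in combinations(dimensions, r)
        }

    def ingest(self, fact):
        """Incrementally fold one fact into every cuboid (cheap at write time)."""
        for subset, table in self.cuboids.items():
            key = tuple(fact[d] for d in subset)
            table[key] += fact["count"]

    def query(self, **group_by):
        """Answer an aggregate query by a straight lookup, no scan of raw facts."""
        subset = tuple(d for d in self.dimensions if d in group_by)
        key = tuple(group_by[d] for d in subset)
        return self.cuboids[subset][key]

lattice = CuboidLattice(DIMENSIONS)
lattice.ingest({"country": "US", "browser": "firefox", "hour": 10, "count": 3})
lattice.ingest({"country": "US", "browser": "chrome", "hour": 10, "count": 2})
lattice.ingest({"country": "DE", "browser": "firefox", "hour": 11, "count": 4})

print(lattice.query(country="US"))      # 5
print(lattice.query(browser="firefox")) # 7
print(lattice.query())                  # 9 (grand total)
```

[The cost, as noted above, is write-time amplification (2^n cuboids for n
dimensions) and a limited query language: only groupings you precomputed are
fast.]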
>
> -d
>
> On Wed, Jan 11, 2012 at 10:59 AM, kfarmer <[email protected]> wrote:
>>
>> I'm taking a look at moving our datastore from Oracle to HBase, and
>> trying to understand how HBase could be used for ad-hoc aggregation
>> queries across our data.
>>
>> My understanding is that MapReduce is more of a batch framework, so if
>> we want a query to come back to the user in a few seconds, that won't
>> work, because of the overhead of launching MR and because the MR jobs
>> write back to a new table. Is that correct?
>>
>> Instead, should we be pre-aggregating data as we load it into separate
>> tables, and then, when a user queries, just do a scan on these
>> pre-aggregated tables?
>>
>> Thanks.
>> --
>> View this message in context:
>> http://old.nabble.com/HBase-for-ad-hoc-aggregate-queries-tp33123313p33123313.html
>> Sent from the HBase User mailing list archive at Nabble.com.
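[The pre-aggregation pattern the original question asks about can be sketched
as follows. This is plain Python standing in for HBase: in the real client the
counter bump at load time would be an atomic Increment mutation and the read a
prefix Scan over the summary table. The rowkey layout
(`region|product|day`) and all names here are invented for illustration.]

```python
from collections import defaultdict

# Simulated pre-aggregated "summary" table: rowkey -> counter.
# In HBase this table would be updated at load time with atomic counter
# increments, so there is no read-modify-write at ingest.
summary = defaultdict(int)

def load_event(region, product, day, amount):
    """At ingest time, bump the rollup counter for (region, product, day)."""
    rowkey = f"{region}|{product}|{day}"
    summary[rowkey] += amount

def query_region(region):
    """At query time: a prefix scan over the small summary table,
    which returns in seconds, instead of an MR job over raw data."""
    prefix = region + "|"
    return sum(v for k, v in summary.items() if k.startswith(prefix))

load_event("emea", "widget", "2012-01-11", 7)
load_event("emea", "gadget", "2012-01-11", 3)
load_event("apac", "widget", "2012-01-11", 5)

print(query_region("emea"))  # 10
print(query_region("apac"))  # 5
```

[Note the rowkey puts the most common query dimension first, so the scan is a
contiguous prefix; that design choice is what makes the read-time scan cheap.]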
