Bottom line, IMO you have to consider how your data is organized. For 90% of a
relational schema (but perhaps 10% of the volume), the move to HBase-based
solutions is not warranted. However, for the other 10% of the schema (and 90%
of the volume) you may consider HBase-based solutions, most typically for time
series data feeds.

-d

On Wed, Jan 11, 2012 at 11:48 AM, Dmitriy Lyubimov <[email protected]> wrote:
> IMO you will never get the same flexibility. There are also numerous
> differences in the data modelling approach (TTL, the requirement for
> uniformly distributed ids to scale query volume, etc.).
>
> The most flexibility we have reached so far w.r.t. aggregation queries is
> an OLAP-ish model (see the link on the HBase wiki, supported projects,
> HBase-Lattice).
>
> This is for aggregating really high-qps real-time fact streams, and the
> list of current limitations is huge, but it serves our purpose so far.
>
> The most obvious benefits are that queries are fast (because of precomputed
> cuboids in a lattice, similar to the cuboid-lattice approach in ROLAP), the
> incremental compilation cycle is short (one can grow and update the cube
> within just a few minutes of a fact being fed into the system), and
> compilation can be scaled horizontally for high-volume fact feeds. There's
> a fairly limited query language and a basic set of aggregate functions
> (along with some weighted time series aggregates as well).
>
> The most severe limitation right now is the lack of a commonly used
> multidimensional query dialect such as MDX, which prevents use of the
> widely used pivoting/exploratory UI clients such as Excel, JPivot, or
> Tableau. So it is either custom UI integration, or custom data source
> providers for canned reports with tools like Pentaho and Jasper, or some
> real-time decisioning framework that doesn't require any UI at all and can
> use the Java API. I also plan to enable R to run queries against it
> (because I personally don't believe in doing ML or analytics using Excel).
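[The precomputed-cuboid idea described above can be sketched as a toy
in-memory simulation. This is not HBase-Lattice's actual API; the dimension
names (country, browser, hour) and the CuboidLattice class are invented for
illustration. The point is only the trade: each incoming fact is folded into
one aggregate table per subset of dimensions at write time, so an aggregation
query later is a lookup rather than a scan of raw facts.]

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical dimensions; a real cube model would define these in config.
DIMENSIONS = ("country", "browser", "hour")

class CuboidLattice:
    """Maintains one aggregate table ("cuboid") per subset of dimensions."""

    def __init__(self, dimensions):
        self.dimensions = dimensions
        # One cuboid for every subset of the dimensions, including the
        # empty subset (the grand total).
        self.cuboids = {
            subset: defaultdict(int)
            for r in range(len(dimensions) + 1)
            for subset in combinations(dimensions, r)
        }

    def ingest(self, fact):
        """Incrementally fold one fact into every cuboid (cheap at write time)."""
        for subset, table in self.cuboids.items():
            key = tuple(fact[d] for d in subset)
            table[key] += fact["count"]

    def query(self, **group_by):
        """Answer an aggregate query by a straight lookup, no scan of raw facts."""
        subset = tuple(d for d in self.dimensions if d in group_by)
        key = tuple(group_by[d] for d in subset)
        return self.cuboids[subset][key]

lattice = CuboidLattice(DIMENSIONS)
lattice.ingest({"country": "US", "browser": "firefox", "hour": 10, "count": 3})
lattice.ingest({"country": "US", "browser": "chrome", "hour": 10, "count": 2})
lattice.ingest({"country": "DE", "browser": "firefox", "hour": 11, "count": 4})

print(lattice.query(country="US"))      # 5
print(lattice.query(browser="firefox")) # 7
print(lattice.query())                  # 9 (grand total)
```

[The cost, as noted above, is write-time amplification (2^n cuboids for n
dimensions) and a limited query language: only groupings you precomputed are
fast.]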
>
> -d
>
> On Wed, Jan 11, 2012 at 10:59 AM, kfarmer <[email protected]> wrote:
>>
>> I'm taking a look at moving our datastore from Oracle to HBase, and
>> trying to understand how HBase could be used for ad-hoc aggregation
>> queries across our data.
>>
>> My understanding is that MapReduce is more of a batch framework, so if
>> we want a query to come back to the user in a few seconds, that won't
>> work, because of the overhead of launching MR and because the MR jobs
>> write back to a new table. Is that correct?
>>
>> Instead, should we be pre-aggregating data as we load it into separate
>> tables, and then, when a user queries, just do a scan on these
>> pre-aggregated tables?
>>
>> Thanks.
>> --
>> View this message in context:
>> http://old.nabble.com/HBase-for-ad-hoc-aggregate-queries-tp33123313p33123313.html
>> Sent from the HBase User mailing list archive at Nabble.com.
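[The pre-aggregation pattern the original question asks about can be sketched
as follows. This is plain Python standing in for HBase: in the real client the
counter bump at load time would be an atomic Increment mutation and the read a
prefix Scan over the summary table. The rowkey layout
(`region|product|day`) and all names here are invented for illustration.]

```python
from collections import defaultdict

# Simulated pre-aggregated "summary" table: rowkey -> counter.
# In HBase this table would be updated at load time with atomic counter
# increments, so there is no read-modify-write at ingest.
summary = defaultdict(int)

def load_event(region, product, day, amount):
    """At ingest time, bump the rollup counter for (region, product, day)."""
    rowkey = f"{region}|{product}|{day}"
    summary[rowkey] += amount

def query_region(region):
    """At query time: a prefix scan over the small summary table,
    which returns in seconds, instead of an MR job over raw data."""
    prefix = region + "|"
    return sum(v for k, v in summary.items() if k.startswith(prefix))

load_event("emea", "widget", "2012-01-11", 7)
load_event("emea", "gadget", "2012-01-11", 3)
load_event("apac", "widget", "2012-01-11", 5)

print(query_region("emea"))  # 10
print(query_region("apac"))  # 5
```

[Note the rowkey puts the most common query dimension first, so the scan is a
contiguous prefix; that design choice is what makes the read-time scan cheap.]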
