Re: Incremental pre-aggregation strategy with MapReduce

Doug Meil Fri, 02 Sep 2011 09:00:11 -0700

What Stack says.  Plus, for other tips see...

http://hbase.apache.org/book.html#mapreduce

http://hbase.apache.org/book.html#schema

On 9/2/11 11:15 AM, "Stack" <[email protected]> wrote:

>Can you rely on versioning?  If the MR job runs once a day, only
>aggregate what's changed in the last day?
>
>Turn off speculative execution.
>
>You'll need a means of dealing with MR jobs failing; i.e. throw away
>the aggregations done by the failed job(s), so they are not compounded
>with the aggregations done by the successful job.
>
>St.Ack
>
>On Thu, Sep 1, 2011 at 11:06 PM, Steinmaurer Thomas
><[email protected]> wrote:
>> Hello,
>>
>> we are storing detailed measurement values in a Hadoop/HBase cluster.
>> For end-user / analysis tasks, we need to provide aggregated values
>> along a date dimension (aggregate by day, month, quarter, year). The
>> aggregates shall be stored in an Oracle database for easier data
>> mangling via different client types (OLAP clients ...)
>>
>> A brute-force approach for generating the aggregates is to run a
>> nightly MapReduce job which processes the entire HBase table and does
>> the aggregation.
>>
>> I wonder, are there any best practices for doing the pre-aggregation
>> incrementally via a MapReduce job? For example, how to detect changes
>> in HBase since the last MR job run, etc.
>>
>> Thanks!
>>
>> Regards,
>>
>> Thomas
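
The two suggestions quoted above (aggregate only what changed since the last run, and discard the work of a failed job) can be sketched roughly as follows. This is an in-memory illustration in Python, not HBase or Hadoop code: the class `IncrementalAggregator`, its `run` method, the `fail` flag and the `(write_ts_ms, value)` cell layout are all hypothetical names made up for the example. It also assumes cells are append-only and that the cell's write timestamp doubles as the measurement time. In a real job the time window would more likely be expressed on the scan itself, for instance with `Scan.setTimeRange(minStamp, maxStamp)`.

```python
from collections import defaultdict
from datetime import datetime, timezone


def day_key(ts_ms):
    """Return the UTC calendar day (YYYY-MM-DD) for a millisecond timestamp."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")


class IncrementalAggregator:
    """Hypothetical driver; illustrative only, not an HBase or Hadoop API."""

    def __init__(self):
        self.totals = defaultdict(float)  # committed per-day sums
        self.watermark_ms = 0             # end of the last committed window

    def run(self, cells, now_ms, fail=False):
        """Aggregate cells written in [watermark_ms, now_ms); commit on success.

        cells: iterable of (write_ts_ms, value) pairs, assumed append-only.
        fail:  simulates a job that dies after doing part of the work.
        """
        staging = defaultdict(float)  # private to this run
        for write_ts_ms, value in cells:
            # Only look at what changed since the last committed run.
            if self.watermark_ms <= write_ts_ms < now_ms:
                staging[day_key(write_ts_ms)] += value
        if fail:
            # Failed job: staging is thrown away, totals and watermark stay put.
            return False
        # Successful job: fold the staged partial sums into the totals once.
        for day, partial in staging.items():
            self.totals[day] += partial
        self.watermark_ms = now_ms
        return True


DAY_MS = 24 * 60 * 60 * 1000
cells = [(1 * DAY_MS + 10, 2.0), (1 * DAY_MS + 20, 3.0), (2 * DAY_MS + 5, 4.0)]

agg = IncrementalAggregator()
agg.run(cells, now_ms=2 * DAY_MS)             # first nightly run
agg.run(cells, now_ms=3 * DAY_MS, fail=True)  # failed run changes nothing
agg.run(cells, now_ms=3 * DAY_MS)             # rerun covers the same window once
```

Because the watermark only advances on success, a rerun after a failure covers the same window again, and because partial sums live in a per-run staging area, nothing from the failed attempt is compounded into the totals. If existing cells can be overwritten rather than appended, adding deltas like this is no longer correct and the affected days would have to be recomputed instead.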