Re: Incremental pre-aggregation strategy with MapReduce

Stack Fri, 02 Sep 2011 08:16:01 -0700

Can you rely on versioning (the cell timestamps)?  If the MR job runs
once a day, could it aggregate only what's changed in the last day?
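
To make the idea concrete, here is a minimal sketch in plain Python (not
HBase code). It assumes each cell carries a write timestamp `ts` and that
measurements are append-only, so summing the newly written cells gives a
per-day delta to add to the stored aggregate. The field names and the
`changed_since` / `daily_deltas` helpers are made up for illustration. In
a real job the window would more likely be pushed into the scan itself,
e.g. with HBase's `Scan.setTimeRange(minStamp, maxStamp)`.

```python
from collections import defaultdict

def changed_since(cells, min_ts, max_ts):
    """Keep cells written in the half-open window [min_ts, max_ts)."""
    return [c for c in cells if min_ts <= c["ts"] < max_ts]

def daily_deltas(cells):
    """Sum the values of the given cells per day."""
    deltas = defaultdict(float)
    for c in cells:
        deltas[c["day"]] += c["value"]
    return dict(deltas)

# Hypothetical cells: "ts" is the write time, "day" the measurement day.
cells = [
    {"day": "2011-08-31", "ts": 100, "value": 2.0},  # seen by the previous run
    {"day": "2011-09-01", "ts": 250, "value": 3.0},
    {"day": "2011-09-01", "ts": 300, "value": 4.5},
]

# The previous run covered everything before ts=200; this run covers [200, 400).
new_cells = changed_since(cells, min_ts=200, max_ts=400)
result = daily_deltas(new_cells)
print(result)
```

If existing values can be overwritten rather than only appended, deltas
are not enough and the affected days would have to be recomputed in full.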


Turn off speculative execution, so that duplicate attempts of the same
task cannot apply the same aggregation twice.
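
A toy model of why this matters, in plain Python rather than Hadoop code:
if a task writes increments straight to an external store, a second
speculative attempt of that task applies them again. The function names
and the dict standing in for the store are hypothetical. In Hadoop
releases of this era the job properties involved are
`mapred.map.tasks.speculative.execution` and
`mapred.reduce.tasks.speculative.execution`.

```python
def apply_increment(store, key, delta):
    """Non-idempotent write: adds delta to whatever is already stored."""
    store[key] = store.get(key, 0) + delta

def run_reduce_task(store, key, delta, attempts=1):
    """Each attempt of the same task issues the same increment."""
    for _ in range(attempts):
        apply_increment(store, key, delta)

# One attempt: the aggregate is correct.
without_speculation = {}
run_reduce_task(without_speculation, "2011-09-01", 10, attempts=1)

# Two attempts (original plus speculative): the aggregate is doubled.
with_speculation = {}
run_reduce_task(with_speculation, "2011-09-01", 10, attempts=2)

print(without_speculation, with_speculation)
```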

You'll need a means of dealing with failed MR jobs; i.e. throw away
the aggregations written by a failed job, so that you don't end up
with the failed job's partial aggregations compounded with those of
the successful rerun.
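
One possible way to get that behaviour, sketched in plain Python under
the assumption that each job writes into its own staging area and the
staged values are folded into the real aggregates only once the job has
succeeded. The `AggregateStore` class and the job ids are hypothetical;
against Oracle the same idea could be a per-job staging table merged in a
single transaction.

```python
class AggregateStore:
    """Aggregates become visible only when their job commits."""

    def __init__(self):
        self.committed = {}  # day -> aggregate visible to clients
        self.staged = {}     # job_id -> {day: pending delta}

    def stage(self, job_id, day, delta):
        job = self.staged.setdefault(job_id, {})
        job[day] = job.get(day, 0) + delta

    def commit(self, job_id):
        """Fold a successful job's staged deltas into the aggregates."""
        for day, delta in self.staged.pop(job_id, {}).items():
            self.committed[day] = self.committed.get(day, 0) + delta

    def discard(self, job_id):
        """Throw away everything a failed job has staged."""
        self.staged.pop(job_id, None)

store = AggregateStore()

store.stage("job-1", "2011-09-01", 10)  # job-1 fails part way through
store.discard("job-1")

store.stage("job-2", "2011-09-01", 10)  # the rerun succeeds
store.stage("job-2", "2011-09-01", 5)
store.commit("job-2")

print(store.committed)
```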

St.Ack

On Thu, Sep 1, 2011 at 11:06 PM, Steinmaurer Thomas
<[email protected]> wrote:
> Hello,
>
>
>
> we are storing detailed measurement values in a Hadoop/HBase cluster.
> For end-user / analysis tasks, we need to provide aggregated values
> along a date dimension (aggregate by day, month, quarter, year). The
> aggregates shall be stored in an Oracle database for easier data
> mangling via different client types (OLAP clients ...)
>
>
>
> A brute-force approach for generating the aggregates is to run a
> MapReduce job at night which processes the entire HBase table and does
> the aggregation.
>
>
>
> I wonder, are there any best practices on how to possibly do the
> pre-aggregation thing via a MapReduce job in an incremental way? For
> example, how to detect changes in HBase since the last MR-Job run etc
> ...
>
>
>
> Thanks!
>
>
>
> Regards,
>
> Thomas
>
>
>
>
