Re: Incremental pre-aggregation strategy with MapReduce

Sonal Goyal Fri, 02 Sep 2011 09:07:57 -0700

Could you give some insights into the kind of measurements you are saving
and a sample aggregate?


Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>





On Fri, Sep 2, 2011 at 9:29 PM, Doug Meil <[email protected]> wrote:

>
> What Stack says.  Plus, for other tips see...
>
> http://hbase.apache.org/book.html#mapreduce
>
> http://hbase.apache.org/book.html#schema
>
>
>
>
>
> On 9/2/11 11:15 AM, "Stack" <[email protected]> wrote:
>
> >Can you rely on versioning?  If the MR job runs once a day, aggregate
> >only what has changed in the last day?
> >
> >Turn off speculative execution.
> >
> >You'll need a means of dealing with MR jobs failing; i.e. throw away
> >the aggregations done by a failed job, so that they are not
> >compounded with the aggregations done by the successful job.
> >
> >St.Ack
> >
> >On Thu, Sep 1, 2011 at 11:06 PM, Steinmaurer Thomas
> ><[email protected]> wrote:
> >> Hello,
> >>
> >>
> >>
> >> we are storing detailed measurement values in a Hadoop/HBase cluster.
> >> For end-user / analysis tasks, we need to provide aggregated values
> >> along a date dimension (aggregate by day, month, quarter, year). The
> >> aggregates shall be stored in an Oracle database for easier data
> >> mangling via different client types (OLAP clients ...)
> >>
> >>
> >>
> >> A brute-force approach for generating the aggregates is to run a
> >> MapReduce job at night that processes the entire HBase table and does
> >> the aggregation.
> >>
> >>
> >>
> >> I wonder, are there any best practices on how to possibly do the
> >> pre-aggregation thing via a MapReduce job in an incremental way? For
> >> example, how to detect changes in HBase since the last MR-Job run etc
> >> ...
> >>
> >>
> >>
> >> Thanks!
> >>
> >>
> >>
> >> Regards,
> >>
> >> Thomas
> >>
> >>
> >>
> >>
>
>
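
To make the suggestions quoted above concrete, here is a minimal sketch in
plain Python (no HBase client involved) of the incremental idea: aggregate
only cells written since the last successful run, and apply a run's result
only when the run completes. Everything in it is an assumption for
illustration: the cell layout (write timestamp, measurement timestamp,
value), the function names, and the in-memory "state" standing in for the
Oracle tables. It only covers additive aggregates such as sums; values that
are overwritten in place would need extra handling. In HBase itself the
time window would typically be expressed with Scan.setTimeRange(min, max).

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_keys(ts_ms):
    """Return the day, month, quarter and year keys for an epoch-ms timestamp (UTC)."""
    d = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    quarter = (d.month - 1) // 3 + 1
    return [
        ("day", d.strftime("%Y-%m-%d")),
        ("month", d.strftime("%Y-%m")),
        ("quarter", f"{d.year}-Q{quarter}"),
        ("year", str(d.year)),
    ]


def aggregate_increment(cells, start_ms, end_ms):
    """Sum only the cells whose write timestamp lies in [start_ms, end_ms).

    cells is an iterable of (write_ts_ms, measured_ts_ms, value) tuples,
    a hypothetical stand-in for what a time-range scan would return.
    """
    delta = defaultdict(float)
    for write_ts_ms, measured_ts_ms, value in cells:
        if start_ms <= write_ts_ms < end_ms:
            for key in bucket_keys(measured_ts_ms):
                delta[key] += value
    return dict(delta)


def commit(state, delta, end_ms):
    """Return a new state with the delta applied and the high-water mark advanced.

    A failed run never reaches this call, so its partial sums are discarded
    and the next run starts again from the old high-water mark.
    """
    totals = dict(state["totals"])
    for key, value in delta.items():
        totals[key] = totals.get(key, 0.0) + value
    return {"totals": totals, "last_run_ms": end_ms}
```

Each run would call aggregate_increment(cells, state["last_run_ms"], now_ms)
and then commit(...) only on success. Keeping the totals and the high-water
mark in one state object mirrors writing both to the database in a single
transaction, which is one way to avoid compounding the output of a failed
job with that of its rerun.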
