Incremental pre-aggregation strategy with MapReduce

Steinmaurer Thomas Thu, 01 Sep 2011 23:07:34 -0700

Hello,


We are storing detailed measurement values in a Hadoop/HBase cluster.
For end-user and analysis tasks, we need to provide aggregated values
along a date dimension (aggregated by day, month, quarter, and year). The
aggregates will be stored in an Oracle database for easier data
manipulation via different client types (OLAP clients, ...).

 

A brute-force approach for generating the aggregates is to run a
nightly MapReduce job that processes the entire HBase table and
performs the aggregation.
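
To illustrate the brute-force variant, here is a minimal single-process
sketch in plain Python rather than an actual MapReduce job. It assumes
each measurement is a (timestamp, value) pair and that the aggregate is
a sum; both are assumptions for illustration, and the names
period_keys and aggregate_all are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def period_keys(ts):
    """Return the day, month, quarter and year bucket keys for a timestamp."""
    quarter = (ts.month - 1) // 3 + 1
    return [
        ("day", ts.strftime("%Y-%m-%d")),
        ("month", ts.strftime("%Y-%m")),
        ("quarter", f"{ts.year}-Q{quarter}"),
        ("year", str(ts.year)),
    ]

def aggregate_all(rows):
    """Brute force: visit every (timestamp, value) row and sum per bucket."""
    totals = defaultdict(float)
    for ts, value in rows:
        # "Map" step: one row contributes to four buckets.
        for key in period_keys(ts):
            # "Reduce" step: sum all values that share a bucket key.
            totals[key] += value
    return dict(totals)

# Made-up sample data, not real measurements.
rows = [
    (datetime(2011, 8, 31, 10, 0), 1.5),
    (datetime(2011, 8, 31, 11, 0), 2.5),
    (datetime(2011, 9, 1, 9, 0), 4.0),
]
totals = aggregate_all(rows)
```

Every run touches every row, so the cost grows with the size of the
whole table rather than with the amount of new data.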

 

I wonder whether there are any best practices for doing the
pre-aggregation incrementally via a MapReduce job. For example, how
can changes in HBase since the last MR job run be detected?
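
To make the question concrete, here is a rough sketch of what
"incremental" could mean, again in plain Python rather than MapReduce.
It assumes every row carries a write time (written_at) next to the
measurement time, that rows are only ever inserted, and that the
aggregate is a sum. These are assumptions for illustration only; the
function names are hypothetical. HBase stores a timestamp with each
cell, which might serve as written_at, but whether that fits this
table is not known.

```python
from collections import defaultdict
from datetime import datetime

def day_key(measured_at):
    """Bucket key for the daily aggregate."""
    return measured_at.strftime("%Y-%m-%d")

def incremental_update(totals, rows, last_run, now):
    """Fold rows written in (last_run, now] into the existing per-day sums."""
    updated = defaultdict(float, totals)
    for written_at, measured_at, value in rows:
        # Only rows written since the previous run are considered.
        if last_run < written_at <= now:
            updated[day_key(measured_at)] += value
    # Return the new totals and the new watermark for the next run.
    return dict(updated), now

# Made-up rows: (written_at, measured_at, value).
rows = [
    (datetime(2011, 9, 1, 1, 0), datetime(2011, 8, 31, 10, 0), 1.5),
    (datetime(2011, 9, 2, 1, 0), datetime(2011, 8, 31, 11, 0), 2.5),  # late arrival
    (datetime(2011, 9, 2, 2, 0), datetime(2011, 9, 1, 9, 0), 4.0),
]

first_totals, mark = incremental_update(
    {}, rows, datetime.min, datetime(2011, 9, 1, 23, 0))
second_totals, mark = incremental_update(
    first_totals, rows, mark, datetime(2011, 9, 2, 23, 0))
```

The second run only adds the two rows written after the first
watermark, including the late arrival for 2011-08-31. This only works
for additive aggregates; updates or deletes of existing values would
need a different mechanism.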

 

Thanks!

 

Regards,

Thomas
