Hello,

We are storing detailed measurement values in a Hadoop/HBase cluster.
For end-user and analysis tasks, we need to provide aggregated values
along a date dimension (aggregated by day, month, quarter, and year). The
aggregates are to be stored in an Oracle database for easier data
manipulation by different client types (OLAP clients, etc.).
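
Purely as an illustration (not part of the original question), the four levels of the date dimension can all be derived from a single measurement date. The function name `period_keys` and the key formats (`2011-05`, `2011-Q2`, ...) are hypothetical choices, picked so that keys sort chronologically:

```python
from datetime import date
from typing import Dict


def period_keys(d: date) -> Dict[str, str]:
    """Map a measurement date to one aggregation key per date-dimension level.

    The key formats are hypothetical; any sortable encoding would do.
    """
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
    return {
        "day": d.isoformat(),                    # e.g. "2011-05-17"
        "month": f"{d.year:04d}-{d.month:02d}",  # e.g. "2011-05"
        "quarter": f"{d.year:04d}-Q{quarter}",   # e.g. "2011-Q2"
        "year": f"{d.year:04d}",                 # e.g. "2011"
    }


print(period_keys(date(2011, 5, 17)))
# {'day': '2011-05-17', 'month': '2011-05', 'quarter': '2011-Q2', 'year': '2011'}
```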

A brute-force approach to generating the aggregates is to run a
MapReduce job every night that processes the entire HBase table and
performs the aggregation.
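
A minimal, single-process sketch of what such a nightly full-table job computes, with the map, shuffle, and reduce phases simulated in plain Python rather than run on Hadoop. The `(series_id, day, value)` row layout and summation as the aggregate are assumptions; a real job would read the rows from the HBase table (for example through HBase's `TableMapper`/`TableInputFormat`) and write the results to Oracle:

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, date, float]  # hypothetical (series_id, day, value) row
Key = Tuple[str, str, str]        # (series_id, level, period)


def map_record(rec: Record) -> Iterator[Tuple[Key, float]]:
    """Map phase: emit the value once per date-dimension level."""
    series, d, value = rec
    quarter = (d.month - 1) // 3 + 1
    for level, period in (
        ("day", d.isoformat()),
        ("month", f"{d.year:04d}-{d.month:02d}"),
        ("quarter", f"{d.year:04d}-Q{quarter}"),
        ("year", f"{d.year:04d}"),
    ):
        yield (series, level, period), value


def run_full_job(rows: Iterable[Record]) -> Dict[Key, float]:
    """Aggregate every row, as the nightly full-table job would."""
    # Shuffle: group mapper output by key (done by the framework on Hadoop).
    groups: Dict[Key, List[float]] = defaultdict(list)
    for rec in rows:
        for key, value in map_record(rec):
            groups[key].append(value)
    # Reduce: sum the values of each key.
    return {key: sum(values) for key, values in groups.items()}


rows = [
    ("s1", date(2011, 5, 17), 2.0),
    ("s1", date(2011, 5, 18), 3.0),
    ("s1", date(2011, 7, 1), 4.0),
]
totals = run_full_job(rows)
print(totals[("s1", "quarter", "2011-Q2")])  # 5.0
print(totals[("s1", "year", "2011")])        # 9.0
```

Every run reprocesses every detail row, which is exactly the cost an incremental approach tries to avoid.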

Are there any best practices for doing this pre-aggregation
incrementally with a MapReduce job? For example, how can we detect
what has changed in HBase since the last MapReduce job run?
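
One possible incremental approach, offered as an assumption rather than an established best practice: HBase stores a timestamp with every cell, and a scan can be restricted to a time range (`Scan.setTimeRange(minStamp, maxStamp)` in the Java client), so a job can read only the rows written since the previous run started (a "watermark"). From those rows it derives the set of affected ("dirty") aggregate keys and recomputes only those. The sketch below simulates this in plain Python; `write_ts` stands in for the HBase cell timestamp and all names are hypothetical. It relies on cell timestamps reflecting write time (HBase's default when the client does not set one explicitly), and it does not detect deletes:

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Set, Tuple

# Hypothetical detail row: (series_id, measurement day, value, write timestamp).
Row = Tuple[str, date, float, int]
Key = Tuple[str, str, str]  # (series_id, level, period)


def periods(d: date) -> List[Tuple[str, str]]:
    """All (level, period) pairs that a measurement day belongs to."""
    q = (d.month - 1) // 3 + 1
    return [
        ("day", d.isoformat()),
        ("month", f"{d.year:04d}-{d.month:02d}"),
        ("quarter", f"{d.year:04d}-Q{q}"),
        ("year", f"{d.year:04d}"),
    ]


def dirty_keys(rows: List[Row], watermark: int) -> Set[Key]:
    """Aggregate keys touched by rows written at or after the watermark.

    In HBase this filter would be pushed into the scan's time range
    instead of being applied row by row.
    """
    keys: Set[Key] = set()
    for series, d, _value, write_ts in rows:
        if write_ts >= watermark:
            keys.update((series, level, period) for level, period in periods(d))
    return keys


def recompute(rows: List[Row], keys: Set[Key]) -> Dict[Key, float]:
    """Re-aggregate only the dirty keys from the detail rows.

    For simplicity this loops over every row; a real job would scan only
    the row-key ranges of the dirty periods (e.g. if row keys start with
    the date).
    """
    totals: Dict[Key, float] = defaultdict(float)
    for series, d, value, _write_ts in rows:
        for level, period in periods(d):
            key = (series, level, period)
            if key in keys:
                totals[key] += value
    return dict(totals)


rows = [
    ("s1", date(2011, 5, 17), 2.0, 50),
    ("s1", date(2011, 5, 18), 3.0, 150),  # written after the last run
    ("s1", date(2011, 7, 1), 4.0, 60),
]
dirty = dirty_keys(rows, watermark=100)
updated = recompute(rows, dirty)
print(updated)
```

Only the four keys touched by the 2011-05-18 row are recomputed, but each is recomputed in full (the 2011-05 monthly total is 5.0, not just the new 3.0), so an overwritten cell value replaces the old one instead of being double-counted. The results would then be upserted into the Oracle aggregate tables.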

Thanks!

Regards,

Thomas