What Stack says. Plus, for other tips see:

http://hbase.apache.org/book.html#mapreduce
http://hbase.apache.org/book.html#schema

On 9/2/11 11:15 AM, "Stack" <[email protected]> wrote:

>Can you rely on versioning? If MR job runs once a day, only aggregate
>what's changed in last day?
>
>Turn off speculative execution.
>
>You'll need a means of dealing with MR jobs failing; i.e. throw away
>the aggregations done by the failed job rather than have the
>aggregations done by the failed job(s) plus the successful job
>compounded.
>
>St.Ack
>
>On Thu, Sep 1, 2011 at 11:06 PM, Steinmaurer Thomas
><[email protected]> wrote:
>> Hello,
>>
>> we are storing detailed measurement values in a Hadoop/HBase cluster.
>> For end-user / analysis tasks, we need to provide aggregated values
>> along a date dimension (aggregate by day, month, quarter, year). The
>> aggregates shall be stored in an Oracle database for easier data
>> mangling via different client types (OLAP clients ...)
>>
>> A brute-force approach for generating the aggregates is to run a
>> MapReduce job in the night which processes the entire HBase table and
>> does the aggregation.
>>
>> I wonder, are there any best practices on how to possibly do the
>> pre-aggregation thing via a MapReduce job in an incremental way? For
>> example, how to detect changes in HBase since the last MR-Job run etc
>> ...
>>
>> Thanks!
>>
>> Regards,
>>
>> Thomas
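
To make the two suggestions in the thread concrete (aggregate only what was written since the last run, and throw away the work of a failed job), here is a minimal in-memory sketch in plain Python. It is not HBase or Hadoop API code. The cell layout, the function names and the `fail` flag are all hypothetical, and it assumes measurements are append-only, so a cell is never overwritten after it has been counted. In a real job the time window would more likely be applied on the HBase side, for example with a time range on the Scan, rather than by filtering in the client.

```python
from collections import defaultdict

# Hypothetical cell layout: (row_key, write_timestamp_ms, day, value).
# Assumption: cells are append-only, so each cell is counted exactly once.


def aggregate_window(cells, min_ts, max_ts):
    """Sum values per day for cells written in the window [min_ts, max_ts)."""
    partial = defaultdict(float)
    for _row, ts, day, value in cells:
        if min_ts <= ts < max_ts:
            partial[day] += value
    return dict(partial)


def run_incremental(cells, totals, last_run_ts, now_ts, fail=False):
    """Run one incremental pass.

    Returns (new_totals, new_last_run_ts). The partial sums are staged
    first and merged only if the run succeeds; on failure the previous
    totals and the previous watermark are returned unchanged, so a retry
    covers the same window and nothing is counted twice.
    """
    staged = aggregate_window(cells, last_run_ts, now_ts)
    if fail:  # simulated job failure: discard the staged result
        return totals, last_run_ts
    merged = dict(totals)
    for day, subtotal in staged.items():
        merged[day] = merged.get(day, 0.0) + subtotal
    return merged, now_ts


cells = [
    ("sensor1", 100, "2011-09-01", 2.0),
    ("sensor2", 150, "2011-09-01", 3.0),
    ("sensor1", 250, "2011-09-02", 4.0),
]

# First run covers writes in [0, 200).
totals, last_run = run_incremental(cells, {}, 0, 200)

# A failed run leaves totals and the watermark untouched.
totals, last_run = run_incremental(cells, totals, last_run, 300, fail=True)

# The retry covers [200, 300) once.
totals, last_run = run_incremental(cells, totals, last_run, 300)
print(totals)
```

The same reasoning is why speculative execution matters here: it can launch duplicate attempts of the same task, so any attempt that writes partial aggregates straight into an external store such as the Oracle database risks adding them more than once. Staging the result and committing it together with the new watermark only after the job succeeds avoids that.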
