Could you give some insights into the kind of measurements you are saving and a sample aggregate?
Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>

On Fri, Sep 2, 2011 at 9:29 PM, Doug Meil <[email protected]> wrote:

>
> What Stack says. Plus, for other tips see...
>
> http://hbase.apache.org/book.html#mapreduce
>
> http://hbase.apache.org/book.html#schema
>
>
> On 9/2/11 11:15 AM, "Stack" <[email protected]> wrote:
>
> >Can you rely on versioning? If the MR job runs once a day, only aggregate
> >what's changed in the last day?
> >
> >Turn off speculative execution.
> >
> >You'll need a means of dealing with MR jobs failing; i.e. throw away
> >the aggregations done by the failed job rather than have the
> >aggregations done by the failed job(s) plus the successful job
> >compounded.
> >
> >St.Ack
> >
> >On Thu, Sep 1, 2011 at 11:06 PM, Steinmaurer Thomas
> ><[email protected]> wrote:
> >> Hello,
> >>
> >> we are storing detailed measurement values in a Hadoop/HBase cluster.
> >> For end-user / analysis tasks, we need to provide aggregated values
> >> along a date dimension (aggregate by day, month, quarter, year). The
> >> aggregates shall be stored in an Oracle database for easier data
> >> mangling via different client types (OLAP clients ...)
> >>
> >> A brute-force approach for generating the aggregates is to run a
> >> MapReduce job in the night which processes the entire HBase table and
> >> does the aggregation.
> >>
> >> I wonder, are there any best practices on how to possibly do the
> >> pre-aggregation thing via a MapReduce job in an incremental way? For
> >> example, how to detect changes in HBase since the last MR job run etc
> >> ...
> >>
> >> Thanks!
> >>
> >> Regards,
> >>
> >> Thomas
> >>
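
To make Stack's two suggestions concrete, here is a rough sketch in plain Python, not HBase or MapReduce code. It assumes measurements are only ever appended (never overwritten) and that the aggregate is additive (a sum). The functions `ms`, `day_key`, `aggregate_increment` and `commit` are hypothetical names made up for this illustration. Each run looks only at cells whose write timestamp falls in the window since the previous run, and the resulting delta is merged into the stored totals only if the job succeeded, so a failed run is thrown away instead of being compounded. In HBase itself a similar window can probably be expressed with `Scan.setTimeRange(minStamp, maxStamp)`.

```python
from collections import defaultdict
from datetime import datetime, timezone


def ms(year, month, day, hour=0):
    """Hypothetical helper: UTC wall-clock time as epoch milliseconds."""
    dt = datetime(year, month, day, hour, tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def day_key(ts_ms):
    """Bucket an epoch-millisecond timestamp into a UTC calendar day."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")


def aggregate_increment(cells, last_run_ms, now_ms):
    """Sum values per (metric, day) for cells WRITTEN in [last_run_ms, now_ms).

    Each cell is (metric, event_time_ms, write_time_ms, value). The window is
    applied to the write time, the bucket is derived from the event time, so
    late-arriving measurements still land in the correct day.
    """
    delta = defaultdict(int)
    for metric, event_ms, write_ms, value in cells:
        if last_run_ms <= write_ms < now_ms:
            delta[(metric, day_key(event_ms))] += value
    return dict(delta)


def commit(totals, delta, job_succeeded):
    """Merge a run's delta into the totals only if the run succeeded."""
    if not job_succeeded:
        return dict(totals)  # discard the failed run's partial work
    merged = dict(totals)
    for key, value in delta.items():
        merged[key] = merged.get(key, 0) + value
    return merged


# Made-up sample data: three on-time cells and one late arrival for Sep 1.
cells = [
    ("temp", ms(2011, 9, 1, 10), ms(2011, 9, 1, 10), 20),
    ("temp", ms(2011, 9, 1, 22), ms(2011, 9, 1, 22), 22),
    ("temp", ms(2011, 9, 2, 9), ms(2011, 9, 2, 9), 25),
    ("temp", ms(2011, 9, 1, 23), ms(2011, 9, 2, 8), 5),
]

totals = {}
run1 = aggregate_increment(cells, ms(2011, 9, 1), ms(2011, 9, 2))
totals = commit(totals, run1, job_succeeded=True)
run2 = aggregate_increment(cells, ms(2011, 9, 2), ms(2011, 9, 3))
totals = commit(totals, run2, job_succeeded=True)
print(totals)
```

If cells can be overwritten, or the aggregate is not additive (a median, say), a delta like this is not enough and the affected days would have to be recomputed in full.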
