https://github.com/dlyubimov/HBase-Lattice
On Wed, Dec 21, 2011 at 12:13 AM, Dmitriy Lyubimov <[email protected]> wrote:
> Thomas,
>
> Sorry for the shameless self-promotion. Can you look at our hbase-lattice
> project? It does incremental OLAP-ish cube compilation with custom
> filtering to optimize for composite key scans. There is a rudimentary
> query language as well.
>
> It has a bunch of standard (and not so standard) aggregates for measure
> data, and the ability to add a user aggregate relatively easily through
> the model definition.
>
> It is at a very early stage. But see if it could fit your purpose, and
> maybe even share some perspectives, since I am honestly not an expert on
> dimensional data representation.
>
> (I guess I need to add some query shell so people can try it out more
> easily.)
>
> On Mon, Nov 28, 2011 at 1:55 AM, Steinmaurer Thomas
> <[email protected]> wrote:
>> Hello,
>>
>> this has already been discussed a bit in the past, but I'm trying to
>> refresh this thread as this is an important design issue in our HBase
>> evaluation.
>>
>> Basically, the result of our evaluation was that we are going to be
>> happy with what Hadoop/HBase offers for managing our measurement/sensor
>> data. One crucial requirement, though, e.g. for backend analysis tasks,
>> is that we need access to aggregated data very quickly. The idea is to
>> run a MapReduce job and store the daily aggregates in an RDBMS, which
>> allows us to access aggregated data more easily via different tools
>> (BI frontends etc.). Monthly and yearly aggregates are then handled
>> with RDBMS concepts like Materialized Views and Partitioning.
>>
>> While processing the entire HBase table, e.g. every night, is an option
>> when we go live, it probably isn't an option once the data volume has
>> grown over the years. So, what options are there for some kind of
>> incremental aggregation of only new data?
>>
>> - Perhaps using versioning (internal timestamp) might be an option?
>>
>> - Perhaps having some kind of HBase (daily) staging table which is
>>   truncated after aggregating data is an option?
>>
>> - How could Co-processors help here (at the time of the Go-Live, they
>>   might be available in e.g. Cloudera)?
>>
>> etc.
>>
>> Any ideas/comments are appreciated.
>>
>> Thanks,
>>
>> Thomas
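As an illustration of the first option in the quoted list (using the internal timestamp), here is a minimal sketch in plain Python with no HBase involved. It assumes each measurement carries both a measurement time and a write timestamp, and that a "watermark" (the upper bound of the previous run) is stored between runs. All names (`new_store`, `aggregate_increment`, the tuple layout) are hypothetical and not taken from hbase-lattice or HBase. In a real job the filter on the write timestamp would presumably be pushed into the scan as a time range instead of being done client-side.

```python
from collections import defaultdict
from datetime import datetime, timezone


def new_store():
    # Daily aggregates keyed by (sensor_id, "YYYY-MM-DD"); sum and count are
    # additive, so later increments can be merged into existing entries.
    return defaultdict(lambda: {"count": 0, "sum": 0.0})


def day_of(ts_ms):
    # UTC calendar day of a millisecond epoch timestamp.
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")


def aggregate_increment(cells, daily, watermark_ms, now_ms):
    """Fold only cells written in [watermark_ms, now_ms) into `daily`.

    cells: iterable of (sensor_id, measured_ts_ms, written_ts_ms, value).
    Returns the new watermark to persist for the next run.
    """
    for sensor_id, measured_ts_ms, written_ts_ms, value in cells:
        if watermark_ms <= written_ts_ms < now_ms:
            agg = daily[(sensor_id, day_of(measured_ts_ms))]
            agg["count"] += 1
            agg["sum"] += value
    return now_ms


if __name__ == "__main__":
    cells = [
        ("s1", 0, 1000, 1.0),
        ("s1", 0, 2000, 3.0),  # written after the first run
        ("s1", 86_400_000, 86_400_005, 5.0),
    ]
    daily = new_store()
    wm = aggregate_increment(cells, daily, 0, 1500)
    wm = aggregate_increment(cells, daily, wm, 90_000_000)
    for key in sorted(daily):
        print(key, daily[key])
```

Because the bucket is chosen by measurement time while the filter uses write time, a late-arriving value is still merged into the correct day. Aggregates that are not additive (e.g. medians) would need a different merge step.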
