I am reading a blog post about HBase and its application to OLAP: http://www.jroller.com/otis/entry/hbase_vs_rdbms_star_schema In that post, Jean-Daniel comments: "If you can afford to denormalize your data by putting the dimension table data into the same table as the fact table, then you can get very good read efficiency. For each dimension, you would have a column family." Can someone give me more details about this comment?
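To show how I currently read that suggestion, here is a rough sketch of the table layout I have in mind -- a single table holding the fact data, with one column family per dimension plus one family for the measures. The table and family names are just my own guesses, and I am using the current (0.20-style) Java client API; please correct me if this is not what Jean-Daniel meant:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateSalesTable {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());

    // One table holds both fact and (denormalized) dimension data.
    HTableDescriptor sales = new HTableDescriptor("sales");

    // One column family per dimension, as Jean-Daniel suggests,
    // plus one family for the fact measures themselves.
    sales.addFamily(new HColumnDescriptor("date"));
    sales.addFamily(new HColumnDescriptor("store"));
    sales.addFamily(new HColumnDescriptor("product"));
    sales.addFamily(new HColumnDescriptor("buyer"));
    sales.addFamily(new HColumnDescriptor("measures"));

    admin.createTable(sales);
  }
}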
I understand Zohmg did some work in this area, but when I read the thesis behind that project (http://github.com/zohmg/zohmg/raw/master/doc/report/msc-report.pdf), it does not seem to use the approach Jean-Daniel suggested (page 32 -- Storage/Data Model -- describes how Zohmg stores data). Actually, I am not sure Zohmg's approach can even scale to a very large dataset with many dimensions -- the storage requirements would blow up.

Can someone give a detailed explanation of both of the above approaches to implementing a star schema? Let's say we are trying to model the following problem:

(date, store_name, product_name, buyer_age) ---> (number of sales, total number sold)

In other words, we want to build an OLAP cube over these four dimensions: date, store name, product name, and buyer age (these correspond to the dimension tables in the star-schema world). I have attempted a concrete sketch below.
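To make the question concrete, here is how I imagine a single fact row would be written under the denormalized layout from my earlier sketch. The composite row-key format and the column qualifiers are purely hypothetical on my part -- I would especially like to hear whether this row-key design makes sense:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutSale {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "sales");

    // Hypothetical composite row key (date/store/product/age-bucket), so a
    // scan over a date range would read contiguous rows.
    Put p = new Put(Bytes.toBytes("2009-11-17/store42/widget/25-34"));

    // Dimension attributes denormalized into per-dimension families.
    p.add(Bytes.toBytes("date"), Bytes.toBytes("day"), Bytes.toBytes("2009-11-17"));
    p.add(Bytes.toBytes("store"), Bytes.toBytes("name"), Bytes.toBytes("store42"));
    p.add(Bytes.toBytes("product"), Bytes.toBytes("name"), Bytes.toBytes("widget"));
    p.add(Bytes.toBytes("buyer"), Bytes.toBytes("age"), Bytes.toBytes("31"));

    // The two measures from my example problem.
    p.add(Bytes.toBytes("measures"), Bytes.toBytes("number_of_sales"), Bytes.toBytes(1L));
    p.add(Bytes.toBytes("measures"), Bytes.toBytes("total_number_sold"), Bytes.toBytes(3L));

    table.put(p);
  }
}

(I also wonder whether incrementColumnValue would be the better way to maintain the aggregated measures rather than plain puts, but maybe that is part of the answer I am looking for.)

Thanks, Sean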
