Hi Sam,

So are you saying that you will have about 30 column families? If so, I don't think it's a good idea.
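One possible way to avoid 30 column families, sketched here only as an illustration and not something proposed in this thread, is to keep a single family and move the day index into the column qualifier. The family name `e`, the qualifier format, and the event types `login` and `purchase` are all assumptions made up for the example; no HBase client calls are shown.

```python
# Hypothetical layout: one column family instead of d1..d30,
# with the day index folded into the column qualifier.

FAMILY = "e"  # assumed family name, not from the thread


def column_name(day, event_type):
    """Build 'family:qualifier' for a day (1-30) and an event type."""
    if not 1 <= day <= 30:
        raise ValueError("day must be between 1 and 30")
    # Zero-pad the day so qualifiers sort in day order as strings.
    return f"{FAMILY}:d{day:02d}_{event_type}"


def user_row(user_id, events):
    """Flatten (day, event_type, count) tuples into one row keyed by user id."""
    columns = {column_name(day, etype): count for day, etype, count in events}
    return {"row_key": user_id, "columns": columns}
```

For example, `user_row("u42", [(1, "login", 2)])` would give a single row for user `u42` with one column under the assumed family `e`.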
JM

2013/11/13 Sam Wu <[email protected]>

> Hi all,
>
> I am thinking about using Random Forest to do churn analysis with Hbase as
> NoSQL data store.
> Currently, we have all the user history (basically many type of event
> data) resides in S3 & Redshift (we have one table per date/per event)
> Events includes startTime, endTime, and other pertinent information,..
>
> We are thinking about converting all the event tables into one fat
> table(with other helper parameter tables) with one row per user using Hbase.
>
> Each row will have user id as key, with some column-family/qualifier,
> e.g.: col-family, d1,d2,……d30 (days in the system), and qualifier as
> different types of event. Since initially we are more interested in new
> user retention, so 30 days might be good to start with.
>
> We can label record as churning away by no active activity in continuous
> 10 days.
>
> If data schema looks good, ingest data from S3 into HBase. Then do Random
> Forest to classifier new profile data.
>
> Is this types of data a good candidate for Hbase.
> Opinion is highly appreciated.
>
>
> BR
>
> Sam
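
The labeling rule in the quoted message (churn means no activity for 10 continuous days) could be sketched roughly as below. This assumes the rule means any run of 10 consecutive inactive days within the first 30 days; the message does not say whether the run has to be at the end of the window. The function name and parameters are hypothetical.

```python
def is_churned(active_days, window=30, gap=10):
    """Return True if there are `gap` or more consecutive inactive days
    within days 1..window.

    `active_days` is an iterable of day indices (1-based) on which the
    user had at least one event.
    """
    active = set(active_days)
    run = 0  # length of the current streak of inactive days
    for day in range(1, window + 1):
        if day in active:
            run = 0
        else:
            run += 1
            if run >= gap:
                return True
    return False
```

A user active only on days 1 to 3 would be labeled as churned under this reading, since days 4 to 13 form a 10-day inactive run.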
