Hi all,

I am thinking about using Random Forest to do churn analysis, with HBase as 
the NoSQL data store.
Currently, all of our user history (basically many types of event data) 
resides in S3 and Redshift (we have one table per date per event type).
Each event includes startTime, endTime, and other pertinent information.

We are thinking about converting all the event tables into one fat table 
(plus some helper parameter tables) in HBase, with one row per user.

Each row will have the user id as its key, with column families d1, d2, ..., 
d30 (days in the system) and the different event types as qualifiers. Since 
we are initially more interested in new-user retention, 30 days might be a 
good starting point.
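
To make the proposed layout concrete, here is a minimal sketch in plain 
Python (no HBase client involved) of pivoting raw events into one wide row 
per user. The function name, the (user_id, event_type, event_date) tuple 
shape, and the signup_dates lookup are assumptions made for illustration; 
they are not our actual Redshift schema.

```python
from collections import defaultdict
from datetime import date


def build_user_rows(events, signup_dates, horizon_days=30):
    """Pivot raw events into one wide row per user.

    events: iterable of (user_id, event_type, event_date) tuples
            (hypothetical shape, for illustration only).
    signup_dates: dict mapping user_id -> signup date.
    Returns {user_id: {"d<N>:<event_type>": count}} with N in 1..horizon_days,
    mirroring a "family:qualifier" column name.
    """
    rows = defaultdict(lambda: defaultdict(int))
    for user_id, event_type, event_date in events:
        # Day 1 is the signup day itself.
        day = (event_date - signup_dates[user_id]).days + 1
        if 1 <= day <= horizon_days:
            rows[user_id][f"d{day}:{event_type}"] += 1
    return {uid: dict(cols) for uid, cols in rows.items()}


if __name__ == "__main__":
    signup = {"u1": date(2015, 1, 1)}
    sample = [
        ("u1", "login", date(2015, 1, 1)),
        ("u1", "play", date(2015, 1, 3)),
    ]
    print(build_user_rows(sample, signup))
```

Events that fall outside the 30-day window are simply dropped, and each cell 
holds a per-day event count; storing durations derived from startTime and 
endTime instead would be an equally plausible choice.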

We can label a user as churned if there is no activity for 10 consecutive 
days.
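
As a rough sketch of that labeling rule, assuming "10 consecutive days" 
means any run of 10 inactive days inside the 30-day window (the function 
name and the set-of-active-days input are hypothetical):

```python
def is_churned(active_days, horizon_days=30, inactive_window=10):
    """Return True if the user has a run of `inactive_window` consecutive
    inactive days somewhere within days 1..horizon_days.

    active_days: set of day numbers (1-based) on which the user had any event.
    """
    run = 0  # length of the current streak of inactive days
    for day in range(1, horizon_days + 1):
        if day in active_days:
            run = 0
        else:
            run += 1
            if run >= inactive_window:
                return True
    return False


if __name__ == "__main__":
    print(is_churned({1, 2, 3}))         # inactive from day 4 onward
    print(is_churned({1, 10, 20, 30}))   # longest gap is 9 days
```

A stricter variant could require the inactive run to extend to the end of 
the window, so that users who return after a long gap are not labeled as 
churned.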

If the data schema looks good, we will ingest the data from S3 into HBase and 
then train a Random Forest to classify new profile data.

Is this type of data a good candidate for HBase?
Opinions are highly appreciated.


BR

Sam
