Hi Sam,

So are you saying that you will have about 30 column families? If so, I
don't think it's a good idea.
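
To make the concern concrete, here is a sketch that contrasts the proposed layout (one column family per day, d1..d30, with the event type as the qualifier) with one common alternative: a single column family, with the day folded into the qualifier. It is a plain-Python illustration of (row key, family, qualifier) coordinates, not HBase client code and not a layout anyone in this thread has proposed. The family name `e`, the `dNN:eventType` qualifier format and the helper names are all hypothetical.

```python
from datetime import date, timedelta

NUM_DAYS = 30  # observation window d1..d30 from the proposal


def day_index(signup: date, event_day: date) -> int:
    """1-based day of the user's life in the system (d1 = signup day)."""
    d = (event_day - signup).days + 1
    if not 1 <= d <= NUM_DAYS:
        raise ValueError(f"day {d} is outside the {NUM_DAYS}-day window")
    return d


def cell_per_day_family(user_id: str, signup: date, event_day: date, event_type: str):
    """Proposed layout: one column family per day, qualifier = event type."""
    return (user_id, f"d{day_index(signup, event_day)}", event_type)


def cell_single_family(user_id: str, signup: date, event_day: date, event_type: str,
                       family: str = "e"):
    """Hypothetical alternative: one family, zero-padded day in the qualifier."""
    return (user_id, family, f"d{day_index(signup, event_day):02d}:{event_type}")


signup = date(2013, 11, 1)
window = [signup + timedelta(days=i) for i in range(NUM_DAYS)]

# Count how many distinct column families one user's 30 days of events touch.
families_per_day = {cell_per_day_family("u42", signup, d, "login")[1] for d in window}
families_single = {cell_single_family("u42", signup, d, "login")[1] for d in window}
print(len(families_per_day), len(families_single))  # 30 1
```

The zero-padding matters because HBase sorts qualifiers by their raw bytes, so without it `d10` would sort before `d2`.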

JM


2013/11/13 Sam Wu <[email protected]>

> Hi all,
>
> I am thinking about using Random Forest to do churn analysis, with HBase as
> the NoSQL data store.
> Currently, all of our user history (basically many types of event data)
> resides in S3 and Redshift, with one table per date per event. Events
> include startTime, endTime, and other pertinent information.
>
> We are thinking about converting all the event tables into one fat HBase
> table (plus some helper parameter tables) with one row per user.
>
> Each row will have the user ID as its key, with column families d1, d2,
> ..., d30 (days in the system) and the different event types as qualifiers.
> Since we are initially most interested in new-user retention, 30 days
> might be a good place to start.
>
> We can label a record as churned when the user has had no activity for
> 10 consecutive days.
>
> If the data schema looks good, we will ingest the data from S3 into HBase
> and then use Random Forest to classify new profile data.
>
> Is this type of data a good candidate for HBase?
> Opinions are highly appreciated.
>
>
> BR
>
> Sam
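
The churn rule in the message above (no activity for 10 consecutive days within the 30-day window) can be sketched as follows. This is a minimal sketch, assuming each user's activity has already been reduced to the set of active day indices 1..30 (for example, which dN days have any events). The function names and the `trailing_only` switch are hypothetical: the email does not say whether the 10-day gap may fall anywhere in the window or must be the final 10 days, so both readings are shown.

```python
WINDOW = 30  # days observed per new user (d1..d30)
GAP = 10     # consecutive inactive days that count as churn


def longest_inactive_run(active_days: set, window: int = WINDOW) -> int:
    """Length of the longest run of consecutive days with no activity."""
    longest = current = 0
    for day in range(1, window + 1):
        if day in active_days:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


def is_churned(active_days: set, window: int = WINDOW, gap: int = GAP,
               trailing_only: bool = False) -> bool:
    """Churn label for one user under the assumed rule."""
    if trailing_only:
        # Only the last `gap` days of the window must be inactive.
        return all(day not in active_days for day in range(window - gap + 1, window + 1))
    # Any run of `gap` or more inactive days anywhere in the window.
    return longest_inactive_run(active_days, window) >= gap


print(is_churned({1, 5, 10, 14, 19, 25, 30}))        # False (longest gap is 5 days)
print(is_churned({1, 12, 25}))                      # True  (days 13-24 are inactive)
print(is_churned({1, 12, 25}, trailing_only=True))  # False (day 25 is active)
```

The two readings disagree for a user active on days 1, 12 and 25, so it is worth fixing one definition before generating training labels.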
