I also think that essentially all of the power you are going to get from an HMM can be captured by other means, such as sparse event-sequence features. Sparse logistic regression or a large-scale random forest can work wonders on these features, provided the method has a good means of regularization (L1/lasso or elastic net for logistic regression; random forests get it from their native characteristics).
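To make "sparse event-sequence features" concrete, here is a minimal sketch in plain Python. It assumes each customer history is an ordered list of event names, turns it into unigram and bigram counts, and fits a logistic regression by SGD with an L1 soft-threshold step. The event names, the toy histories, and the helper names (`event_ngrams`, `train_l1_logreg`, `predict`) are all made up for illustration; this is not Mahout's API, and a real system would use a proper library and far more data.

```python
import math
from collections import Counter


def event_ngrams(events, max_n=2):
    """Turn an ordered list of event names into sparse n-gram counts."""
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(events) - n + 1):
            feats[">".join(events[i:i + n])] += 1
    return dict(feats)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, feats):
    """Churn probability for one sparse feature dict."""
    z = bias + sum(weights.get(k, 0.0) * v for k, v in feats.items())
    return sigmoid(z)


def train_l1_logreg(rows, labels, l1=0.01, lr=0.1, epochs=200):
    """SGD logistic regression with an L1 soft-threshold step.

    Simplification: shrinkage is applied only to features that are
    active in the current example, not to every weight on every step.
    """
    weights, bias = {}, 0.0
    shrink = lr * l1
    for _ in range(epochs):
        for feats, y in zip(rows, labels):
            err = predict(weights, bias, feats) - y
            bias -= lr * err  # the bias is left unregularized
            for k, v in feats.items():
                w = weights.get(k, 0.0) - lr * err * v
                # Soft-threshold: pull the weight toward zero, clip at zero.
                if w > shrink:
                    weights[k] = w - shrink
                elif w < -shrink:
                    weights[k] = w + shrink
                else:
                    weights.pop(k, None)  # exact zero keeps the model sparse
    return weights, bias


# Hypothetical toy histories: (ordered events, churned?)
histories = [
    (["login", "purchase", "login"], 0),
    (["login", "complaint", "escalation"], 1),
    (["purchase", "login", "purchase"], 0),
    (["complaint", "complaint", "escalation"], 1),
    (["login", "login", "purchase"], 0),
    (["complaint", "escalation", "letter"], 1),
]
rows = [event_ngrams(events) for events, _ in histories]
labels = [y for _, y in histories]

weights, bias = train_l1_logreg(rows, labels)
risky = predict(weights, bias, event_ngrams(["login", "complaint", "escalation"]))
stable = predict(weights, bias, event_ngrams(["login", "purchase"]))
print(f"risky={risky:.2f} stable={stable:.2f}")
```

The bigram features are what stand in for the HMM here: "complaint followed by escalation" becomes a single feature the linear model can weight directly, and the L1 penalty keeps the number of nonzero weights manageable as the n-gram vocabulary grows.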

On Wed, Jul 24, 2013 at 10:19 PM, Fernando Fernández <[email protected]> wrote:

> If you don't know where to start, I would recommend starting with something
> more conventional than HMM that can be tricky to fully understand and
> explain. A logistic regression model can perform very well if predictors
> are built with care. I wouldn't start also with mahout unless this is a
> requirement from a client (some clients are so thrilled about "big data"
> that they want to use mahout even if it's overkill for most predictive
> analytics tasks...), You will probably not need more than 100k-200k records
> to build a pretty good model, an undersampling scheme can also be good for
> the model (not necessary, but it won't hurt) and lead you that sample size
> anyway.
>
> If you need to go for mahout, there is an SGD implementation for logistic
> regression in mahout.
>
> The key point for building a good churn model though is in how you build
> predictor variables, then any binary classification model would do the
> trick.
>
>
> 2013/7/24 <[email protected]>
>
> > I've not used Mahout to do it, but in the past colleagues have used HMM
> > to create a way for discovering customers who are in an "about to churn"
> > state, this was used to populate a target list for winback intervention
> > (they're about to churn, contact them and offer something - or just
> > help - to keep them). I tried the Mahout HMM earlier in the year, but got
> > discouraged by some odd behaviour which I have still not managed to delve
> > into.
> >
> > The problem that we saw with churn analysis for our domain was that most
> > churners leave with no event on their account in the recent past.
> > Essentially there are external factors that are generating churn over the
> > whole population (competitor offers, demographics, economics) which mean
> > that the domain model is not accessible from the data. So, while a much
> > better than "random" predictor can be built it only barely costed in to
> > operate, and is sufficiently far from a conclusive knockdown winner to
> > allow homebrew.spreadsheet.witchcraft alternatives to pop up and be given
> > air time by people not familiar with the idea that if you flip 1000 coins
> > in the air at once some of them are going to keep coming up as heads for
> > a bit. One way round this is "more data, better data" which is kinda
> > where I came in on for Mahout and HMM's.
> >
> > So, my suggestion would be :
> >
> > - look at your data; do your churners have events in an actionable period
> >   (this depends on your domain) that could be the basis of a signal? If
> >   there are enough of them in this category to power a business case
> >   based on intervention and win back you're on... if not then more data,
> >   better data is needed..
> > - if there are strong correlations between the last event and the churn?
> >   Then use a decision tree or similar to classify churn prospects from
> >   stables - if you get a good predictor no need to do more, if not then..
> > - try a HMM, it could help you find groups of sequences of action that
> >   lead to churning (repeated contacts, escalations, resorting to letter
> >   writing etc.) But check that Mahouts one is sound and works for you (I
> >   am not confident that I did enough work to say that my problems weren't
> >   a case of "problem between screen and chair" so if you get things
> >   working then superduper!)
> >
> > Hope that helps you,
> >
> > Simon
> >
> > ________________________________________
> > From: Sayed Seliman [[email protected]]
> > Sent: 24 July 2013 21:37
> > To: [email protected]
> > Subject: churn analysis
> >
> > Hi,
> >
> > what are your experiences in building churn analysis system with mahout ?
> >
> > What do you suggest to implement ?
> >
> > Any success story implementing churn analysis system with mahout ?
> >
> > thanks
