If you don't know where to start, I would recommend something more conventional than an HMM, which can be tricky to fully understand and explain. A logistic regression model can perform very well if the predictors are built with care. I also wouldn't start with Mahout unless it is a requirement from a client (some clients are so thrilled about "big data" that they want to use Mahout even when it's overkill for most predictive analytics tasks). You will probably not need more than 100k-200k records to build a pretty good model. An undersampling scheme can also help the model (not necessary, but it won't hurt) and will bring you down to that sample size anyway.
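
To make the undersampling plus logistic regression idea concrete, here is a minimal sketch in plain Python (standard library only, not Mahout). Everything in it is an assumption for illustration: the two toy predictors, the synthetic data generator make_toy_data, the 1:1 sampling ratio, and the learning rate. A real model would use predictors built from your own account history.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_data(n, seed=0):
    # Hypothetical data: two predictors scaled to [0, 1], e.g. "recent
    # support contacts" and "months since last purchase". Not real data.
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        p_churn = sigmoid(-4.0 + 3.0 * x[0] + 2.0 * x[1])
        rows.append(x)
        labels.append(1 if rng.random() < p_churn else 0)
    return rows, labels


def undersample(rows, labels, ratio=1.0, seed=0):
    # Keep every churner (label 1) and a random subset of non-churners,
    # at most `ratio` non-churners per churner.
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = rng.sample(neg, min(len(neg), int(len(pos) * ratio)))
    idx = pos + keep_neg
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def train_sgd(rows, labels, epochs=30, lr=0.1, seed=0):
    # Plain stochastic gradient descent on the logistic loss,
    # one weight update per training record.
    rng = random.Random(seed)
    w = [0.0] * len(rows[0])
    b = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, rows[i])))
            err = p - labels[i]
            b -= lr * err
            w = [wj - lr * err * xj for wj, xj in zip(w, rows[i])]
    return w, b


def predict_proba(w, b, x):
    # Churn score for one customer. After undersampling, the scores are
    # inflated relative to the true churn rate; the ranking is what matters.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```

Typical use would be: rows, labels = make_toy_data(2000), then undersample(rows, labels), then train_sgd on the balanced sample, and finally rank customers by predict_proba to build the contact list.
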
If you do need to use Mahout, it includes an SGD implementation of logistic regression. The key to a good churn model, though, is how you build the predictor variables; once those are right, any binary classification model will do the trick.

2013/7/24 <[email protected]>

> I've not used Mahout to do it, but in the past colleagues have used HMM to
> create a way for discovering customers who are in an "about to churn"
> state, this was used to populate a target list for winback intervention
> (they're about to churn, contact them and offer something - or just help -
> to keep them). I tried the Mahout HMM earlier in the year, but got
> discouraged by some odd behaviour which I have still not managed to delve
> into.
>
> The problem that we saw with churn analysis for our domain was that most
> churners leave with no event on their account in the recent past.
> Essentially there are external factors that are generating churn over the
> whole population (competitor offers, demographics, economics) which mean
> that the domain model is not accessible from the data. So, while a much
> better than "random" predictor can be built it only barely costed in to
> operate, and is sufficiently far from a conclusive knockdown winner to
> allow homebrew.spreadsheet.witchcraft alternatives to pop up and be given
> air time by people not familiar with the idea that if you flip 1000 coins
> in the air at once some of them are going to keep coming up as heads for a
> bit. One way round this is "more data, better data" which is kinda where I
> came in on for Mahout and HMMs.
>
> So, my suggestion would be:
>
> - look at your data; do your churners have events in an actionable period
>   (this depends on your domain) that could be the basis of a signal? If there
>   are enough of them in this category to power a business case based on
>   intervention and win back you're on... if not then more data, better data
>   is needed..
> - are there strong correlations between the last event and the churn?
>   Then use a decision tree or similar to classify churn prospects from
>   stables - if you get a good predictor no need to do more, if not then..
> - try an HMM, it could help you find groups of sequences of action that
>   lead to churning (repeated contacts, escalations, resorting to letter
>   writing etc.) But check that Mahout's one is sound and works for you (I am
>   not confident that I did enough work to say that my problems weren't a case
>   of "problem between screen and chair" so if you get things working then
>   superduper!)
>
> Hope that helps you,
>
> Simon
>
> ________________________________________
> From: Sayed Seliman [[email protected]]
> Sent: 24 July 2013 21:37
> To: [email protected]
> Subject: churn analysis
>
> Hi,
>
> what are your experiences in building churn analysis system with mahout?
> What do you suggest to implement?
> Any success story implementing churn analysis system with mahout?
>
> thanks
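
The first check in Simon's list (do churners have any event in an actionable period before they leave?) can be measured directly before any modelling. Below is a minimal sketch in plain Python. The function name share_with_recent_event, the input shapes, and the 30-day window are all assumptions for illustration; the right window depends on your domain.

```python
from datetime import date, timedelta


def share_with_recent_event(churn_dates, events, window_days=30):
    """Fraction of churners with at least one account event in the
    `window_days` days before their churn date.

    churn_dates: dict mapping customer id -> churn date (hypothetical shape)
    events: iterable of (customer id, event date) pairs (hypothetical shape)
    """
    if not churn_dates:
        return 0.0
    window = timedelta(days=window_days)
    with_signal = set()
    for customer_id, when in events:
        churned = churn_dates.get(customer_id)
        # Ignore customers who did not churn and events outside the window.
        if churned is not None and churned - window <= when < churned:
            with_signal.add(customer_id)
    return len(with_signal) / len(churn_dates)


# Toy example: only customer "a" has an event shortly before churning.
toy_churn_dates = {
    "a": date(2013, 7, 1),
    "b": date(2013, 7, 10),
    "c": date(2013, 7, 20),
}
toy_events = [
    ("a", date(2013, 6, 20)),  # 11 days before churn: counts
    ("b", date(2013, 1, 5)),   # far too early: does not count
    ("d", date(2013, 7, 1)),   # not a churner: ignored
]
toy_share = share_with_recent_event(toy_churn_dates, toy_events)
```

If that share is small, an event-based model can only ever reach a small part of the churners, which is the "more data, better data" situation described above.
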
