Hi all,

I am exploring Mahout's SGD classifier and would like some feedback because I
think I didn't configure things properly.

I created an example app that trains an SGD classifier on the 'bank
marketing' dataset from UCI:
http://archive.ics.uci.edu/ml/datasets/Bank+Marketing

My app is at: https://github.com/frankscholten/mahout-sgd-bank-marketing

The app reads a CSV file of telephone calls, encodes the features into a
vector and tries to predict whether a customer answers yes to a business
proposal.

I do a few runs and measure accuracy, but I don't trust the results.
When I only use an intercept term as a feature I get around 88% accuracy,
and when I add all features it drops to around 85%. Is this perhaps because
the dataset is highly unbalanced? Most customers answer no. Or is the
classifier biased to predict 0 as the target code when it doesn't have any
data to go on?
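
To make the imbalance question concrete, here is a minimal sketch of the check
I have in mind. It is plain Java with no Mahout calls; the class name, method
names and toy labels are made up for illustration and are not taken from my
app. It computes the accuracy of always predicting the majority class, plus
confusion counts and recall for the 'yes' class, so a model can be compared on
more than raw accuracy.

```java
import java.util.Arrays;

public class BaselineCheck {

    // Accuracy obtained by always predicting the most frequent label (0 = no, 1 = yes).
    static double majorityBaselineAccuracy(int[] labels) {
        long positives = Arrays.stream(labels).filter(l -> l == 1).count();
        long negatives = labels.length - positives;
        return (double) Math.max(positives, negatives) / labels.length;
    }

    // Confusion counts in the order {TN, FP, FN, TP} for binary 0/1 labels.
    static int[] confusionCounts(int[] actual, int[] predicted) {
        int[] counts = new int[4];
        for (int i = 0; i < actual.length; i++) {
            counts[actual[i] * 2 + predicted[i]]++;
        }
        return counts;
    }

    // Fraction of correct predictions: (TN + TP) / total.
    static double accuracy(int[] c) {
        int total = c[0] + c[1] + c[2] + c[3];
        return total == 0 ? 0.0 : (double) (c[0] + c[3]) / total;
    }

    // Fraction of actual 'yes' rows that were predicted 'yes': TP / (TP + FN).
    static double recall(int[] c) {
        int actualPositives = c[2] + c[3];
        return actualPositives == 0 ? 0.0 : (double) c[3] / actualPositives;
    }

    public static void main(String[] args) {
        // Toy data only: 8 'no' and 2 'yes' answers, not the real dataset.
        int[] actual        = {0, 0, 0, 0, 0, 0, 0, 0, 1, 1};
        int[] interceptOnly = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; // always predicts 'no'
        int[] withFeatures  = {0, 0, 0, 0, 0, 0, 1, 1, 1, 0}; // finds one 'yes', two false alarms

        System.out.println("majority baseline accuracy: " + majorityBaselineAccuracy(actual));
        for (int[] predicted : new int[][] {interceptOnly, withFeatures}) {
            int[] c = confusionCounts(actual, predicted);
            System.out.println("accuracy=" + accuracy(c) + " recall(yes)=" + recall(c));
        }
    }
}
```

If the majority baseline on the real data comes out near the 88% I see with
the intercept only, that would suggest the intercept-only model is simply
predicting 'no' every time, and that recall on 'yes' is a more telling number
to compare than accuracy.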

Any other comments about my code or improvements I can make in the app are
welcome! :)

Cheers,

Frank
