How many training examples do you have? Sounds like you have very few. That is definitely not the sweet spot for on-line regression.
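For the SGD side of this discussion, here is a minimal sketch of on-line logistic regression in plain Python. It makes one gradient step per example and repeats that over shuffled passes. It is not Mahout's OnlineLogisticRegression API. The eight toy rows (loosely modeled on the Wikipedia height/weight/shoe-size example), the learning rate, the L2 penalty, the pass count, and the choice to standardize each feature before training are all assumptions for illustration. With so few rows, the learned weights carry little information no matter how many passes you make.

```python
import math
import random

# Illustrative toy data: (height in feet, weight in lbs, shoe size), label 1 = male, 0 = female.
ROWS = [
    ((6.00, 180, 12), 1), ((5.92, 190, 11), 1), ((5.58, 170, 12), 1), ((5.92, 165, 10), 1),
    ((5.00, 100, 6), 0), ((5.50, 150, 8), 0), ((5.42, 130, 7), 0), ((5.75, 150, 9), 0),
]

def standardize(rows):
    """Scale every feature to mean 0 and unit variance; return scaled rows plus the statistics."""
    n, dims = len(rows), len(rows[0][0])
    means = [sum(x[d] for x, _ in rows) / n for d in range(dims)]
    stds = [math.sqrt(sum((x[d] - means[d]) ** 2 for x, _ in rows) / n) or 1.0 for d in range(dims)]
    scaled = [([(x[d] - means[d]) / stds[d] for d in range(dims)], y) for x, y in rows]
    return scaled, means, stds

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_sgd(rows, passes=200, rate=0.1, l2=0.01, seed=42):
    """Logistic regression by plain SGD: one update per example, over several shuffled passes."""
    rng = random.Random(seed)
    w, b = [0.0] * len(rows[0][0]), 0.0
    order = list(range(len(rows)))
    for _ in range(passes):
        rng.shuffle(order)
        for i in order:
            x, y = rows[i]
            err = y - sigmoid(b + sum(wd * xd for wd, xd in zip(w, x)))
            # Step along the log-likelihood gradient, with a small L2 shrinkage on the weights.
            w = [wd + rate * (err * xd - l2 * wd) for wd, xd in zip(w, x)]
            b += rate * err
    return w, b

def predict(w, b, means, stds, raw):
    """Probability of label 1 for a raw (unscaled) row, using the training statistics."""
    x = [(v - m) / s for v, m, s in zip(raw, means, stds)]
    return sigmoid(b + sum(wd * xd for wd, xd in zip(w, x)))

if __name__ == "__main__":
    scaled, means, stds = standardize(ROWS)
    w, b = train_sgd(scaled)
    for raw, label in ROWS:
        print(raw, label, round(predict(w, b, means, stds, raw), 3))
```

Standardizing matters here because weight (in pounds) is two orders of magnitude larger than height (in feet); without it, a single learning rate is too large for one feature and too small for another.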
In any case, can you post your test code to GitHub or something?

On Mon, Jul 4, 2011 at 11:46 AM, Vijay Santhanam wrote:

> Thank you Ted
>
> However, even using the default OnlineLogisticRegression I'm unable to get acceptable results when trying to replicate the gender guesser discussed in the example at http://en.wikipedia.org/wiki/Naive_Bayes_classifier
>
> For that particular problem, do you recommend I take a binning/discretization approach with Naive Bayes? Or should I continue trying to fine-tune the SGD algorithm?
>
> At this stage, I'm just hopelessly guessing parameters for OnlineLogisticRegression. Even when I iterate over the same data set many thousands of times, I'm unable to get a suitable model that can pick female or male from height, weight and shoe size.
>
> Thanks again for taking the time to answer me.
>
> -V
>
>
> On Tue, Jul 5, 2011 at 4:30 AM, Ted Dunning wrote:
>
> > The Wikipedia page recommends binning if you have a large amount of data and a supervised variable extraction method if not. These are both ways of preprocessing to discretize continuous variables.
> >
> > On Mon, Jul 4, 2011 at 11:28 AM, Ted Dunning wrote:
> >
> > > The Mahout implementation of Naive Bayes does not use continuous variables well. The best bet is to discretize these variables, either individually or together using k-means, and then use the discrete version for the classifier.
> > >
> > > The random forest implementation and the SGD implementation are both happier with continuous variables.
> > >
> > >
> > > On Mon, Jul 4, 2011 at 8:01 AM, Vijay Santhanam wrote:
> > >
> > >> Hi,
> > >>
> > >> I'm new to Mahout and to many of the machine learning ideas, but from my understanding of the Wikipedia entry http://en.wikipedia.org/wiki/Naive_Bayes_classifier, it's possible to train a Naive Bayes model with continuous, categorical and word-like features.
> > >>
> > >> From what I gather, the 20news and Wikipedia examples currently in Mahout only use a categorical target variable and text-like variables.
> > >>
> > >> I'm trying to replicate the person-gender guesser from the Wikipedia article using Mahout.
> > >>
> > >> Can anyone give me any tips about how to:
> > >> * format input files (train and test) for different data types
> > >> * tell the trainer and classifier which features are continuous, categorical and word-like
> > >>
> > >> My dataset is quite small, so I'd like to be able to process this in code (using Vectors, Models, etc.), but I'm quite confused about how to use the classifier.bayes packages to train/create a model with all my feature types.
> > >>
> > >> Thanks in advance for any guidance.
> > >>
> > >> Cheers,
> > >> --
> > >> Vijay Santhanam
> > >> Software Engineer
> > >> http://au.linkedin.com/in/vijaysanthanam
> > >> 0407525087
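The first suggestion in this thread, discretizing each continuous variable individually and then training on the discrete values, can be sketched in plain Python. This is not Mahout's classifier.bayes API. Equal-width binning with three bins, add-one (Laplace) smoothing, and the eight toy rows (loosely modeled on the Wikipedia example) are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

# Illustrative toy data: (height in feet, weight in lbs, shoe size) -> label.
ROWS = [
    ((6.00, 180, 12), "male"), ((5.92, 190, 11), "male"),
    ((5.58, 170, 12), "male"), ((5.92, 165, 10), "male"),
    ((5.00, 100, 6), "female"), ((5.50, 150, 8), "female"),
    ((5.42, 130, 7), "female"), ((5.75, 150, 9), "female"),
]

def equal_width_edges(values, bins):
    """Interior cut points that split [min, max] into `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + step * k for k in range(1, bins)]

def to_bin(value, edges):
    """Bin index 0..len(edges); values outside the training range fall into the end bins."""
    return sum(1 for e in edges if value >= e)

def train(rows, bins=3, alpha=1.0):
    dims = len(rows[0][0])
    # Discretize each continuous feature on its own.
    edges = [equal_width_edges([x[d] for x, _ in rows], bins) for d in range(dims)]
    priors = Counter(y for _, y in rows)
    counts = defaultdict(Counter)  # (label, feature index) -> Counter of bin ids
    for x, y in rows:
        for d in range(dims):
            counts[(y, d)][to_bin(x[d], edges[d])] += 1
    return edges, priors, counts, bins, alpha

def classify(model, raw):
    edges, priors, counts, bins, alpha = model
    total = sum(priors.values())

    def log_score(y):
        s = math.log(priors[y] / total)
        for d, v in enumerate(raw):
            b = to_bin(v, edges[d])
            # Add-alpha smoothing so an unseen bin does not zero out the whole product.
            s += math.log((counts[(y, d)][b] + alpha) / (priors[y] + alpha * bins))
        return s

    return max(priors, key=log_score)

if __name__ == "__main__":
    model = train(ROWS)
    print(classify(model, (6.0, 130, 8)))
```

Equal-width bins are the simplest choice. With only a handful of rows, one borderline value can shift a bin boundary, which fits the Wikipedia advice quoted above that binning suits larger data sets.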

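The other option mentioned in the thread, discretizing the variables together with k-means, replaces the three continuous features with a single cluster id that a discrete classifier can use. The following sketch is plain Python, not Mahout's k-means job. The min-max scaling, the farthest-first seeding (used here instead of random seeding so results are deterministic), k = 2, add-one smoothing, and the toy rows are all assumptions for illustration.

```python
import math
from collections import Counter

# Illustrative toy data: (height in feet, weight in lbs, shoe size) -> label.
ROWS = [
    ((6.00, 180, 12), "male"), ((5.92, 190, 11), "male"),
    ((5.58, 170, 12), "male"), ((5.92, 165, 10), "male"),
    ((5.00, 100, 6), "female"), ((5.50, 150, 8), "female"),
    ((5.42, 130, 7), "female"), ((5.75, 150, 9), "female"),
]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    """Index of the closest center; this index is the discretized value of p."""
    return min(range(len(centers)), key=lambda c: dist2(p, centers[c]))

def fit_scaler(points):
    """Per-feature (min, span) so each feature maps onto [0, 1] and none dominates the distance."""
    return [(min(col), (max(col) - min(col)) or 1.0) for col in zip(*points)]

def scale(p, scaler):
    return [(v - lo) / span for v, (lo, span) in zip(p, scaler)]

def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then add the point farthest from all chosen centers."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(dist2(p, c) for c in centers))))
    return centers

def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate nearest-center assignment and mean updates until stable."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        assign = [nearest(p, centers) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            # An empty cluster keeps its previous center.
            new_centers.append([sum(col) / len(members) for col in zip(*members)] if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def fit(rows, k=2, alpha=1.0):
    raw = [x for x, _ in rows]
    scaler = fit_scaler(raw)
    centers = kmeans([scale(p, scaler) for p in raw], k)
    priors = Counter(y for _, y in rows)
    # Class-by-cluster counts: the cluster id is the only (discrete) feature.
    counts = Counter((y, nearest(scale(x, scaler), centers)) for x, y in rows)
    return scaler, centers, priors, counts, k, alpha

def classify(model, raw):
    scaler, centers, priors, counts, k, alpha = model
    cid = nearest(scale(raw, scaler), centers)
    total = sum(priors.values())

    def log_score(y):
        # log P(y) + log P(cluster | y), with add-alpha smoothing over the k clusters.
        return math.log(priors[y] / total) + math.log((counts[(y, cid)] + alpha) / (priors[y] + alpha * k))

    return max(priors, key=log_score)

if __name__ == "__main__":
    model = fit(ROWS)
    scaler, centers = model[0], model[1]
    for raw, label in ROWS:
        print(raw, label, "cluster", nearest(scale(raw, scaler), centers))
```

Clustering jointly lets correlated features such as weight and shoe size share one discrete code. The trade-off is that a point sitting between clusters is forced into one of them, so the choice of k matters far more than it does for per-feature bins.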