SGD vs Naive Bayes for classification

Loic Descotte Fri, 09 Sep 2011 08:44:35 -0700

Hello,

First mail for me on Mahout ML :)

I'm working on a classification problem and I'm trying to find out which algorithm would better suit my needs. I've read that SGD is better than Naive Bayes for small to medium datasets. Does that mean the learning (training) data may be small, or that it is meant for small data sets in general (or both)? Also, does "better" mean faster, or does it also give more accurate results than Naive Bayes on data sets of this size?

My goal is to make predictions on thousands of text entries, but with as little training data as possible (categories may change often, so I will not always have hundreds of training entries for each category).

Another question: in all the examples I've found, Naive Bayes is used to analyze sets containing a lot of keywords and to classify them into the right category (e.g. the Wikipedia examples: https://www.ibm.com/developerworks/java/library/j-mahout/#N10412 ).

The SGD examples are a little different: instead of working on word sequences, they use many predictor values, and each predictor has only one value for each entry.


E.g. (in Mahout in Action):

$MAHOUT_HOME/bin/mahout trainlogistic --input donut.csv \
  --output ./model \
  --target color --categories 2 \
  --predictors x y --types numeric \
  --features 20 --passes 100 --rate 50

In this example, the x and y predictors each have only one value per entry.

My need is more like the Naive Bayes Wikipedia examples: I want to analyse a text and automatically find its category. So I have only one predictor variable (the words of the text), and this predictor variable is multivalued (several words).
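To make the difference between the two kinds of entries concrete, here is a rough sketch in plain Python (not Mahout code). It assumes that a multivalued text predictor can be folded into a fixed number of numeric features by hashing each word into a bucket, similar in spirit to the --features 20 option in the command above. The function name hash_text_features, the bucket count, and the sample values are all made up for illustration, and whether Mahout handles text this way is an assumption, not something confirmed here.

```python
import zlib


def hash_text_features(text, num_features=20):
    """Fold the words of a text into a fixed-length vector of word counts.

    Hypothetical helper for illustration only; not part of Mahout.
    """
    vector = [0.0] * num_features
    for word in text.lower().split():
        # crc32 gives the same bucket on every run, unlike Python's hash()
        index = zlib.crc32(word.encode("utf-8")) % num_features
        vector[index] += 1.0
    return vector


# A donut.csv-style entry: each predictor holds exactly one value
# (the numbers are made up).
numeric_entry = {"x": 0.92, "y": 0.41}

# A text entry: a single predictor holding several words.
text_entry = "mahout makes machine learning scale"
features = hash_text_features(text_entry)

print(len(numeric_entry), "single-valued predictors")
print(len(features), "hashed features from one multivalued predictor")
```

With this kind of encoding, every text entry ends up as a vector of the same length no matter how many words it contains, which is the shape the numeric example above already has. Two different words can land in the same bucket, so the vector is a lossy summary of the text.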

Is it possible to use the SGD algorithm (maybe better for me because I have small datasets) with text-only entries (like blog posts)?

Thanks a lot for your time; tell me if my explanations are not clear enough :)


Loic
