I'm working on random text (tweets), which are just blobs of text, much like the newsgroups dataset.
I was stuck in the 60s as well and then tried playing with the parameters. What worked for me to get up into the upper 70s was to set the "-features" param higher (started at 20, moved up to 200 to get 76%).

Hope that helps. Playing with parameters is always an art in ML and can be time consuming.

JP

On Thu, Dec 22, 2011 at 1:46 AM, Sreejith S <[email protected]> wrote:

> On Thu, Dec 22, 2011 at 12:04 PM, Lance Norskog <[email protected]> wrote:
>
>> The Bayes in the examples doesn't work very well in the 20 newsgroups
>> example. Something is wrong in the data ETL, the tuning options, or
>> the Bayes implementation.
>>
>> On Wed, Dec 21, 2011 at 10:18 PM, Ted Dunning <[email protected]> wrote:
>> > 97% is not correct. This sounds like you ran it on the training data.
>
> @Ted , yes i ran it on the same training data.
>
>> > 63% also sounds low. I don't know what happened there.
>
> Is any one tested same 20newsgrop with SGD and got better results ?
>
>> > On Wed, Dec 21, 2011 at 9:26 PM, Sreejith S <[email protected]> wrote:
>> >
>> >> Hi all,
>> >>
>> >> I made a comparison between SGD and Bayes classifiers over 20news-bydate
>> >> dataset.
>> >> http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz
>> >>
>> >> The classifier results and confusion matrix seems a bit confused, since it
>> >> is said that SGD is better for small datasets and Bayes for large datasets.
>> >> Pls check my test scenario http://pastebin.com/K0cy0ayk
>> >>
>> >> It seems that even in small dataset like 20news-bydate Bayes gives 97 %
>> >> accuracy and SGD gives 63 % :(
>> >> Am i missing something?? Pls clarify.
>> >>
>> >> Thank You,
>> >> --
>> >> *Sreejith.S*
>> >> http://srijiths.wordpress.com/
>> >> http://sreejiths.emurse.com/
>> >> tweet2sree@twitter <http://tweet2Sree>
>>
>> --
>> Lance Norskog
>> [email protected]

--
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
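
P.S. A rough sketch of why raising "-features" might help, assuming the parameter sets the size of a hashed feature vector (the usual hashing trick). This is not the actual implementation: the function names, the made-up vocabulary, and the use of crc32 as the hash are all placeholders for illustration. With only 20 buckets, most distinct words are forced to share a slot, so the classifier cannot tell them apart; with 200 buckets there is more room.

```python
import zlib


def hashed_vector(tokens, num_features):
    """Map tokens into a fixed-size count vector (hashing trick sketch)."""
    vec = [0] * num_features
    for tok in tokens:
        # crc32 is a stand-in hash; it is deterministic across runs,
        # unlike Python's built-in hash() for strings.
        idx = zlib.crc32(tok.encode("utf-8")) % num_features
        vec[idx] += 1
    return vec


def collision_count(vocab, num_features):
    """Count distinct tokens that land in an already-occupied bucket."""
    buckets = {zlib.crc32(t.encode("utf-8")) % num_features for t in vocab}
    return len(vocab) - len(buckets)


# Hypothetical vocabulary of 100 distinct words, for illustration only.
vocab = [f"word{i}" for i in range(100)]

for n in (20, 200):
    print(f"features={n}: {collision_count(vocab, n)} of {len(vocab)} words collide")
```

The exact collision numbers depend on the hash and the vocabulary, so treat the printout as indicative only. The general point is that too few features can cap accuracy no matter how the other parameters are tuned.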
