The Bayes classifier in the examples doesn't work very well on the 20 newsgroups data. Something is wrong in the data ETL, the tuning options, or the Bayes implementation.
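
One quick sanity check for the point in the quoted reply (that 97% looks like a score on the training data) is to score the same model on both the training split and a held-out split. Here is a minimal sketch in plain Python. It is not Mahout's Bayes implementation; the toy documents and the helper names (train_nb, predict, accuracy) are made up for illustration only.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Fit a multinomial naive Bayes model on (label, tokens) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, tokens in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(model, tokens):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            # Unseen words get count 0, so smoothing gives them 1 / denom.
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, docs):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(1 for label, tokens in docs if predict(model, tokens) == label)
    return hits / len(docs)

# Hypothetical toy data standing in for the 20news-bydate train/test folders.
train = [
    ("sci.space", "orbit launch rocket nasa".split()),
    ("sci.space", "shuttle orbit moon".split()),
    ("rec.autos", "engine car dealer".split()),
    ("rec.autos", "car tires brake engine".split()),
]
test = [
    ("sci.space", "nasa moon mission".split()),
    ("rec.autos", "dealer price car".split()),
    ("rec.autos", "launch of the new model".split()),
]

model = train_nb(train)
train_acc = accuracy(model, train)   # scored on data the model has seen
test_acc = accuracy(model, test)     # scored on held-out data
print(f"train accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

If the number reported for Bayes matches the first figure rather than the second, the test run was probably pointed at the training directory.
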
On Wed, Dec 21, 2011 at 10:18 PM, Ted Dunning <[email protected]> wrote:
> 97% is not correct. This sounds like you ran it on the training data.
>
> 63% also sounds low. I don't know what happened there.
>
> On Wed, Dec 21, 2011 at 9:26 PM, Sreejith S <[email protected]> wrote:
>
>> Hi all,
>>
>> I made a comparison between SGD and Bayes classifiers over 20news-bydate
>> dataset.
>> http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz
>>
>> The classifier results and confusion matrix seems a bit confused, since it
>> is said that SGD is better for small datasets and Bayes for large datasets.
>> Pls check my test scenario http://pastebin.com/K0cy0ayk
>>
>> It seems that even in small dataset like 20news-bydate Bayes gives 97 %
>> accuracy and SGD gives 63 % :(
>> Am i missing something?? Pls clarify.
>>
>> Thank You,
>> --
>>
>>
>> *Sreejith.S*
>> http://srijiths.wordpress.com/
>> * *http://sreejiths.emurse.com/
>>
>> tweet2sree@twitter <http://tweet2Sree>
>>

--
Lance Norskog
[email protected]
