Thanks. This confirms my suspicions that the AdaptiveLogisticRegression has regressed somehow.
I am munching on the Pig interfaces right now and should get back to this before too long.

On Fri, Dec 30, 2011 at 10:18 AM, Josh Patterson <[email protected]> wrote:

> I'm on random text (tweets), which are just like blobs of text like
> the newsgroups dataset.
>
> I was stuck in the 60s as well and then tried playing with the
> parameters. What worked for me to get up into the upper 70s was to set
> the "-features" param higher (started at 20, moved up 200 to get 76%).
>
> Hope that helps, playing with parameters is always an art in ML, can
> be time consuming.
>
> JP
>
> On Thu, Dec 22, 2011 at 1:46 AM, Sreejith S <[email protected]> wrote:
>
> > On Thu, Dec 22, 2011 at 12:04 PM, Lance Norskog <[email protected]> wrote:
> >
> >> The Bayes in the examples doesn't work very well in the 20 newsgroups
> >> example. Something is wrong in the data ETL, the tuning options, or
> >> the Bayes implementation.
> >>
> >> On Wed, Dec 21, 2011 at 10:18 PM, Ted Dunning <[email protected]> wrote:
> >> > 97% is not correct. This sounds like you ran it on the training data.
> >
> > @Ted , yes i ran it on the same training data.
> >
> >> > 63% also sounds low. I don't know what happened there.
> >
> > Is any one tested same 20newsgrop with SGD and got better results ?
> >
> >> > On Wed, Dec 21, 2011 at 9:26 PM, Sreejith S <[email protected]> wrote:
> >> >
> >> >> Hi all,
> >> >>
> >> >> I made a comparison between SGD and Bayes classifiers over 20news-bydate
> >> >> dataset.
> >> >> http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz
> >> >>
> >> >> The classifier results and confusion matrix seems a bit confused, since it
> >> >> is said that SGD is better for small datasets and Bayes for large datasets.
> >> >> Pls check my test scenario http://pastebin.com/K0cy0ayk
> >> >>
> >> >> It seems that even in small dataset like 20news-bydate Bayes gives 97 %
> >> >> accuracy and SGD gives 63 % :(
> >> >> Am i missing something?? Pls clarify.
> >> >>
> >> >> Thank You,
> >> >> --
> >> >> *Sreejith.S*
> >> >> http://srijiths.wordpress.com/
> >> >> http://sreejiths.emurse.com/
> >> >> tweet2sree@twitter <http://tweet2Sree>
> >>
> >> --
> >> Lance Norskog
> >> [email protected]
>
> --
> Twitter: @jpatanooga
> Solution Architect @ Cloudera
> hadoop: http://www.cloudera.com
