I'm working with random text (tweets), which are just blobs of text,
much like the newsgroups dataset.

My accuracy was stuck in the 60s (percent) as well, so I tried playing
with the parameters. What got me into the upper 70s was setting the
"-features" param higher (started at 20, moved up to 200 to get 76%).
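
For what it's worth, my understanding (an assumption on my part, I
haven't checked the Mahout source) is that this param sets the size of
the hashed feature vector, so with only 20 slots most words end up
sharing a slot. Here is a rough Python sketch of that idea. It is not
Mahout code; the bucket and collision_rate helpers and the toy
vocabulary are made up for illustration:

```python
import hashlib


def bucket(token: str, num_features: int) -> int:
    """Map a token to a slot in a fixed-size feature vector (hashing trick)."""
    # md5 is used only to get a hash that is stable across runs.
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_features


def collision_rate(vocabulary, num_features):
    """Fraction of distinct tokens that land in an already occupied slot."""
    seen = set()
    collisions = 0
    for token in vocabulary:
        slot = bucket(token, num_features)
        if slot in seen:
            collisions += 1
        seen.add(slot)
    return collisions / len(vocabulary)


# Hypothetical vocabulary of 150 distinct tokens, standing in for tweet words.
vocabulary = [f"word{i}" for i in range(150)]

for n in (20, 200):
    print(n, round(collision_rate(vocabulary, n), 2))
```

With 150 distinct tokens, 20 slots force at least 130 of them to share
a slot, while 200 slots leave room for far fewer collisions, which
would fit with the jump in accuracy I saw.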

Hope that helps. Playing with parameters is always an art in ML, and it
can be time consuming.

JP

On Thu, Dec 22, 2011 at 1:46 AM, Sreejith S <[email protected]> wrote:
> On Thu, Dec 22, 2011 at 12:04 PM, Lance Norskog <[email protected]> wrote:
>
>> The Bayes classifier doesn't work very well in the 20 newsgroups
>> example. Something is wrong in the data ETL, the tuning options, or
>> the Bayes implementation.
>>
>> On Wed, Dec 21, 2011 at 10:18 PM, Ted Dunning <[email protected]> wrote:
>> > 97% is not correct.  This sounds like you ran it on the training data.
>>
>
> @Ted, yes, I ran it on the same training data.
>
>
>> >
>> > 63% also sounds low.  I don't know what happened there.
>>
>
> Has anyone tested the same 20 newsgroups dataset with SGD and gotten better results?
>
>> >
>> > On Wed, Dec 21, 2011 at 9:26 PM, Sreejith S <[email protected]> wrote:
>> >
>> >> Hi all,
>> >>
>> >> I made a comparison between the SGD and Bayes classifiers on the
>> >> 20news-bydate dataset.
>> >> http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz
>> >>
>> >> The classifier results and confusion matrix seem a bit confusing, since
>> >> it is said that SGD is better for small datasets and Bayes for large
>> >> datasets.
>> >> Please check my test scenario: http://pastebin.com/K0cy0ayk
>> >>
>> >> It seems that even on a small dataset like 20news-bydate, Bayes gives 97%
>> >> accuracy and SGD gives 63% :(
>> >> Am I missing something? Please clarify.
>> >>
>> >> Thank You,
>> >> --
>> >>
>> >>
>> >> *Sreejith.S*
>> >> http://srijiths.wordpress.com/
>> >> * *http://sreejiths.emurse.com/
>> >>
>> >> tweet2sree@twitter <http://tweet2Sree>
>> >>
>>
>>
>>
>> --
>> Lance Norskog
>> [email protected]
>>
>
>
>



-- 
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
