None of this changes the fact that you have tiny data.

On Sat, Jul 7, 2012 at 10:54 PM, Alexander Aristov <
[email protected]> wrote:

> Hi,
>
> I invoke classes directly from java. here is the sequence
>
> 1. create sequence files from my source (wrote this part myself)
>
> 2. ToolRunner.run(new SparseVectorsFromSequenceFiles(), params.toArray(a));
> create vectors
>
> 3. ToolRunner.run(new Configuration(), new SplitInput(),
> params.toArray(a));
> split to train data and holdout set
>
> 4. ToolRunner.run(new Configuration(), new TrainNaiveBayesJob(),
> params.toArray(a));
> train
>
> 5. ToolRunner.run(new Configuration(), new TestNaiveBayesDriver(),
> params.toArray(a));
> self test and then I repeat this command with holdout set
>
>
> I use the parameters from the 20newsgroups classification example everywhere.
>
>
> Best Regards
> Alexander Aristov
>
>
> On 8 July 2012 01:40, Robin Anil <[email protected]> wrote:
>
> > Can you list the command lines used?
> > On Jul 7, 2012 3:48 PM, "Alexander Aristov" <[email protected]> wrote:
> >
> > > People,
> > >
> > > I am implementing a Naive Bayes classifier on my text data and getting
> > > poor results.
> > >
> > > Self-testing on the training data gives 95% positive and 5% negative
> > > results (not bad), but testing on the holdout set gives 60-40%, which
> > > is not good for me.
> > >
> > > I tried to play with the vectorizer arguments, but setting them
> > > randomly only makes the results worse. I have 7 categories and about
> > > 20-90 docs per category.
> > >
> > > What can you suggest to improve the results? I tried the complementary
> > > NB algorithm, but it gives approximately the same results.
> > >
> > > I use the Mahout trunk version (0.8).
> > >
> > > Best Regards
> > > Alexander Aristov
> > >
> >
>
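For reference, the five steps Alexander lists could be assembled into a single driver class roughly like the sketch below. This is a hypothetical reconstruction, not his actual code: the input/output paths and the option strings are placeholders modeled on the 20newsgroups example, and the real parameter lists would come from his `params` collection.

```java
// Sketch of the Naive Bayes pipeline described in the thread.
// All paths and options are illustrative placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver;
import org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob;
import org.apache.mahout.utils.SplitInput;
import org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles;

public class NaiveBayesPipeline {
  public static void main(String[] args) throws Exception {
    // Step 2: turn sequence files (created in step 1) into TF-IDF vectors.
    ToolRunner.run(new SparseVectorsFromSequenceFiles(), new String[] {
        "-i", "seqfiles", "-o", "vectors", "-lnorm", "-nv", "-wt", "tfidf"});

    // Step 3: split the vectors into a training set and a holdout set.
    ToolRunner.run(new Configuration(), new SplitInput(), new String[] {
        "-i", "vectors/tfidf-vectors",
        "--trainingOutput", "train", "--testOutput", "holdout",
        "--randomSelectionPct", "20", "--overwrite",
        "--sequenceFiles", "-xm", "sequential"});

    // Step 4: train the Naive Bayes model on the training split.
    ToolRunner.run(new Configuration(), new TrainNaiveBayesJob(), new String[] {
        "-i", "train", "-o", "model", "-li", "labelindex", "-ow"});

    // Step 5: evaluate the model; run once against the training split
    // (self-test) and once against the holdout split.
    ToolRunner.run(new Configuration(), new TestNaiveBayesDriver(), new String[] {
        "-i", "holdout", "-m", "model", "-l", "labelindex",
        "-ow", "-o", "results"});
  }
}
```

Running this requires the Mahout 0.8 and Hadoop jars on the classpath; it is offered only to make the sequence of ToolRunner calls concrete.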
