Hi,

I invoke the classes directly from Java. Here is the sequence:

1. Create sequence files from my source (I wrote this part myself).

2. ToolRunner.run(new SparseVectorsFromSequenceFiles(), params.toArray(a));
Create the vectors.

3. ToolRunner.run(new Configuration(), new SplitInput(), params.toArray(a));
Split into training data and a holdout set.

4. ToolRunner.run(new Configuration(), new TrainNaiveBayesJob(),
params.toArray(a));
train

5. ToolRunner.run(new Configuration(), new TestNaiveBayesDriver(),
params.toArray(a));
Self-test on the training data; then I repeat this command with the holdout set.


I use the parameters from the 20newsgroups classification example everywhere.
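
A minimal sketch of how the argument arrays for steps 2 to 5 could be assembled in plain Java. The flags are recalled from the 20newsgroups example script and may differ between Mahout versions, so treat them as assumptions. The class name, the work directory and all paths are hypothetical. Each array would be passed as the last argument of the corresponding ToolRunner.run(...) call above.

```java
public class NaiveBayesPipelineArgs {

    // Hypothetical working directory; not taken from the original message.
    static final String WORK_DIR = "/tmp/nb-work";

    // Step 2: arguments for SparseVectorsFromSequenceFiles (TF-IDF vectors).
    // Flags assumed from the 20newsgroups example script.
    public static String[] vectorizeArgs() {
        return new String[] {
            "-i", WORK_DIR + "/seq",
            "-o", WORK_DIR + "/vectors",
            "-lnorm", "-nv", "-wt", "tfidf"
        };
    }

    // Step 3: arguments for SplitInput; holdoutPercent is the share of
    // documents that goes to the holdout (test) set.
    public static String[] splitArgs(int holdoutPercent) {
        return new String[] {
            "-i", WORK_DIR + "/vectors/tfidf-vectors",
            "--trainingOutput", WORK_DIR + "/train-vectors",
            "--testOutput", WORK_DIR + "/test-vectors",
            "--randomSelectionPct", Integer.toString(holdoutPercent),
            "--overwrite", "--sequenceFiles", "-xm", "sequential"
        };
    }

    // Step 4: arguments for TrainNaiveBayesJob.
    public static String[] trainArgs() {
        return new String[] {
            "-i", WORK_DIR + "/train-vectors",
            "-el",
            "-o", WORK_DIR + "/model",
            "-li", WORK_DIR + "/labelindex",
            "-ow"
        };
    }

    // Step 5: arguments for TestNaiveBayesDriver. Pass the training vectors
    // for the self-test and the holdout vectors for the second run.
    public static String[] testArgs(String vectorsDir) {
        return new String[] {
            "-i", vectorsDir,
            "-m", WORK_DIR + "/model",
            "-l", WORK_DIR + "/labelindex",
            "-ow",
            "-o", WORK_DIR + "/testing"
        };
    }

    public static void main(String[] args) {
        // Print the argument lists so they can be compared with the
        // command lines in the example script.
        System.out.println(String.join(" ", vectorizeArgs()));
        System.out.println(String.join(" ", splitArgs(40)));
        System.out.println(String.join(" ", trainArgs()));
        System.out.println(String.join(" ", testArgs(WORK_DIR + "/train-vectors")));
        System.out.println(String.join(" ", testArgs(WORK_DIR + "/test-vectors")));
    }
}
```

If the arguments are kept in a List<String> instead, params.toArray(new String[0]) returns an array of exactly the list's size, which avoids a trailing null element when the array passed in is longer than the list.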


Best Regards
Alexander Aristov


On 8 July 2012 01:40, Robin Anil <[email protected]> wrote:

> Can you list the command lines you used?
> On Jul 7, 2012 3:48 PM, "Alexander Aristov" <[email protected]>
> wrote:
>
> > People,
> >
> > I am implementing Naive Bayes classifier on my text data and get poor
> > results.
> >
> > Self-testing on the training data gives 95% pos and 5% neg results (not bad).
> > But testing on the holdout set gives 60-40%, which is not good enough for me.
> >
> > I tried to play with the vectorizer arguments, but setting them randomly
> > only makes the results worse. I have 7 categories and about 20-90 docs
> > per category.
> >
> > What can you suggest to improve the results? I tried the complementary NB
> > algorithm, but it gives approximately the same results.
> >
> > I use Mahout trunk (version 0.8).
> >
> > Best Regards
> > Alexander Aristov
> >
>
