None of this changes the fact that you have tiny data.

On Sat, Jul 7, 2012 at 10:54 PM, Alexander Aristov <[email protected]> wrote:

> Hi,
>
> I invoke the classes directly from Java. Here is the sequence:
>
> 1. Create sequence files from my source (I wrote this part myself).
> 2. ToolRunner.run(new SparseVectorsFromSequenceFiles(), params.toArray(a)); (create vectors)
> 3. ToolRunner.run(new Configuration(), new SplitInput(), params.toArray(a)); (split into training data and a holdout set)
> 4. ToolRunner.run(new Configuration(), new TrainNaiveBayesJob(), params.toArray(a)); (train)
> 5. ToolRunner.run(new Configuration(), new TestNaiveBayesDriver(), params.toArray(a)); (self-test, then repeat this command with the holdout set)
>
> I use the parameters from the 20newsgroups classification example everywhere.
>
> Best Regards
> Alexander Aristov
>
> On 8 July 2012 01:40, Robin Anil <[email protected]> wrote:
>
> > Can you list the command line you used?
> > On Jul 7, 2012 3:48 PM, "Alexander Aristov" <[email protected]> wrote:
> >
> > > People,
> > >
> > > I am implementing a Naive Bayes classifier on my text data and getting
> > > poor results.
> > >
> > > Self-testing on the training data gives 95% positive and 5% negative
> > > results (not bad), but testing on the holdout set gives about 60/40%,
> > > which is not good enough for me.
> > >
> > > I tried to play with the vectorizer arguments, but setting them randomly
> > > only makes the results worse. I have 7 categories and about 20-90 docs
> > > per category.
> > >
> > > What can you suggest I do to improve the results? I tried the
> > > complementary NB algorithm, but it gives approximately the same results.
> > >
> > > I use the Mahout trunk version, 0.8.
> > >
> > > Best Regards
> > > Alexander Aristov
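The "tiny data" diagnosis above (7 categories, only 20-90 docs each) can be illustrated with a minimal, self-contained sketch of a Laplace-smoothed multinomial Naive Bayes. This is a toy illustration of the algorithm's mechanics, not Mahout's TrainNaiveBayesJob; the class and corpus here are invented for the example:

```java
import java.util.*;

// Toy multinomial Naive Bayes with Laplace (add-one) smoothing.
// With very few docs per class, the model effectively memorizes the
// training vocabulary; holdout docs full of unseen words fall back to
// near-uniform smoothed probabilities, so holdout accuracy drops sharply
// even when self-testing looks excellent.
public class TinyNB {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocab = new HashSet<>();
    private int totalDocs = 0;

    public void train(String label, String doc) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : doc.toLowerCase().split("\\s+")) {
            counts.merge(w, 1, Integer::sum);
            vocab.add(w);
        }
    }

    public String classify(String doc) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // log prior: P(label)
            double score = Math.log(docCounts.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = counts.values().stream().mapToInt(Integer::intValue).sum();
            for (String w : doc.toLowerCase().split("\\s+")) {
                // Laplace smoothing over the full vocabulary
                double p = (counts.getOrDefault(w, 0) + 1.0)
                         / (total + vocab.size());
                score += Math.log(p);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TinyNB nb = new TinyNB();
        nb.train("sports", "goal match team win");
        nb.train("sports", "team score goal league");
        nb.train("tech", "java code compile build");
        nb.train("tech", "code bug build release");
        System.out.println(nb.classify("team goal"));  // prints "sports"
        System.out.println(nb.classify("java build")); // prints "tech"
    }
}
```

The fix for the gap is usually more data per class (or simpler features, e.g. pruning rare terms in the vectorizer), not different smoothing, since smoothing only softens, never removes, the memorization.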
