Hi Tharindu,

If I understand correctly, seqdirectory creates labels based on the file name, but this is not what you want. What do you want the labels to be?

Cheers,
Frank

On Tue, Mar 18, 2014 at 2:22 PM, Tharindu Rusira <[email protected]> wrote:

> Hi everyone,
> I'm developing an application where I need to train a Naive Bayes
> classification model and use this model to classify new entities (in this
> case, text files based on their content).
>
> I observed that the seqdirectory command always adds the file/directory
> name as the "key" field for each document, which will be used as the label
> in classification jobs.
> This makes sense when I need to train a model and create the labelindex,
> since I have organized my training data according to their labels in
> separate directories.
>
> Now I'm trying to use this model to infer the best label for an unknown
> document.
> My requirement is to ask Mahout to read my new file and output the
> predicted category by looking at the labelindex and the tfidf vector of the
> new content.
> I tried creating vectors from the new content (seqdirectory and
> seq2sparse), and then using these vectors to run the testnb command. But
> unfortunately the seqdirectory command adds file names as labels, which
> does not make sense in classification.
>
> The following error message further demonstrates this behavior.
> input0.txt is the file name of my new document.
>
> [main] ERROR com.me.classifier.mahout.MahoutClassifier - Error while
> classifying documents
> java.lang.IllegalArgumentException: Label not found: input0.txt
>     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:125)
>     at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:182)
>     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:205)
>     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:209)
>     at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:173)
>     at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:70)
>     at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.analyzeResults(TestNaiveBayesDriver.java:160)
>     at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.run(TestNaiveBayesDriver.java:125)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.main(TestNaiveBayesDriver.java:66)
>
> So how can I achieve what I'm trying to do here?
>
> Thanks,
>
> --
> M.P. Tharindu Rusira Kumara
> Department of Computer Science and Engineering,
> University of Moratuwa,
> Sri Lanka.
> +94757033733
> www.tharindu-rusira.blogspot.com
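
A note on the trace above: the exception is thrown from ConfusionMatrix, reached through TestNaiveBayesDriver.analyzeResults. So the failure appears to come from the evaluation step, which expects each document's key to be one of the trained labels. It does not appear to come from scoring the document itself. The sketch below is a toy stand-in written only to illustrate that failure mode. It is not Mahout code, and the class names, labels, and scores are all made up. It shows that a matrix indexed by the trained labels rejects "input0.txt" as a "correct" label, while picking the best-scoring label needs no correct label at all.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UnknownLabelSketch {

    /** Toy confusion matrix (hypothetical, not Mahout's): knows only the trained labels. */
    public static final class ToyConfusionMatrix {
        private final Map<String, Integer> labelIndex = new LinkedHashMap<>();
        private final int[][] counts;

        public ToyConfusionMatrix(List<String> labels) {
            for (String label : labels) {
                labelIndex.put(label, labelIndex.size());
            }
            counts = new int[labels.size()][labels.size()];
        }

        /** Records one (correct, predicted) pair; both must be trained labels. */
        public void addInstance(String correctLabel, String predictedLabel) {
            counts[indexOf(correctLabel)][indexOf(predictedLabel)]++;
        }

        private int indexOf(String label) {
            Integer index = labelIndex.get(label);
            if (index == null) {
                throw new IllegalArgumentException("Label not found: " + label);
            }
            return index;
        }
    }

    /** Returns the label with the highest score, or null if the map is empty. */
    public static String bestLabel(Map<String, Double> scores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up labels and per-label scores for one unlabeled document.
        ToyConfusionMatrix matrix = new ToyConfusionMatrix(Arrays.asList("sports", "politics"));
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("sports", -12.5);
        scores.put("politics", -9.1);

        // Prediction only needs the scores.
        String predicted = bestLabel(scores);
        System.out.println("Predicted: " + predicted);

        // Evaluation needs a known "correct" label; a file name is not one.
        try {
            matrix.addInstance("input0.txt", predicted);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this reading of the trace is right, then for an unlabeled document the evaluation step is the part to avoid, and the predicted category can be taken straight from the per-label scores.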
