Hi Divya,

I am a bit overwhelmed by the flurry of emails from you and the replies to them, and I currently cannot make head or tail of the problem you are facing. It would be really helpful if you could write a bit more about the input files, the commands you ran, the output files generated, their sizes, and so on, and perhaps use a single email thread for all Bayes classifier related problems. I guarantee you I will then be able to solve your issues with the Bayes classifier much faster.
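For the sizes, a rough sketch along these lines might be enough. It is only an illustration: report_sizes is a hypothetical helper, not something shipped with Mahout, the directory names are taken from the commands in your earlier mail, and BASE is my assumption about where you run it from (override it if your layout differs).

```shell
#!/bin/sh
# Hypothetical helper, not part of Mahout: print the size of each named
# entry under a base directory, or flag it as missing.
report_sizes() {
  base="$1"
  shift
  for name in "$@"; do
    if [ -e "$base/$name" ]; then
      # du -sk prints the total size in kilobytes, followed by the path
      du -sk "$base/$name"
    else
      echo "MISSING $base/$name"
    fi
  done
}

# Assumption: run from the Mahout directory used in the quoted commands.
BASE="${BASE:-examples/bin/work/wikipedia/wikipediaClassification}"
report_sizes "$BASE" Traininput Testinput train-subject test-subject wikipedia-subject-model
```

Pasting that output into the thread, together with the exact commands, would show at a glance whether any step produced an empty or missing directory.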
Regards
Robin

On Mon, Nov 29, 2010 at 12:54 PM, Divya <[email protected]> wrote:
> Hi,
>
> Steps I followed are below :
>
> $ bin/mahout wikipediaDataSetCreator -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Traininput -o examples/bin/work/wikipedia/wikipediaClassification/train-subject -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
>
> $ bin/mahout wikipediaDataSetCreator -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Testinput -o examples/bin/work/wikipedia/wikipediaClassification/test-subject -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
>
> $ bin/mahout trainclassifier -i examples/bin/work/wikipedia/wikipediaClassification/train-subject -o examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model
>
> $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
>
> Regards,
> Divya
>
> -----Original Message-----
> From: Grant Ingersoll [mailto:[email protected]]
> Sent: Saturday, November 27, 2010 8:54 PM
> To: [email protected]
> Subject: Re: NPE in bayes wiki example
>
> Can you provide all the steps you have done up to this point?
>
> -Grant
>
> On Nov 25, 2010, at 12:57 AM, Divya wrote:
>
> > Hi,
> >
> > I am getting null pointer exception when I pass my test input data to testclassifier
> >
> > $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
> >
> > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
> > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
> > 10/11/25 13:51:36 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/test-subject}
> > 10/11/25 13:51:36 INFO bayes.TestClassifier: Testing Bayes Classifier
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: 8.048212844092422
> > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
> > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
> > 10/11/25 13:51:39 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
> > Exception in thread "main" java.lang.NullPointerException
> >     at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:102)
> >     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:118)
> >     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:122)
> >     at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:90)
> >     at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:68)
> >     at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:266)
> >     at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:186)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> >
> > My classifier is subjects.txt which has two entries History and Science.
> >
> > but when I pass train input data I get to see the results
> >
> > $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/train-subject
> >
> > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
> > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
> > 10/11/25 13:51:54 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/train-subject}
> > 10/11/25 13:51:54 INFO bayes.TestClassifier: Testing Bayes Classifier
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: 8.048212844092422
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
> > 10/11/25 13:51:55 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
> > 10/11/25 13:51:55 INFO bayes.TestClassifier: Classified instances from part-r-00000
> > 10/11/25 13:51:55 INFO bayes.TestClassifier:
> > =======================================================
> > Summary
> > -------------------------------------------------------
> > Correctly Classified Instances : 2 100%
> > Incorrectly Classified Instances : 0 0%
> > Total Classified Instances : 2
> >
> > =======================================================
> > Confusion Matrix
> > -------------------------------------------------------
> > a <--Classified as
> > 2 | 2 a = history
> > Default Category: unknown: 1
> >
> > 10/11/25 13:51:55 INFO driver.MahoutDriver: Program took 953 ms
> >
> > Can someone please explain the reason behind it.
> >
> > Thanks
> >
> > Regards,
> > Divya
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem docs using Solr/Lucene:
> http://www.lucidimagination.com/search
