Did you try what I mentioned?

On Tue, Nov 30, 2010 at 8:11 AM, Robin Anil <[email protected]> wrote:
> On Tue, Nov 30, 2010 at 7:47 AM, Divya <[email protected]> wrote:
>
>> Hi,
>>
>> Thanks for the advice, Robin.
>> Most of the time I don't get a response to the issues I am facing, which is
>> why I rephrase them and post them again.
>>
> The responses are usually delayed based on the availability of free time for
> all of us. The Mahout community is made up of people who contribute as much
> as they can when they find time, since it is not part of our day-to-day work.
> So in the time we get, unless we see the details of the problem, we can't do
> anything other than ask you again for details, and this round trip keeps the
> conversation going. I can point you to tutorials (I read through them myself
> before hacking away on Mahout) like this one:
> http://www.catb.org/~esr/faqs/smart-questions.html
> It will help you understand a bit more about why people on mailing lists
> behave the way you may have perceived.
>
>> Maybe someone can understand my problem and will be able to help me.
>> I am a newbie to Mahout and don't have any experience in this field.
>>
> We do want more newbies coming in to Mahout :)
>
>> I am trying to run the Wikipedia classification example.
>> I downloaded the Wikipedia data set and created chunks of that data (1 MB
>> each). I am using one of the chunk files as my input data for the Wikipedia
>> example.
>>
>> Steps I followed:
>> 1. Created the training input data set from one chunk of the Wikipedia data
>>    set and subjects.txt, using the wikipediaDataSetCreator CLI.
>> 2. Repeated the first step, but used another chunk of the Wikipedia data set
>>    to create the test input data.
>> 3. Trained the classifier on the training input data set.
>> 4. Tested the classifier, passing the trained model as the model directory
>>    and the test input data set as the test directory.
>>
>> Now, the issue: when I run testclassifier with the trained model as the
>> model and the training input data set as the test directory, I can view the
>> result as a confusion matrix. But when I run testclassifier with the trained
>> model as the model and the test input data set (created in step 2) as the
>> test directory, I get the NullPointerException shown in the mail below.
>>
> Now I get what you are talking about. Can you do one thing: train a model
> using the test input data set and try to classify that same test data set?
> I want to check whether there is any corruption in the test data set that is
> causing this NPE. (See the command sketch below this quoted exchange.)
>
>> Name                                                          Size
>> Initial Train input data set                                  2 MB (two chunks)
>> Initial Test input data set                                   1 MB (one chunk)
>> Train data set after wikipediaDataSetCreator (part-r-00000)   154 KB
>> Test data set after wikipediaDataSetCreator (part-r-00000)    43 KB
>> Train model data set (trainer-thetaNormalizer)                1 KB
>> Train model data set (trainer-tfIdf)                          311 KB
>> Train model data set (trainer-weights\Sigma_j)                215 KB
>> Train model data set (trainer-weights\Sigma_kSigma_j)         1 KB
>> Train model data set (trainer-weights\Sigma_k)                1 KB
>>
> The model sizes look fine. In fact, model loading didn't seem to have any
> issue, going by the logs you posted.
>
>> I hope I will get a solution to my issue now.
>>
>> Thanks much,
>> Regards,
>> Divya
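A minimal sketch of the check Robin suggests above, assuming the same Mahout 0.4
CLI flags that appear elsewhere in this thread; the wikipedia-test-model output
path is a hypothetical name chosen only for illustration:

# 1. Train a throwaway model on the *test* data set.
$ bin/mahout trainclassifier \
    -i examples/bin/work/wikipedia/wikipediaClassification/test-subject \
    -o examples/bin/work/wikipedia/wikipediaClassification/wikipedia-test-model

# 2. Classify that same test data set against the throwaway model.
$ bin/mahout testclassifier \
    -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-test-model \
    -d examples/bin/work/wikipedia/wikipediaClassification/test-subject

If this run also ends in the NullPointerException, that points at the prepared
test data itself; if it succeeds, the test data is probably fine and the problem
more likely lies in the combination of the original model and the test data.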
>>
>> -----Original Message-----
>> From: Robin Anil [mailto:[email protected]]
>> Sent: Monday, November 29, 2010 7:34 PM
>> To: [email protected]
>> Subject: Re: NPE in bayes wiki example
>>
>> Hi Divya, I am kind of overwhelmed by the flurry of emails from you and the
>> replies. I am currently not able to make head or tail of the problem you are
>> facing. It would be really helpful if you could write a bit more about the
>> input files, the commands you ran, the output files generated, their sizes,
>> and so on, and maybe use a single email thread for all Bayes-classifier-related
>> problems. I guarantee I will be able to solve your issues with the Bayes
>> classifier much faster.
>>
>> Regards,
>> Robin
>>
>> On Mon, Nov 29, 2010 at 12:54 PM, Divya <[email protected]> wrote:
>>
>> > Hi,
>> >
>> > Steps I followed are below:
>> >
>> > $ bin/mahout wikipediaDataSetCreator -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Traininput -o examples/bin/work/wikipedia/wikipediaClassification/train-subject -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
>> >
>> > $ bin/mahout wikipediaDataSetCreator -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Testinput -o examples/bin/work/wikipedia/wikipediaClassification/test-subject -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
>> >
>> > $ bin/mahout trainclassifier -i examples/bin/work/wikipedia/wikipediaClassification/train-subject -o examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model
>> >
>> > $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
>> >
>> > Regards,
>> > Divya
>> >
>> >
>> > -----Original Message-----
>> > From: Grant Ingersoll [mailto:[email protected]]
>> > Sent: Saturday, November 27, 2010 8:54 PM
>> > To: [email protected]
>> > Subject: Re: NPE in bayes wiki example
>> >
>> > Can you provide all the steps you have done up to this point?
>> >
>> > -Grant
>> >
>> > On Nov 25, 2010, at 12:57 AM, Divya wrote:
>> >
>> > > Hi,
>> > >
>> > > I am getting a NullPointerException when I pass my test input data to
>> > > testclassifier:
>> > >
>> > > $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
>> > >
>> > > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
>> > > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
>> > > 10/11/25 13:51:36 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/test-subject}
>> > > 10/11/25 13:51:36 INFO bayes.TestClassifier: Testing Bayes Classifier
>> > > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
>> > > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
>> > > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
>> > > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: 8.048212844092422
>> > > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
>> > > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
>> > > 10/11/25 13:51:39 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
>> > >
>> > > Exception in thread "main" java.lang.NullPointerException
>> > >     at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:102)
>> > >     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:118)
>> > >     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:122)
>> > >     at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:90)
>> > >     at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:68)
>> > >     at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:266)
>> > >     at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:186)
>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > >     at java.lang.reflect.Method.invoke(Method.java:597)
>> > >     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>> > >     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>> > >     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > >     at java.lang.reflect.Method.invoke(Method.java:597)
>> > >     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>> > >
>> > > My category file is subjects.txt, which has two entries: History and Science.
>> > >
>> > > But when I pass the training input data, I do get to see the results:
>> > >
>> > > $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/train-subject
>> > >
>> > > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
>> > > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
>> > > 10/11/25 13:51:54 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/train-subject}
>> > > 10/11/25 13:51:54 INFO bayes.TestClassifier: Testing Bayes Classifier
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: 8.048212844092422
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
>> > > 10/11/25 13:51:55 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
>> > > 10/11/25 13:51:55 INFO bayes.TestClassifier: Classified instances from part-r-00000
>> > > 10/11/25 13:51:55 INFO bayes.TestClassifier:
>> > > =======================================================
>> > > Summary
>> > > -------------------------------------------------------
>> > > Correctly Classified Instances   :  2    100%
>> > > Incorrectly Classified Instances :  0    0%
>> > > Total Classified Instances       :  2
>> > >
>> > > =======================================================
>> > > Confusion Matrix
>> > > -------------------------------------------------------
>> > > a    <--Classified as
>> > > 2    |  2    a = history
>> > > Default Category: unknown: 1
>> > >
>> > > 10/11/25 13:51:55 INFO driver.MahoutDriver: Program took 953 ms
>> > >
>> > > Can someone please explain the reason behind this?
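A hedged aside on the question above: the working run's confusion matrix only
ever shows a "history" label, so one plausible reading of the NPE in
ConfusionMatrix is that the test chunk contains a category (e.g. Science) that
the model never saw during training, rather than corrupted data. A quick way to
see which categories each prepared set actually contains, assuming (this is an
assumption, not confirmed in the thread) that the part-r-00000 files written by
wikipediaDataSetCreator are plain text with the category as the first
tab-separated field:

# Count documents per category in the prepared train and test sets.
# Assumes plain tab-separated text output; adjust if your output differs.
$ cut -f1 examples/bin/work/wikipedia/wikipediaClassification/train-subject/part-r-00000 | sort | uniq -c
$ cut -f1 examples/bin/work/wikipedia/wikipediaClassification/test-subject/part-r-00000 | sort | uniq -c

If a category shows up only in the test set, that train/test mismatch, rather
than corruption, may be what trips up the confusion matrix.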
>> > >
>> > > Thanks,
>> > > Regards,
>> > > Divya
>> > >
>> >
>> > --------------------------
>> > Grant Ingersoll
>> > http://www.lucidimagination.com/
>> >
>> > Search the Lucene ecosystem docs using Solr/Lucene:
>> > http://www.lucidimagination.com/search
