Hi,

Thanks for the advice, Robin.
Most of the time I don't get responses to the issues I am facing, which is why
I reframe them and post again, hoping someone can understand my problem and
help me. I am a newbie to Mahout and don't have any experience in this field.

I am trying to run the Wikipedia classification example.
I downloaded the Wikipedia data set and split it into chunks (1 MB each).
I am using one of the chunk files as the input data for the Wikipedia example.


Steps I followed:
1. Created the train input data set from one chunk of the Wikipedia data set
and subjects.txt, using the wikipediaDataSetCreator CLI.
2. Repeated the first step, but used another chunk of the Wikipedia data set
to create the test input data.
3. Trained the classifier on the train input data set.
4. Tested the classifier, passing the trained model as the model and the test
input data set as the testdir.

Now the issue: when I run testclassifier with the trained model as the model
and the train input data set as the testdir, I am able to view the result in
the form of a confusion matrix.
But when I run testclassifier with the trained model as the model and the test
input data set (which I created in the second step) as the testdir, I get a
null pointer exception, as shown in the mail below.
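For reference, here is a minimal sketch (in Python, purely illustrative, not
Mahout's actual ConfusionMatrix code) of how a confusion matrix indexed by the
model's known labels can fail in exactly this way when the test data carries a
category label the trained model has never seen. This is only one plausible
cause worth checking against the trace below:

```python
# Illustrative sketch only -- not Mahout's actual ConfusionMatrix code.
# It mimics how a confusion matrix indexed by the model's known labels can
# fail when the test data carries a label the model has never seen.

class ConfusionMatrixSketch:
    def __init__(self, model_labels):
        # Only labels known at training time get an index.
        self.index = {label: i for i, label in enumerate(model_labels)}
        n = len(model_labels)
        self.counts = [[0] * n for _ in range(n)]

    def add_instance(self, correct_label, classified_label):
        # An unseen correct_label has no index, so this lookup fails --
        # the Python analogue of a NullPointerException from an unchecked
        # label-to-index lookup in Java.
        row = self.index[correct_label]
        col = self.index[classified_label]
        self.counts[row][col] += 1

m = ConfusionMatrixSketch(["history", "science"])
m.add_instance("history", "history")        # fine: label known to the model
try:
    m.add_instance("geography", "history")  # label unseen at training time
except KeyError as exc:
    print("unseen label breaks the matrix:", exc)
```

If the failing test chunk contains articles whose category is not covered by
subjects.txt, that would be consistent with this failure mode.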

Name                                                         Size
Initial train input data set                                 2 MB (two chunks)
Initial test input data set                                  1 MB (one chunk)
Train data set after wikipediaDataSetCreator (part-r-00000)  154 KB
Test data set after wikipediaDataSetCreator (part-r-00000)   43 KB
Train model (trainer-thetaNormalizer)                        1 KB
Train model (trainer-tfIdf)                                  311 KB
Train model (trainer-weights\Sigma_j)                        215 KB
Train model (trainer-weights\Sigma_kSigma_j)                 1 KB
Train model (trainer-weights\Sigma_k)                        1 KB

I hope I will get a solution to my issue now.

Thanks much  
Regards,
Divya 







-----Original Message-----
From: Robin Anil [mailto:[email protected]] 
Sent: Monday, November 29, 2010 7:34 PM
To: [email protected]
Subject: Re: NPE in bayes wiki example

Hi Divya, I am kind of overwhelmed by the flurry of emails from you and the
replies. I am currently not able to make head or tail of the problem you are
facing. It would be really helpful if you could write a bit more about the
input files, the commands you ran, the output files generated, their sizes,
and so on, and maybe use a single email thread for all Bayes classifier
related problems. I guarantee you, I will be able to solve your issues with
the Bayes classifier much faster.

Regards
Robin

On Mon, Nov 29, 2010 at 12:54 PM, Divya <[email protected]> wrote:

> Hi,
>
> Steps I followed are below :
>
> $ bin/mahout wikipediaDataSetCreator -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Traininput -o examples/bin/work/wikipedia/wikipediaClassification/train-subject -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
>
> $ bin/mahout wikipediaDataSetCreator -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Testinput -o examples/bin/work/wikipedia/wikipediaClassification/test-subject -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
>
> $ bin/mahout trainclassifier -i examples/bin/work/wikipedia/wikipediaClassification/train-subject -o examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model
>
> $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
>
>
> Regards,
> Divya
>
>
>
> -----Original Message-----
> From: Grant Ingersoll [mailto:[email protected]]
> Sent: Saturday, November 27, 2010 8:54 PM
> To: [email protected]
> Subject: Re: NPE in bayes wiki example
>
> Can you provide all the steps you have done up to this point?
>
> -Grant
>
> On Nov 25, 2010, at 12:57 AM, Divya wrote:
>
> > Hi,
> >
> > I am getting a null pointer exception when I pass my test input data to
> > testclassifier:
> >
> > $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
> >
> > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
> > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
> > 10/11/25 13:51:36 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/test-subject}
> > 10/11/25 13:51:36 INFO bayes.TestClassifier: Testing Bayes Classifier
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: 8.048212844092422
> > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
> > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
> > 10/11/25 13:51:39 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
> >
> > Exception in thread "main" java.lang.NullPointerException
> >        at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:102)
> >        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:118)
> >        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:122)
> >        at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:90)
> >        at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:68)
> >        at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:266)
> >        at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:186)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >        at java.lang.reflect.Method.invoke(Method.java:597)
> >        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >        at java.lang.reflect.Method.invoke(Method.java:597)
> >        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> >
> >
> > My category file is subjects.txt, which has two entries: History and Science.
> >
> >
> > But when I pass the train input data, I can see the results:
> >
> >
> >
> > $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/train-subject
> >
> > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
> > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
> > 10/11/25 13:51:54 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/train-subject}
> > 10/11/25 13:51:54 INFO bayes.TestClassifier: Testing Bayes Classifier
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: 8.048212844092422
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
> > 10/11/25 13:51:55 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
> >
> > 10/11/25 13:51:55 INFO bayes.TestClassifier: Classified instances from
> > part-r-00000
> >
> > 10/11/25 13:51:55 INFO bayes.TestClassifier:
> > =======================================================
> >
> > Summary
> >
> > -------------------------------------------------------
> >
> > Correctly Classified Instances          :          2           100%
> >
> > Incorrectly Classified Instances        :          0             0%
> >
> > Total Classified Instances              :          2
> >
> >
> >
> > =======================================================
> >
> > Confusion Matrix
> >
> > -------------------------------------------------------
> >
> > a       <--Classified as
> >
> > 2        |  2           a     = history
> >
> > Default Category: unknown: 1
> >
> >
> >
> >
> >
> > 10/11/25 13:51:55 INFO driver.MahoutDriver: Program took 953 ms
> >
> > Can someone please explain the reason behind this?
> >
> >
> >
> > Thanks
> >
> > Regards,
> >
> > Divya
> >
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem docs using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
>
