There are two answers:

First answer: you are using the "old" Bayes classifier.
TrainNaiveBayes and TestNaiveBayes are newer and apparently work
better (I cannot tell you how). TrainNaiveBayes reads the entire model
in one program at the end of the training pass, so you will not be
surprised by an out-of-memory error only at test time. The commands
are 'mahout trainnb' and 'mahout testnb'.

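Second answer: you are training with too much data. Your stack trace
shows the heap running out while InMemoryBayesDatastore loads the
feature weights, so the number of distinct terms is what drives the
model size. Try a smaller corpus, or use the minSupport and minDf
parameters to limit the terms you train against.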
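
To illustrate what that kind of pruning does, here is a minimal sketch in
Python. It is not Mahout's code: the function name prune_vocabulary and its
min_support / min_df arguments are made up for this example, and I am
assuming the usual meaning of the two thresholds (minimum total occurrences
of a term, and minimum number of documents containing it).

```python
from collections import Counter

def prune_vocabulary(docs, min_support=2, min_df=2):
    """Return the terms that occur at least min_support times overall
    and appear in at least min_df distinct documents.
    Hypothetical helper for illustration, not part of Mahout."""
    term_count = Counter()  # total occurrences of each term in the corpus
    doc_freq = Counter()    # number of documents that contain each term
    for tokens in docs:
        term_count.update(tokens)
        doc_freq.update(set(tokens))  # count each term once per document
    return {
        term
        for term in term_count
        if term_count[term] >= min_support and doc_freq[term] >= min_df
    }

docs = [
    ["spam", "offer", "offer", "click"],
    ["ham", "meeting", "offer"],
    ["spam", "click", "rareword"],
]
vocab = prune_vocabulary(docs, min_support=2, min_df=2)
print(sorted(vocab))  # ['click', 'offer', 'spam']
```

Terms that show up only once ("ham", "meeting", "rareword") are dropped, so
the classifier has fewer feature weights to hold in memory. On a real corpus
the long tail of rare terms is usually most of the vocabulary.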


2011/12/30 enyun <[email protected]>:
> hi all,
>
> I'm using the mahout bayes model to predict some new data.
> After I built the model with 'trainclassifier', I found that the model
> causes an out-of-memory error when I run 'testclassifier'.
> I have tried enlarging my java heap size to 4g, but it still did not work.
> It seems strange to me that trainclassifier works well while
> testclassifier does not.
> Do you know how to deal with this issue?
>
> 'java.lang.OutOfMemoryError: Java heap space
>    at org.apache.mahout.math.map.OpenObjectIntHashMap.rehash(OpenObjectIntHashMap.java:435)
>    at org.apache.mahout.math.map.OpenObjectIntHashMap.put(OpenObjectIntHashMap.java:387)
>    at org.apache.mahout.classifier.bayes.InMemoryBayesDatastore.getFeatureID(InMemoryBayesDatastore.java:131)
>    at org.apache.mahout.classifier.bayes.InMemoryBayesDatastore.setSumFeatureWeight(InMemoryBayesDatastore.java:153)
>    at org.apache.mahout.classifier.bayes.SequenceFileModelReader.loadFeatureWeights(SequenceFileModelReader.java:82)
>    at org.apache.mahout.classifier.bayes.SequenceFileModelReader.loadModel(SequenceFileModelReader.java:46)
>    at org.apache.mahout.classifier.bayes.InMemoryBayesDatastore.initialize(InMemoryBayesDatastore.java:72)
>    at org.apache.mahout.classifier.bayes.ClassifierContext.initialize(ClassifierContext.java:44)
>    at org.apache.mahout.classifier.bayes.mapreduce.bayes.BayesClassifierMapper.configure(BayesClassifierMapper.java:130)
>    ... 22 more'
>
> thanks,
>
>
> mahout testclassifier -m /user/mahoutTest//bayes-model -d /user/enyun/mahoutTest//bayes-test-input -type bayes -ng 1 -source hdfs -method mapreduce
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Running on hadoop, using 
> HADOOP_HOME=/home/work/Programs/hadoop/hadoop-0.20.203.0/
> HADOOP_CONF_DIR=/home/work/Programs/hadoop/hadoop-0.20.203.0//conf
> MAHOUT-JOB: 
> /home/work/code/hadoop/mahout/mahout-trunk/trunk/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar
> 11/12/30 16:22:59 WARN driver.MahoutDriver: No testclassifier.props found on 
> classpath, will use command-line arguments only
> 11/12/30 16:23:00 INFO common.HadoopUtil: Deleting 
> /user/mahoutTest/bayes-test-input-output
> 11/12/30 16:23:01 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
> the arguments. Applications should implement Tool for the same.
> 11/12/30 16:23:01 INFO mapred.FileInputFormat: Total input paths to process : 
> 2
> 11/12/30 16:23:02 INFO mapred.JobClient: Running job: job_201112231028_0058
> 11/12/30 16:23:03 INFO mapred.JobClient:  map 0% reduce 0%
> 11/12/30 16:23:28 INFO mapred.JobClient: Task Id : 
> attempt_201112231028_0058_m_000000_0, Status : FAILED
> java.lang.RuntimeException: Error in configuring object
>    at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>    at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:431)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
>    at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:396)
>    at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>    at org.apache.hadoop.mapred.Child.main(Child.java:253)
> Caused by: java.lang.reflect.InvocationTargetException
>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>    at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>    at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>    ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>    at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>    at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>    ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>    at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>    at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>    ... 17 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
>    at 
> org.apache.mahout.math.map.OpenObjectIntHashMap.rehash(OpenObjectIntHashMap.java:435)
>    at 
> org.apache.mahout.math.map.OpenObjectIntHashMap.put(OpenObjectIntHashMap.java:387)
>    at 
> org.apache.mahout.classifier.bayes.InMemoryBayesDatastore.getFeatureID(InMemoryBayesDatastore.java:131)
>    at 
> org.apache.mahout.classifier.bayes.InMemoryBayesDatastore.setSumFeatureWeight(InMemoryBayesDatastore.java:153)
>    at 
> org.apache.mahout.classifier.bayes.SequenceFileModelReader.loadFeatureWeights(SequenceFileModelReader.java:82)
>    at 
> org.apache.mahout.classifier.bayes.SequenceFileModelReader.loadModel(SequenceFileModelReader.java:46)
>    at 
> org.apache.mahout.classifier.bayes.InMemoryBayesDatastore.initialize(InMemoryBayesDatastore.java:72)
>    at 
> org.apache.mahout.classifier.bayes.ClassifierContext.initialize(ClassifierContext.java:44)
>    at 
> org.apache.mahout.classifier.bayes.mapreduce.bayes.BayesClassifierMapper.configure(BayesClassifierMapper.java:130)
>    ... 22 more
>
> 11/12/30 16:23:28 INFO mapred.JobClient: Task Id : 
> attempt_201112231028_0058_m_000001_0, Status : FAILED
> java.lang.RuntimeException: Error in configuring object
>    at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>    at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:431)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
>    at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:396)
>    at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>



-- 
Lance Norskog
[email protected]
