Hi All,

I'm having trouble getting the 20 Newsgroups example to run
(https://cwiki.apache.org/confluence/display/MAHOUT/Twenty+Newsgroups
and https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html).

I've downloaded the data and tried to train the Naive Bayes classifier,
but when I ran the 'trainclassifier' command I got this error:

hadoop@kdevlinux:/usr/local/mahout$ mahout trainclassifier \
    -i examples/bin/work/20news-bydate/bayes-train-input \
    -o examples/bin/work/20news-bydate/bayes-model \
    -type bayes -ng 1 -source hdfs
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop
No HADOOP_CONF_DIR set, using /usr/local/hadoop/src/conf
11/04/13 09:16:29 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.utils.eval.InMemoryFactorizationEvaluator
11/04/13 09:16:29 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.utils.eval.ParallelFactorizationEvaluator
11/04/13 09:16:29 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.utils.eval.DatasetSplitter
11/04/13 09:16:29 INFO bayes.TrainClassifier: Training Bayes Classifier
11/04/13 09:16:29 INFO bayes.BayesDriver: Reading features...
11/04/13 09:16:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/04/13 09:16:31 INFO mapred.FileInputFormat: Total input paths to process : 20
Exception in thread "main" java.lang.IllegalArgumentException: Illegal Capacity: -40
	at java.util.ArrayList.<init>(ArrayList.java:110)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:216)
	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
	at org.apache.mahout.classifier.bayes.mapreduce.common.BayesFeatureDriver.runJob(BayesFeatureDriver.java:63)
	at org.apache.mahout.classifier.bayes.mapreduce.bayes.BayesDriver.runJob(BayesDriver.java:47)
	at org.apache.mahout.classifier.bayes.TrainClassifier.trainNaiveBayes(TrainClassifier.java:54)
	at org.apache.mahout.classifier.bayes.TrainClassifier.main(TrainClassifier.java:162)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


I thought I might have entered the command incorrectly, but then I found
the 'build-20news-bayes.sh' shell script, and when I run it I get the
same exception.

Until now I've been running Hadoop 0.20.2 smoothly on a 4-node cluster.
All nodes are Debian machines using the sun-java6-* packages, and I'm
running Mahout trunk, which I checked out of the svn repository today
(svn co http://svn.apache.org/repos/asf/mahout/trunk).

All the <newsgroup>.txt files seem to have been created and uploaded
to HDFS correctly (checked with 'hadoop dfs -lsr examples/bin/work').
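
For reference, the message itself comes from java.util.ArrayList: its
constructor rejects a negative initial capacity, and the trace shows
FileInputFormat.getSplits (FileInputFormat.java:216) passing it -40.
Below is a minimal standalone sketch that reproduces only that JDK
behaviour, without Hadoop or Mahout involved (the class and method names
are made up for illustration):

```java
import java.util.ArrayList;

public class IllegalCapacityDemo {

    /**
     * Tries to create an ArrayList with the given initial capacity.
     * Returns the IllegalArgumentException message if the JDK rejects
     * the capacity, or null if the list was created normally.
     */
    static String capacityError(int capacity) {
        try {
            new ArrayList<String>(capacity); // negative capacity -> IllegalArgumentException
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Same value as in the trace above.
        System.out.println(capacityError(-40)); // prints "Illegal Capacity: -40"

        // A non-negative capacity (here, the 20 input paths) is accepted.
        String msg = capacityError(20);
        System.out.println(msg == null ? "capacity 20 accepted" : msg);
    }
}
```

So, if I'm reading the trace right, the question is why getSplits ends
up with a negative list capacity in the first place.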

I'm not sure what to try next. Any help would be very welcome.

Ken 