Hi folks,
I'm trying to run the Wikipedia Bayes Example and got stuck at step 8, "Train the classifier":

$MAHOUT_HOME/bin/mahout trainclassifier -i wikipediainput -o wikipediamodel

[from https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example]

All the steps before that worked just fine. I downloaded the mahout-distribution-0.4.zip file and I'm running it with Hadoop on Ubuntu 10.04.

This is the exception I get:

Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist: hdfs://localhost:9000/user/david/wikipediamodel/trainer-termDocCount
Input path does not exist: hdfs://localhost:9000/user/david/wikipediamodel/trainer-wordFreq
Input path does not exist: hdfs://localhost:9000/user/david/wikipediamodel/trainer-featureCount

I tried to create those folders manually with mkdir from the Hadoop shell commands page, but when I re-ran the "Train the classifier" command, the wikipediamodel folder was deleted and recreated, again without the trainer-* folders.

I'm not using the full Wikipedia data, because 27 GB (plus the chunk files) is too much for my HDD; instead I'm using the chunk-000*.xml files provided in the sources from [http://www.ibm.com/developerworks/java/library/j-mahout/]. I do hope that's not the reason...

Thanks and regards,
David

PS: This is the full output:

da...@david-lenovotop:~$ $MAHOUT_HOME/bin/mahout trainclassifier -i wikipediainput -o wikipediamodel
Running on hadoop, using HADOOP_HOME=/home/david/Programme/hadoop
HADOOP_CONF_DIR=/home/david/Programme/hadoop/conf
10/11/16 16:37:36 INFO bayes.TrainClassifier: Training Bayes Classifier
10/11/16 16:37:36 INFO common.HadoopUtil: Deleting wikipediamodel
10/11/16 16:37:36 INFO bayes.BayesDriver: Reading features...
10/11/16 16:37:36 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/11/16 16:37:37 INFO mapred.FileInputFormat: Total input paths to process : 1
10/11/16 16:37:37 INFO mapred.JobClient: Running job: job_201011161633_0003
10/11/16 16:37:38 INFO mapred.JobClient:  map 0% reduce 0%
10/11/16 16:37:47 INFO mapred.JobClient:  map 100% reduce 0%
10/11/16 16:37:56 INFO mapred.JobClient:  map 100% reduce 100%
10/11/16 16:37:58 INFO mapred.JobClient: Job complete: job_201011161633_0003
10/11/16 16:37:58 INFO mapred.JobClient: Counters: 15
10/11/16 16:37:58 INFO mapred.JobClient:   Job Counters
10/11/16 16:37:58 INFO mapred.JobClient:     Launched reduce tasks=1
10/11/16 16:37:58 INFO mapred.JobClient:     Launched map tasks=1
10/11/16 16:37:58 INFO mapred.JobClient:   FileSystemCounters
10/11/16 16:37:58 INFO mapred.JobClient:     FILE_BYTES_READ=6
10/11/16 16:37:58 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=44
10/11/16 16:37:58 INFO mapred.JobClient:   Map-Reduce Framework
10/11/16 16:37:58 INFO mapred.JobClient:     Reduce input groups=0
10/11/16 16:37:58 INFO mapred.JobClient:     Combine output records=0
10/11/16 16:37:58 INFO mapred.JobClient:     Map input records=0
10/11/16 16:37:58 INFO mapred.JobClient:     Reduce shuffle bytes=6
10/11/16 16:37:58 INFO mapred.JobClient:     Reduce output records=0
10/11/16 16:37:58 INFO mapred.JobClient:     Spilled Records=0
10/11/16 16:37:58 INFO mapred.JobClient:     Map output bytes=0
10/11/16 16:37:58 INFO mapred.JobClient:     Map input bytes=0
10/11/16 16:37:58 INFO mapred.JobClient:     Combine input records=0
10/11/16 16:37:58 INFO mapred.JobClient:     Map output records=0
10/11/16 16:37:58 INFO mapred.JobClient:     Reduce input records=0
10/11/16 16:37:58 INFO bayes.BayesDriver: Calculating Tf-Idf...
10/11/16 16:37:58 INFO common.BayesTfIdfDriver: Counts of documents in Each Label
10/11/16 16:37:58 INFO common.BayesTfIdfDriver: {}
10/11/16 16:37:58 INFO common.BayesTfIdfDriver: {dataSource=hdfs, alpha_i=1.0, minDf=1, gramSize=1}
10/11/16 16:37:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist: hdfs://localhost:9000/user/david/wikipediamodel/trainer-termDocCount
Input path does not exist: hdfs://localhost:9000/user/david/wikipediamodel/trainer-wordFreq
Input path does not exist: hdfs://localhost:9000/user/david/wikipediamodel/trainer-featureCount
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
	at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
	at org.apache.mahout.classifier.bayes.mapreduce.common.BayesTfIdfDriver.runJob(BayesTfIdfDriver.java:130)
	at org.apache.mahout.classifier.bayes.mapreduce.bayes.BayesDriver.runJob(BayesDriver.java:49)
	at org.apache.mahout.classifier.bayes.TrainClassifier.trainNaiveBayes(TrainClassifier.java:54)
	at org.apache.mahout.classifier.bayes.TrainClassifier.main(TrainClassifier.java:162)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
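PPS: In case it matters, this is roughly what my manual folder-creation attempt looked like. It's a sketch, not exactly what I typed: the three trainer-* paths are taken from the error message above, and I'm assuming the `hadoop fs -mkdir` form from the shell commands page. It's written as a dry run (echo instead of execute) so the commands are just printed:

```shell
# Dry-run sketch of the manual mkdir attempt.
# The trainer-* subdirectory names come from the InvalidInputException;
# MODEL_DIR matches the HDFS path shown in the error message.
MODEL_DIR=/user/david/wikipediamodel
for sub in trainer-termDocCount trainer-wordFreq trainer-featureCount; do
  echo hadoop fs -mkdir "${MODEL_DIR}/${sub}"
done
```

As described above, creating the folders this way didn't help, because the trainer step deletes and recreates wikipediamodel before it runs.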
