Thanks Sean. I tried to wrap my mind around where an incompatible version of Lucene could be coming from. I'm assuming this happens during Hadoop job execution, based on the output below:
MAHOUT-JOB: /home/ubuntu/mahout/examples/target/mahout-examples-0.7-SNAPSHOT-job.jar
12/04/06 14:38:03 WARN driver.MahoutDriver: No wikipediaDataSetCreator.props found on classpath, will use command-line arguments only
12/04/06 14:38:04 INFO bayes.WikipediaDatasetCreatorDriver: Input: /raid0/wikipedia/chunks Out: wikipediainput Categories: /home/ubuntu/mahout/examples/src/test/resources/country.txt
12/04/06 14:38:04 INFO common.HadoopUtil: Deleting wikipediainput
12/04/06 14:38:05 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/04/06 14:38:06 INFO input.FileInputFormat: Total input paths to process : 555
12/04/06 14:38:07 INFO mapred.JobClient: Running job: job_201204051216_0017
12/04/06 14:38:08 INFO mapred.JobClient:  map 0% reduce 0%
12/04/06 14:38:26 INFO mapred.JobClient: Task Id : attempt_201204051216_0017_m_000005_0, Status : FAILED
Error: class org.apache.lucene.analysis.ReusableAnalyzerBase overrides final method tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;
12/04/06 14:38:28 INFO mapred.JobClient: Task Id : attempt_201204051216_0017_m_000002_0, Status : FAILED
Error: class org.apache.lucene.analysis.ReusableAnalyzerBase overrides final method tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;

So, I thought that including HADOOP_CLASSPATH=/home/ubuntu/.m2/repository/org/apache/lucene/lucene-analyzers/3.5.0 directly in the mahout command would force Mahout's Lucene to be used:

sudo env JAVA_HOME=$JAVA_HOME HADOOP_HOME=/xxx/xxx/hadoop \
  HADOOP_CONF_DIR=/xxx/xxx/hadoop \
  HADOOP_CLASSPATH=/home/ubuntu/.m2/repository/org/apache/lucene/lucene-analyzers/3.5.0 \
  ./mahout wikipediaDataSetCreator -i wikipedia/chunks -o wikipediainput \
  -c /home/ubuntu/mahout/examples/src/test/resources/country.txt

That's the Lucene that Maven pulled in when building from source, so I'm assuming it's the correct one.
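Two things I'd double-check here (a sketch; the lucene-analyzers jar filename is my assumption based on the usual Maven repository layout):

```shell
# 1. A bare directory on a Java classpath only exposes loose .class files,
#    not jars sitting inside it, so HADOOP_CLASSPATH probably needs to name
#    the jar itself (filename below is assumed, not verified):
export HADOOP_CLASSPATH=/home/ubuntu/.m2/repository/org/apache/lucene/lucene-analyzers/3.5.0/lucene-analyzers-3.5.0.jar

# 2. The -job.jar bundles its own dependencies, so listing its contents
#    shows which Lucene classes actually ship with the job:
jar tf /home/ubuntu/mahout/examples/target/mahout-examples-0.7-SNAPSHOT-job.jar \
  | grep 'org/apache/lucene/analysis' | head
```

If the grep in step 2 turns up nothing, or a different Lucene than expected, that would point at the job jar rather than the client classpath.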
However, I'm still getting the error. Between `mahout` being called, and that in turn calling `hadoop`, how do I ensure that the correct Lucene is used?

On Fri, Apr 6, 2012 at 08:31, Sean Owen <[email protected]> wrote:
> This means you have an incompatible version of Lucene in your app at
> runtime. Use the same one Mahout uses.
>
> On Fri, Apr 6, 2012 at 2:21 PM, Tristan Slominski
> <[email protected]> wrote:
> > Hello group,
> >
> > I managed to get Mahout running.. awesome! But I keep on running into
> > issues that break Hadoop jobs that Mahout launches.
> >
> > For example, when I follow the wikipedia Naive Bayes example, during the
> > wikipediaDataSetCreator step, my Hadoop jobs fail due to:
> >
> > Error: class org.apache.lucene.analysis.ReusableAnalyzerBase overrides
> > final method
> > tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;
> >
> > So, I decided to try the examples in the example folder within Mahout.
> >
> > The classify-20newsgroups.sh example works just fine.
> >
> > Then I try to run the cluster-reuters.sh example and Hadoop jobs break with:
> >
> > Error: class org.apache.mahout.vectorizer.DefaultAnalyzer overrides final
> > method
> > tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;
> >
> > I did this on latest Mahout 7.0 Snapshot built from source, and on the
> > packaged Mahout 6.0.
> >
> > From reading about it, it appears that the problem stems from the Lucene
> > project enforcing a final restriction on
> > org.apache.lucene.analysis.TokenStream . So, in order to try to at least
> > get it to run despite that restriction, I attempted to find a way to build
> > lucene-analysis project from scratch to generate a separate jar that
> > doesn't have the final restriction, but I'm sort of lost in the size of
> > that project right now.
> >
> > What are you doing to get around this issue? Am I doing something wrong?
> > Using a wrong version of something perhaps? Again, I've build latest 7.0
> > Snapshot from source and I used packaged Mahout 6.0 with same problems.
> >
> > Cheers,
> >
> > Tristan
>
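P.S. One more place the wrong Lucene could be hiding (a guess, assuming stock Hadoop 1.x classpath ordering): the task JVMs put the Hadoop installation's own jars, including $HADOOP_HOME/lib, on the classpath ahead of the job jar, so a stray older Lucene there would shadow the one bundled in the -job.jar. A quick check:

```shell
# Look for any Lucene jar shipped alongside Hadoop itself; if one turns up,
# it would be loaded before the version Mahout bundles in its job jar.
find "$HADOOP_HOME/lib" -name 'lucene*.jar'
```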
