Hello group,

I managed to get Mahout running, which is awesome! But I keep running into issues that break the Hadoop jobs that Mahout launches.

For example, when I follow the Wikipedia Naive Bayes example, my Hadoop jobs fail during the wikipediaDataSetCreator step with:

  Error: class org.apache.lucene.analysis.ReusableAnalyzerBase overrides final method tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;

So I decided to try the examples in the example folder within Mahout. The classify-20newsgroups.sh example works just fine. But when I run the cluster-reuters.sh example, the Hadoop jobs break with:

  Error: class org.apache.mahout.vectorizer.DefaultAnalyzer overrides final method tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;

I see this both on the latest Mahout 7.0 Snapshot built from source and on the packaged Mahout 6.0.

From reading about it, it appears that the problem stems from the Lucene project enforcing a final restriction on org.apache.lucene.analysis.TokenStream. To at least get things running despite that restriction, I tried to find a way to build the lucene-analysis project from scratch and generate a separate jar without the final restriction, but I'm somewhat lost in the size of that project right now.

What are you doing to get around this issue? Am I doing something wrong? Am I perhaps using the wrong version of something?

Cheers,
Tristan
