Thanks Sean,

I tried to work out where the incompatible version of Lucene could be
coming from. Based on the output below, I'm assuming it happens during
Hadoop job execution:

MAHOUT-JOB:
/home/ubuntu/mahout/examples/target/mahout-examples-0.7-SNAPSHOT-job.jar
12/04/06 14:38:03 WARN driver.MahoutDriver: No
wikipediaDataSetCreator.props found on classpath, will use command-line
arguments only
12/04/06 14:38:04 INFO bayes.WikipediaDatasetCreatorDriver: Input:
/raid0/wikipedia/chunks Out: wikipediainput Categories:
/home/ubuntu/mahout/examples/src/test/resources/country.txt
12/04/06 14:38:04 INFO common.HadoopUtil: Deleting wikipediainput
12/04/06 14:38:05 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
12/04/06 14:38:06 INFO input.FileInputFormat: Total input paths to process
: 555
12/04/06 14:38:07 INFO mapred.JobClient: Running job: job_201204051216_0017
12/04/06 14:38:08 INFO mapred.JobClient:  map 0% reduce 0%
12/04/06 14:38:26 INFO mapred.JobClient: Task Id :
attempt_201204051216_0017_m_000005_0, Status : FAILED
Error: class org.apache.lucene.analysis.ReusableAnalyzerBase overrides
final method
tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;
12/04/06 14:38:28 INFO mapred.JobClient: Task Id :
attempt_201204051216_0017_m_000002_0, Status : FAILED
Error: class org.apache.lucene.analysis.ReusableAnalyzerBase overrides
final method
tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;

So I thought that setting
HADOOP_CLASSPATH=/home/ubuntu/.m2/repository/org/apache/lucene/lucene-analyzers/3.5.0
directly in the mahout command would force Mahout's Lucene to be used:

sudo env JAVA_HOME=$JAVA_HOME HADOOP_HOME=/xxx/xxx/hadoop
HADOOP_CONF_DIR=/xxx/xxx/hadoop
HADOOP_CLASSPATH=/home/ubuntu/.m2/repository/org/apache/lucene/lucene-analyzers/3.5.0
./mahout wikipediaDataSetCreator -i wikipedia/chunks -o wikipediainput -c
/home/ubuntu/mahout/examples/src/test/resources/country.txt
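One thing I'm now second-guessing (this is an assumption on my part, not something I've confirmed): a Java classpath entry that is a directory is only searched for .class files, so pointing HADOOP_CLASSPATH at the directory that merely *contains* the jar may do nothing. The entry would have to name the jar itself — the filename below just follows Maven's usual artifact-version.jar convention:

```shell
# Hypothetical fix: put the jar file, not its parent directory, on the classpath.
LUCENE_DIR=/home/ubuntu/.m2/repository/org/apache/lucene/lucene-analyzers/3.5.0
HADOOP_CLASSPATH="$LUCENE_DIR/lucene-analyzers-3.5.0.jar"
echo "$HADOOP_CLASSPATH"
```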

That's the Lucene that Maven pulled in when building from source, so I'm
assuming it's the correct one. However, I'm still getting the error. Between
`mahout` being called, and that in turn calling `hadoop`, how do I ensure
that the correct Lucene is used?
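In case it helps anyone diagnosing the same thing, here is what I've been trying in order to see which Lucene actually wins: dump the classpath Hadoop computes and look for stray Lucene jars (e.g. an older one under $HADOOP_HOME/lib). This assumes your Hadoop has the `classpath` subcommand; the classpath string in the runnable part below is entirely made up for illustration:

```shell
# On the cluster, I'd run:
#   hadoop classpath | tr ':' '\n' | grep -i lucene
# The same filter, demonstrated on a fabricated classpath string:
CP='/opt/hadoop/lib/lucene-core-2.9.4.jar:/home/ubuntu/.m2/repository/org/apache/lucene/lucene-analyzers/3.5.0/lucene-analyzers-3.5.0.jar'
LUCENE_JARS=$(printf '%s\n' "$CP" | tr ':' '\n' | grep -i lucene)
printf '%s\n' "$LUCENE_JARS"
```

If two different Lucene versions show up, whichever comes first on the classpath shadows the other, which would explain the "overrides final method" VerifyError.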

On Fri, Apr 6, 2012 at 08:31, Sean Owen <[email protected]> wrote:

> This means you have an incompatible version of Lucene in your app at
> runtime. Use the same one Mahout uses.
>
> On Fri, Apr 6, 2012 at 2:21 PM, Tristan Slominski
> <[email protected]> wrote:
> > Hello group,
> >
> > I managed to get Mahout running.. awesome! But I keep on running into
> > issues that break Hadoop jobs that Mahout launches.
> >
> > For example, when I follow the wikipedia Naive Bayes example, during the
> > wikipediaDataSetCreator step, my Hadoop jobs fail due to:
> >
> > Error: class org.apache.lucene.analysis.ReusableAnalyzerBase overrides
> > final method
> >
> tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;
> >
> > So, I decided to try the examples in the example folder within Mahout.
> >
> > The classify-20newsgroups.sh example works just fine.
> >
> > Then I try to run the cluster-reuters.sh example and Hadoop jobs break
> with:
> >
> > Error: class org.apache.mahout.vectorizer.DefaultAnalyzer overrides final
> > method
> >
> tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;
> >
> > I did this on the latest Mahout 0.7-SNAPSHOT built from source, and on the
> > packaged Mahout 0.6.
> >
> > From reading about it, it appears that the problem stems from the Lucene
> > project enforcing a final restriction on
> > org.apache.lucene.analysis.TokenStream . So, in order to try to at least
> > get it to run despite that restriction, I attempted to find a way to
> build
> > lucene-analysis project from scratch to generate a separate jar that
> > doesn't have the final restriction, but I'm sort of lost in the size of
> > that project right now.
> >
> > What are you doing to get around this issue? Am I doing something wrong?
> > Using a wrong version of something, perhaps? Again, I've built the latest
> > 0.7-SNAPSHOT from source and used the packaged Mahout 0.6, with the same
> > problems.
> >
> > Cheers,
> >
> > Tristan
>
