I see what's happening here. Good catch, Chris.

(1) First, SnowballAnalyzer has been deprecated and is scheduled for removal in Lucene 5.0; the Javadocs recommend using the language-specific analyzers instead.
http://lucene.apache.org/core/4_2_1/analyzers-common/org/apache/lucene/analysis/snowball/SnowballAnalyzer.html

(2) The SnowballAnalyzer constructors take arguments in addition to the Lucene Version (see the link in (1)):

public SnowballAnalyzer(Version matchVersion, String name);
public SnowballAnalyzer(Version matchVersion, String name, CharArraySet stopWords);

At present, AnalyzerUtils.createAnalyzer(Class<? extends Analyzer> analyzerClass, Version version) only tries a constructor that takes the Lucene Version alone. SnowballAnalyzer has no such constructor, so instantiation fails, which is what's happening here.

Fix:
(1) Read the additional constructor arguments from the seq2sparse CLI and pass them through to AnalyzerUtils.
(2) Modify AnalyzerUtils so it can accept the additional parameters required to instantiate a specific Analyzer.

________________________________
From: Chris Harrington <[email protected]>
To: [email protected]; Suneel Marthi <[email protected]>
Sent: Tuesday, April 23, 2013 4:57 AM
Subject: Re: seq2sparse in 0.8 throwing class not found for analyzers

Nice that the fix for that is just a package name change.

There's also an issue with the Snowball analyzer:

Exception in thread "main" java.lang.IllegalStateException: java.lang.NoSuchMethodException: org.apache.lucene.analysis.snowball.SnowballAnalyzer.<init>()

On 22 Apr 2013, at 20:15, Suneel Marthi wrote:

> Phew,...
>
> The fix for this was a DUD.
>
> In Lucene 4.2.1 the package name for this class was changed to
> org.apache.lucene.analysis.en.EnglishAnalyzer.
>
> Notice 'en' in the package path.
>
> This should work.
>
> ________________________________
> From: Chris Harrington <[email protected]>
> To: [email protected]
> Sent: Monday, April 22, 2013 6:08 AM
> Subject: seq2sparse in 0.8 throwing class not found for analyzers
>
> Hi all,
>
> I'm trying to run the seq2sparse tool with one of the Lucene analyzers, but it
> throws a ClassNotFoundException:
>
> mahout seq2sparse -i ./contentDataDir/sequenced -o ./contentDataDir/sparseVectors --namedVector -wt tf -a org.apache.lucene.analysis.EnglishAnalyzer
>
> java.lang.ClassNotFoundException: org.apache.lucene.analysis.EnglishAnalyzer
>
> The output of bin/mahout classpath shows that lucene-analyzers-common-4.2.1.jar
> is on the classpath as a dependency, so does anyone have an idea why the above
> throws an exception?
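A minimal sketch of fix (2) from the top of this thread, in Java. It is illustrative only: `create` and `StandInSnowballAnalyzer` are hypothetical names, not part of Mahout's AnalyzerUtils or of Lucene, and a String stands in for Lucene's Version so the example compiles without Lucene on the classpath. It shows why a version-only constructor lookup fails with NoSuchMethodException, and how passing extra constructor arguments (here the stemmer name "English") through to the factory lets it find the two-argument constructor.

```java
import java.lang.reflect.Constructor;

public class AnalyzerFactorySketch {

  // Hypothetical stand-in for Lucene's SnowballAnalyzer. Like the real class, it has
  // no constructor taking only a version; a String replaces Lucene's Version type so
  // the sketch compiles on its own.
  public static class StandInSnowballAnalyzer {
    public final String version;
    public final String stemmerName;

    public StandInSnowballAnalyzer(String version, String stemmerName) {
      this.version = version;
      this.stemmerName = stemmerName;
    }
  }

  // Hypothetical helper (not Mahout's AnalyzerUtils API): builds an instance of cls
  // from the version plus any extra constructor arguments, e.g. ones read from the CLI.
  // It picks the first public constructor whose reference-type parameters accept the
  // arguments in order; primitive parameters and null arguments are not supported.
  public static <T> T create(Class<T> cls, Object version, Object... extraArgs)
      throws ReflectiveOperationException {
    Object[] args = new Object[extraArgs.length + 1];
    args[0] = version;
    System.arraycopy(extraArgs, 0, args, 1, extraArgs.length);

    for (Constructor<?> ctor : cls.getConstructors()) {
      Class<?>[] params = ctor.getParameterTypes();
      if (params.length != args.length) {
        continue;
      }
      boolean matches = true;
      for (int i = 0; i < params.length; i++) {
        // isInstance is false for primitive parameter types and null arguments.
        if (!params[i].isInstance(args[i])) {
          matches = false;
          break;
        }
      }
      if (matches) {
        return cls.cast(ctor.newInstance(args));
      }
    }
    // Same exception type as in the reported stack trace when no constructor fits.
    throw new NoSuchMethodException(cls.getName() + " has no public constructor for "
        + args.length + " argument(s)");
  }

  public static void main(String[] argv) throws Exception {
    // Version only: no matching constructor, mirroring the reported failure.
    try {
      create(StandInSnowballAnalyzer.class, "LUCENE_42");
      System.out.println("unexpected: version-only constructor found");
    } catch (NoSuchMethodException e) {
      System.out.println("version only -> NoSuchMethodException");
    }

    // Version plus the stemmer name: the two-argument constructor is found.
    StandInSnowballAnalyzer analyzer =
        create(StandInSnowballAnalyzer.class, "LUCENE_42", "English");
    System.out.println("version + name -> stemmer " + analyzer.stemmerName);
  }
}
```

Running main prints `version only -> NoSuchMethodException` followed by `version + name -> stemmer English`. Matching on parameter assignability (rather than exact argument classes) means an argument whose class is a subtype of the declared parameter type still matches; a real version would also need a way to convert CLI strings into types such as Version or CharArraySet.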
