I see what's happening here. Good catch, Chris.

(1)  First, SnowballAnalyzer has been deprecated and will be removed in
Lucene 5.0. The Javadocs recommend using the language-specific analyzers
instead.

      
http://lucene.apache.org/core/4_2_1/analyzers-common/org/apache/lucene/analysis/snowball/SnowballAnalyzer.html
    

(2)  The SnowballAnalyzer constructors require arguments beyond the Lucene
Version (see the link in (1)):

        public SnowballAnalyzer(Version matchVersion, String name);
        public SnowballAnalyzer(Version matchVersion, String name, CharArraySet stopWords);
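Here is a minimal sketch of calling such a constructor reflectively, assuming the caller already knows the extra argument. Lucene is not on the classpath here, so `Version` and `SnowballLikeAnalyzer` are hypothetical stand-ins that mirror the (Version, String) shape above. "English" as the stemmer name is an assumed example value.

```java
import java.lang.reflect.Constructor;

public class SnowballCtorSketch {
    // Hypothetical stand-in for org.apache.lucene.util.Version.
    public enum Version { LUCENE_42 }

    // Hypothetical stand-in mirroring SnowballAnalyzer's shape: it has a
    // (Version, String) constructor but no Version-only or no-arg constructor.
    public static class SnowballLikeAnalyzer {
        public final Version matchVersion;
        public final String name;

        public SnowballLikeAnalyzer(Version matchVersion, String name) {
            this.matchVersion = matchVersion;
            this.name = name;
        }
    }

    // Looks up the two-argument constructor explicitly and invokes it,
    // wrapping any reflection failure in an IllegalStateException.
    public static SnowballLikeAnalyzer create(Version v, String stemmerName) {
        try {
            Constructor<SnowballLikeAnalyzer> ctor =
                SnowballLikeAnalyzer.class.getConstructor(Version.class, String.class);
            return ctor.newInstance(v, stemmerName);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SnowballLikeAnalyzer a = create(Version.LUCENE_42, "English");
        System.out.println(a.name + " / " + a.matchVersion); // prints "English / LUCENE_42"
    }
}
```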

      Presently, AnalyzerUtils.createAnalyzer(Class<? extends Analyzer> analyzerClass, Version version)
only invokes a constructor that takes nothing beyond the Lucene Version.

      If an analyzer's constructors require additional arguments, this
obviously fails, which is what's happening here.
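The sketch below reproduces the failure without Lucene. It uses a hypothetical stand-in with the same constructor shape and a hypothetical helper that tries a Version-only constructor, then falls back to a no-argument one. The fallback is an assumption suggested by the `<init>()` in Chris's stack trace. It is not copied from Mahout's code. Either lookup fails because the analyzer declares neither constructor.

```java
public class VersionOnlyLookupSketch {
    // Hypothetical stand-in for org.apache.lucene.util.Version.
    public enum Version { LUCENE_42 }

    // Hypothetical stand-in with SnowballAnalyzer's constructor shape:
    // (Version, String) only, so no Version-only and no no-arg constructor.
    public static class SnowballLikeAnalyzer {
        public SnowballLikeAnalyzer(Version matchVersion, String name) { }
    }

    // Hypothetical helper: try a Version-only constructor, then fall back to a
    // no-arg one (the fallback is an assumption based on "<init>()" in the trace).
    public static <T> T createVersionThenNoArg(Class<T> clazz, Version version) {
        try {
            try {
                return clazz.getConstructor(Version.class).newInstance(version);
            } catch (NoSuchMethodException noVersionCtor) {
                return clazz.getConstructor().newInstance();
            }
        } catch (ReflectiveOperationException e) {
            // Wrapped the same way as the IllegalStateException in the trace.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        try {
            createVersionThenNoArg(SnowballLikeAnalyzer.class, Version.LUCENE_42);
        } catch (IllegalStateException e) {
            // Prints a NoSuchMethodException whose message ends with ".<init>()".
            System.out.println(e.getCause());
        }
    }
}
```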

 Fix:

    (1) Read the additional constructor arguments from the seq2sparse CLI and
pass them through to AnalyzerUtils.
    (2) Modify AnalyzerUtils so it can accept the additional parameters required
to instantiate a specific Analyzer.
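Fix (2) could look roughly like the sketch below. It uses a hypothetical `createAnalyzer(Class<T>, Version, Object...)` that picks the first public constructor accepting the Lucene Version plus the extra arguments. One such argument would be a stemmer name passed through from a seq2sparse option, as in fix (1). The method and the stand-in classes are hypothetical. This is not Mahout's actual API.

```java
import java.lang.reflect.Constructor;

public class FlexibleAnalyzerFactorySketch {
    // Hypothetical stand-in for org.apache.lucene.util.Version.
    public enum Version { LUCENE_42 }

    // Hypothetical stand-ins: one analyzer needs only a Version, the other also needs a name.
    public static class VersionOnlyAnalyzer {
        public VersionOnlyAnalyzer(Version v) { }
    }

    public static class SnowballLikeAnalyzer {
        public final String name;
        public SnowballLikeAnalyzer(Version v, String name) { this.name = name; }
    }

    // Picks the first public constructor whose parameters accept (version, extraArgs...).
    // Primitive parameters and null arguments are not handled in this sketch.
    public static <T> T createAnalyzer(Class<T> clazz, Version version, Object... extraArgs) {
        Object[] args = new Object[extraArgs.length + 1];
        args[0] = version;
        System.arraycopy(extraArgs, 0, args, 1, extraArgs.length);
        for (Constructor<?> ctor : clazz.getConstructors()) {
            Class<?>[] params = ctor.getParameterTypes();
            if (params.length != args.length) continue;
            boolean matches = true;
            for (int i = 0; i < params.length; i++) {
                if (args[i] == null || !params[i].isInstance(args[i])) { matches = false; break; }
            }
            if (matches) {
                try {
                    return clazz.cast(ctor.newInstance(args));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        throw new IllegalStateException("No public constructor of " + clazz.getName()
            + " accepts " + args.length + " argument(s) of the given types");
    }

    public static void main(String[] args) {
        SnowballLikeAnalyzer s = createAnalyzer(SnowballLikeAnalyzer.class, Version.LUCENE_42, "English");
        VersionOnlyAnalyzer v = createAnalyzer(VersionOnlyAnalyzer.class, Version.LUCENE_42);
        System.out.println(s.name + ", " + (v != null)); // prints "English, true"
    }
}
```

Matching on `isInstance` keeps the sketch short. A real implementation would also need to handle primitive parameters, nulls, and ambiguous overloads.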

 



________________________________
 From: Chris Harrington <[email protected]>
To: [email protected]; Suneel Marthi <[email protected]> 
Sent: Tuesday, April 23, 2013 4:57 AM
Subject: Re: seq2sparse in 0.8 throwing class not found for analyzers
 

Nice that the fix for that is just a package name change.
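The analyzer is given by its fully qualified class name, so resolving the name up front with `Class.forName` separates a wrong package name from other failures. In this sketch, JDK class names stand in for the Lucene ones so it runs without Lucene jars. Per this thread, with Lucene on the classpath `org.apache.lucene.analysis.en.EnglishAnalyzer` would resolve and `org.apache.lucene.analysis.EnglishAnalyzer` would not. The helper `check` is hypothetical.

```java
public class AnalyzerClassCheckSketch {
    // Hypothetical helper: resolves a fully qualified class name and returns a
    // readable message instead of letting ClassNotFoundException escape.
    public static String check(String className) {
        try {
            Class.forName(className);
            return "OK: " + className;
        } catch (ClassNotFoundException e) {
            return "Not found: " + className + " (check the package name)";
        }
    }

    public static void main(String[] args) {
        // JDK classes stand in for the Lucene ones so the sketch runs without Lucene jars.
        System.out.println(check("java.util.ArrayList")); // prints "OK: java.util.ArrayList"
        System.out.println(check("java.ArrayList"));      // prints "Not found: java.ArrayList (check the package name)"
    }
}
```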

There's also an issue with the Snowball analyzer:

Exception in thread "main" java.lang.IllegalStateException: 
java.lang.NoSuchMethodException: 
org.apache.lucene.analysis.snowball.SnowballAnalyzer.<init>()

On 22 Apr 2013, at 20:15, Suneel Marthi wrote:

> Phew,...
> 
> The fix for this was a DUD.
> 
> In Lucene 4.2.1 this class is in the package org.apache.lucene.analysis.en,
> so its fully qualified name is org.apache.lucene.analysis.en.EnglishAnalyzer.
> 
> Notice 'en' in the package path.
> 
> This should work.
> 
> 
> 
> 
> ________________________________
> From: Chris Harrington <[email protected]>
> To: [email protected] 
> Sent: Monday, April 22, 2013 6:08 AM
> Subject: seq2sparse in 0.8 throwing class not found for analyzers
> 
> 
> Hi all,
> 
> I'm trying to run the seq2sparse tool with one of the Lucene analyzers, but it
> throws a ClassNotFoundException:
> 
> mahout seq2sparse -i ./contentDataDir/sequenced \
>   -o ./contentDataDir/sparseVectors --namedVector -wt tf \
>   -a org.apache.lucene.analysis.EnglishAnalyzer
> 
> java.lang.ClassNotFoundException: org.apache.lucene.analysis.EnglishAnalyzer
> 
> The output from bin/mahout classpath shows that
> lucene-analyzers-common-4.2.1.jar is in there as a dependency, so any idea
> why the above is throwing an exception?
