Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
-------------------------------------------------------------------------------------------

                 Key: SOLR-1677
                 URL: https://issues.apache.org/jira/browse/SOLR-1677
             Project: Solr
          Issue Type: Sub-task
          Components: Schema and Analysis
            Reporter: Uwe Schindler
         Attachments: SOLR-1677.patch

Since Lucene 2.9, many analyzers use a Version constant to stay backwards
compatible with indexes created by older versions of Lucene. The most
important example is StandardTokenizer, whose handling of position increments
(posIncr) and of incorrect host token types changed in 2.4 and again in 2.9.

In Lucene 3.0 this matchVersion constructor parameter is mandatory, and in 3.1,
with much broader Unicode support, almost every Tokenizer/TokenFilter needs
this Version parameter. In 2.9, the deprecated constructors without a Version
argument default to LUCENE_24 to mimic the old behaviour, e.g. in
StandardTokenizer.
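
A minimal sketch of the version-gating pattern described above, assuming a
simplified stand-in enum: MatchVersion and VersionedTokenizer are hypothetical
names, not Lucene's real Version or StandardTokenizer, and the uses29Behaviour
flag stands in for whatever a real tokenizer changes between releases.

```java
// Simplified, hypothetical stand-in for org.apache.lucene.util.Version;
// only the constants needed here, declared in release order.
enum MatchVersion {
    LUCENE_20, LUCENE_21, LUCENE_22, LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30;

    // True if this version is the same as or newer than 'other'
    // (relies on the declaration order above).
    boolean onOrAfter(MatchVersion other) {
        return compareTo(other) >= 0;
    }
}

// Hypothetical tokenizer illustrating the pattern: behaviour is selected from
// the match version, and the deprecated no-version constructor pins LUCENE_24.
class VersionedTokenizer {
    final MatchVersion matchVersion;
    // Hypothetical flag standing in for any behaviour change made in 2.9.
    final boolean uses29Behaviour;

    VersionedTokenizer(MatchVersion matchVersion) {
        this.matchVersion = matchVersion;
        this.uses29Behaviour = matchVersion.onOrAfter(MatchVersion.LUCENE_29);
    }

    /** @deprecated Pass a MatchVersion; this keeps the old 2.4 behaviour. */
    @Deprecated
    VersionedTokenizer() {
        this(MatchVersion.LUCENE_24);
    }
}
```

Code that still calls new VersionedTokenizer() keeps the 2.4 behaviour, while
new configurations opt in explicitly, e.g. with
new VersionedTokenizer(MatchVersion.LUCENE_29).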

This patch adds basic support for the Lucene Version property to the base
factories. Subclasses can then use the decoded luceneMatchVersion (an enum in
3.0, a Parameter in 2.9) when constructing TokenStreams. The code currently
contains a helper map to decode the version strings, but in 3.0 it can be
replaced by Version.valueOf(String), as Version is a Java 5 enum there. The
default value is Version.LUCENE_24, as this is the default for the no-version
constructors in Lucene.
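
A minimal sketch of how a base factory might decode the version string, first
through a helper map and then through Enum.valueOf once Version is a real enum.
SketchVersion and BaseFactorySketch are hypothetical stand-ins, and the
init(Map<String, String>) signature and the "luceneMatchVersion" argument name
are assumptions for illustration, not the patch's actual code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical stand-in for org.apache.lucene.util.Version.
enum SketchVersion { LUCENE_20, LUCENE_21, LUCENE_22, LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30 }

// Hypothetical base factory; the "luceneMatchVersion" init-arg name is assumed.
class BaseFactorySketch {
    static final String MATCH_VERSION_ARG = "luceneMatchVersion";
    // Same default as Lucene's no-version constructors.
    static final SketchVersion DEFAULT_VERSION = SketchVersion.LUCENE_24;

    // Helper map decoding version strings, needed while Version is not a
    // real Java 5 enum (the 2.9 situation).
    private static final Map<String, SketchVersion> VERSIONS;
    static {
        Map<String, SketchVersion> m = new HashMap<String, SketchVersion>();
        m.put("LUCENE_20", SketchVersion.LUCENE_20);
        m.put("LUCENE_21", SketchVersion.LUCENE_21);
        m.put("LUCENE_22", SketchVersion.LUCENE_22);
        m.put("LUCENE_23", SketchVersion.LUCENE_23);
        m.put("LUCENE_24", SketchVersion.LUCENE_24);
        m.put("LUCENE_29", SketchVersion.LUCENE_29);
        m.put("LUCENE_30", SketchVersion.LUCENE_30);
        VERSIONS = Collections.unmodifiableMap(m);
    }

    // Decoded value that subclasses would pass to TokenStream constructors.
    protected SketchVersion luceneMatchVersion = DEFAULT_VERSION;

    public void init(Map<String, String> args) {
        String value = args.get(MATCH_VERSION_ARG);
        if (value == null) {
            luceneMatchVersion = DEFAULT_VERSION; // no arg: behave like no-version ctors
            return;
        }
        SketchVersion v = VERSIONS.get(value.trim());
        if (v == null) {
            throw new IllegalArgumentException("Unknown " + MATCH_VERSION_ARG + ": " + value);
        }
        luceneMatchVersion = v;
    }

    // Once Version is a Java 5 enum (3.0), Enum.valueOf replaces the map;
    // it throws IllegalArgumentException for unknown names.
    static SketchVersion decodeWithValueOf(String value) {
        return SketchVersion.valueOf(value.trim());
    }
}
```

Both paths reject unknown strings with an IllegalArgumentException, so
switching from the map to valueOf would not change the error behaviour in this
sketch.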

This patch also removes unneeded conversions to CharArraySet from
StopFilterFactory (Lucene itself has done this since 2.9). The generics are
also fixed to match Lucene 3.0.
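
A minimal sketch of the StopFilterFactory cleanup, assuming the filter builds
its own lookup set from any Set<?> it receives, so the factory can hand over
its word set unchanged. StopWordSet, StopFilterSketch and
StopFilterFactorySketch are hypothetical stand-ins for CharArraySet,
StopFilter and StopFilterFactory, not their real APIs.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for Lucene's CharArraySet (the real class stores
// char[] keys); a normalizing String set is enough to show the idea.
final class StopWordSet {
    private final Set<String> words = new HashSet<String>();
    private final boolean ignoreCase;

    StopWordSet(Set<?> source, boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
        for (Object word : source) {
            words.add(normalize(word.toString()));
        }
    }

    private String normalize(String s) {
        return ignoreCase ? s.toLowerCase(Locale.ROOT) : s;
    }

    boolean contains(String term) {
        return words.contains(normalize(term));
    }
}

// Hypothetical filter: it accepts any Set<?> and builds its own lookup set,
// so callers no longer need to convert.
final class StopFilterSketch {
    final StopWordSet stopWords;

    StopFilterSketch(Set<?> stopWords, boolean ignoreCase) {
        this.stopWords = new StopWordSet(stopWords, ignoreCase);
    }
}

// Hypothetical factory after the cleanup: it passes its words straight through.
final class StopFilterFactorySketch {
    private final Set<String> words;
    private final boolean ignoreCase;

    StopFilterFactorySketch(Set<String> words, boolean ignoreCase) {
        this.words = words;
        this.ignoreCase = ignoreCase;
    }

    StopFilterSketch create() {
        return new StopFilterSketch(words, ignoreCase);
    }
}
```

Keeping the conversion in one place (the filter) means every factory gets the
same case handling without duplicating it.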
