[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

Uwe Schindler (JIRA) Fri, 01 Jan 2010 03:34:22 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795746#action_12795746
 ]


Uwe Schindler commented on SOLR-1677:
-------------------------------------

The problem is the default value. If the version parameter can only be set per 
instance and is left out, you get 2.4. Because of that, all Solr users will get 
stuck with that version and will never upgrade (because they keep the default 
and do not specify a different value). For backwards compatibility, we are 
limited to this version number as the default value.

The schema/config global version is the default used by all instances that do 
not specify a different value. That way we can ship the default 
solrconfig.xml/schema.xml with the latest possible Lucene version, while users 
who upgrade keep their existing default.
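A minimal sketch of the precedence this implies, assuming versions are handled as plain strings for simplicity. The class and method names are hypothetical and not Solr's actual API: a per-instance value wins, otherwise the single global schema/config value applies, otherwise the LUCENE_24 fallback used by Lucene's no-Version constructors.

```java
/**
 * Hypothetical helper (not Solr's API) illustrating the lookup order:
 * per-instance value, then global schema/config default, then LUCENE_24.
 */
final class MatchVersionResolver {
    // Lucene's deprecated no-Version constructors behave like 2.4,
    // so this is the last-resort default.
    static final String FALLBACK_VERSION = "LUCENE_24";

    private MatchVersionResolver() {}

    /**
     * @param localValue  version set on one tokenizer/filter, or null
     * @param globalValue version set once in schema.xml/solrconfig.xml, or null
     * @return the version string the factory should use
     */
    static String resolve(String localValue, String globalValue) {
        if (localValue != null) {
            return localValue;       // explicit per-instance setting wins
        }
        if (globalValue != null) {
            return globalValue;      // otherwise the one global default
        }
        return FALLBACK_VERSION;     // configs with no setting keep 2.4 behaviour
    }
}
```

With this order, a freshly shipped schema that sets the global value to the newest version affects every chain, including user-added ones, while an upgraded installation whose old config lacks the setting still resolves to LUCENE_24.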

I repeat: with instance-wise config only, nobody will ever use it for new 
analyzers. With a global default, there is only *one* place that sets the 
version, and it also applies to user-added tokenizer chains.

For the SolrCore problem: for analyzers, the idea is that the default Version 
constant is automatically passed to all tokenizers in the param map. Local 
values overwrite the key in the map. But this would only apply to analyzers. 
Other usages of Version elsewhere (QueryParser, IndexWriter) still need 
SolrCore. But we can move the SolrCoreAware to the schema classes instead of 
making every TokenFilter/Tokenizer SolrCoreAware.
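The param-map idea can be sketched as a merge in which the global default is inserted first and the factory's local arguments are copied over it, so an explicit per-instance value overwrites the key. The helper name and the use of `luceneMatchVersion` as the map key are assumptions for illustration only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: injects the global Version default into factory args. */
final class FactoryArgs {
    // Assumed key name, borrowed from the "luceneMatchVersion" property in the issue.
    static final String MATCH_VERSION_KEY = "luceneMatchVersion";

    private FactoryArgs() {}

    static Map<String, String> withDefaultVersion(Map<String, String> localArgs,
                                                  String globalVersion) {
        Map<String, String> merged = new HashMap<String, String>();
        if (globalVersion != null) {
            merged.put(MATCH_VERSION_KEY, globalVersion); // global default goes in first
        }
        merged.putAll(localArgs); // local values overwrite the default key
        return Collections.unmodifiableMap(merged);
    }
}
```

Because the schema code does the merge, individual factories only ever read their args map and do not need to be SolrCoreAware themselves.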

> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
> BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1677
>                 URL: https://issues.apache.org/jira/browse/SOLR-1677
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Uwe Schindler
>         Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
> SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
> compatibility with old indexes created using older versions of Lucene. The 
> most important example is StandardTokenizer, which changed its behaviour with 
> posIncr and incorrect host token types in 2.4 and also in 2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory, and in 3.1, with 
> much more Unicode support, almost every Tokenizer/TokenFilter needs this 
> Version parameter. In 2.9, the deprecated old ctors without Version take 
> LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base 
> factories. Subclasses can then use the luceneMatchVersion decoded enum (in 
> 3.0) / Parameter (in 2.9) for constructing TokenStreams. The code currently 
> contains a helper map to decode the version strings, but in 3.0 it can be 
> replaced by Version.valueOf(String), as Version is a Java 5 enum there. The 
> default value is Version.LUCENE_24 (as this is the default for the 
> no-version ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from 
> StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
> to match Lucene 3.0.
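To illustrate the decoding step described in the issue, here is a sketch that replaces a hand-written string-to-constant map with `Enum.valueOf`. Lucene is not on the classpath here, so `MatchVersion` is a hypothetical stand-in for `org.apache.lucene.util.Version` with only a few constants; treating a missing or blank value as `LUCENE_24` mirrors the default described above.

```java
/** Hypothetical stand-in for org.apache.lucene.util.Version, reduced for this sketch. */
enum MatchVersion {
    LUCENE_20, LUCENE_21, LUCENE_22, LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30
}

/** Hypothetical parser showing valueOf-based decoding with a LUCENE_24 default. */
final class MatchVersionParser {
    // Same default as Lucene's deprecated no-Version constructors.
    static final MatchVersion DEFAULT = MatchVersion.LUCENE_24;

    private MatchVersionParser() {}

    static MatchVersion parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT; // parameter left out
        }
        // Enum.valueOf does the lookup a helper map used to do;
        // it throws IllegalArgumentException for an unknown name.
        return MatchVersion.valueOf(value.trim());
    }
}
```

Since `valueOf` rejects unknown names, a misspelled version fails at configuration time instead of silently falling back to the default.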

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
