[ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802020#action_12802020
 ] 

Mark Miller commented on SOLR-1677:
-----------------------------------

In my opinion this should be really simple. Having to specify a Lucene version 
for each component is not simple - it's beyond most users. I think it's beyond me 
(laugh as you see fit). Having to accept Lucene 2.4 behavior by default because 
of Solr back compat issues is also "weak". A new user should get all the bug 
fixes of the latest Lucene with minimal effort. Hopefully no effort. Existing 
users should be able to move to the newest behavior with minimal effort as well, 
without having to go through each component one by one and upgrade it. I can't 
imagine juggling all these versions for each component - that's ugly enough in 
Lucene - it shouldn't infect Solr for the average case.

Personally, I do think there should be a global default. And I think right next 
to it, it should say: if you change this, you must reindex. No worries about 
action at a distance. The action is to get the latest and greatest Lucene has 
to offer rather than older buggy or back compat behavior. Reindex, get the latest 
and greatest. Don't reindex and you're on your own. Solr might rip your head off.

We should also offer per-component settings for real experts, but I wouldn't be 
meddling that way myself unless in a bind. Solr should be really simple about 
this - the latest Solr should use the latest bug fixes from Lucene, with previous 
configs out there defaulting to 2.4 compatibility.
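A minimal sketch of the resolution order argued for above, assuming a hypothetical `LuceneVersion` enum standing in for `o.a.lucene.util.Version` and a hypothetical `resolve` helper; where the global default would actually be configured is left open. A per-component `luceneMatchVersion` argument wins, then the global default, and configs that set neither fall back to 2.4 behavior:

```java
import java.util.Map;

public class MatchVersionResolver {

    // Hypothetical stand-in for o.a.lucene.util.Version; only a few constants shown.
    enum LuceneVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

    // Hypothetical resolution order: per-component override, then global default,
    // then LUCENE_24 so that existing configs keep 2.4-compatible behavior.
    static LuceneVersion resolve(Map<String, String> componentArgs, String globalDefault) {
        String perComponent = componentArgs.get("luceneMatchVersion");
        if (perComponent != null) {
            return LuceneVersion.valueOf(perComponent);  // expert, per-component setting
        }
        if (globalDefault != null) {
            return LuceneVersion.valueOf(globalDefault); // changing this means reindexing
        }
        return LuceneVersion.LUCENE_24;                  // neither set: old default
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of(), null));                                         // LUCENE_24
        System.out.println(resolve(Map.of(), "LUCENE_30"));                                  // LUCENE_30
        System.out.println(resolve(Map.of("luceneMatchVersion", "LUCENE_29"), "LUCENE_30")); // LUCENE_29
    }
}
```

With this ordering, bumping the single global value upgrades every component that has no override, so that one setting is the natural place for the "you must reindex" warning.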

I abbreviated the heck out of my arguments and thinking, but damn it, that's what 
I think :)

> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
> BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1677
>                 URL: https://issues.apache.org/jira/browse/SOLR-1677
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Uwe Schindler
>         Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
> SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
> compatibility with old indexes created using older versions of Lucene. The 
> most important example is StandardTokenizer, which changed its behaviour regarding 
> position increments and incorrectly typed host tokens in 2.4 and again in 2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
> much more Unicode support, almost every Tokenizer/TokenFilter needs this 
> Version parameter. In 2.9, the deprecated old ctors without Version take 
> LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base 
> factories. Subclasses can then use the decoded luceneMatchVersion (an enum in 
> 3.0, a Parameter in 2.9) when constructing TokenStreams. The code currently 
> contains a helper map to decode the version strings, but in 3.0 it can be 
> replaced by Version.valueOf(String), as Version is a Java 5 enum there. The 
> default value is Version.LUCENE_24 (as this is the default for the 
> no-version ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from 
> StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
> to match Lucene 3.0.
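The version handling described in the issue can be sketched as follows. This is not the patch's code: `LuceneVersion` is a hypothetical stand-in for `o.a.lucene.util.Version` (modeled as an enum for brevity, as in 3.0), and `decodeWithMap`, `decodeWithValueOf` and `useNewBehavior` are hypothetical helpers. It shows the 2.9-style string lookup through a helper map, the 3.0-style `valueOf` replacement, the `LUCENE_24` default, and how a component might gate a behavior change on the configured version:

```java
import java.util.HashMap;
import java.util.Map;

public class VersionDecoding {

    // Hypothetical stand-in for o.a.lucene.util.Version. Constants are declared in
    // release order, so compareTo() reflects older/newer.
    enum LuceneVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

    // Default when nothing is configured, mirroring Lucene's no-Version ctors.
    static final LuceneVersion DEFAULT = LuceneVersion.LUCENE_24;

    // 2.9-style: decode the configured string through a helper map.
    private static final Map<String, LuceneVersion> BY_NAME = new HashMap<>();
    static {
        BY_NAME.put("LUCENE_24", LuceneVersion.LUCENE_24);
        BY_NAME.put("LUCENE_29", LuceneVersion.LUCENE_29);
        BY_NAME.put("LUCENE_30", LuceneVersion.LUCENE_30);
    }

    static LuceneVersion decodeWithMap(String s) {
        if (s == null) return DEFAULT;
        LuceneVersion v = BY_NAME.get(s);
        if (v == null) throw new IllegalArgumentException("Unknown luceneMatchVersion: " + s);
        return v;
    }

    // 3.0-style: with a real enum, valueOf() replaces the map
    // (it also throws IllegalArgumentException for unknown names).
    static LuceneVersion decodeWithValueOf(String s) {
        return s == null ? DEFAULT : LuceneVersion.valueOf(s);
    }

    // Version gate: enable a behavior change only if the configured version is at
    // least the (hypothetical) version that introduced it.
    static boolean useNewBehavior(LuceneVersion matchVersion, LuceneVersion introducedIn) {
        return matchVersion.compareTo(introducedIn) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(decodeWithMap(null));                                             // LUCENE_24
        System.out.println(decodeWithValueOf("LUCENE_30"));                                  // LUCENE_30
        System.out.println(useNewBehavior(DEFAULT, LuceneVersion.LUCENE_29));                // false
        System.out.println(useNewBehavior(LuceneVersion.LUCENE_30, LuceneVersion.LUCENE_29)); // true
    }
}
```

Because the sketch defaults to `LUCENE_24`, a component that never sees a configured version keeps the old behavior, which is the back-compat default the comment above argues new users should not be stuck with.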

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
