[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

Robert Muir (JIRA) Tue, 05 Jan 2010 18:32:18 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796965#action_12796965
 ]


Robert Muir commented on SOLR-1677:
-----------------------------------

{quote}
No, he uses an OS where he can upgrade individual things individually with clear 
implications - he sets luceneMatchVersion="2.9" on each and every <analyzer/>, 
<tokenizer/> and <filter/> that he declares in his schema so that he knows 
exactly what behavior is changing when he modifies any of them.
{quote}

Yeah, but this isn't how Version works in Lucene either; please see below.

{quote}
I'm not advocating that we don't allow a way to specify Version, I'm saying 
that having a global value for it that affects things opaquely sounds dangerous 
- we should certainly have a way for people to specify the Version they want on 
each of the objects that care, but it shouldn't be global. The 
"luceneMatchVersion" property that Uwe added to BaseTokenizerFactory and 
BaseTokenFilterFactory in his patch seems perfect to me, it's just the 
SolrCoreAware / core.getSolrConfig().luceneMatchVersion that I think is a bad 
idea.
{quote}

And I disagree, I think that the per-tokenfilter matchVersion should be the 
expert use, with the default global Version being the standard use. 

I don't think Version is intended to let you use X.Y on one part and Y.Z on 
another and have any chance of things working. For example, it controls 
position increments in StopFilter but also in the query parser, so if you use 
wacky combinations, things might not work.

And I personally don't see anyone putting effort into supporting this either, 
because it's enough to supply back compat for previous versions, but not for 
some cross product of all possible versions. That is too much. Sometimes things 
interact in ways we cannot detect automatically (such as the query parser 
phrase query / StopFilter thing); it's my understanding that things like this 
are why Version was created in the first place.
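
To make the two positions concrete, here is a rough sketch of "global default, 
per-factory override as the expert case". It is not the code from Uwe's patch: 
the class MatchVersionSketch, its nested Version enum (a stand-in for 
o.a.lucene.util.Version with only a few constants), and the parse/resolve 
helpers are all hypothetical names made up for illustration. Only the 
"luceneMatchVersion" attribute name comes from the discussion above.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

public class MatchVersionSketch {

    // Stand-in for org.apache.lucene.util.Version (assumption: only a
    // handful of constants are listed here, for illustration).
    public enum Version { LUCENE_23, LUCENE_24, LUCENE_29 }

    // Hypothetical helper: accepts "2.9" as well as "LUCENE_29".
    // Unknown versions fail with IllegalArgumentException from valueOf.
    public static Version parse(String s) {
        String key = s.trim().toUpperCase(Locale.ROOT);
        if (!key.startsWith("LUCENE_")) {
            key = "LUCENE_" + key.replace(".", "");
        }
        return Version.valueOf(key);
    }

    // Hypothetical helper: a factory's own luceneMatchVersion argument wins
    // (the expert use); otherwise the global default applies (standard use).
    public static Version resolve(Map<String, String> factoryArgs,
                                  Version globalDefault) {
        String local = factoryArgs.get("luceneMatchVersion");
        return local != null ? parse(local) : globalDefault;
    }

    public static void main(String[] args) {
        Version global = parse("2.9");

        // A filter declared without its own version follows the global one.
        Map<String, String> plain = Collections.emptyMap();
        System.out.println(resolve(plain, global));

        // A filter that pins its own version keeps it, whatever the global is.
        Map<String, String> pinned =
            Collections.singletonMap("luceneMatchVersion", "2.4");
        System.out.println(resolve(pinned, global));
    }
}
```

With something like this, a schema that never mentions luceneMatchVersion 
moves as a unit when the global value changes, and mixing versions stays 
possible but has to be asked for explicitly.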


> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
> BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1677
>                 URL: https://issues.apache.org/jira/browse/SOLR-1677
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Uwe Schindler
>         Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
> SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
> compatibility with old indexes created using older versions of Lucene. The 
> most important example is StandardTokenizer, which changed its behaviour with 
> posIncr and incorrect host token types in 2.4 and also in 2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
> much more Unicode support, almost every Tokenizer/TokenFilter needs this 
> Version parameter. In 2.9, the deprecated old ctors without Version take 
> LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base 
> factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
> 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently 
> contains a helper map to decode the version strings, but in 3.0 it can be 
> replaced by Version.valueOf(String), as Version is a Java 5 enum there. The 
> default value is Version.LUCENE_24 (as this is the default for the 
> no-version ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from 
> StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
> to match Lucene 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
