[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

Hoss Man (JIRA) Thu, 14 Jan 2010 18:59:20 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800514#action_12800514
 ]


Hoss Man commented on SOLR-1677:
--------------------------------

{quote}
Yes. The whole point is to avoid Analyzer mismatches.

Say a stoplist was modified between Lucene versions. Sure, you can hack it
and ask for an old match version, so you get a stoplist other than the one that
was used to build the index... but why would you want to?
{quote}

...but that's no different than using StopFilter(someStopWordSet) at indexing 
and StopFilter(someOtherStopWordSet) at query time -- Solr happily lets you do 
that with its index/query analyzers ... you may have a very good reason for 
doing that.  Likewise you may have an existing field using the "default" 
stopwords list from Version.LUCENE_24 that you don't want to change because you 
want clients that search on that field to continue to get the same behavior, 
but when you add a new field you want it to have the current default stopwords 
because it's queried by entirely different clients.

That's no different than saying I want PorterStemmer on fieldA and 
SnowBall2Stemmer on fieldB.

The implication I got from Robert was that there were (or would soon be) 
expectations in Lucene-Java code that if one object was told to use Version.X 
it would be assumed that every other object in the application was using 
Version.X.

To me that's the crux of the whole issue:  If that _is_ the expectation 
Lucene-Java has, then we _should_ have a single global config for 
luceneMatchVersion and not support per-object configuration.  If that _is not_ 
the expectation, then we _should not_ have a global luceneMatchVersion.
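
To make the per-object alternative concrete, here is a minimal sketch of what 
per-field resolution could look like. Everything in it is an assumption made 
for illustration: the Version enum is a stand-in (not 
org.apache.lucene.util.Version), and MatchVersionResolver, resolve() and the 
field names are hypothetical, not code from the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

public class MatchVersionResolver {
    /** Stand-in for org.apache.lucene.util.Version; NOT the real Lucene class. */
    public enum Version { LUCENE_24, LUCENE_29 }

    /**
     * Per-object model: each field may pin its own version. Fields that do not
     * pin one get the supplied fallback (hypothetical behaviour).
     */
    public static Version resolve(Map<String, Version> perField,
                                  String fieldName,
                                  Version fallback) {
        return perField.getOrDefault(fieldName, fallback);
    }

    public static void main(String[] args) {
        Map<String, Version> perField = new HashMap<>();
        // fieldB is the newly added field and opts in to the newer behaviour.
        perField.put("fieldB", Version.LUCENE_29);

        // fieldA pins nothing, so it keeps the old behaviour via the fallback.
        System.out.println(resolve(perField, "fieldA", Version.LUCENE_24));
        System.out.println(resolve(perField, "fieldB", Version.LUCENE_24));
    }
}
```

Under the single-global model the map would not exist at all and every field 
would receive the same value, which is exactly the trade-off in question.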

> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
> BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1677
>                 URL: https://issues.apache.org/jira/browse/SOLR-1677
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Uwe Schindler
>         Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
> SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
> compatibility with old indexes created using older versions of Lucene. The 
> most important example is StandardTokenizer, which changed its behaviour with 
> posIncr and incorrect host token types in 2.4 and also in 2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
> much more Unicode support, almost every Tokenizer/TokenFilter needs this 
> Version parameter. In 2.9, the deprecated old ctors without Version take 
> LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base 
> factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
> 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently 
> contains a helper map to decode the version strings, but in 3.0 it can be 
> replaced by Version.valueOf(String), as the Version is a subclass of Java5 
> enums. The default value is Version.LUCENE_24 (as this is the default for the 
> no-version ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from 
> StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
> to match Lucene 3.0.
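
The decoding step described above (a helper map from version strings to 
Version constants, defaulting to LUCENE_24) might look roughly like the sketch 
below. It is not the code from the patch: the Version enum is a stand-in for 
org.apache.lucene.util.Version, the "luceneMatchVersion" argument key is 
assumed from the property name used in this discussion, and 
MatchVersionDecoder and decode() are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class MatchVersionDecoder {
    /** Stand-in for org.apache.lucene.util.Version; NOT the real Lucene class. */
    public enum Version { LUCENE_24, LUCENE_29 }

    /** Assumed name of the factory argument carrying the version string. */
    public static final String ARG = "luceneMatchVersion";

    /** Default when no version is given, matching the no-version ctors. */
    public static final Version DEFAULT = Version.LUCENE_24;

    /** Helper map from version strings to constants. */
    private static final Map<String, Version> BY_NAME = new HashMap<>();
    static {
        for (Version v : Version.values()) {
            BY_NAME.put(v.name(), v);
        }
    }

    /** Decodes the version from factory init args, or returns the default. */
    public static Version decode(Map<String, String> args) {
        String raw = args.get(ARG);
        if (raw == null) {
            return DEFAULT;
        }
        Version v = BY_NAME.get(raw.trim().toUpperCase(Locale.ROOT));
        if (v == null) {
            throw new IllegalArgumentException("Unknown " + ARG + ": " + raw);
        }
        return v;
    }

    public static void main(String[] argv) {
        Map<String, String> args = new HashMap<>();
        System.out.println(decode(args)); // no argument given: default applies
        args.put(ARG, "LUCENE_29");
        System.out.println(decode(args));
    }
}
```

With a real Java 5 enum the map lookup collapses to Version.valueOf(String), 
as the description notes for 3.0. Rejecting unknown strings instead of 
silently falling back is a choice made for this sketch only.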

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.