[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

Hoss Man (JIRA) Mon, 21 Dec 2009 07:20:41 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793228#action_12793228
 ]


Hoss Man commented on SOLR-1677:
--------------------------------

{quote}
* As a first hack the solrConfig schema has a new element <luceneMatchVersion> 
that contains a solr-wide default luceneMatchVersion value that is used as 
default for QueryParser, Analyzers if not specified different
* On the analyzer side, BaseTokenizerFactory and BaseTokenFilterFactory now 
extend SolrCoreAware (and I also allowed these classes to be SolrCoreAware) and 
get the SolrConfig.
{quote}

I'd really prefer that nothing like this make it into Solr.

One: we've worked pretty hard to make sure that nothing in the analysis code is 
SolrCoreAware -- the goal was to try and keep the schema related code reusable 
w/o risk of factories adding tendrils that reach deep into the other Solr code 
(it's only a matter of time until someone starts refactoring all of the schema 
related code out of Solr and into a Lucene contrib).

If we really want to add a new "global" setting for the default match version, 
it should be in schema.xml, as it pertains to the index itself and how to 
read/write to the index "properly" and not to the particularities of how a 
particular Solr installation might be using that data (schema.xml => the nature 
of the data; solrconfig.xml => the usage of the data).

Two: I really question the need for a configurable default across all analysis 
factories.  This seems like the type of thing that's going to be changed rarely 
if ever, and when it is changed each field will need to be considered very 
carefully to decide whether the "new" behavior is desired over the "old".

I suspect the only time anyone is going to upgrade all factories at once is 
when we rev lucene jars and update the example configs -- in that case (and in 
the case of a user who is happy to blow away all of their data and take the 
newest, regardless of what it is, for every analyzer) a search and replace seems 
perfectly appropriate.
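
As a rough illustration of the per-factory alternative, here is a minimal 
sketch of a factory resolving its match version purely from its own init 
args, with no SolrCore or SolrConfig involved. Everything named here is an 
assumption for illustration only: MatchVersion is a hypothetical stand-in for 
o.a.lucene.util.Version (so the sketch compiles without Lucene), and the 
"luceneMatchVersion" init arg and the resolve() helper are not taken from the 
attached patch. The LUCENE_24 default follows the issue description.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class PerFactoryMatchVersion {

    // Hypothetical stand-in for org.apache.lucene.util.Version, so this
    // sketch compiles without Lucene on the classpath.
    enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

    // Assumed init-arg name, used here for illustration only.
    static final String ARG = "luceneMatchVersion";

    // Resolve the version from the factory's own init args. Nothing here
    // reaches into SolrCore or SolrConfig.
    static MatchVersion resolve(Map<String, String> args) {
        String raw = args.get(ARG);
        if (raw == null || raw.trim().isEmpty()) {
            // Default named in the issue description (no-version ctors).
            return MatchVersion.LUCENE_24;
        }
        try {
            return MatchVersion.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unknown " + ARG + ": " + raw, e);
        }
    }

    public static void main(String[] argv) {
        // A field whose analyzer was never touched keeps the old behavior.
        Map<String, String> untouched = new HashMap<>();
        // A field that was reviewed and deliberately moved forward.
        Map<String, String> upgraded = new HashMap<>();
        upgraded.put(ARG, "LUCENE_29");

        System.out.println(resolve(untouched));
        System.out.println(resolve(upgraded));
    }
}
```

With the version carried per factory like this, each field is upgraded as a 
deliberate, separate edit, and an all-at-once upgrade is a search and replace 
over the attribute value.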


> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
> BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1677
>                 URL: https://issues.apache.org/jira/browse/SOLR-1677
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Uwe Schindler
>         Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
> SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
> compatibility with old indexes created using older versions of Lucene. The 
> most important example is StandardTokenizer, which changed its behaviour with 
> posIncr and incorrect host token types in 2.4 and also in 2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
> much more Unicode support, almost every Tokenizer/TokenFilter needs this 
> Version parameter. In 2.9, the deprecated old ctors without Version take 
> LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base 
> factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
> 3.0) / Parameter (in 2.9) for constructing TokenStreams. The code currently 
> contains a helper map to decode the version strings, but in 3.0 it can be 
> replaced by Version.valueOf(String), as the Version is a subclass of Java5 
> enums. The default value is Version.LUCENE_24 (as this is the default for the 
> no-version ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from 
> StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
> to match Lucene 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
