[ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798916#action_12798916 ]
Hoss Man commented on SOLR-1677:
--------------------------------
bq. I don't think Version is intended so you can use X.Y on this part and Y.Z
on this part and have any chance of anything working. For example, it controls
position increments in StopFilter but also in QueryParser; if you use wacky
combinations, things might not work.
How is that any different from letting users pass any Analyzer they want to the
QueryParser constructor? There's no guarantee that anything will ever work if
you do something crazy (like uppercasing all terms when indexing and lowercasing
all terms when searching), but Lucene exposes that to the developer and lets
them make the choice. Likewise, Solr happily lets you configure a query
analyzer that's completely different from your index analyzer: if that's what
you want, that's what you get. Being able to set different Version params
should be no different. If the QueryParser you are using says that version=X.Y
will only work with StopFilter if it's version=X.Y as well, that's fine, but
maybe you've solved that problem in a completely different way with an
alternate implementation of StopFilter (one that doesn't care about version). The
user should be in control.
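A minimal Java sketch of the "user should be in control" point. `MatchVersion`, `StopStage`, `QueryStage` and `IndependentVersionsDemo` are hypothetical stand-ins rather than Lucene classes, and the rule that position increments switch on at 2.9 is an assumption made for illustration. Each component takes its own version, and a mismatched pair is constructed without complaint; deciding whether the combination is sensible is left to the developer.

```java
import java.util.Objects;

/** Hypothetical stand-in for org.apache.lucene.util.Version (not the real class). */
enum MatchVersion {
    LUCENE_24, LUCENE_29, LUCENE_30;

    boolean onOrAfter(MatchVersion other) {
        return compareTo(other) >= 0;
    }
}

/** Index-side component whose behavior depends on the version it was built with. */
final class StopStage {
    final boolean enablePositionIncrements;

    StopStage(MatchVersion v) {
        // Assumption for illustration: increments are enabled from 2.9 on.
        this.enablePositionIncrements = Objects.requireNonNull(v).onOrAfter(MatchVersion.LUCENE_29);
    }
}

/** Query-side component, configured independently of the index side. */
final class QueryStage {
    final boolean enablePositionIncrements;

    QueryStage(MatchVersion v) {
        this.enablePositionIncrements = Objects.requireNonNull(v).onOrAfter(MatchVersion.LUCENE_29);
    }
}

final class IndependentVersionsDemo {
    /** True when the index and query sides agree on position increments. */
    static boolean consistent(StopStage index, QueryStage query) {
        return index.enablePositionIncrements == query.enablePositionIncrements;
    }

    public static void main(String[] args) {
        // Nothing prevents a mismatched pair; checking it is the developer's job.
        StopStage index = new StopStage(MatchVersion.LUCENE_24);
        QueryStage query = new QueryStage(MatchVersion.LUCENE_29);
        System.out.println("consistent = " + consistent(index, query));
    }
}
```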
bq. sometimes things interact in ways we cannot detect automatically
which is why I think it's a bad idea to have a global default for this:
there may be situations where people explicitly want different behavior in
different instances (i.e., in this field I want the legacy 2.4 StopFilter
behavior, but in that field I want the current 2.9 StopFilter behavior), and
having a default will mask the ability to do this and make it easy to
inadvertently break it.
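To make the per-field case concrete, here is a hypothetical Java sketch that uses no real Lucene or Solr classes: two fields pin different versions, and a naive "global default" implemented as an unconditional override silently erases the explicit 2.4 choice. The field names and the `keepsPositionGaps` rule are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for Lucene's Version enum. */
enum MatchVersion { LUCENE_24, LUCENE_29 }

final class PerFieldVersions {
    /** Assumed rule for illustration: 2.9+ stop filtering keeps position gaps. */
    static boolean keepsPositionGaps(MatchVersion v) {
        return v.compareTo(MatchVersion.LUCENE_29) >= 0;
    }

    /** Naive global default: overwrites every field's explicit choice. */
    static Map<String, MatchVersion> overrideAll(Map<String, MatchVersion> perField, MatchVersion global) {
        Map<String, MatchVersion> out = new LinkedHashMap<>(perField);
        out.replaceAll((field, v) -> global);
        return out;
    }

    public static void main(String[] args) {
        Map<String, MatchVersion> perField = new LinkedHashMap<>();
        perField.put("legacyTitle", MatchVersion.LUCENE_24); // wants 2.4 StopFilter behavior
        perField.put("body", MatchVersion.LUCENE_29);        // wants 2.9 StopFilter behavior

        Map<String, MatchVersion> flattened = overrideAll(perField, MatchVersion.LUCENE_29);
        for (String field : perField.keySet()) {
            // For "legacyTitle" the two values differ: the explicit choice was lost.
            System.out.println(field + ": explicit gaps=" + keepsPositionGaps(perField.get(field))
                    + ", after override gaps=" + keepsPositionGaps(flattened.get(field)));
        }
    }
}
```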
bq. it's my understanding that things like this are why Version was created in
the first place.
My understanding is vastly different than yours... All the discussions I
remember about it were along the lines of preventing class proliferation:
people didn't like the idea of creating StandardAnalyzer2 just because
StandardAnalyzer had some behavior that was considered buggy but couldn't be
removed. So now there is a constructor arg instead, plus static constants that
let you pick a fixed behavior, or a constant that lets you pick "current" no
matter what it is, so applications that always want the "current recommended
behavior" can just upgrade a jar and get it.
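The pattern described above (one class plus a constructor argument, instead of a `StandardAnalyzer2`) can be sketched as follows. The enum and the analyzer are hypothetical; the idea is modeled loosely on Lucene 3.0, where the "current" constant is declared after every release constant, so code that passes it picks up new behavior when the jar is upgraded.

```java
/**
 * Hypothetical model of a version enum (not Lucene's actual Version class).
 * CURRENT is declared last, so it compares as "on or after" every release.
 */
enum MatchVersion {
    LUCENE_24, LUCENE_29, LUCENE_30, CURRENT;

    boolean onOrAfter(MatchVersion other) {
        return compareTo(other) >= 0;
    }
}

/** One analyzer class whose constructor argument selects old or fixed behavior. */
final class VersionedAnalyzer {
    private final MatchVersion matchVersion;

    VersionedAnalyzer(MatchVersion matchVersion) {
        this.matchVersion = matchVersion;
    }

    /** Hypothetical behavior switch: the "fixed" behavior applies from 2.9 on. */
    boolean usesFixedBehavior() {
        return matchVersion.onOrAfter(MatchVersion.LUCENE_29);
    }

    public static void main(String[] args) {
        VersionedAnalyzer pinned = new VersionedAnalyzer(MatchVersion.LUCENE_24); // keeps old behavior
        VersionedAnalyzer latest = new VersionedAnalyzer(MatchVersion.CURRENT);   // follows upgrades
        System.out.println(pinned.usesFixedBehavior() + " " + latest.usesFixedBehavior());
    }
}
```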
But I don't remember any implication that every object was expected to
have the same Version setting as every other object. If that was the
intention, then shouldn't there be a standard interface for "Versionable" or
"VersionAware" objects so they can test compatibility with one another (i.e.,
QueryParser and Analyzers that might wrap StopFilter)? ... or a
{{public static void setCurrentOperatingVersion(Version)}} method in the Version
class, instead of letting each constructor take an independent value?
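For contrast, this is roughly what the hypothetical "VersionAware" contract mentioned above might look like if one shared Version were the intent. It is an illustration only: `VersionAware`, `compatibleWith`, `allCompatible` and `MatchVersion` are invented names, and no such interface exists in Lucene or Solr.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical version enum for this sketch. */
enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

/**
 * The kind of interface the comment asks about. It would only make sense
 * if every component were expected to share a single Version.
 */
interface VersionAware {
    MatchVersion getMatchVersion();

    /** Compatible only when both sides were built with the same version. */
    default boolean compatibleWith(VersionAware other) {
        return getMatchVersion() == other.getMatchVersion();
    }
}

final class VersionCheck {
    /** True if every component agrees with the first one (vacuously true when empty). */
    static boolean allCompatible(List<? extends VersionAware> parts) {
        for (VersionAware p : parts) {
            if (!parts.get(0).compatibleWith(p)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        VersionAware parser = () -> MatchVersion.LUCENE_29; // e.g. a query parser
        VersionAware filter = () -> MatchVersion.LUCENE_24; // e.g. a stop filter
        System.out.println(allCompatible(Arrays.asList(parser, filter)));
    }
}
```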
----
FWIW: Even though I'm still convinced that having any sort of "global" default
value for luceneMatchVersion is a bad idea (and I'm going to keep trying to
convince other people as well), I want to make some comments about how I think
it should be implemented if we do wind up doing it (just in case I get hit by a
bus).
Making the Base*Factory analysis classes SolrCoreAware is really overkill for
this. There was a real, conscious choice not to let things declared in
schema.xml be SolrCoreAware, because it pulls back the curtain and exposes a
lot of plumbing-related APIs in a way that could make it hard to refactor away
SolrCore functionality later. The list of plugin types that can be made
SolrCoreAware is deliberately small, and confined to plugins that are already
exposed to the full SolrCore API at some other time in their life cycle;
being SolrCoreAware just gives them access to the core during initialization.
If there is really going to be one uber-default global "luceneMatchVersion",
then I think the place it makes the most sense to declare it is in
schema.xml. Many different solrconfig.xml files might be used with
the same schema.xml, so if we're expecting that the "typical" behavior is to
set this once and have it just work, it should propagate from the IndexSchema
object to the SolrCore and not vice versa.
My suggestion for how to implement this would be...
# Add a new "luceneMatchVersion" attribute to the existing <schema/> tag.
# Add a new getLuceneMatchVersion() method to the IndexSchema class; SolrCore can use this to get the default.
# When init()ing new objects, add the key=>value pair {{"luceneMatchVersion"=>schema.getLuceneMatchVersion()}} to the init params passed to the object, if it's not already an init param for that particular instance.
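A rough Java sketch of these three steps, assuming a simplified stand-in for `IndexSchema` (`SchemaSketch`, `readAttribute` and `withDefaultVersion` are hypothetical names, and the version is kept as a plain string). The key detail is step 3's "only if not already set" rule, which `Map.putIfAbsent` expresses directly, so an explicit per-instance value always wins over the schema default.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

/** Hypothetical, trimmed-down stand-in for Solr's IndexSchema (not its real API). */
final class SchemaSketch {
    private final String luceneMatchVersion;

    SchemaSketch(String schemaXml) {
        // Step 1: read the proposed attribute from the <schema/> tag.
        this.luceneMatchVersion = readAttribute(schemaXml, "luceneMatchVersion");
    }

    /** Returns the root element's attribute value, or "" if it is absent. */
    private static String readAttribute(String xml, String name) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return root.getAttribute(name);
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable schema", e);
        }
    }

    /** Step 2: the schema-wide default that SolrCore (or anything else) can query. */
    String getLuceneMatchVersion() {
        return luceneMatchVersion;
    }

    /** Step 3: add the default to a plugin's init args unless it set its own value. */
    Map<String, String> withDefaultVersion(Map<String, String> initArgs) {
        Map<String, String> args = new HashMap<>(initArgs);
        if (!luceneMatchVersion.isEmpty()) {
            args.putIfAbsent("luceneMatchVersion", luceneMatchVersion);
        }
        return args;
    }

    public static void main(String[] args) {
        SchemaSketch schema = new SchemaSketch("<schema name=\"example\" luceneMatchVersion=\"LUCENE_29\"/>");
        Map<String, String> explicit = new HashMap<>();
        explicit.put("luceneMatchVersion", "LUCENE_24");
        System.out.println(schema.withDefaultVersion(new HashMap<>())); // receives the schema default
        System.out.println(schema.withDefaultVersion(explicit));        // keeps its explicit LUCENE_24
    }
}
```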
This would eliminate the need to make any of the Analysis Factories
SolrCoreAware (or even ResourceLoaderAware) just to know what the
luceneMatchVersion should be -- the Base*Factories could still contain a
{{protected Version luceneMatchVersion}} set by the base init() method that
subclasses could use as needed.
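Under that scheme, a base factory only has to read one init arg. The sketch below is hypothetical (`BaseFactorySketch`, `StopFactorySketch` and `MatchVersion` are not Solr or Lucene classes, and the position-increment rule is illustrative); it shows the base init() filling the protected field and a subclass using it without any SolrCore or ResourceLoader access.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical version enum; Lucene 3.0's Version is also a Java enum. */
enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

/** Hypothetical simplification of a Base*Factory: no SolrCore or ResourceLoader needed. */
abstract class BaseFactorySketch {
    protected Map<String, String> args;
    protected MatchVersion luceneMatchVersion;

    public void init(Map<String, String> args) {
        this.args = args;
        String v = args.get("luceneMatchVersion");
        if (v == null) {
            // Under the proposal, IndexSchema always injects a default, so this is a config error.
            throw new IllegalArgumentException("luceneMatchVersion was not supplied");
        }
        this.luceneMatchVersion = MatchVersion.valueOf(v);
    }
}

/** A subclass reads the protected field instead of parsing the argument itself. */
final class StopFactorySketch extends BaseFactorySketch {
    boolean enablePositionIncrements() {
        // Illustrative rule only: the 2.9 behavior turns position increments on.
        return luceneMatchVersion.compareTo(MatchVersion.LUCENE_29) >= 0;
    }

    public static void main(String[] argv) {
        Map<String, String> initArgs = new HashMap<>();
        initArgs.put("luceneMatchVersion", "LUCENE_29");
        StopFactorySketch factory = new StopFactorySketch();
        factory.init(initArgs);
        System.out.println(factory.enablePositionIncrements());
    }
}
```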
NOTE: This still doesn't solve the "Analyzers must have no-arg
constructors" part of the issue, but it doesn't make it worse. We can make
IndexSchema pass this.getLuceneMatchVersion() to any Analyzer with a single-arg
"Version" constructor fairly easily. If/when we provide a more general
mechanism for passing constructor args to Analyzers, any Version params could
be defaulted just like with the factory init() methods.
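The "single-arg Version constructor" case could be handled with plain Java reflection, roughly as below. This is a sketch under assumptions: `MatchVersion`, `AnalyzerInstantiator`, `VersionCtorAnalyzer` and `NoArgAnalyzer` are hypothetical, and real code would look up `org.apache.lucene.util.Version` instead of the stand-in enum.

```java
/** Hypothetical version enum standing in for org.apache.lucene.util.Version. */
enum MatchVersion { LUCENE_24, LUCENE_29 }

/** Example analyzer exposing a public (Version) constructor (hypothetical). */
class VersionCtorAnalyzer {
    final MatchVersion matchVersion;

    public VersionCtorAnalyzer(MatchVersion matchVersion) {
        this.matchVersion = matchVersion;
    }
}

/** Example analyzer with only a public no-arg constructor (hypothetical). */
class NoArgAnalyzer {
    public NoArgAnalyzer() {
    }
}

final class AnalyzerInstantiator {
    /**
     * Prefer a public single-arg (MatchVersion) constructor and pass the schema
     * default to it; otherwise fall back to the public no-arg constructor.
     */
    static <T> T instantiate(Class<T> cls, MatchVersion schemaDefault) {
        try {
            try {
                return cls.getConstructor(MatchVersion.class).newInstance(schemaDefault);
            } catch (NoSuchMethodException noVersionCtor) {
                return cls.getConstructor().newInstance();
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        VersionCtorAnalyzer a = instantiate(VersionCtorAnalyzer.class, MatchVersion.LUCENE_29);
        NoArgAnalyzer b = instantiate(NoArgAnalyzer.class, MatchVersion.LUCENE_29);
        System.out.println(a.matchVersion + " " + (b != null));
    }
}
```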
> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and
> BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
> Key: SOLR-1677
> URL: https://issues.apache.org/jira/browse/SOLR-1677
> Project: Solr
> Issue Type: Sub-task
> Components: Schema and Analysis
> Reporter: Uwe Schindler
> Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch,
> SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards
> compatibility with old indexes created using older versions of Lucene. The
> most important example is StandardTokenizer, which changed its behaviour with
> posIncr and incorrect host token types in 2.4 and also in 2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with
> much more Unicode support, almost every Tokenizer/TokenFilter needs this
> Version parameter. In 2.9, the deprecated old ctors without Version take
> LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base
> factories. Subclasses can then use the decoded luceneMatchVersion (an enum in
> 3.0, a Parameter in 2.9) to construct TokenStreams. The code currently
> contains a helper map to decode the version strings, but in 3.0 it can be
> replaced by Version.valueOf(String), since Version is a Java 5 enum. The
> default value is Version.LUCENE_24 (as this is the default for the
> no-version ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from
> StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed
> to match Lucene 3.0.
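The two decoding strategies from the issue description can be compared side by side in a small sketch: an explicit lookup table (needed in 2.9, where Version is a Parameter rather than an enum) versus the enum's built-in `valueOf` (possible in 3.0), both falling back to LUCENE_24 when no value is given. `MatchVersion` and `VersionParsing` are hypothetical names, not the patch's actual code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical enum; in Lucene 3.0, Version is a Java 5 enum, so valueOf works on it. */
enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

final class VersionParsing {
    /** Default described for the patch: match the no-Version constructors. */
    static final MatchVersion DEFAULT = MatchVersion.LUCENE_24;

    /** 2.9-style: without an enum, version names must be mapped by hand. */
    private static final Map<String, MatchVersion> BY_NAME = new HashMap<>();
    static {
        BY_NAME.put("LUCENE_24", MatchVersion.LUCENE_24);
        BY_NAME.put("LUCENE_29", MatchVersion.LUCENE_29);
        BY_NAME.put("LUCENE_30", MatchVersion.LUCENE_30);
    }

    static MatchVersion parseWithMap(String s) {
        if (s == null) {
            return DEFAULT;
        }
        MatchVersion v = BY_NAME.get(s);
        if (v == null) {
            throw new IllegalArgumentException("Unknown luceneMatchVersion: " + s);
        }
        return v;
    }

    /** 3.0-style: the enum's built-in valueOf replaces the helper map. */
    static MatchVersion parseWithValueOf(String s) {
        // Enum.valueOf also throws IllegalArgumentException for unknown names.
        return s == null ? DEFAULT : MatchVersion.valueOf(s);
    }

    public static void main(String[] args) {
        System.out.println(parseWithMap("LUCENE_29") + " " + parseWithValueOf("LUCENE_29"));
        System.out.println(parseWithValueOf(null));
    }
}
```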
--
This message is automatically generated by JIRA.