[ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798916#action_12798916 ]
Hoss Man commented on SOLR-1677:
--------------------------------

bq. I don't think Version is intended so you can use X.Y on this part and Y.Z on this part and have any chance of anything working, for example it controls position increments on stopfilter but also in queryparser, if you use wacky combinations, things might not work.

How is that any different from letting users pass any Analyzer they want to the QueryParser constructor? There's no guarantee that anything will ever work if you do something crazy (like uppercase all terms when indexing, and lowercase all terms when searching). But Lucene exposes that to the developer and lets them make the choice -- likewise Solr happily lets you configure a query analyzer that's completely different from your index analyzer -- if that's what you want, that's what you get: being able to set different Version params should be no different.

If the QueryParser you are using says that version=X.Y will only work with StopFilter if it's version=X.Y as well, that's fine -- but maybe you've solved that problem a completely different way with a completely alternate implementation of StopFilter (that doesn't care about version). The user should be in control.

bq. sometimes things interact in ways we cannot detect automatically

which is why I think it's a bad idea to have a global default for this ... there may be situations where people explicitly want different behavior in different instances (ie: in this field I want the legacy 2.4 StopFilter behavior, but in this field I want the current 2.9 StopFilter behavior) and having a default will mask the ability to do this, and make it easy to inadvertently break it.

bq. its my understanding that things like this are why Version was created in the first place.

My understanding is vastly different than yours ...
All the discussions I remember about it were along the lines of preventing Class proliferation -- that people didn't like the idea of creating StandardAnalyzer2 just because StandardAnalyzer had some behavior that was considered buggy but couldn't be removed -- so now there is a constructor arg instead, and static constants that let you pick a fixed behavior, or a constant that lets you pick "current" no matter what it is -- so applications that always want the "current recommended behavior" can just upgrade a jar and get it.

But I don't remember any implication that it was expected that every object would have the same Version settings as every other object -- if that was the intention then shouldn't there be a standard interface for "Versionable" or "VersionAware" objects so they can test compatibility with one another (ie: QueryParser and Analyzers that might wrap StopFilter)? ... or a {{public static void setCurrentOperatingVersion(Version)}} method in the Version class, instead of letting each constructor take in an independent value?

----

FWIW: Even though I'm still convinced that having any sort of "global" default value for luceneMatchVersion is a bad idea -- and I'm going to keep trying to convince other people as well -- I want to make some comments about how I think it should be implemented if we do wind up doing it (just in case I get hit by a bus).

Making the Base*Factory analysis classes SolrCoreAware is really overkill for this -- there was a real conscious choice not to let things declared in schema.xml be SolrCoreAware, because it pulls back the curtain and exposes a lot of plumbing related APIs in a way that could make it hard to refactor away SolrCore functionality later. The list of plugin types that can be made SolrCoreAware is deliberately small, and confined to plugins that are already exposed to the full SolrCore API at some other time in their life cycle -- being SolrCoreAware just gives them access to the core during initialization.
If there is really going to be one uber-default global "luceneMatchVersion" then I think the place it makes the most sense to declare something like this is in the schema.xml -- many different solrconfig.xml files might be used with the same schema.xml, so if we're expecting that the "typical" behavior is to set this once and have it just work, it should propagate from the IndexSchema object to the SolrCore and not vice-versa.

My suggestion for how to implement this would be...

# Add a new "luceneMatchVersion" attribute to the existing <schema/> tag.
# Add a new getLuceneMatchVersion() to the IndexSchema class ... SolrCore can use this to get the default.
# When init()ing new objects, include the key=>value pair of {{"luceneMatchVersion"=>schema.getLuceneMatchVersion()}} in the args passed to the init method of the object if it's not already an init param for that particular instance.

This would eliminate the need to make any of the Analysis Factories SolrCoreAware (or even ResourceLoaderAware) just to know what the luceneMatchVersion should be -- the Base*Factories could still contain a {{protected Version luceneMatchVersion}} set by the base init() method that subclasses could use as needed.

NOTE: This still doesn't solve the "Analyzers must have no-arg constructors" part of the issue -- but it doesn't make it worse. We can make IndexSchema pass this.getLuceneMatchVersion() to any Analyzer with a single arg "Version" constructor fairly easily. If/When we provide a more general mechanism for passing constructor args to Analyzers, any Version params could be defaulted just like with the factory init() methods.
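To make the third step of the suggestion concrete, here is a minimal sketch of the "only fill in the default if the instance didn't set it" logic. It is not taken from any patch on this issue: {{SchemaStub}} stands in for IndexSchema with the proposed getLuceneMatchVersion() method, {{MatchVersionDefaults}} and {{withDefault}} are hypothetical names, and the version is carried as a plain String (the "LUCENE_29" style values are only illustrative) so the example stays free of Lucene and Solr dependencies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for IndexSchema with the proposed
// getLuceneMatchVersion() accessor (an assumption, not existing API).
final class SchemaStub {
    private final String luceneMatchVersion;

    SchemaStub(String luceneMatchVersion) {
        this.luceneMatchVersion = luceneMatchVersion;
    }

    String getLuceneMatchVersion() {
        return luceneMatchVersion;
    }
}

// Hypothetical helper showing how init args could be defaulted.
final class MatchVersionDefaults {
    static final String KEY = "luceneMatchVersion";

    private MatchVersionDefaults() {}

    // Returns a copy of the per-instance init args, adding the schema-wide
    // value only when the instance did not declare its own.
    static Map<String, String> withDefault(Map<String, String> args, SchemaStub schema) {
        Map<String, String> result = new HashMap<>(args);
        if (!result.containsKey(KEY)) {
            result.put(KEY, schema.getLuceneMatchVersion());
        }
        return result;
    }
}
```

Under these assumptions, a factory declared with an explicit luceneMatchVersion="LUCENE_24" keeps that value, while one declared without the attribute picks up whatever the schema says. The caller's original args map is left untouched, so the explicit per-instance setting always wins over the default.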
> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1677
>                 URL: https://issues.apache.org/jira/browse/SOLR-1677
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Uwe Schindler
>         Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards compatibility with old indexes created using older versions of Lucene. The most important example is StandardTokenizer, which changed its behaviour with posIncr and incorrect host token types in 2.4 and also in 2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with much more Unicode support, almost every Tokenizer/TokenFilter needs this Version parameter. In 2.9, the deprecated old ctors without Version take LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base factories. Subclasses then can use the luceneMatchVersion decoded enum (in 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently contains a helper map to decode the version strings, but in 3.0 it can be replaced by Version.valueOf(String), as the Version is a subclass of Java5 enums. The default value is Version.LUCENE_24 (as this is the default for the no-version ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed to match Lucene 3.0.