[ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796937#action_12796937 ]
Hoss Man commented on SOLR-1677:
--------------------------------

bq. Version applies to all of lucene (even more than tokenstreams), so for Carl to imply that you don't need to reindex by bumping Version simply because you aren't using X or Y or Z, for that he should be renamed Oscar.

Ok, fair enough ... i was supposing in that example that since i called it {{<luceneAnalyzerVersionDefault/>}} it was clearly specific to analysis objects in schema.xml and didn't affect any of the other things Version is used for (which would be specified in solrconfig.xml)

bq. i guess he is probably using Windows 3.1 still too because he doesn't want to upgrade ever.

No, he uses an OS where he can upgrade individual things individually with clear implications -- he sets {{luceneMatchVersion="2.9"}} on each and every {{<analyzer/>}}, {{<tokenizer/>}} and {{<filter/>}} that he declares in his schema so that he knows exactly what behavior is changing when he modifies any of them.

bq. personally I don't want all users to be stuck with Version.LUCENE_24 forever.

I still must be missing something? ... why would all users be stuck with Version.LUCENE_24 forever?

I'm not advocating that we don't allow a way to specify Version, i'm saying that having a global value for it that affects things opaquely sounds dangerous -- we should certainly have a way for people to specify the Version they want on each of the objects that care, but it shouldn't be global.

The "luceneMatchVersion" property that Uwe added to BaseTokenizerFactory and BaseTokenFilterFactory in his patch seems perfect to me, it's just the {{SolrCoreAware}} / {{core.getSolrConfig().luceneMatchVersion}} that i think is a bad idea.
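To make the per-component idea concrete, here is a minimal sketch of a factory that takes its version only from its own init args and never from a global config. It is not code from the patch: {{MatchVersion}} is a hypothetical stand-in for {{org.apache.lucene.util.Version}}, and {{SketchFilterFactory}}, its {{init}} signature and {{parse}} helper are names assumed for illustration.

```java
import java.util.Map;

/** Hypothetical stand-in for org.apache.lucene.util.Version. */
enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31 }

/** Hypothetical factory: the version comes from this component's own args only. */
class SketchFilterFactory {
    /** Back-compat default used when the config does not set the property. */
    static final MatchVersion DEFAULT_VERSION = MatchVersion.LUCENE_24;

    private MatchVersion matchVersion = DEFAULT_VERSION;

    /** Reads "luceneMatchVersion" from the init args, if present. */
    void init(Map<String, String> args) {
        String raw = args.get("luceneMatchVersion");
        if (raw != null) {
            matchVersion = parse(raw);
        }
    }

    MatchVersion getMatchVersion() {
        return matchVersion;
    }

    /** Maps a string such as "2.9" to the constant LUCENE_29. */
    static MatchVersion parse(String raw) {
        String name = "LUCENE_" + raw.trim().replace(".", "");
        try {
            return MatchVersion.valueOf(name);
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unknown luceneMatchVersion: " + raw, e);
        }
    }
}
```

With this shape, changing the value on one {{<filter/>}} changes only that filter, and a declaration that omits the property keeps the 2.4 behavior.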
If we modify the <analyzer/> initialization to allow constructor args as Erik suggested (I'm pretty sure there's already code in Solr to do this, we just aren't using it for Analyzers) then we should be good to go for everything in schema.xml

If anything declared in solrconfig.xml starts caring about Version (QParser, SolrIndexWriter, etc...) then likewise it should get a "luceneMatchVersion" init property as well.

No one will ever be "stuck" with LUCENE_24, but they won't be surprised by behavior changes either.

bq. If we do not have a default, all users will keep stuck with lucene 2.4, because they do not care about version (it is not required, because it defaults to 2.4 for BW compatibility). So lots of configs will never use the new unicode features of Lucene 3.1.

I don't believe that. Almost every solr user on the planet starts with the example configs. If the example configs start specifying "luceneMatchVersion=2.9" on every analyzer and factory then people will care about Version just as much as they care about the stopwords.txt file that ships with solr -- that may be not at all, or it may be a lot, but it will be up to them, and it will be obvious to them, because it's right there in the declaration where they can see it, and easy for them to reference and recognize that changing that value will affect things.

bq. If you really do not want to have a default version in config (not schema, because it applies to all lucene components), then you should go the way like Lucene 3.0: Require a matchVersion for all components.

I'm totally on board with that idea in the long run -- but there are ways to get there gradually that are back compatible with existing configs.
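Returning to the constructor-argument idea for {{<analyzer/>}} above, the following is a rough sketch of how init code could prefer a constructor that accepts a version and fall back to a no-arg one. It does not reflect Solr's actual loader: {{SketchVersion}} is a hypothetical stand-in for {{org.apache.lucene.util.Version}}, and {{AnalyzerLoaderSketch}}, {{VersionedAnalyzer}} and {{LegacyAnalyzer}} are made-up names.

```java
import java.lang.reflect.Constructor;

/** Hypothetical stand-in for org.apache.lucene.util.Version. */
enum SketchVersion { LUCENE_24, LUCENE_29 }

/** Hypothetical analyzer whose constructor accepts a version. */
class VersionedAnalyzer {
    final SketchVersion version;

    public VersionedAnalyzer(SketchVersion version) {
        this.version = version;
    }
}

/** Hypothetical analyzer that only has a no-arg constructor. */
class LegacyAnalyzer {
    public LegacyAnalyzer() {
    }
}

class AnalyzerLoaderSketch {
    /**
     * Instantiates clazz, passing the version when a matching public
     * constructor exists and using the no-arg constructor otherwise.
     */
    static Object create(Class<?> clazz, SketchVersion version) {
        try {
            try {
                Constructor<?> ctor = clazz.getConstructor(SketchVersion.class);
                return ctor.newInstance(version);
            } catch (NoSuchMethodException e) {
                // No version-aware constructor: keep the old behavior.
                return clazz.getConstructor().newInstance();
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + clazz.getName(), e);
        }
    }
}
```

Existing declarations that name a class without a version-aware constructor keep working unchanged, which is the back-compatibility property being argued for here.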
Individual factories that care about luceneMatchVersion should absolutely start warning on startup, if it is unset (or doesn't match the current value of Version.LUCENE_CURRENT), that newer/better behavior may be available and that users should set luceneMatchVersion, and provide a URL for a wiki page somewhere where more detail is available. The Analyzer init code can do likewise if it sees an {{<analyzer class=.../>}} being inited w/ a constructor that takes in a "Version" which is using an "old" value.

> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1677
>                 URL: https://issues.apache.org/jira/browse/SOLR-1677
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Uwe Schindler
>         Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards compatibility with old indexes created using older versions of Lucene. The most important example is StandardTokenizer, which changed its behaviour with posIncr and incorrect host token types in 2.4 and also in 2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with much more Unicode support, almost every Tokenizer/TokenFilter needs this Version parameter. In 2.9, the deprecated old ctors without Version take LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base factories. Subclasses then can use the luceneMatchVersion decoded enum (in 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently contains a helper map to decode the version strings, but in 3.0 it can be replaced by Version.valueOf(String), as the Version is a subclass of Java5 enums.
> The default value is Version.LUCENE_24 (as this is the default for the no-version ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed to match Lucene 3.0.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.