[ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796087#action_12796087 ]
Hoss Man commented on SOLR-1677:
--------------------------------

bq. The problem is the default value. If you leave out the version parameter instance-wise, you will get 2.4. And because of that all solr users will get stuck with that version and will never upgrade (because they leave the default and do not specify a different value).

That feels like a misleading statement ... the "Version" property on these objects is really more about getting the "recommended" behavior as of a particular version of Lucene ... saying that users will be "stuck with that version" is like saying users will be "stuck with StandardAnalyzer" instead of getting "NewHotnessAnalyzer" because they have to edit their config to use the newer/better analyzer -- Lucene-Java has opted to use a Version property on existing classes instead of adding new classes, but it's still conceptually the same thing: they get the behavior they've always gotten, unless they change their config to get something different.

Besides which: 99.9% of Solr users copy the example config when they first start using Solr: we can set a "version" property on every Analyzer/Factory used in the example schema.xml and update them all when we upgrade the Lucene jars just as easily as we can update a single "global" value (it's a search+replaceAll instead of a search+replace).

bq. Why are you so against a default value?

My concern is that it introduces action at a distance -- and not in a good way. Here's the scenario that seems guaranteed to happen quite a bit if we add some new {{<luceneAnalyzerVersionDefault/>}} syntax to schema.xml...

{panel}
{{<luceneAnalyzerVersionDefault>2.9</luceneAnalyzerVersionDefault>}} is added to the example schema.xml, and users start using it as a result of copying/modifying the example configs.

Time passes, new bugs are fixed, and the example configs evolve to contain {{<luceneAnalyzerVersionDefault>3.4</luceneAnalyzerVersionDefault>}}.

A little while after that, User Bob emails solr-user with a question like...

{quote}
Hey, I'm using FooTokenFilterFactory and i noticed that at query time i see behaviorX when it really seems like i should see BehaviorY
{quote}

User Carl helpfully replies...

{quote}
That was identified as a bug with FooTokenFilter that was fixed in Lucene 3.1, but the default behavior was left as is for back-compatibility. If you change your {{<luceneAnalyzerVersionDefault/>}} value to 3.1 (or 3.2) you'll get the newer/better behavior -- but if you used FooTokenFilterFactory in an _index_ analyzer you'll need to reindex.
{quote}

Bob makes the change to 3.2 that Carl recommended, and is happy to see that his queries now work. He only uses FooTokenFilterFactory at _query_ time, so he doesn't bother to reindex, and everything seems fine.

What Bob doesn't realize (and what Carl wasn't aware of) is that elsewhere in his schema.xml file, Bob is also using the YakTokenizerFactory on a different field (yakField), and the behavior of the YakTokenizer changed in Lucene 3.0. Now _some_ documents/queries that use yakField are failing -- and *failing silently.*
{panel}

Things just get a lot simpler when all of the configuration for an Analyzer, TokenizerFactory, or Tokenizer is explicit in its declaration -- indirect initialization is fine, as long as it's obvious. E.g.: <field/> declarations referencing fieldTypes by name -- it's easy to fuck up a bunch of fields by making a single change to one fieldType, but at least you can grep for the name of the fieldType to see all the fields you are affecting.

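To make the Bob/Carl scenario concrete, here is a minimal Java sketch of how an effective version might be resolved when a global default exists. It is only a model of the argument: the class and method names are hypothetical, versions are plain strings, and nothing here is an actual Solr or Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the scenario above; not Solr or Lucene code.
final class VersionDefaultSketch {

    /** A factory's own version wins; otherwise it silently inherits the global default. */
    static String effectiveVersion(String explicitVersion, String globalDefault) {
        return explicitVersion != null ? explicitVersion : globalDefault;
    }

    /** Resolves every factory (name -> explicit version, or null) against the global default. */
    static Map<String, String> resolve(Map<String, String> factories, String globalDefault) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : factories.entrySet()) {
            resolved.put(e.getKey(), effectiveVersion(e.getValue(), globalDefault));
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Bob's schema as copied from the example: no factory declares a version (null).
        Map<String, String> implicit = new LinkedHashMap<>();
        implicit.put("FooTokenFilterFactory", null);
        implicit.put("YakTokenizerFactory", null);

        // Raising the global default moves BOTH factories, including the one Bob forgot about.
        System.out.println("global 2.9: " + resolve(implicit, "2.9"));
        System.out.println("global 3.2: " + resolve(implicit, "3.2"));

        // With explicit per-factory versions, the same change touches nothing by accident.
        Map<String, String> explicit = new LinkedHashMap<>();
        explicit.put("FooTokenFilterFactory", "3.2");
        explicit.put("YakTokenizerFactory", "2.9");
        System.out.println("explicit:   " + resolve(explicit, "3.4"));
    }
}
```

In the implicit case the only way to know what a change to the global value affects is to inspect every factory; in the explicit case the affected declarations can be found with a grep.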
Even if "Carl" knows/remembers to warn "Bob" that changing {{<luceneAnalyzerVersionDefault/>}} might change/break other things in his schema.xml, the situation doesn't get much better: unless Bob (or Carl) skims the code for every Analyzer, Tokenizer, and TokenFilter used in Bob's schema, they can't be sure what might get affected by making a small increase to the "global" luceneAnalyzerVersion setting ... which means the only safe thing for Bob to do is to set the property individually in the one place he really wants to make the change.

So why have the "global" in the first place? It really just seems like more trouble than it's worth.

> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and
> BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
> Key: SOLR-1677
> URL: https://issues.apache.org/jira/browse/SOLR-1677
> Project: Solr
> Issue Type: Sub-task
> Components: Schema and Analysis
> Reporter: Uwe Schindler
> Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch,
> SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards
> compatibility with old indexes created using older versions of Lucene. The
> most important example is StandardTokenizer, which changed its behaviour with
> posIncr and incorrect host token types in 2.4 and also in 2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with
> much more Unicode support, almost every Tokenizer/TokenFilter needs this
> Version parameter. In 2.9, the deprecated old ctors without Version take
> LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base
> factories. Subclasses then can use the luceneMatchVersion decoded enum (in
> 3.0) / Parameter (in 2.9) for constructing TokenStreams. The code currently
> contains a helper map to decode the version strings, but in 3.0 it can be
> replaced by Version.valueOf(String), as the Version is a subclass of Java5
> enums. The default value is Version.LUCENE_24 (as this is the default for the
> no-version ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from
> StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed
> to match Lucene 3.0.
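The version decoding mentioned in the issue description (a helper map for version strings, with LUCENE_24 as the fallback) might look roughly like the sketch below. This is not the code from the attached patch: the enum is a stand-in for o.a.lucene.util.Version so the example compiles without Lucene, and the "2.4"-style string keys and the `decode` method name are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not taken from SOLR-1677.patch.
final class VersionDecodeSketch {

    // Stand-in for org.apache.lucene.util.Version (assumption: only three constants modelled).
    enum Version { LUCENE_24, LUCENE_29, LUCENE_30 }

    // The default named in the issue description for factories that declare no version.
    static final Version DEFAULT_VERSION = Version.LUCENE_24;

    // Helper map from config strings to constants; the "2.4"-style keys are hypothetical.
    private static final Map<String, Version> BY_STRING = new HashMap<>();
    static {
        BY_STRING.put("2.4", Version.LUCENE_24);
        BY_STRING.put("2.9", Version.LUCENE_29);
        BY_STRING.put("3.0", Version.LUCENE_30);
    }

    /** Decodes a configured version string, falling back to the default when none is given. */
    static Version decode(String value) {
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_VERSION;
        }
        String key = value.trim();
        Version mapped = BY_STRING.get(key);
        if (mapped != null) {
            return mapped;
        }
        // Enum-name form such as "LUCENE_29"; throws IllegalArgumentException if unknown.
        return Version.valueOf(key);
    }

    public static void main(String[] args) {
        System.out.println(decode(null));
        System.out.println(decode("2.9"));
        System.out.println(decode("LUCENE_30"));
    }
}
```

Rejecting unknown strings with an exception, rather than quietly falling back to the default, is a deliberate choice in this sketch: a mistyped version would otherwise change analysis behavior without any visible error.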