[ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805167#action_12805167 ]
Hoss Man commented on SOLR-1677:
--------------------------------

bq. And here are the JIRA issues for stemming bugs, since you didnt take my hint to go and actually read them.

sigh. I read both those issues when you filed them, and I agreed with your assessment that they are bugs we should fix -- if I had thought you were wrong I would have said so in the issue comments.

But that doesn't change the fact that sometimes people depend on buggy behavior -- and sometimes those people depend on the buggy behavior without even realizing it. Bug fixes in a stemmer might make it more correct according to the stemmer algorithm specification, or the language semantics, but in some peculiar use cases an application might find the "correct" implementation less useful than the previous buggy version.

This is one reason why things like CHANGES.txt are important: to draw attention to what has changed between two versions of a piece of software, so people can make informed decisions about what they should test in their own applications when they upgrade things under the covers.

luceneMatchVersion should be no different. We should try to find a simple way to inform people "when you switch from luceneMatchVersion=X to luceneMatchVersion=Y here are the bug fixes you will get" so they know what to test to determine if they are adversely affected by that bug fix in some way (and find their own workaround).

bq. Perhaps you should come up with a better example than stemming, as you don't know what you are talking about.

1) It's true, I frequently don't know what I'm talking about ... this issue was a prime example, and I thank you, Uwe, and Miller for helping me realize that I was completely wrong in my understanding about the intended purpose of o.a.l.Version, and that a global setting for it in Solr makes total sense -- but that doesn't make my concerns about documenting the effects of that global setting any less valid.

2) Perhaps you should read the StopFilter example I already posted in my last comment...

{quote}
bq. Robert mentioned in an earlier comment that StopFilter's position increment behavior changes depending on the luceneMatchVersion -- what if an existing Solr 1.3 user notices a bug in some Tokenizer, and adds {{<luceneMatchVersion>3.0</luceneMatchVersion>}} to his schema.xml to fix it. Without clear documentation on _everything_ that is affected when doing that, he may not realize that StopFilter changed at all -- and even though the position increment behavior may now be more correct, it might drastically change the results he gets when using dismax with a particular qs or ps value. Hence my point that this becomes a serious documentation concern: finding a way to make it clear to users what they need to consider when modifying luceneMatchVersion.
{quote}

> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1677
>                 URL: https://issues.apache.org/jira/browse/SOLR-1677
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Uwe Schindler
>         Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards
> compatibility with old indexes created using older versions of Lucene. The
> most important example is StandardTokenizer, which changed its behaviour with
> posIncr and incorrect host token types in 2.4 and also in 2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with
> much more Unicode support, almost every Tokenizer/TokenFilter needs this
> Version parameter. In 2.9, the deprecated old ctors without Version take
> LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base
> factories. Subclasses can then use the decoded luceneMatchVersion enum (in
> 3.0) / parameter (in 2.9) for constructing TokenStreams. The code currently
> contains a helper map to decode the version strings, but in 3.0 it can be
> replaced by Version.valueOf(String), as Version is a Java 5 enum there.
> The default value is Version.LUCENE_24 (as this is the default for the
> no-version ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from
> StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed
> to match Lucene 3.0.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
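
For readers following the issue summary, here is a rough sketch of what "a helper map to decode the version strings" with a LUCENE_24 default might look like. It is not the code from the attached patch: the class name `VersionDecoderSketch`, the nested `MatchVersion` enum (a stand-in for o.a.lucene.util.Version), the accepted string forms ("2.4", "2.9", "3.0"), and the error handling are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; none of these names come from the SOLR-1677 patch.
final class VersionDecoderSketch {

    // Hypothetical stand-in for org.apache.lucene.util.Version, limited to the
    // versions mentioned in the issue.
    enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

    // Helper map from dotted config strings to constants (assumed string forms).
    private static final Map<String, MatchVersion> BY_NAME = new LinkedHashMap<>();
    static {
        BY_NAME.put("2.4", MatchVersion.LUCENE_24);
        BY_NAME.put("2.9", MatchVersion.LUCENE_29);
        BY_NAME.put("3.0", MatchVersion.LUCENE_30);
    }

    // Decodes a configured luceneMatchVersion string.
    // A missing or blank value falls back to LUCENE_24, mirroring the default
    // described in the issue summary.
    static MatchVersion parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return MatchVersion.LUCENE_24;
        }
        String key = raw.trim().toUpperCase(Locale.ROOT);
        MatchVersion fromMap = BY_NAME.get(key);
        if (fromMap != null) {
            return fromMap;
        }
        try {
            // Assumption: enum constant names such as "LUCENE_29" are also accepted,
            // which is what an enum valueOf(String) lookup would give.
            return MatchVersion.valueOf(key);
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "Unknown luceneMatchVersion '" + raw + "', expected one of " + BY_NAME.keySet());
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("3.0"));
        System.out.println(parse(null));
    }
}
```

Failing loudly on an unrecognised string, rather than silently falling back to the default, is a design choice of this sketch; it keeps a typo in the configuration from quietly changing analysis behaviour.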