[ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802612#action_12802612 ]
Robert Muir commented on SOLR-1677: ----------------------------------- Hi Hoss Man, I think I am slightly offended with some of your statements about 'subjective opinion of the Lucene Community' and 'they should do relevancy testing which use some language-specific stemmer whose behavior changed in a small but significant way'. I've personally restricted my contributions of language support to those I have either personally relevance tested, or developing from published relevance results. These results are all listed on each JIRA ticket (MAP values and such). I can give you a list of all these issues if you want. As far as changing stemmers, we have never done this. The only "stemmer changing" I have proposed is fixing bugs, where I have taken the snowball test data and found either bugs in snowball or duplicate implementations we have in our own source tree. And to "fix the bugs" I have only proposed that we simply use snowball itself rather than some duplicate, buggy hand-coded implementatation. So I'm a little confused about what you are referring to... some theoretical situation? > Add support for o.a.lucene.util.Version for BaseTokenizerFactory and > BaseTokenFilterFactory > ------------------------------------------------------------------------------------------- > > Key: SOLR-1677 > URL: https://issues.apache.org/jira/browse/SOLR-1677 > Project: Solr > Issue Type: Sub-task > Components: Schema and Analysis > Reporter: Uwe Schindler > Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, > SOLR-1677.patch > > > Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards > compatibility with old indexes created using older versions of Lucene. The > most important example is StandardTokenizer, which changed its behaviour with > posIncr and incorrect host token types in 2.4 and also in 2.9. > In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with > much more Unicode support, almost every Tokenizer/TokenFilter needs this > Version parameter. In 2.9, the deprecated old ctors without Version take > LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer. > This patch adds basic support for the Lucene Version property to the base > factories. Subclasses then can use the luceneMatchVersion decoded enum (in > 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently > contains a helper map to decode the version strings, but in 3.0 is can be > replaced by Version.valueOf(String), as the Version is a subclass of Java5 > enums. The default value is Version.LUCENE_24 (as this is the default for the > no-version ctors in Lucene). > This patch also removes unneeded conversions to CharArraySet from > StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed > to match Lucene 3.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.