[ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796937#action_12796937
 ] 

Hoss Man commented on SOLR-1677:
--------------------------------



bq. Version applies to all of lucene (even more than tokenstreams), so for Carl 
to imply that you don't need to reindex by bumping Version simply because you 
aren't using X or Y or Z, for that he should be renamed Oscar.

Ok, fair enough ... I was supposing in that example that since I called it 
{{<luceneAnalyzerVersionDefault/>}}, it was clearly specific to analysis objects 
in schema.xml and didn't affect any of the other things Version is used for 
(which would be specified in solrconfig.xml).

bq. i guess he is probably using Windows 3.1 still too because he doesn't want 
to upgrade ever.

No, he uses an OS where he can upgrade individual components separately, with clear 
implications -- he sets {{luceneMatchVersion="2.9"}} on each and every 
{{<analyzer/>}}, {{<tokenizer/>}} and {{<filter/>}} that he declares in his 
schema, so that he knows exactly what behavior is changing when he modifies any 
of them.

bq. personally I don't want all users to be stuck with Version.LUCENE_24 
forever. 

I must still be missing something ... why would all users be stuck with 
Version.LUCENE_24 forever?

I'm not arguing against having a way to specify Version; I'm saying 
that having a global value for it that affects things opaquely sounds dangerous 
-- we should certainly have a way for people to specify the Version they want 
on each of the objects that care, but it shouldn't be global. The 
"luceneMatchVersion" property that Uwe added to BaseTokenizerFactory and 
BaseTokenFilterFactory in his patch seems perfect to me; it's just the 
{{SolrCoreAware}} / {{core.getSolrConfig().luceneMatchVersion}} part that I think is 
a bad idea.
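
A minimal sketch of the per-component approach, not the code in Uwe's patch. {{MatchVersion}} is a hypothetical stand-in for {{o.a.lucene.util.Version}}, and {{VersionedFactorySketch}} / {{ExampleTokenizerFactory}} are hypothetical names. Each factory reads only its own "luceneMatchVersion" init arg and, when it is absent, falls back to LUCENE_24 (the default the issue description gives for the no-version ctors) instead of consulting a core-wide value. Accepting both "2.9" and "LUCENE_29" as input is an assumption.

```java
import java.util.Map;

// Hypothetical stand-in for o.a.lucene.util.Version (not the real class).
enum MatchVersion {
    LUCENE_24, LUCENE_29, LUCENE_30;

    // Accepts "2.9" or "LUCENE_29" (assumed formats); unknown values throw IllegalArgumentException.
    static MatchVersion parse(String s) {
        String name = s.startsWith("LUCENE_") ? s : "LUCENE_" + s.replace(".", "");
        return MatchVersion.valueOf(name);
    }
}

// Hypothetical base factory: the version comes only from this component's own
// init args, never from a core-wide setting.
abstract class VersionedFactorySketch {
    static final String VERSION_ARG = "luceneMatchVersion";
    protected MatchVersion luceneMatchVersion;

    void init(Map<String, String> args) {
        String v = args.get(VERSION_ARG);
        // Unset means the old back-compat behavior, like Lucene's no-version ctors.
        luceneMatchVersion = (v == null) ? MatchVersion.LUCENE_24 : MatchVersion.parse(v);
    }
}

// Hypothetical concrete factory that would pass its version on to the TokenStream it builds.
class ExampleTokenizerFactory extends VersionedFactorySketch {
    String describe() {
        return "ExampleTokenizer(" + luceneMatchVersion + ")";
    }
}
```

Declaring {{luceneMatchVersion="2.9"}} on one {{<tokenizer/>}} changes only that component, which is the property being argued for here.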

If we modify the {{<analyzer/>}} initialization to allow constructor args as Erik 
suggested (I'm pretty sure there's already code in Solr to do this, we just 
aren't using it for Analyzers), then we should be good to go for everything in 
schema.xml.
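
A rough sketch of how {{<analyzer class=.../>}} init could hand a Version to analyzers that accept one. All classes are hypothetical, and this is not the existing Solr constructor-arg code mentioned above: it looks up a constructor taking the (hypothetical) {{MatchVersion}} type via reflection and falls back to the no-arg constructor otherwise.

```java
// Hypothetical stand-in for o.a.lucene.util.Version.
enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

// Hypothetical version-aware analyzer.
class VersionedAnalyzer {
    final MatchVersion version;
    public VersionedAnalyzer(MatchVersion version) { this.version = version; }
}

// Hypothetical legacy analyzer with only a no-arg constructor.
class LegacyAnalyzer {
    public LegacyAnalyzer() { }
}

class AnalyzerLoaderSketch {
    // Prefer a (MatchVersion) constructor when the class declares one,
    // otherwise fall back to the no-arg constructor.
    static Object create(Class<?> cls, MatchVersion version) {
        try {
            try {
                return cls.getDeclaredConstructor(MatchVersion.class).newInstance(version);
            } catch (NoSuchMethodException e) {
                return cls.getDeclaredConstructor().newInstance();
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + cls.getName(), e);
        }
    }
}
```

Falling back to the no-arg constructor keeps existing {{<analyzer class=.../>}} declarations working unchanged.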

If anything declared in solrconfig.xml starts caring about Version (QParser, 
SolrIndexWriter, etc...) then likewise it should get a "luceneMatchVersion" 
init property as well.  No one will ever be "stuck" with LUCENE_24, but they 
won't be surprised by behavior changes either.

bq. If we do not have a default, all users will stay stuck with lucene 2.4, 
because they do not care about version (it is not required, because it defaults 
to 2.4 for BW compatibility). So lots of configs will never use the new unicode 
features of Lucene 3.1.

I don't believe that. Almost every Solr user on the planet starts with the 
example configs. If the example configs start specifying 
"luceneMatchVersion=2.9" on every analyzer and factory, then people will care 
about Version just as much as they care about the stopwords.txt file that ships 
with Solr. That may be not at all, or it may be a lot, but it will be up to 
them, and it will be obvious to them, because it's right there in the 
declaration where they can see it, and it's easy for them to reference it and recognize 
that changing that value will affect things.

bq. If you really do not want to have a default version in config (not schema, 
because it applies to all lucene components), then you should go the way like 
Lucene 3.0: Require a matchVersion for all components.

I'm totally on board with that idea in the long run -- but there are ways to 
get there gradually that are back compatible with existing configs. Individual 
factories that care about luceneMatchVersion should absolutely start warning on 
startup when it is unset (or doesn't match the current value of 
Version.LUCENE_CURRENT), telling users that newer/better behavior may be 
available if they set luceneMatchVersion, and providing a URL for a wiki page 
where more detail is available. The Analyzer init code can do likewise if it sees an 
{{<analyzer class=.../>}} being inited with a constructor that takes in a 
"Version" which is using an "old" value.

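A sketch of the startup warning for a single factory, under stated assumptions: {{MatchVersion}} is a hypothetical enum whose {{CURRENT}} constant plays the role of {{Version.LUCENE_CURRENT}}, and values are given as enum constant names. The wiki URL is left out because no page is named here. {{check}} returns the message so it can be tested; {{warnOnStartup}} just logs it via {{java.util.logging}}.

```java
import java.util.Map;
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical stand-in for o.a.lucene.util.Version; CURRENT plays LUCENE_CURRENT.
enum MatchVersion {
    LUCENE_24, LUCENE_29, LUCENE_30;
    static final MatchVersion CURRENT = LUCENE_30;
}

class VersionWarningSketch {
    private static final Logger LOG = Logger.getLogger(VersionWarningSketch.class.getName());

    // Returns the warning to emit at startup for one factory's init args, if any.
    static Optional<String> check(String factoryName, Map<String, String> args) {
        String v = args.get("luceneMatchVersion");
        if (v == null) {
            return Optional.of(factoryName + ": luceneMatchVersion is unset, defaulting to "
                + MatchVersion.LUCENE_24 + "; newer/better behavior may be available if you set it");
        }
        MatchVersion parsed = MatchVersion.valueOf(v);
        // Enum order is release order here, so a lower ordinal means an older version.
        if (parsed.compareTo(MatchVersion.CURRENT) < 0) {
            return Optional.of(factoryName + ": luceneMatchVersion=" + parsed
                + " is older than " + MatchVersion.CURRENT + "; newer/better behavior may be available");
        }
        return Optional.empty();
    }

    static void warnOnStartup(String factoryName, Map<String, String> args) {
        check(factoryName, args).ifPresent(LOG::warning);
    }
}
```

A factory that matches the current version stays silent, so the warning only nags configs that would actually behave differently after an upgrade.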

> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
> BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1677
>                 URL: https://issues.apache.org/jira/browse/SOLR-1677
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Uwe Schindler
>         Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
> SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
> compatibility with old indexes created using older versions of Lucene. The 
> most important example is StandardTokenizer, which changed its behaviour with 
> posIncr and incorrect host token types in 2.4 and also in 2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
> much more Unicode support, almost every Tokenizer/TokenFilter needs this 
> Version parameter. In 2.9, the deprecated old ctors without Version take 
> LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base 
> factories. Subclasses can then use the decoded luceneMatchVersion enum (in 
> 3.0) / Parameter (in 2.9) for constructing TokenStreams. The code currently 
> contains a helper map to decode the version strings, but in 3.0 it can be 
> replaced by Version.valueOf(String), as Version is a Java 5 enum there. The 
> default value is Version.LUCENE_24 (as this is the default for the 
> no-version ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from 
> StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
> to match Lucene 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
