[ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800471#action_12800471
 ] 

Hoss Man commented on SOLR-1677:
--------------------------------

bq. And I also can't see anyone really spending time to aggressively ensure 
that the example schema etc is all up to date

I think you are vastly underestimating how much work is spent reviewing the 
example schema.xml prior to releases.  It would be trivial to search/replace 
luceneMatchVersion="X" with luceneMatchVersion="Y" anytime the "current" 
version of Version was updated in Lucene-Java.

bq. the hardcoded 2.4 behavior is the action at a distance, because if i do not 
specify Version in my configuration file, then i get this very old behavior.

I don't follow you at all -- you have identified no action or distance in your 
example.

When i say i'm worried about scary action at a distance, i'm talking about 
editing some thing A in a config file, and having it result in changed behavior 
(action) in things B, C and D that do not directly refer to A in any way 
(distance).  Furthermore, these changes in behavior are silent (thus scary).

If I have {{<fieldType name="A"/>}} and much later in the config 
{{<field name="B" type="A"/>}}, then editing A results in an action on B at a 
distance -- but this should not surprise me at all because B explicitly 
references A.

Having a global {{<luceneMatchVersion/>}} tag that affects the behavior of a 
variety of different things when it's modified leads to situations where people 
might change that value triggering changes in many components w/o a clear idea 
of what might have changed -- so they don't even know what things they should 
focus on testing for correctness after making that change.

The existing {{<schema version="X"/>}} property also leads to action at a 
distance type situations -- but that is a lot less scary to me because at least 
with it there is a uniform set of changes to *all* schema objects between any 
two versions, so it's easy to document what changes when you go from 1.1 to 
1.2, or 1.2 to 1.3 ... but with luceneMatchVersion the potential changes are 
unique to every individual Class that cares about it.

{quote}
If this is really your concern, then i have an alternative i propose.

* No default anywhere, not even in the code
* Version is mandatory if the thing requires it
{quote}

This is something Uwe and i both discussed in previous comments...

https://issues.apache.org/jira/browse/SOLR-1677?focusedCommentId=12796872#action_12796872
https://issues.apache.org/jira/browse/SOLR-1677?focusedCommentId=12796937#action_12796937

...as i said: i'm fine with this idea in theory -- as a long term plan -- but 
there has to be a gradual migration process for people. i.e.: it can be 
required on certain objects in a future release, but for at least the next 
release it needs to be possible to not specify the luceneMatchVersion on all 
of these objects, and when people use them w/o specifying it, those objects 
can log big fat warnings on init that they are defaulting to 2.4, and that 
people should set the property explicitly if that's what they want.
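
A minimal sketch of that fallback, in plain Java with no Solr or Lucene 
dependency. The class name {{MatchVersionResolver}}, the method {{resolve}}, 
and the use of a plain string for the version are assumptions made for 
illustration; none of them come from the attached patch.

```java
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical helper; not part of Solr or of the SOLR-1677 patch. */
final class MatchVersionResolver {
    private static final Logger LOG =
        Logger.getLogger(MatchVersionResolver.class.getName());

    /** Attribute name as used in this discussion. */
    static final String PARAM = "luceneMatchVersion";

    /** Legacy default named in the issue description. */
    static final String LEGACY_DEFAULT = "LUCENE_24";

    private MatchVersionResolver() {}

    /**
     * Returns the configured version string, or the legacy default
     * (with a loud warning) when the attribute is missing or blank.
     */
    static String resolve(String componentName, Map<String, String> args) {
        String configured = args.get(PARAM);
        if (configured == null || configured.trim().isEmpty()) {
            LOG.warning(componentName + ": no " + PARAM
                + " specified, defaulting to " + LEGACY_DEFAULT
                + "; set it explicitly if that is the behavior you want.");
            return LEGACY_DEFAULT;
        }
        return configured.trim();
    }
}
```

A later release could turn the warning branch into an exception for the 
objects where the value becomes mandatory.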

----

bq. I still do not want it in schema.xml, as Version is a global Lucene thing!

Uwe: I think you are misunderstanding the reason for a distinction between 
solrconfig.xml and schema.xml in Solr.  If (for the sake of argument) 
luceneMatchVersion really should be a "global Lucene thing" then that is 
precisely why it should be in schema.xml.

schema.xml is for configuration that is inherently part of the index, and must 
be consistent regardless of who/how/why that index is being used.  
solrconfig.xml is where settings are put that are specific to how a 
particular instance of an index is being used.   If a setting is in 
solrconfig.xml, then it should be possible for that setting to be completely 
different on different solr instances that use the exact same schema.xml -- 
even if they use cloned copies of the same index directory. (i.e.: master/slave 
distinctions in replication; peer slaves with distinct handler/cache settings 
to serve distinct use cases; etc...)

That's the reason why nothing that hangs off of IndexSchema is currently 
allowed to be SolrCoreAware, or get access to the SolrConfig object (and why 
the SolrResourceLoader abstraction was created) ... nothing about the SolrCore 
"instance" should be allowed to influence the resulting index, because that 
index may later be used on a different instance with a different config.

As i mentioned before: solrconfig.xml can depend on schema.xml, but schema.xml 
cannot depend on solrconfig.xml.

So if a global luceneMatchVersion can affect the behavior of an analyzer or 
FieldType in a way that is "persisted" as part of the index -- and other 
classes (like QueryParser in Robert's example) need to make sure to use the 
same luceneMatchVersion to behave correctly with that index, then that setting 
needs to be in the schema.xml so it is consistent no matter how/where that 
index and schema.xml file are used.

Does that make sense?

----

I'd still like to clarify this whole issue of whether "Lucene-Java", as a 
project, has an expectation that client applications will always use a 
consistent value for Version when constructing objects that interact with an 
index, as Robert alluded to in a previous comment...

bq. I don't think Version is intended so you can use X.Y on this part and Y.Z 
on this part

This was not my impression when Version was added -- but i freely admit I 
wasn't paying that much attention.

In Uwe's comment he implied (but didn't actually state) that he concurred with 
Robert...

bq. ...Version is a global Lucene thing...

*Iff* that expectation really is true in Lucene-Java, and *iff* there really is 
an expectation that using multiple Version values within Solr is likely to 
cause people problems as objects interact, then it seems to me that it would be 
a very bad idea to offer any sort of out of the box support for per object 
overriding of luceneMatchVersion in our solrconfig.xml/schema.xml.

i know, i know ... this is a complete 180 from my previous claim that we should 
_only_ have per object configuration -- a claim that i still stand behind if 
Lucene-Java "supports" applications using multiple values of Version, but if 
that is not considered "supported" and if changes are actively being made in 
Lucene-Java that explicitly assume consistent Version usage, then I'm not 
convinced it would be a good idea to enable people to tweak things in that way. 
Anyone who understands the underlying Java code enough to appreciate the 
nuances of using A.B in one place and X.Y in another place can write their own 
Factory that looks at a luceneMatchVersion init param -- the out of the box 
ones should stick with the global setting.
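
To illustrate that split, here is a rough sketch in plain Java with no Solr or 
Lucene dependency. Both class names, the {{init}} signature and the 
string-typed version are hypothetical; real Solr factories are shaped 
differently. The stock factory only ever sees the global value, and a 
user-written subclass opts in to a per-object override.

```java
import java.util.Map;

/** Hypothetical stand-in for an out-of-the-box factory: global value only. */
class GlobalVersionFactory {
    protected Map<String, String> args;
    protected String globalVersion;

    /** Assumed init hook that receives the init params and the global setting. */
    void init(Map<String, String> args, String globalVersion) {
        this.args = args;
        this.globalVersion = globalVersion;
    }

    /** Stock behavior: ignore any per-object attribute. */
    String matchVersion() {
        return globalVersion;
    }
}

/** Hypothetical user-written factory that honors a per-object init param. */
class OverridableVersionFactory extends GlobalVersionFactory {
    @Override
    String matchVersion() {
        String local = args.get("luceneMatchVersion");
        return local != null ? local : globalVersion;
    }
}
```

With this shape, mixing Version values is something a user has to write code 
for deliberately; it cannot happen by editing an attribute on a stock factory.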

BUT!!!!! ... those are Big "IFFs" ... 

* Uwe: do you concur with Robert?
* Are there any threads/docs about the expectations of Version 
homo/heterogeneity in Lucene-Java?


> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
> BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1677
>                 URL: https://issues.apache.org/jira/browse/SOLR-1677
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Uwe Schindler
>         Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
> SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
> compatibility with old indexes created using older versions of Lucene. The 
> most important example is StandardTokenizer, which changed its behaviour with 
> posIncr and incorrect host token types in 2.4 and also in 2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
> much more Unicode support, almost every Tokenizer/TokenFilter needs this 
> Version parameter. In 2.9, the deprecated old ctors without Version take 
> LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base 
> factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
> 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently 
> contains a helper map to decode the version strings, but in 3.0 it can be 
> replaced by Version.valueOf(String), as the Version is a subclass of Java5 
> enums. The default value is Version.LUCENE_24 (as this is the default for the 
> no-version ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from 
> StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
> to match Lucene 3.0.
