At 5:12 PM -0400 7/1/08, Grant Ingersoll wrote:
>You make a good point about the countless hours debugging.  On the flip side, 
>one could ask the question as to whether the Solr schema is stable enough that 
>we should publish an XML Schema for it, thus helping alleviate some of the 
>pain.

That's a very good point: a lot of the internal, code-based validation of the 
.xml configuration files could be made unnecessary by parse-time validation, 
and with a well-defined .xsd the schema itself would be 
user-extensible/restrictable.
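As a rough illustration of parse-time validation, here is a minimal sketch using the JDK's javax.xml.validation API. The XSD is a hypothetical, drastically simplified stand-in (not anything Solr publishes): a <fieldType> must carry "name" and "class" and nothing else, so an unexpected attribute is rejected by the parser instead of by hand-written checks. A user wanting an extra attribute such as "language" would extend the XSD rather than the Java code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaXmlValidation {
    // Hypothetical, heavily simplified XSD: <types> holds <fieldType> elements
    // that must carry "name" and "class" and may not carry any other attribute.
    static final String TOY_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='types'><xs:complexType><xs:sequence>"
        + "<xs:element name='fieldType' maxOccurs='unbounded'><xs:complexType>"
        + "<xs:attribute name='name' type='xs:string' use='required'/>"
        + "<xs:attribute name='class' type='xs:string' use='required'/>"
        + "</xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Compiles the toy XSD; a failure here means the XSD itself is malformed.
    private static Schema compile() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(TOY_XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the toy XSD itself is broken", e);
        }
    }

    /** Returns null if xml is valid against the toy XSD, else the validator's message. */
    static String validate(String xml) {
        Validator validator = compile().newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("<types><fieldType name='string' class='solr.StrField'/></types>"));
        System.out.println(validate("<types><fieldType name='text' class='solr.TextField' language='en'/></types>"));
    }
}
```

The first call prints null; the second prints the validator's complaint about the undeclared "language" attribute.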

>More below...

-- snip --

>This seems a bit clunky to me, syntax-wise, but the idea seems right.    I 
>suppose another option is that I could just extend the FieldType and have it 
>look for my own attributes.

Well, for a specific field type there's already the init(...) method, designed 
to let subclasses parse and remove attributes before the bad-argument test, 
e.g. as done in CompressibleField.
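A minimal sketch of that pattern, using simplified stand-in classes rather than Solr's actual FieldType: the base class hands a copy of the attributes to init(), the subclass removes the ones it understands, and anything left over is treated as a bad argument. The class names, the setArgs shape, and the "compressThreshold" attribute are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for FieldType (hypothetical, not Solr's class): just enough
// to show "init() consumes its own args; leftovers are an error".
abstract class SketchFieldType {
    protected String typeName;

    final void setArgs(String typeName, Map<String, String> args) {
        this.typeName = typeName;
        Map<String, String> initArgs = new HashMap<>(args);
        init(initArgs);
        // The bad-argument test: anything init() did not remove is rejected.
        if (!initArgs.isEmpty()) {
            throw new IllegalArgumentException(
                "fieldType " + typeName + " invalid arguments: " + initArgs);
        }
    }

    /** Subclasses remove the attributes they understand from args. */
    protected void init(Map<String, String> args) {}
}

// Hypothetical subclass in the spirit of CompressibleField: it claims one
// attribute and stores the parsed value in an instance variable.
class SketchCompressibleType extends SketchFieldType {
    int compressThreshold = -1;

    @Override
    protected void init(Map<String, String> args) {
        String value = args.remove("compressThreshold");
        if (value != null) {
            compressThreshold = Integer.parseInt(value);
        }
        super.init(args);
    }

    public static void main(String[] args) {
        SketchCompressibleType ok = new SketchCompressibleType();
        ok.setArgs("text", Map.of("compressThreshold", "1024"));
        System.out.println(ok.compressThreshold); // 1024
        try {
            new SketchCompressibleType().setArgs("text", Map.of("language", "en"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // "language" was never removed
        }
    }
}
```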

What this won't handle, short of a user-extensible dictionary, is a new 
attribute that applies across all field types.  I wanted one, and so had to 
modify FieldType itself, which was a bit clunky in a different way.

Either way, with a getAttribute method on FieldType as I described, init (in 
FieldType or a subclass) need only remove the argument from initArgs; the 
attribute can then be retrieved and parsed on demand rather than stored in an 
instance variable.
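A minimal sketch of the getAttribute idea, again with hypothetical stand-in classes rather than Solr's own: the base class keeps an untouched copy of every attribute, so init() only has to remove a name from the working copy to get past the bad-argument test, and the value is looked up and parsed when needed. The "language" attribute and its "en" default are illustrative assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FieldType with a getAttribute accessor.
abstract class SketchFieldTypeWithAttrs {
    private Map<String, String> allArgs = Collections.emptyMap();

    final void setArgs(String typeName, Map<String, String> args) {
        // Keep an untouched copy so attributes can be looked up later.
        allArgs = Collections.unmodifiableMap(new HashMap<>(args));
        Map<String, String> initArgs = new HashMap<>(args);
        init(initArgs);
        if (!initArgs.isEmpty()) {
            throw new IllegalArgumentException(
                "fieldType " + typeName + " invalid arguments: " + initArgs);
        }
    }

    /** Subclasses (or the base class) remove the names they accept. */
    protected void init(Map<String, String> args) {}

    /** Raw attribute value as written in schema.xml, or null if absent. */
    String getAttribute(String name) {
        return allArgs.get(name);
    }
}

class SketchLanguageAwareType extends SketchFieldTypeWithAttrs {
    @Override
    protected void init(Map<String, String> args) {
        // Only mark "language" as accepted; no instance variable is needed.
        args.remove("language");
        super.init(args);
    }

    /** Parsed on demand from the stored attributes (hypothetical default "en"). */
    String language() {
        String lang = getAttribute("language");
        return lang == null ? "en" : lang;
    }

    public static void main(String[] args) {
        SketchLanguageAwareType t = new SketchLanguageAwareType();
        t.setArgs("text_de", Map.of("language", "de"));
        System.out.println(t.language()); // de
    }
}
```

Moving the args.remove("language") call into the base class's init() would make the attribute legal for every field type, which is the cross-type case above.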

But stepping back, is language-dependent analysis really the goal? As Erik 
Hatcher notes, there is this complication:

>Further on this.... if metadata is added to a field type, it needs to somehow 
>make it down to the tokenizer and filter factories to use if desired.  
>Language, for example, could be attached to a field type, but then could be 
>leveraged by a stop word filter to pick up a language-specific stop word file.
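One hypothetical way field-type metadata could reach the factories is to merge it into each factory's init arguments as defaults. The sketch below assumes that merge rule (factory attributes win on conflict) and a made-up "stopwords_<lang>.txt" naming convention; neither is existing Solr behavior, and the "words" key is used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class MetadataPropagation {
    /**
     * Hypothetical merge: field-type metadata first, then the factory's own
     * attributes, so an explicit factory attribute overrides the metadata.
     */
    static Map<String, String> factoryArgs(Map<String, String> fieldTypeMeta,
                                           Map<String, String> factoryAttrs) {
        Map<String, String> merged = new HashMap<>(fieldTypeMeta);
        merged.putAll(factoryAttrs);
        return merged;
    }

    /** Hypothetical stop-filter rule: an explicit "words" file wins, else derive from "language". */
    static String stopwordFile(Map<String, String> args) {
        String words = args.get("words");
        if (words != null) {
            return words;
        }
        return "stopwords_" + args.getOrDefault("language", "en") + ".txt";
    }

    public static void main(String[] args) {
        Map<String, String> merged = factoryArgs(Map.of("language", "de"), Map.of("ignoreCase", "true"));
        System.out.println(stopwordFile(merged)); // stopwords_de.txt
    }
}
```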

And perhaps what one really needs is not a static attribute added to the field 
type, but one that can vary across documents, e.g. via a different field's 
value or a payload affixed to the tokens.  I remember a thread on payloads 
being used for that purpose (and I see you contributed to the Lucene-side 
design of payloads), but I don't recall whether it converged on a usable 
Solr-side implementation.
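To make the per-document variant concrete, here is a toy sketch in which the stop list for one field is chosen from another field of the same document. It works at the whole-document level on purpose: a Lucene Analyzer's tokenStream(fieldName, reader) sees only one field's text, which is exactly why the language value has to be routed in by some other means. The field names, the whitespace tokenizer, and the tiny stop lists are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class PerDocumentStopwords {
    // Tiny illustrative stop lists; real ones would be loaded from files.
    static final Map<String, Set<String>> STOPWORDS = Map.of(
        "en", Set.of("the", "and", "of"),
        "de", Set.of("der", "die", "und"));

    /** Tokenizes textField, picking the stop list from langField of the same document. */
    static List<String> analyze(Map<String, String> doc, String textField, String langField) {
        String lang = doc.getOrDefault(langField, "en");
        Set<String> stop = STOPWORDS.getOrDefault(lang, Set.of());
        List<String> tokens = new ArrayList<>();
        for (String raw : doc.getOrDefault(textField, "").split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !stop.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of("lang", "de", "body", "Die Katze und der Hut");
        System.out.println(analyze(doc, "body", "lang")); // [katze, hut]
    }
}
```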

>I'll have to think some more about it...

Me too... the use-cases for the schema.xml-driven extension I proposed may be 
so rare that it's not at all worth considering.

- J.J.
