2011/5/12 Michael McCandless luc...@mikemccandless.com
2011/5/9 Nikola Tanković nikola.tanko...@gmail.com:
Introduction of a FieldType class that will hold all the extra
properties now stored inside the Field instance, other than the field
value itself.
Seems like this is an easy first baby step -- leave current Field
class, but break out the type details into a separate class that can
be shared across Field instances.
Yes, I agree, this could be a good first step. Mike submitted a patch on
issue #2308. I think it's a solid base for this.
Make that Chris.
Ouch, sorry!
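For illustration only, a minimal sketch of what "type details in a separate class that can be shared across Field instances" might look like. The names SketchFieldType and SketchField and the three flags are hypothetical; this is not the API from the patch mentioned above.

```java
// Hypothetical sketch: class names and flags are illustrative, not Lucene's API.
final class SketchFieldType {
    final boolean indexed;
    final boolean stored;
    final boolean tokenized;

    SketchFieldType(boolean indexed, boolean stored, boolean tokenized) {
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
    }
}

final class SketchField {
    final String name;
    final SketchFieldType type; // a shared reference, not a per-field copy
    String value;               // the only per-instance state that changes

    SketchField(String name, SketchFieldType type, String value) {
        this.name = name;
        this.type = type;
        this.value = value;
    }
}
```

Two fields created with the same SketchFieldType instance then share one set of type details instead of each carrying its own flags.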
A new FieldTypeAttribute interface will be added to handle extension
with new field properties, inspired by IndexWriterConfig.
How would this work? What's an example compelling usage? An app
could use this for extensibility, and then make a matching codec that
picks up this attr? EG, say, maybe for marking that a field is a
primary key field and then codec could optimize accordingly...?
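A rough sketch of the primary-key scenario, under heavy assumptions: FieldTypeAttribute is only the interface name proposed in this thread, and PrimaryKeyAttribute, AttributedFieldType, SketchCodec and the strategy strings are all made up here to show the shape of the idea. None of it is an existing Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical marker interface, named after the proposal in this thread.
interface FieldTypeAttribute {}

// Hypothetical app-defined attribute marking a field as a primary key.
final class PrimaryKeyAttribute implements FieldTypeAttribute {}

// Hypothetical field type that carries extension attributes keyed by class.
final class AttributedFieldType {
    private final Map<Class<? extends FieldTypeAttribute>, FieldTypeAttribute> attrs =
            new HashMap<>();

    <A extends FieldTypeAttribute> void addAttribute(Class<A> key, A attr) {
        attrs.put(key, attr);
    }

    <A extends FieldTypeAttribute> A getAttribute(Class<A> key) {
        return key.cast(attrs.get(key)); // null when the attribute is absent
    }

    boolean hasAttribute(Class<? extends FieldTypeAttribute> key) {
        return attrs.containsKey(key);
    }
}

// Stand-in for a matching codec: it picks a strategy based on the attribute.
final class SketchCodec {
    static String postingsStrategyFor(AttributedFieldType type) {
        return type.hasAttribute(PrimaryKeyAttribute.class) ? "unique-term" : "default";
    }
}
```

A type without the attribute simply falls through to the default strategy, so the app and the codec only need to agree on the attribute class.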
Well, that could be a very interesting scenario. Possible codec usage
didn't ring a bell for me, but it seems very reasonable. Otherwise,
attributes don't make much sense unless properly used in custom codecs.
How will we ensure attribute and codec compatibility?
I'm just thinking we should have concrete reasons in mind for cutting
over to attributes here... I'd rather see a fixed, well thought out
concrete FieldType hierarchy first...
Yes, I couldn't agree more, and I also think Chris has some great ideas
in this area, given his work on spatial indexing, which tends to make
use of these additional attributes.
Refactoring and dividing the settings for term frequency and positions
can also be done (LUCENE-2048)
Ahh great! So we can omit-positions-but-not-TF.
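One way the split could be expressed, as a sketch only: a single enum with three levels, so that "frequencies without positions" becomes a first-class choice. The enum name PostingsDetail and its constants are hypothetical and are not taken from LUCENE-2048.

```java
// Hypothetical enum: the name and constants are illustrative only.
enum PostingsDetail {
    DOCS_ONLY(false, false),               // omit term frequencies and positions
    DOCS_AND_FREQS(true, false),           // keep TF, omit positions
    DOCS_FREQS_AND_POSITIONS(true, true);  // keep everything

    final boolean termFreqs;
    final boolean positions;

    PostingsDetail(boolean termFreqs, boolean positions) {
        this.termFreqs = termFreqs;
        this.positions = positions;
    }
}
```

An enum rather than two independent booleans avoids the meaningless combination of positions without frequencies; whether that is the right trade-off is an open design choice.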
Discuss possible effects of completion of LUCENE-2310 on this project
This one is badly needed... but we should keep your project focused.
We'll tackle this one afterwards.
Good.
Adequate Factory class for easier configuration of new Field instances
together with manually added new FieldTypeAttributes
FieldType, once instantiated, is read-only. Only the field's value can
be changed.
OK.
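A possible shape for this, sketched with a builder standing in for the factory. FrozenFieldType, its Builder and ValueField are hypothetical names, and the flags are placeholders for whatever properties FieldType ends up holding.

```java
// Hypothetical sketch: a builder configures the type, which is immutable afterwards.
final class FrozenFieldType {
    final boolean indexed;
    final boolean stored;
    final boolean tokenized;

    private FrozenFieldType(Builder b) {
        this.indexed = b.indexed;
        this.stored = b.stored;
        this.tokenized = b.tokenized;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private boolean indexed;
        private boolean stored;
        private boolean tokenized;

        Builder indexed(boolean v) { this.indexed = v; return this; }
        Builder stored(boolean v) { this.stored = v; return this; }
        Builder tokenized(boolean v) { this.tokenized = v; return this; }

        // All fields of the result are final, so the type is read-only from here on.
        FrozenFieldType build() { return new FrozenFieldType(this); }
    }
}

final class ValueField {
    private final String name;
    private final FrozenFieldType type; // fixed for the lifetime of the field
    private String value;               // the only thing that may change

    ValueField(String name, FrozenFieldType type, String value) {
        this.name = name;
        this.type = type;
        this.value = value;
    }

    String name() { return name; }
    FrozenFieldType type() { return type; }
    String value() { return value; }
    void setValue(String value) { this.value = value; }
}
```

Usage would then read something like `FrozenFieldType.builder().indexed(true).stored(true).build()`, with the resulting type reused across many ValueField instances.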
Simple hierarchy of Field classes with core properties preset to
logical defaults. E.g.:
NumberField,
Can't this just be our existing NumericField?
Yes, this is the classic NumericField with the changes proposed in
LUCENE-2310. Tim Smith mentioned that the Fieldable class should be kept
for custom implementations to reduce the number of setters (for defaults).
Chris Male suggested new CoreFieldTypeAttribute interface, so maybe it
should be implemented instead of Fieldable for custom implementations, so
both Fieldable and AbstractField are not needed anymore.
In my opinion Field should become abstract and be extended by the others.
Another proposal: how about keeping only Field (with no hierarchy) and
moving the hierarchy to FieldType, such as NumericFieldType and
StringFieldType, since this hierarchy concerns type information only?
I think hierarchy of both types and the value containers that hold
the corresponding values could make sense?
Hmm, I think we should get more opinions on this one also.
e.g. Usage:
FieldType number = new NumericFieldType();
Field price = new Field();
price.setType(number);
// but this is much cleaner...
Field price = new NumericField();
so maybe we should have a parallel XYZField for each XYZFieldType...
Am I overcomplicating this?
StringField,
This would be like NOT_ANALYZED?
Yes, strings are often one word only. Or maybe we can name it NameField,
NonAnalyzedField or something.
StringField sounds good actually...
TextField,
This would be ANALYZED?
Yes.
OK.
What is the best way to break this into small baby steps?
Hopefully this becomes clearer as we iterate.
Well, we know the first step: moving type details into FieldType class.
Yes!
Somehow tying into this as well is a stronger decoupling of the
indexer from analysis/document. Ie, what indexer needs of a document
is very minimal -- just an iterable over indexed and stored values.
Separately we can still provide a full featured Document class w/
add, get, remove, etc., but that's outside of the indexer.
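To sketch that decoupling: the indexer below accepts any Iterable of a tiny value interface, while a fuller document class lives outside it. IndexableValue, MinimalIndexer and SketchDocument are hypothetical names, the sketch assumes the iterable covers both indexed and stored values, and the counting stands in for real indexing work.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal view the indexer needs of one field value.
interface IndexableValue {
    String name();
    String value();
    boolean indexed();
    boolean stored();
}

// Hypothetical indexer entry point: it only iterates and never sees a Document class.
final class MinimalIndexer {
    int indexedCount;
    int storedCount;

    void addDocument(Iterable<? extends IndexableValue> values) {
        for (IndexableValue v : values) {
            if (v.indexed()) indexedCount++; // placeholder for inverting the value
            if (v.stored()) storedCount++;   // placeholder for writing the stored value
        }
    }
}

// A fuller document with add/remove, kept outside the indexer; it merely is Iterable.
final class SketchDocument implements Iterable<IndexableValue> {
    private final List<IndexableValue> values = new ArrayList<>();

    void add(String name, String value, boolean indexed, boolean stored) {
        values.add(new IndexableValue() {
            public String name() { return name; }
            public String value() { return value; }
            public boolean indexed() { return indexed; }
            public boolean stored() { return stored; }
        });
    }

    void remove(String name) {
        values.removeIf(v -> v.name().equals(name));
    }

    @Override
    public Iterator<IndexableValue> iterator() {
        return values.iterator();
    }
}
```

Because MinimalIndexer depends only on Iterable, an app could feed it from its own data structures without going through SketchDocument at all.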
I'll get back to this one after additional research. Maybe we should do
a couple more iterations, then I'll summarize the conclusions.
Mike
http://blog.mikemccandless.com
Nikola