Re: GSoC: LUCENE-2308: Separately specify a field's type

2011-05-13 Thread Nikola Tanković
2011/5/12 Michael McCandless luc...@mikemccandless.com

 2011/5/9 Nikola Tanković nikola.tanko...@gmail.com:

   Introduction of a FieldType class that will hold all the extra
   properties now stored inside the Field instance, other than the field
   value itself.
 
  Seems like this is an easy first baby step -- leave current Field
  class, but break out the type details into a separate class that can
  be shared across Field instances.
 
  Yes, I agree, this could be a good first step. Mike submitted a patch on
  issue #2308. I think it's a solid base for this.

 Make that Chris.


Ouch, sorry!
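
For reference, the first step discussed above (type details broken out into a class that can be shared across Field instances) could look roughly like the sketch below. The names SharedFieldType and SimpleField are made up for illustration; this is not the patch attached to LUCENE-2308 and not Lucene's actual API.

```java
// Hypothetical sketch only: SharedFieldType and SimpleField are illustrative
// names, not classes from Lucene or from the LUCENE-2308 patch.
final class SharedFieldType {
    private final boolean indexed;
    private final boolean stored;
    private final boolean tokenized;

    SharedFieldType(boolean indexed, boolean stored, boolean tokenized) {
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
    }

    boolean indexed()   { return indexed; }
    boolean stored()    { return stored; }
    boolean tokenized() { return tokenized; }
}

// A field now carries only its name, its value, and a reference to its type.
final class SimpleField {
    private final String name;
    private final SharedFieldType type;
    private String value;

    SimpleField(String name, String value, SharedFieldType type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }

    String name()             { return name; }
    String value()            { return value; }
    SharedFieldType type()    { return type; }
    void setValue(String v)   { this.value = v; }
}
```

One SharedFieldType instance can then be handed to any number of SimpleField instances, so the per-field flags are no longer duplicated on every field.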



   New FieldTypeAttribute interface will be added to handle extension with
   new field properties inspired by IndexWriterConfig.
 
  How would this work?  What's an example compelling usage?  An app
  could use this for extensibility, and then make a matching codec that
  picks up this attr?  EG, say, maybe for marking that a field is a
  primary key field and then codec could optimize accordingly...?
 
  Well, that could be a very interesting scenario. Possible codec usage didn't
  ring a bell for me, but it seems very reasonable. Attributes otherwise
  don't make much sense, unless properly used in custom codecs.
 
  How will we ensure attribute and codec compatibility?

 I'm just thinking we should have concrete reasons in mind for cutting
 over to attributes here... I'd rather see a fixed, well thought out
 concrete FieldType hierarchy first...


Yes, I couldn't agree more, and I also think Chris has some great ideas in
this area, given his work on spatial indexing, which tends to make use of
such additional attributes.
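
To make the primary-key example above a bit more concrete, here is a rough sketch of how an attribute on a field type might be picked up by a codec. Everything in it is an assumption for discussion: ExtensibleFieldType, PrimaryKeyAttribute and CodecSketch are invented names, and the returned strings are placeholders rather than real postings formats. Only the FieldTypeAttribute name comes from the proposal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical marker interface, named after the proposal in this thread.
interface FieldTypeAttribute {}

// Hypothetical attribute marking a field as a primary key.
final class PrimaryKeyAttribute implements FieldTypeAttribute {}

// Hypothetical field type that carries optional attributes keyed by class.
final class ExtensibleFieldType {
    private final Map<Class<? extends FieldTypeAttribute>, FieldTypeAttribute> attributes =
            new HashMap<>();

    <A extends FieldTypeAttribute> ExtensibleFieldType with(Class<A> key, A attribute) {
        attributes.put(key, attribute);
        return this;
    }

    boolean has(Class<? extends FieldTypeAttribute> key) {
        return attributes.containsKey(key);
    }
}

// Stand-in for a codec that chooses a strategy based on the attribute.
final class CodecSketch {
    static String postingsFormatFor(ExtensibleFieldType type) {
        // Placeholder strings; a real codec would select an actual format.
        return type.has(PrimaryKeyAttribute.class) ? "unique-term" : "default";
    }
}
```

The compatibility question raised above stays open in this sketch: a codec that does not know an attribute simply ignores it.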



   Refactoring and dividing of settings for term frequency and positioning
   can also be done (LUCENE-2048)
 
  Ahh great!  So we can omit-positions-but-not-TF.
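
One way to picture omit-positions-but-not-TF is to replace a single combined on/off flag with a small enum of levels. This is only a sketch under that assumption; PostingsDetail and its constants are hypothetical names, not taken from LUCENE-2048.

```java
// Hypothetical enum: each level includes everything from the level before it.
enum PostingsDetail {
    DOCS_ONLY,
    DOCS_AND_FREQS,            // term frequency kept, positions omitted
    DOCS_FREQS_AND_POSITIONS;

    boolean hasFreqs() {
        return this != DOCS_ONLY;
    }

    boolean hasPositions() {
        return this == DOCS_FREQS_AND_POSITIONS;
    }
}
```

With the middle level, scoring can still use term frequency while phrase and span queries, which need positions, would not be available for that field.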
 
   Discuss possible effects of completion of LUCENE-2310 on this project
 
  This one is badly needed... but we should keep your project focused.
 
 
  We'll tackle this one afterwards.

 Good.

   Adequate Factory class for easier configuration of new Field instances
   together with manually added new FieldTypeAttributes

   FieldType, once instantiated, is read-only. Only the field's value can be
   changed.
 
  OK.
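
A possible shape for the factory plus read-only type, sketched as a builder. ReadOnlyFieldType and its Builder are hypothetical names, and the defaults chosen here are arbitrary assumptions.

```java
// Hypothetical sketch: the type has no setters, so it cannot change after build().
final class ReadOnlyFieldType {
    private final boolean indexed;
    private final boolean stored;
    private final boolean tokenized;

    private ReadOnlyFieldType(Builder b) {
        this.indexed = b.indexed;
        this.stored = b.stored;
        this.tokenized = b.tokenized;
    }

    boolean indexed()   { return indexed; }
    boolean stored()    { return stored; }
    boolean tokenized() { return tokenized; }

    static Builder builder() {
        return new Builder();
    }

    // All configuration happens here, before the type exists.
    static final class Builder {
        private boolean indexed = true;    // assumed default
        private boolean stored = false;    // assumed default
        private boolean tokenized = true;  // assumed default

        Builder indexed(boolean v)   { this.indexed = v; return this; }
        Builder stored(boolean v)    { this.stored = v; return this; }
        Builder tokenized(boolean v) { this.tokenized = v; return this; }

        ReadOnlyFieldType build() {
            return new ReadOnlyFieldType(this);
        }
    }
}
```

Because build() copies the flags, later changes to the builder do not affect a type that has already been created.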
 
   Simple hierarchy of Field classes with core properties logically
   predefaulted. E.g.:
  
   NumberField,
 
  Can't this just be our existing NumericField?
 
  Yes, this is the classic NumericField with the changes proposed in
  LUCENE-2310. Tim Smith mentioned that the Fieldable class should be kept
  for custom implementations to reduce the number of setters (for defaults).
  Chris Male suggested a new CoreFieldTypeAttribute interface, so maybe it
  should be implemented instead of Fieldable for custom implementations, so
  that both Fieldable and AbstractField are no longer needed.
  In my opinion Field should become abstract, extended by the others.
  Another proposal: how about keeping only Field (with no hierarchy) and
  moving the hierarchy to FieldType, such as NumericFieldType and
  StringFieldType, since this hierarchy concerns type information only?

 I think a hierarchy of both types and the value containers that hold
 the corresponding values could make sense?


Hmm, I think we should get more opinions on this one also.



  e.g. Usage:
  FieldType number = new NumericFieldType();
  Field price = new Field();
  price.setType(number);
  // but this is much cleaner...
  Field price = new NumericField();
  so maybe we should have a parallel XYZField for each XYZFieldType...
  Am I overcomplicating this?
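
A sketch of what the parallel XYZField / XYZFieldType idea might look like, so that the shorter "new NumericField()" form still ends up with the matching type set. All class names below (BaseFieldType, NumericTypeSketch, BaseField, NumericFieldSketch) are hypothetical and only mirror the naming pattern from the usage above.

```java
// Hypothetical type hierarchy: carries type information only.
class BaseFieldType {
    private final String description;

    BaseFieldType(String description) {
        this.description = description;
    }

    String description() { return description; }
}

final class NumericTypeSketch extends BaseFieldType {
    NumericTypeSketch() {
        super("numeric");
    }
}

// Hypothetical field hierarchy: each subclass wires in its matching type.
class BaseField {
    private final String name;
    private final BaseFieldType type;

    BaseField(String name, BaseFieldType type) {
        this.name = name;
        this.type = type;
    }

    String name()        { return name; }
    BaseFieldType type() { return type; }
}

final class NumericFieldSketch extends BaseField {
    // One shared type instance for every numeric field.
    private static final NumericTypeSketch TYPE = new NumericTypeSketch();
    private final long value;

    NumericFieldSketch(String name, long value) {
        super(name, TYPE);
        this.value = value;
    }

    long value() { return value; }
}
```

The convenience subclass is just a shortcut here; the same type could still be passed explicitly to a plain BaseField.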
 
   StringField,
 
  This would be like NOT_ANALYZED?
 
  Yes, strings are often one word only. Or maybe we can name it NameField,
  NonAnalyzedField or something.

 StringField sounds good actually...

   TextField,
 
  This would be ANALYZED?
 
  Yes.
 

 OK.

   What is the best way to break this into small baby steps?
 
  Hopefully this becomes clearer as we iterate.
 
  Well, we know the first step: moving type details into FieldType class.

 Yes!

 Somehow tying into this as well is a stronger decoupling of the
 indexer from analysis/document.  Ie, what indexer needs of a document
 is very minimal -- just an iterable over indexed & stored values.
 Separately we can still provide a full featured Document class w/
 add, get, remove, etc., but that's outside of the indexer.
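
As a rough illustration of that decoupling, the indexer below accepts any Iterable of values and never sees a Document class. IndexableValue, PlainValue and MinimalIndexer are hypothetical names, and the "indexing" is reduced to counting so the sketch stays short.

```java
// Hypothetical minimal contract: all the indexer needs to know about a value.
interface IndexableValue {
    String name();
    String stringValue();
    boolean indexed();
    boolean stored();
}

// Simple value holder; a full-featured Document could produce these.
final class PlainValue implements IndexableValue {
    private final String name;
    private final String value;
    private final boolean indexed;
    private final boolean stored;

    PlainValue(String name, String value, boolean indexed, boolean stored) {
        this.name = name;
        this.value = value;
        this.indexed = indexed;
        this.stored = stored;
    }

    public String name()        { return name; }
    public String stringValue() { return value; }
    public boolean indexed()    { return indexed; }
    public boolean stored()     { return stored; }
}

// Stand-in indexer: only iterates, no add/get/remove on the input.
final class MinimalIndexer {
    private int indexedCount;
    private int storedCount;

    void addDocument(Iterable<? extends IndexableValue> values) {
        for (IndexableValue v : values) {
            if (v.indexed()) indexedCount++;
            if (v.stored()) storedCount++;
        }
    }

    int indexedCount() { return indexedCount; }
    int storedCount()  { return storedCount; }
}
```

Any collection, for example a java.util.List of PlainValue, can be passed to addDocument without the indexer depending on how the application models documents.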


I'll get back to this one after additional research. Maybe we should do
a couple more iterations, then I'll summarize the conclusions.



 Mike

 http://blog.mikemccandless.com


Nikola


GSoC: LUCENE-2308: Separately specify a field's type

2011-04-13 Thread Nikola Tanković
Hi all,

if everything goes well I'll be delighted to be part of this project this
summer together with my assigned mentor Mike. My task will be to introduce
new classes to Lucene core which will make it possible to separate a Field's
Lucene properties from its value
(https://issues.apache.org/jira/browse/LUCENE-2308).

As you can imagine, this will have a large impact on Lucene & Solr, so we
need to think this through thoroughly.

Changes will include:

   - Introduction of a FieldType class that will hold all the extra
   properties now stored inside the Field instance, other than the field
   value itself.
   - New FieldTypeAttribute interface will be added to handle extension with
   new field properties inspired by IndexWriterConfig.
   - Refactoring and dividing of settings for term frequency and positioning
   can also be done (LUCENE-2048,
   https://issues.apache.org/jira/browse/LUCENE-2048)
   - Discuss possible effects of completion of LUCENE-2310
   (https://issues.apache.org/jira/browse/LUCENE-2310) on this project
   - Adequate Factory class for easier configuration of new Field instances
   together with manually added new FieldTypeAttributes
   - FieldType, once instantiated, is read-only. Only the field's value can be
   changed.
   - Simple hierarchy of Field classes with core properties logically
   predefaulted. E.g.:
      - NumberField,
      - StringField,
      - TextField,
      - NonIndexedField,


My questions and issues:

   - Backward compatibility? Will this go to Lucene 3.0?
   - What is the best way to break this into small baby steps?


Kindly,
Nikola Tanković