Re: svn commit: r547493 - in /lucene/solr/trunk: ./ src/java/org/apache/solr/common/ src/java/org/apache/solr/schema/ src/java/org/apache/solr/update/ src/test/org/apache/solr/common/

Ryan McKinley Sun, 17 Jun 2007 10:43:25 -0700


Perhaps:
class DocumentUtils {
   static Document toDocument( SolrInputDocument, Schema );
   static SolrDocument loadFields( SolrDocument, Document, Schema,
                                   boolean skipCopyFields );
}


thoughts?


Would request handlers (that are update handlers) call these directly
before passing a Lucene Document to the UpdateHandler, or should
UpdateHandler take a SolrDocument?


In SOLR-139 (updateable/modifiable documents), I add a new command:

public class IndexDocumentCommand
{
  public enum MODE {
    OVERWRITE, // overwrite existing values with the new one (the default behavior)
    APPEND,    // add the new value to existing value
    DISTINCT,  // same as APPEND, but make sure each value is distinct
    INCREMENT  // increment existing value.  Must be a number!
  };

  public boolean overwrite = true;
  public SolrInputDocument doc;
  public Map<String,MODE> mode; // What to do for each field.

  public int commitMaxTime = -1; // make sure the document is committed within this much time
}

The MODE enum should be in common (so solrj can link to it).  For the
default behavior (overwrite the whole document) the mode Map is not
created.
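
To make the four modes concrete, here is a minimal sketch of how one
mode could be applied to one field's values.  FieldModeSketch and
apply() are hypothetical names for illustration only, not code from the
SOLR-139 patch, and INCREMENT is assumed to act on a single
integer-valued field:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr: applies one MODE to one field.
class FieldModeSketch {
  enum MODE { OVERWRITE, APPEND, DISTINCT, INCREMENT }

  // Returns the new value list; the existing list is left untouched.
  static List<Object> apply(MODE mode, List<Object> existing, Object incoming) {
    List<Object> result = new ArrayList<>(existing);
    switch (mode) {
      case OVERWRITE: {
        // replace whatever was there with the new value
        result.clear();
        result.add(incoming);
        break;
      }
      case APPEND: {
        // keep existing values, duplicates allowed
        result.add(incoming);
        break;
      }
      case DISTINCT: {
        // same as APPEND, but skip a value that is already present
        if (!result.contains(incoming)) {
          result.add(incoming);
        }
        break;
      }
      case INCREMENT: {
        // assumption: exactly one existing value, both sides integral numbers
        if (result.size() != 1
            || !(result.get(0) instanceof Number)
            || !(incoming instanceof Number)) {
          throw new IllegalArgumentException("INCREMENT needs a single numeric value");
        }
        long sum = ((Number) result.get(0)).longValue()
                 + ((Number) incoming).longValue();
        result.set(0, sum);
        break;
      }
    }
    return result;
  }
}
```

A real implementation would look up the MODE per field name from the
mode Map and fall back to OVERWRITE when the Map is null.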

Perhaps it makes sense to add SOLR-139 so you all can easily see why I
made some of the DocumentBuilder choices.  Since XmlUpdateHandler and
StaxUpdateHandler are living in parallel now, it would not affect anyone
who does not use the StaxUpdateHandler...


If Object[] is substantially better that could work.  I don't have a
real sense of the performance hit for making an ArrayList.  It could be
initialized as new ArrayList(1)?


ArrayList(1) would help with GC issues at least.


I'll do that.

> SolrInputDocument :
> - it seems like it should convey extra state (currently field and
> document boosts), but not behavior.  keepDuplicates logic seems like
> extra overhead that will rarely be useful,

Maybe I'm ahead of myself, but this functionality is essential for:
SOLR-139 and SOLR-103.  For modifiable docs, it is MUCH easier for
clients to add tags if the server can worry about duplicates.  For some
database layouts, it is only possible to get all fields for a document
represented as columns/rows if you repeat many fields.


Should keepDuplicates logic be reversed to removeDuplicates?

This code seems to be a problem since if you set a value for
keepDuplicates for any field, it switches the default of all others,
right?
   if( _keepDuplicates == null || Boolean.TRUE == _keepDuplicates.get( name )) {

removeDuplicates sounds better.  I'll change it and make sure the logic
is ok (with tests that touch both cases)
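
For reference, a small sketch of the default-flipping problem and one
possible fix.  DuplicatePolicySketch and its methods are hypothetical
names; the only thing taken from the real code is the quoted condition,
and the sketch assumes the map is created lazily on the first set:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not the SolrInputDocument code.
class DuplicatePolicySketch {
  // assumption: null until some field gets an explicit setting
  private Map<String, Boolean> keepDuplicates;

  void setKeepDuplicates(String field, boolean keep) {
    if (keepDuplicates == null) {
      keepDuplicates = new HashMap<>();
    }
    keepDuplicates.put(field, keep);
  }

  // Mirrors the quoted condition: once the map exists, a field with
  // no entry yields null, so its answer flips from true to false.
  boolean keepsQuoted(String field) {
    return keepDuplicates == null || Boolean.TRUE == keepDuplicates.get(field);
  }

  // One possible fix: a field with no entry always gets the same default.
  boolean keepsFixed(String field) {
    if (keepDuplicates == null) {
      return true;
    }
    Boolean value = keepDuplicates.get(field);
    return value == null || value;
  }
}
```

With no settings, both methods keep duplicates for "title".  After
setKeepDuplicates("tags", false), keepsQuoted("title") turns false while
keepsFixed("title") stays true, which is the pair of cases the tests
should cover.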


this was totally by design.  The fact you *can* set boosts on field
values in a Document that aren't used is really strange.  What happens
if I set: addField( 'f1', 'v1', 10 ), addField( 'f1', 'v2', 1 )?


The way the Lucene code is written, the boost for a given field is the
product of the document boost and *all* boosts on values of that
field.
See DocumentWriter:
         fieldBoosts[fieldNumber] *= field.getBoost();
Not that that makes the most sense though...

So what should we do when someone sends in XML with more than one
field value with a boost on it?

Sounds like we should keep the accumulated product of multi-valued field
boosts and set that on the first value in the multi-value field.  If I
understand correctly, this would be consistent with existing behavior.
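
A rough sketch of that idea, assuming the per-value boosts of one field
arrive as a float array.  BoostSketch and collapse() are made-up names
for illustration, not anything in DocumentBuilder:

```java
// Hypothetical illustration of folding per-value boosts into the first value.
class BoostSketch {
  // Returns boosts of the same length: the product of all inputs on the
  // first value and a neutral 1.0 on every other value.
  static float[] collapse(float[] valueBoosts) {
    float product = 1.0f;
    for (float boost : valueBoosts) {
      product *= boost;
    }
    float[] collapsed = new float[valueBoosts.length];
    for (int i = 0; i < collapsed.length; i++) {
      collapsed[i] = (i == 0) ? product : 1.0f;
    }
    return collapsed;
  }
}
```

Since Lucene multiplies the boosts of all values of a field anyway, the
overall field boost should come out the same; for the addField example
above (boosts 10 and 1) the first value would carry 10 and the second 1.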


ryan
