Jack Krupansky created SOLR-4859:
------------------------------------

             Summary: MinFieldValueUpdateProcessorFactory and 
MaxFieldValueUpdateProcessorFactory don't do numeric comparison for numeric 
fields
                 Key: SOLR-4859
                 URL: https://issues.apache.org/jira/browse/SOLR-4859
             Project: Solr
          Issue Type: Bug
          Components: update
    Affects Versions: 4.3
            Reporter: Jack Krupansky


MinFieldValueUpdateProcessorFactory and MaxFieldValueUpdateProcessorFactory are 
advertised as supporting numeric comparisons, but in practice only string 
comparison is available. Numeric comparison does not seem to be reachable 
through a normal update request, even though the unit tests show that it works 
at the unit test level.

The problem is that numeric processing is dependent on the SolrInputDocument 
containing a list of numeric values, but at least with both the current XML and 
JSON loaders, only string values can be loaded.

Test scenario.

1. Use Solr 4.3 example.
2. Add the following update processor chain to solrconfig.xml:

{code}
  <updateRequestProcessorChain name="max-only-num">
    <processor class="solr.MaxFieldValueUpdateProcessorFactory">
      <str name="fieldName">sizes_i</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>
{code}

3. Perform this update request:

{code}
  curl "http://localhost:8983/solr/update?commit=true&update.chain=max-only-num" \
  -H 'Content-type:application/json' -d '
  [{"id": "doc-1",
    "title_s": "Hello World",
    "sizes_i": [200, 999, 101, 199, 1000]}]'
{code}

Note that the values are JSON integer values.

4. Perform this query:

{code}
curl "http://localhost:8983/solr/select/?q=*:*&indent=true&wt=json"
{code}

Shows this result:

{code}
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"doc-1",
        "title_s":"Hello World",
        "sizes_i":999,
        "_version_":1436094187405574144}]
  }}
{code}

sizes_i should be 1000, not 999.
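
The result above is what lexicographic (string) ordering produces for these 
values. As a small standalone illustration, not Solr code, the following sketch 
runs Collections.max over the same five values, first as strings and then as 
integers. The class and method names (StringVsNumericMax, maxAsStrings, 
maxAsIntegers) are made up for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Standalone illustration only; the names here are hypothetical, not part of Solr.
public class StringVsNumericMax {

    // Values as the loaders deliver them: strings, compared character by character.
    static String maxAsStrings() {
        List<String> values = Arrays.asList("200", "999", "101", "199", "1000");
        return Collections.max(values);
    }

    // The same values as Integers, compared by numeric value.
    static Integer maxAsIntegers() {
        List<Integer> values = Arrays.asList(200, 999, 101, 199, 1000);
        return Collections.max(values);
    }

    public static void main(String[] args) {
        System.out.println(maxAsStrings());   // prints 999 ('9' sorts after '1' and '2')
        System.out.println(maxAsIntegers());  // prints 1000
    }
}
```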

Alternative update tests:

{code}
  curl "http://localhost:8983/solr/update?commit=true&update.chain=max-only-num" \
  -H 'Content-type:application/json' -d '
  [{"id": "doc-1",
    "title_s": "Hello World",
    "sizes_i": 200,
    "sizes_i": 999,
    "sizes_i": 101,
    "sizes_i": 199,
    "sizes_i": 1000}]'
{code}

and

{code}
  curl "http://localhost:8983/solr/update?commit=true&update.chain=max-only-num" \
  -H 'Content-type:application/xml' -d '
  <add>
    <doc>
      <field name="id">doc-1</field>
      <field name="title_s">Hello World</field>
      <field name="sizes_i">42</field>
      <field name="sizes_i">128</field>
      <field name="sizes_i">-3</field>
    </doc>
  </add>'
{code}

In XML, of course, there is no way for the input values to be anything other 
than strings ("text").

The JSON loader does parse the values with their type, but immediately converts 
the values to strings:

{code}
    private Object parseSingleFieldValue(int ev) throws IOException {
      switch (ev) {
        case JSONParser.STRING:
          return parser.getString();
        case JSONParser.LONG:
        case JSONParser.NUMBER:
        case JSONParser.BIGNUMBER:
          return parser.getNumberChars().toString();
        case JSONParser.BOOLEAN:
          // for legacy reasons, single values s are expected to be strings
          return Boolean.toString(parser.getBoolean());
        case JSONParser.NULL:
          parser.getNull();
          return null;
        case JSONParser.ARRAY_START:
          return parseArrayFieldValue(ev);
        default:
          throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
              "Error parsing JSON field value. Unexpected " + JSONParser.getEventString(ev));
      }
    }

    private List<Object> parseArrayFieldValue(int ev) throws IOException {
      assert ev == JSONParser.ARRAY_START;
  
      ArrayList lst = new ArrayList(2);
      for (;;) {
        ev = parser.nextEvent();
        if (ev == JSONParser.ARRAY_END) {
          return lst;
        }
        Object val = parseSingleFieldValue(ev);
        lst.add(val);
      }
    }
  }
{code}

Originally, I had hoped/expected that the schema type of the field would 
determine the type of min/max comparison - integer for a *_i field in my case.

The comparison logic for min:

{code}
public final class MinFieldValueUpdateProcessorFactory
    extends FieldValueSubsetUpdateProcessorFactory {

  @Override
  @SuppressWarnings("unchecked")
  public Collection pickSubset(Collection values) {
    Collection result = values;
    try {
      result = Collections.singletonList
        (Collections.min(values));
    } catch (ClassCastException e) {
      throw new SolrException
        (BAD_REQUEST, 
         "Field values are not mutually comparable: " + e.getMessage(), e);
    }
    return result;
  }
{code}

This seems to depend entirely on the Java type of the input values, not on the 
schema type of the field itself.

It would be nice to at least have a comparison override: compareNumeric="true".
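
As a rough illustration of what such an override might do, here is a minimal 
standalone sketch, assuming Java 8 or later. It is not Solr code and not a 
proposed patch: the class NumericSubsetSketch and its methods toNumber, 
pickMaxNumeric and pickMinNumeric are hypothetical names. The idea is to compare 
values by their parsed numeric value while returning the original object 
unchanged, so that string input from the XML and JSON loaders still works.

```java
import java.math.BigDecimal;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;

// Hypothetical sketch of numeric comparison over loader-supplied values; not Solr API.
public class NumericSubsetSketch {

    // Interpret any value (String, Integer, Long, ...) as a number via its string form.
    // Throws NumberFormatException if the value is not a plain decimal number.
    static BigDecimal toNumber(Object value) {
        if (value instanceof BigDecimal) {
            return (BigDecimal) value;
        }
        return new BigDecimal(value.toString().trim());
    }

    // Orders values by numeric value rather than by their natural (string) order.
    static final Comparator<Object> BY_NUMERIC_VALUE =
        Comparator.comparing(NumericSubsetSketch::toNumber);

    // Returns a singleton collection holding the numerically largest original value.
    static Collection<Object> pickMaxNumeric(Collection<?> values) {
        Object max = Collections.max(values, BY_NUMERIC_VALUE);
        return Collections.singletonList(max);
    }

    // Returns a singleton collection holding the numerically smallest original value.
    static Collection<Object> pickMinNumeric(Collection<?> values) {
        Object min = Collections.min(values, BY_NUMERIC_VALUE);
        return Collections.singletonList(min);
    }
}
```

With the string values from the JSON test above, pickMaxNumeric would select 
"1000" rather than "999". A real implementation would also have to decide how 
to report values that fail to parse, which this sketch simply lets propagate as 
NumberFormatException.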


