Hi Calvin - thanks for opening the 
https://issues.apache.org/jira/browse/SOLR-17120 issue for this!

From: users@solr.apache.org At: 01/12/24 21:00:34 UTCTo:  users@solr.apache.org
Subject: Re: NullPointerException in UpdateLog.applyOlderUpdates under solr 8&9 
involving partial updates and high update load

Looking into the code referenced in the stack trace in my previous email,
it was the following code in UpdateLog.java:

  /** Add all fields from olderDoc into newerDoc if not already present in
newerDoc */
  private void applyOlderUpdates(
      SolrDocumentBase<?, ?> newerDoc, SolrInputDocument olderDoc,
Set<String> mergeFields) {
    for (String fieldName : olderDoc.getFieldNames()) {
      // if the newerDoc has this field, then this field from olderDoc can
be ignored
      if (!newerDoc.containsKey(fieldName)
          && (mergeFields == null || mergeFields.contains(fieldName))) {
        for (Object val : olderDoc.getFieldValues(fieldName)) {
          newerDoc.addField(fieldName, val);
        }
      }
    }
  }

The `NullPointerException` is being thrown by the inner `for` statement
because the return value of `olderDoc.getFieldValues(fieldName)` is null. I
added some print statements and verified that when I saw the error it was
due to a `NullPointerException` for a field named `camera_unit` and there
had been a partial update of the same document within the last second that
included `"camera_unit": {"set": null}`. There was a lot of other activity
in between those two though, and I wasn't able to reproduce it with simple
documents that made the same updates, so there's more to it than just a
previous partial update of the same doc that sets a field to `null`.

I saw at least one other place in the (non-test) code where it's assumed
that the return value of `getFieldValues` can't be `null` and will throw an
exception if it is. That was the following code in
`DocumentAnalysisRequestHandler.java`:

Collection<Object> fieldValues = document.getFieldValues(name);
NamedList<NamedList<? extends Object>> indexTokens = new
SimpleOrderedMap<>();
for (Object fieldValue : fieldValues) {
  indexTokens.add(
    tring.valueOf(fieldValue), analyzeValue(fieldValue.toString(),
analysisContext));
}

but in other places where I looked at how that method was used, there was
handling of the `null` return value case.

I'm not sure what the correct logic is in the `UpdateLog.java` case, but I
updated it to the following to test:

  /** Add all fields from olderDoc into newerDoc if not already present in
newerDoc */
  private void applyOlderUpdates(
      SolrDocumentBase<?, ?> newerDoc, SolrInputDocument olderDoc,
Set<String> mergeFields) {
    for (String fieldName : olderDoc.getFieldNames()) {
      // if the newerDoc has this field, then this field from olderDoc can
be ignored
      if (!newerDoc.containsKey(fieldName)
          && (mergeFields == null || mergeFields.contains(fieldName))) {
        Collection<Object> values = olderDoc.getFieldValues(fieldName);
        if (values == null) {
            newerDoc.addField(fieldName, null);
        } else {
            for (Object val : values) {
              newerDoc.addField(fieldName, val);
            }
        }
      }
    }
  }

Does that seem like the right decision, to set the field to `null` in the
`newerDoc` in this case?

Thanks for your time,
Calvin


Reply via email to