Hi Calvin - thanks for opening the https://issues.apache.org/jira/browse/SOLR-17120 issue for this!
From: users@solr.apache.org At: 01/12/24 21:00:34 UTCTo: users@solr.apache.org Subject: Re: NullPointerException in UpdateLog.applyOlderUpdates under solr 8&9 involving partial updates and high update load Looking into the code referenced in the stack trace in my previous email, it was the following code in UpdateLog.java: /** Add all fields from olderDoc into newerDoc if not already present in newerDoc */ private void applyOlderUpdates( SolrDocumentBase<?, ?> newerDoc, SolrInputDocument olderDoc, Set<String> mergeFields) { for (String fieldName : olderDoc.getFieldNames()) { // if the newerDoc has this field, then this field from olderDoc can be ignored if (!newerDoc.containsKey(fieldName) && (mergeFields == null || mergeFields.contains(fieldName))) { for (Object val : olderDoc.getFieldValues(fieldName)) { newerDoc.addField(fieldName, val); } } } } The `NullPointerException` is being thrown by the inner `for` statement because the return value of `olderDoc.getFieldValues(fieldName)` is null. I added some print statements and verified that when I saw the error it was due to a `NullPointerException` for a field named `camera_unit` and there had been a partial update of the same document within the last second that included `"camera_unit": {"set": null}`. There was a lot of other activity in between those two though, and I wasn't able to reproduce it with simple documents that made the same updates, so there's more to it than just a previous partial update of the same doc that sets a field to `null`. I saw at least one other place in the (non-test) code where it's assumed that the return value of `getFieldValues` can't be `null` and will throw an exception if it is. That was the following code in `DocumentAnalysisRequestHandler.java`: Collection<Object> fieldValues = document.getFieldValues(name); NamedList<NamedList<? extends Object>> indexTokens = new SimpleOrderedMap<>(); for (Object fieldValue : fieldValues) { indexTokens.add( tring.valueOf(fieldValue), analyzeValue(fieldValue.toString(), analysisContext)); } but in other places where I looked at how that method was used, there was handling of the `null` return value case. I'm not sure what the correct logic is in the `UpdateLog.java` case, but I updated it to the following to test: /** Add all fields from olderDoc into newerDoc if not already present in newerDoc */ private void applyOlderUpdates( SolrDocumentBase<?, ?> newerDoc, SolrInputDocument olderDoc, Set<String> mergeFields) { for (String fieldName : olderDoc.getFieldNames()) { // if the newerDoc has this field, then this field from olderDoc can be ignored if (!newerDoc.containsKey(fieldName) && (mergeFields == null || mergeFields.contains(fieldName))) { Collection<Object> values = olderDoc.getFieldValues(fieldName); if (values == null) { newerDoc.addField(fieldName, null); } else { for (Object val : values) { newerDoc.addField(fieldName, val); } } } } } Does that seem like the right decision, to set the field to `null` in the `newerDoc` in this case? Thanks for your time, Calvin