I stumbled upon this too yesterday and created SOLR-13249. In local unit tests we get String but in distributed unit tests we get a ByteArrayUtf8CharSequence instead.
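Because the distributed case delivers a non-`String` `CharSequence`, any code that checks `instanceof String` silently skips those values. One way to adapt custom code that still assumes `String` is to normalize field values before that code sees them. This is sketched here as an assumption, not as the fix adopted in Solr. The helper works on a plain `Collection<Object>`, which is the type `SolrInputDocument#getFieldValues` returns in the patch below. The class name `CharSequenceNormalizer` is hypothetical, and `StringBuilder` stands in for `ByteArrayUtf8CharSequence` because both are `CharSequence`s that are not `String`s.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper (not part of Solr): converts every CharSequence value
// that is not already a String into a String, so downstream code that does
// "instanceof String" keeps working. Non-text values (numbers, dates, ...)
// are passed through unchanged.
final class CharSequenceNormalizer {
    static List<Object> normalize(Collection<Object> values) {
        List<Object> result = new ArrayList<>(values.size());
        for (Object v : values) {
            if (v instanceof CharSequence && !(v instanceof String)) {
                result.add(v.toString()); // materialize a real String
            } else {
                result.add(v);
            }
        }
        return result;
    }
}
```

The trade-off is one `String` allocation per value, which likely gives back some of whatever the byte-backed representation saves. Handling `CharSequence` directly, as Jan's patch does, avoids that.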
https://issues.apache.org/jira/browse/SOLR-13249

-----Original message-----
> From: Andreas Hubold <andreas.hub...@coremedia.com>
> Sent: Friday 15th February 2019 10:10
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 7.7 UpdateRequestProcessor broken
>
> Hi,
>
> thank you, Jan.
>
> I've created https://issues.apache.org/jira/browse/SOLR-13255. Maybe you
> want to add your patch to that ticket. I have not had time to test it yet.
>
> So I guess all SolrJ usages now have to handle CharSequence for string
> fields? Well, this really sounds like a major breaking change for custom
> code.
>
> Thanks,
> Andreas
>
> Jan Høydahl wrote on 15.02.19 at 09:14:
> > Hi
> >
> > This is a subtle change that our langid unit tests do not detect, as
> > I think it only happens when a document is transferred with SolrJ and the
> > Javabin codec.
> > It was introduced in https://issues.apache.org/jira/browse/SOLR-12992
> >
> > Please create a new JIRA issue for langid so we can try to fix it in 7.7.1.
> >
> > Other SolrInputDocument users that assume String values for string fields
> > would also be vulnerable.
> >
> > I have a patch ready that you could test:
> >
> > Index: solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java
> > IDEA additional info:
> > Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> > <+>UTF-8
> > ===================================================================
> > --- solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java (revision 8c831daf4eb41153c25ddb152501ab5bae3ea3d5)
> > +++ solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java (date 1550217809000)
> > @@ -60,12 +60,12 @@
> >        Collection<Object> fieldValues = doc.getFieldValues(fieldName);
> >        if (fieldValues != null) {
> >          for (Object content : fieldValues) {
> > -          if (content instanceof String) {
> > -            String stringContent = (String) content;
> > +          if (content instanceof CharSequence) {
> > +            CharSequence stringContent = (CharSequence) content;
> >              if (stringContent.length() > maxFieldValueChars) {
> > -              detector.append(stringContent.substring(0, maxFieldValueChars));
> > +              detector.append(stringContent.subSequence(0, maxFieldValueChars).toString());
> >              } else {
> > -              detector.append(stringContent);
> > +              detector.append(stringContent.toString());
> >              }
> >              detector.append(" ");
> >            } else {
> > Index: solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java
> > IDEA additional info:
> > Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> > <+>UTF-8
> > ===================================================================
> > --- solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java (revision 8c831daf4eb41153c25ddb152501ab5bae3ea3d5)
> > +++ solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java (date 1550217691000)
> > @@ -413,10 +413,10 @@
> >        Collection<Object> fieldValues = doc.getFieldValues(fieldName);
> >        if (fieldValues != null) {
> >          for (Object content : fieldValues) {
> > -          if (content instanceof String) {
> > -            String stringContent = (String) content;
> > +          if (content instanceof CharSequence) {
> > +            CharSequence stringContent = (CharSequence) content;
> >              if (stringContent.length() > maxFieldValueChars) {
> > -              sb.append(stringContent.substring(0, maxFieldValueChars));
> > +              sb.append(stringContent.subSequence(0, maxFieldValueChars));
> >              } else {
> >                sb.append(stringContent);
> >              }
> > @@ -449,8 +449,8 @@
> >      Collection<Object> contents = doc.getFieldValues(field);
> >      if (contents != null) {
> >        for (Object content : contents) {
> > -        if (content instanceof String) {
> > -          docSize += Math.min(((String) content).length(), maxFieldValueChars);
> > +        if (content instanceof CharSequence) {
> > +          docSize += Math.min(((CharSequence) content).length(), maxFieldValueChars);
> >          }
> >        }
> >
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> >
> >> On 14 Feb 2019 at 16:02, Andreas Hubold <andreas.hub...@coremedia.com> wrote:
> >>
> >> Hi,
> >>
> >> while trying to update from Solr 7.6 to 7.7 I ran into some unexpected
> >> incompatibilities with UpdateRequestProcessors.
> >>
> >> The SolrInputDocument passed to UpdateRequestProcessor#processAdd no longer
> >> returns Strings for string fields, but instances of
> >> org.apache.solr.common.util.ByteArrayUtf8CharSequence. I found some
> >> related JIRA issues (SOLR-12983?) but nothing in the "Upgrade Notes"
> >> section.
> >>
> >> I can adapt our UpdateRequestProcessor implementations, but at least
> >> org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor
> >> is broken now as well and needs to be fixed in Solr.
> >> It expects String values and now logs messages such as the following:
> >>
> >> 2019-02-14 13:14:47.537 WARN (qtp802600647-19) [ x:studio] o.a.s.u.p.LangDetectLanguageIdentifierUpdateProcessor Field name_tokenized not a String value, not including in detection
> >>
> >> I wonder which kinds of plugins are affected by the change. Does it only
> >> affect UpdateRequestProcessors, or other plugins as well? Do I need to
> >> handle these ByteArrayUtf8CharSequence instances in SolrJ clients now as
> >> well?
> >>
> >> Cheers,
> >> Andreas
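The core of Jan's patch is to test for `CharSequence` instead of `String`, truncate with `subSequence` instead of `substring`, and cap lengths with `Math.min`. The sketch below mirrors the two patched loops on a plain `Collection<Object>` so the logic can be checked without Solr. It is a sketch under that assumption, not the Solr source. The class and method names are hypothetical. Separating values with a space follows the LangDetect hunk, since the rest of the `LanguageIdentifierUpdateProcessor` loop is not shown in the patch.

```java
import java.util.Collection;

// Hypothetical, Solr-independent mirror of the patched langid loops.
final class LangIdTextSketch {

    // Concatenates all textual values (String or any other CharSequence),
    // truncating each one to maxFieldValueChars and separating values with
    // a space. Non-text values are skipped.
    static String concatText(Collection<Object> fieldValues, int maxFieldValueChars) {
        StringBuilder sb = new StringBuilder();
        if (fieldValues != null) {
            for (Object content : fieldValues) {
                if (content instanceof CharSequence) {
                    CharSequence stringContent = (CharSequence) content;
                    if (stringContent.length() > maxFieldValueChars) {
                        // subSequence works on every CharSequence, unlike String#substring
                        sb.append(stringContent.subSequence(0, maxFieldValueChars));
                    } else {
                        sb.append(stringContent);
                    }
                    sb.append(" ");
                }
            }
        }
        return sb.toString();
    }

    // Sums the lengths of textual values, each capped at maxFieldValueChars,
    // like the docSize loop in the second hunk.
    static int docSize(Collection<Object> contents, int maxFieldValueChars) {
        int docSize = 0;
        if (contents != null) {
            for (Object content : contents) {
                if (content instanceof CharSequence) {
                    docSize += Math.min(((CharSequence) content).length(), maxFieldValueChars);
                }
            }
        }
        return docSize;
    }
}
```

With values `"hello"`, `new StringBuilder("world!")` (a stand-in for `ByteArrayUtf8CharSequence`) and `7`, and a limit of 5, `concatText` yields `"hello world "` and `docSize` yields 10. Under the old `instanceof String` checks, the `StringBuilder` value would have been skipped.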