Hey all, I have a proposed update that adds a 7.7 section to our "Upgrade Notes" ref-guide page. I mentioned this issue there, but I don't have much context on it, so I would appreciate a review from anyone more familiar. Check out SOLR-13256 if you get a few minutes.
Best,
Jason

On Mon, Feb 18, 2019 at 9:06 AM Jan Høydahl <jan....@cominvent.com> wrote:
>
> Thanks for chiming in Markus. Yea, same with the langid tests, they just work
> locally with manually constructed SolrInputDocument objects.
> This breaking change sounds really scary and we should add an UPGRADE
> NOTE somewhere.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 15 Feb 2019 at 10:34, Markus Jelsma <markus.jel...@openindex.io> wrote:
> >
> > I stumbled upon this too yesterday and created SOLR-13249. In local unit
> > tests we get String but in distributed unit tests we get a
> > ByteArrayUtf8CharSequence instead.
> >
> > https://issues.apache.org/jira/browse/SOLR-13249
> >
> > -----Original message-----
> >> From: Andreas Hubold <andreas.hub...@coremedia.com>
> >> Sent: Friday 15th February 2019 10:10
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Solr 7.7 UpdateRequestProcessor broken
> >>
> >> Hi,
> >>
> >> thank you, Jan.
> >>
> >> I've created https://issues.apache.org/jira/browse/SOLR-13255. Maybe you
> >> want to add your patch to that ticket. I did not have time to test it yet.
> >>
> >> So I guess, all SolrJ usages have to handle CharSequence now for string
> >> fields? Well, this really sounds like a major breaking change for custom
> >> code.
> >>
> >> Thanks,
> >> Andreas
> >>
> >> Jan Høydahl wrote on 15.02.19 at 09:14:
> >>> Hi
> >>>
> >>> This is a subtle change which is not detected by our langid unit tests,
> >>> as I think it only happens when a document is transferred with SolrJ and
> >>> the Javabin codec.
> >>> Was introduced in https://issues.apache.org/jira/browse/SOLR-12992
> >>>
> >>> Please create a new JIRA issue for langid so we can try to fix it in 7.7.1
> >>>
> >>> Other SolrInputDocument users assuming String type for strings in
> >>> SolrInputDocument would also be vulnerable.
> >>>
> >>> I have a patch ready that you could test:
> >>>
> >>> Index: solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java
> >>> IDEA additional info:
> >>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> >>> <+>UTF-8
> >>> ===================================================================
> >>> --- solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java (revision 8c831daf4eb41153c25ddb152501ab5bae3ea3d5)
> >>> +++ solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java (date 1550217809000)
> >>> @@ -60,12 +60,12 @@
> >>>          Collection<Object> fieldValues = doc.getFieldValues(fieldName);
> >>>          if (fieldValues != null) {
> >>>            for (Object content : fieldValues) {
> >>> -            if (content instanceof String) {
> >>> -              String stringContent = (String) content;
> >>> +            if (content instanceof CharSequence) {
> >>> +              CharSequence stringContent = (CharSequence) content;
> >>>                if (stringContent.length() > maxFieldValueChars) {
> >>> -                detector.append(stringContent.substring(0, maxFieldValueChars));
> >>> +                detector.append(stringContent.subSequence(0, maxFieldValueChars).toString());
> >>>                } else {
> >>> -                detector.append(stringContent);
> >>> +                detector.append(stringContent.toString());
> >>>                }
> >>>                detector.append(" ");
> >>>              } else {
> >>> Index: solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java
> >>> IDEA additional info:
> >>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> >>> <+>UTF-8
> >>> ===================================================================
> >>> --- solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java (revision 8c831daf4eb41153c25ddb152501ab5bae3ea3d5)
> >>> +++ solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java (date 1550217691000)
> >>> @@ -413,10 +413,10 @@
> >>>          Collection<Object> fieldValues = doc.getFieldValues(fieldName);
> >>>          if (fieldValues != null) {
> >>>            for (Object content : fieldValues) {
> >>> -            if (content instanceof String) {
> >>> -              String stringContent = (String) content;
> >>> +            if (content instanceof CharSequence) {
> >>> +              CharSequence stringContent = (CharSequence) content;
> >>>                if (stringContent.length() > maxFieldValueChars) {
> >>> -                sb.append(stringContent.substring(0, maxFieldValueChars));
> >>> +                sb.append(stringContent.subSequence(0, maxFieldValueChars));
> >>>                } else {
> >>>                  sb.append(stringContent);
> >>>                }
> >>> @@ -449,8 +449,8 @@
> >>>          Collection<Object> contents = doc.getFieldValues(field);
> >>>          if (contents != null) {
> >>>            for (Object content : contents) {
> >>> -            if (content instanceof String) {
> >>> -              docSize += Math.min(((String) content).length(), maxFieldValueChars);
> >>> +            if (content instanceof CharSequence) {
> >>> +              docSize += Math.min(((CharSequence) content).length(), maxFieldValueChars);
> >>>              }
> >>>            }
> >>>
> >>>
> >>> --
> >>> Jan Høydahl, search solution architect
> >>> Cominvent AS - www.cominvent.com
> >>>
> >>>> On 14 Feb 2019 at 16:02, Andreas Hubold <andreas.hub...@coremedia.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> while trying to update from Solr 7.6 to 7.7 I ran into some unexpected
> >>>> incompatibilities with UpdateRequestProcessors.
> >>>>
> >>>> The SolrInputDocument passed to UpdateRequestProcessor#processAdd does
> >>>> not return Strings for string fields anymore but instances of
> >>>> org.apache.solr.common.util.ByteArrayUtf8CharSequence. I found some
> >>>> related JIRA issues (SOLR-12983?) but nothing under the "Upgrade Notes"
> >>>> section.
> >>>>
> >>>> I can adapt our UpdateRequestProcessor implementations but at least the
> >>>> org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor
> >>>> is broken now as well and needs to be fixed in Solr. It expects String
> >>>> values and logs messages such as the following now:
> >>>>
> >>>> 2019-02-14 13:14:47.537 WARN (qtp802600647-19) [ x:studio] o.a.s.u.p.LangDetectLanguageIdentifierUpdateProcessor Field name_tokenized not a String value, not including in detection
> >>>>
> >>>> I wonder what kind of plugins are affected by the change. Does this only
> >>>> affect UpdateRequestProcessors or more plugins? Do I need to handle
> >>>> these ByteArrayUtf8CharSequence instances in SolrJ clients now as well?
> >>>>
> >>>> Cheers,
> >>>> Andreas
> >>>>
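
For anyone adapting custom code in the meantime, here is a rough sketch of the pattern the patch quoted above uses: test field values against CharSequence rather than String, and only call toString() where a String is really needed. The class and method names (FieldValueText, asString, concatText) are made up for illustration and are not part of Solr. The sketch uses only the JDK and assumes the values follow the normal CharSequence contract for toString(); I have not checked it against a running 7.7 node.

```java
import java.util.Collection;

/** Hypothetical helper for illustration only; not part of Solr or SolrJ. */
final class FieldValueText {
  private FieldValueText() {}

  /**
   * Returns the text of a field value, or null if the value is not textual.
   * String implements CharSequence, so this accepts both plain Strings and
   * other CharSequence implementations.
   */
  static String asString(Object value) {
    return (value instanceof CharSequence) ? value.toString() : null;
  }

  /**
   * Concatenates all textual values, truncating each one to at most
   * maxCharsPerValue characters and appending a space after each value.
   * Non-textual values are skipped; a null collection yields "".
   */
  static String concatText(Collection<Object> values, int maxCharsPerValue) {
    StringBuilder sb = new StringBuilder();
    if (values == null) {
      return "";
    }
    for (Object value : values) {
      if (value instanceof CharSequence) {
        CharSequence text = (CharSequence) value;
        int end = Math.min(text.length(), maxCharsPerValue);
        sb.append(text, 0, end).append(' ');
      }
    }
    return sb.toString();
  }
}
```

In an UpdateRequestProcessor this would be fed the result of doc.getFieldValues(fieldName), as in the patch. Checking for CharSequence keeps the code working whether the document was built locally with Strings or arrived through SolrJ and Javabin.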