I stumbled upon this too yesterday and created SOLR-13249. In local unit tests 
we get String but in distributed unit tests we get a ByteArrayUtf8CharSequence 
instead.

https://issues.apache.org/jira/browse/SOLR-13249 
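
For custom code that reads values out of a SolrInputDocument, here is a minimal sketch of the same idea as Jan's patch quoted below: test for CharSequence instead of String, and call toString() only where a String is really needed. The class and method names (FieldValueStrings, asString, textValues) are made up for illustration and are not part of Solr. A StringBuilder stands in for ByteArrayUtf8CharSequence so that the example runs without Solr on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, not part of Solr: normalizes field values that may
// arrive as String or as some other CharSequence implementation.
final class FieldValueStrings {

    // Returns the value as a String if it is any CharSequence, else null.
    static String asString(Object value) {
        return (value instanceof CharSequence) ? value.toString() : null;
    }

    // Collects all textual values of a field, skipping non-text values.
    static List<String> textValues(Collection<Object> fieldValues) {
        List<String> out = new ArrayList<>();
        if (fieldValues == null) {
            return out;
        }
        for (Object value : fieldValues) {
            String s = asString(value);
            if (s != null) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // StringBuilder is only a stand-in for a non-String CharSequence.
        Collection<Object> values =
            Arrays.<Object>asList("plain", new StringBuilder("builder"), 42);
        System.out.println(textValues(values));
    }
}
```

In a real UpdateRequestProcessor the collection would come from doc.getFieldValues(fieldName), as in the patch.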

 
 
-----Original message-----
> From:Andreas Hubold <andreas.hub...@coremedia.com>
> Sent: Friday 15th February 2019 10:10
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 7.7 UpdateRequestProcessor broken
> 
> Hi,
> 
> thank you, Jan.
> 
> I've created https://issues.apache.org/jira/browse/SOLR-13255. Maybe you 
> want to add your patch to that ticket. I did not have time to test it yet.
> 
> So I guess, all SolrJ usages have to handle CharSequence now for string 
> fields? Well, this really sounds like a major breaking change for custom 
> code.
> 
> Thanks,
> Andreas
> 
> Jan Høydahl wrote on 15.02.19 at 09:14:
> > Hi
> >
> > This is a subtle change that is not detected by our langid unit tests, as 
> > I think it only happens when the document is transferred with SolrJ and the 
> > Javabin codec.
> > It was introduced in https://issues.apache.org/jira/browse/SOLR-12992
> >
> > Please create a new JIRA issue for langid so we can try to fix it in 7.7.1
> >
> > Other SolrInputDocument users assuming String type for strings in 
> > SolrInputDocument would also be vulnerable.
> >
> > I have a patch ready that you could test:
> >
> > Index: solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java
> > IDEA additional info:
> > Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> > <+>UTF-8
> > ===================================================================
> > --- solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java  (revision 8c831daf4eb41153c25ddb152501ab5bae3ea3d5)
> > +++ solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java  (date 1550217809000)
> > @@ -60,12 +60,12 @@
> >             Collection<Object> fieldValues = doc.getFieldValues(fieldName);
> >             if (fieldValues != null) {
> >               for (Object content : fieldValues) {
> > -              if (content instanceof String) {
> > -                String stringContent = (String) content;
> > +              if (content instanceof CharSequence) {
> > +                CharSequence stringContent = (CharSequence) content;
> >                   if (stringContent.length() > maxFieldValueChars) {
> > -                  detector.append(stringContent.substring(0, maxFieldValueChars));
> > +                  detector.append(stringContent.subSequence(0, maxFieldValueChars).toString());
> >                   } else {
> > -                  detector.append(stringContent);
> > +                  detector.append(stringContent.toString());
> >                   }
> >                   detector.append(" ");
> >                 } else {
> > Index: solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java
> > IDEA additional info:
> > Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> > <+>UTF-8
> > ===================================================================
> > --- solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java    (revision 8c831daf4eb41153c25ddb152501ab5bae3ea3d5)
> > +++ solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java    (date 1550217691000)
> > @@ -413,10 +413,10 @@
> >           Collection<Object> fieldValues = doc.getFieldValues(fieldName);
> >           if (fieldValues != null) {
> >             for (Object content : fieldValues) {
> > -            if (content instanceof String) {
> > -              String stringContent = (String) content;
> > +            if (content instanceof CharSequence) {
> > +              CharSequence stringContent = (CharSequence) content;
> >                 if (stringContent.length() > maxFieldValueChars) {
> > -                sb.append(stringContent.substring(0, maxFieldValueChars));
> > +                sb.append(stringContent.subSequence(0, maxFieldValueChars));
> >                 } else {
> >                   sb.append(stringContent);
> >                 }
> > @@ -449,8 +449,8 @@
> >           Collection<Object> contents = doc.getFieldValues(field);
> >           if (contents != null) {
> >             for (Object content : contents) {
> > -            if (content instanceof String) {
> > -              docSize += Math.min(((String) content).length(), maxFieldValueChars);
> > +            if (content instanceof CharSequence) {
> > +              docSize += Math.min(((CharSequence) content).length(), maxFieldValueChars);
> >               }
> >             }
> >   
> >
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> >
> >> On 14 Feb 2019, at 16:02, Andreas Hubold 
> >> <andreas.hub...@coremedia.com> wrote:
> >>
> >> Hi,
> >>
> >> while trying to update from Solr 7.6 to 7.7 I ran into some unexpected 
> >> incompatibilities with UpdateRequestProcessors.
> >>
> >> The SolrInputDocument passed to UpdateRequestProcessor#processAdd does not 
> >> return Strings for string fields anymore but instances of 
> >> org.apache.solr.common.util.ByteArrayUtf8CharSequence. I found some 
> >> related JIRA issues (SOLR-12983?) but nothing under the "Upgrade Notes" 
> >> section.
> >>
> >> I can adapt our UpdateRequestProcessor implementations but at least the 
> >> org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor
> >>  is broken now as well and needs to be fixed in Solr. It expects String 
> >> values and logs messages such as the following now:
> >>
> >> 2019-02-14 13:14:47.537 WARN  (qtp802600647-19) [   x:studio] o.a.s.u.p.LangDetectLanguageIdentifierUpdateProcessor Field name_tokenized not a String value, not including in detection
> >>
> >> I wonder what kind of plugins are affected by the change. Does this only 
> >> affect UpdateRequestProcessors or more plugins? Do I need to handle these 
> >> ByteArrayUtf8CharSequence instances in SolrJ clients now as well?
> >>
> >> Cheers,
> >> Andreas
> >>
> >>
> >
> 
> 
