[jira] [Comment Edited] (SOLR-13255) LanguageIdentifierUpdateProcessor broken for documents sent with SolrJ/javabin

2019-02-19 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771824#comment-16771824
 ] 

Noble Paul edited comment on SOLR-13255 at 2/19/19 11:12 AM:
-

Yes, this is a blocker for 8.0. There is a regression that makes UpdateRequestProcessors (URPs) fail.


was (Author: noble.paul):
Yes, this is a blocker for 8.0

> LanguageIdentifierUpdateProcessor broken for documents sent with SolrJ/javabin
> --
>
> Key: SOLR-13255
> URL: https://issues.apache.org/jira/browse/SOLR-13255
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: contrib - LangId
> Affects Versions: 7.7
> Reporter: Andreas Hubold
> Assignee: Noble Paul
> Priority: Blocker
> Fix For: 8.0, 7.7.1
>
> Attachments: SOLR-13255.patch, SOLR-13255.patch
>
>
> Solr 7.7 changed the object type of string field values passed to 
> UpdateRequestProcessor implementations from java.lang.String to 
> ByteArrayUtf8CharSequence. SOLR-12992 was mentioned on solr-user as the cause.
> The LangDetectLanguageIdentifierUpdateProcessor still expects String values, 
> does not handle CharSequences, and logs warnings instead. For example:
> {noformat}
> 2019-02-14 13:14:47.537 WARN  (qtp802600647-19) [   x:studio] 
> o.a.s.u.p.LangDetectLanguageIdentifierUpdateProcessor Field name_tokenized 
> not a String value, not including in detection
> {noformat}
> I'm not sure, but there could be further places where the changed type for 
> string values needs to be handled. (Our custom UpdateRequestProcessors have 
> been broken as well since 7.7, and it would be great to have a proper upgrade 
> note as part of the release notes.)
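
For custom processors hit by the type change described above, one possible workaround is to test values against CharSequence rather than String. The following is only a minimal sketch under that assumption: FieldValueText and asText are hypothetical names, not part of Solr, and a StringBuilder stands in for ByteArrayUtf8CharSequence so that the example runs without Solr on the classpath. It relies only on the CharSequence contract that toString() returns the characters of the sequence.

```java
// Hypothetical helper, not part of Solr: normalizes a field value to text.
final class FieldValueText {

    /** Returns the text of a value, or null if the value is not textual. */
    static String asText(Object value) {
        // Checking CharSequence instead of String also accepts
        // non-String character sequences.
        if (value instanceof CharSequence) {
            return value.toString();
        }
        return null;
    }

    public static void main(String[] args) {
        // StringBuilder is a stand-in for a non-String CharSequence value.
        Object[] values = {"plain string", new StringBuilder("char sequence"), 42};
        for (Object v : values) {
            System.out.println(v.getClass().getSimpleName() + " -> " + asText(v));
        }
    }
}
```

A processor written this way would keep working for String values sent by older clients as well as for other CharSequence implementations.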



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-13255) LanguageIdentifierUpdateProcessor broken for documents sent with SolrJ/javabin

2019-02-18 Thread Jason Gerlowski (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771196#comment-16771196
 ] 

Jason Gerlowski edited comment on SOLR-13255 at 2/18/19 4:24 PM:
-

bq. it would be great to have a proper upgrade note as part of the release notes

Hey [~ahubold], I'm working on "Upgrade Notes" for the next release of our 
ref-guide, and I wanted them to include this issue.  I included a short 
paragraph over on SOLR-13256.  Since you mentioned you were interested in 
seeing this get documented, I wanted to give you a heads up.  Feel free to 
chime in over there about anything I got wrong or any suggestions you might 
have.


was (Author: gerlowskija):
bq. it would be great to have a proper upgrade note as part of the release notes

Hey [~ahubold], I'm working on "Upgrade Notes" for users for the next release 
of our ref-guide, and I wanted them to include this issue.  I included a short 
paragraph over on SOLR-13256.  Since you mentioned you were interested in 
seeing this get documented, I wanted to give you a heads up.  Feel free to 
chime in over there about anything I got wrong or any suggestions you might 
have.

> LanguageIdentifierUpdateProcessor broken for documents sent with SolrJ/javabin
> --
>
> Key: SOLR-13255
> URL: https://issues.apache.org/jira/browse/SOLR-13255
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: contrib - LangId
> Affects Versions: 7.7
> Reporter: Andreas Hubold
> Priority: Major
> Fix For: 8.0, 7.7.1
>
> Attachments: SOLR-13255.patch
>






[jira] [Comment Edited] (SOLR-13255) LanguageIdentifierUpdateProcessor broken for documents sent with SolrJ/javabin

2019-02-15 Thread Jan Høydahl (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769161#comment-16769161
 ] 

Jan Høydahl edited comment on SOLR-13255 at 2/15/19 10:12 AM:
--

Attached a rough, untested patch for langid for branch_7_7.

Due to a refactor, the bug will be different in 8.x: it will probably just 
silently fail to detect any languages, since the list of String fields is 
determined through instanceof String. The patch for 8.x and master will thus 
need to fix SolrInputDocumentReader instead.
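
To illustrate why an instanceof String filter can fail silently, here is a small self-contained sketch. It is not the SolrInputDocumentReader code: TextFieldFilter and fieldsOfType are hypothetical names, a plain Map stands in for a Solr document, and a StringBuilder stands in for ByteArrayUtf8CharSequence.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Solr code: selects fields by value type.
final class TextFieldFilter {

    /** Returns the names of all fields whose value is an instance of the given type. */
    static List<String> fieldsOfType(Map<String, Object> doc, Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Object> entry : doc.entrySet()) {
            if (type.isInstance(entry.getValue())) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        // StringBuilder is a stand-in for a non-String CharSequence value.
        doc.put("name_tokenized", new StringBuilder("Hello world"));
        doc.put("popularity", 7);

        System.out.println("String filter:       " + fieldsOfType(doc, String.class));
        System.out.println("CharSequence filter: " + fieldsOfType(doc, CharSequence.class));
    }
}
```

With the String filter the list comes back empty and nothing is logged, so detection would simply have no input; the CharSequence filter still finds name_tokenized.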

I think that for 8.0 we should add an UPGRADE NOTE about this breaking change...


was (Author: janhoy):
Attached a rough, untested patch for langid for branch_7_7. Due to a refactor, 
the patch will be different for master and 8x, where we'll need to fix 
SolrInputDocumentReader instead, which also does instanceof String.

I think that for 8.0 we should add an UPGRADE NOTE about this breaking change...

> LanguageIdentifierUpdateProcessor broken for documents sent with SolrJ/javabin
> --
>
> Key: SOLR-13255
> URL: https://issues.apache.org/jira/browse/SOLR-13255
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: contrib - LangId
> Affects Versions: 7.7
> Reporter: Andreas Hubold
> Priority: Major
> Fix For: 8.0, 7.7.1
>
> Attachments: SOLR-13255.patch
>


