[jira] [Comment Edited] (SOLR-13699) maxChars no longer working as designed on CopyField

2019-08-16 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909489#comment-16909489
 ] 

Noble Paul edited comment on SOLR-13699 at 8/16/19 11:51 PM:
-

Let's limit all the changes to DocumentBuilder. Eventually, we have to convert 
our tests to run on all three formats:
* JSON
* XML
* javabin
Sticking to XML alone in tests can lead to many unforeseen problems.


was (Author: noble.paul):
Let's limit all the changes to DocumentBuilder

> maxChars no longer working as designed on CopyField
> ---
>
> Key: SOLR-13699
> URL: https://issues.apache.org/jira/browse/SOLR-13699
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 7.7, 7.7.1, 7.7.2, 8.0, 8.0.1, 8.1, 8.2, 7.7.3, 8.1.1, 8.1.2
> Reporter: Chris Troullis
> Assignee: Erick Erickson
> Priority: Major
> Attachments: SOLR-13699.patch, SOLR-13699.patch
>
>
> We recently upgraded from Solr 7.3 to 8.1 and noticed that the maxChars 
> property on a copy field no longer functions as designed when indexing 
> via SolrJ. Per the most recent documentation, there have been no 
> intentional changes to the functionality of this property, so I assume 
> this is a bug.
>
> In debugging the issue, it looks like the bug was caused by SOLR-12992. In 
> DocumentBuilder, where the maxChars limit is applied, the code first checks 
> whether the value is instanceof String. As of SOLR-12992, string values now 
> come in as ByteArrayUtf8CharSequence (unless they are above a certain size as 
> defined by JavaBinCodec.MAX_UTF8_SZ), so they fail the instanceof String 
> check and the maxChars truncation is not applied. I am currently not 
> sure whether this issue is limited to indexing via SolrJ or applies to 
> documents indexed by any means.
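
The failure mode described above can be reproduced outside Solr. The sketch below is illustrative only: Utf8Chars is a hypothetical stand-in for ByteArrayUtf8CharSequence, and the two truncate methods are made-up names, not the actual DocumentBuilder code. It shows why a check for instanceof String skips truncation for a non-String CharSequence, and how checking for CharSequence instead would catch it.

```java
import java.nio.charset.StandardCharsets;

public class MaxCharsSketch {

    /** Hypothetical stand-in for a UTF-8 backed CharSequence; NOT Solr's class. */
    static final class Utf8Chars implements CharSequence {
        private final String decoded;

        Utf8Chars(byte[] utf8) {
            this.decoded = new String(utf8, StandardCharsets.UTF_8);
        }

        @Override public int length() { return decoded.length(); }
        @Override public char charAt(int index) { return decoded.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return decoded.subSequence(start, end);
        }
        @Override public String toString() { return decoded; }
    }

    /** Mirrors the reported failure mode: only String values are truncated. */
    static Object truncateStringOnly(Object value, int maxChars) {
        if (value instanceof String) {
            String s = (String) value;
            if (s.length() > maxChars) {
                return s.substring(0, maxChars);
            }
        }
        return value; // any non-String CharSequence passes through untouched
    }

    /** Assumed fix direction: test for CharSequence so non-String text is covered. */
    static Object truncateCharSequence(Object value, int maxChars) {
        if (value instanceof CharSequence) {
            CharSequence cs = (CharSequence) value;
            if (cs.length() > maxChars) {
                return cs.subSequence(0, maxChars).toString();
            }
        }
        return value;
    }

    public static void main(String[] args) {
        Object value = new Utf8Chars("abcdefghij".getBytes(StandardCharsets.UTF_8));
        System.out.println(truncateStringOnly(value, 5));   // prints abcdefghij
        System.out.println(truncateCharSequence(value, 5)); // prints abcde
    }
}
```

Whether the attached patch takes this exact approach is not stated in this thread; the sketch only demonstrates the type check at issue.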



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-13699) maxChars no longer working as designed on CopyField

2019-08-16 Thread Chris Troullis (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909330#comment-16909330
 ] 

Chris Troullis edited comment on SOLR-13699 at 8/16/19 6:44 PM:


Indeed. There appears to be a lot of "instanceof String" in the codebase, so 
there could potentially be a lot of other places that are affected by this same 
issue. I went ahead and uploaded my patch with some unit tests, just so it's 
there if we decide to move forward with the change. Please let me know if I can 
help any further.


was (Author: ctroullis):
Indeed. There appear to be a lot of "Instanceof String" in the codebase, so 
there could potentially be a lot of other places that are affected by this same 
issue. I went ahead an uploaded my patch with some unit tests, just so it's 
there if we decide to move forward with the change. Please let me know if I can 
help at all further.

> maxChars no longer working as designed on CopyField
> ---
>
> Key: SOLR-13699
> URL: https://issues.apache.org/jira/browse/SOLR-13699
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7, 7.7.1, 7.7.2, 8.0, 8.0.1, 8.1, 8.2, 7.7.3, 8.1.1, 
> 8.1.2
>Reporter: Chris Troullis
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-13699.patch


