[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField
[ https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914711#comment-16914711 ]

Noble Paul commented on SOLR-13699:
---

You can submit a PR and I can merge it

> maxChars no longer working as designed on CopyField
> ---
>
> Key: SOLR-13699
> URL: https://issues.apache.org/jira/browse/SOLR-13699
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: 7.7, 7.7.1, 7.7.2, 8.0, 8.0.1, 8.1, 8.2, 7.7.3, 8.1.1, 8.1.2
> Reporter: Chris Troullis
> Assignee: Erick Erickson
> Priority: Major
> Attachments: SOLR-13699.patch, SOLR-13699.patch
>
> We recently upgraded from Solr 7.3 to 8.1, and noticed that the maxChars
> property on a copy field is no longer functioning as designed, while indexing
> via SolrJ. Per the most recent documentation it looks like there have been no
> intentional changes as to the functionality of this property, so I assume
> this is a bug.
>
> In debugging the issue, it looks like the bug was caused by SOLR-12992. In
> DocumentBuilder where the maxChar limit is applied, it first checks if the
> value is instanceof String. As of SOLR-12992, string values are now coming in
> as ByteArrayUtf8CharSequence (unless they are above a certain size as defined
> by JavaBinCodec.MAX_UTF8_SZ), so they are failing the instanceof String
> check, and the maxChar truncation is not being applied. I am currently not
> sure if this issue is limited to indexing via SolrJ or if it applies to
> documents indexed via any means

--
This message was sent by Atlassian Jira (v8.3.2#803003)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
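To illustrate the failure mode described in the issue, here is a minimal, self-contained sketch. It is not Solr code: `Utf8BackedChars` is a hypothetical stand-in for a byte-backed `CharSequence` such as `ByteArrayUtf8CharSequence`, and `truncateIfString` only mirrors the shape of the `instanceof String` check that the report says guards the maxChars limit.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a byte-backed CharSequence.
// This is NOT Solr's ByteArrayUtf8CharSequence, just an illustration.
final class Utf8BackedChars implements CharSequence {
    private final byte[] utf8;
    private String decoded; // decoded lazily on first use

    Utf8BackedChars(String s) {
        this.utf8 = s.getBytes(StandardCharsets.UTF_8);
    }

    private String decode() {
        if (decoded == null) {
            decoded = new String(utf8, StandardCharsets.UTF_8);
        }
        return decoded;
    }

    @Override public int length() { return decode().length(); }
    @Override public char charAt(int index) { return decode().charAt(index); }
    @Override public CharSequence subSequence(int start, int end) {
        return decode().subSequence(start, end);
    }
    @Override public String toString() { return decode(); }
}

final class MaxCharsBugDemo {
    // Assumed shape of the guard from the report: only String values are truncated.
    static Object truncateIfString(Object val, int maxChars) {
        if (val instanceof String && maxChars > 0) {
            String s = (String) val;
            return s.substring(0, Math.min(maxChars, s.length()));
        }
        return val; // any non-String value passes through untouched
    }

    public static void main(String[] args) {
        // A plain String is truncated to 4 chars.
        System.out.println(truncateIfString("abcdefghij", 4));
        // The same text wrapped in a non-String CharSequence is not truncated.
        System.out.println(truncateIfString(new Utf8BackedChars("abcdefghij"), 4));
    }
}
```

The second call returns the full ten-character value, because the wrapper is a `CharSequence` but not a `String`, so the guard never fires.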
[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField
[ https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914486#comment-16914486 ]

Chris Troullis commented on SOLR-13699:
---

[~noble.paul] Sounds good. I thought it would be cleaner to have only the one instanceof check; the CopyField change seemed low risk, since the only place that method was called from was DocumentBuilder. I fully understand the desire to keep the changes as low impact as possible, though.

Is there anything else I should do to move these changes forward? I'm not sure what the process is. Should I enable patch review on the ticket?

Also, I agree that it would be good to have the tests running in all 3 formats; the existing CopyField tests would have caught this regression if the test docs had been indexed using javabin. Should we create a ticket to track that work, if one doesn't already exist?
[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField
[ https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909489#comment-16909489 ]

Noble Paul commented on SOLR-13699:
---

Let's limit all the changes to DocumentBuilder
[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField
[ https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909330#comment-16909330 ]

Chris Troullis commented on SOLR-13699:
---

Indeed. There appear to be a lot of "instanceof String" checks in the codebase, so there could potentially be a lot of other places that are affected by this same issue. I went ahead and uploaded my patch with some unit tests, just so it's there if we decide to move forward with the change. Please let me know if I can help any further.
[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField
[ https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909162#comment-16909162 ]

Jan Høydahl commented on SOLR-13699:
---

In this particular case I guess we can replace
{code:java}
if( val instanceof String && cf.getMaxChars() > 0 ) {
{code}
with
{code:java}
if( val instanceof CharSequence && cf.getMaxChars() > 0 ) {
{code}
But how do we guard against other code locations expecting {{String}} explicitly? @[~noble.paul] any suggestions?
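A minimal sketch of what the `CharSequence` variant of the check could look like in isolation. The class and method names are hypothetical and this is not the attached patch; `StringBuilder` is used only as a convenient JDK `CharSequence` that is not a `String`.

```java
final class CharSequenceMaxChars {
    // Hypothetical helper, not Solr code: truncates any CharSequence, not only String.
    static Object truncate(Object val, int maxChars) {
        if (val instanceof CharSequence && maxChars > 0) {
            CharSequence cs = (CharSequence) val;
            int end = Math.min(maxChars, cs.length());
            // Materialize the prefix as a String so downstream code sees one type.
            return cs.subSequence(0, end).toString();
        }
        return val; // non-text values and maxChars <= 0 pass through unchanged
    }

    public static void main(String[] args) {
        System.out.println(truncate("abcdefghij", 4));                    // abcd
        System.out.println(truncate(new StringBuilder("abcdefghij"), 4)); // abcd
        System.out.println(truncate(42, 4));                              // 42
    }
}
```

Because `String` itself implements `CharSequence`, the widened check keeps the old behaviour for plain strings. It does not address the broader question raised above about other call sites that test for `String` explicitly.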
[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField
[ https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909124#comment-16909124 ]

Chris Troullis commented on SOLR-13699:
---

The fix I have basically just adds another instanceof check to handle ByteArrayUtf8CharSequence, which I agree is not the best solution. It does seem like there have been a number of regressions created by the optimization. I'm not sure I understand enough about the intent of the original change to attempt any kind of larger-scale refactor. Let me know if you still think it's worth submitting my patch, or if you'd prefer to revisit the design of the original optimization.
[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField
[ https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908850#comment-16908850 ]

Jan Høydahl commented on SOLR-13699:
---

So this is yet another regression from the ByteArrayUtf8CharSequence "optimization"? If this surfaces a symptom of bad design, let's fix that rather than patch just the maxChars check for JavaBin with some instanceOf or something?
[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField
[ https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908422#comment-16908422 ]

Chris Troullis commented on SOLR-13699:
---

[~erickerickson] So after looking through the CopyFieldTest unit tests, I found that we are already testing the maxChars functionality via the testCopyFieldFunctionality() test, and the maxChars functionality is working properly when the test runs!

After further digging, it seems that the issue only occurs when docs are indexed in binary format, using the JavaBinCodec, as this is where the change was made to read strings as a ByteArrayUtf8CharSequence instead of a String. It appears that the test framework indexes docs in XML format, which does not use the JavaBinCodec, so the fields are read as Strings, and maxChars works as designed.

So, in other words, it's still an issue, but it looks like it only affects docs indexed in binary format. Since it looks like the test framework only supports indexing in XML format (although I didn't look that hard), do you have any suggestions on how to properly unit test this?
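One way to picture the testing gap is to run the same maxChars assertion once per input representation. The sketch below is a self-contained illustration under stated assumptions, not a Solr test: it uses no Solr test framework or codec, the representation names are hypothetical, and `StringBuilder` merely stands in for a non-String `CharSequence` such as the one the binary path is reported to produce.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class RepresentationMatrixCheck {
    // Assumed shape of the String-only guard described in the issue.
    static Object stringOnlyTruncate(Object val, int maxChars) {
        if (val instanceof String && maxChars > 0) {
            String s = (String) val;
            return s.substring(0, Math.min(maxChars, s.length()));
        }
        return val;
    }

    // Runs the same length assertion against every representation and
    // returns the names of those where the limit was not enforced.
    static List<String> failingRepresentations(String raw, int maxChars) {
        Map<String, Function<String, Object>> representations = new LinkedHashMap<>();
        representations.put("string", s -> s);                        // value arrives as a String
        representations.put("char-sequence", s -> new StringBuilder(s)); // value arrives as a non-String CharSequence

        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, Function<String, Object>> e : representations.entrySet()) {
            Object out = stringOnlyTruncate(e.getValue().apply(raw), maxChars);
            if (out.toString().length() > maxChars) {
                failures.add(e.getKey());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(failingRepresentations("abcdefghij", 4));
    }
}
```

With only the "string" representation in the matrix, the check passes and the regression stays hidden, which matches the observation above that the existing test did not catch it.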
[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField
[ https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908386#comment-16908386 ]

Chris Troullis commented on SOLR-13699:
---

Created after discussion with [~erickerickson] on the mailing list. I think I have a fix working, just finishing testing and writing a unit test, then I will attach my patch.