[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField

2019-08-23 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914711#comment-16914711
 ] 

Noble Paul commented on SOLR-13699:
---

You can submit a PR and I can merge it.

> maxChars no longer working as designed on CopyField
> ---
>
> Key: SOLR-13699
> URL: https://issues.apache.org/jira/browse/SOLR-13699
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.7, 7.7.1, 7.7.2, 8.0, 8.0.1, 8.1, 8.2, 7.7.3, 8.1.1, 
> 8.1.2
>Reporter: Chris Troullis
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-13699.patch, SOLR-13699.patch
>
>
> We recently upgraded from Solr 7.3 to 8.1, and noticed that the maxChars 
> property on a copy field is no longer functioning as designed, while indexing 
> via SolrJ. Per the most recent documentation it looks like there have been no 
> intentional changes as to the functionality of this property, so I assume 
> this is a bug.
>   
>  In debugging the issue, it looks like the bug was caused by SOLR-12992. In 
> DocumentBuilder, where the maxChars limit is applied, it first checks if the 
> value is instanceof String. As of SOLR-12992, string values are now coming in 
> as ByteArrayUtf8CharSequence (unless they are above a certain size as defined 
> by JavaBinCodec.MAX_UTF8_SZ), so they are failing the instanceof String 
> check, and the maxChars truncation is not being applied. I am currently not 
> sure if this issue is limited to indexing via SolrJ or if it applies to 
> documents indexed via any means.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField

2019-08-23 Thread Chris Troullis (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914486#comment-16914486
 ] 

Chris Troullis commented on SOLR-13699:
---

[~noble.paul] Sounds good. I thought it would be cleaner to have only one 
instanceof check; the CopyField change seemed low risk, since the only place 
that method was called was from DocumentBuilder. I fully understand the desire 
to keep the changes as low impact as possible, though. 

Is there anything else I should do to move these changes forward? Not sure what 
the process is. Should I enable patch review on the ticket?

Also, I agree that it would be good to have the tests running in all 3 formats. 
The existing CopyField tests would have caught this regression if the test docs 
had been indexed using javabin. Should we create a ticket to track that work, 
if one doesn't already exist?
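
To illustrate the point about running the same test in every format, here is a minimal, self-contained sketch. It does not use the Solr test framework. The format names ("xml", "json", "javabin"), the decoder map, and the use of StringBuilder as a stand-in for a non-String CharSequence are all assumptions made for illustration; the thread only says "all 3 formats" and does not name them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class AllFormatsCheck {
  // String-only guard, modeled on the check described in the issue.
  static Object copyWithMaxChars(Object val, int maxChars) {
    if (val instanceof String && maxChars > 0) {
      String s = (String) val;
      return s.length() > maxChars ? s.substring(0, maxChars) : s;
    }
    return val;
  }

  // Hypothetical decoders: the value type each update format is assumed to
  // hand to the indexing code. Only the javabin entry yields a non-String.
  static Map<String, Function<String, Object>> decoders() {
    Map<String, Function<String, Object>> m = new LinkedHashMap<>();
    m.put("xml", s -> s);
    m.put("json", s -> s);
    m.put("javabin", s -> new StringBuilder(s)); // stand-in for ByteArrayUtf8CharSequence
    return m;
  }

  // Runs the same maxChars assertion once per format and records pass/fail.
  static Map<String, Boolean> runAll(String input, int maxChars) {
    Map<String, Boolean> results = new LinkedHashMap<>();
    decoders().forEach((format, decode) -> {
      Object out = copyWithMaxChars(decode.apply(input), maxChars);
      results.put(format, out.toString().length() <= maxChars);
    });
    return results;
  }

  public static void main(String[] args) {
    // prints {xml=true, json=true, javabin=false}
    System.out.println(runAll("abcdefghij", 5));
  }
}
```

A test that only exercises the "xml" entry passes, which matches what the existing CopyField tests did; looping over every format is what exposes the failure.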




[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField

2019-08-16 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909489#comment-16909489
 ] 

Noble Paul commented on SOLR-13699:
---

Let's limit all the changes to DocumentBuilder.




[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField

2019-08-16 Thread Chris Troullis (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909330#comment-16909330
 ] 

Chris Troullis commented on SOLR-13699:
---

Indeed. There appear to be a lot of "instanceof String" checks in the codebase, 
so there could potentially be a lot of other places that are affected by this 
same issue. I went ahead and uploaded my patch with some unit tests, just so 
it's there if we decide to move forward with the change. Please let me know if 
I can help any further.




[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField

2019-08-16 Thread Jan Høydahl (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909162#comment-16909162
 ] 

Jan Høydahl commented on SOLR-13699:


In this particular case I guess we can replace 
{code:java}
if( val instanceof String && cf.getMaxChars() > 0 ) {
{code}
with
{code:java}
if( val instanceof CharSequence && cf.getMaxChars() > 0 ) {
{code}
But how do we guard against other code locations that expect {{String}} 
explicitly? [~noble.paul], any suggestions? 
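
As a rough illustration of the widened check, the sketch below applies the truncation to any CharSequence. It is not the attached patch: the class and method names are hypothetical, and StringBuilder is used only as a convenient JDK example of a CharSequence that is not a String. The thread does not say how ByteArrayUtf8CharSequence implements subSequence, so the sketch goes through toString(), which every CharSequence must support.

```java
final class MaxCharsTruncation {
  /** Truncates any CharSequence value to maxChars; other values pass through unchanged. */
  static Object truncate(Object val, int maxChars) {
    if (val instanceof CharSequence && maxChars > 0) {
      // toString() is part of the CharSequence contract, so it is a
      // conservative way to read the characters of an unknown implementation.
      String s = val.toString();
      return s.length() > maxChars ? s.substring(0, maxChars) : val;
    }
    return val;
  }

  public static void main(String[] args) {
    System.out.println(truncate("abcdefghij", 3));                    // prints abc
    System.out.println(truncate(new StringBuilder("abcdefghij"), 3)); // prints abc
    System.out.println(truncate(42, 3));                              // prints 42
  }
}
```

This only covers the one call site. Other places that test for String explicitly would still need to be found and reviewed one by one.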




[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField

2019-08-16 Thread Chris Troullis (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909124#comment-16909124
 ] 

Chris Troullis commented on SOLR-13699:
---

The fix I have basically just adds another instanceof check to handle 
ByteArrayUtf8CharSequence, which I agree is not the best solution. It does seem 
like there have been a number of regressions created by the optimization. I'm 
not sure I understand enough about the intent of the original change to make 
any kind of larger-scale refactor.

Let me know if you still think it's worth submitting my patch, or if you'd 
prefer to revisit the design of the original optimization.




[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField

2019-08-16 Thread Jan Høydahl (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908850#comment-16908850
 ] 

Jan Høydahl commented on SOLR-13699:


So this is yet another regression from the ByteArrayUtf8CharSequence 
"optimization"? If this is a symptom of bad design, let's fix that rather 
than just patch the maxChars check for JavaBin with some instanceof or 
something.




[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField

2019-08-15 Thread Chris Troullis (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908422#comment-16908422
 ] 

Chris Troullis commented on SOLR-13699:
---

[~erickerickson] So after looking through the CopyFieldTest unit tests, I found 
that we are already testing the maxChars functionality via the 
testCopyFieldFunctionality() test, and the maxChars functionality is working 
properly when the test runs! 

After further digging, it seems that the issue only occurs when docs are indexed 
in binary format using the JavaBinCodec, as this is where the change was made 
to read strings as a ByteArrayUtf8CharSequence instead of a String. It appears 
that the test framework indexes docs in XML format, which does not use the 
JavaBinCodec, so the fields are read as Strings, and maxChars works as designed. 

So, in other words, it's still an issue, but it looks like it only affects docs 
indexed in binary format. Since it looks like the test framework only supports 
indexing in XML format (although I didn't look that hard), do you have any 
suggestions on how to properly unit test this?
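
The format dependence can be reproduced outside Solr with a small sketch. Everything in it is illustrative: Utf8Bytes is a hypothetical stand-in for ByteArrayUtf8CharSequence, not the real class, and truncateStringOnly is modeled on the String-only guard described in the issue rather than copied from DocumentBuilder.

```java
import java.nio.charset.StandardCharsets;

final class FormatDependentCheck {
  /** Hypothetical stand-in for ByteArrayUtf8CharSequence: UTF-8 bytes exposed as a CharSequence. */
  static final class Utf8Bytes implements CharSequence {
    private final String decoded;
    Utf8Bytes(byte[] utf8) { this.decoded = new String(utf8, StandardCharsets.UTF_8); }
    @Override public int length() { return decoded.length(); }
    @Override public char charAt(int i) { return decoded.charAt(i); }
    @Override public CharSequence subSequence(int s, int e) { return decoded.subSequence(s, e); }
    @Override public String toString() { return decoded; }
  }

  /** Truncates only when the value is a java.lang.String, like the guard in the issue. */
  static Object truncateStringOnly(Object val, int maxChars) {
    if (val instanceof String && maxChars > 0) {
      String s = (String) val;
      return s.substring(0, Math.min(maxChars, s.length()));
    }
    return val; // any other type, including other CharSequences, is left as-is
  }

  public static void main(String[] args) {
    Object fromXml = "abcdefghij"; // XML path: plain String
    Object fromJavabin = new Utf8Bytes("abcdefghij".getBytes(StandardCharsets.UTF_8));
    System.out.println(truncateStringOnly(fromXml, 5));     // prints abcde
    System.out.println(truncateStringOnly(fromJavabin, 5)); // prints abcdefghij
  }
}
```

The same text goes in both times; only the runtime type differs, and that alone decides whether the limit is applied.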




[jira] [Commented] (SOLR-13699) maxChars no longer working as designed on CopyField

2019-08-15 Thread Chris Troullis (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908386#comment-16908386
 ] 

Chris Troullis commented on SOLR-13699:
---

Created after discussion with [~erickerickson] on the mailing list. I think I 
have a fix working, just finishing testing and writing a unit test, then I will 
attach my patch.
