[jira] [Commented] (LUCENE-8572) StringIndexOutOfBoundsException in parser/EscapeQuerySyntaxImpl.java
[ https://issues.apache.org/jira/browse/LUCENE-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904395#comment-16904395 ]

Chongchen Chen commented on LUCENE-8572:

I think this code is related to [LUCENE-4199|https://issues.apache.org/jira/browse/LUCENE-4199]. Maybe we should implement it like this:

{code:java}
public static final CharSequence escapeWhiteChar(CharSequence str, Locale locale) {
  ...
  for (int i = 0; i < escapableWhiteChars.length; i++) {
    buffer = buffer.toString().replace(escapableWhiteChars[i], "\\");
    buffer = buffer.toString().replace(escapableWhiteChars[i].toLowerCase(locale), "\\");
  }
  return buffer;
}
{code}

> StringIndexOutOfBoundsException in parser/EscapeQuerySyntaxImpl.java
>
> Key: LUCENE-8572
> URL: https://issues.apache.org/jira/browse/LUCENE-8572
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/queryparser
> Affects Versions: 6.3
> Reporter: Octavian Mocanu
> Priority: Major
>
> With "lucene-queryparser-6.3.0", specifically in
> "org/apache/lucene/queryparser/flexible/standard/parser/EscapeQuerySyntaxImpl.java"
>
> when escaping strings containing extended unicode chars, and with a locale
> distinct from that of the character set the string uses, the process fails,
> with a "java.lang.StringIndexOutOfBoundsException".
>
> The reason is that the comparison is done by previously converting all of the
> characters of the string to lower case chars, and by doing this, the original
> string size isn't anymore the same, but less, as of the transformed one, so
> that executing
>
> org/apache/lucene/queryparser/flexible/standard/parser/EscapeQuerySyntaxImpl.java:89
> fails with a java.lang.StringIndexOutOfBoundsException.
> I wonder whether the transformation to lower case is really needed when
> treating the escape chars, since by avoiding it, the error may be avoided.
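The replace-based proposal in this comment could be sketched as follows. This is only an illustration, not the code in EscapeQuerySyntaxImpl: the class name, the method signature, and the contents of ESCAPABLE_WHITE_CHARS are assumptions, and it assumes the intent is to prefix each whitespace character with a backslash rather than substitute it. Since the whitespace characters used here have no case variants, the sketch does no lowercasing at all, so every index stays tied to the original string.

```java
public class EscapeWhitespaceSketch {

    // Hypothetical list for illustration; the real array in Lucene may differ.
    private static final String[] ESCAPABLE_WHITE_CHARS = {" ", "\t", "\n", "\r", "\f"};

    // Prefixes every listed whitespace character with a backslash.
    // No toLowerCase call is involved, so the string length only grows
    // by the backslashes that are inserted.
    static String escapeWhiteChar(CharSequence str) {
        String buffer = str.toString();
        for (String ws : ESCAPABLE_WHITE_CHARS) {
            buffer = buffer.replace(ws, "\\" + ws);
        }
        return buffer;
    }

    public static void main(String[] args) {
        // U+0130 is the capital I with dot above from the failing input.
        String escaped = escapeWhiteChar("\u0130pone ");
        System.out.println(escaped.length()); // 7: six original chars plus one backslash
    }
}
```

The backslash itself is not in the list, so a later pass of the loop never re-escapes a backslash added by an earlier one.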
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8572) StringIndexOutOfBoundsException in parser/EscapeQuerySyntaxImpl.java
[ https://issues.apache.org/jira/browse/LUCENE-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722060#comment-16722060 ]

Octavian Mocanu commented on LUCENE-8572:

Hi, in my opinion the proposal of [~danmuzi] would be a nice solution, mainly because it avoids the problematic use of _toLowerCase_.

Best!
[jira] [Commented] (LUCENE-8572) StringIndexOutOfBoundsException in parser/EscapeQuerySyntaxImpl.java
[ https://issues.apache.org/jira/browse/LUCENE-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705249#comment-16705249 ]

Namgyu Kim commented on LUCENE-8572:

Hi, [~romseygeek], [~thetaphi].

I checked the issue and it looks like a logic problem. First, I think it is not a Locale problem but a problem in the replace algorithm (replaceIgnoreCase) itself. If you look at escapeWhiteChar(), it calls replaceIgnoreCase() internally. (escapeTerm() -> escapeWhiteChar() -> replaceIgnoreCase())

{code:java}
private static CharSequence replaceIgnoreCase(CharSequence string,
    CharSequence sequence1, CharSequence escapeChar, Locale locale) {
  // string = "İpone " [304, 112, 111, 110, 101, 32], size = 6
  ...
  while (start < count) {
    // toLowerCase converts the string as follows:
    // string = "i̇pone " [105, 775, 112, 111, 110, 101, 32], size = 7
    // so firstIndex will be set to 6.
    if ((firstIndex = string.toString().toLowerCase(locale).indexOf(first, start)) == -1)
      break;
    boolean found = true;
    ...
    if (found) {
      // string.toString() still has length 6 (indices 0 to 5), so slicing it
      // with firstIndex = 6 leads to a StringIndexOutOfBoundsException.
      result.append(string.toString().substring(copyStart, firstIndex));
      ...
    } else {
      start = firstIndex + 1;
    }
  }
  ...
}
{code}

Solving this may not be a big problem. But what do you think about using

{code:java}
public static final CharSequence escapeWhiteChar(CharSequence str, Locale locale) {
  ...
  for (int i = 0; i < escapableWhiteChars.length; i++) {
    // Use String's replace method.
    buffer = buffer.toString().replace(escapableWhiteChars[i], "\\");
  }
  return buffer;
}
{code}

instead of

{code:java}
public static final CharSequence escapeWhiteChar(CharSequence str, Locale locale) {
  ...
  for (int i = 0; i < escapableWhiteChars.length; i++) {
    // Current approach.
    buffer = replaceIgnoreCase(buffer, escapableWhiteChars[i].toLowerCase(locale), "\\", locale);
  }
  return buffer;
}
{code}

in the escapeWhiteChar method?
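The index drift described in this comment can be reproduced without Lucene. The following is a minimal standalone sketch using only java.lang.String; the class and method names are hypothetical, and it does not claim to pinpoint which exact line of replaceIgnoreCase throws. It only shows that an index found in the lowercased copy is not a valid position in the original string.

```java
import java.util.Locale;

public class LowercaseIndexDrift {

    // Index of the first space in the lowercased copy of s.
    static int spaceIndexInLowered(String s, Locale locale) {
        return s.toLowerCase(locale).indexOf(' ');
    }

    // Returns true if slicing the ORIGINAL string at the index found in the
    // LOWERCASED copy throws StringIndexOutOfBoundsException.
    static boolean sliceFails(String s, Locale locale) {
        int idx = spaceIndexInLowered(s, locale);
        try {
            s.substring(idx, idx + 1);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // U+0130 (capital I with dot above) followed by "pone ": 6 chars.
        String original = "\u0130pone ";

        // Outside Turkish-style locales, U+0130 lowercases to 'i' + U+0307,
        // so the lowercased copy is one char longer than the original.
        System.out.println(original.length());                          // 6
        System.out.println(original.toLowerCase(Locale.US).length());   // 7

        // The space sits at index 5 in the original but at 6 in the copy.
        System.out.println(original.indexOf(' '));                      // 5
        System.out.println(spaceIndexInLowered(original, Locale.US));   // 6

        System.out.println(sliceFails(original, Locale.US));            // true

        // With the Turkish locale the mapping is one-to-one, so nothing drifts.
        System.out.println(sliceFails(original, Locale.forLanguageTag("tr"))); // false
    }
}
```

This matches the earlier observation in the thread that the input only works with locale = "tr": there the lowercased copy keeps the same length as the original.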
[jira] [Commented] (LUCENE-8572) StringIndexOutOfBoundsException in parser/EscapeQuerySyntaxImpl.java
[ https://issues.apache.org/jira/browse/LUCENE-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695826#comment-16695826 ]

Alan Woodward commented on LUCENE-8572:

It looks as though FieldQueryNode and PathQueryNode are using {{Locale.getDefault()}} when they should be using {{Locale.ROOT}}.

cc [~thetaphi] who understands locales better than I do...
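As a small illustration of why the choice between {{Locale.getDefault()}} and {{Locale.ROOT}} matters for case conversion, the sketch below compares the two on plain strings. It uses only the JDK; the class and method names are hypothetical and nothing here is taken from FieldQueryNode or PathQueryNode. It also shows that {{Locale.ROOT}} makes lowercasing reproducible across machines but does not by itself keep the length of a string containing U+0130 unchanged.

```java
import java.util.Locale;

public class LocaleRootDemo {

    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Locale.ROOT gives the same result on every machine.
        System.out.println(lower("TITLE", Locale.ROOT).equals("title"));     // true

        // Under a Turkish locale, 'I' lowercases to dotless i (U+0131).
        System.out.println(lower("TITLE", turkish).equals("t\u0131tle"));    // true

        // Locale.ROOT still expands U+0130 into two chars ('i' + U+0307).
        System.out.println(lower("\u0130", Locale.ROOT).length());           // 2
        System.out.println(lower("\u0130", turkish).length());               // 1
    }
}
```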
[jira] [Commented] (LUCENE-8572) StringIndexOutOfBoundsException in parser/EscapeQuerySyntaxImpl.java
[ https://issues.apache.org/jira/browse/LUCENE-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695760#comment-16695760 ]

Octavian Mocanu commented on LUCENE-8572:

Hi [~romseygeek],

Trying e.g. at

{code:java}
/org/apache/lucene/queryparser/flexible/standard/parser/EscapeQuerySyntaxImpl.java:83
{code}

in

{code:java}
private static final CharSequence escapeTerm(CharSequence term, Locale locale)
{code}

with

{code:java}
term = "İpone " [304, 112, 111, 110, 101, 32]
locale = "us"
{code}

result -> *StringIndexOutOfBoundsException*

(it'll only work when having locale = "tr")
[jira] [Commented] (LUCENE-8572) StringIndexOutOfBoundsException in parser/EscapeQuerySyntaxImpl.java
[ https://issues.apache.org/jira/browse/LUCENE-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695702#comment-16695702 ]

Alan Woodward commented on LUCENE-8572:

Hi [~tonicava], thanks for opening the issue. Would you be able to provide an example or test case that illustrates the failure?