[jira] [Commented] (LUCENE-8572) StringIndexOutOfBoundsException in parser/EscapeQuerySyntaxImpl.java

2019-08-10 Thread Chongchen Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904395#comment-16904395
 ] 

Chongchen Chen commented on LUCENE-8572:


I found that this code is related to
[LUCENE-4199|https://issues.apache.org/jira/browse/LUCENE-4199]. Maybe we
should implement it like this:

{code:java}
public static final CharSequence escapeWhiteChar(CharSequence str,
    Locale locale) {
  ...

  for (int i = 0; i < escapableWhiteChars.length; i++) {
    buffer = buffer.toString().replace(escapableWhiteChars[i], "\\");
    buffer = buffer.toString().replace(
        escapableWhiteChars[i].toLowerCase(locale), "\\");
  }
  return buffer;
}
{code}
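A quick sanity check, not taken from the thread: the second `replace` above only matters if lower-casing an escapable whitespace character changes it. The sketch below assumes the escapable characters are space, tab, newline, carriage return, form feed, backspace and U+3000 (ideographic space). That list and the class name `WhitespaceCaseCheck` are assumptions, so compare them with the `escapableWhiteChars` array in your Lucene version.

```java
import java.util.Locale;

// Sketch: checks whether lower-casing any escapable whitespace character
// changes it. The character list is an ASSUMPTION modeled on
// EscapeQuerySyntaxImpl's escapableWhiteChars; verify it for your version.
class WhitespaceCaseCheck {
  static final String[] ESCAPABLE_WHITE_CHARS = {
      " ", "\t", "\n", "\r", "\f", "\b", "\u3000" };

  // Returns true if toLowerCase(locale) leaves every character unchanged,
  // i.e. a second replace() on the lower-cased pattern matches nothing new.
  static boolean lowerCaseIsNoOp(Locale locale) {
    for (String ws : ESCAPABLE_WHITE_CHARS) {
      if (!ws.toLowerCase(locale).equals(ws)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Locale[] locales = { Locale.ROOT, Locale.US, Locale.forLanguageTag("tr") };
    for (Locale locale : locales) {
      System.out.println(locale.toLanguageTag() + ": " + lowerCaseIsNoOp(locale));
    }
  }
}
```

Under that assumption every locale reports `true`: whitespace has no case mapping, so the lower-cased pattern is identical to the original one and the second `replace` call is redundant.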



> StringIndexOutOfBoundsException in parser/EscapeQuerySyntaxImpl.java
> 
>
> Key: LUCENE-8572
> URL: https://issues.apache.org/jira/browse/LUCENE-8572
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/queryparser
>Affects Versions: 6.3
>Reporter: Octavian Mocanu
>Priority: Major
>
> With "lucene-queryparser-6.3.0", specifically in
> "org/apache/lucene/queryparser/flexible/standard/parser/EscapeQuerySyntaxImpl.java",
> escaping strings that contain extended Unicode characters fails with a
> "java.lang.StringIndexOutOfBoundsException" when the locale differs from the
> one the string's characters belong to.
>  
> The reason is that the comparison is done after first converting all of the
> characters of the string to lower case. This conversion can change the length:
> the original string no longer has the same length as the transformed one (it
> is shorter), so executing
>  
> org/apache/lucene/queryparser/flexible/standard/parser/EscapeQuerySyntaxImpl.java:89
> fails with a java.lang.StringIndexOutOfBoundsException.
> I wonder whether the conversion to lower case is really needed when
> escaping characters, since avoiding it may also avoid the error.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8572) StringIndexOutOfBoundsException in parser/EscapeQuerySyntaxImpl.java

2018-12-15 Thread Octavian Mocanu (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722060#comment-16722060
 ] 

Octavian Mocanu commented on LUCENE-8572:

Hi, in my opinion the proposal by [~danmuzi] would be a nice solution, mainly
because it avoids the problematic use of _toLowerCase_.

Best!




[jira] [Commented] (LUCENE-8572) StringIndexOutOfBoundsException in parser/EscapeQuerySyntaxImpl.java

2018-11-30 Thread Namgyu Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705249#comment-16705249
 ] 

Namgyu Kim commented on LUCENE-8572:


Hi [~romseygeek], [~thetaphi],

I looked into the issue, and it seems to be a logic problem.

First, I think it's not a Locale problem but a problem in the replace
algorithm (replaceIgnoreCase) itself.

If you look at escapeWhiteChar(), it calls replaceIgnoreCase() internally
(escapeTerm() -> escapeWhiteChar() -> replaceIgnoreCase()).

 
{code:java}
private static CharSequence replaceIgnoreCase(CharSequence string,
    CharSequence sequence1, CharSequence escapeChar, Locale locale) {
  // string = "İpone " [304, 112, 111, 110, 101, 32], size = 6
  ...
  while (start < count) {
    // toLowerCase converts it as follows:
    // string = "i̇pone " [105, 775, 112, 111, 110, 101, 32], size = 7
    // firstIndex will be set to 6.
    if ((firstIndex = string.toString().toLowerCase(locale).indexOf(first,
        start)) == -1)
      break;
    boolean found = true;
    ...
    if (found) {
      // firstIndex is an index into the lower-cased string, but the original
      // string only has indices 0 to 5. Taking substrings of the original at
      // firstIndex (here and in the elided lines below) can therefore go out
      // of range, so we get a StringIndexOutOfBoundsException.
      result.append(string.toString().substring(copyStart, firstIndex));
      ...
    } else {
      start = firstIndex + 1;
    }
  }
  ...
}
{code}
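A small, Lucene-independent sketch of the mismatch traced above. It assumes, as in the trace, that the search runs on the lower-cased copy while text is sliced out of the original string; `LowerCaseIndexMismatch` and `indexInLowerCase` are hypothetical names used only for illustration.

```java
import java.util.Locale;

// Sketch (not Lucene code): an index found in the lower-cased copy is used
// to slice the original string, which is shorter.
class LowerCaseIndexMismatch {
  // Index of the first space, searched in the lower-cased copy of s.
  static int indexInLowerCase(String s, Locale locale) {
    return s.toLowerCase(locale).indexOf(' ');
  }

  public static void main(String[] args) {
    String original = "\u0130pone ";  // "İpone ", length 6
    // Under Locale.US the copy is "i" + U+0307 + "pone ", length 7, so this is 6.
    int firstIndex = indexInLowerCase(original, Locale.US);
    System.out.println("original length " + original.length()
        + ", firstIndex " + firstIndex);
    try {
      // Copying the matched space out of the ORIGINAL string needs
      // substring(6, 7), but the original only has length 6.
      System.out.println(original.substring(firstIndex, firstIndex + 1));
    } catch (StringIndexOutOfBoundsException e) {
      System.out.println("StringIndexOutOfBoundsException: " + e.getMessage());
    }
  }
}
```

The search itself succeeds; the failure only appears once the lower-cased index is applied to the original, shorter string.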
 

Solving this may not be a big problem.


But what do you think about using
{code:java}
public static final CharSequence escapeWhiteChar(CharSequence str,
    Locale locale) {
  ...

  for (int i = 0; i < escapableWhiteChars.length; i++) {
    // Use String's replace method.
    buffer = buffer.toString().replace(escapableWhiteChars[i], "\\");
  }
  return buffer;
}
{code}
instead of
{code:java}
public static final CharSequence escapeWhiteChar(CharSequence str,
    Locale locale) {
  ...

  for (int i = 0; i < escapableWhiteChars.length; i++) {
    // Current method.
    buffer = replaceIgnoreCase(buffer,
        escapableWhiteChars[i].toLowerCase(locale), "\\", locale);
  }
  return buffer;
}
{code}
in the escapeWhiteChar method?
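A self-contained sketch of the `String.replace` idea, not the committed fix. It assumes that escaping means inserting a backslash in front of each escapable whitespace character (an assumption about the elided part of replaceIgnoreCase); under that assumption, replacing a character with `"\\"` alone would drop the whitespace, so the sketch uses `"\\" + ws`. The class name `WhiteCharEscaper` and the whitespace list are also assumptions.

```java
import java.util.Locale;

// Sketch of a locale-independent escapeWhiteChar. ASSUMPTIONS: the list of
// escapable characters and the "insert a backslash before the character"
// semantics; check both against EscapeQuerySyntaxImpl in your version.
class WhiteCharEscaper {
  private static final String[] ESCAPABLE_WHITE_CHARS = {
      " ", "\t", "\n", "\r", "\f", "\b", "\u3000" };

  // The locale parameter only mirrors the original signature; no case
  // conversion happens, so indices always refer to the input string.
  static CharSequence escapeWhiteChar(CharSequence str, Locale locale) {
    if (str == null || str.length() == 0) {
      return str;
    }
    String buffer = str.toString();
    for (String ws : ESCAPABLE_WHITE_CHARS) {
      // String.replace(CharSequence, CharSequence) is a literal, non-regex
      // replacement of every occurrence.
      buffer = buffer.replace(ws, "\\" + ws);
    }
    return buffer;
  }

  public static void main(String[] args) {
    // Prints the term with a backslash inserted before the trailing space.
    System.out.println(escapeWhiteChar("\u0130pone ", Locale.US));
  }
}
```

Because nothing is lower-cased, the reporter's term is escaped the same way under every locale, and a backslash inserted by one pass can never be matched by a later pass since it is not whitespace.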

 




[jira] [Commented] (LUCENE-8572) StringIndexOutOfBoundsException in parser/EscapeQuerySyntaxImpl.java

2018-11-22 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695826#comment-16695826
 ] 

Alan Woodward commented on LUCENE-8572:

It looks as though FieldQueryNode and PathQueryNode use
{{Locale.getDefault()}} when they should be using {{Locale.ROOT}}.

cc [~thetaphi], who understands locales better than I do...
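For context on why {{Locale.ROOT}} is usually preferred over {{Locale.getDefault()}}: case conversion is locale-sensitive, so code that lower-cases with the default locale can behave differently depending on how the JVM is configured. A minimal sketch (not from the thread, class name hypothetical) using the Turkish dotless-i mapping:

```java
import java.util.Locale;

// Sketch: the same toLowerCase call gives different results under different
// locales, so relying on Locale.getDefault() makes behavior machine-dependent.
class LocaleSensitiveLowerCase {
  static String lower(String s, Locale locale) {
    return s.toLowerCase(locale);
  }

  public static void main(String[] args) {
    String field = "TITLE";
    System.out.println(lower(field, Locale.ROOT));                  // title
    System.out.println(lower(field, Locale.forLanguageTag("tr")));  // tıtle (dotless i)
    System.out.println(lower(field, Locale.getDefault()));          // depends on the JVM
  }
}
```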




[jira] [Commented] (LUCENE-8572) StringIndexOutOfBoundsException in parser/EscapeQuerySyntaxImpl.java

2018-11-22 Thread Octavian Mocanu (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695760#comment-16695760
 ] 

Octavian Mocanu commented on LUCENE-8572:

Hi [~romseygeek],

For example, try
{code:java}
/org/apache/lucene/queryparser/flexible/standard/parser/EscapeQuerySyntaxImpl.java:83
{code}
in
{code:java}
private static final CharSequence escapeTerm(CharSequence term, Locale locale)
{code}
with

 
{code:java}
term = "İpone " [304, 112, 111, 110, 101, 32]
locale = "us"
{code}

result -> *StringIndexOutOfBoundsException*

(it only works with locale = "tr")
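A minimal sketch, independent of Lucene, of the locale dependence reported above: it lower-cases the same term under a few locales and compares lengths. Under Turkish, U+0130 maps to a single `i`; under the US and root locales it becomes `i` followed by U+0307 (combining dot above), so the lower-cased string is one char longer than the original. The class and method names are hypothetical.

```java
import java.util.Locale;

// Sketch: compares the length of a term before and after toLowerCase(locale).
class LowerCaseLengthCheck {
  // Number of UTF-16 chars added (or removed) by lower-casing.
  static int lengthDelta(String term, Locale locale) {
    return term.toLowerCase(locale).length() - term.length();
  }

  public static void main(String[] args) {
    String term = "\u0130pone ";  // "İpone ", chars [304, 112, 111, 110, 101, 32]
    Locale[] locales = { Locale.US, Locale.ROOT, Locale.forLanguageTag("tr") };
    for (Locale locale : locales) {
      System.out.println(locale.toLanguageTag() + " -> delta "
          + lengthDelta(term, locale));
    }
  }
}
```

This prints a delta of 1 for `en-US` and `und` (the root locale) and 0 for `tr`, matching the observation that only the Turkish locale avoids the exception.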




[jira] [Commented] (LUCENE-8572) StringIndexOutOfBoundsException in parser/EscapeQuerySyntaxImpl.java

2018-11-22 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695702#comment-16695702
 ] 

Alan Woodward commented on LUCENE-8572:

Hi [~tonicava], thanks for opening the issue.  Would you be able to provide an 
example or test case that illustrates the failure?
