[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request

2020-01-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023884#comment-17023884
 ] 

ASF subversion and git services commented on SOLR-14189:


Commit fd49c903b8193aa27c56655915c1bf741135fa18 in lucene-solr's branch 
refs/heads/gradle-master from Uwe Schindler
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fd49c90 ]

SOLR-14189: Add changes entry


> Some whitespace characters bypass zero-length test in query parsers leading 
> to 400 Bad Request
> --
>
> Key: SOLR-14189
> URL: https://issues.apache.org/jira/browse/SOLR-14189
> Project: Solr
> Issue Type: Improvement
> Components: query parsers
> Reporter: Andy Webb
> Assignee: Uwe Schindler
> Priority: Major
> Fix For: master (9.0), 8.5
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> The edismax and some other query parsers treat pure whitespace queries as 
> empty queries, but they use Java's {{String.trim()}} method to normalise 
> queries. That method [only treats characters 0-32 as 
> whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--].
> Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - 
> which bypass the test and lead to {{400 Bad Request}} responses - see for 
> example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs 
> {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the 
> exception:
> {noformat}
> org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "<EOF>" at 
> line 1, column 0. Was expecting one of: <NOT> ... "+" ... "-" ... <BAREOPER> 
> ... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... 
> <REGEXPTERM> ... "[" ... "{" ... <LPARAMS> ... "filter(" ... <NUMBER> ... 
> <TERM> ...
> {noformat}
> [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, 
> edismax and rerank query parsers to use 
> [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-],
> which is aware of all whitespace characters.
> Prior to the change, rerank behaves differently for U+3000 and U+0020 (the 
> pre-change result is noted under each request below) - with the change, both 
> give the "mandatory parameter" message:
> {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}}
>  - generic 400 Bad Request
> {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}}
>  - 400 reporting "reRankQuery parameter is mandatory"
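
For readers who want to reproduce the difference described in the issue, here is a minimal sketch in plain Java. It is not the code from PR 1172: the class name WhitespaceCheck and the helpers isEmptyAfterTrim, isBlankUnicode and decode are made up for illustration, and isBlankUnicode only approximates a Character.isWhitespace-based blank test rather than calling Commons Lang. It decodes the two query values from the example URLs and applies both styles of check.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class WhitespaceCheck {

    // Old-style test: String.trim() only strips characters up to U+0020.
    static boolean isEmptyAfterTrim(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Hypothetical helper (an assumption, not Solr code): treats a string as
    // blank when every char satisfies Character.isWhitespace.
    static boolean isBlankUnicode(String s) {
        if (s == null) {
            return true;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Percent-decodes a query parameter value as UTF-8.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ascii = decode("%20");              // U+0020 SPACE
        String ideographic = decode("%E3%80%80");  // U+3000 IDEOGRAPHIC SPACE

        System.out.println(isEmptyAfterTrim(ascii));        // true
        System.out.println(isEmptyAfterTrim(ideographic));  // false
        System.out.println(isBlankUnicode(ascii));          // true
        System.out.println(isBlankUnicode(ideographic));    // true
    }
}
```

The second line of output is the case the issue is about: the trim-based test lets the U+3000 query through as non-empty, so it reaches the parser and fails there.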



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request

2020-01-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023883#comment-17023883
 ] 

ASF subversion and git services commented on SOLR-14189:


Commit efd0e8f3e89a954fcb870c9fab18cf19bcdbf97e in lucene-solr's branch 
refs/heads/gradle-master from andywebb1975
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=efd0e8f ]

SOLR-14189 switch from String.trim() to StringUtils.isBlank() (#1172)






[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request

2020-01-26 Thread Andy Webb (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023821#comment-17023821
 ] 

Andy Webb commented on SOLR-14189:
--

Happy to help - thanks Uwe (and Christine)!




[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request

2020-01-26 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023770#comment-17023770
 ] 

Uwe Schindler commented on SOLR-14189:
--

Thanks Andy!




[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request

2020-01-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023769#comment-17023769
 ] 

ASF subversion and git services commented on SOLR-14189:


Commit e934c8a7caee42565bd4c3982e6b46a561ebecfe in lucene-solr's branch 
refs/heads/branch_8x from Uwe Schindler
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e934c8a ]

SOLR-14189: Add changes entry





[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request

2020-01-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023768#comment-17023768
 ] 

ASF subversion and git services commented on SOLR-14189:


Commit 43085edaa6954f212d1a7f19a2f60e3d0de73ae6 in lucene-solr's branch 
refs/heads/branch_8x from andywebb1975
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=43085ed ]

SOLR-14189 switch from String.trim() to StringUtils.isBlank() (#1172)






[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request

2020-01-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023767#comment-17023767
 ] 

ASF subversion and git services commented on SOLR-14189:


Commit fd49c903b8193aa27c56655915c1bf741135fa18 in lucene-solr's branch 
refs/heads/master from Uwe Schindler
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fd49c90 ]

SOLR-14189: Add changes entry





[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request

2020-01-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023766#comment-17023766
 ] 

ASF subversion and git services commented on SOLR-14189:


Commit efd0e8f3e89a954fcb870c9fab18cf19bcdbf97e in lucene-solr's branch 
refs/heads/master from andywebb1975
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=efd0e8f ]

SOLR-14189 switch from String.trim() to StringUtils.isBlank() (#1172)






[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request

2020-01-26 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023765#comment-17023765
 ] 

Uwe Schindler commented on SOLR-14189:
--

I will merge this now.

> Some whitespace characters bypass zero-length test in query parsers leading 
> to 400 Bad Request
> --
>
> Key: SOLR-14189
> URL: https://issues.apache.org/jira/browse/SOLR-14189
> Project: Solr
> Issue Type: Improvement
> Components: query parsers
> Reporter: Andy Webb
> Assignee: Uwe Schindler
> Priority: Major
> Time Spent: 1h 50m
> Remaining Estimate: 0h