Re: QueryParser - proposed change may break existing queries.

2020-09-18 Thread Mark Harwood
>You could avoid (some of?) these problems by supporting /(?i)foo/ instead
of /foo/i

That would avoid our parsing dilemma but brings some other concerns. This
inline syntax can normally be used to selectively turn on case insensitivity
for sections of a regex and then turn it off with (?-i).
We could potentially implement this support in the
underlying o.a.l.util.automaton.RegExp class. We changed that class
recently to take a separate global flag alongside the regex string which
can determine case sensitivity. I guess any inline (?i) syntax would
override whatever default option had been passed in the constructor flag.
That might be a hairy change though - the RegExp parser logic is
hand-crafted rather than JavaCC.
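
As a point of reference, here is a minimal sketch of how the inline flag behaves in java.util.regex.Pattern. It says nothing about o.a.l.util.automaton.RegExp, which would need new parser code to do the same. The class name InlineFlagDemo is made up for illustration.

```java
import java.util.regex.Pattern;

final class InlineFlagDemo {
    // True if the whole input matches the regex under the given compile flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // (?i) switches case-insensitive matching on, (?-i) switches it off again.
        System.out.println(matches("(?i)foo(?-i)bar", 0, "FOObar")); // true
        System.out.println(matches("(?i)foo(?-i)bar", 0, "FOOBAR")); // false
        // An inline flag takes precedence over the flag passed to compile().
        System.out.println(matches("(?-i)foo", Pattern.CASE_INSENSITIVE, "FOO")); // false
        System.out.println(matches("foo", Pattern.CASE_INSENSITIVE, "FOO"));      // true
    }
}
```

The third call is the override case: the inline (?-i) wins over the CASE_INSENSITIVE flag given to compile().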


On Fri, Sep 18, 2020 at 7:47 AM Dawid Weiss  wrote:

> > If they try to use any other options than 'i' we throw a ParseException
>
> +1. Complex-syntax parsers should throw (human-palatable) exceptions
> on syntax errors. A lenient, "naive user" query parser should be
> separate and accept a very, very
> rudimentary query syntax (so that there are literally no chances of
> making a syntax error).
>
> D.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: QueryParser - proposed change may break existing queries.

2020-09-18 Thread Dawid Weiss
> If they try to use any other options than 'i' we throw a ParseException

+1. Complex-syntax parsers should throw (human-palatable) exceptions
on syntax errors. A lenient, "naive user" query parser should be
separate and accept a very, very
rudimentary query syntax (so that there are literally no chances of
making a syntax error).

D.




Re: QueryParser - proposed change may break existing queries.

2020-09-17 Thread Chris Hostetter
: And as I understand it, current behavior is the silent misinterpretation.
: To me, the failure to require a space after the regex (and either not
: become a regex in that case or complain about invalid regex) might be
: considered a bug...

I would agree ...

: >> However, today people can search for :
: >>/foo.com/index.html
: >> and not get an error. The searcher may think this is a query for a URL
: >> but it's actually parsed as a regex "foo.com" ORed with a term query.

... I didn't realize that was happening.  To me that seems like it should
definitely be considered a bug, and the "regex" branch of the grammar
shouldn't be used if there are any unexpected characters after the closing
"/" ... the current behavior Mark is describing seems analogous to the
grammar assuming "WESS ANDERSON" should be parsed as "WESS +DERSON"

: > You could avoid (some of?) these problems by supporting /(?i)foo/ 
: > instead of /foo/i 
: 
: I like this idea. The only downside is that folks will tend to think
: it's a full Java Pattern and try other options. :)

If they try to use any other options than 'i' we throw a ParseException
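
A rough sketch of that check, assuming the inline flags are validated before the regex is handed on. InlineFlagCheck is a hypothetical helper, java.text.ParseException stands in for the query parser's own ParseException, and accepting (?-i) alongside (?i) is my assumption. It deliberately ignores escaping and the scoped (?i:...) form.

```java
import java.text.ParseException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InlineFlagCheck {
    // Matches an inline flag group such as (?i), (?-i) or (?ms); group 1 holds the flags.
    private static final Pattern FLAG_GROUP = Pattern.compile("\\(\\?([a-zA-Z-]+)\\)");

    // Throws if the regex body contains any inline option other than (?i) or (?-i).
    static void check(String regex) throws ParseException {
        Matcher m = FLAG_GROUP.matcher(regex);
        while (m.find()) {
            String flags = m.group(1);
            if (!flags.equals("i") && !flags.equals("-i")) {
                throw new ParseException(
                    "Unsupported regex option " + m.group() + ": only (?i) and (?-i) are accepted",
                    m.start());
            }
        }
    }

    public static void main(String[] args) {
        for (String regex : new String[] {"(?i)foo", "foo(?-i)bar", "(?s)foo.bar"}) {
            try {
                check(regex);
                System.out.println(regex + " accepted");
            } catch (ParseException e) {
                System.out.println(regex + " rejected: " + e.getMessage());
            }
        }
    }
}
```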





-Hoss
http://www.lucidworks.com/




Re: QueryParser - proposed change may break existing queries.

2020-09-17 Thread Dawid Weiss
I like this idea. The only downside is that folks will tend to think
it's a full Java Pattern and try other options. :)

On Thu, Sep 17, 2020 at 9:09 PM Steve Rowe  wrote:
>
> You could avoid (some of?) these problems by supporting /(?i)foo/ instead of 
> /foo/i
>
> --
> Steve
>




Re: QueryParser - proposed change may break existing queries.

2020-09-17 Thread Uwe Schindler
That's a much better idea, I like it. It's basically what Java's regex parser in
the Pattern class also does.

If we do this we won't even need a syntax change.

Uwe

On September 17, 2020 7:09:18 PM UTC, Steve Rowe wrote:
>You could avoid (some of?) these problems by supporting /(?i)foo/
>instead of /foo/i
>
>--
>Steve
>

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de

Re: QueryParser - proposed change may break existing queries.

2020-09-17 Thread Steve Rowe
You could avoid (some of?) these problems by supporting /(?i)foo/ instead of 
/foo/i

--
Steve

> On Sep 17, 2020, at 1:55 PM, Gus Heck  wrote:
> 
> And as I understand it, current behavior is the silent misinterpretation. To 
> me, the failure to require a space after the regex (and either not become a 
> regex in that case or complain about invalid regex) might be considered a 
> bug...
> 



Re: QueryParser - proposed change may break existing queries.

2020-09-17 Thread Gus Heck
And as I understand it, current behavior is the silent misinterpretation.
To me, the failure to require a space after the regex (and either not
become a regex in that case or complain about invalid regex) might be
considered a bug...

On Thu, Sep 17, 2020 at 9:30 AM Mark Harwood  wrote:

> I think the decision comes down to choosing between silent
> (mis)interpretations of ambiguous queries or noisy failures.
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: QueryParser - proposed change may break existing queries.

2020-09-17 Thread Mark Harwood
I think the decision comes down to choosing between silent
(mis)interpretations of ambiguous queries or noisy failures.

On Thu, Sep 17, 2020 at 1:55 PM Uwe Schindler  wrote:

> Hi,
>
>
>
> My idea would have been not to be too strict and instead only detect it
> as a regex if it's separated. So /foo/bar and /foo/iphone would both go
> through, ignoring the regex; only ‘/foo/ bar’ or ‘/foo/i phone’ would
> interpret the first token as regex.
>
>
>
> That’s just my idea, not sure if it makes sense to have this relaxed
> parsing. I was always very skeptical of adding the regexes, as it breaks
> many queries. Now it’s even more.
>
>
>
> Uwe
>
>
>
> -
>
> Uwe Schindler
>
> Achterdiek 19, D-28357 Bremen
>
> https://www.thetaphi.de
>
> eMail: u...@thetaphi.de
>
>
>


RE: QueryParser - proposed change may break existing queries.

2020-09-17 Thread Uwe Schindler
Hi,

 

My idea would have been not to be too strict and instead only detect it as a
regex if it's separated. So /foo/bar and /foo/iphone would both go through,
ignoring the regex; only ‘/foo/ bar’ or ‘/foo/i phone’ would interpret the
first token as regex.
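
A minimal sketch of this relaxed rule, using plain java.util.regex rather than the real query parser grammar. LenientRegexDetector and startsWithRegex are hypothetical names, and treating only whitespace or end of input as the separator is an assumption taken from the examples above.

```java
import java.util.regex.Pattern;

final class LenientRegexDetector {
    // /body/ with an optional trailing i, followed by whitespace or end of input.
    // The body may contain escaped characters such as \/ but no bare slash.
    private static final Pattern SEPARATED_REGEX =
        Pattern.compile("/(?:[^/\\\\]|\\\\.)+/i?(?=\\s|$)");

    // True if the query begins with a regex token that is properly separated;
    // otherwise the caller would fall back to treating the text as ordinary terms.
    static boolean startsWithRegex(String query) {
        return SEPARATED_REGEX.matcher(query).lookingAt();
    }

    public static void main(String[] args) {
        String[] queries = {"/foo/bar", "/foo/iphone", "/foo/ bar", "/foo/i phone"};
        for (String q : queries) {
            System.out.println(q + " -> " + (startsWithRegex(q) ? "regex" : "not a regex"));
        }
    }
}
```

Under this rule nothing errors: an unseparated /foo/bar is simply never considered a regex.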

 

That’s just my idea, not sure if it makes sense to have this relaxed parsing. I 
was always very skeptical of adding the regexes, as it breaks many queries. Now 
it’s even more.

 

Uwe

 

-

Uwe Schindler

Achterdiek 19, D-28357 Bremen

https://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Mark Harwood  
Sent: Wednesday, September 16, 2020 6:45 PM
To: dev@lucene.apache.org
Subject: Re: QueryParser - proposed change may break existing queries.

 

The strictness I was thinking of adding was to make all of the following error:

 /foo/bar

 /foo//bar/

 /foo/iphone 

 /foo/AND x

 

These would be allowed:

 /foo/i bar

 (/foo/ OR /bar/)

 (/foo/ OR /bar/i)

 /foo/^2

 /foo/i^2

 

 








Re: QueryParser - proposed change may break existing queries.

2020-09-16 Thread Mark Harwood
The strictness I was thinking of adding was to make all of the following error:
 /foo/bar
 /foo//bar/
 /foo/iphone 
 /foo/AND x

These would be allowed:
 /foo/i bar
 (/foo/ OR /bar/)
 (/foo/ OR /bar/i)
 /foo/^2
 /foo/i^2
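
A minimal sketch of that strictness as a standalone check, not the actual JavaCC grammar change. StrictRegexCheck and firstError are hypothetical names. The sketch assumes every unescaped /.../ pair is a regex and that whitespace, ")" and "^" are the only characters allowed after the closing slash or the "i" option.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StrictRegexCheck {
    // A regex literal: /body/ where the body has no bare slash (escapes such as \/ are fine).
    private static final Pattern REGEX_LITERAL = Pattern.compile("/(?:[^/\\\\]|\\\\.)+/");

    // Returns a description of the first offending regex, or empty if the query passes.
    static Optional<String> firstError(String query) {
        Matcher m = REGEX_LITERAL.matcher(query);
        int from = 0;
        while (m.find(from)) {
            int next = m.end();
            if (next < query.length() && query.charAt(next) == 'i') {
                next++; // the proposed case-insensitive option
            }
            if (next < query.length()) {
                char c = query.charAt(next);
                boolean separator = Character.isWhitespace(c) || c == ')' || c == '^';
                if (!separator) {
                    return Optional.of("Unexpected '" + c + "' at offset " + next
                        + " after regex " + m.group());
                }
            }
            from = next; // keep scanning for further regex literals
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String[] queries = {"/foo/bar", "/foo//bar/", "/foo/iphone", "/foo/AND x",
                            "/foo/i bar", "(/foo/ OR /bar/i)", "/foo/i^2"};
        for (String q : queries) {
            System.out.println(q + " -> " + firstError(q).orElse("ok"));
        }
    }
}
```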

 

> On 16 Sep 2020, at 12:00, Uwe Schindler  wrote:
> 
> 
> In my opinion, the proposed syntax change should require whitespace
> or any other separator char after the regex “i” parameter.
>  
> Uwe
>  
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>  


RE: QueryParser - proposed change may break existing queries.

2020-09-16 Thread Uwe Schindler
In my opinion, the proposed syntax change should require whitespace or
any other separator char after the regex “i” parameter.

 

Uwe

 

-

Uwe Schindler

Achterdiek 19, D-28357 Bremen

https://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Mark Harwood  
Sent: Wednesday, September 16, 2020 11:04 AM
To: dev@lucene.apache.org
Subject: QueryParser - proposed change may break existing queries.

 

In Lucene-9445 we'd like to add a case insensitive option to regex queries in 
the query parser of the form: 

   /Foo/i

 

However, today people can search for :

 

   /foo.com/index.html

 

and not get an error. The searcher may think this is a query for a URL but it's
actually parsed as a regex "foo.com" ORed with a term query.

 

I'd like to draw attention to this proposed change in behaviour because I think 
it could affect many existing systems. Arguably it may be a positive in drawing 
attention to a number of existing silent failures (unescaped searches for urls 
or file paths) but equally could be seen as a negative breaking change by some.

 

What is our BWC policy for changes to query parser?

Do the benefits of the proposed new regex feature outweigh the costs of the 
breakages in your view?

 

https://issues.apache.org/jira/browse/LUCENE-9445?focusedCommentId=17196793&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17196793

 

 



QueryParser - proposed change may break existing queries.

2020-09-16 Thread Mark Harwood
In Lucene-9445 we'd like to add a case insensitive option to regex queries
in the query parser of the form:
   /Foo/i

However, today people can search for:

   /foo.com/index.html

and not get an error. The searcher may think this is a query for a URL but
it's actually parsed as a regex "foo.com" ORed with a term query.

I'd like to draw attention to this proposed change in behaviour because I
think it could affect many existing systems. Arguably it may be a positive
in drawing attention to a number of existing silent failures (unescaped
searches for urls or file paths) but equally could be seen as a negative
breaking change by some.

What is our BWC policy for changes to query parser?
Do the benefits of the proposed new regex feature outweigh the costs of the
breakages in your view?

https://issues.apache.org/jira/browse/LUCENE-9445?focusedCommentId=17196793&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17196793