Re: QueryParser - proposed change may break existing queries.
> You could avoid (some of?) these problems by supporting /(?i)foo/ instead of /foo/i

That would avoid our parsing dilemma but brings some other concerns. This inline syntax can normally be used to selectively turn on case insensitivity for sections of a regex and then turn it off again with (?-i).

We could potentially implement this support in the underlying o.a.l.util.automaton.RegExp class. We changed that class recently to take a separate global flag alongside the regex string which can determine case sensitivity. I guess any inline (?i) syntax would override whatever default option had been passed in the constructor flag. That might be a hairy change though - the RegExp parser logic is hand-crafted rather than JavaCC.

On Fri, Sep 18, 2020 at 7:47 AM Dawid Weiss wrote:
> > If they try to use any other options than 'i' we throw a ParseException
>
> +1. Complex-syntax parsers should throw (human-palatable) exceptions
> on syntax errors. A lenient, "naive user" query parser should be
> separate and accept a very, very rudimentary query syntax (so that
> there are literally no chances of making a syntax error).
>
> D.
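For reference, here is a minimal sketch of how the inline flags behave in java.util.regex.Pattern, which is where most users will have met this syntax. It is only an illustration of the semantics being discussed: it does not use o.a.l.util.automaton.RegExp, and whether RegExp would follow the same override rule is an open design question. The class and method names are made up for the example.

```java
import java.util.regex.Pattern;

class InlineFlagDemo {
    // Case sensitivity chosen by a global flag passed next to the regex string.
    static boolean matchesWithGlobalFlag(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // Case sensitivity chosen only by inline flags inside the regex string.
    static boolean matchesInline(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // (?i) switches case-insensitive matching on from that point onwards.
        System.out.println(matchesInline("(?i)foo", "FOO"));
        // (?-i) switches it off again, so only the "foo" part is matched loosely.
        System.out.println(matchesInline("(?i)foo(?-i)bar", "FOObar"));
        System.out.println(matchesInline("(?i)foo(?-i)bar", "fooBAR"));
        // An inline (?-i) also takes precedence over a case-insensitive global flag.
        System.out.println(matchesWithGlobalFlag("foo(?-i)bar", "fooBAR"));
    }
}
```

In java.util.regex the inline flag wins over the flag passed to compile(), which matches the guess above about how a constructor default and an inline (?i) would interact.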
Re: QueryParser - proposed change may break existing queries.
> If they try to use any other options than 'i' we throw a ParseException

+1. Complex-syntax parsers should throw (human-palatable) exceptions on syntax errors. A lenient, "naive user" query parser should be separate and accept a very, very rudimentary query syntax (so that there are literally no chances of making a syntax error).

D.
Re: QueryParser - proposed change may break existing queries.
: And as I understand it, current behavior is the silent misinterpretation.
: To me, the failure to require a space after the regex (and either not
: become a regex in that case or complain about invalid regex) might be
: considered a bug...

I would agree ...

: >> However, today people can search for:
: >>   /foo.com/index.html
: >> and not get an error. The searcher may think this is a query for a URL
: >> but it's actually parsed as a regex "foo.com" ORed with a term query.

... I didn't realize that was happening. To me that seems like it should definitely be considered a bug, and the "regex" branch of the grammar shouldn't be used if there are any unexpected characters after the closing "/" ... the current behavior Mark is describing seems analogous to the grammar assuming "WESS ANDERSON" should be parsed as "WESS +DERSON"

: > You could avoid (some of?) these problems by supporting /(?i)foo/
: > instead of /foo/i
:
: I like this idea. The only downside is that folks will tend to think
: it's a full Java Pattern and try other options. :)

If they try to use any other options than 'i' we throw a ParseException

-Hoss
http://www.lucidworks.com/
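A rough sketch of the "only 'i' is accepted, anything else is an error" rule, assuming the inline /(?i)foo/ form. Everything here is hypothetical: the class name, the Parsed holder, and the use of java.text.ParseException as a stand-in for the query parser's own ParseException. It is not how the JavaCC grammar or RegExp is actually written.

```java
import java.text.ParseException;

class InlineFlagValidator {
    /** Regex body split into its (optional) leading inline flag and the rest. */
    static final class Parsed {
        final boolean caseInsensitive;
        final String pattern;

        Parsed(boolean caseInsensitive, String pattern) {
            this.caseInsensitive = caseInsensitive;
            this.pattern = pattern;
        }
    }

    /**
     * Accepts "foo" and "(?i)foo". Any other body starting with "(?" is
     * rejected with a readable message instead of being silently reinterpreted.
     */
    static Parsed parse(String regexBody) throws ParseException {
        if (!regexBody.startsWith("(?")) {
            return new Parsed(false, regexBody);
        }
        int close = regexBody.indexOf(')');
        if (close < 0) {
            throw new ParseException("Unterminated option group in /" + regexBody + "/", 0);
        }
        String flags = regexBody.substring(2, close);
        if (!flags.equals("i")) {
            throw new ParseException(
                "Unsupported regex option '" + flags + "': only (?i) is accepted", 2);
        }
        return new Parsed(true, regexBody.substring(close + 1));
    }
}
```

With this shape a user who tries /(?s)foo/ gets an error naming the offending option, rather than a query that quietly means something else.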
Re: QueryParser - proposed change may break existing queries.
I like this idea. The only downside is that folks will tend to think it's a full Java Pattern and try other options. :)

On Thu, Sep 17, 2020 at 9:09 PM Steve Rowe wrote:
> You could avoid (some of?) these problems by supporting /(?i)foo/ instead of /foo/i
>
> --
> Steve
>
> On Sep 17, 2020, at 1:55 PM, Gus Heck wrote:
>> And as I understand it, current behavior is the silent misinterpretation.
>> To me, the failure to require a space after the regex (and either not
>> become a regex in that case or complain about invalid regex) might be
>> considered a bug...
>>
>> On Thu, Sep 17, 2020 at 9:30 AM Mark Harwood wrote:
>>> I think the decision comes down to choosing between silent
>>> (mis)interpretations of ambiguous queries or noisy failures.
>>>
>>> On Thu, Sep 17, 2020 at 1:55 PM Uwe Schindler wrote:
>>>> Hi,
>>>>
>>>> My idea would have been not to be too strict and instead only detect it
>>>> as a regex if it's separated. So /foo/bar and /foo/iphone would both go
>>>> through, ignoring the regex; only '/foo/ bar' or '/foo/i phone' would
>>>> interpret the first token as regex.
>>>>
>>>> That's just my idea, not sure if it makes sense to have this relaxed
>>>> parsing. I was always very skeptical of adding the regexes, as it breaks
>>>> many queries. Now it breaks even more.
>>>>
>>>> Uwe
>>>>
>>>> -----
>>>> Uwe Schindler
>>>> Achterdiek 19, D-28357 Bremen
>>>> https://www.thetaphi.de
>>>> eMail: u...@thetaphi.de
>>>>
>>>> From: Mark Harwood
>>>> Sent: Wednesday, September 16, 2020 6:45 PM
>>>> To: dev@lucene.apache.org
>>>> Subject: Re: QueryParser - proposed change may break existing queries.
>>>>
>>>> The strictness I was thinking of adding was to make all of the following
>>>> error:
>>>>
>>>>   /foo/bar
>>>>   /foo//bar/
>>>>   /foo/iphone
>>>>   /foo/AND x
>>>>
>>>> These would be allowed:
>>>>
>>>>   /foo/i bar
>>>>   (/foo/ OR /bar/)
>>>>   (/foo/ OR /bar/i)
>>>>   /foo/^2
>>>>   /foo/i^2
>>>>
>>>> On 16 Sep 2020, at 12:00, Uwe Schindler wrote:
>>>>
>>>> In my opinion, the proposed syntax change should require whitespace
>>>> or any other separator char after the regex "i" parameter.
>>>>
>>>> Uwe
>>>>
>>>> -----
>>>> Uwe Schindler
>>>> Achterdiek 19, D-28357 Bremen
>>>> https://www.thetaphi.de
>>>> eMail: u...@thetaphi.de
>>>>
>>>> From: Mark Harwood
>>>> Sent: Wednesday, September 16, 2020 11:04 AM
>>>> To: dev@lucene.apache.org
>>>> Subject: QueryParser - proposed change may break existing queries.
>>>>
>>>> In LUCENE-9445 we'd like to add a case insensitive option to regex queries
>>>> in the query parser of the form:
>>>>
>>>>   /Foo/i
>>>>
>>>> However, today people can search for:
>>>>
>>>>   /foo.com/index.html
>>>>
>>>> and not get an error. The searcher may think this is a query for a URL but
>>>> it's actually parsed as a regex "foo.com" ORed with a term query.
>>>>
>>>> I'd like to draw attention to this proposed change in behaviour because I
>>>> think it could affect many existing systems. Arguably it may be a positive
>>>> in drawing attention to a number of existing silent failures (unescaped
>>>> searches for urls or file paths) but equally could be seen as a negative
>>>> breaking change by some.
>>>>
>>>> What is our BWC policy for changes to query parser?
>>>>
>>>> Do the benefits of the proposed new regex feature outweigh the costs of the
>>>> breakages in your view?
>>>>
>>>> https://issues.apache.org/jira/browse/LUCENE-9445?focusedCommentId=17196793=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17196793
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
Re: QueryParser - proposed change may break existing queries.
That's a much better idea, I like it. It's basically what Java's regex parser in the Pattern class also does.

If we do this we won't even need a syntax change.

Uwe

On September 17, 2020 7:09:18 PM UTC, Steve Rowe wrote:
> You could avoid (some of?) these problems by supporting /(?i)foo/
> instead of /foo/i
>
> --
> Steve

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
Re: QueryParser - proposed change may break existing queries.
You could avoid (some of?) these problems by supporting /(?i)foo/ instead of /foo/i

--
Steve

> On Sep 17, 2020, at 1:55 PM, Gus Heck wrote:
>
> And as I understand it, current behavior is the silent misinterpretation.
> To me, the failure to require a space after the regex (and either not
> become a regex in that case or complain about invalid regex) might be
> considered a bug...
Re: QueryParser - proposed change may break existing queries.
And as I understand it, current behavior is the silent misinterpretation. To me, the failure to require a space after the regex (and either not become a regex in that case or complain about invalid regex) might be considered a bug...

On Thu, Sep 17, 2020 at 9:30 AM Mark Harwood wrote:
> I think the decision comes down to choosing between silent
> (mis)interpretations of ambiguous queries or noisy failures.

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
Re: QueryParser - proposed change may break existing queries.
I think the decision comes down to choosing between silent (mis)interpretations of ambiguous queries or noisy failures.

On Thu, Sep 17, 2020 at 1:55 PM Uwe Schindler wrote:
> Hi,
>
> My idea would have been not to be too strict and instead only detect it
> as a regex if it's separated. So /foo/bar and /foo/iphone would both go
> through, ignoring the regex; only '/foo/ bar' or '/foo/i phone' would
> interpret the first token as regex.
>
> That's just my idea, not sure if it makes sense to have this relaxed
> parsing. I was always very skeptical of adding the regexes, as it breaks
> many queries. Now it breaks even more.
>
> Uwe
RE: QueryParser - proposed change may break existing queries.
Hi,

My idea would have been not to be too strict and instead only detect it as a regex if it's separated. So /foo/bar and /foo/iphone would both go through, ignoring the regex; only '/foo/ bar' or '/foo/i phone' would interpret the first token as regex.

That's just my idea, not sure if it makes sense to have this relaxed parsing. I was always very skeptical of adding the regexes, as it breaks many queries. Now it breaks even more.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

From: Mark Harwood
Sent: Wednesday, September 16, 2020 6:45 PM
To: dev@lucene.apache.org
Subject: Re: QueryParser - proposed change may break existing queries.

> The strictness I was thinking of adding was to make all of the following
> error:
>
>   /foo/bar
>   /foo//bar/
>   /foo/iphone
>   /foo/AND x
>
> These would be allowed:
>
>   /foo/i bar
>   (/foo/ OR /bar/)
>   (/foo/ OR /bar/i)
>   /foo/^2
>   /foo/i^2
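To make the relaxed variant concrete, here is a small sketch of the "only a regex if it is separated" rule applied to single whitespace-delimited tokens. It is an assumption-laden illustration, not the real grammar: the class name and the Kind enum are invented, and boosts and parentheses are deliberately ignored.

```java
import java.util.regex.Pattern;

class LenientRegexDetector {
    enum Kind { REGEX, TERM }

    // A whole token of the form /body/ or /body/i, where body has no
    // unescaped '/' in it. Anything else falls through as a plain term.
    private static final Pattern REGEX_TOKEN =
        Pattern.compile("/(?:[^/\\\\]|\\\\.)+/i?");

    static Kind classify(String token) {
        return REGEX_TOKEN.matcher(token).matches() ? Kind.REGEX : Kind.TERM;
    }

    public static void main(String[] args) {
        // Split on whitespace first, then classify each token on its own.
        for (String token : "/foo/i phone /foo/iphone /foo.com/index.html".split("\\s+")) {
            System.out.println(token + " -> " + classify(token));
        }
    }
}
```

Under this rule nothing ever errors: /foo/iphone and /foo.com/index.html simply stay terms, which is the trade-off against noisy failures.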
Re: QueryParser - proposed change may break existing queries.
The strictness I was thinking of adding was to make all of the following error:

  /foo/bar
  /foo//bar/
  /foo/iphone
  /foo/AND x

These would be allowed:

  /foo/i bar
  (/foo/ OR /bar/)
  (/foo/ OR /bar/i)
  /foo/^2
  /foo/i^2

> On 16 Sep 2020, at 12:00, Uwe Schindler wrote:
>
> In my opinion, the proposed syntax change should require whitespace
> or any other separator char after the regex "i" parameter.
>
> Uwe
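The two lists above can be reproduced by a simple rule: after the closing "/" and an optional "i", the next character must be end of input, whitespace, ")" or "^". Below is a hand-rolled sketch of that rule, assuming a regex may only start at the beginning of the query, after whitespace, or after "(". The class and method names are hypothetical and this is not the JavaCC grammar.

```java
import java.util.Optional;

class StrictRegexCheck {
    /** Returns an error message for the first badly delimited regex, if any. */
    static Optional<String> firstError(String query) {
        int i = 0;
        while (i < query.length()) {
            boolean atTokenStart = i == 0
                || Character.isWhitespace(query.charAt(i - 1))
                || query.charAt(i - 1) == '(';
            if (query.charAt(i) == '/' && atTokenStart) {
                int close = findClosingSlash(query, i + 1);
                if (close < 0) {
                    return Optional.of("Unterminated regex starting at position " + i);
                }
                int next = close + 1;
                if (next < query.length() && query.charAt(next) == 'i') {
                    next++; // optional case-insensitivity flag
                }
                if (next < query.length() && !isSeparator(query.charAt(next))) {
                    return Optional.of("Unexpected '" + query.charAt(next)
                        + "' after regex at position " + next);
                }
                i = next;
            } else {
                i++;
            }
        }
        return Optional.empty();
    }

    // Index of the next unescaped '/', or -1 if there is none.
    private static int findClosingSlash(String query, int from) {
        for (int j = from; j < query.length(); j++) {
            char c = query.charAt(j);
            if (c == '\\') {
                j++; // skip the escaped character
            } else if (c == '/') {
                return j;
            }
        }
        return -1;
    }

    private static boolean isSeparator(char c) {
        return Character.isWhitespace(c) || c == ')' || c == '^';
    }
}
```

Note that /foo.com/index.html also fails this check (the "i" of "index" is consumed as a flag and "n" is then unexpected), which is exactly the breaking change under discussion.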
RE: QueryParser - proposed change may break existing queries.
In my opinion, the proposed syntax change should require whitespace or any other separator char after the regex "i" parameter.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

From: Mark Harwood
Sent: Wednesday, September 16, 2020 11:04 AM
To: dev@lucene.apache.org
Subject: QueryParser - proposed change may break existing queries.

> In LUCENE-9445 we'd like to add a case insensitive option to regex queries
> in the query parser of the form:
>
>   /Foo/i
>
> However, today people can search for:
>
>   /foo.com/index.html
>
> and not get an error. The searcher may think this is a query for a URL but
> it's actually parsed as a regex "foo.com" ORed with a term query.
QueryParser - proposed change may break existing queries.
In LUCENE-9445 we'd like to add a case insensitive option to regex queries in the query parser of the form:

  /Foo/i

However, today people can search for:

  /foo.com/index.html

and not get an error. The searcher may think this is a query for a URL but it's actually parsed as a regex "foo.com" ORed with a term query.

I'd like to draw attention to this proposed change in behaviour because I think it could affect many existing systems. Arguably it may be a positive in drawing attention to a number of existing silent failures (unescaped searches for urls or file paths) but equally could be seen as a negative breaking change by some.

What is our BWC policy for changes to query parser?

Do the benefits of the proposed new regex feature outweigh the costs of the breakages in your view?

https://issues.apache.org/jira/browse/LUCENE-9445?focusedCommentId=17196793=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17196793
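For the "unescaped searches for urls or file paths" case, the client-side fix is to backslash-escape the slashes before the text reaches the parser. A minimal standalone sketch follows; the class and method names are made up, and it only handles "/" and "\", whereas a real application would more likely use the escaping helper that ships with the classic query parser, which covers the other special characters too.

```java
class SlashEscaper {
    // Prefix every '/' and '\' with a backslash so the query parser sees
    // them as literal characters rather than regex delimiters or escapes.
    static String escapeSlashes(String userInput) {
        StringBuilder sb = new StringBuilder(userInput.length() + 4);
        for (int i = 0; i < userInput.length(); i++) {
            char c = userInput.charAt(i);
            if (c == '/' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeSlashes("/foo.com/index.html"));
    }
}
```

Systems that already escape user input this way should be unaffected by either the strict or the relaxed proposal, since no unescaped "/" ever reaches the regex branch.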