[ 
https://issues.apache.org/jira/browse/LUCENE-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17081105#comment-17081105
 ] 

David Smiley commented on LUCENE-9315:
--------------------------------------

+1 and you rock!

> redefine (Classic & Standard) QueryParser semantics to be consistent: 
> prioritize prefix op > infix op > default op
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9315
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9315
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/queryparser
>            Reporter: Chris M. Hostetter
>            Priority: Major
>         Attachments: LUCENE-9315.patch
>
>
> For as long as I can remember, the way QueryParser deals with the "infix" 
> operators {{AND}} & {{OR}} hasn't made much sense unless they are used 
> consistently to express pure boolean logic (ie: always explicitly specified, 
> and never more than 2 clauses to a query). As soon as you have query strings 
> where a BooleanQuery has more than 2 clauses, or you have query strings that 
> mix {{AND}} & {{OR}} with the "prefix" {{+}} & {{-|NOT}} operators, or query 
> strings where not every clause has an operator, or (absolutely the most 
> confusing) you mix the types of operators _and_ change the QueryParser 
> "default op" from {{OR}} to {{AND}}, the behavior just becomes impossible to 
> make sense of for new users - and hard to explain/justify. (It's not 
> precedence based, it's not left to right, it's just ... weird.)
> The problem is so confusing to new users that I wrote a blog post almost 10 
> years ago (?!?) trying to convince people that using {{AND}} & {{OR}} was a 
> terrible idea unless they were used only in strict boolean expressions...
> [https://lucidworks.com/post/why-not-and-or-and-not/]
> ...and yet it still regularly comes up as a point of confusion.
> A lot of this weird behavior seems to be a historical artifact of how 
> {{QueryParserBase.addClauses()}} works - a method whose basic semantics 
> haven't really changed since Lucene 1.0.1, back before the introduction of 
> {{QueryParser.setDefaultOperator()}}. Some of those early choices seemed to 
> be predicated on the idea that {{AND}} should take "precedence" (I use that 
> term loosely) over {{OR}} as it parses clauses left to right, purely because 
> {{OR}} was the "default" assumption (and had - and still has - no 
> corresponding "prefix" operator). As functionality in QueryParser has grown, 
> a lot of the assumptions made in the code and the resulting parse behavior 
> really make no sense to users, particularly in "non trivial" query strings. 
> In many cases, parse behavior that can seem "intentional" to new users, even 
> for input where every clause is impacted by an explicit {{AND}} or {{OR}} 
> operator, can suddenly be flipped on its head when the "default operator" 
> is changed (ex: "{{X AND Y OR Z}}"), or if only the order of "clauses" in 
> the string changes (ex: previous example vs "{{Z OR Y AND X}}") even though 
> it's clear from other queries that there is no strict precedence of operators.
> ----
> The "root" of the problem, as I see it, is that 
> {{QueryParserBase.addClauses()}} allows {{AND}} & {{OR}} to modify the 
> {{Occur}} property of the previously parsed {{BooleanClause}} depending on 
> _if_ that {{BooleanClause.getOccur()}} value matches the "default operator" 
> for the parser, w/o any consideration of _why_ that {{getOccur()}} 
> value matches the "default operator" - ie: did it actually come from the 
> "default" or was it explicitly set by something in the query string? (ie: a 
> prior infix operator)
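A minimal Java sketch of the legacy clause-by-clause behavior described above, offered as an approximation only: it is a simplified model reconstructed from older {{QueryParserBase}} sources as best understood, not Lucene code, and it may not match any particular release exactly. {{LegacyClauseModel}}, its {{Occur}} enum, {{parse(...)}} and {{render(...)}} are hypothetical names. Input is assumed to be whitespace-separated terms with optional {{+}}/{{-}}/{{NOT}} prefixes and {{AND}}/{{OR}} between them (no parentheses, phrases, or fields). The key point it illustrates: an infix operator rewrites the previously parsed clause, sparing only a {{MUST_NOT}} clause, so it cannot tell whether a {{MUST}} came from {{+}}, from {{AND}}, or from the default operator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the legacy left-to-right clause handling.
// LegacyClauseModel, Occur and parse(...) are illustrative names, NOT Lucene API.
final class LegacyClauseModel {
    enum Occur { MUST, SHOULD, MUST_NOT }

    /** Assumes whitespace-separated terms, optional +/-/NOT prefixes, AND/OR between terms. */
    static String parse(String query, boolean defaultAnd) {
        List<String> terms = new ArrayList<>();
        List<Occur> occurs = new ArrayList<>();
        String conj = null;         // infix operator ("AND"/"OR") seen before the current term
        boolean pendingNot = false; // a standalone NOT token acts like the '-' prefix

        for (String tok : query.trim().split("\\s+")) {
            if (tok.equals("AND") || tok.equals("OR")) { conj = tok; continue; }
            if (tok.equals("NOT")) { pendingNot = true; continue; }
            boolean prohibited = pendingNot || tok.startsWith("-");
            boolean plus = tok.startsWith("+");
            String term = (plus || tok.startsWith("-")) ? tok.substring(1) : tok;

            // The infix operator rewrites the *previous* clause, sparing only MUST_NOT.
            // It cannot tell whether a MUST came from '+', from AND, or from the default.
            int last = occurs.size() - 1;
            if (last >= 0 && occurs.get(last) != Occur.MUST_NOT) {
                if ("AND".equals(conj)) {
                    occurs.set(last, Occur.MUST);
                } else if (defaultAnd && "OR".equals(conj)) {
                    occurs.set(last, Occur.SHOULD);
                }
            }

            boolean required = defaultAnd
                ? !prohibited && !"OR".equals(conj)            // default AND: required unless introduced by OR (even with '+')
                : plus || ("AND".equals(conj) && !prohibited); // default OR: '+' or AND makes it required
            terms.add(term);
            occurs.add(prohibited ? Occur.MUST_NOT : required ? Occur.MUST : Occur.SHOULD);
            conj = null;
            pendingNot = false;
        }
        return render(terms, occurs);
    }

    // Renders clauses in the familiar "+a -b c" form.
    static String render(List<String> terms, List<Occur> occurs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (sb.length() > 0) sb.append(' ');
            Occur o = occurs.get(i);
            sb.append(o == Occur.MUST ? "+" : o == Occur.MUST_NOT ? "-" : "").append(terms.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String q : new String[] {"X AND Y OR Z", "Z OR Y AND X", "+a OR b"}) {
            System.out.println(q + "  | default OR: " + parse(q, false) + "  | default AND: " + parse(q, true));
        }
    }
}
```

Under this model, "{{X AND Y OR Z}}" parses to "{{+X +Y Z}}" with a default of {{OR}} but to "{{+X Y Z}}" with a default of {{AND}}, while "{{Z OR Y AND X}}" with a default of {{AND}} gives "{{Z +Y +X}}". With a default of {{AND}}, "{{+a OR b}}" even loses its explicit {{+}} and becomes "{{a b}}".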
> ----
> I propose that starting with Lucene 9.0, we redefine the semantics in 
> {{QueryParserBase}} such that:
>  * "Prefix" operators ({{+}} | {{-}} | {{NOT}}) always take precedence (over 
> any "Infix" operator or QueryParser default) in setting the {{Occur}} value 
> of the clause they prefix.
>  * "Infix" operators ({{AND}} | {{OR}}) are evaluated left to right and used 
> to set the {{Occur}} value of the clauses adjacent to them (that do not 
> already have an {{Occur}} value set by a "Prefix" operator)
>  * the {{QueryParser.getDefaultOperator()}} is only used to set the {{Occur}} 
> value of any clause that did not get an {{Occur}} value assigned by either a 
> prefix or (prior) infix operator.
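The three rules above can be sketched as three passes over the parsed clauses: prefix operators first, then infix operators left to right, then the default operator for whatever is left. This is a minimal, hypothetical Java sketch, not the attached patch; {{ProposedClauseModel}}, its {{Occur}} enum and {{parse(...)}} are illustrative names, and input is assumed to be whitespace-separated terms with optional {{+}}/{{-}}/{{NOT}} prefixes and {{AND}}/{{OR}} between them (no parentheses, phrases, or fields). The proposal does not spell out what happens to a clause sitting between two different infix operators; this sketch assumes the later (rightmost) operator wins, which is only one reading of "evaluated left to right".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed precedence: prefix op > infix op > default op.
// ProposedClauseModel, Occur and parse(...) are illustrative names, NOT Lucene API.
final class ProposedClauseModel {
    enum Occur { MUST, SHOULD, MUST_NOT }

    /** Assumes whitespace-separated terms, optional +/-/NOT prefixes, AND/OR between terms. */
    static String parse(String query, boolean defaultAnd) {
        List<String> terms = new ArrayList<>();
        List<Occur> occurs = new ArrayList<>();      // null = no Occur assigned yet
        List<Boolean> fromPrefix = new ArrayList<>();
        List<String> infix = new ArrayList<>();      // infix.get(i) sits between terms i and i+1, or null
        String pendingInfix = null;
        boolean pendingNot = false;

        for (String tok : query.trim().split("\\s+")) {
            if (tok.equals("AND") || tok.equals("OR")) { pendingInfix = tok; continue; }
            if (tok.equals("NOT")) { pendingNot = true; continue; }
            Occur prefix = null;
            if (pendingNot || tok.startsWith("-")) prefix = Occur.MUST_NOT;
            else if (tok.startsWith("+")) prefix = Occur.MUST;
            String term = (tok.startsWith("+") || tok.startsWith("-")) ? tok.substring(1) : tok;
            if (!terms.isEmpty()) infix.add(pendingInfix); // operators before the first term are ignored
            terms.add(term);
            occurs.add(prefix);                // pass 1: prefix operators always win
            fromPrefix.add(prefix != null);
            pendingInfix = null;
            pendingNot = false;
        }

        // Pass 2: infix operators, left to right, set both neighbors unless prefix-set.
        // Assumption: a clause between two infix operators ends up with the later one.
        for (int i = 0; i < infix.size(); i++) {
            String op = infix.get(i);
            if (op == null) continue;
            Occur o = op.equals("AND") ? Occur.MUST : Occur.SHOULD;
            for (int j = i; j <= i + 1; j++) {
                if (!fromPrefix.get(j)) occurs.set(j, o);
            }
        }

        // Pass 3: the default operator only fills clauses that are still unassigned.
        Occur dflt = defaultAnd ? Occur.MUST : Occur.SHOULD;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            Occur o = occurs.get(i) == null ? dflt : occurs.get(i);
            if (sb.length() > 0) sb.append(' ');
            sb.append(o == Occur.MUST ? "+" : o == Occur.MUST_NOT ? "-" : "").append(terms.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String q : new String[] {"X AND Y OR Z", "+a OR b", "a OR b c"}) {
            System.out.println(q + "  | default OR: " + parse(q, false) + "  | default AND: " + parse(q, true));
        }
    }
}
```

In this sketch "{{X AND Y OR Z}}" yields "{{+X Y Z}}" whichever default operator is used, and "{{+a OR b}}" keeps its {{+}} ("{{+a b}}") even with a default of {{AND}}. The default only decides clauses with no adjacent operator, e.g. "{{a OR b c}}" becomes "{{a b +c}}" with a default of {{AND}}.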



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
