Chris M. Hostetter created LUCENE-9315:
------------------------------------------

             Summary: redefine (Classic & Standard) QueryParser semantics to be 
consistent: prioritize prefix op > infix op > default op
                 Key: LUCENE-9315
                 URL: https://issues.apache.org/jira/browse/LUCENE-9315
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/queryparser
            Reporter: Chris M. Hostetter


For as long as I can remember, the way QueryParser deals with the "infix" 
operators {{AND}} & {{OR}} hasn't made much sense unless they are used 
consistently to express pure boolean logic (ie: always explicitly specified, 
and never more than 2 clauses to a query). As soon as you have query strings 
where a BooleanQuery has more than 2 clauses, or you have query strings that 
mix {{AND}} & {{OR}} with the "prefix" {{+}} & {{-|NOT}} operators, or query 
strings where not every clause has an operator, or (absolutely the most 
confusing) you mix the types of operators _and_ change the QueryParser "default 
op" from {{OR}} to {{AND}}, the behavior just becomes impossible to make sense 
of for new users - and hard to explain/justify. (It's not precedence based, 
it's not left to right, it's just ... weird.)

The problem is so confusing to new users that I wrote a blog post almost 10 
years ago (?!?) trying to convince people that using {{AND}} & {{OR}} was a 
terrible idea unless they were used only in strict boolean expressions...

[https://lucidworks.com/post/why-not-and-or-and-not/]

...and yet it still regularly comes up as a point of confusion.

A lot of this weird behavior seems to be a historical artifact of how 
{{QueryParserBase.addClauses()}} works - a method whose basic semantics haven't 
really changed since Lucene 1.0.1, back before the introduction of 
{{QueryParser.setDefaultOperator()}}. Some of those early choices seemed to be 
predicated on the idea that {{AND}} should take "precedence" (I use that term 
loosely) over {{OR}} as it parses clauses left to right, purely because {{OR}} 
was the "default" assumption (and had - and still has - no corresponding 
"prefix" operator). As functionality in QueryParser has grown, a lot of the 
assumptions made in the code and the resulting parse behavior really make no 
sense to users, particularly in "non trivial" query strings. In many cases, 
parse behavior that can seem "intentional" to new users, even for input where 
every clause is impacted by an explicit {{AND}} or {{OR}} operator, can 
suddenly be flipped on its head when the "default operator" is changed (ex: 
"{{X AND Y OR Z}}"), or if only the order of "clauses" in the string 
changes (ex: previous example vs "{{Z OR Y AND X}}") even though it's clear 
from other queries that there is no strict precedence of operators.
----
The "root" of the problem, as I see it, is that 
{{QueryParserBase.addClauses()}} allows {{AND}} & {{OR}} to modify the 
{{Occur}} property of the previously parsed {{BooleanClause}} depending on _if_ 
that {{BooleanClause.getOccur()}} value matches the "default operator" for the 
parser, w/o any consideration of _why_ that {{getOccur()}} value matches 
the "default operator" - ie: did it actually come from the "default" or was it 
explicitly set by something in the query string? (ie: a prior infix operator)
----
I propose that starting with Lucene 9.0, we redefine the semantics in 
{{QueryParserBase}} such that:
 * "Prefix" operators ({{+}} | {{-}} | {{NOT}}) always take precedence (over 
any "Infix" operator or QueryParser default) in setting the {{Occur}} value of 
the clause they prefix.
 * "Infix" operators ({{AND}} | {{OR}}) are evaluated left to right and used to 
set the {{Occur}} value of the clauses adjacent to them (that do not already 
have an {{Occur}} value set by a "Prefix" operator)
 * the {{QueryParser.getDefaultOperator()}} is only used to set the {{Occur}} 
value of any clause that did not get an {{Occur}} value assigned by either a 
prefix or (prior) infix operator.
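
To make the three rules above concrete, here is a rough, self-contained sketch of how the {{Occur}} values could be resolved. This is a toy model, not Lucene code and not a patch: {{OccurSketch}}, {{Clause}}, {{Infix}} and {{resolve()}} are hypothetical names, and the local {{Occur}} enum only mirrors the names used by {{BooleanClause.Occur}}. It also makes one assumption the rules do not spell out: when a clause sits between two infix operators, the later (right-hand) operator overwrites the value set by the earlier one.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the proposed Occur resolution; hypothetical, not Lucene code. */
final class OccurSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }
    enum Infix { AND, OR }

    /** One parsed clause (hypothetical type for this sketch). */
    static final class Clause {
        final String term;
        final Occur prefix;      // from +, - or NOT; null if no prefix operator
        final Infix infixBefore; // infix operator written before this clause; null if none

        Clause(String term, Occur prefix, Infix infixBefore) {
            this.term = term;
            this.prefix = prefix;
            this.infixBefore = infixBefore;
        }
    }

    static Occur toOccur(Infix op) {
        return op == Infix.AND ? Occur.MUST : Occur.SHOULD;
    }

    static List<Occur> resolve(List<Clause> clauses, Infix defaultOp) {
        int n = clauses.size();
        Occur[] out = new Occur[n];
        boolean[] fromPrefix = new boolean[n];

        // Rule 1: a prefix operator always decides the clause it prefixes.
        for (int i = 0; i < n; i++) {
            if (clauses.get(i).prefix != null) {
                out[i] = clauses.get(i).prefix;
                fromPrefix[i] = true;
            }
        }
        // Rule 2: infix operators, left to right, set both adjacent clauses
        // unless a prefix operator already decided them.
        // ASSUMPTION: a later infix operator overwrites an earlier one.
        for (int i = 1; i < n; i++) {
            Infix op = clauses.get(i).infixBefore;
            if (op == null) continue;
            if (!fromPrefix[i - 1]) out[i - 1] = toOccur(op);
            if (!fromPrefix[i]) out[i] = toOccur(op);
        }
        // Rule 3: the default operator only fills in what is still unset.
        for (int i = 0; i < n; i++) {
            if (out[i] == null) out[i] = toOccur(defaultOp);
        }
        return Arrays.asList(out);
    }

    public static void main(String[] args) {
        // Models the query string "X AND Y OR Z"
        List<Clause> q = Arrays.asList(
            new Clause("X", null, null),
            new Clause("Y", null, Infix.AND),
            new Clause("Z", null, Infix.OR));
        System.out.println(resolve(q, Infix.OR));
        System.out.println(resolve(q, Infix.AND));
    }
}
```

Under those assumptions both calls in {{main()}} print the same list, since every clause in "{{X AND Y OR Z}}" is touched by an explicit infix operator and the default operator never comes into play. If the intent is instead that the first infix operator to reach a clause wins, only the two assignments in the "Rule 2" loop would need a guard.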


