[ https://issues.apache.org/jira/browse/LUCENE-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Smiley updated LUCENE-9315:
---------------------------------
    Summary: redefine (Classic & Standard) QueryParser semantics to be consistent: prioritize prefix op > infix op > default op  (was: redfine (Classi & Standard) QueryParser semantics to be consistent: prioritize prefix op > infix op > default op)

> redefine (Classic & Standard) QueryParser semantics to be consistent:
> prioritize prefix op > infix op > default op
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9315
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9315
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/queryparser
>            Reporter: Chris M. Hostetter
>            Priority: Major
>         Attachments: LUCENE-9315.patch
>
>
> For as long as I can remember, the way QueryParser deals with the "infix"
> operators {{AND}} & {{OR}} hasn't made much sense unless they are used
> consistently to express pure boolean logic (ie: always explicitly specified,
> and never more than 2 clauses to a query). As soon as you have query strings
> where a BooleanQuery has more than 2 clauses, or you have query strings that
> mix {{AND}} & {{OR}} with the "prefix" {{+}} & {{-|NOT}} operators, or query
> strings where not every clause has an operator, or (absolutely the most
> confusing) you mix the types of operators _and_ change the QueryParser
> "default op" from {{OR}} to {{AND}}, the behavior just becomes impossible to
> make sense of for new users - and hard to explain/justify. (It's not
> precedence based, it's not left to right, it's just ... weird.)
> The problem is so confusing to new users that I wrote a blog post almost 10
> years ago (?!?) trying to convince people that using {{AND}} & {{OR}} was a
> terrible idea unless they were used only in strict boolean expressions...
> [https://lucidworks.com/post/why-not-and-or-and-not/]
> ...and yet it still regularly comes up as a point of confusion.
> A lot of this weird behavior seems to be a historical artifact of how
> {{QueryParserBase.addClauses()}} works - a method whose basic semantics
> haven't really changed since Lucene 1.0.1, back before the introduction of
> {{QueryParser.setDefaultOperator()}}. Some of those early choices seemed to
> be predicated on the idea that {{AND}} should take "precedence" (I use that
> term loosely) over {{OR}} as it parses clauses left to right, purely because
> {{OR}} was the "default" assumption (and had - and still has - no
> corresponding "prefix" operator). As functionality in QueryParser has grown,
> a lot of the assumptions made in the code and the resulting parse behavior
> really make no sense to users, particularly in "non trivial" query strings.
> In many cases, parse behavior that can seem "intentional" to new users, even
> for input where every clause is impacted by an explicit {{AND}} or {{OR}}
> operator, can suddenly be flipped on its head when the "default operator"
> is changed (ex: "{{X AND Y OR Z}}"), or if only the order of "clauses" in
> the string changes (ex: previous example vs "{{Z OR Y AND X}}") even though
> it's clear from other queries that there is no strict precedence of operators.
> ----
> The "root" of the problem, as I see it, is that
> {{QueryParserBase.addClauses()}} allows {{AND}} & {{OR}} to modify the
> {{Occur}} property of the previously parsed {{BooleanClause}} depending on
> _if_ that {{BooleanClause.getOccur()}} value matches the "default operator"
> for the parser, without any consideration of _why_ that {{getOccur()}}
> value matches the "default operator" - ie: did it actually come from the
> "default" or was it explicitly set by something in the query string?
> (ie: a prior infix operator)
> ----
> I propose that starting with Lucene 9.0, we redefine the semantics in
> {{QueryParserBase}} such that:
> * "Prefix" operators ({{+}} | {{-}} | {{NOT}}) always take precedence (over
> any "Infix" operator or QueryParser default) in setting the {{Occur}} value
> of the clause they prefix.
> * "Infix" operators ({{AND}} | {{OR}}) are evaluated left to right and used
> to set the {{Occur}} value of the clauses adjacent to them (that do not
> already have an {{Occur}} value set by a "Prefix" operator)
> * the {{QueryParser.getDefaultOperator()}} is only used to set the {{Occur}}
> value of any clause that did not get an {{Occur}} value assigned by either a
> prefix or (prior) infix operator.
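The three proposed rules can be sketched as a small standalone model. This is not Lucene code and not the attached patch: `assign_occurs`, `render`, and the pre-split token list are hypothetical names made up for illustration, and the model ignores fields, grouping, and phrases. It also assumes one particular reading of "evaluated left to right": a later infix operator may overwrite the {{Occur}} that an earlier infix operator gave to the clause they share. The proposal text does not settle that point.

```python
from enum import Enum


class Occur(Enum):
    # Values are the prefix characters used when rendering a clause.
    MUST = "+"
    SHOULD = ""
    MUST_NOT = "-"


PREFIX_OPS = {"+": Occur.MUST, "-": Occur.MUST_NOT, "NOT": Occur.MUST_NOT}
INFIX_OPS = {"AND": Occur.MUST, "OR": Occur.SHOULD}


def assign_occurs(tokens, default_op="OR"):
    """Assign an Occur to each clause in a flat, pre-split token list.

    Hypothetical model of the proposed priority:
    prefix op > infix op > default op.
    """
    terms = []       # clause text, in input order
    occurs = []      # Occur chosen so far, or None if still undecided
    by_prefix = []   # True when the Occur came from a prefix operator
    pending_prefix = None
    pending_infix = None

    for tok in tokens:
        if tok in PREFIX_OPS:
            pending_prefix = PREFIX_OPS[tok]
        elif tok in INFIX_OPS:
            pending_infix = INFIX_OPS[tok]
            # Rule 2: the infix op sets the clause on its left,
            # unless a prefix op already claimed it (rule 1).
            # Assumption: it may overwrite an earlier infix assignment.
            if terms and not by_prefix[-1]:
                occurs[-1] = pending_infix
        else:
            terms.append(tok)
            if pending_prefix is not None:
                # Rule 1: a prefix op always wins for its own clause.
                occurs.append(pending_prefix)
                by_prefix.append(True)
            else:
                # Rule 2: the infix op on the left (if any) sets this clause.
                occurs.append(pending_infix)
                by_prefix.append(False)
            pending_prefix = None
            pending_infix = None

    # Rule 3: the default op only fills clauses nothing else touched.
    default = INFIX_OPS[default_op]
    return [(t, o if o is not None else default) for t, o in zip(terms, occurs)]


def render(pairs):
    """Render clauses in +/- prefix notation, e.g. '+X Y -Z'."""
    return " ".join(occur.value + term for term, occur in pairs)


if __name__ == "__main__":
    for op in ("OR", "AND"):
        print(op, "|", render(assign_occurs(["X", "AND", "Y", "OR", "Z"], op)))
        print(op, "|", render(assign_occurs(["X", "Y", "+", "Z"], op)))
```

In this model, "X AND Y OR Z" comes out the same for either default operator, because every clause is touched by an explicit operator, while "X Y +Z" depends on the default only for the two clauses that have no operator. Clause order can still matter under the overwrite assumption ("Z OR Y AND X" marks Y as required), so a stricter reading of the proposal would need a different rule for clauses that sit between two infix operators.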