#281: Provide more helpful messages for gibberish syntax
------------------------+---------------------------------------------------
Reporter: jblayloc | Owner: jblayloc
Type: defect | Status: new
Priority: minor | Milestone:
Component: WebSearch | Version:
Resolution: | Keywords: Invenio INSPIRE Syntax
------------------------+---------------------------------------------------
Comment (by simko):
Note that I said it only //may// make sense, not that it necessarily
does :). It is profitable to look at the issue from the syntactic
sugar perspective as well, where `AND` is kind of equivalent to `+`,
`NOT` to `-`, and `OR` to `|`.
[[http://invenio-demo.cern.ch/help/search-guide#boolean]]
From this perspective, people can think of `+ellis` as a word
inclusion query and `-ellis` as a word exclusion query, which is why
searches like `+ellis` may seem perfectly reasonable even if they may
make little sense from the Boolean logic perspective ("find me (what?)
AND ellis"). It feels like an operand is missing from the Boolean
logic point of view. However, this is mostly true of the AND and OR
operators; the NOT operator may seem reasonable, because it can be
viewed as a negation operator working on a single operand.
(And, from the point of view of discussion about ticket:131, note that
the NOT operator is still ill-treated, e.g. `(ellis (NOT muon))` leads
to parsing troubles, as I mentioned in ticket:131#comment:22.)
Going beyond the NOT operator, and looking at it from the word
inclusion perspective mentioned above, the AND operator still //may//
make sense. That leaves the OR operator as having the greatest
potential for confusion here. The current behaviour of the `|`
operator with a single operand was chosen mainly for consistency
with the behaviour of the NOT and AND operators. In a nutshell:
* empty query currently gives everything (which is handy for
collection trees);
* `-word` (logically `NOT word`) gives everything excluding hits
containing `word` (logically empty query (=everything) NOT `word`);
* `+word` (logically `AND word`) gives only hits including `word`
(logically empty query (=everything) AND `word`);
* `|word` (logically `OR word`) gives, by analogy, an empty query
(=everything) OR `word`, hence everything.
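
The four cases above can be sketched with plain Python sets. This is only an illustration, assuming a toy in-memory index; `RECORDS`, `hits_for` and `search_unary` are hypothetical names and not Invenio's actual search engine code.

```python
# Toy index: record id -> words in the record (hypothetical data).
RECORDS = {
    1: {"ellis", "muon"},
    2: {"ellis"},
    3: {"higgs"},
}

# The empty query matches every record.
EVERYTHING = set(RECORDS)


def hits_for(word):
    """Return the ids of records containing `word`."""
    return {recid for recid, words in RECORDS.items() if word in words}


def search_unary(query):
    """Evaluate an empty query or a single operator-prefixed word.

    The missing left operand is taken to be the empty query,
    i.e. the set of all records.
    """
    query = query.strip()
    if not query:
        return set(EVERYTHING)
    operator, word = query[0], query[1:]
    if operator == "+":  # AND: everything AND word
        return EVERYTHING & hits_for(word)
    if operator == "-":  # NOT: everything NOT word
        return EVERYTHING - hits_for(word)
    if operator == "|":  # OR: everything OR word
        return EVERYTHING | hits_for(word)
    # No leading operator: treat as a plain word query.
    return hits_for(query)
```

In this model `search_unary("|ellis")` returns every record, which is the behaviour described for `|word`: the union of everything with any set is still everything.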
I agree it may seem weird, especially the very last case. It is just
one behavioural possibility that we had chosen in the past to
represent this syntactic sugar; another choice may be more
appropriate. For example, we could say that an empty query is
ill-defined and refuse it outright, which would change the logic
that follows from it. We could also simply refuse to accept some
of the above queries in the absence of a second operand. We may,
however, run into definitional problems here, namely how to interpret
syntactic sugar queries like `+word` or `-word`. E.g. Google has
chosen to accept `+word` but not `-word`. Note that we differ from
Google in this respect, as I mentioned elsewhere; we behave more like
Lucene/Solr, where queries like `+word` or `-word` seem to behave
much as ours do.
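
As one illustration of the "refuse some queries" option, a check along the following lines could reject a leading OR while still accepting `+word` and `-word` as inclusion/exclusion sugar. This is a sketch under assumptions: the function name, the choice of which operators to refuse, and the message wording are all hypothetical, and nothing here reflects existing Invenio code.

```python
# Policy assumption: only a leading OR is refused; `+word` and `-word`
# remain valid as word inclusion/exclusion queries.
REJECTED_LEADING_OPERATORS = {"|": "OR"}


def check_leading_operator(query):
    """Return a user-facing message if `query` starts with a refused
    operator, or None if the query is acceptable under this policy."""
    stripped = query.strip()
    if stripped and stripped[0] in REJECTED_LEADING_OPERATORS:
        symbol = stripped[0]
        name = REJECTED_LEADING_OPERATORS[symbol]
        # Fall back to a placeholder if nothing follows the operator.
        operand = stripped[1:].strip() or "word"
        return (
            "The %s operator (`%s`) needs a term on both sides, "
            "e.g. `term %s %s`." % (name, symbol, symbol, operand)
        )
    return None
```

Extending `REJECTED_LEADING_OPERATORS` would be enough to refuse other operators as well, should a different policy be preferred.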
--
Ticket URL: <http://invenio-software.org/ticket/281#comment:4>
Invenio <http://invenio-software.org>