#281: Provide more helpful messages for gibberish syntax
------------------------+---------------------------------------------------
  Reporter:  jblayloc   |       Owner:  jblayloc              
      Type:  defect     |      Status:  new                   
  Priority:  minor      |   Milestone:                        
 Component:  WebSearch  |     Version:                        
Resolution:             |    Keywords:  Invenio INSPIRE Syntax
------------------------+---------------------------------------------------

Comment (by simko):

 Note that I said it only //may// make sense, not that it necessarily
 does :).  It is profitable to look at the issue from the syntactic
 sugar perspective as well, where `AND` is kind of equivalent to `+`,
 `NOT` to `-`, and `OR` to `|`.
 [[http://invenio-demo.cern.ch/help/search-guide#boolean]]

 From this perspective, people can think of `+ellis` as a word
 inclusion query and of `-ellis` as a word exclusion query.  This is
 why searches like `+ellis` may seem perfectly reasonable even if they
 make little sense from the Boolean logic perspective ("find me (what?)
 AND ellis").  It feels like an operand is missing from the Boolean
 logic point of view.  However, this is mostly true of the AND and OR
 operators; the NOT operator may seem reasonable, because it can be
 viewed as a negation operator working on a single operand.

 (And, from the point of view of discussion about ticket:131, note that
 the NOT operator is still ill-treated, e.g. `(ellis (NOT muon))` leads
 to parsing troubles, as I mentioned in ticket:131#comment:22.)

 Going beyond the NOT operator, and looking at it from the word
 inclusion perspective mentioned above, the AND operator still //may//
 make sense.  This leaves the OR operator as the one with the greatest
 potential for confusion here.  The current behaviour of the `|`
 operator with a single operand was chosen mostly for consistency
 with the behaviour of the NOT and AND operators.  In a nutshell:

  * empty query currently gives everything (which is handy for
    collection trees);

  * `-word` (logically `NOT word`) gives everything excluding hits
    containing `word` (logically empty query (=everything) NOT `word`);

  * `+word` (logically `AND word`) gives only hits including `word`
    (logically empty query (=everything) AND `word`);

  * `|word` (logically `OR word`) gives, by analogy, an empty query
    (=everything) OR `word`, hence everything.
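
 The four cases above can be modelled with plain set operations.  What
 follows is only a toy sketch, not the WebSearch code: the record IDs,
 the word index, and the names `ALL_RECORDS`, `WORD_INDEX`, `hits` and
 `search` are all made up for illustration, and a bare word is assumed
 to behave like `+word`.

```python
# Toy model only: the record IDs and the word index are invented for illustration.
ALL_RECORDS = frozenset({1, 2, 3, 4, 5})
WORD_INDEX = {"ellis": frozenset({1, 2}), "muon": frozenset({2, 3})}


def hits(word):
    """Return the set of records containing `word` (empty if the word is unknown)."""
    return WORD_INDEX.get(word, frozenset())


def search(query):
    """Evaluate whitespace-separated terms from left to right.

    A term may carry a `+`, `-` or `|` prefix; a bare term is assumed to
    mean `+term`.  The running result starts out as the empty query,
    i.e. every record, which is what makes the unary forms well defined.
    """
    result = ALL_RECORDS
    for term in query.split():
        if term[0] in "+-|":
            op, word = term[0], term[1:]
        else:
            op, word = "+", term
        if op == "+":
            result = result & hits(word)   # everything AND word
        elif op == "-":
            result = result - hits(word)   # everything NOT word
        else:
            result = result | hits(word)   # everything OR word
    return result
```

 In this model `search("|ellis")` starts from everything and can only
 add to it, so it returns every record, which is exactly the
 surprising last case.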

 I agree it may seem weird, especially the very last case.  It is just
 one behavioural possibility that we had chosen in the past to
 represent this syntactic sugar; another choice may be more
 appropriate.  For example, we could say that an empty query is
 ill-defined, and not accept it to begin with, which would change the
 logic of everything that follows from it.  We could also simply refuse
 to accept some of the above queries in the absence of a second
 operand.  We may then run into definition problems, though, namely
 how to interpret syntactic sugar queries like `+word` or `-word`.
 E.g. Google has chosen to accept `+word` but not `-word`.  Note that
 we differ from Google in this respect, as I mentioned elsewhere; we
 behave more like Lucene/Solr, where the queries `+word` and `-word`
 seem to do much the same as what we do.

-- 
Ticket URL: <http://invenio-software.org/ticket/281#comment:4>
Invenio <http://invenio-software.org>
