#643: Solving "find" problems
------------------------+----------------------------
  Reporter:  tbrooks    |      Owner:  valkyrie
      Type:  defect     |     Status:  assigned
  Priority:  major      |  Milestone:
 Component:  WebSearch  |    Version:
Resolution:             |   Keywords:  INSPIRE syntax
------------------------+----------------------------

Comment (by simko):

 Let me look at the differences between this approach and the guesswork
 approach suggested earlier in ticket:508.  For queries like "a ellis",
 there is no difference, because the guesswork should handle the
 leading "a" use case well.  (And I'd venture to guess that the
 majority of real-life user queries where people leave out the leading
 "find" are of a kind that the guesswork would detect nicely.)  For
 queries like "sd shell", the Boolean OR approach indeed works better,
 because "sd shell" can be interpreted in two ways: either as a
 Google-style free keyword search or as a SPIRES-style fielded search
 (sd=experiment).  So doing a silent Boolean OR definitely helps here.
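
 To make the contrast concrete, here is a minimal sketch of the two
 strategies.  All names are hypothetical and none of this is Invenio
 code: the field table, the "anyfield" label and the set of
 abbreviations the guesswork trusts are assumptions made only for
 illustration.

```python
# Hypothetical sketch; none of these names are Invenio APIs.
# Assumed SPIRES abbreviations: "a" (author) and "sd" (experiment),
# taken from the two example queries discussed above.
SPIRES_FIELDS = {"a": "author", "sd": "experiment"}

# Assumption: abbreviations the guesswork treats as a strong SPIRES signal.
GUESSWORK_TRIGGERS = {"a"}


def guesswork(query):
    """Pick exactly one interpretation, based on the leading token."""
    first, _, rest = query.partition(" ")
    if first in GUESSWORK_TRIGGERS and rest:
        return [(SPIRES_FIELDS[first], rest)]
    return [("anyfield", query)]


def silent_or(query):
    """Return every plausible interpretation, to be OR-ed together."""
    interpretations = [("anyfield", query)]
    first, _, rest = query.partition(" ")
    if first in SPIRES_FIELDS and rest:
        interpretations.append((SPIRES_FIELDS[first], rest))
    return interpretations


if __name__ == "__main__":
    for q in ("a ellis", "sd shell"):
        print(q, guesswork(q), silent_or(q))
```

 In this toy model both strategies reach the author interpretation for
 "a ellis", while only the silent OR also tries the fielded reading of
 "sd shell".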

 However, this comes at a price.  Naively speaking, a Boolean OR may
 be expected to take twice the search time, so during user storms we
 may be able to serve only half as many users per second, so to speak.
 (Having multiple workers helps only a little here, because we
 currently do not have multiple DBs behind them.)  Economically
 speaking, it would be good to avoid unnecessary queries, which helps
 with scalability.
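
 The naive cost argument can be written down as a back-of-the-envelope
 model.  This is only a sketch under the stated assumption that an
 OR-ed query costs exactly two searches; the function name and the
 base rate of 100 are made up for illustration and are not
 measurements.

```python
def users_per_second(base_rate, or_fraction):
    """Naive throughput model.

    base_rate:   users served per second when every query is one search.
    or_fraction: share of queries (0.0 to 1.0) that trigger the second,
                 OR-ed search.  Assumes an OR-ed query costs two searches.
    """
    average_cost = 1.0 + or_fraction
    return base_rate / average_cost


if __name__ == "__main__":
    print(users_per_second(100, 0.0))  # no OR-ed queries at all
    print(users_per_second(100, 1.0))  # every query is OR-ed
```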

 Perhaps you meant to perform the Boolean OR only sometimes?  E.g. when
 the SPIRES parser produces a combination of basic search units that
 differs from the one produced by the Invenio parser?  This would
 avoid unnecessary queries, but we would still have the parsing time
 to consider.  Note that the mixed SPIRES parsing is currently very
 slow, as I mentioned in the inspire-dev discussion about ticket:508.
 We would have to improve the speed of the mixed parser before we can
 send every query to it.
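
 A minimal sketch of this "only sometimes" variant follows.  The two
 parser callables and the toy parsers are hypothetical stand-ins, not
 the real Invenio or SPIRES parsers, and the list-of-tuples
 representation of basic search units is an assumption for the
 example.

```python
def plan_searches(query, parse_invenio, parse_spires):
    """Return the unit combinations to search; more than one means OR.

    Both parsers are always called, so the parsing cost is paid for
    every query even when only a single search is run afterwards.
    """
    invenio_units = parse_invenio(query)
    spires_units = parse_spires(query)
    if spires_units == invenio_units:
        return [invenio_units]
    return [invenio_units, spires_units]


# Toy stand-ins for the real parsers (assumptions, for illustration only).
def toy_invenio_parser(query):
    return [("anyfield", query)]


def toy_spires_parser(query):
    first, _, rest = query.partition(" ")
    if first == "sd" and rest:
        return [("experiment", rest)]
    return [("anyfield", query)]


if __name__ == "__main__":
    for q in ("higgs boson", "sd shell"):
        print(q, plan_searches(q, toy_invenio_parser, toy_spires_parser))
```

 Only "sd shell" yields two combinations here, so only that query
 would pay for the second search; the parsing cost is still paid for
 both.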

 So to me this discussion is very similar in nature to the one we have
 been having in ticket:508 and on the inspire-dev mailing list, namely
 whether to dispatch all queries, or only some queries, via the SPIRES
 parser.  A new element in the discussion is whether to do the silent
 Boolean OR query always or only sometimes, so to speak.  As things
 currently stand, for economic and technical reasons, I tend to prefer
 the latter approach, because I think the guesswork can already
 helpfully cover most user queries without major scalability issues.
 If sending all queries via the SPIRES parser is to be considered,
 then we should definitely improve its speed as part of the same
 package, as mentioned in connection with ticket:508.

-- 
Ticket URL: <http://invenio-software.org/ticket/643#comment:2>
Invenio <http://invenio-software.org>
