#643: Solving "find" problems
------------------------+----------------------------
Reporter: tbrooks | Owner: valkyrie
Type: defect | Status: assigned
Priority: major | Milestone:
Component: WebSearch | Version:
Resolution: | Keywords: INSPIRE syntax
------------------------+----------------------------
Comment (by simko):
Let me look at the differences between this approach and the guesswork
approach suggested earlier in ticket:508. For queries like "a ellis",
there is no difference, since the guesswork should work well thanks to
the leading "a". (And I'd venture to guess that the majority of
real-life user queries where people left out the leading "find" are of
a kind that the guesswork would detect nicely.) For queries like
"sd shell", the Boolean OR approach indeed works better, because
"sd shell" can be interpreted in two ways: either as a Google-style
free keyword search or as a SPIRES-style fielded search
(sd=experiment). So a silent Boolean OR definitely helps here.
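To make the contrast concrete, here is a minimal Python sketch of the
two strategies. The keyword set, the function names and the toy search
functions are all hypothetical and are not Invenio's actual API. "sd"
is deliberately left out of the keyword set, so the guesswork commits
to the free-text reading of "sd shell", while the silent OR runs both
readings and merges the hits.

```python
# Hypothetical sketch: the keyword set and function names are
# illustrative only and do not come from Invenio.
from typing import Callable, Set

SearchFn = Callable[[str], Set[int]]

# Assumed SPIRES field keywords whose presence as the first word
# suggests that the user left out the leading "find". "sd" is omitted
# on purpose to mimic the ambiguous "sd shell" case.
SPIRES_LEADING_KEYWORDS = {"a", "au", "author", "t", "title"}


def guess_spires(query: str) -> str:
    """Guesswork: prepend "find" if the first word looks like a SPIRES field."""
    words = query.split()
    if words and words[0].lower() in SPIRES_LEADING_KEYWORDS:
        return "find " + query
    return query


def silent_or(query: str, free_search: SearchFn, spires_search: SearchFn) -> Set[int]:
    """Silent Boolean OR: run both readings and merge the record IDs.

    This always costs two searches per user query.
    """
    return free_search(query) | spires_search("find " + query)


# Toy stand-ins for the real search engines, returning record IDs.
def toy_free_search(query: str) -> Set[int]:
    return {1, 2} if "shell" in query.split() else set()


def toy_spires_search(query: str) -> Set[int]:
    return {2, 3} if query.startswith("find sd ") else set()


print(guess_spires("a ellis"))   # find a ellis
print(guess_spires("sd shell"))  # sd shell
print(sorted(silent_or("sd shell", toy_free_search, toy_spires_search)))  # [1, 2, 3]
```

The guesswork is a cheap string check that adds no extra search, whereas
the silent OR always doubles the number of searches, which is where the
scalability price comes from.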
However, this comes at a price. Naively speaking, a Boolean OR may be
expected to take twice the search time, so during user storms we may
be able to serve only half as many users per second, so to speak.
(Having multiple workers helps only a little here, because we
currently do not have multiple DBs behind them.) Economically
speaking, it would be good to avoid unnecessary queries, which helps
with scalability.
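The naive cost argument can be written down as a tiny capacity model,
assuming a single search backend with a fixed number of searches per
second and equal cost for every search. The capacity figure below is
illustrative, not a measurement. If only a fraction of user queries
gets the OR, each query costs on average 1 + fraction searches.

```python
def users_per_second(searches_per_second: float, or_fraction: float) -> float:
    """Naive capacity model for one search backend.

    Every user query costs one search, plus a second one when the silent
    Boolean OR is applied; `or_fraction` is the share of queries that get it.
    """
    if not 0.0 <= or_fraction <= 1.0:
        raise ValueError("or_fraction must lie in [0, 1]")
    searches_per_query = 1.0 + or_fraction
    return searches_per_second / searches_per_query


capacity = 40.0  # illustrative figure, not a measurement
print(users_per_second(capacity, 0.0))   # 40.0: no OR at all
print(users_per_second(capacity, 1.0))   # 20.0: OR on every query
print(users_per_second(capacity, 0.25))  # 32.0: OR on a quarter of queries
```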
Perhaps you meant to perform the Boolean OR only sometimes? E.g. when
the SPIRES parser produces a combination of basic search units that
differs from the one produced by the Invenio parser? This would avoid
unnecessary queries, but we would still have the parsing time to
consider. Note that the mixed SPIRES parsing is currently very slow,
as I mentioned in the inspire-dev thread where we have been
discussing ticket:508. We would have to speed up the mixed parser
before we could send every query through it.
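The "only sometimes" variant could look roughly like the sketch below,
under loud assumptions: a basic search unit is modelled as a
hypothetical (operator, field, term) tuple, and toy_invenio, toy_spires
and toy_run are toy stand-ins, not the real Invenio or SPIRES parsers
or search engine. The second search runs only when the two parses
differ, but both parsers run on every query.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Assumed shape of a basic search unit: (operator, field, term); an empty
# field means "search everywhere", as in free keyword search.
Unit = Tuple[str, str, str]
Parser = Callable[[str], List[Unit]]
Runner = Callable[[List[Unit]], Set[int]]


def conditional_or(query: str, parse_invenio: Parser, parse_spires: Parser,
                   run_units: Runner) -> Set[int]:
    """OR the two interpretations only when the parsers disagree.

    Both parsers still run on every query, so the parsing time is paid
    even when only one search is executed.
    """
    invenio_units = parse_invenio(query)
    spires_units = parse_spires(query)
    hits = run_units(invenio_units)
    if spires_units != invenio_units:
        hits = hits | run_units(spires_units)
    return hits


# --- Toy stand-ins (hypothetical, for illustration only) ---
SPIRES_FIELDS = {"a", "sd"}
TOY_INDEX: Dict[Tuple[str, str], Set[int]] = {
    ("", "ellis"): {1, 2}, ("", "sd"): {3}, ("", "shell"): {3}, ("sd", "shell"): {4},
}
searches: List[List[Unit]] = []  # records every search actually executed


def toy_invenio(query: str) -> List[Unit]:
    return [("+", "", word) for word in query.split()]


def toy_spires(query: str) -> List[Unit]:
    words = query.split()
    if len(words) > 1 and words[0] in SPIRES_FIELDS:
        return [("+", words[0], " ".join(words[1:]))]
    return toy_invenio(query)


def toy_run(units: List[Unit]) -> Set[int]:
    """AND all units together against the toy index."""
    searches.append(units)
    result: Optional[Set[int]] = None
    for _op, field, term in units:
        ids = TOY_INDEX.get((field, term), set())
        result = ids if result is None else result & ids
    return set(result or ())


print(sorted(conditional_or("ellis", toy_invenio, toy_spires, toy_run)))     # [1, 2]
print(sorted(conditional_or("sd shell", toy_invenio, toy_spires, toy_run)))  # [3, 4]
print(len(searches))  # 3: one search for "ellis", two for "sd shell"
```

Here "ellis" parses identically both ways and costs a single search,
while "sd shell" triggers the OR.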
So to me this discussion is very similar in nature to the one we have
been having in ticket:508 and on the inspire-dev mailing list, namely
whether to dispatch all queries, or only some of them, via the SPIRES
parser. A new element in the discussion is whether to do the silent
Boolean OR always or only sometimes. In the current state of things,
for economic and technical reasons, I tend to prefer the latter
approach, because I think the guesswork can already cover most user
queries helpfully without much of a scalability cost. If sending all
queries via the SPIRES parser is to be considered, then we should
definitely improve its speed as part of the same package, as
mentioned in connection with ticket:508.
--
Ticket URL: <http://invenio-software.org/ticket/643#comment:2>
Invenio <http://invenio-software.org>