#502: Advanced search hangs with SPIRES syntax
------------------------+----------------------------
Reporter: tbrooks | Owner:
Type: defect | Status: new
Priority: major | Milestone:
Component: WebSearch | Version:
Resolution: | Keywords: INSPIRE syntax
------------------------+----------------------------
Comment (by simko):
Re (1), the hanging search: the wildcard limit (ticket:138) has been
revived and is ready to be pushed to master, so at least this part
can be fixed quickly.
One important warning: the SPIRES syntax query parser expands terms like
//ji, c-r// into //ji, c* r*//, which may trigger the wildcard check. We
generally want this check in order to avoid denial-of-service attacks.
On the other hand, too strict a wildcard limit may be problematic: users
may not understand what is happening, because they did not type any
explicit wildcard in their queries; the parser added it implicitly for
them. So we have to be somewhat careful here.
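To make the implicit expansion concrete, here is a minimal Python sketch of the rewrite described above. The function name `expand_author_initials` and its splitting rules are assumptions made for illustration only; the real SPIRES query parser in Invenio may apply different rules.

```python
import re


def expand_author_initials(term):
    """Hypothetical helper: rewrite 'ji, c-r' as 'ji, c* r*'.

    Illustrates the implicit wildcard expansion only; this is not
    the actual Invenio SPIRES parser code.
    """
    if "," not in term:
        # No "surname, given" structure, so leave the term untouched.
        return term
    surname, _, given = term.partition(",")
    # Split the given-name part on hyphens, dots and whitespace.
    parts = [p for p in re.split(r"[\s.\-]+", given.strip()) if p]
    if not parts:
        return term.strip()
    # Every given-name fragment silently becomes a prefix wildcard.
    return surname.strip() + ", " + " ".join(p + "*" for p in parts)


if __name__ == "__main__":
    print(expand_author_initials("ji, c-r"))
    print(expand_author_initials("smith, j"))
```

The user typed no `*` at all, yet each resulting term is subject to the wildcard check.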
By default we thought of using a wildcard limit of 200, meaning that a
wildcard query is allowed if it expands to at most 200 "subqueries";
such a query typically finishes within about 5-10 seconds. I've just
checked typical INSPIRE situations and this limit seems to be enough,
e.g.:
{{{
mysql> select count(term) from idxPHRASE03F where term like 'ji, c%';
+-------------+
| count(term) |
+-------------+
|          14 |
+-------------+
}}}
and also for a common name like //Smith, J//:
{{{
mysql> select count(term) from idxPHRASE03F where term like 'smith, j%';
+-------------+
| count(term) |
+-------------+
|          77 |
+-------------+
}}}
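The limit check itself can be sketched in Python as follows. This is an assumption-laden illustration, not the ticket:138 implementation: the names `expand_prefix`, `WildcardLimitExceeded` and `WILDCARD_LIMIT` are hypothetical, and a sorted in-memory list stands in for the `idxPHRASE03F` term column.

```python
from bisect import bisect_left

WILDCARD_LIMIT = 200  # default value discussed in this ticket


class WildcardLimitExceeded(Exception):
    """Raised when a wildcard would expand into too many subqueries."""


def expand_prefix(sorted_terms, prefix, limit=WILDCARD_LIMIT):
    """Return the index terms starting with `prefix`.

    `sorted_terms` must be sorted, so all matches are contiguous.
    Raises WildcardLimitExceeded as soon as more than `limit`
    matches are seen, instead of running every subquery.
    """
    matches = []
    i = bisect_left(sorted_terms, prefix)  # first possible match
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])
        if len(matches) > limit:
            raise WildcardLimitExceeded(
                "'%s*' matches more than %d terms" % (prefix, limit))
        i += 1
    return matches


if __name__ == "__main__":
    # Made-up index content, sized like the 'smith, j%' count above.
    terms = sorted(["ji, a", "smith, k"] +
                   ["smith, j%03d" % n for n in range(77)])
    print(len(expand_prefix(terms, "smith, j")))
```

Stopping at the first match beyond the limit keeps the cost of a rejected query bounded, which is the point of the denial-of-service protection.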
Can you think of a frequently used SPIRES query term that the SPIRES
query parser may silently expand into a wildcard query matching more
than 200 individual terms, and hence requiring more than 200 silent
"subqueries" joined by Boolean ORs?
If yes, then let's muse about what a reasonable limit could be.
If not, then let's keep the limit relatively low, deploy the
feature, and monitor its usage. I would not like to raise it much
beyond, say, 300 for performance reasons...
--
Ticket URL: <http://invenio-software.org/ticket/502#comment:3>
Invenio <http://invenio-software.org>