#138: Introduce timeout mechanism for long queries
-----------------------+----------------------------------------------------
Reporter: lmarian | Owner: lmarian
Type: task | Status: new
Priority: major | Milestone: v1.0
Component: WebSearch | Version:
Keywords: |
-----------------------+----------------------------------------------------
Wildcards are currently allowed for words longer than N letters. This
length-based rule is too simplistic, because phys* can match lots of
index terms, while xy* may match fewer. So the wildcard should be
allowed for the term xy, but not for the term cern.
We should therefore use COUNT() to see how many matching terms there
are, and allow the wildcard if there are fewer than a reasonable
limit, or remove the wildcard if there are more. Example:
mysql> select count(*) from idxWORD01F where term like 'cern%';
Note that this limiting technique does not work well for every kind
of query, e.g. this one would be very slow to check:
mysql> select count(*) from idxWORD01F where term like '%cern%';
due to a full table scan. The same applies to span queries such as:
mysql> select count(*) from idxWORD01F where term between 'a' and 'y';
For these queries, we had better use an explicit LIMIT clause:
mysql> select term from idxWORD01F where term between 'a' and 'y' limit 1001;
If the resulting list does contain 1001 terms, then we know we have
hit the limit, and we should remove the wildcard from the term and
warn the user that it was removed because there were too many
matching words.
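The LIMIT-based check could be sketched roughly as follows. This is
only an illustration under stated assumptions: it uses Python's
sqlite3 module as a stand-in for MySQL so that it runs on its own,
and the names wildcard_allowed, escape_like and MAX_TERMS are
hypothetical, not existing Invenio functions. The cutoff of 1000 is
inferred from the "limit 1001" example above.

```python
import sqlite3

# Hypothetical cutoff, inferred from the "limit 1001" example in the ticket.
MAX_TERMS = 1000


def escape_like(text, escape_char="\\"):
    """Escape LIKE metacharacters so that user input is matched literally."""
    # The escape character itself must be handled first.
    for ch in (escape_char, "%", "_"):
        text = text.replace(ch, escape_char + ch)
    return text


def wildcard_allowed(conn, prefix, max_terms=MAX_TERMS):
    """Return True if at most max_terms index terms start with prefix.

    Fetching max_terms + 1 rows is enough to tell whether the limit
    was exceeded, without counting every matching term.
    """
    cur = conn.execute(
        "SELECT term FROM idxWORD01F WHERE term LIKE ? ESCAPE '\\' LIMIT ?",
        (escape_like(prefix) + "%", max_terms + 1),
    )
    return len(cur.fetchall()) <= max_terms
```

A caller would keep the trailing wildcard when wildcard_allowed()
returns True, and otherwise strip it and warn the user.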
(P.S. A timeout would have to kill the query on the MySQL side too.)
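A timeout that also aborts the query inside the database engine might
look like the sketch below. It is an assumption-laden illustration,
not a proposed implementation: sqlite3 again stands in for MySQL, and
run_with_timeout is a hypothetical helper name. With MySQL, the abort
step would presumably be a KILL QUERY issued from a second
connection, rather than the interrupt() call used here.

```python
import sqlite3
import threading


def run_with_timeout(conn, sql, params=(), timeout=1.0):
    """Run sql and return its rows, or None if it was aborted on timeout."""
    fired = threading.Event()

    def abort():
        # Record that the timeout fired, then ask the engine to stop
        # the running statement (stand-in for MySQL's KILL QUERY).
        fired.set()
        conn.interrupt()

    timer = threading.Timer(timeout, abort)
    timer.start()
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.OperationalError:
        if fired.is_set():
            return None  # aborted by our own timeout
        raise  # any other database error is passed on
    finally:
        timer.cancel()
```

Returning None lets the caller fall back to the query without the
wildcard and warn the user, as described above.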
--
Ticket URL: <http://cdswaredev.cern.ch/invenio/ticket/138>
Invenio <http://invenio-software.org>