#138: Introduce timeout mechanism for long queries
-----------------------+----------------------------------------------------
 Reporter:  lmarian    |       Owner:  lmarian
     Type:  task       |      Status:  new    
 Priority:  major      |   Milestone:  v1.0   
Component:  WebSearch  |     Version:         
 Keywords:             |  
-----------------------+----------------------------------------------------
 Wildcards are currently allowed for any word longer than N letters. This
 rule is too simplistic: phys* can match many terms, while xy* may match
 only a few. So the wildcard should be allowed for the term xy, but not
 for the term phys.

 We should therefore use COUNT() to see how many terms match, and allow
 the wildcard if the count is below a reasonable limit, or remove the
 wildcard otherwise. Example:

 mysql> select count(*) from idxWORD01F where term like 'cern%';
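
 A minimal Python sketch of this check, for illustration only. It uses an
 in-memory SQLite table as a stand-in for the MySQL index; the function
 name wildcard_allowed and the MAX_VARIANTS value are hypothetical, and
 the prefix is assumed to contain no LIKE metacharacters.

```python
import sqlite3

MAX_VARIANTS = 1000  # hypothetical limit; the ticket does not name a value


def wildcard_allowed(conn, prefix, limit=MAX_VARIANTS):
    """Return True if fewer than `limit` indexed terms start with `prefix`.

    Assumes `prefix` contains no LIKE metacharacters (% or _).
    """
    row = conn.execute(
        "SELECT COUNT(*) FROM idxWORD01F WHERE term LIKE ?",
        (prefix + "%",),
    ).fetchone()
    return row[0] < limit


# Tiny in-memory stand-in for the real index table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE idxWORD01F (term TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO idxWORD01F (term) VALUES (?)",
    [("cern",), ("cernlib",), ("cernvm",), ("xylophone",)],
)

print(wildcard_allowed(conn, "cern", limit=3))  # three matches: limit reached
print(wildcard_allowed(conn, "xy", limit=3))    # one match: under the limit
```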

 Note that this limiting technique does not work well for every kind of
 query, e.g. this one would be very slow to check:

 mysql> select count(*) from idxWORD01F where term like '%cern%';

 because it requires a full table scan. Span queries of this kind are
 similarly slow:

 mysql> select count(*) from idxWORD01F where term between 'a' and 'y';

 For these queries, we had better use an explicit LIMIT clause:

 mysql> select term from idxWORD01F where term between 'a' and 'y' limit 1001;

 If the resulting list does contain 1001 terms, then we know we have
 hit the limit, and we should remove the wildcards from the term and
 warn the user that they were removed because too many words matched.
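
 The LIMIT probe could look roughly like the following Python sketch. It
 again uses an in-memory SQLite table in place of the MySQL index; the
 function name expand_pattern, the MAX_TERMS value and the warning text
 are hypothetical.

```python
import sqlite3

MAX_TERMS = 1000  # hypothetical limit, mirroring the "limit 1001" probe


def expand_pattern(conn, pattern, max_terms=MAX_TERMS):
    """Return (terms, too_many) for index terms matching a LIKE pattern.

    At most max_terms + 1 rows are fetched; receiving the extra row means
    the limit was hit, so no expansion is returned.
    """
    rows = conn.execute(
        "SELECT term FROM idxWORD01F WHERE term LIKE ? LIMIT ?",
        (pattern, max_terms + 1),
    ).fetchall()
    if len(rows) > max_terms:
        return [], True
    return [row[0] for row in rows], False


# Tiny in-memory stand-in for the real index table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE idxWORD01F (term TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO idxWORD01F (term) VALUES (?)",
    [("term%02d" % i,) for i in range(10)],
)

terms, too_many = expand_pattern(conn, "%term%", max_terms=5)
if too_many:
    # The caller would fall back to the literal term and warn the user.
    print("Wildcard removed: too many matching words.")
```

 Fetching one row more than the limit keeps the probe cheap: the database
 can stop as soon as the extra row is found instead of counting every
 match.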

 (P.S. Timing out would have to kill the query on the MySQL side too.)

-- 
Ticket URL: <http://cdswaredev.cern.ch/invenio/ticket/138>
Invenio <http://invenio-software.org>
