#502: Advanced search hangs with SPIRES syntax
------------------------+----------------------------
  Reporter:  tbrooks    |      Owner:
      Type:  defect     |     Status:  new
  Priority:  major      |  Milestone:
 Component:  WebSearch  |    Version:
Resolution:             |   Keywords:  INSPIRE syntax
------------------------+----------------------------

Comment (by simko):

 Ad (1), concerning the hanging search: the wildcard limit (ticket:138)
 has now been revived and is ready to be pushed to master, so at least
 this part can be fixed quickly.

 One important warning: the SPIRES syntax query parser expands terms
 like //ji, c-r// into //ji, c* r*//, which may trigger the wildcard
 check.  We generally want this check in order to avoid
 denial-of-service kinds of attacks.  On the other hand, it may be
 problematic if we are too strict with the allowed wildcard limit:
 users may not understand what is happening, because they did not use
 any explicit wildcard in their queries; the parser added it
 implicitly for them.  So we have to be somewhat careful here.
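
 To make the implicit expansion concrete, here is a minimal sketch of
 the idea.  The function name and the splitting rules are assumptions
 made for illustration only; this is not the actual SPIRES parser code
 in Invenio, which handles many more cases.

```python
import re


def expand_initials(term):
    """Hypothetical sketch: append '*' to single-letter initials after the comma."""
    if "," not in term:
        # No "surname, given" structure, so leave the term untouched.
        return term
    surname, given = term.split(",", 1)
    # Split the given-name part on whitespace, dots and hyphens.
    parts = [p for p in re.split(r"[\s.\-]+", given.strip()) if p]
    if not parts:
        return surname.strip()
    # Only bare initials become wildcard prefixes; full names are kept.
    expanded = [p + "*" if len(p) == 1 else p for p in parts]
    return surname.strip() + ", " + " ".join(expanded)


print(expand_initials("ji, c-r"))  # ji, c* r*
```

 The user typed no wildcard, yet the resulting term contains two, which
 is why the wildcard check can fire unexpectedly.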

 By default we thought of using a wildcard limit of 200, meaning that a
 wildcard query is allowed if it expands to at most 200 "subqueries";
 such a query typically finishes within roughly 5-10 seconds.  I've
 just checked typical INSPIRE situations and it seems that this limit
 will be enough, e.g. this example:

 {{{
 mysql> select count(term) from idxPHRASE03F where term like 'ji, c%';
 +-------------+
 | count(term) |
 +-------------+
 |          14 |
 +-------------+
 }}}

 and also for a common name like //Smith, J//:

 {{{
 mysql> select count(term) from idxPHRASE03F where term like 'smith, j%';
 +-------------+
 | count(term) |
 +-------------+
 |          77 |
 +-------------+
 }}}
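
 The check itself can be sketched as follows.  This is only an
 illustration under assumptions: the function name, the in-memory term
 list and the return shape are hypothetical, and the real
 implementation (ticket:138) works against the index tables rather
 than a Python list.

```python
def check_wildcard(terms, pattern, limit=200):
    """Hypothetical sketch: allow a trailing-wildcard pattern only if it
    matches at most `limit` index terms.  Returns (allowed, match_count)."""
    # 'smith, j*' corresponds to the SQL pattern 'smith, j%'.
    prefix = pattern.rstrip("*")
    matches = sum(1 for term in terms if term.startswith(prefix))
    return matches <= limit, matches
```

 With the counts shown above (14 and 77 matching terms), both queries
 would stay well under a limit of 200.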

 Can you think of a frequently used SPIRES query term that the SPIRES
 query parser would silently expand into a wildcard query matching
 more than 200 individual terms, and hence needing more than 200
 silent "subqueries" joined by Boolean ORs in order to complete?

 If yes, then let's muse about what a reasonable limit could be.

 If not, then let's keep the limit relatively low, deploy the feature,
 and monitor its usage.  For performance reasons, I would not like to
 raise it much beyond, say, 300...
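
 One way to look for such terms systematically, rather than by
 guessing names, would be to group the index terms by their
 "surname, initial" prefix and list the prefixes that exceed the
 limit.  A rough sketch, assuming the terms have been dumped from the
 index into a Python list (the function name is hypothetical):

```python
from collections import Counter


def heavy_prefixes(terms, limit=200):
    """Hypothetical sketch: return {prefix: count} for every
    'surname, initial' prefix matched by more than `limit` terms."""
    counts = Counter()
    for term in terms:
        surname, sep, given = term.partition(",")
        given = given.strip()
        if sep and given:
            # Key on the surname plus the first letter of the given name.
            counts[surname.strip() + ", " + given[0]] += 1
    return {prefix: n for prefix, n in counts.items() if n > limit}
```

 Any prefix it reports is a query that would be rejected at the
 current limit even though the user typed no wildcard.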

-- 
Ticket URL: <http://invenio-software.org/ticket/502#comment:3>
Invenio <http://invenio-software.org>
