This is an automated notification sent by LCG Savannah.
It relates to:
                task #3253, project CDS Invenio

==============================================================================
 LATEST MODIFICATIONS of task #3253:
==============================================================================

Update of task #3253 (project cdsware):

             Open/Closed:                    Open => Closed                 


==============================================================================
 OVERVIEW of task #3253:
==============================================================================

URL:
  <http://savannah.cern.ch/task/?3253>

                 Summary: introduce timeout mechanism for long queries
                 Project: CDS Invenio
            Submitted by: simko
            Submitted on: 2006-03-30 08:56
         Should Start On: 2006-03-30 00:00
   Should be Finished on: 2006-03-30 00:00
                Category: WebSearch
                Priority: 5 - Normal
                  Status: Done
                 Privacy: Public
        Percent Complete: 0%
             Assigned to: lmarian
             Open/Closed: Closed
         Discussion Lock: Any
                  Effort: 0.00

    _______________________________________________________


> okay, just began to wonder when query for CERN* never returned an answer :)

Yup.  I wanted to plug-in a generic timeouter to the whole search
engine to make sure that queries finish within 10 seconds or so.  But
this is not done yet.  

At the moment, wildcards are simply refused for words with fewer
than three letters and accepted for longer words.  But this does not
work well for words like `CERN'.  While waiting for that generic
timeouter, I should rather check how many indexed terms a wildcard
word returns, and refuse to take the wildcard into account when
there are e.g. more than 20 terms or so...

> Looks like you are sending me 'terror*', and not each word that
> includes in 'terror*'

Currently `cern*' could lead to hundreds of thousands of words, so
it's hard to pass them all.  I'll rewrite the wildcard handling part
in order to retain cases with <200 words, say, and then I'll pass you
the full list.

    _______________________________________________________

Follow-up Comments:


-------------------------------------------------------
List-Post: project-invenio-devel@cern.ch
Date: 2010-06-21 12:19              By: Ludmila Marian <lmarian>
duplicate with trac ticket #138:
<http://cdswaredev.cern.ch/invenio/ticket/138>

-------------------------------------------------------
List-Post: project-invenio-devel@cern.ch
Date: 2009-11-04 21:54              By: Tibor Simko <simko>
Wildcards are currently allowed for words longer than N letters.  This
is too simplistic, because phys* can have lots of variants, while xy*
may have fewer.  So the wildcard should be allowed for the term xy, but
not for the term cern.

We should therefore use COUNT() to see how many matching terms there
may be, and allow the wildcard if there are fewer than a reasonable
limit, or remove the wildcard if there are more.  Example:

mysql> select count(*) from idxWORD01F where term like 'cern%';
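A minimal sketch of that check, for illustration only.  It uses
Python's built-in sqlite3 module as a stand-in for MySQL; the helper
names, the tiny limit and the sample terms are assumptions, and only
the table name idxWORD01F and the column name term come from the
query above.  The prefix is assumed to contain no LIKE metacharacters.

```python
import sqlite3

# Hypothetical threshold; kept tiny so the demo data can exceed it.
MAX_WILDCARD_TERMS = 3


def count_prefix_matches(conn, prefix):
    """Count the indexed terms that start with the given prefix."""
    row = conn.execute(
        "SELECT COUNT(*) FROM idxWORD01F WHERE term LIKE ?",
        (prefix + "%",),
    ).fetchone()
    return row[0]


def keep_wildcard(conn, prefix, limit=MAX_WILDCARD_TERMS):
    """Return True if the wildcard expands to fewer than `limit` terms."""
    return count_prefix_matches(conn, prefix) < limit


# Demo index with made-up terms (an in-memory stand-in for the real table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE idxWORD01F (term TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO idxWORD01F VALUES (?)",
    [(t,) for t in ["cern", "cernlib", "cernox", "cerny", "xylem"]],
)

print(keep_wildcard(conn, "cern"))  # 4 matching terms, not below 3: False
print(keep_wildcard(conn, "xy"))    # 1 matching term: True
```

A prefix pattern like this can normally be answered from an index on
term, which is why the count is cheap compared with the cases below.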

Note that this limiting technique does not suit every kind of
query; e.g. this one would be very slow to check:

mysql> select count(*) from idxWORD01F where term like '%cern%';

due to a full table scan.  Similarly for span queries of the kind:

mysql> select count(*) from idxWORD01F where term between 'a' and 'y';

For these queries, we'd better use explicit LIMIT statement:

mysql> select term from idxWORD01F where term between 'a' and 'y' limit 1001;

If the resulting list indeed contains 1001 terms, then we know we have
hit the limit, and we should remove the wildcard from the term and
warn the user that it was removed because there were too many words.
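
The LIMIT idea could be sketched roughly as follows, again with
Python's sqlite3 module standing in for MySQL.  The function name, the
warning text and the generated demo terms are assumptions made for
illustration; only the table, the column and the "fetch one more than
the limit" trick come from the description above.

```python
import sqlite3


def expand_span(conn, low, high, max_terms=1000):
    """Expand a span query into its terms, giving up past max_terms.

    Returns (terms, None) when the expansion is small enough, or
    (None, warning) when more than max_terms terms would match.
    """
    rows = conn.execute(
        "SELECT term FROM idxWORD01F WHERE term BETWEEN ? AND ? LIMIT ?",
        (low, high, max_terms + 1),  # one extra row reveals an overflow
    ).fetchall()
    terms = [row[0] for row in rows]
    if len(terms) > max_terms:
        return None, "too many matching words; wildcard removed"
    return terms, None


# Demo index with 1500 made-up terms: t0000 ... t1499.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE idxWORD01F (term TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO idxWORD01F VALUES (?)",
    [("t%04d" % i,) for i in range(1500)],
)

small, warning = expand_span(conn, "t0000", "t0009")
print(len(small), warning)    # 10 None

big, warning = expand_span(conn, "a", "y")
print(big, warning)           # None too many matching words; wildcard removed
```

The database never returns more than max_terms + 1 rows, so the cost
of the check stays bounded however wide the span is.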

(P.S. A timeout would have to kill the query on the MySQL side too.)





    _______________________________________________________

Carbon-Copy List:

CC Address                          | Comment
------------------------------------+-----------------------------
5845                                | -COM-
1576                                | -SUB-




==============================================================================

This item URL is:
  <http://savannah.cern.ch/task/?3253>

_______________________________________________
  Message sent via/by LCG Savannah
  http://savannah.cern.ch/
