This is an automated notification sent by LCG Savannah. It relates to: task #3253, project CDS Invenio
==============================================================================
 LATEST MODIFICATIONS of task #3253:
==============================================================================

Update of task #3253 (project cdsware):

                  Status: None => Done

    _______________________________________________________

Follow-up Comment #2:

duplicate with trac ticket #138:
<http://cdswaredev.cern.ch/invenio/ticket/138>

==============================================================================
 OVERVIEW of task #3253:
==============================================================================

URL:
  <http://savannah.cern.ch/task/?3253>

                 Summary: introduce timeout mechanism for long queries
                 Project: CDS Invenio
            Submitted by: simko
            Submitted on: 2006-03-30 08:56
         Should Start On: 2006-03-30 00:00
   Should be Finished on: 2006-03-30 00:00
                Category: WebSearch
                Priority: 5 - Normal
                  Status: Done
                 Privacy: Public
        Percent Complete: 0%
             Assigned to: lmarian
             Open/Closed: Open
         Discussion Lock: Any
                  Effort: 0.00

    _______________________________________________________

> okay, just began to wonder when query for CERN* never returned an answer :)

Yup. I wanted to plug a generic timeouter into the whole search engine to make sure that queries finish within 10 seconds or so, but this is not done yet. At the moment, wildcards are simply refused for words with fewer than three letters and accepted for longer words. This does not work well for words like `CERN', which match a very large number of indexed terms. While waiting for that generic timeouter, I should rather check how many indexed terms a wildcard word expands to, and refuse to take the wildcard into account when there are, e.g., more than 20 terms or so...

> Looks like you are sending me 'terror*', and not each word that
> is included in 'terror*'

Currently `cern*' could lead to hundreds of thousands of words, so it's hard to . I'll rewrite the wildcard handling part so that it retains only cases with fewer than 200 words, say, and then I'll pass you the full list.
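A minimal sketch contrasting the current length-based rule with the proposed count-based one, assuming the indexed words are available as a sorted in-memory Python list. The helper names (`accept_by_length`, `expand_prefix`, `expand_wildcard`) and the 200-word default are hypothetical; the 200 figure is taken from the rough "<200 words" mentioned above, not from any actual configuration.

```python
from bisect import bisect_left

# Hypothetical threshold, based on the rough "<200 words" figure above.
MAX_WILDCARD_TERMS = 200
# Current rule: wildcards are refused for words with fewer than three letters.
MIN_WILDCARD_LENGTH = 3


def accept_by_length(pattern):
    """Current rule: accept a trailing wildcard only if the stem has 3+ letters."""
    stem = pattern.rstrip("*")
    return len(stem) >= MIN_WILDCARD_LENGTH


def expand_prefix(sorted_terms, stem, limit):
    """Return at most limit + 1 indexed terms starting with stem.

    In a sorted list all terms sharing a prefix are contiguous, so we
    jump to the first candidate with bisect and walk forward, stopping
    as soon as we know the limit has been exceeded.
    """
    matches = []
    i = bisect_left(sorted_terms, stem)
    while i < len(sorted_terms) and sorted_terms[i].startswith(stem):
        matches.append(sorted_terms[i])
        if len(matches) > limit:
            break
        i += 1
    return matches


def expand_wildcard(sorted_terms, pattern, limit=MAX_WILDCARD_TERMS):
    """Proposed rule: expand stem* only if it matches at most `limit` terms.

    Returns (terms, wildcard_used). When the expansion is too large, the
    wildcard is dropped and only the bare stem is searched.
    """
    if not pattern.endswith("*"):
        return [pattern], False
    stem = pattern.rstrip("*")
    matches = expand_prefix(sorted_terms, stem, limit)
    if len(matches) > limit:
        return [stem], False
    return matches, True
```

With a toy index `sorted(["cern", "cerne", "cernunnos", "xylophone", "xyz"])` and `limit=2`, the length rule refuses `xy*` but accepts `cern*`, whereas the count rule expands `xy*` to `xylophone` and `xyz` and falls back to the bare word `cern` for `cern*`.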
    _______________________________________________________

Follow-up Comments:

-------------------------------------------------------
Date: 2010-06-21 12:19                         By: Ludmila Marian <lmarian>
duplicate with trac ticket #138:
<http://cdswaredev.cern.ch/invenio/ticket/138>

-------------------------------------------------------
Date: 2009-11-04 21:54                         By: Tibor Simko <simko>
Wildcards are currently allowed for words longer than N letters. This is too simplistic, because phys* can have lots of variants, while xy* may have fewer. So the wildcard should be allowed for the term xy, but not for the term cern. We should therefore use COUNT() to see how many matching terms there are, allow the wildcard if there are fewer than a reasonable limit, and remove the wildcard if there are more. Example:

  mysql> select count(*) from idxWORD01F where term like 'cern%';

Note that this limiting technique does not suit every kind of query; e.g. this one would be very slow to check due to a full table scan:

  mysql> select count(*) from idxWORD01F where term like '%cern%';

The same goes for span queries of the kind:

  mysql> select count(*) from idxWORD01F where term between 'a' and 'y';

For these queries, we'd better use an explicit LIMIT clause:

  mysql> select term from idxWORD01F where term between 'a' and 'y' limit 1001;

If the resulting list indeed contains 1001 terms, then we know we have hit the limit, and we should remove the wildcards from the term and warn the user that they were removed because there were too many matching words.

(P.S. A timeout mechanism would also have to kill the query on the MySQL side.)
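The LIMIT-based check can be sketched in Python with the standard `sqlite3` module standing in for MySQL. Assumptions: `idxWORD01F` is modelled as a simplified one-column `(term)` table, and the helper names `make_demo_index`, `fetch_terms`, `expand_like` as well as the warning text are hypothetical. Requesting limit + 1 rows mirrors the `limit 1001` query above: if that many rows come back, the limit was hit.

```python
import sqlite3

# The comment above uses a limit of 1000 words (hence "limit 1001").
TERM_LIMIT = 1000


def make_demo_index(terms):
    """Build an in-memory SQLite stand-in for the MySQL table idxWORD01F.

    Only a single `term` column is modelled; this is a simplification
    for illustration, not the real index schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE idxWORD01F (term TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO idxWORD01F (term) VALUES (?)",
                     [(t,) for t in terms])
    return conn


def fetch_terms(conn, where_sql, params, limit=TERM_LIMIT):
    """Return (terms, hit_limit), fetching at most limit + 1 matching terms.

    `where_sql` must be a fixed clause written by the caller; user input
    is always passed through `params` as bound parameters.
    """
    sql = ("SELECT term FROM idxWORD01F WHERE " + where_sql +
           " ORDER BY term LIMIT ?")
    rows = conn.execute(sql, (*params, limit + 1)).fetchall()
    terms = [row[0] for row in rows]
    return terms, len(terms) > limit


def expand_like(conn, pattern, limit=TERM_LIMIT):
    """Expand a LIKE pattern such as 'cern%' or '%cern%'.

    If the pattern matches more than `limit` terms, drop the wildcards,
    search for the bare word instead and return a warning for the user.
    """
    terms, hit_limit = fetch_terms(conn, "term LIKE ?", (pattern,), limit)
    if hit_limit:
        bare = pattern.replace("%", "")
        warning = (f"Wildcard removed from '{bare}': more than {limit} "
                   "matching words.")
        return [bare], warning
    return terms, None
```

For example, with `make_demo_index(["cern", "cerne", "cernunnos", "xylophone"])` and `limit=2`, `expand_like(conn, "cern%", limit=2)` falls back to `["cern"]` with a warning, while `expand_like(conn, "xy%", limit=2)` returns `(["xylophone"], None)`. Span queries can reuse `fetch_terms` with the clause `"term BETWEEN ? AND ?"`; what to do when a span hits the limit is left open here, as in the comment.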
    _______________________________________________________

Carbon-Copy List:

CC Address                          | Comment
------------------------------------+-----------------------------
5845                                | -COM-
1576                                | -SUB-

==============================================================================

This item URL is:
  <http://savannah.cern.ch/task/?3253>

_______________________________________________
  Message sent via/by LCG Savannah
  http://savannah.cern.ch/