#124: Exceptions when combining regular search and ranking in perform request
search
-----------------------+----------------------------------------------------
 Reporter:  tbrooks    |       Owner:     
     Type:  defect     |      Status:  new
 Priority:  major      |   Milestone:     
Component:  WebSearch  |     Version:     
 Keywords:             |  
-----------------------+----------------------------------------------------
 The exceptions are caused by searches of the form

 recid:240309 soliton

 _with_ rm=citation   (or word similarity)

 I.e. a recid search with a bare extra search term

 Joe points out the problem:
 The problem appears to be in search_engine.py:

 {{{
 elif rm and p.startswith("recid:"):
     ## 3-ter - similarity search or citation search needed
     if req and not req.header_only:
         page_start(req, of, cc, aas, ln, uid, _("Search Results"), p=create_page_title_search ...
     if of.startswith("h"):
         req.write(create_search_box(cc, colls_to_display, p, f, rg, sf, so, sp, rm, of, ot, a ...
             p2, f2, m2, op2, p3, f3, m3, sc, pl, d1y, d1m, d1d, d2y, ...
     if record_exists(p[6:]) != 1:
         # record does not exist
         if of.startswith("h"):
             if req.header_only:
                 raise apache.SERVER_RETURN, apache.HTTP_NOT_FOUND
             else:
                 print_warning(req, _("Requested record does not seem to exist."))
         if of == "id":
             return []
         elif of.startswith("x"):
             # Print empty, but valid XML
             print_records_prologue(req, of)
             print_records_epilogue(req, of)

 }}}

 record_exists(p[6:]) receives everything after "recid:", including the
 unexpected non-numeric rest of the string (here "240309 soliton"). The
 code could simply return an error to the user, or it could use a regex to
 parse out the recid more robustly. Or we could ask why the recid isn't
 already set when this body of code is called.
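
 A minimal sketch of the regex option, assuming we only want to accept
 "recid:" followed by digits and, optionally, whitespace and further
 terms. The helper name split_recid_pattern is hypothetical and is not
 part of search_engine.py; the caller would still have to decide what to
 do with the leftover terms.

```python
import re

# Hypothetical helper for illustration only; not an existing Invenio function.
# Accepts "recid:<digits>" optionally followed by whitespace and more terms.
_RECID_RE = re.compile(r"^recid:(\d+)(?:\s+(.*))?$")


def split_recid_pattern(p):
    """Split a search pattern into (recid, remaining_terms).

    Returns (None, p) when the pattern does not start with a clean
    numeric recid, so the caller can report an error instead of
    passing a non-numeric string on to record_exists().
    """
    match = _RECID_RE.match(p.strip())
    if match is None:
        return None, p
    rest = (match.group(2) or "").strip()
    return int(match.group(1)), rest
```

 With this, "recid:240309 soliton" would yield the numeric recid plus the
 bare term "soliton", rather than handing the whole tail to
 record_exists().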


 Tibor points out that the correct solution is to move to a more general
 way of identifying citation or similarity searching:

 "recid:XXXX" -> cites:XXXX
 or
 similarto:XXXX

 Searches of the following form then become possible:

 cites:XXXX soliton

 or, more likely

 cites:XXXX 980:CORE

 year:2007 cites:XXXX

 (i.e. find all records that cite XXXX and were published in 2007...)

 This makes more sense to the user, and extends functionality a bit, in
 addition to fixing the exceptions raised when someone accidentally tries
 the wrong thing.
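
 A rough sketch of how the proposed operators might be separated from the
 rest of the query, assuming the operator names "cites" and "similarto"
 from the proposal above and a numeric record ID. The function name
 extract_ranking_operator is hypothetical; how the remaining terms are
 then combined with the citation or similarity result set is left open.

```python
import re

# Hypothetical sketch; the operator names follow the proposal in this ticket.
# Matches "cites:<digits>" or "similarto:<digits>" as a whole
# whitespace-delimited token anywhere in the query.
_RANK_OP_RE = re.compile(r"(?:^|\s)(cites|similarto):(\d+)(?=\s|$)")


def extract_ranking_operator(p):
    """Return (operator, recid, remaining_query) for a search pattern.

    If no ranking operator is present, operator and recid are None and
    the query is returned with surrounding whitespace removed.
    """
    match = _RANK_OP_RE.search(p)
    if match is None:
        return None, None, p.strip()
    # Drop the operator token and normalise the whitespace that is left.
    rest = p[:match.start()] + " " + p[match.end():]
    return match.group(1), int(match.group(2)), " ".join(rest.split())
```

 The remaining query (e.g. "980:CORE" or "year:2007") could then be run
 as a regular search and intersected with the records citing, or similar
 to, the given record.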

-- 
Ticket URL: <http://cdswaredev.cern.ch/invenio/ticket/124>
Invenio <http://invenio-software.org>
