[Invenio] #542: WebSearch - all-containing perform_request_search() makes it very hard to re-use search engine

Invenio Trac Sat, 5 Mar 2011 21:15:29 +0100

#542: WebSearch - all-containing perform_request_search() makes it very hard to
re-use search engine
-----------------------+-----------------
 Reporter:  rchyla     |      Owner:
     Type:  defect     |     Status:  new
 Priority:  major      |  Milestone:
Component:  WebSearch  |    Version:
 Keywords:             |
-----------------------+-----------------
 The perform_request_search() function consists of almost 800 lines of
 code. This code contains the search logic for retrieving the results and
 combining the result sets from a different query clauses into one hit-set
 (eg. boolean AND, OR, NOT), but it also contains formatting/presentation
 instructions.


 The function also contains the output formatting and many calls into
 req.write(), which means the output of the search results is defined
 inside this one function. For example, the external collections search box
 will be always printed after the normal search box, the normal search box
 will be always printed first etc - it is not easy to move the search box
 somewhere or duplicate search box, to have it also at the end of the page
 (by easy change, I mean something like a template or at least a functional
 call that would not require copying the existing code into several places
 inside the function)

 The req.write() inside the function starts to print results before
 computation is finished, this is done to give the user the impression that
 results are coming soon so that she remains seated. However, it also means
 that the function is useful as API call only if we want to get the
 standard Invenio search page (or recids only).

 The req.write() is also a bit of cheating. For example, when doing the
 long-running searches: the page says that the search finished in "0.02 s."
 but in fact, the function is still sorting and ranking the results. It
 starts printing results only after 10s. So the results arrive after 10.02
 s. but page was already sent to browser and says 'finished in 0.02s' and
 that is actually (cough, cough...:)) not true.

 Because perform_request_search() is so long and contains so much logic in
 one space, it is also duplicating certain calls - example is printing of
 the page ending block, this is needed in several places:

 {{{
 print_records_prologue(req, of)
 print_records_epilogue(req, of)
 }}}

 As for the logic being concentrated in one function, it is very hard to
 read it (I am sure Python interpreter has no problem parsing it, but I
 did, wanted even to print it, but gave up as that would take 27 pages of
 paper!)

 Because this function is API function, we cannot change it (easily without
 breaking backward compatibility). But I would propose to split it
 internally into functional blocks

 It could look like this:

 {{{

 # the standard API with the existing signature
 def perform_request_search(....):
    _prepare_search(....)

     output = _check_readiness(....)
     if output is not None:
         return output

     _search(....)


 # the execution logic
 def _search():

     ## 0 - start output
     if recid >= 0: # recid can be 0 if deduced from sysno and if such
 sysno does not exist
         output = _print_record()
         if output is not None:
             return output


     elif action == "browse":
         ## 2 - browse needed
         of = 'hb'
         output = _browse_records()
         if output is not None:
             return output

     elif rm and p.startswith("recid:"):
         ## 3-ter - similarity search (or old-style citation search) needed
         _print_similar_records_by_recid()

     elif p.startswith("cocitedwith:"):  #WAS EXPERIMENTAL
         ## 3-terter - cited by search needed
         output = _print_cocitedwith()
         if output is not None:
             return output

     else:
         ## 3 - common search needed
         output = _print_common_search()
         if output is not None:
             return output

     # External searches
     if of.startswith("h"):
         if not of in ['hcs']:
             perform_external_collection_search(req, cc, [p, p1, p2, p3],
 f, ec, verbose, ln, selected_external_collections_infos)

     return page_end(req, of, ln)

 }}}


 The proposed change maintains all the functionality and has no adverse
 effects - it only partitiones the one very big function into a smaller
 elements, and makes it possible to move reorganize certain elements or
 call them separately if necessary (for example, for speed performance
 reasons). It also makes the code more readable and re-usable - avoiding
 duplication for certain calls. Another example seems to be the
 'perform_external_collection_search' -- unless I am wrong, the call is
 there twice, but just one call after the if..else block would have the
 same effect -- that is not readily apparent inside the 800 lines long
 block, but becomes clear after partitioning.

 The change would not solve the problem that the logic and presentation are
 tightly coupled, but it would make it a less severe problem - because the
 functional blocks are usually printing only warning messages and not parts
 of the html pages, therefore they could be used also by other components.

 I propose to create a patch for this functionality.

-- 
Ticket URL: <http://invenio-software.org/ticket/542>
Invenio <http://invenio-software.org>

[Invenio] #542: WebSearch - all-containing perform_request_search() makes it very hard to re-use search engine

Reply via email to