#542: WebSearch - all-containing perform_request_search() makes it very hard to
re-use search engine
-----------------------+-----------------
Reporter: rchyla | Owner:
Type: defect | Status: new
Priority: major | Milestone:
Component: WebSearch | Version:
Keywords: |
-----------------------+-----------------
The perform_request_search() function consists of almost 800 lines of
code. This code contains the search logic for retrieving the results and
combining the result sets from a different query clauses into one hit-set
(eg. boolean AND, OR, NOT), but it also contains formatting/presentation
instructions.
The function also contains the output formatting and many calls into
req.write(), which means the output of the search results is defined
inside this one function. For example, the external collections search box
will be always printed after the normal search box, the normal search box
will be always printed first etc - it is not easy to move the search box
somewhere or duplicate search box, to have it also at the end of the page
(by easy change, I mean something like a template or at least a functional
call that would not require copying the existing code into several places
inside the function)
The req.write() inside the function starts to print results before
computation is finished, this is done to give the user the impression that
results are coming soon so that she remains seated. However, it also means
that the function is useful as API call only if we want to get the
standard Invenio search page (or recids only).
The req.write() is also a bit of cheating. For example, when doing the
long-running searches: the page says that the search finished in "0.02 s."
but in fact, the function is still sorting and ranking the results. It
starts printing results only after 10s. So the results arrive after 10.02
s. but page was already sent to browser and says 'finished in 0.02s' and
that is actually (cough, cough...:)) not true.
Because perform_request_search() is so long and contains so much logic in
one space, it is also duplicating certain calls - example is printing of
the page ending block, this is needed in several places:
{{{
print_records_prologue(req, of)
print_records_epilogue(req, of)
}}}
As for the logic being concentrated in one function, it is very hard to
read it (I am sure Python interpreter has no problem parsing it, but I
did, wanted even to print it, but gave up as that would take 27 pages of
paper!)
Because this function is API function, we cannot change it (easily without
breaking backward compatibility). But I would propose to split it
internally into functional blocks
It could look like this:
{{{
# the standard API with the existing signature
def perform_request_search(....):
_prepare_search(....)
output = _check_readiness(....)
if output is not None:
return output
_search(....)
# the execution logic
def _search():
## 0 - start output
if recid >= 0: # recid can be 0 if deduced from sysno and if such
sysno does not exist
output = _print_record()
if output is not None:
return output
elif action == "browse":
## 2 - browse needed
of = 'hb'
output = _browse_records()
if output is not None:
return output
elif rm and p.startswith("recid:"):
## 3-ter - similarity search (or old-style citation search) needed
_print_similar_records_by_recid()
elif p.startswith("cocitedwith:"): #WAS EXPERIMENTAL
## 3-terter - cited by search needed
output = _print_cocitedwith()
if output is not None:
return output
else:
## 3 - common search needed
output = _print_common_search()
if output is not None:
return output
# External searches
if of.startswith("h"):
if not of in ['hcs']:
perform_external_collection_search(req, cc, [p, p1, p2, p3],
f, ec, verbose, ln, selected_external_collections_infos)
return page_end(req, of, ln)
}}}
The proposed change maintains all the functionality and has no adverse
effects - it only partitiones the one very big function into a smaller
elements, and makes it possible to move reorganize certain elements or
call them separately if necessary (for example, for speed performance
reasons). It also makes the code more readable and re-usable - avoiding
duplication for certain calls. Another example seems to be the
'perform_external_collection_search' -- unless I am wrong, the call is
there twice, but just one call after the if..else block would have the
same effect -- that is not readily apparent inside the 800 lines long
block, but becomes clear after partitioning.
The change would not solve the problem that the logic and presentation are
tightly coupled, but it would make it a less severe problem - because the
functional blocks are usually printing only warning messages and not parts
of the html pages, therefore they could be used also by other components.
I propose to create a patch for this functionality.
--
Ticket URL: <http://invenio-software.org/ticket/542>
Invenio <http://invenio-software.org>