#985: BibFormat: lazy record formatting
------------------------+-----------------------
Reporter:  simko        |      Owner:
    Type:  enhancement  |     Status:  new
Priority:  major        |  Component:  BibFormat
 Version:               |   Keywords:
------------------------+-----------------------
 0) It is expensive to format records on the fly, especially on pages such
 as search results that need to display many records at once.  So the
 `bibreformat` daemon runs periodically in the background, updating the
 HTML brief and other formats and storing them in the `bibfmt` format
 cache, from which the search results page simply selects the needed
 format to display to the user.  This is good for display speed in our
 "many-SELECTs-few-UPDATEs" database conditions.

 (Historically speaking, `BibFormat` had PHP support for old formats
 written in PHP, so it was especially expensive to call the formatting
 engine from the search pages that were written in Python.  Now that
 formats are done fully in Python, they are faster to produce.)

 For production, it is good to show a record in its new state as soon as
 possible after ingestion, which is why we have introduced and implemented
 ticket:870.

 Both of these goals can be achieved with lazy record formatting, which
 would let us stop running the `bibreformat` daemon task and still have
 always up-to-date formats.

 1) It will be useful to introduce a new option to `format_record()`,
 named something like `store_in_cache=True`, that will process the given
 record ID and optionally store the resulting format in the `bibfmt`
 cache.  When the cache is up-to-date, and `on_the_fly` is not `False`, it
 would read the cache to return the given format.
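
 The lazy behaviour described in 1) could be sketched roughly as follows.
 This is only an illustration under stated assumptions, not the actual
 `format_record()` code: the in-memory `RECORDS` and `FORMAT_CACHE`
 dictionaries, the `render()` placeholder, the `now` argument and the
 function name `format_record_lazily()` are all hypothetical; only the
 `store_in_cache` option comes from this ticket.

```python
from datetime import datetime

# Hypothetical in-memory stand-ins for the record store and the format cache.
RECORDS = {1: {"title": "Ode", "modified": datetime(2012, 1, 2)}}
FORMAT_CACHE = {}   # (recid, output_format) -> (formatted_at, text)
RENDER_CALLS = []   # bookkeeping only, to show when the engine really runs


def render(recid, of):
    """Placeholder for the expensive formatting engine."""
    RENDER_CALLS.append((recid, of))
    return "<b>%s</b>" % RECORDS[recid]["title"]


def format_record_lazily(recid, of, store_in_cache=True, now=None):
    """Return the cached format if it is still fresh; otherwise format the
    record on demand and optionally store the result in the cache."""
    cached = FORMAT_CACHE.get((recid, of))
    if cached is not None and cached[0] >= RECORDS[recid]["modified"]:
        # The cache entry is at least as recent as the record itself.
        return cached[1]
    text = render(recid, of)
    if store_in_cache:
        FORMAT_CACHE[(recid, of)] = (now or datetime.now(), text)
    return text
```

 With something like this, the first display of a record pays the
 formatting cost, later displays are served from the cache, and a record
 modified after its cache entry was written is reformatted on the next
 request.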

 2) The `bibfmt` table is growing big.  It will be useful to decouple
 "important" formats such as MARCXML, stored there by the upload, from
 "less important" formats such as HB that are only cached there and that
 can be recreated at any time from the "important" formats.

 So we can introduce a new table like `fmtCACHE` that would store all
 pre-cached formats that used to be generated by `bibreformat`, such as
 HB, HD etc.  The structure of `fmtCACHE` could look like that of `bibfmt`.
 In the future it could even store parts of the formats in case some
 elements are dynamic and depend upon logged-in user credentials.

 (Important metadata-related internal formats that were not generated by
 `bibreformat`, such as `recstruct`, would be left in `bibfmt`.)
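
 To make the idea of a separate cache table concrete, here is a small
 self-contained sketch using an in-memory SQLite database.  The column
 names (`id_bibrec`, `format`, `last_updated`, `value`) are an assumption
 modelled on what `bibfmt` is understood to look like, and the helpers
 `cache_put()` and `cache_get()` are hypothetical; the real table would
 live in the Invenio database and might well differ.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fmtCACHE (
        id_bibrec    INTEGER NOT NULL,
        format       TEXT    NOT NULL,
        last_updated TEXT    NOT NULL,
        value        BLOB,
        PRIMARY KEY (id_bibrec, format)
    )""")


def cache_put(recid, of, value, when):
    """Insert or overwrite the cached output of one record in one format."""
    conn.execute(
        "INSERT OR REPLACE INTO fmtCACHE VALUES (?, ?, ?, ?)",
        (recid, of, when, value))


def cache_get(recid, of):
    """Return (value, last_updated) for a cached format, or None."""
    return conn.execute(
        "SELECT value, last_updated FROM fmtCACHE"
        " WHERE id_bibrec = ? AND format = ?",
        (recid, of)).fetchone()
```

 Because everything in such a table can be regenerated, it could be
 truncated or dropped at any time without losing metadata.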

 3) Once we have the lazy record formatting facility in place, the
 `bibreformat` daemon would no longer have to be run as a bibsched task.
 We could remove this capability and introduce new CLI options for
 operating on the `fmtCACHE` table.  E.g. delete pre-cached formats for
 collection //Poetry// when the `Poem.bft` template changes, see also
 ticket:972.  E.g. delete all HB formats because we change the URL of the
 installation, see [wiki:HowTo/HowToChangeSiteUrl].  Etc.
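
 The kind of CLI meant in 3) might look something like the sketch below.
 All option names (`--delete`, `--format`, `--recids`), the program name
 and the dictionary-based cache are hypothetical and exist only to
 illustrate the two use cases above; resolving a collection such as
 //Poetry// to a list of record IDs is assumed to happen elsewhere.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical cache maintenance command."""
    parser = argparse.ArgumentParser(prog="fmtcache-sketch")
    parser.add_argument("--delete", action="store_true",
                        help="delete matching pre-cached formats")
    parser.add_argument("--format", dest="of",
                        help="restrict to one output format, e.g. HB")
    parser.add_argument("--recids", type=int, nargs="*", default=None,
                        help="restrict to these record IDs")
    return parser


def purge(cache, of=None, recids=None):
    """Delete matching entries from a cache keyed by (recid, format) and
    return how many were removed.  With no filter, everything matches."""
    doomed = [key for key in cache
              if (of is None or key[1] == of.lower())
              and (recids is None or key[0] in recids)]
    for key in doomed:
        del cache[key]
    return len(doomed)


def main(argv, cache):
    """Parse argv and apply the requested operation to the cache."""
    args = build_parser().parse_args(argv)
    if args.delete:
        return purge(cache, args.of, args.recids)
    return 0
```

 For instance, `main(["--delete", "--format", "HB"], cache)` would cover
 the site URL change, and `--recids` with the records of one collection
 would cover the template change; deleted entries would then simply be
 regenerated lazily on the next display.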

-- 
Ticket URL: <http://invenio-software.org/ticket/985>
Invenio <http://invenio-software.org>
