#985: BibFormat: lazy record formatting
------------------------+-----------------------
Reporter: simko | Owner:
Type: enhancement | Status: new
Priority: major | Component: BibFormat
Version: | Keywords:
------------------------+-----------------------
0) It is expensive to format records on the fly, especially on pages such
as search results that need to display many records at once. This is why
the `bibreformat` daemon runs periodically in the background, regenerates
the HTML brief and other formats, and stores them in the `bibfmt` format
cache, from which the search results page simply selects the needed format
to display to the user. This is good for display speed in our
"many-SELECTs-few-UPDATEs" database conditions.
(Historically speaking, `BibFormat` had PHP support for old formats
written in PHP, so it was especially expensive to call the formatting
engine from the search pages that were written in Python. Now that
formats are done fully in Python, they are faster to produce.)
In production, it is good to show a record in its new state as soon as
possible after ingestion, which is why we have introduced and implemented
ticket:870.
Both these goals can be achieved with lazy record formatting, which would
let us stop running the `bibreformat` daemon task and still always have
up-to-date formats.
1) It will be useful to introduce a new option to `format_record()`, named
something like `store_in_cache=True`, that will process the given record ID
and optionally store the resulting format in the `bibfmt` cache. When the
cache is up-to-date, and `on_the_fly` is not `False`, it would read the
cache to return the given format.
2) The `bibfmt` table is growing big. It will be useful to decouple
"important" formats such as MARCXML, stored there by the upload, from
"less important" formats such as HB that are only cached there and can be
recreated at any time from the "important" formats.
So we can introduce a new table like `fmtCACHE` that would store all
pre-cached formats that used to be generated by `bibformat`, such as HB,
HD, etc. The structure of `fmtCACHE` could look like that of `bibfmt`.
In the future it could even store parts of the formats, in case some
elements are dynamic and depend on the logged-in user's credentials.
(Important metadata-related internal formats that were not generated by
`bibreformat`, such as `recstruct`, would be left in `bibfmt`.)
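The split proposed in 2) could be sketched as follows, using an in-memory
SQLite database as a stand-in for the real one. The column names
(`id_bibrec`, `format`, `last_updated`, `value`) and the set of formats
treated as cache-only are assumptions for illustration, not the actual
schema.

```python
import sqlite3

# Assumed common layout for both tables (illustrative column names).
SCHEMA = """
CREATE TABLE %s (
    id_bibrec INTEGER NOT NULL,
    format TEXT NOT NULL,
    last_updated TEXT NOT NULL,
    value BLOB,
    PRIMARY KEY (id_bibrec, format)
)
"""

# Formats that can always be recreated from the "important" ones.
CACHED_FORMATS = ("HB", "HD")


def create_tables(conn):
    """Create `bibfmt` and `fmtCACHE` with the same structure."""
    for name in ("bibfmt", "fmtCACHE"):
        conn.execute(SCHEMA % name)


def move_cached_formats(conn, formats=CACHED_FORMATS):
    """Move cache-only rows from `bibfmt` into `fmtCACHE`."""
    marks = ",".join("?" for _ in formats)
    conn.execute(
        "INSERT INTO fmtCACHE SELECT * FROM bibfmt "
        "WHERE format IN (%s)" % marks, formats)
    conn.execute(
        "DELETE FROM bibfmt WHERE format IN (%s)" % marks, formats)
    conn.commit()
```

After such a move, `bibfmt` would hold only formats that cannot be
regenerated, so emptying `fmtCACHE` would never lose data.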
3) Once we have the lazy record formatting facility in place, the
`bibreformat` daemon would no longer have to be run as a bibsched task. We
could remove this capability and introduce new CLI options for operating
on the `fmtCACHE` table. For example, delete pre-cached formats for
collection //Poetry// when the `Poem.bft` template changes (see also
ticket:972), or delete all HB formats because we change the URL of the
installation (see [wiki:HowTo/HowToChangeSiteUrl]).
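One possible shape for the CLI options mentioned in 3) is sketched below.
The program name `fmtcache`, the option names `--delete`, `--format` and
`--collection`, and the dict-based cache and collection lookup are all
hypothetical. A real implementation would operate on the `fmtCACHE` table
and resolve collection membership through the search engine.

```python
import argparse


def build_parser():
    """Build a parser with hypothetical cache-maintenance options."""
    parser = argparse.ArgumentParser(prog="fmtcache")
    parser.add_argument("--delete", action="store_true",
                        help="delete matching pre-cached formats")
    parser.add_argument("--format", dest="of", default=None,
                        help="restrict to one output format, e.g. HB")
    parser.add_argument("--collection", default=None,
                        help="restrict to records of one collection")
    return parser


def delete_cached(cache, collections, of=None, collection=None):
    """Delete matching entries from `cache`; return how many were removed.

    cache: dict mapping (recid, format) -> formatted value
    collections: dict mapping collection name -> set of recids
    """
    recids = collections.get(collection, set()) if collection else None
    doomed = [key for key in cache
              if (of is None or key[1] == of)
              and (recids is None or key[0] in recids)]
    for key in doomed:
        del cache[key]
    return len(doomed)


def main(argv, cache, collections):
    """Parse `argv` and apply the requested operation to `cache`."""
    args = build_parser().parse_args(argv)
    if args.delete:
        return delete_cached(cache, collections, args.of, args.collection)
    return 0
```

Deleted entries would simply be regenerated lazily the next time the
corresponding records are displayed.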
--
Ticket URL: <http://invenio-software.org/ticket/985>
Invenio <http://invenio-software.org>