#786: refextract: introduce daemon operation mode
-------------------------+----------------------
Reporter: simko | Owner: chayward
Type: enhancement | Status: new
Priority: major | Milestone:
Component: RefExtract | Version:
Keywords: |
-------------------------+----------------------
Currently refextract runs on a file-by-file basis. It would be useful to
introduce a refextract daemon that can optionally run in the background,
looking for new and/or modified documents in given collections, and
automatically extract and eventually upload the extracted references.
The daemon could be modelled just like the indexer, ranker or keyword
classifier. That is, refextract would have two operating modes,
standalone and daemon mode.
The daemon configuration can live in a table, just like idxINDEX, rnkMETHOD
or clsMETHOD. The new table can be called xtrJOB and would contain the
names of the daemon jobs to be run. The options for every job could either
live in the DB or, better yet, in INI-style files such as BibRank's
`wrd.cfg` that would provide all the necessary options; for example:
{{{
/opt/invenio/etc/refextract/daemon/giva.cfg
-------------------------------------------
[refextract]
extraction_mode = extract_authors
upload_mode = holding_pen
collections = Preprints,Theses
}}}
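As a rough illustration of how a daemon might read such a job file, here is
a minimal sketch using `configparser` from the Python 3 standard library. The
function name `parse_job_config` and the returned dictionary layout are
assumptions made for this example only; they are not existing refextract
code.

```python
import configparser

# Sample content mirroring the giva.cfg example above.
SAMPLE_CFG = """\
[refextract]
extraction_mode = extract_authors
upload_mode = holding_pen
collections = Preprints,Theses
"""


def parse_job_config(text):
    """Parse the [refextract] section of a job file (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["refextract"]
    # Collections are given as a comma-separated list; split and trim them.
    raw_collections = section.get("collections", "")
    collections = [c.strip() for c in raw_collections.split(",") if c.strip()]
    return {
        "extraction_mode": section.get("extraction_mode"),
        "upload_mode": section.get("upload_mode"),
        "collections": collections,
    }


job = parse_job_config(SAMPLE_CFG)
print(job["collections"])
```

Keeping the options in a file rather than in the DB means a job can be
changed by editing one text file, which is the same approach `wrd.cfg` takes
for BibRank.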
The job name in this example is `giva`, so the xtrJOB table would reference
this job name, and would look like:
{{{
xtrJOB
--------
id
name
description
last_updated
}}}
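To make the link between the table and the job files concrete, the sketch
below creates an xtrJOB table and maps each job name to its configuration
file. It is only an illustration: it uses an in-memory SQLite database from
the Python standard library rather than the Invenio DB layer, the column
types are guesses, and the helper names and the `<name>.cfg` convention are
assumptions derived from the `giva` example.

```python
import posixpath
import sqlite3

# Directory taken from the giva.cfg example; the "<name>.cfg" naming
# convention is an assumption of this sketch.
CONFIG_DIR = "/opt/invenio/etc/refextract/daemon"


def create_job_table(conn):
    """Create xtrJOB with the four proposed columns (types are assumed)."""
    conn.execute(
        """CREATE TABLE xtrJOB (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL UNIQUE,
               description TEXT,
               last_updated TEXT
           )"""
    )


def job_config_paths(conn):
    """Return a dict mapping each job name to its assumed config file path."""
    rows = conn.execute("SELECT name FROM xtrJOB ORDER BY id").fetchall()
    return {name: posixpath.join(CONFIG_DIR, name + ".cfg") for (name,) in rows}


conn = sqlite3.connect(":memory:")
create_job_table(conn)
conn.execute(
    "INSERT INTO xtrJOB (name, description, last_updated) VALUES (?, ?, ?)",
    ("giva", "Example job", None),  # last_updated is empty until the first run
)
paths = job_config_paths(conn)
print(paths["giva"])
```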
P.S. Note that to avoid cyclic processing and for better performance, we
may need to differentiate between record modification times from the
metadata update point of view, document modification times, document
textification times, etc. This will be part of another ticket.
P.S. Note also that care has to be taken not to overwrite human-input
references. This can be achieved with provenance tags ($2, $9) and will
also be part of another ticket.
--
Ticket URL: <http://invenio-software.org/ticket/786>
Invenio <http://invenio-software.org>