#786: refextract: introduce daemon operation mode
-------------------------+----------------------
 Reporter:  simko        |      Owner:  chayward
     Type:  enhancement  |     Status:  new
 Priority:  major        |  Milestone:
Component:  RefExtract   |    Version:
 Keywords:               |
-------------------------+----------------------
 Currently refextract runs on a file-by-file basis.  It would be useful to
 introduce a refextract daemon that optionally runs in the background,
 looks for new and/or modified documents in given collections, and
 automatically extracts and, optionally, uploads the extracted references.

 The daemon could be modelled just like the indexer, ranker or keyword
 classifier.  That is, refextract would have two operating modes,
 standalone and daemon mode.

 The daemon configuration can live in tables just like idxINDEX, rnkMETHOD
 or clsMETHOD.  The new table can be called xtrJOB and would contain the
 names of the daemon jobs to be run.  The options for every job could either
 live in the DB, or better yet, could live in INI-style files such as
 BibRank's `wrd.cfg` that will provide all the necessary options; for
 example:

 {{{
 /opt/invenio/etc/refextract/daemon/giva.cfg
 -------------------------------------------
 [refextract]
 extraction_mode = extract_authors
 upload_mode = holding_pen
 collections = Preprints,Theses
 }}}
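
 As a minimal sketch of how the daemon might read such a job file, the
 snippet below parses the example with Python's standard `configparser`
 module.  The option names are taken from the example above; the helper
 name `load_job_options` and the choice to return a plain dict are
 assumptions made for illustration only.

```python
import configparser

# Same content as the giva.cfg example; embedded here so the sketch is
# self-contained instead of reading from /opt/invenio/etc/...
SAMPLE_CFG = """\
[refextract]
extraction_mode = extract_authors
upload_mode = holding_pen
collections = Preprints,Theses
"""


def load_job_options(text):
    """Parse an INI-style job definition and return its options as a dict.

    Hypothetical helper: not part of refextract itself.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["refextract"]
    return {
        "extraction_mode": section["extraction_mode"],
        "upload_mode": section["upload_mode"],
        # Split the comma-separated list and drop surrounding whitespace.
        "collections": [
            name.strip()
            for name in section["collections"].split(",")
            if name.strip()
        ],
    }


options = load_job_options(SAMPLE_CFG)
print(options)
```

 Keeping the options in a file rather than in the DB means a job can be
 edited and versioned like any other configuration file.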

 The job name in this example is `giva`, so the xtrJOB table would
 reference this job name, and would look like:

 {{{
   xtrJOB
   --------
   id
   name
   description
   last_updated
 }}}
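
 The following sketch shows one way the table and the per-job config
 files could fit together.  It uses an in-memory SQLite database purely as
 a stand-in; the column types, the `UNIQUE` constraint on `name`, and the
 helpers `config_path_for` and `list_jobs` are assumptions, while the
 column names and the config directory come from the examples above.

```python
import sqlite3

# In-memory SQLite stands in for the real database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE xtrJOB (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE,   -- assumed unique: it names the .cfg file
        description TEXT,
        last_updated TEXT            -- assumed to hold a timestamp string
    )
    """
)
conn.execute(
    "INSERT INTO xtrJOB (name, description, last_updated) VALUES (?, ?, ?)",
    ("giva", "Hypothetical example job", "2011-01-01 00:00:00"),
)


def config_path_for(job_name, base="/opt/invenio/etc/refextract/daemon"):
    """Map a job name to its INI-style options file (hypothetical helper)."""
    return "%s/%s.cfg" % (base, job_name)


def list_jobs(connection):
    """Return (name, last_updated, config path) for every registered job."""
    rows = connection.execute(
        "SELECT name, last_updated FROM xtrJOB ORDER BY id"
    )
    return [
        (name, last_updated, config_path_for(name))
        for name, last_updated in rows
    ]


jobs = list_jobs(conn)
print(jobs)
```

 In this layout the table only records which jobs exist and when each one
 last ran; everything else is looked up in the job's config file.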

 P.S. Note that to avoid cyclic processing and for better performance, we
 may need to differentiate between record modification times from the
 metadata update point of view, document modification times, document
 textification times, etc.  This will be part of another ticket.

 P.S. Note also that care has to be taken not to overwrite human-input
 references.  This can be achieved by provenance tags ($2, $9) and will
 also be part of another ticket.
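
 To illustrate the provenance idea only, here is a rough sketch of a
 guard that lets the daemon replace references only when all of them were
 machine-generated.  Everything here is an assumption: references are
 modelled as plain dicts of subfield code to value, the `$9` subfield is
 taken to carry the provenance, and the marker value `refextract` is
 hypothetical.  The actual design is left to the follow-up ticket.

```python
# Assumed provenance marker; the ticket does not specify the actual value.
AUTO_PROVENANCE = "refextract"


def is_machine_generated(subfields):
    """True if the reference carries the assumed automatic provenance in $9."""
    return subfields.get("9") == AUTO_PROVENANCE


def safe_to_replace(existing_references):
    """True only if no existing reference could have been entered by a human.

    A record with no references at all is treated as safe to fill in.
    """
    return all(is_machine_generated(ref) for ref in existing_references)


# Hypothetical data: one extracted reference, one typed in by a person.
machine_only = [{"9": "refextract", "m": "raw reference text"}]
mixed = machine_only + [{"m": "reference typed in by a cataloguer"}]

print(safe_to_replace(machine_only))
print(safe_to_replace(mixed))
```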

-- 
Ticket URL: <http://invenio-software.org/ticket/786>
Invenio <http://invenio-software.org>