#786: refextract: introduce daemon operation mode
--------------------------+----------------------
  Reporter:  simko        |      Owner:  chayward
      Type:  enhancement  |     Status:  closed
  Priority:  major        |  Milestone:
 Component:  RefExtract   |    Version:
Resolution:  fixed        |   Keywords:
--------------------------+----------------------
Changes (by Christopher Hayward <christopher.james.hayward@…>):

 * status:  in_merge => closed
 * resolution:   => fixed


Comment:

 In [85a4d09546e533f494573f8bb5a64e04e57912be]:
 {{{
 #!CommitTicketReference repository="" revision="85a4d09546e533f494573f8bb5a64e04e57912be"
 refextract: introduce daemon operation mode

 * Convert Refextract into a form that allows extraction tasks to be
   submitted to Bibsched via Bibtask, while preserving the independent
   nature of Refextract. (By default, Refextract is now scheduled; when
   given a fulltext input [using -f, --fulltext], it runs in the original
   standalone mode.)

 * Change the method of providing fulltext documents for extraction, so as
   to differentiate between running in standalone mode and running as a
   scheduled task: -f and --fulltext are now used to denote each single
   fulltext document.

 * Add two intermediate files: 'refextract_cli.py' and
   'refextract_daemon.py' which will handle the execution mode of
   Refextract, and the submission of a Refextract task to Bibsched.

 * Provide the ability for Refextract to run on specific collections and
   records, using the flags -c --collection and -i --recid.

 * Provide the ability for Refextract to construct a new scheduled
   extraction job, using a predefined 'job configuration file', by
   specifying the name of the job to run (using -e, --extraction-job).
   Each job corresponds to a matching named job file under /etc/bibedit,
   holding the parameters for the job.

 * Add functionality to interact with a new db table called xtrJOB, which
   holds the id, name, and last_updated information for each job task that
   has been run (specified using -e, or --extraction-job). Use the
   last_updated info to compare against the modification_date of each
   record; only newly updated files are scheduled to have their references
   re-extracted.

 * Include an extraction job file (refextract-job-preprints) to act as an
   example template.

 * Include in refextract_config a list of acceptable job parameters that
   may be specified inside a Refextract job description file.

 * Change the '-s' flag for controlling the appearance of journal standard
   reference form to '-p' so as not to interfere with the sleep cli option
   for Bibsched.

 * Update Makefile.am to reflect the addition of refextract_daemon and
   refextract_cli files, and also the presence of the template extraction
   job file.

 * Set the refextract-specific bibtask_config.py default values for
   recids and collections to empty lists. These are filled with the
   location of fulltext documents when starting Refextract inside Bibsched.

 * Handle all error messages regardless of the mode that Refextract is
   running in. (When Refextract is scheduled, short error messages are
   shown inside the Bibsched interface under the 'progress' column, and
   all messages are sent to the Bibsched log. Stdout or stderr are used
   when running Refextract as standalone. Stdout and stderr are also used
   when no xml file has been specified to hold the extracted references.)

 * Update the oai_harvest_daemon to call Refextract using the new fulltext
   flag (-f, --fulltext).

 * Include the default author kb location, used on ImportError.

 * Display an error message and halt when a user specifies an
   extraction-job to run alongside other cli options, or specifies a path
   to a fulltext document from which to extract alongside daemon-specific
   flags (--collection, --extraction-job).

 * Display the full directory in the error message when an extraction-job
   config file has not been found.

 * Show in --help the three main modes in which Refextract can be run.

 * Inside extraction-job files, accept either an absolute path or a base
   name when referencing report number and journal name knowledge bases.
   In the situation where the absolute path is omitted, the daemon falls
   back to the Invenio 'etc' directory.

 * Add the xtrJOB table description to tabcreate.

 (closes #786)
 }}}
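
For readers following the mode changes described in the commit, here is a minimal sketch of how the flag combinations could map onto an operation mode. It is not the code from refextract_cli.py or refextract_daemon.py: the function names `build_parser` and `select_mode`, the mode labels, and the treatment of --recid as a daemon-specific flag are assumptions made for illustration. Only the flag names come from the commit message.

```python
import argparse


def build_parser():
    """Hypothetical parser exposing only the flags named in the commit."""
    parser = argparse.ArgumentParser(prog="refextract-sketch")
    parser.add_argument("-f", "--fulltext", action="append", default=[])
    parser.add_argument("-c", "--collection", action="append", default=[])
    parser.add_argument("-i", "--recid", action="append", default=[])
    parser.add_argument("-e", "--extraction-job", default=None)
    return parser


def select_mode(argv):
    """Return an assumed mode label, or raise ValueError on a conflict."""
    args = build_parser().parse_args(argv)
    # Assumption: --recid is treated like --collection for conflict checks.
    record_selectors = bool(args.collection or args.recid)

    # An extraction job carries its own parameters, so it may not be mixed
    # with other selection options.
    if args.extraction_job and (args.fulltext or record_selectors):
        raise ValueError("--extraction-job cannot be combined with other options")
    # A fulltext path means standalone mode, which excludes daemon flags.
    if args.fulltext and record_selectors:
        raise ValueError("--fulltext cannot be combined with daemon-specific flags")

    if args.fulltext:
        return "standalone"
    if args.extraction_job:
        return "extraction-job"
    # Default: the task is submitted to the scheduler.
    return "scheduled"
```

Under these assumptions, `select_mode(["-f", "paper.pdf"])` selects standalone mode, while calling it with no arguments falls through to the scheduled default.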

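The last_updated comparison described for the xtrJOB table can be sketched as a simple filter. This is an illustration only: the function name `records_needing_extraction`, the (recid, modification_date) tuple layout, and the rule that a job with no last_updated value processes every record are assumptions, not details taken from the commit.

```python
from datetime import datetime


def records_needing_extraction(records, last_updated):
    """Pick the recids modified after the job's last run.

    records      -- iterable of (recid, modification_date) tuples (assumed layout)
    last_updated -- datetime of the job's previous run, or None if it never ran
    """
    if last_updated is None:
        # Assumption: a job that has never run covers every record.
        return [recid for recid, _ in records]
    return [recid for recid, modified in records if modified > last_updated]


example_records = [
    (1, datetime(2011, 5, 1)),
    (2, datetime(2011, 6, 15)),
    (3, datetime(2011, 7, 2)),
]
selected = records_needing_extraction(example_records, datetime(2011, 6, 1))
```

With the example data above, records 2 and 3 are selected because they were modified after the assumed last run on 2011-06-01.
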
-- 
Ticket URL: </ticket/786#comment:4>
Invenio <http://invenio-software.org>
