#799: RefExtract: introduce author extraction mode
-------------------------+----------------------
 Reporter:  simko        |      Owner:  chayward
     Type:  enhancement  |     Status:  new
 Priority:  major        |  Milestone:
Component:  RefExtract   |    Version:
 Keywords:               |
-------------------------+----------------------
 RefExtract should be enhanced with an author extraction mode, behaving like
 giva.  That is, given an input PDF file, one should be able to run:

 {{{
 $ refextract --extract-authors -f 1:file.pdf
 }}}

 and RefExtract should examine the beginning portion of the file, looking
 for authors and affiliations, and output something like:

 {{{
     <datafield tag="100" ind1=" " ind2=" ">
       <subfield code="a">Doe, J</subfield>
       <subfield code="u">U. Foo</subfield>
     </datafield>
     <datafield tag="700" ind1=" " ind2=" ">
       <subfield code="a">Bloggs, J</subfield>
       <subfield code="u">U. Bar</subfield>
     </datafield>
     <datafield tag="700" ind1=" " ind2=" ">
       <subfield code="a">Mustermann, E</subfield>
       <subfield code="u">U. Xyzzy</subfield>
       <subfield code="u">U. Zyxxy</subfield>
     </datafield>
 }}}
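
 As a rough illustration of the output step only, the sample above could be
 produced from a list of detected authors along the following lines.  This is
 a minimal sketch, not RefExtract code: the function name
 `authors_to_marcxml` and the `(name, affiliations)` input shape are
 assumptions made for the example, and the detection of authors in the PDF
 text is not covered.

```python
from xml.sax.saxutils import escape


def authors_to_marcxml(authors):
    """Render (name, affiliations) pairs as MARCXML datafields.

    Hypothetical helper: the first author is written to tag 100 and all
    following authors to tag 700, mirroring the sample output in this ticket.
    """
    lines = []
    for position, (name, affiliations) in enumerate(authors):
        tag = "100" if position == 0 else "700"
        lines.append('<datafield tag="%s" ind1=" " ind2=" ">' % tag)
        # Subfield $a carries the author name.
        lines.append('  <subfield code="a">%s</subfield>' % escape(name))
        # One $u subfield per affiliation; an author may have several.
        for affiliation in affiliations:
            lines.append('  <subfield code="u">%s</subfield>' % escape(affiliation))
        lines.append('</datafield>')
    return "\n".join(lines)


if __name__ == "__main__":
    print(authors_to_marcxml([
        ("Doe, J", ["U. Foo"]),
        ("Bloggs, J", ["U. Bar"]),
        ("Mustermann, E", ["U. Xyzzy", "U. Zyxxy"]),
    ]))
```

 Escaping the values matters because names and affiliations extracted from
 a PDF may contain characters such as `&` that are not valid in raw XML.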

 In other words, refextract would provide two modes: the traditional
 `--extract-references` mode, which would remain the default, and a new
 `--extract-authors` mode, the addition of which is the task of this ticket.
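
 One possible way to wire up the two modes is sketched below with Python's
 standard `argparse` module.  This is only an illustration under stated
 assumptions: it does not reflect how the refextract command line is
 actually parsed, and the names `build_parser`, `mode` and `fulltext` are
 invented for the example.  The `-f` value is kept as the raw string given
 on the command line, without interpreting it.

```python
import argparse


def build_parser():
    """Build a hypothetical refextract-like parser with two exclusive modes."""
    parser = argparse.ArgumentParser(prog="refextract")
    # The two modes exclude each other; passing both is reported as an error.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--extract-references", dest="mode",
                      action="store_const", const="references",
                      help="extract references (default)")
    mode.add_argument("--extract-authors", dest="mode",
                      action="store_const", const="authors",
                      help="extract authors and affiliations")
    # Input files as given in the ticket, e.g. "-f 1:file.pdf".
    parser.add_argument("-f", dest="fulltext", action="append", default=[],
                        help="input file specification, may be repeated")
    # Reference extraction stays the default when no mode flag is given.
    parser.set_defaults(mode="references")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.mode, args.fulltext)
```

 Keeping `--extract-references` as the default means existing invocations
 would continue to behave as before.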

 (Note that this may later touch on the question of marking detected fields
 with provenance $2 and $9 information, so that author extraction on the
 back end may be automatised and refextract-found fields do not overwrite
 human-edited fields.)

-- 
Ticket URL: <http://invenio-software.org/ticket/799>
Invenio <http://invenio-software.org>
