#799: RefExtract: introduce author extraction mode
-------------------------+----------------------
Reporter: simko | Owner: chayward
Type: enhancement | Status: new
Priority: major | Milestone:
Component: RefExtract | Version:
Keywords: |
-------------------------+----------------------
RefExtract should be enhanced with an author extraction mode that behaves
like giva. That is, given an input PDF file, one should be able to run:
{{{
$ refextract --extract-authors -f 1:file.pdf
}}}
and RefExtract should study the beginning portion of the file, looking for
authors and affiliations, and it should output something like:
{{{
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Doe, J</subfield>
<subfield code="u">U. Foo</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Bloggs, J</subfield>
<subfield code="u">U. Bar</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Mustermann, E</subfield>
<subfield code="u">U. Xyzzy</subfield>
<subfield code="u">U. Zyxxy</subfield>
</datafield>
}}}
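As an illustration only, the output format above could be produced from a list of detected authors roughly as follows. This is a minimal sketch, not RefExtract code: the function name `authors_to_marcxml` and the `(name, affiliations)` input shape are assumptions made for the example. It also assumes that the first author goes to tag 100 and all later ones to tag 700, as in the sample output.

```python
from xml.sax.saxutils import escape


def authors_to_marcxml(authors):
    """Render (name, [affiliations]) pairs as MARCXML datafields.

    Hypothetical helper: the first author is emitted as tag 100,
    every following author as tag 700, one $u per affiliation.
    """
    lines = []
    for index, (name, affiliations) in enumerate(authors):
        tag = "100" if index == 0 else "700"
        lines.append('<datafield tag="%s" ind1=" " ind2=" ">' % tag)
        # escape() protects &, < and > in names and affiliations
        lines.append('<subfield code="a">%s</subfield>' % escape(name))
        for affiliation in affiliations:
            lines.append('<subfield code="u">%s</subfield>' % escape(affiliation))
        lines.append('</datafield>')
    return "\n".join(lines)


if __name__ == "__main__":
    print(authors_to_marcxml([
        ("Doe, J", ["U. Foo"]),
        ("Bloggs, J", ["U. Bar"]),
        ("Mustermann, E", ["U. Xyzzy", "U. Zyxxy"]),
    ]))
```

The hard part of the ticket, finding the authors and affiliations in the PDF text, is deliberately left out of this sketch.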
IOW, refextract would provide two modes: the traditional
`--extract-references` mode, which would remain the default, and a new
`--extract-authors` mode, whose addition is the task of this ticket.
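A possible way to model the two modes on the command line is sketched below with Python's standard `argparse`. It is not RefExtract's actual option handling: the program name `refextract-sketch`, the helper names, and the reading of `1:file.pdf` as a record identifier followed by a file path are assumptions made for the example.

```python
import argparse


def build_parser():
    """Hypothetical parser: two mutually exclusive modes, references by default."""
    parser = argparse.ArgumentParser(prog="refextract-sketch")
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--extract-references", dest="mode",
                      action="store_const", const="references",
                      help="extract references (default)")
    mode.add_argument("--extract-authors", dest="mode",
                      action="store_const", const="authors",
                      help="extract authors and affiliations")
    parser.set_defaults(mode="references")
    parser.add_argument("-f", dest="fulltext", action="append", default=[],
                        metavar="RECID:FILE", help="input file, may be repeated")
    return parser


def split_fulltext_spec(spec):
    """Split 'recid:path' at the first colon (assumed meaning of the argument)."""
    recid, _, path = spec.partition(":")
    return recid, path


if __name__ == "__main__":
    args = build_parser().parse_args(["--extract-authors", "-f", "1:file.pdf"])
    print(args.mode, [split_fulltext_spec(s) for s in args.fulltext])
```

Putting both flags in one mutually exclusive group keeps the existing invocation working unchanged while rejecting a call that asks for both modes at once.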
(Note that this may later raise the question of marking detected fields
with provenance $2 and $9 information, so that author extraction can be
automatised on the back end and refextract-found fields won't overwrite
human-edited fields.)
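One conceivable policy for the provenance idea is sketched below, purely as an assumption: the ticket does not define the marker value, the data layout, or the merge rule. Here fields are plain dicts, machine-found fields carry a hypothetical `$9` value `refextract`, and existing author fields are replaced only when none of them was edited by a human.

```python
MARKER = "refextract"  # hypothetical $9 provenance value, not defined by the ticket


def tag_with_provenance(fields, marker=MARKER):
    """Return copies of the fields with a $9 provenance subfield appended."""
    return [dict(field, subfields=field["subfields"] + [("9", marker)])
            for field in fields]


def is_machine_generated(field, marker=MARKER):
    """A field counts as machine-generated if it carries the $9 marker."""
    return ("9", marker) in field["subfields"]


def merge_author_fields(existing, extracted, marker=MARKER):
    """Keep existing fields if any is human-edited, else use the new extraction."""
    if any(not is_machine_generated(field, marker) for field in existing):
        return list(existing)
    return tag_with_provenance(extracted, marker)


if __name__ == "__main__":
    human = [{"tag": "100", "subfields": [("a", "Doe, John")]}]
    found = [{"tag": "100", "subfields": [("a", "Doe, J"), ("u", "U. Foo")]}]
    print(merge_author_fields(human, found))  # human-edited field is kept
    print(merge_author_fields([], found))     # extracted field is tagged and used
```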
--
Ticket URL: <http://invenio-software.org/ticket/799>
Invenio <http://invenio-software.org>