#303: plotextractor: support LaTeX context extraction
-------------------------+--------------------------------------------------
Reporter: jlavik | Owner: jlavik
Type: enhancement | Status: new
Priority: major | Milestone: v1.0
Component: MiscUtil | Version:
Keywords: |
-------------------------+--------------------------------------------------
The plotextractor currently extracts captions from plots and figures. It
would also be useful to extract the surrounding context wherever an image
is referenced in the fulltext LaTeX.
The amount of text to extract can be limited to the sentence in which the
reference was found, plus one sentence on either side. New paragraphs and
complex LaTeX structures (\begin/\end environments, figure tags, etc.)
should be excluded, but simple tags such as \cite and \ref should be kept.
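
A minimal sketch of the extraction rule described above, in Python 3. It is
not existing plotextractor code: the function names, the `window` parameter,
and the naive sentence splitter are assumptions made for illustration. It
drops every \begin/\end environment, never crosses a paragraph break, and
leaves simple tags such as \cite and \ref in place.

```python
import re

# Assumption: any \begin{...}...\end{...} block counts as a "complex" structure.
# Nested environments with the same name are not handled by this simple pattern.
ENV_RE = re.compile(r"\\begin\{(\w+\*?)\}.*?\\end\{\1\}", re.DOTALL)
DOCUMENT_RE = re.compile(r"\\(?:begin|end)\{document\}")
# Naive splitter: a sentence ends at . ! or ? followed by whitespace and an
# uppercase letter, so "Fig. \ref{...}" is not split in the middle.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(paragraph):
    """Split one paragraph into stripped, non-empty sentences."""
    return [s.strip() for s in SENTENCE_END_RE.split(paragraph) if s.strip()]


def extract_contexts(latex, label, window=1):
    """Return one context string per \\ref{label} sentence found in `latex`.

    Each context is the referencing sentence plus up to `window` sentences
    on either side, taken from the same paragraph only.
    """
    # Remove the document wrapper first so the body itself is not discarded,
    # then replace each environment with a paragraph break.
    text = DOCUMENT_RE.sub("\n\n", latex)
    text = ENV_RE.sub("\n\n", text)
    needle = "\\ref{%s}" % label
    contexts = []
    for paragraph in re.split(r"\n\s*\n", text):
        sentences = split_sentences(" ".join(paragraph.split()))
        for i, sentence in enumerate(sentences):
            if needle in sentence:
                start = max(0, i - window)
                contexts.append(" ".join(sentences[start:i + window + 1]))
    return contexts
```

Calling `extract_contexts(source, "fig:xsec")` on the LaTeX source of a
paper would return a list with one entry for each sentence that references
that label. A real implementation would need a more robust sentence splitter
and would have to map labels to image files.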
The context for each image can be saved in a separate file and uploaded
via FFT as a subformat of the image (e.g. {{{fig1.png.context}}}), using
{{{.png;context}}} in {{{$f}}} together with the 'HIDDEN' keyword in
{{{$o}}} to hide it from metadata.
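
The upload step could be sketched as follows, again in Python 3. The helper
names are hypothetical, and using {{{$a}}} for the file path and {{{$n}}} for
the document name is an assumption about the FFT layout. Only the {{{$f}}}
and {{{$o}}} values come from this ticket.

```python
import xml.etree.ElementTree as ET


def write_context_file(image_path, contexts, encoding="utf-8"):
    """Write the contexts next to the image, e.g. fig1.png -> fig1.png.context."""
    context_path = image_path + ".context"
    with open(context_path, "w", encoding=encoding) as handle:
        handle.write("\n\n".join(contexts))
    return context_path


def build_fft_datafield(context_path, docname):
    """Build an FFT datafield (as MARCXML text) for one context file."""
    datafield = ET.Element("datafield", {"tag": "FFT", "ind1": " ", "ind2": " "})
    subfields = (
        ("a", context_path),    # assumption: $a holds the file location
        ("n", docname),         # assumption: $n holds the document name
        ("f", ".png;context"),  # subformat of the image, as proposed above
        ("o", "HIDDEN"),        # hides the file from metadata, as proposed above
    )
    for code, value in subfields:
        subfield = ET.SubElement(datafield, "subfield", {"code": code})
        subfield.text = value
    return ET.tostring(datafield, encoding="unicode")
```

The returned string would still need to be wrapped in a full record before
being passed to the uploader.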
The context can then be indexed in the same way as fulltexts and used when
searching for plots.
(Note: Extracting this context from PDFs is just as relevant, but will be
introduced at a later date.)
--
Ticket URL: <http://invenio-software.org/ticket/303>
Invenio <http://invenio-software.org>