Re: scientific publishing process (was Re: Cost and access)

Peter F. Patel-Schneider Mon, 06 Oct 2014 16:42:00 -0700

Neat. This could be extended to putting a full table of contents into the metadata, and in lots of other ways. The other nice thing about it is that it would be possible to push the same data through a LaTeX to HTML toolchain for those who want HTML output.


peter


On 10/06/2014 03:18 PM, Norman Gray wrote:


Greetings.

On 2014 Oct 6, at 19:19, Alexander Garcia Castro <[email protected]> wrote:

Querying PDFs is NOT simple and requires a lot of work, and usually
produces lots of errors. Just querying metadata is not enough. As I said
before, I understand the PDF as something that gives me a uniform layout.
That is OK and necessary, but not sufficient within the context
of the web of data and scientific publications. I would like to have the
content readily available for mining purposes. If I pay for the publication,
I should get access to it in every format in which it is available. The
content should be presented in a way that makes sense within the web
of data. If it is the full content of the paper represented in RDF or XML,
fine. Also, I would like to have well-annotated content; this is simple and
something that could quite easily be part of existing publication
workflows. It may also be part of the guidelines for authors, for instance:
identify and annotate rhetorical structures.



The following might add something to this conversation.

It illustrates getting the metadata from a LaTeX file, putting it into an XMP 
packet in a PDF, and getting it out of the PDF as RDF.  Pace Peter's mention of 
/Author, /Title, etc, this just focuses on the XMP packet.

This has the document metadata, the abstract, and an illustrative bit of 
argumentation.  Adding details about the document structure, and (RDF) pointers 
to any figures would be feasible, as would, I suspect, incorporating CSV files 
directly into the PDF.  Incorporating \begin{tabular} tables would be rather 
tricky, but not impossible.  I can't help feeling that the XHTML+RDFa 
equivalent would be longer and need more documentation to instruct the author 
where to put the RDFa magic.

It's not very fancy, and still has rough edges, but it only took me 100 
minutes, from a standing start.

Generating and querying this PDF seems pretty simple to me.
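
As a rough illustration of the "getting it out" step, here is a minimal Python sketch. It is not the toolchain used for the example in this message. It assumes the XMP packet is stored uncompressed in the PDF, so it can be found by scanning for the `<?xpacket ...?>` wrapper, and that the title is recorded as `dc:title`. The names `extract_xmp`, `dc_titles` and `SAMPLE_PDF` are hypothetical, and the sample bytes are a made-up stand-in for a real file.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>
XPACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def extract_xmp(pdf_bytes):
    """Return the body of the first XMP packet as bytes, or None if absent.

    Assumption: the metadata stream is not compressed, so a plain byte
    scan can locate the packet wrapper.
    """
    match = XPACKET_RE.search(pdf_bytes)
    if match is None:
        return None
    return match.group(1).strip()


def dc_titles(xmp_xml):
    """Return the text of every rdf:li found inside a dc:title element."""
    root = ET.fromstring(xmp_xml)
    titles = []
    for title in root.iter(f"{{{DC_NS}}}title"):
        for item in title.iter(f"{{{RDF_NS}}}li"):
            titles.append(item.text)
    return titles


# Made-up stand-in for a PDF with an embedded XMP packet (illustration only).
SAMPLE_PDF = (
    b"%PDF-1.5\n1 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
    b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">\n'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
    b'<rdf:Description rdf:about=""'
    b' xmlns:dc="http://purl.org/dc/elements/1.1/">\n'
    b'<dc:title><rdf:Alt><rdf:li xml:lang="x-default">An example paper'
    b"</rdf:li></rdf:Alt></dc:title>\n"
    b"</rdf:Description>\n</rdf:RDF>\n</x:xmpmeta>\n"
    b'<?xpacket end="w"?>\nendstream\nendobj\n'
)

if __name__ == "__main__":
    packet = extract_xmp(SAMPLE_PDF)
    print(dc_titles(packet))
```

Because the packet body is ordinary RDF/XML, it could equally be handed to any RDF parser rather than walked by hand as above.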

----

[...]
