Re: linked open data and PDF

Sarven Capadisli Wed, 21 Jan 2015 06:28:52 -0800

On 2015-01-20 18:28, Larry Masinter wrote:

There's some background that you might find helpful
in the discussion.


PDF is now defined by ISO 32000.
PDF has profiles, including PDF/A-3
http://www.digitalpreservation.gov/formats/fdd/fdd000360.shtml
ISO 19005-3. PDF/A-3 defines how to add arbitrary
file attachments to PDF.

XMP http://en.wikipedia.org/wiki/Extensible_Metadata_Platform
is (as of 2012) also an ISO standard, ISO 16684-1, a
format-independent metadata representation
that uses a restricted RDF/XML framework, but
not arbitrary RDF/XML.

A design from scratch today might make different
choices, of course. But for those whose
goal is deployment and integration
with existing workflows, reuse of what is widely
deployed seems like a path worth investigating.

And XMP is widely implemented not just for PDF but
also for images, as a way of extending metadata
beyond EXIF or IPTC.

Putting linked data in compact form (CSV, for example)
might make sense, perhaps as a PDF/A-3 file attachment,
if a document is a carrier of tabular data.

Image formats like JPEG and PNG (for which there
is support for XMP) don't have a standard, uniform
way of attaching other files, though, so allowing
data (or a pointer to external data) in the XMP
would broaden the applicability.

In choosing how to make five star open data work
for file formats other than HTML, what other choices
are there?

I would argue that declarative programs are most suitable. Others may disagree. AFAIK, there is no single widely accepted view on this.

re: "existing workflows", would you mind sharing your thoughts on how the 4th star, "use URIs to denote things, so that people can point at your stuff", may be achieved? Say we have:


http://example.org/foo.pdf

and that we go with XMP out of the box, irrespective of the RDF serialization it embeds. How can the 3rd LD design principle, "when someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)", be satisfied?

Example: I want to discover the variables that are declared in the hypothesis of papers.


What would the PDF/XMP look like?

How can I extract the information (without breaking my head) using off the shelf *open* tools?
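
To make the extraction question concrete, here is a minimal sketch of one possible starting point using only the Python standard library. It assumes the XMP packet is stored uncompressed in the file, so it can be found by scanning for the xpacket markers, and it only reads dc:title. The function names extract_xmp and dc_titles are hypothetical, chosen for illustration. It does not answer the harder question of where something like hypothesis variables would live in XMP.

```python
import re
import xml.etree.ElementTree as ET

# Standard RDF and Dublin Core namespace URIs used inside XMP packets.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# An XMP packet sits between <?xpacket begin=...?> and <?xpacket end=...?>.
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def extract_xmp(data: bytes):
    """Return the body of the first uncompressed XMP packet, or None.

    Assumption: the metadata stream is not compressed or encrypted,
    so the packet markers are visible in the raw bytes.
    """
    match = PACKET_RE.search(data)
    return match.group(1).strip() if match else None


def dc_titles(xmp: bytes):
    """Return the dc:title values from the packet's RDF/XML."""
    root = ET.fromstring(xmp)
    return [li.text for li in root.findall(".//dc:title/rdf:Alt/rdf:li", NS)]
```

Usage would be along the lines of reading foo.pdf in binary mode, passing the bytes to extract_xmp, and then passing the result to dc_titles. Anything beyond the restricted RDF/XML that XMP allows would need a proper RDF parser, which this sketch does not attempt.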

Sure, not all PDFs have good quality XMP metadata,
but not all HTML has quality RDFa or metadata either.

I can agree to that. We can also look at it this way: the majority of Web pages are essentially "broken", yet the Web somehow "just works". How would/does a PDF look or work on the Web if a non-trivial byte is off - never mind the XMP?


-Sarven
http://csarven.ca/#i
