On 2015-01-20 18:28, Larry Masinter wrote:
There's some background that you might find helpful
in the discussion.
PDF is now defined by ISO 32000.
PDF has profiles, including PDF/A-3 (ISO 19005-3):
http://www.digitalpreservation.gov/formats/fdd/fdd000360.shtml
PDF/A-3 defines how to add arbitrary
file attachments to PDF.
XMP http://en.wikipedia.org/wiki/Extensible_Metadata_Platform
is (as of 2012) also an ISO standard, ISO 16684-1, a
format-independent metadata representation
that uses a restricted RDF/XML framework, but
not arbitrary RDF/XML.
A design from scratch today might make different
choices, of course. But for those whose
goal is deployment and integration
with existing workflows, reuse of what is widely
deployed seems like a path worth investigating.
And XMP is widely implemented not just for PDF but
also for images, as a way of extending metadata
beyond EXIF or IPTC.
Putting linked data in compact form (CSV, for example)
might make sense, perhaps as a PDF/A-3 file attachment,
if a document is a carrier of tabular data.
Image formats like JPEG and PNG (for which there
is support for XMP) don't have a standard, uniform
way of attaching other files, though, so allowing
data (or a pointer to external data) in the XMP
would broaden the applicability.
In choosing how to make five star open data work
for file formats other than HTML, what other choices
are there?
I would argue that declarative programs are most suitable. Others may
disagree. AFAIK, there is no single widely accepted view on this.
re: "existing workflows", would you mind sharing your thoughts on how
the 4th star, "use URIs to denote things, so that people can point at
your stuff", may be achieved? Say we have:
http://example.org/foo.pdf
and that we go with XMP out of the box, irrespective of the RDF
serialization it embeds. How can the 3rd LD design principle, "when
someone looks up a URI, provide useful information, using the standards
(RDF*, SPARQL)", be satisfied?
Example: I want to discover the variables that are declared in the
hypothesis of papers.
What would the PDF/XMP look like?
How can I extract the information (without breaking my head) using
off-the-shelf *open* tools?
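To make the question concrete, here is a rough sketch of what such an
extraction might involve using only the Python standard library. It is
not a claim about how any existing tool works. It assumes the XMP packet
is stored uncompressed in the file (which is not guaranteed), and the
`ex:variable` property and its namespace are hypothetical, made up for
this example. It only picks up simple literal-valued properties and is
not a full RDF parser.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
# Assumption: the packet appears uncompressed in the raw file bytes.
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def extract_xmp(data: bytes):
    """Return the body of the first XMP packet found in data, or None."""
    match = PACKET_RE.search(data)
    return match.group(1).strip() if match else None


def simple_properties(packet: bytes):
    """Yield (subject, property, value) for literal-valued properties only."""
    root = ET.fromstring(packet)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about", "")
        for child in desc:
            text = (child.text or "").strip()
            if text:  # skip structured values such as rdf:Bag containers
                yield subject, child.tag, text


if __name__ == "__main__":
    # Hypothetical file contents; the ex: vocabulary is invented for illustration.
    sample = (
        b"%PDF-1.7\n"
        b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
        b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
        b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
        b'<rdf:Description rdf:about="http://example.org/foo.pdf"'
        b' xmlns:ex="http://example.org/vocab#">'
        b"<ex:variable>temperature</ex:variable>"
        b"</rdf:Description></rdf:RDF></x:xmpmeta>\n"
        b'<?xpacket end="w"?>\n%%EOF'
    )
    for triple in simple_properties(extract_xmp(sample)):
        print(triple)
```

Even in this toy case, the consumer has to fetch the whole file, locate
the packet, and know the vocabulary in advance, which is the part I am
asking about.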
Sure, not all PDFs have good quality XMP metadata,
but not all HTML has quality RDFa or metadata either.
I can agree with that. We can also look at it this way: the majority of
Web pages are essentially "broken", yet the Web somehow "just works".
How would/does a PDF look or work on the Web if a non-trivial number of
its bytes are off - never mind the XMP?
-Sarven
http://csarven.ca/#i