On 10/06/2014 11:03 AM, Kingsley Idehen wrote:
On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:
It's not hard to query PDFs with SPARQL. All you have to do is extract the
metadata from the document and turn it into RDF, if needed. Lots of programs
extract and display this metadata already.
Peter,
Having had 200+ {some-non-rdf-doc}-to-RDF document transformers built under my
direct guidance, I see some issues with your claim above:
Huh? Every single PDF reader that I use can extract the PDF metadata and
display it. The metadata that I see in PDF documents uses a core set of
properties that are easy to transform into RDF. Of course, this core set is
very small (title, author, and a few other things) so you don't get all that
much out of the core set.
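
To make the "easy to transform into RDF" step concrete, here is a minimal
sketch that turns an already-extracted core set into N-Triples. The mapping
to Dublin Core terms, the function names, and the example document IRI are
all assumptions made for illustration; nothing in PDF mandates this
vocabulary.

```python
# Assumed mapping from PDF document-information keys to Dublin Core terms.
# The vocabulary choice is illustrative, not something PDF prescribes.
DCTERMS = "http://purl.org/dc/terms/"
KEY_TO_PROPERTY = {
    "Title": DCTERMS + "title",
    "Author": DCTERMS + "creator",
    "Subject": DCTERMS + "subject",
}


def escape_literal(value: str) -> str:
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def metadata_to_ntriples(doc_iri: str, metadata: dict) -> str:
    """Emit one triple per recognised key; unknown keys are skipped."""
    lines = []
    for key, value in metadata.items():
        prop = KEY_TO_PROPERTY.get(key)
        if prop is None:
            continue
        lines.append(f'<{doc_iri}> <{prop}> "{escape_literal(value)}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical document IRI and metadata values.
    print(metadata_to_ntriples(
        "http://example.org/paper.pdf",
        {"Title": "Linked Data", "Author": "Jane Doe"},
    ))
```

The output could then be loaded into any SPARQL store. As noted, the core
set is small, so the resulting graph is small too.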
1. The extractors are platform specific -- AWWW is about platform agnosticism
(I don't want to mandate an OS for experiencing the power of Linked Open Data
transformers / rdfizers)
Well, the extractors would be specific to PDF, but that's hardly surprising, I
think.
2. It isn't solely about metadata -- we also have raw data inside these
documents confined to tables and paragraphs of sentences
Well, sure, but is extracting information directly from the figures or tables
or text being considered here? I sure would like this to be possible. How
would it work in an HTML context?
3. If querying a PDF were even marginally simple, I would be demonstrating that
with a SPARQL results URL in response to this post :-)
I'm not saying that it is so simple. You do have to find the metadata block
in the PDF and then look for the /Title, /Author, ... stuff.
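
As a rough sketch of that lookup, the following assumes the simplest case:
an unencrypted PDF whose trailer and document-information dictionary are
stored uncompressed, with plain literal strings. The function name and the
sample bytes are hypothetical. Real files often use compressed object
streams, hex or UTF-16 strings, and nested parentheses, none of which this
handles, which is part of why it is not so simple.

```python
import re

# Matches "/Key (value)" with backslash escapes but no nested parentheses.
FIELD_RE = re.compile(
    rb"/(Title|Author|Subject|Keywords)\s*\(((?:\\.|[^\\()])*)\)",
    re.DOTALL,
)


def find_info_fields(pdf_bytes: bytes) -> dict:
    """Follow the trailer's /Info reference and pull out a few text fields."""
    # The trailer points at the information dictionary as "/Info N G R".
    ref = re.search(rb"/Info\s+(\d+)\s+(\d+)\s+R", pdf_bytes)
    if ref is None:
        return {}
    obj_num, gen = ref.group(1), ref.group(2)

    # Locate the body of the indirect object "N G obj ... endobj".
    obj = re.search(
        rb"(?<!\d)" + obj_num + rb"\s+" + gen + rb"\s+obj(.*?)endobj",
        pdf_bytes,
        re.DOTALL,
    )
    if obj is None:
        return {}

    fields = {}
    for key, raw in FIELD_RE.findall(obj.group(1)):
        # Escape sequences are left undecoded in this sketch.
        fields[key.decode("ascii")] = raw.decode("latin-1")
    return fields


# Hand-written fragment for illustration; not a complete, valid PDF.
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Title (Linked Data) /Author (Jane Doe) >>\nendobj\n"
    b"trailer\n<< /Info 1 0 R >>\n%%EOF\n"
)

if __name__ == "__main__":
    print(find_info_fields(SAMPLE))
```

In practice a PDF library would be the more robust way to do this; the
point is only that the title and author sit in a well-defined place.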
Possible != Simple and Productive.
Yes, but there are lots of tools that display PDF metadata, so there are some
who believe that the benefit is greater than the cost.
We want to leverage the productivity and simplicity that AWWW brings to data
representation, access, interaction, and integration.
Sure, but the additional costs, if any, on paper authors, reviewers, and
readers have to be considered. If these costs are eliminated or at least
minimized then this good is much more likely to be realized.
peter