Querying PDFs is NOT simple; it requires a lot of work and usually
produces lots of errors. Just querying metadata is not enough. As I said
before, I understand the PDF as something that gives me a uniform layout.
That is fine and necessary, but it is not sufficient within the context
of the web of data and scientific publications. I would like to have the
content readily available for mining purposes. If I pay for the publication,
I should get access to it in every format in which it is available. The
content should be presented in a way that makes sense within the web
of data. If that means the full content of the paper represented in RDF or
XML, fine. I would also like to have well-annotated content. This is simple
and could quite easily become part of existing publication
workflows. It could also be part of the guidelines for authors, for instance
by asking them to identify and annotate rhetorical structures.
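
To make the point about annotated content concrete, here is a rough sketch of
what mining could look like once sentences carry a rhetorical role. Everything
in it is an assumption for illustration: the "ex:" vocabulary, the identifiers,
and the sample sentences are invented, and the tiny pattern matcher only stands
in for a real triple store queried with SPARQL.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical annotations for one paper. The "ex:" terms and the
# sentences are made up for this sketch; they come from no real vocabulary.
TRIPLES: List[Triple] = [
    ("ex:paper1", "ex:hasPart", "ex:s1"),
    ("ex:s1", "ex:rhetoricalRole", "ex:Hypothesis"),
    ("ex:s1", "ex:text", "Protein X regulates pathway Y."),
    ("ex:paper1", "ex:hasPart", "ex:s2"),
    ("ex:s2", "ex:rhetoricalRole", "ex:Result"),
    ("ex:s2", "ex:text", "Knocking out X reduced Y activity."),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in triples:
        if s is not None and triple[0] != s:
            continue
        if p is not None and triple[1] != p:
            continue
        if o is not None and triple[2] != o:
            continue
        yield triple


def sentences_with_role(triples: List[Triple], role: str) -> List[str]:
    """Return the text of every sentence annotated with the given role."""
    texts: List[str] = []
    for subject, _, _ in match(triples, p="ex:rhetoricalRole", o=role):
        for _, _, text in match(triples, s=subject, p="ex:text"):
            texts.append(text)
    return texts


if __name__ == "__main__":
    for sentence in sentences_with_role(TRIPLES, "ex:Hypothesis"):
        print(sentence)
```

With annotations like these shipped alongside the paper, asking for "all
hypotheses" is a lookup. Getting the same answer from a PDF means first
recovering sentences and their roles from the layout, which is where the work
and the errors come from.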

On Mon, Oct 6, 2014 at 11:03 AM, Kingsley Idehen <kide...@openlinksw.com> wrote:

> On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:
>> It's not hard to query PDFs with SPARQL.  All you have to do is extract
>> the metadata from the document and turn it into RDF, if needed. Lots of
>> programs extract and display this metadata already.
> Peter,
> Having had 200+ {some-non-rdf-doc} to RDF document transformers built
> under my direct guidance, there are issues with your claim above:
> 1. The extractors are platform specific -- AWWW is about platform
> agnosticism (I don't want to mandate an OS for experiencing the power of
> Linked Open Data transformers / rdfizers)
> 2. It isn't solely about metadata  -- we also have raw data inside these
> documents confined to Tables, paragraphs of sentences
> 3. If querying a PDF was marginally simple, I would be demonstrating that
> using a SPARQL results URL in response to this post :-)
> Possible != Simple and Productive.
> We want to leverage the productivity and simplicity that AWWW brings to
> data representation, access, interaction, and integration.
> --
> Regards,
> Kingsley Idehen
> Founder & CEO
> OpenLink Software
> Company Web: http://www.openlinksw.com
> Personal Weblog 1: http://kidehen.blogspot.com
> Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
> Twitter Profile: https://twitter.com/kidehen
> Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
> LinkedIn Profile: http://www.linkedin.com/in/kidehen
> Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this

Alexander Garcia
