On 10/6/14 2:19 PM, Alexander Garcia Castro wrote:
Querying PDFs is NOT simple and requires a lot of work, and it usually produces lots of errors.

Yes, I believe I indicated that in my response to Peter, i.e., it isn't simple or productive.

Just querying metadata is not enough.

Yes, I said that too, i.e., we want access to raw data.

As I said before, I understand the PDF as something that gives me a uniform layout. That is OK and necessary, but not sufficient within the context of the web of data and scientific publications. I would like to have the content readily available for mining purposes. If I pay for the publication, I should get access to it in every format in which it is available. The content should be presented so that it makes sense within the web of data; if that is the full content of the paper represented in RDF or XML, fine. I would also like to have well-annotated content. This is simple and could quite easily be part of existing publication workflows. It may also be part of the guidelines for authors, for instance, to identify and annotate rhetorical structures.

Modulo any confusing typos in my earlier posts, I don't see where we are disagreeing :-)


Kingsley

On Mon, Oct 6, 2014 at 11:03 AM, Kingsley Idehen <kide...@openlinksw.com> wrote:

    On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:

        It's not hard to query PDFs with SPARQL.  All you have to do
        is extract the metadata from the document and turn it into
        RDF, if needed. Lots of programs extract and display this
        metadata already.


    Peter,

    Having had 200+ {some-non-rdf-doc}-to-RDF document transformers
    built under my direct guidance, I see issues with your claim
    above:

    1. The extractors are platform specific -- AWWW is about platform
    agnosticism (I don't want to mandate an OS for experiencing the
    power of Linked Open Data transformers / rdfizers)

    2. It isn't solely about metadata -- we also have raw data inside
    these documents, confined to tables and paragraphs of sentences

    3. If querying a PDF were even marginally simple, I would be
    demonstrating that using a SPARQL results URL in response to this
    post :-)

    Possible != Simple and Productive.

    We want to leverage the productivity and simplicity that AWWW
    brings to data representation, access, interaction, and integration.


-- Regards,

    Kingsley Idehen
    Founder & CEO
    OpenLink Software
    Company Web: http://www.openlinksw.com
    Personal Weblog 1: http://kidehen.blogspot.com
    Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
    Twitter Profile: https://twitter.com/kidehen
    Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
    LinkedIn Profile: http://www.linkedin.com/in/kidehen
    Personal WebID:
    http://kingsley.idehen.net/dataspace/person/kidehen#this





--
Alexander Garcia
http://www.alexandergarcia.name/
http://www.usefilm.com/photographer/75943.html
http://www.linkedin.com/in/alexgarciac


