Re: scientific publishing process (was Re: Cost and access)

Kingsley Idehen Tue, 07 Oct 2014 04:59:04 -0700

On 10/7/14 5:39 AM, Norman Gray wrote:

Kingsley and all, hello.


On 2014 Oct 7, at 02:18, Kingsley Idehen <[email protected]> wrote:

On 10/6/14 2:49 PM, Peter F. Patel-Schneider wrote:


On 10/06/2014 11:03 AM, Kingsley Idehen wrote:

On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:

It's not hard to query PDFs with SPARQL.  All you have to do is extract the

Huh?  Every single PDF reader that I use can extract the PDF metadata and 
display it.

Again, this isn't about metadata.

With all respect to the larger goal of having fully semanticked-up documents, I 
think the question _is_ all about metadata.

It can't be. The metadata focus is a subtle misconception. We need access to all of the data in the document.

   The original spark to the thread was a lament that SW and LD conferences 
don't mandate something XMLish for submissions because X(HT)ML is clearly 
better for... well ... dammit, it's Better.

The initial gripe (as I've always seen it) is that we are trying to tell the world about Linked Open Data virtues while rarely putting them to use (instinctively) ourselves. It just so happens that conferences provide an example that most have experienced in some capacity.


_One_ thing it would be better for is supporting the sort of full-scale 
RDF-everything view that you've described so eloquently.  But if that's your 
goal, then lexing the source text is really going to be the least of your 
problems.

A more modest goal, which is still valuable and _much_ more achievable, is to 
get at least some RDF out of submitted articles.

Yes, or at least include references to RDF sources relevant to the paper, on the understanding that those references resolve (to the degree possible). This is also about the data represented in tabular form (as tables) and the data behind the tables, so to speak.

  That practically means metadata, plus perhaps some document structure, plus, 
if you're keen and can get the authors to invest their effort, some 
argumentation.  That's available for free (and right now) from LaTeX authors, 
and available from XHTML authors, depending on how hard it would be to get them 
to put @profile attributes in the right places.

So no, not just about 'metadata' in the narrow sense, but I think this thread 
is about what RDF you can in practice extract from the materials that authors 
can in practice be induced or obliged to submit to conference proceedings.
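As a rough illustration of the modest, metadata-level RDF described above, here is a minimal Python sketch that serializes a few bibliographic fields as N-Triples using the Dublin Core elements vocabulary. The paper IRI, the field names, and the choice of Dublin Core are assumptions made for illustration; nothing in the thread prescribes them.

```python
# Minimal sketch (assumption: Dublin Core elements 1.1 as the vocabulary,
# hypothetical paper IRI): turn simple bibliographic metadata into N-Triples.

DC = "http://purl.org/dc/elements/1.1/"


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples string literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def metadata_to_ntriples(paper_iri: str, metadata: dict[str, list[str]]) -> str:
    """Emit one triple per (property, value) pair, e.g. dc:title, dc:creator."""
    lines = []
    for prop, values in metadata.items():
        for value in values:
            lines.append(f'<{paper_iri}> <{DC}{prop}> "{escape_literal(value)}" .')
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical example paper; the IRI does not refer to a real resource.
    meta = {
        "title": ["An Example Paper"],
        "creator": ["A. Author", "B. Author"],
    }
    print(metadata_to_ntriples("http://example.org/papers/42", meta), end="")
```

N-Triples is used here only because it needs no library to produce; the same triples could equally be emitted as Turtle or embedded as RDFa in an XHTML submission.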

For those conferences associated with themes such as Linked Open Data and the Semantic Web, RDF should be the norm for structured data representation. If that isn't possible then what are we saying to the world about RDF, in regards to structured data representation and data de-silo-fication?


That original lament has overlapped with a parallel lament that PDF is a 
dead-end format -- it's not 'webby'.


They are linked :-)

   I believe that the demo in my earlier message undermines that claim as far 
as RDF goes.

1. The extractors are platform specific -- AWWW is about platform agnosticism
(I don't want to mandate an OS for experiencing the power of Linked Open Data
transformers / rdfizers)

Well, the extractors would be specific to PDF, but that's hardly surprising, I 
think.

[I've lost track of whose comment this is...]

The extractor I demoed wasn't PDF-specific.

"Platform" in the context of my comments really relates to operating systems, i.e., most PDF extractors are operating-system specific. That's why I mentioned the massive opportunity for Adobe (and 3rd parties too, as Mike Bergman added) in regards to providing Web Services for accessing and indexing PDF document content.

We want to leverage the productivity and simplicity that AWWW brings to data
representation, access, interaction, and integration.

Sure, but the additional costs, if any, on paper authors, reviewers, and 
readers have to be considered.  If these costs are eliminated or at least 
minimized then this good is much more likely to be realized.

With some help from Adobe we can have the best of all worlds here. I am going 
to take a look at their latest cloud offerings and associated APIs.

I forgot to attach the extractor I wrote -- done.  The demo didn't use any 
Adobe API, neither to put the XMP into the PDF nor to extract the RDF from it.
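Neither embedding nor extracting XMP needs an Adobe API. A minimal sketch of the extraction side is below; it is not the extractor Norman attached, just a pure-Python illustration. It assumes the PDF's metadata stream is stored uncompressed and unencrypted (compressed or encrypted streams would need a real PDF parser), scans the raw bytes for the `<?xpacket ...?>` wrapper that XMP defines, and pulls out the `rdf:RDF` element. It also assumes the conventional `rdf:` prefix, and the script name in the usage comment is hypothetical.

```python
import re
import sys

# XMP wraps each metadata packet in <?xpacket begin=...?> ... <?xpacket end=...?>
# processing instructions, so an uncompressed packet can be located by scanning
# the raw bytes, without parsing the PDF object structure at all.
XPACKET_RE = re.compile(
    rb"<\?xpacket\s+begin=.*?\?>(.*?)<\?xpacket\s+end=.*?\?>",
    re.DOTALL,
)
# Assumes the conventional "rdf:" prefix for the RDF namespace.
RDF_RE = re.compile(rb"<rdf:RDF\b.*?</rdf:RDF>", re.DOTALL)


def extract_xmp_rdf(data: bytes) -> list[str]:
    """Return the rdf:RDF element of every uncompressed XMP packet in `data`."""
    results = []
    for packet in XPACKET_RE.finditer(data):
        rdf = RDF_RE.search(packet.group(1))
        if rdf:
            results.append(rdf.group(0).decode("utf-8", errors="replace"))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python xmp_scan.py paper.pdf
    with open(sys.argv[1], "rb") as f:
        for rdf_xml in extract_xmp_rdf(f.read()):
            print(rdf_xml)
```

Because it only reads bytes, the same code runs unchanged on any OS with Python 3.9+, which is the kind of platform agnosticism asked for above; the trade-off is that it silently misses compressed or encrypted metadata streams.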


You forgot the extractor demo link :)


All the best,

Norman



--
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
