Re: scientific publishing process (was Re: Cost and access)

Norman Gray Tue, 07 Oct 2014 10:19:43 -0700

Sarven, hello.

On 2014 Oct 7, at 13:13, Sarven Capadisli <[email protected]> wrote:

> On 2014-10-07 11:39, Norman Gray wrote:
>> The original spark to the thread was a lament that SW and LD conferences 
>> don't mandate something XMLish for submissions because X(HT)ML is clearly 
>> better for... well ... dammit, it's Better.
> 
> Straw man argument. Please stop that now!
> 
> I will spell out the main proposal and purpose for you because it sounds like 
> you are completely oblivious to them. Let me know if anything is unclear.

My remark was intended as facetious rather than fractious, but if you feel I 
misjudged the balance, I apologise.

I want to clarify what I meant, because on reflection it explains (at least to 
me) why I'm participating in this thread at such length.  My intention was to 
indicate that I don't feel that HTML is as central as you, amongst others, seem 
to assert it is.

I characterise the web as:

  1. URIs for addressing things,
  2. HTTP for retrieving things (other protocols exist, but...),
  3. a downloadable format which clients can parse to obtain more URIs, with a 
'follow this' semantic.

Now, the obvious candidate for (3) is of course HTML; but on the web, and 
_especially_ on the Semantic Web, it can be anything: RDF in one or other 
format, XML+GRDDL, some discipline-specific format which has a link semantic in 
it, or even a PDF file with a standardised lump of RDF/XMP inside it.  That RDF 
may be immediately present, or it may require some sort of heuristic or 
deterministic extraction (as Kingsley has discussed).
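
As a minimal sketch of point (3), assuming nothing beyond the Python standard library: the 'follow this' step is just a per-format function from a downloaded document to a list of URIs. The names below (links_from_html, links_from_ntriples, EXTRACTORS, follow_candidates) are hypothetical and purely illustrative, and the N-Triples extractor is a crude regular expression rather than a real parser.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_from_html(base_uri, text):
    """HTML: the link semantic is <a href>, resolved against the base URI."""
    collector = _HrefCollector()
    collector.feed(text)
    return [urljoin(base_uri, href) for href in collector.hrefs]


def links_from_ntriples(base_uri, text):
    """N-Triples: URIs are absolute and written <...>.

    Crude assumption: no literal in the input contains angle brackets.
    """
    return re.findall(r"<([^<>\s]+)>", text)


# Hypothetical registry: one extractor per media type. HTML is one entry,
# not a special case.
EXTRACTORS = {
    "text/html": links_from_html,
    "application/n-triples": links_from_ntriples,
}


def follow_candidates(base_uri, media_type, text):
    """Return the URIs a client could follow from this document."""
    return EXTRACTORS[media_type](base_uri, text)
```

A discipline-specific format, or the RDF/XMP packet inside a PDF, would be one more entry in the registry; the client loop around it stays the same.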

All of these are web-native technologies, and I'd go as far as to say that the 
_least_ interesting thing you can find at the end of a URI is an HTML file.

The big deal, for me, in the idea of the Semantic Web, and the RDF world, is 
the realisation that the RDF model is sufficiently general that you can turn 
almost any structured data into RDF, put it into a big bucket, and start 
inferencing, querying, linking, and so on.  That generation/extraction of RDF 
is probably easier if the stuff is already pointy-bracketed for you, but that's 
only a detail.
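
A toy illustration of the "big bucket" idea, under loud assumptions: triples are plain Python tuples rather than a real RDF store, the vocabulary URI and the two records are made up for the example, and record_to_triples and match are hypothetical helper names. The point is only that records from unrelated sources end up in one set which can be queried by pattern.

```python
VOCAB = "http://example.org/vocab#"  # made-up vocabulary namespace


def record_to_triples(subject, record, vocab=VOCAB):
    """Turn one flat key/value record into (subject, predicate, object) triples."""
    return {(subject, vocab + key, value) for key, value in record.items()}


def match(bucket, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in bucket
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


# Two unrelated sources: a row of bibliographic metadata and an
# image-header-like record. Both are invented for this sketch.
paper = {"creator": "A. Author", "title": "An Example Paper"}
image = {"creator": "A. Author", "telescope": "Example Telescope"}

bucket = set()
bucket |= record_to_triples("http://example.org/paper/1", paper)
bucket |= record_to_triples("http://example.org/image/7", image)

# One query over both sources: everything with this creator.
by_author = match(bucket, p=VOCAB + "creator", o="A. Author")
```

Here by_author contains one triple for the paper and one for the image, even though the two records started out in different shapes.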

The interesting thing, for me, is just how the web as a whole can go about 
collectively managing or facilitating this generation/extraction in a way which 
balances faithfulness to the original with interoperable meaning (Dublin Core 
and FOAF are truly wonderful things).  That is why I do feel that -- especially 
in this SW/LD community --

    HTML is a bit of a sideshow.

HTML is a splendid thing for all the reasons that you know and I know, but if 
it's seen as central, if all questions turn into "what does that look like in 
HTML?", if it's so in-our-face that we can't see round it, then we miss the 
interesting questions.  So it's not that I've a particular downer on HTML, or a 
particular enthusiasm for PDF, but I think that "what does that look like in 
PDF?" and "what does that look like in FITS?" (the format of choice in my area) 
are more interesting.

(Or, put another way, I don't think that HTML is the SW/LD community's dogfood 
to eat -- for WHATWG, yes; for us, no.)

The sub-threads here about practicalities are amongst those questions, because 
they pick up the questions of "how does semantics get attached to documents in 
practice?", "why would authors bother?", "how does that information get passed 
around faithfully?"  It would be more interesting and productive if (and I 
don't mean this completely unseriously) the SW/LD community _forbade_ HTML from 
its conferences and journals.

So this is where the opposite end of the spectrum from your position lies, 
which may make a little more sense of what I've been saying.

Best wishes,

Norman

-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK
