Hi Phillip, Eric, et al.
--------------------------------------------
On Fri, 10/3/14, Phillip Lord <[email protected]> wrote:


 
 Eric Prud'hommeaux
 <[email protected]>
 writes:
 
 > Let's work through the requirements and a plausible migration plan. We need:
 >
 > 1 persistent storage: it's hard to beat books for a feeling of persistence.
 > Contracts with trusted archival institutions can help but we might also
 > want some assurances that the protocols and formats will persist as well.
 [snip] 
 Protocols and formats, yes, truly a problem. In an argument between HTML and PDF,
 it's hard to see that one has the advantage over the other. My experience is that
 HTML is easier to extract text from, which is always going to be the baseline.
---------------------------------------
Easier still is (X)HTML or XML written in plain text with character entities 
hex-escaped.  Clipboards are "owned" by the OS, and for ordinary users syntax 
errors are fatal: bread and butter (full employment) for help desks.  Personally, I 
am un-fond of that ideology.  XSLT 2.0 has a (flawless) translation mechanism 
which eases user pain.  I've used it several times for StratML projects.  If 
you want a copy of the transform, contact me off line.
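
To illustrate what "hex escaped" means here, a minimal Python sketch follows. 
It is not the XSLT transform mentioned above; the function name hex_escape is 
made up for this example, and it assumes the input is already well-formed 
markup, so it leaves &, < and > alone and only rewrites non-ASCII characters 
as XML hexadecimal character references.

```python
def hex_escape(text: str) -> str:
    """Replace each non-ASCII character with an XML hex character reference.

    Hypothetical helper for illustration only. Markup-significant ASCII
    characters (&, <, >) are passed through unchanged.
    """
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
        for ch in text
    )


if __name__ == "__main__":
    # The result is pure ASCII, so it survives clipboards and plain-text mail.
    print(hex_escape("café naïve"))
```

The output is plain ASCII, which is the point: it can be pasted anywhere 
without depending on what encoding the clipboard or mail client assumes.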
 ---------------------------------------
 For what it is worth, there are archiving solutions, including archive.org and 
arxiv.org, both of which leap to mind.
 ---------------------------------------
The archiving solutions work well for the persistence of protocols and formats. 
 Persistence of Linked Data depends upon the ability of an archive to reduce 
<owl:sameAs> and <rdfs:*> to their *export* standards.  Professional 
credibility in all disciplines relies on how well one hefts the lingo - applies 
the schema labels to shared concepts. Publishers are very sensitive to this 
concern and it may be Linked Data with the deaf ear.
----------------------------------------
[snip]
 Okay. I would like to know who made the decision that HTML is not acceptable 
and why.
----------------------------------------
This is a related issue.  The "decision" to ignore the separation of concerns 
issue mentioned above is a user acceptance impediment when "protocols and 
formats" are the only parameters considered.  In a few decades perhaps we will 
have real AI, Turing Machines, and academic disciplines will have their own 
Ontologies which speak to them.  As a container, I think HTML is fine.  I am 
not comfortable with RDFa "decorations" or /html/head metadata as absentee 
ownership of documents.

In the meantime, Archives will have to develop methods to recycle and reduce 
rdfs:label values, and they will have to be (uncharacteristically) ruthless.  The 
statistics of RDF rely on a well known "paradox" 
(http://en.wikipedia.org/wiki/Birthday_problem).  Close matches between 
namespaces and Ontologies have an extreme bias toward "high probability" 
identification.  In the end, the probability is just a number, but it 
intimidates ordinary partial fractions who believe it is the "smartest guy in 
the room".  That is rather a bad thing.
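
For a feel of how quickly the birthday problem bites, here is a rough Python 
sketch.  It assumes items fall uniformly and independently into a fixed number 
of slots, which real namespaces and label vocabularies certainly do not, so 
read it as an illustration of the arithmetic rather than a model of RDF.  The 
function name collision_probability is hypothetical.

```python
def collision_probability(n_items: int, n_slots: int) -> float:
    """Probability that at least two of n_items land in the same slot.

    Assumes uniform, independent draws (an idealisation for illustration).
    Computed as 1 minus the probability that all items are distinct.
    """
    if n_items > n_slots:
        return 1.0  # pigeonhole: a collision is certain
    p_all_distinct = 1.0
    for k in range(n_items):
        p_all_distinct *= (n_slots - k) / n_slots
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # The classic case: 23 people, 365 possible birthdays.
    print(round(collision_probability(23, 365), 4))
```

With 23 items and 365 slots the chance of at least one match is already a 
little over one half, which is why a "high probability" match between two 
vocabularies is weaker evidence than it looks.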

Cheers,
Gannon 


 
 Phil
 
 
