Mark Birbeck wrote:
> I did think though, that one of the things about the RDF/XML structure
> was an attempt to enable many XML layouts to be interpreted as RDF.
> But obviously that's enormously difficult.
The striped syntax of RDF/XML, whether by intent or by accident, makes it
very well suited to be the target of XSLT transformations. See
http://lists.w3.org/Archives/Public/semantic-web/2008Jul/0037.html for a
stylesheet that will transform any XML document to Infoset RDF/XML. You
could of course write out the RDF graph in any other notation you
choose, but RDF/XML is no more difficult than any other.
Infoset RDF might not be a big step forward, but at least it puts you
into the RDF world where you can merge graphs and do whatever semantic
processing you like.
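
To make the point about merging concrete, here is a minimal sketch in
Python. It assumes a graph is simply a set of (subject, predicate, object)
string triples, with blank nodes written as "_:name". The function name
`merge` and the `ex:` terms are made up for illustration and do not come
from the stylesheet mentioned above.

```python
def merge(*graphs):
    """Merge graphs given as sets of (subject, predicate, object) triples.

    Blank nodes (terms starting with "_:") are renamed per input graph so
    that nodes from different documents are never accidentally identified.
    Simplification: a literal that happens to start with "_:" would be
    renamed too; a real RDF library distinguishes term types.
    """
    merged = set()
    for i, graph in enumerate(graphs):
        def rename(term, i=i):
            return f"{term}_g{i}" if term.startswith("_:") else term
        merged |= {(rename(s), p, rename(o)) for s, p, o in graph}
    return merged


g1 = {("_:a", "ex:title", "Report"), ("ex:doc", "ex:author", "ex:paul")}
g2 = {("_:a", "ex:title", "Memo"), ("ex:doc", "ex:author", "ex:paul")}
merged = merge(g1, g2)
```

The triple about `ex:doc` appears once in the result because both graphs
state it with the same names, while the two `_:a` nodes stay distinct.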
What we would really like to do is vivify the meaning that the XML
author was aiming for when he marked up the character stream in the
first place. We won't get at that meaning from the grammar alone; we
must look at the semantics of the markup itself. The direction was
pointed out years ago in this article:
http://xml.coverpages.org/xmlAndSemantics.html, and possibly in other
articles that I have not discovered.
In this discussion I will set aside DTDs and XML Schemas and all other
such tools of the grammarians and computer scientists; for I wish to
focus on the basic semantic gestures of markup itself. Structural
markup, as in SGML and XML, is a means of breaking up a sequence of
characters into components of interest. The syntactical rules for
well-formed XML enable a primitive--yet reliable and robust--set of
semantic gestures, to wit:
- naming (components of interest can be named)
- attributing (components can have properties)
- sequence (a component can have a positional predecessor)
- containment (a component can be contained in another)
Nothing could be easier than making an RDFS vocabulary of these notions.
And it is only slightly harder to modify the stylesheet referenced above
to emit RDF/XML using this vocabulary. (If I were to implement this I
would add a "Chunk" class to contain character strings, instead of
representing them as sequences of named things with a common parent.) So
you can have, with very little effort, a system that reveals, for any
XML instance, the fundamental semantic gestures of its author.
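
As a rough illustration of the idea (in Python rather than XSLT, and not
the stylesheet referenced above), the sketch below walks an XML instance
and emits one triple per gesture. The vocabulary terms `g:name`,
`g:containedIn`, `g:follows`, `g:value`, `g:Chunk` and the `attr:` prefix
are hypothetical names chosen for this example, as is the function name
`gestures`.

```python
import xml.etree.ElementTree as ET
from itertools import count


def gestures(xml_text):
    """Return (subject, predicate, object) triples for one XML instance."""
    ids = count(1)
    triples = []

    def chunk(text, parent, previous):
        # A run of character data becomes a Chunk node inside its parent.
        node = f"_:n{next(ids)}"
        triples.append((node, "rdf:type", "g:Chunk"))
        triples.append((node, "g:value", text.strip()))
        triples.append((node, "g:containedIn", parent))
        if previous is not None:
            triples.append((node, "g:follows", previous))
        return node

    def visit(elem, parent):
        node = f"_:n{next(ids)}"
        triples.append((node, "g:name", elem.tag))            # naming
        for key, value in elem.attrib.items():                # attributing
            triples.append((node, f"attr:{key}", value))
        if parent is not None:                                # containment
            triples.append((node, "g:containedIn", parent))
        previous = None
        if elem.text and elem.text.strip():
            previous = chunk(elem.text, node, previous)
        for child in elem:
            child_node = visit(child, node)
            if previous is not None:                          # sequence
                triples.append((child_node, "g:follows", previous))
            previous = child_node
            if child.tail and child.tail.strip():
                previous = chunk(child.tail, node, previous)
        return node

    visit(ET.fromstring(xml_text), None)
    return triples


triples = gestures('<p id="x">Hi <b>there</b></p>')
```

Whitespace-only text is dropped and text is stripped, which is a
simplification; a faithful Infoset rendering would keep it.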
In XML, as in natural language, we have many ways of expressing nearly
the same meaning. If we must decide if two utterances have the same
meaning, we cannot do it by comparing the sounds of the utterances--we
must consult some rules about the language: word definitions,
grammatical rules, and usage conventions. Just so with XML--it is
useless to compare the surface structure. We must first of all expose
the semantic structure of each instance, then apply some rules of
synonymy. Putting an XML document into some such RDF as described above
makes it easier to apply these rules.
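
One possible rule of synonymy, sketched in Python under stated
assumptions: an attribute and a child element holding text are treated as
two ways of stating the same property. The triples use a hypothetical
vocabulary (`g:name`, `g:containedIn`, `g:value`, `attr:`) invented for
this example, and `properties` is an illustrative name, not an existing
tool.

```python
def properties(triples):
    """Collect (element name, property, value) facts from gesture triples."""
    name = {s: o for s, p, o in triples if p == "g:name"}
    parent = {s: o for s, p, o in triples if p == "g:containedIn"}
    facts = set()
    for s, p, o in triples:
        if p.startswith("attr:") and s in name:
            # Attribute form: <book title="Emma"/>
            facts.add((name[s], p[len("attr:"):], o))
        elif p == "g:value":
            # Child-element form: <book><title>Emma</title></book>
            child = parent.get(s)
            owner = parent.get(child)
            if child in name and owner in name:
                facts.add((name[owner], name[child], o))
    return facts


as_attribute = [
    ("_:n1", "g:name", "book"),
    ("_:n1", "attr:title", "Emma"),
]
as_child = [
    ("_:n1", "g:name", "book"),
    ("_:n2", "g:name", "title"),
    ("_:n2", "g:containedIn", "_:n1"),
    ("_:n3", "rdf:type", "g:Chunk"),
    ("_:n3", "g:value", "Emma"),
    ("_:n3", "g:containedIn", "_:n2"),
]
same_meaning = properties(as_attribute) == properties(as_child)
```

The two surface structures differ, yet both reduce to the single fact
("book", "title", "Emma"). The rule ignores mixed content and repeated
children, so it is only a starting point.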
--Paul