On May 20, 2009, at 10:27, Henri Sivonen wrote:
> However, in order to usefully apply RELAX NG or Schematron to a
> microdata-based infoset, the infoset conversion should turn property
> names into element names. Since XML places arbitrary limitations on
> element names (and element content), this mapping would have exactly
> the same complications as mapping microdata to RDF/XML.
Here's an attempt at mapping microdata to XML:
* Have a root element (it doesn't matter what it's called) with
attribute xml:lang that has the language of the root element of the
HTML document.
* Have a child of root with local name 'title', namespace 'http://purl.org/dc/terms/'
and content that is the content of HTML <title>.
* For each link relation in the document, have a child of root that
has as its local name the ASCII-lowercased rel token (or
ALTERNATE-STYLESHEET for alternate stylesheet), namespace http://www.w3.org/1999/xhtml/vocab#
and a no-namespace attribute 'url' that contains the absolutized
href of the link relation.
* For each <meta name content>, have a child of root with the value
of the name attribute of the <meta> as local name, namespace http://www.w3.org/1999/xhtml/vocab#
and the value of the content attribute as element content. If the
language of the <meta> differs from root, have xml:lang with the
different language.
* For cites, do the link thing analogously to how cites are handled
in the RDF conversion.
* For items and properties:
  - Map the property name to an XML (namespace, local name) pair as
    follows and use the result as the element name for the 'property
    element':
    * If the itemprop token contains a colon: locate the last # or /
      (whichever comes later) that isn't the last character of the
      URI. Make the part up to and including that character the
      namespace URI and the part after it the local name.
    * Otherwise: the namespace is http://www.w3.org/1999/xhtml/custom#
      and the itemprop token is the local name.
  - If the value is a URL, put the URL value in an attribute called
    'url' on the property element.
  - If the value is itself an item, put the value of the item
    attribute in an attribute called 'type', in no namespace, on the
    property element.
  - Otherwise, put the string value in the content of the property
    element and put the language of the property in the xml:lang
    attribute of the property element if it differs from that of its
    nearest ancestor with xml:lang.
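To make the property rules above concrete, here is a rough Python sketch of
how I read them. The function and parameter names (split_property_name,
property_element, ancestor_lang) are made up for illustration. The rules
don't say what to do with a colon-containing name that has no usable # or /
(e.g. "urn:foo"), so raising an error there is purely my assumption.

```python
import xml.etree.ElementTree as ET

CUSTOM_NS = "http://www.w3.org/1999/xhtml/custom#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def split_property_name(itemprop):
    """Return (namespace URI, local name) for one itemprop token."""
    if ":" not in itemprop:
        # No colon: custom namespace, the token itself is the local name.
        return CUSTOM_NS, itemprop
    # Last '#' or '/' that is not the final character of the URI.
    body = itemprop[:-1]
    cut = max(body.rfind("#"), body.rfind("/"))
    if cut < 0:
        # Assumption: this case is not covered by the rules above.
        raise ValueError("no usable '#' or '/' in %r" % itemprop)
    return itemprop[: cut + 1], itemprop[cut + 1:]


def property_element(itemprop, url=None, item_type=None, text="",
                     lang=None, ancestor_lang=None):
    """Build the 'property element' for one property value."""
    ns, local = split_property_name(itemprop)
    el = ET.Element("{%s}%s" % (ns, local))
    if url is not None:
        # URL value: goes in a no-namespace 'url' attribute.
        el.set("url", url)
    elif item_type is not None:
        # Value is an item: its item attribute goes in 'type'.
        el.set("type", item_type)
    else:
        # Plain string: element content, xml:lang only if it differs.
        el.text = text
        if lang is not None and lang != ancestor_lang:
            el.set(XML_LANG, lang)
    return el
```

For example, split_property_name("http://purl.org/dc/terms/title") gives the
namespace http://purl.org/dc/terms/ and the local name title. Note that
nothing here checks that the local name is a valid NCName.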
I haven't actually tried it, but on the face of it this kind of mapping
seems tractable for RELAX NG schemas.
And, as mentioned before, this breaks when:
1) The local name is not an NCName.
2) textContent in HTML contains non-XML characters.
The infoset coercion rules can be used for those. However, the Uhhhhhh
notation may produce collisions, because microdata property names aren't
lowercased (so a property name can already contain a literal U followed
by six hex digits).
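A small Python sketch of the collision, in case it isn't obvious. The
function name coerce_local_name is made up, and the character test is a
deliberately simplified ASCII-only stand-in for the real NCName production
(which also allows many non-ASCII characters), so treat this as an
illustration rather than a conforming implementation.

```python
import string

# Simplified, ASCII-only approximation of NCName (an assumption of this
# sketch; real NCNames may contain many non-ASCII characters as well).
_NAME_START = set(string.ascii_letters + "_")
_NAME_CHAR = _NAME_START | set(string.digits + "-.")


def coerce_local_name(name):
    """Replace each disallowed character with U + six uppercase hex digits."""
    out = []
    for i, ch in enumerate(name):
        allowed = _NAME_START if i == 0 else _NAME_CHAR
        out.append(ch if ch in allowed else "U%06X" % ord(ch))
    return "".join(out)


# Two distinct property local names that end up as the same element name:
escaped = coerce_local_name("a:b")        # ':' is U+003A
literal = coerce_local_name("aU00003Ab")  # already a valid name, unchanged
collides = escaped == literal
```

Here "a:b" and the literal name "aU00003Ab" both come out as aU00003Ab, so
the mapping is not reversible for case-preserved property names.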
--
Henri Sivonen
[email protected]
http://hsivonen.iki.fi/