On May 20, 2009, at 10:27, Henri Sivonen wrote:

However, in order to usefully apply RELAX NG or Schematron to a microdata-based infoset, the infoset conversion should turn property names into element names. Since XML places arbitrary limitations on element names (and element content), this mapping would have exactly the same complications as mapping microdata to RDF/XML.


Here's an attempt at mapping microdata to XML:

 * Have a root element (it doesn't matter what it's called) with an xml:lang attribute that has the language of the root element of the HTML document.
 * Have a child of root with local name 'title', namespace 'http://purl.org/dc/terms/title' and content that is the content of HTML <title>.
 * For each link relation in the document, have a child of root that has as its local name the ASCII-lowercased rel token (or ALTERNATE-STYLESHEET for alternate stylesheet), namespace http://www.w3.org/1999/xhtml/vocab# and a no-namespace attribute 'url' that contains the absolutized href of the link relation.
 * For each <meta name content>, have a child of root with the value of the name attribute of the <meta> as local name, namespace http://www.w3.org/1999/xhtml/vocab# and the value of the content attribute as element content. If the language of the <meta> differs from root, have xml:lang with the different language.
 * For cites, do the link thing analogously to how cites are handled in the RDF conversion.
 * For items and properties:
    - Map the property name to an XML (namespace, local name) pair as follows and use the result as the element name for the 'property element':
       * If itemprop contains a colon: Locate the # or / that comes last but isn't the last character of the URI. Make the part up to and including that character the namespace URI and the part after it the local name.
       * Otherwise: The namespace is http://www.w3.org/1999/xhtml/custom# and the itemprop token is the local name.
    - If the value is a URL, put the URL value in an attribute called 'url' on the property element.
    - If the value is itself an item, put the value of the item attribute in a no-namespace attribute called 'type' on the property element.
    - Otherwise, put the string value in the content of the property element and put the language of the property on the xml:lang attribute of the property element if different from its nearest ancestor xml:lang.
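
To make the property-name rule concrete, here is a rough Python sketch of the split into namespace and local name. The function name split_property_name and the constant CUSTOM_NS are made up for illustration. The rule above doesn't say what to do when a name has a colon but no usable # or /, so raising an error in that case is purely an assumption of this sketch.

```python
# Sketch of the property-name -> (namespace, local name) rule described above.
# split_property_name and CUSTOM_NS are illustrative names, not from any spec.

CUSTOM_NS = "http://www.w3.org/1999/xhtml/custom#"


def split_property_name(name):
    """Return a (namespace URI, local name) pair for an itemprop token."""
    if ":" not in name:
        # No colon: fixed namespace, the token itself is the local name.
        return (CUSTOM_NS, name)

    # Find the last '#' or '/' while ignoring the final character of the URI.
    body = name[:-1]
    idx = max(body.rfind("#"), body.rfind("/"))
    if idx == -1:
        # Assumption: the rule does not cover this case, so refuse to guess.
        raise ValueError("no '#' or '/' to split on: %r" % name)

    # Namespace is everything up to and including the separator.
    return (name[: idx + 1], name[idx + 1 :])


if __name__ == "__main__":
    print(split_property_name("http://purl.org/dc/terms/title"))
    print(split_property_name("http://example.org/vocab#name"))
    print(split_property_name("nickname"))
    # A trailing separator is skipped, which leaves a local name ("vocab/")
    # that is not an NCName.
    print(split_property_name("http://example.org/vocab/"))
```

The last call shows how easily the resulting local name can end up not being an NCName, so the output still needs a coercion step before it can be used as an XML element name.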

I haven't actually tried it, but on the face of it, this kind of mapping seems amenable to validation with RELAX NG schemas.

And, as mentioned before, this breaks when:
 1) The local name is not an NCName.
 2) textContent in HTML contains non-XML characters.

Use the infoset coercion rules for those. However, the Uhhhhhh notation can then collide with real property names, because microdata property names aren't lowercased.
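
A small Python sketch of the collision, assuming the coercion replaces each disallowed character with "U" followed by six uppercase hex digits of its code point. The function name coerce_local_name is hypothetical, and the name-character check is deliberately simplified to ASCII, so this is not a full NCName implementation.

```python
import re

# Simplified, ASCII-only approximation of NCName start and name characters.
# Assumption: real NCName rules also allow many non-ASCII characters.
_NAME_START = re.compile(r"[A-Za-z_]")
_NAME_CHAR = re.compile(r"[A-Za-z0-9_.\-]")


def coerce_local_name(name):
    """Replace characters not allowed at their position with Uhhhhhh."""
    out = []
    for i, ch in enumerate(name):
        pattern = _NAME_START if i == 0 else _NAME_CHAR
        if pattern.fullmatch(ch):
            out.append(ch)
        else:
            # 'U' plus six uppercase hex digits of the code point.
            out.append("U%06X" % ord(ch))
    return "".join(out)


if __name__ == "__main__":
    # Two distinct property names...
    a = coerce_local_name("foo<bar")
    b = coerce_local_name("fooU00003Cbar")
    # ...end up as the same element name.
    print(a, b, a == b)  # fooU00003Cbar fooU00003Cbar True
```

If names were lowercased before coercion, a literal uppercase "U" could not appear in the input and the two names would stay distinct. Microdata property names keep their case, so nothing rules the second name out.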

--
Henri Sivonen
[email protected]
http://hsivonen.iki.fi/

