Re: XMLLiteral handling in RDFa in HTML

Shelley Powers Tue, 26 May 2009 04:45:37 -0700

Philip Taylor wrote:

Manu Sporny wrote:
[...]
[17:37:12] Shane McCarron: we have no option about well formedness of
xml literals.  its a requirement
[17:37:18] … not our requirement.  RDF
Slightly off-topic: RDF seems to always require canonicalisation (http://www.w3.org/TR/rdf-concepts/#dfn-rdf-XMLLiteral - "encoding as UTF-8 yields exclusive Canonical XML (with comments, with empty InclusiveNamespaces PrefixList)"). http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0011.sparql seems to ignore that requirement since it allows different ways of serialising the XML. Is that intentional?
More general comment: How is this different to XMLLiterals in RDFa-in-XHTML? When you're implementing that, you can't just copy bytes from the data input stream directly - you at least have to insert xmlns declarations to ensure the output is namespace-well-formed. And if the XMLLiteral contains some &entity; that's defined in the XHTML page's DTD then it will have to be expanded out so that it's correct once it's separated from the DTD, and so on. As far as I can see, that's pretty much impossible to implement unless you parse the whole page with an XML parser and then use an XML serialisation algorithm on an element sub-tree to get the XMLLiteral.
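To illustrate the point about xmlns declarations, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The page, the example.org namespace, and the helper name extract_subtree are made up for illustration. ElementTree does not perform canonicalisation and may rewrite prefixes, so this only shows that the namespace declaration has to be re-inserted on the extracted sub-tree, not how a conforming RDFa processor would do it.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML page: the ex: prefix is declared on the root element only.
PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:ex="http://example.org/ns#">'
    '<body><div><ex:note>hi</ex:note></div></body></html>'
)
EX_NS = "http://example.org/ns#"


def extract_subtree(page, ns, local):
    """Parse the whole page, then serialise a single element sub-tree."""
    root = ET.fromstring(page)
    element = root.find(".//{%s}%s" % (ns, local))
    # The serialiser has to emit a namespace declaration on the fragment,
    # because the original one lived on an ancestor that is no longer present.
    return ET.tostring(element, encoding="unicode")


fragment = extract_subtree(PAGE, EX_NS, "note")
print(fragment)
```

Copying the bytes "<ex:note>hi</ex:note>" straight from the input would lose the declaration, and the result would not be namespace-well-formed on its own.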
It seems logical to me that RDFa-in-HTML should work the same way - you parse the whole page with an HTML parser and then use exactly the same XML serialisation algorithm as before.
(I understand that it may be impossible to specify that behaviour if you're relying on HTML4, since HTML4 doesn't specify how to parse into a structure that can be serialised as XML; but that's why I'd want to base it on the HTML5 parsing algorithm instead, which makes this all quite easy :-) )
[17:36:32] … I think we have two really nasty issues right now: The
first being xmlns + case sensitivity: both of which are fixed if we move
to @prefix and declare that prefix is always case-sensitive (and
implement the legacy case-insensitivity stuff for xmlns that we've been
talking about in the community over the past couple of days).
Related to the two issues, a consequence of implementing XMLLiterals using HTML5's parsing and XML-serialisation algorithms is that content like:
  <div property="..."><span xmlns:foo="..."></span></div>
would fail to generate an XMLLiteral. The 'xmlns:foo' gets parsed into an attribute with local name "xmlns:foo" in no namespace. That local name is not an NCName, so it's impossible to serialise as XML, and the XML serialisation algorithm will fail.
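The NCName point can be checked with a small Python sketch. The pattern below is an ASCII-only simplification of the NCName production (the real production also allows many non-ASCII characters), and the function name is_ncname_ascii is made up for illustration. The part that matters here is that a colon is never allowed.

```python
import re

# ASCII-only approximation of NCName: a letter or underscore, followed by
# letters, digits, '.', '-' or '_'. A colon is not permitted anywhere.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ncname_ascii(name):
    """Return True if name matches the simplified NCName pattern."""
    return NCNAME_ASCII.fullmatch(name) is not None


print(is_ncname_ascii("foo"))        # local name an XML parser would produce
print(is_ncname_ascii("xmlns:foo"))  # local name the HTML5 parser produces
```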
Some possible solutions for this issue:
* Change the HTML5 parsing algorithm so xmlns:foo gets local name "foo" in the XML Namespaces namespace. (That seems very unlikely to happen, because of backward-compatibility issues with existing content.)
* Add some ugly hacks in the serialisation process, e.g. find all attributes named "xmlns:foo" and pretend they were called "foo" in the XML Namespaces namespace while serialising.
* Don't support XMLLiterals.

Not supporting XMLLiterals in HTML would, I think, be the safest approach. What is supported in HTML is already a subset of what's supported in XHTML because of the former's inherent limitations. This approach wouldn't add a burden that folks aren't already operating under because they're using HTML.

* Discourage the use of xmlns:foo attributes, and replace them with @prefix or something.
(In all but the last of those cases, xmlns:foo would still be a problem for any other tool that attempts to convert HTML to XML (using HTML5's parsing rules), e.g. http://services.philip.html5.org/html-to-xhtml/ strips out the attributes entirely because they can't be represented in XML.)
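The second option in the list above (the serialisation-time hack) could be sketched roughly as follows in Python. It assumes the HTML parser hands over an element's no-namespace attributes as (local name, value) pairs. That data model and the function name split_xmlns_attributes are assumptions made for illustration, not anything defined by HTML5 or RDFa.

```python
XMLNS_PREFIX = "xmlns:"


def split_xmlns_attributes(attrs):
    """Separate 'xmlns:foo' attributes from ordinary ones.

    attrs is a list of (local_name, value) pairs as an HTML5 parser would
    report them, with 'xmlns:foo' kept as a single local name.
    Returns (declarations, remaining): declarations maps prefix -> namespace
    URI, remaining keeps the other attributes in their original order.
    """
    declarations = {}
    remaining = []
    for name, value in attrs:
        if name.startswith(XMLNS_PREFIX) and len(name) > len(XMLNS_PREFIX):
            # Pretend the attribute was called 'foo' in the XML Namespaces
            # namespace, i.e. treat it as a namespace declaration.
            declarations[name[len(XMLNS_PREFIX):]] = value
        else:
            remaining.append((name, value))
    return declarations, remaining


decls, rest = split_xmlns_attributes(
    [("xmlns:foo", "http://example.org/foo#"), ("class", "x")]
)
print(decls, rest)
```

A serialiser could then emit the declarations as real namespace declarations and the remaining attributes as usual. Other HTML-to-XML tools that don't apply the same hack would still hit the original problem.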

Folks have to remember that adding a new attribute to handle differences between HTML and XHTML is probably the more extreme option, and will lead to confusion, as well as breakage when people use XHTML, but serve pages up as HTML. Which many, many people do now.

Supporting a subset of RDFa in HTML would, to me, be better than supporting a different version of RDFa.


Shelley
