On Wed, 17 Jun 2009 13:47:12 +0200, Jonathan Rees <[email protected]> wrote:
> I don't see how your answer or the linked documents bear on my
> question, so let me amplify.
>
> The ideal situation: you can take any HTML5 document, convert it to
> some XML-based language designed for the purpose (not necessarily
> XHTML), convert it back, and get a semantically equivalent HTML5
> document.

The parser of the HTML syntax is Turing-complete so that will not work.
(You can inject characters into the tokenizer.)

> The problem I'm worried about is the lack of interoperability between
> HTML5 and XML processors. (It has nothing to do with browsers.) Other
> specs such as OWL 2 and XQuery have addressed this problem by
> providing XML syntax as an alternative. But this only achieves the
> intended effect if semantics-preserving round trips work.
>
> For comparison, 'tidy' provides conversion from HTML4 to XHTML (I
> think), and the resulting XHTML is in a subset (I think) of HTML4, so
> the round trip property holds. I assume this approach doesn't work for
> HTML5, which is why I do not necessarily have XHTML in mind as the
> representation.

If 'tidy' is good enough and you consider it working I do not see why that
would not work for HTML5.

-- 
Anne van Kesteren
http://annevankesteren.nl/
