Henri Sivonen wrote:
>
> Moreover, trying to implement exclusions in the main RELAX NG
> schema causes the number of grammar productions to roughly double per
> each exclusion pair. This kind of growth is not manageable in a
> human-written schema.

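To make the doubling concrete, here is a minimal Python sketch (not RELAX NG itself) that lists the definitions a single element would need if every combination of inherited restrictions had to be spelled out by hand. The facet names and the dotted naming scheme are hypothetical, chosen only for illustration.

```python
from itertools import product


def variant_names(element, facets):
    """Return one definition name per on/off combination of facets.

    Each facet stands for a restriction inherited from an ancestor
    (hypothetical examples: "strict", "nointeractive").
    """
    names = []
    for flags in product((False, True), repeat=len(facets)):
        # Keep only the facets that are switched on in this combination.
        active = [facet for facet, on in zip(facets, flags) if on]
        names.append(".".join([element, *active]))
    return names


if __name__ == "__main__":
    # Two facets need four definitions of <em>; a third facet needs eight.
    for name in variant_names("em", ["strict", "nointeractive"]):
        print(name)
```

Every added facet doubles the length of the list, and that applies to each element that can appear in the affected content.
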
I wrote an email about this when I was drafting the first parts of the HTML5 schema back in 2005. The old relaxng mailing list system was down, though, so I never sent it...

Basically, I think RelaxNG needs to be able to intersect two grammars: to say that an element conforms if its content model matches *both* pattern A and pattern B.

~fantasai

-------- Original Message --------
Subject: Relax NG limitations
Date: Tue, 23 Aug 2005 09:07:42 -0400
From: fantasai <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
CC: Henri Sivonen <[EMAIL PROTECTED]>

Granted, I'm having trouble understanding the specs, so I might've missed something, but I've run up against two unsolvable problems while trying to write a Relax NG schema for (x)HTML 5.

The first problem I'm running into is the same one James Clark apparently ran into when writing the Relax NG schema for XHTML: there is no good way to express exclusions without validating against multiple schemas.

  http://www.thaiopensource.com/relaxng/xhtml/

It's possible to express a schema that validates a document if it matches at least *one* of several grammars. If it were possible to express a schema that validates a document only if it matches *all* of several grammars, then that would solve the problem.

At the moment, you need to script the validator to check several schemas in sequence. While this method makes it easy to incorporate other forms of validation in the validator, it's not conducive to other uses of the Relax NG schema itself (e.g. guiding editors).

Of course, a real exclusion pattern (the element's model must NOT match this pattern) would make the language much more powerful and convenient than simply relying on name classes with exceptions to do negation.

The second problem is the unquantifiable nature of the 'text' pattern combined with the restriction on mixing data and elements.
I can't figure out any way to express a *non-empty* requirement in Relax NG:

  # Some elements are defined to have as a content model significant
  # inline content. This means that at least one descendant of the
  # element must be significant text or embedded content.
  #
  # Significant text, for the purposes of determining the presence
  # of significant inline content, consists of any character other
  # than those falling in the Unicode categories Zs, Zl, Zp, Cc,
  # and Cf. [UNICODE]

Even without the Unicode-level restrictions, I don't see any way of requiring an element to have *some* inline content. Perhaps a 'requiredText' pattern that didn't have 'text's implicit 'zeroOrMore' would help here. (It would have to count white space as insufficient text.)

**************************************************************************

The difficult characteristic of HTML 5, schema-wise, is that it propagates several different content model restrictions *through* descendants: interactive elements cannot have interactive descendants, and many elements can be used in two different inline model contexts: structured and strict.

Currently I have four different inline content models going in parallel, with four definitions of each inline element. Adding in another facet would push that up to eight, which is too unwieldy. Try to add in the "significant content" requirement, which is applied to elements like <p> and <a>, plus another handful of random per-element exclusions, and a Relax NG schema without some kind of grammar intersection quickly becomes very impractical.

Given a requiredText pattern, it would be possible to construct a SignificantContent pattern, using wildcards and a class of "significant content elements" (requiredText | img.elem | object.elem | etc.), that expresses the requirement that there be at least one of the significant content elements as a descendant.
Intersecting that with the normal content model requirements of <p> and <a> would express that conformance requirement without placing a heavy burden on the rest of the schema.

Intersection would also let me do things like require a common optional attribute on a particular element (e.g. 'dir' on <bdo>) while still using the named common attribute collection (instead of duplicating the common attributes specially just for <bdo>).

~fantasai
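
As a rough illustration of what such an intersection would mean, here is a minimal Python sketch of the "check several things in sequence" approach described above. It is not RELAX NG and not any validator's API: an element passes only if it matches *all* of the given checks. The names INLINE, EMBEDDED, only_inline_children and matches_all are hypothetical, and the embedded-content list is an assumption, since the list in the message ends in "etc.".

```python
import unicodedata
import xml.etree.ElementTree as ET

# Unicode categories that do NOT count as significant text (from the quoted spec).
INSIGNIFICANT = {"Zs", "Zl", "Zp", "Cc", "Cf"}
# Assumed set of "significant content elements"; the real list is longer.
EMBEDDED = {"img", "object"}
# Hypothetical stand-in for the normal inline content model.
INLINE = {"a", "em", "img", "object"}


def has_significant_text(text):
    """True if any character falls outside the insignificant categories."""
    return any(unicodedata.category(ch) not in INSIGNIFICANT for ch in text or "")


def has_significant_content(elem):
    """True if some descendant is significant text or embedded content."""
    if any(has_significant_text(chunk) for chunk in elem.itertext()):
        return True
    return any(d.tag in EMBEDDED for d in elem.iter() if d is not elem)


def only_inline_children(elem):
    """Toy content model: every child must be a known inline element."""
    return all(child.tag in INLINE for child in elem)


def matches_all(elem, patterns):
    """Intersection: the element must satisfy every pattern."""
    return all(pattern(elem) for pattern in patterns)


if __name__ == "__main__":
    checks = [only_inline_children, has_significant_content]
    for source in ("<p><em>hi</em></p>", "<p> <em>\n</em></p>", "<p><img/></p>"):
        print(source.replace("\n", "\\n"), matches_all(ET.fromstring(source), checks))
```

The significant-content check stays in one place and is combined with whatever the ordinary content model is, which is the property a schema-level intersection would give without multiplying definitions.
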
