Henri Sivonen wrote:
> 
> Moreover, trying to implement exclusions in the main RELAX NG
> schema causes the number of grammar productions to roughly double for
> each exclusion pair. This kind of growth is not manageable in a
> human-written schema.

I wrote an email about this when I was drafting the first parts of
the HTML5 schema back in 2005. The old relaxng mailing list system
was down, though, so I never sent it...

Basically, I think Relax NG needs to be able to intersect two grammars:
to say that an element conforms if its content model matches *both*
pattern A and pattern B.

~fantasai

-------- Original Message --------
Subject: Relax NG limitations
Date: Tue, 23 Aug 2005 09:07:42 -0400
From: fantasai <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED],  [EMAIL PROTECTED],  [EMAIL PROTECTED], 
[EMAIL PROTECTED]
CC: Henri Sivonen <[EMAIL PROTECTED]>

Granted, I'm having trouble understanding the specs, so I might've
missed something, but I've run up against two unsolvable problems
while trying to write a Relax NG schema for (x)HTML 5.


The first problem I'm running into is the same one James Clark
apparently ran into when writing the Relax NG schema for XHTML:
there is no good way to express exclusions without validating
against multiple schemas.
   http://www.thaiopensource.com/relaxng/xhtml/

It's possible to express a schema that validates a document if it
matches at least *one* of several grammars. If it were possible to
express a schema that validates a document only if it matches *all*
of several grammars, then that would solve the problem. At the
moment, you need to script the validator to check several schemas
in sequence. While this method makes it easy to incorporate other
forms of validation in the validator, it's not conducive to other
uses of the Relax NG schema itself (e.g. guiding editors).
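
As a rough illustration of the "check several schemas in sequence"
workaround, here is a minimal Python sketch. It does not use a real
Relax NG validator: the helpers only_known_elements, no_nested and
validate_all are hypothetical stand-ins for separate schemas, and the
element names are chosen purely for the example. The point is only that
a document passes when *every* check accepts it, which is the
intersection the schema language itself cannot express.

```python
import xml.etree.ElementTree as ET


def only_known_elements(allowed):
    """Stand-in for a 'main' schema: every element name must be in `allowed`."""
    def check(root):
        return all(elem.tag in allowed for elem in root.iter())
    return check


def no_nested(tag):
    """Stand-in for an exclusion schema: no <tag> may have a <tag> descendant."""
    def check(root):
        for elem in root.iter(tag):
            # elem.iter(tag) yields elem itself first, so ignore that hit.
            if any(d is not elem for d in elem.iter(tag)):
                return False
        return True
    return check


def validate_all(root, checks):
    """Intersection: valid only if every individual check accepts the document."""
    return all(check(root) for check in checks)


# Hypothetical pair of "schemas": a permissive main one plus one exclusion.
checks = [only_known_elements({"p", "a", "em"}), no_nested("a")]

ok = ET.fromstring("<p><a><em>fine</em></a></p>")
bad = ET.fromstring("<p><a><em><a>nested</a></em></a></p>")

print(validate_all(ok, checks))
print(validate_all(bad, checks))
```

Each exclusion stays a small, separate check, so the main check does not
have to be duplicated per exclusion. The cost is the one described above:
the combined rule lives in the script, not in a schema an editor could
read.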

Of course, a real exclusion pattern (the element's model must NOT
match this pattern) would make that much more powerful and convenient
than simply relying on name classes with exceptions to do negation.


The second problem is the unquantifiable nature of the 'text' pattern
combined with the restriction on mixing data and elements. I can't
figure out any way to express a *non-empty* requirement in Relax NG:

   # Some elements are defined to have as a content model significant
   # inline content. This means that at least one descendant of the
   # element must be significant text or embedded content.
   #
   # Significant text, for the purposes of determining the presence
   # of significant inline content, consists of any character other
   # than those falling in the Unicode categories Zs, Zl, Zp, Cc,
   # and Cf. [UNICODE]

Even without the Unicode-level restrictions, I don't see any way of
requiring an element to have *some* inline content. Perhaps a
'requiredText' pattern that didn't have 'text's implicit 'zeroOrMore'
would help here. (It would have to count white space as insufficient
text.)
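
To make the quoted definition concrete, here is a minimal Python sketch
of the check done outside the schema. The function names and the
EMBEDDED set are assumptions for illustration only (the real list of
embedded content elements is longer); the five Unicode categories are
the ones named in the quoted text.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Unicode general categories that do NOT count as significant text.
INSIGNIFICANT = {"Zs", "Zl", "Zp", "Cc", "Cf"}

# Assumed, deliberately incomplete stand-in for "embedded content".
EMBEDDED = {"img", "object"}


def is_significant_text(text):
    """True if any character falls outside the insignificant categories."""
    return any(unicodedata.category(ch) not in INSIGNIFICANT
               for ch in (text or ""))


def has_significant_content(elem):
    """True if elem contains significant text or an embedded-content descendant."""
    if is_significant_text(elem.text):
        return True
    for child in elem:
        if child.tag in EMBEDDED or has_significant_content(child):
            return True
        # Text following the child still belongs to elem's content.
        if is_significant_text(child.tail):
            return True
    return False


print(has_significant_content(ET.fromstring("<p> \n\t </p>")))
print(has_significant_content(ET.fromstring("<p><em>x</em></p>")))
print(has_significant_content(ET.fromstring("<p><img/></p>")))
```

Spaces, tabs and newlines fall in Zs and Cc, so an element holding only
white space is rejected, which is the behaviour a 'requiredText' pattern
would need.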

**************************************************************************

The difficult characteristic of HTML 5, schema-wise, is that it propagates
several different content model restrictions *through* descendants:
Interactive elements cannot have interactive descendants, and many
elements can be used in two different inline model contexts: structured
and strict. Currently I have four different inline content models going
in parallel, with four definitions of each inline element. Adding in
another facet would push that up to eight, which is too unwieldy. Add
in the "significant content" requirement, which is applied to
elements like <p> and <a>, plus another handful of random per-element
exclusions, and a Relax NG schema without some kind of grammar intersection
quickly becomes very impractical.
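
The doubling can be sketched by enumerating the combinations. This small
Python example assumes, for illustration, that the four parallel models
come from two two-valued facets (structured/strict, and whether an
interactive ancestor is present); the facet names and the third facet
are hypothetical.

```python
from itertools import product


def variants(facets):
    """Every combination of facet values; each one needs its own
    definition of every inline element in the schema."""
    return list(product(*facets.values()))


# Assumed facets behind the four parallel inline content models.
facets = {
    "inline_model": ["structured", "strict"],
    "interactive_ancestor": [False, True],
}
print(len(variants(facets)))  # 4

# One more two-valued facet doubles the number of definitions.
facets["extra_facet"] = [False, True]
print(len(variants(facets)))  # 8
```

With n independent two-valued facets the schema needs 2**n copies of
each inline element, whereas intersecting n separate grammars would grow
only linearly.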

Given a requiredText pattern, it would be possible to construct a
SignificantContent pattern, using wildcards and a class of "significant
content elements" (requiredText | img.elem | object.elem | etc.), that
expresses the requirement that there be at least one of the significant
content elements as a descendant. Intersecting that with the normal
content model requirements of <p> and <a> would express that conformance
requirement without placing a heavy burden on the rest of the schema.

Intersection would also let me do things like require a common optional
attribute on a particular element (e.g. 'dir' on <bdo>) while still
using the named common attribute collection (instead of duplicating the
common attributes specially just for <bdo>).


~fantasai
