comments on XHTML Modularization 1.1 from XML Schema WG

C. M. Sperberg-McQueen Tue, 27 Feb 2007 14:46:06 -0800


Dear colleagues:


On behalf of the XML Schema Working Group, I congratulate the
HTML Working Group on your progress with XHTML Modularization.

As described in the comments below, owing to a snafu
the XML Schema WG did not review the Last Call WD of XHTML
Modularization 1.1 last summer.  In the hopes that the maxim
"better late than never" is true in this case, we transmit
to you now our comments on the document.  My apologies for
the snafu.

Our comments are available at any of the URIs

  http://www.w3.org/XML/Group/2007/02/m12n-of-xhtml.xsd-comments
  http://www.w3.org/XML/Group/2007/02/m12n-of-xhtml.xsd-comments.xml
  http://www.w3.org/XML/Group/2007/02/m12n-of-xhtml.xsd-comments.html

A text version is provided below for those who find it more
convenient.

--C. M. Sperberg-McQueen
  on behalf of the W3C XML Schema WG




Notes on

XHTML Modularization 1.1

   Ed. by

C. M. Sperberg-McQueen

Submitted to the HTML Working Group on behalf of the XML Schema Working
Group

27 February 2007

$Id: m12n-of-xhtml.xsd-comments.html,v 1.1 2007/02/27 22:36:18 cmsmcq
Exp $
     _________________________________________________________

     * 1. [7]Background
     * 2. [8]Substantive comments
          + 2.1. [9]Charset type
          + 2.2. [10]Color type
          + 2.3. [11]ContentType
          + 2.4. [12]Coords type
          + 2.5. [13]FPI type
          + 2.6. [14]FrameTarget type
          + 2.7. [15]LinkTypes type
          + 2.8. [16]Tightening other types
          + 2.9. [17]Named model groups vs. substitution groups
          + 2.10. [18]Adding attributes
          + 2.11. [19]A missing scenario
     * 3. [20]Editorial comments
          + 3.1. [21]Make the introduction less DTD-specific
          + 3.2. [22]The term PCDATA
          + 3.3. [23]Section 4.3 Attribute Types
          + 3.4. [24]Length type: well done
          + 3.5. [25]Shape type
          + 3.6. [26]White space in the document source
     * 4. [27]Comments half substantive and half editorial
          + 4.1. [28]Testing the schema documents
          + 4.2. [29]Where is the html element?
          + 4.3. [30]Case insensitivity and XML Schema patterns or
            enumerations
     _________________________________________________________

   NOTE:
   This document contains comments on the [31]Last Call Working Draft
   of XHTML™ Modularization 1.1. Several different readers formulated
   the comments; the editor has not attempted to unify and organize
   them strictly. The comments are forwarded to the XHTML Working Group
   on behalf of the XML Schema Working Group, but it should be noted
   that the XML Schema Working Group has not had the leisure to
   consider them in detail.

   The Last Call comment period on this draft ended 4 August 2006, so
   these comments are very late. They are being forwarded nonetheless
   in the hopes that even at this late date they may prove useful to
   those responsible for the XHTML Modularization spec.

   To minimize wasted effort, the copy actually consulted is the
   [32]editor's copy of 19 February 2007.

     [31] http://www.w3.org/TR/2006/WD-xhtml-modularization-20060705

[32] http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/introduction.html


1. Background

   Owing apparently to human error, the XML Schema Working Group failed
   to attend to the publication of the Last Call draft of [33]XHTML
   Modularization 1.1, and consequently failed to review the spec
   during the scheduled last-call comment period.

   We apologize for this oversight; our chair has administered severe
   counseling to our staff contact, and our staff contact has promised
   he will endeavor not to make similar mistakes in future.

   Since HTML and XHTML constitute by far the most widely used
   vocabularies published by any W3C Working Group, the Schema Working
   Group has a deep interest in making sure the formulations of XHTML
   using XML Schema are as useful as possible.

   The following comments have been prepared in haste, in an attempt to
   perform as useful a review as possible.

   The Schema Working Group's previous comments (apparently on the
   [34]Last Call draft of 9 December 2002) are at
   <URL:[35]http://www.w3.org/XML/Group/2003/01/xmlschema-notes-on-xhtml-modularization.html>
   and were transmitted to the HTML WG in
   <URL:[36]http://lists.w3.org/Archives/Public/www-html-editor/2003JanMar/0043.html>
   and
   <URL:[37]http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2003Jan/0099.html>.

   A quick summary of the earlier comments:

    1. Please use the appropriate simple types.
    2. Exploit substitution groups.
    3. Explain what to do about multiple schemas for same namespace.
    4. Don't declare everything blocked and final!
    5. Sec 2.2.6 is opaque.
    6. Point to external documentation.
    7. Provide internal documentation.
    8. Clarify conformance.
    9. More concrete extension scenarios.
   10. Exhibit structure of schema better.

     [33] http://www.w3.org/TR/2006/WD-xhtml-modularization-20060705
     [34] http://www.w3.org/TR/2002/WD-xhtml-m12n-schema-20021209/

     [35] http://www.w3.org/XML/Group/2003/01/xmlschema-notes-on-xhtml-modularization.html
     [36] http://lists.w3.org/Archives/Public/www-html-editor/2003JanMar/0043.html
     [37] http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2003Jan/0099.html


   It appears that the current document addresses a number of these
   comments very directly; others less so or not at all.
   The XML Schema Working Group appears not to have reviewed or sent
   comments on the later working drafts of [38]3 October 2003 or [39]13
   February 2006.

     [38] http://www.w3.org/TR/2003/WD-xhtml-m12n-schema-20031003/
     [39] http://www.w3.org/TR/2006/PR-xhtml-modularization-20060213/

2. Substantive comments

   The following comments are substantive in the sense that they
   propose changes which would affect the validity of some documents in
   the XHTML family. Whether they are substantive in the sense that
   they would invalidate existing reviews of the Modularization
   document, we leave to others to decide.

2.1. Charset type

   Charset is defined as a vacuous restriction of xsd:string. That may
   be the right thing to do, but it seems likely that a better
   definition can be formulated. First, RFC 2045 defines charset values
   as either tokens or quoted-strings; it defines token as containing
   only ASCII characters and it seems to take over the definition of
   quoted-string from RFC 822, which defines quoted-string as containing
   only ASCII characters. So a better definition of Charset might be

 <xsd:simpleType name="Other-Charset-identifier">
  <xsd:annotation>
   <xsd:documentation>
    <div xmlns="http://www.w3.org/1999/xhtml">
     <p>Charset values predefined by RFC 2046.  The RFC
      restricts these values to ASCII characters,
      i.e. those in the Unicode BasicLatin block.</p>
    </div>
   </xsd:documentation>
  </xsd:annotation>
  <xsd:restriction base="xsd:string">
   <!-- one or more characters, all from the BasicLatin block -->
   <xsd:pattern value="\p{IsBasicLatin}+"/>
  </xsd:restriction>
 </xsd:simpleType>

   The IANA registry seems to say that in fact charset identifiers are
   limited to 40 characters, but it's not clear whether that rule is
   intended by the XHTML spec to be binding on Charset values in HTML
   documents.
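
   If the 40-character limit is meant to apply, it could be layered on
   top of the type above rather than folded into the pattern. A minimal
   sketch, assuming the limit is binding (which, as noted, is not
   clear); the type name Short-Charset-identifier is hypothetical:

 <xsd:simpleType name="Short-Charset-identifier">
  <xsd:restriction base="xh11d:Other-Charset-identifier">
   <!-- assumed limit, taken from the IANA registry's rules -->
   <xsd:maxLength value="40"/>
  </xsd:restriction>
 </xsd:simpleType>

   Keeping the length limit in a separate derived type leaves the
   ASCII-only rule usable on its own if the limit turns out not to be
   intended.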

   Another point is that it might be more helpful for readers (and
   possibly implementors) to define the type in such a way as to
   identify at least some of the well-known identifiers which user
   agents should recognize — e.g. those mentioned in RFC 2046 — as well
   as others. One way to do this would be to define a type listing the
   charset values identified in RFC 2046, and then define a union of
   that type with xsd:string. The well-known charset values can be
   enumerated:

 <xsd:simpleType name="RFC2046-Predefined-charsets">
  <xsd:annotation>
   <xsd:documentation>
    <div xmlns="http://www.w3.org/1999/xhtml">
     <p>Charset values predefined by RFC 2046.  Other
      values are also accepted as charset values.</p>
    </div>
   </xsd:documentation>
  </xsd:annotation>
  <xsd:restriction base="xsd:string">
   <xsd:enumeration value="US-ASCII">
    <xsd:annotation>
     <xsd:documentation>As defined in ANSI X3.4-1986.</xsd:documentation>
    </xsd:annotation>
   </xsd:enumeration>
   <xsd:enumeration value="ISO-8859-1"/>
   <xsd:enumeration value="ISO-8859-2"/>
   <xsd:enumeration value="ISO-8859-3"/>
   <xsd:enumeration value="ISO-8859-4"/>
   <xsd:enumeration value="ISO-8859-5"/>
   <xsd:enumeration value="ISO-8859-6"/>
   <xsd:enumeration value="ISO-8859-7"/>
   <xsd:enumeration value="ISO-8859-8"/>
   <xsd:enumeration value="ISO-8859-9"/>
   <xsd:enumeration value="ISO-8859-10"/>
  </xsd:restriction>
 </xsd:simpleType>

   The problem with this is that the RFCs define charset values as
   case-insensitive. So probably a better way to define the well known
   charset values would be with patterns:

 <xsd:simpleType name="RFC2046-Predefined-charsets">
  <xsd:annotation>
   <xsd:documentation>
    <div xmlns="http://www.w3.org/1999/xhtml">
     <p>Charset values predefined by RFC 2046.  Other
      values are also accepted.</p>
    </div>
   </xsd:documentation>
  </xsd:annotation>
  <xsd:restriction base="xsd:string">
   <xsd:whiteSpace value="collapse"/>
   <xsd:pattern value="[Uu][Ss]-[Aa][Ss][Cc][Ii][Ii]">
    <xsd:annotation>
     <xsd:documentation>As defined in ANSI X3.4-1986.</xsd:documentation>
    </xsd:annotation>
   </xsd:pattern>
   <xsd:pattern value="[Ii][Ss][Oo]-8859-(10|[1-9])">
    <xsd:annotation>
     <xsd:documentation>ISO-8859 parts 1-10.</xsd:documentation>
    </xsd:annotation>
   </xsd:pattern>
  </xsd:restriction>
 </xsd:simpleType>

   The actual definition of Charset could usefully be a union of these
   two:

 <xsd:simpleType name="Charset">
  <xsd:annotation>
   <xsd:documentation>
    <div xmlns="http://www.w3.org/1999/xhtml">
     <p>Charset values.  Accept values predefined by RFC 2046,
      and also other values.</p>
    </div>
   </xsd:documentation>
  </xsd:annotation>
  <xsd:union memberTypes="
   xh11d:RFC2046-Predefined-charsets
   xh11d:Other-Charset-identifier
   ">
  </xsd:union>
 </xsd:simpleType>

   A more ambitious definition might mention all of the values in the
   IANA type registry, but the result, when examined, is rather long
   and not really very informative (rather like the registry itself),
   and it is not included here.

2.2. Color type

   Two things seem puzzling in the current definition of Color: (1) it
   allows any NMTOKEN, rather than just the sixteen well known color
   names. And (2) while six-digit hexadecimal values are allowed,
   three-digit values are not allowed. (The description of Color in
   HTML 4.01 (<URL:[40]http://www.w3.org/TR/html401/types.html#h-6.5>)
   doesn't actually specify how many digits are to be used for hex
   color values.)

   If these properties are unintentional, a type that identifies the
   well-known names and allows three-digit hex values may be better:

 <!-- sixteen color names or RGB color expression-->
 <xsd:simpleType name="Color">
  <xsd:union>
   <xsd:simpleType>
    <!--* Known color names are case-insensitive *-->
    <xsd:restriction base="xsd:NMTOKEN">
     <xsd:pattern value="[Bb][Ll][Aa][Cc][Kk]"/>
     <xsd:pattern value="[Gg][Rr][Ee][Ee][Nn]"/>
     <xsd:pattern value="[Ss][Ii][Ll][Vv][Ee][Rr]"/>
     <xsd:pattern value="[Ll][Ii][Mm][Ee]"/>
     <xsd:pattern value="[Gg][Rr][Aa][Yy]"/>
     <xsd:pattern value="[Oo][Ll][Ii][Vv][Ee]"/>
     <xsd:pattern value="[Ww][Hh][Ii][Tt][Ee]"/>
     <xsd:pattern value="[Yy][Ee][Ll][Ll][Oo][Ww]"/>
     <xsd:pattern value="[Mm][Aa][Rr][Oo][Oo][Nn]"/>
     <xsd:pattern value="[Nn][Aa][Vv][Yy]"/>
     <xsd:pattern value="[Rr][Ee][Dd]"/>
     <xsd:pattern value="[Bb][Ll][Uu][Ee]"/>
     <xsd:pattern value="[Pp][Uu][Rr][Pp][Ll][Ee]"/>
     <xsd:pattern value="[Tt][Ee][Aa][Ll]"/>
     <xsd:pattern value="[Ff][Uu][Cc][Hh][Ss][Ii][Aa]"/>
     <xsd:pattern value="[Aa][Qq][Uu][Aa]"/>
    </xsd:restriction>
   </xsd:simpleType>
   <xsd:simpleType>
    <!--* Other numbers are expressed using a hash mark plus a
        * three- or six-digit hexadecimal number *-->
    <xsd:restriction base="xsd:token">
     <xsd:pattern value="#[0-9a-fA-F]{3}([0-9a-fA-F]{3})?"/>
    </xsd:restriction>
   </xsd:simpleType>
  </xsd:union>
 </xsd:simpleType>

     [40] http://www.w3.org/TR/html401/types.html#h-6.5

   If it's desired to allow other NMTOKEN values to count as valid, as
   well as the sixteen named by HTML 4.01 (e.g. for the system colors
   allowed by CSS2,
   <URL:[41]http://www.w3.org/TR/REC-CSS2/syndata.html#value-def-color>),
   then inserting

   <xsd:simpleType>
    <xsd:restriction base="xsd:NMTOKEN"/>
   </xsd:simpleType>

     [41] http://www.w3.org/TR/REC-CSS2/syndata.html#value-def-color

   as a final union member would do that. (Since the system colors of
   CSS2 appear to be a finite enumerated list, they could be defined in
   the same way as the sixteen names in HTML 4.01, although for clarity
   they should probably go into a different member type. That's left as
   an exercise for the reader.)
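
   As a starting point for that exercise, such a member type might
   look something like the following. This is only a sketch: three of
   the CSS2 system color names are shown, the type name is
   hypothetical, and the patterns assume that these keywords are to be
   matched case-insensitively, like the sixteen HTML names.

 <xsd:simpleType name="CSS2-SystemColor">
  <xsd:restriction base="xsd:NMTOKEN">
   <!-- illustrative subset only; the remaining names follow the
        same pattern style -->
   <xsd:pattern value="[Bb][Uu][Tt][Tt][Oo][Nn][Ff][Aa][Cc][Ee]"/>
   <xsd:pattern value="[Ww][Ii][Nn][Dd][Oo][Ww]"/>
   <xsd:pattern value="[Ww][Ii][Nn][Dd][Oo][Ww][Tt][Ee][Xx][Tt]"/>
  </xsd:restriction>
 </xsd:simpleType>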

2.3. ContentType

   Like Charset, this could be defined as a union whose first member(s)
   recognize well-known values defined by the RFCs or in the IANA
   registry and whose final type (here xsd:string) takes care of
   extensibility. It's not clear to me whether the values are in fact
   limited by the RFC to ASCII characters; if so, xsd:string is a bit
   too broad.
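
   By analogy with Charset, such a union might be sketched as follows.
   The type name Known-ContentType and the choice of the three media
   types shown are assumptions made for illustration; the patterns
   reflect the fact that media type names are case-insensitive.

 <xsd:simpleType name="Known-ContentType">
  <xsd:restriction base="xsd:string">
   <xsd:pattern value="[Tt][Ee][Xx][Tt]/[Pp][Ll][Aa][Ii][Nn]"/>
   <xsd:pattern value="[Tt][Ee][Xx][Tt]/[Hh][Tt][Mm][Ll]"/>
   <xsd:pattern value="[Tt][Ee][Xx][Tt]/[Cc][Ss][Ss]"/>
  </xsd:restriction>
 </xsd:simpleType>

 <xsd:simpleType name="ContentType">
  <!-- xsd:string as the last member keeps the type extensible -->
  <xsd:union memberTypes="xh11d:Known-ContentType xsd:string"/>
 </xsd:simpleType>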

2.4. Coords type

   Since the possible values of Coords are so clearly specified
   in the spec, it seems a shame not to define the type a little more
   tightly. The absence of macros in XML Schema regular expressions
   makes life a little harder, but one reason XML Schema doesn't need
   macros in regexes is that we can use general entities. If we write
   the following entity declarations into the internal subset of the
   schema document, we have general entities which correspond to the
   important bits of coordinate strings, as defined in HTML
   (<URL:[42]http://www.w3.org/TR/html401/struct/objects.html#adef-coords>):

  <!ENTITY Pixel "\d+">
  <!ENTITY Percent "(\d+[%]|\d*\.\d+[%])">
  <!ENTITY Length "(&Pixel;|&Percent;)">
  <!ENTITY Comma  "\s*,\s*">
  <!ENTITY Pair   "&Length;&Comma;&Length;">

     [42] http://www.w3.org/TR/html401/struct/objects.html#adef-coords

   That allows the declarations to be fairly clear about their
   structure:

 <xsd:simpleType name="Coords.rect">
  <xsd:restriction base="xsd:token">
   <xsd:pattern value="(&Length;&Comma;){3}(&Length;)"/>
  </xsd:restriction>
 </xsd:simpleType>
 <xsd:simpleType name="Coords.circle">
  <xsd:restriction base="xsd:token">
   <xsd:pattern value="(&Length;&Comma;){2}(&Length;)"/>
  </xsd:restriction>
 </xsd:simpleType>
 <xsd:simpleType name="Coords.poly">
  <xsd:restriction base="xsd:token">
   <xsd:pattern value="(&Pair;&Comma;){2,}(&Pair;)"/>
  </xsd:restriction>
 </xsd:simpleType>

   If they prove to cause trouble for any schema processors, of course,
   the entity references can be expanded.

   And the Coords type can be clear that what is expected is either the
   coordinates for a rectangle, or those for a circle, or those for a
   polygon. (Type-aware systems can use the information about which
   member type in the union actually accepted the value to perform a
   sanity check: if the coords attribute has type Coords.rect, then the
   value of the shape attribute had better be 'rect', and vice versa.)

 <xsd:simpleType name="Coords">
  <xsd:union memberTypes="
    xh11d:Coords.rect
    xh11d:Coords.circle
    xh11d:Coords.poly">
  </xsd:union>
 </xsd:simpleType>

2.5. FPI type

   ISO 8879 appears to define the formal public identifier using a
   regular language, which means it's not necessary to allow any
   xsd:normalizedString value. (The formalization below assumes that
   only unregistered owner identifiers are to be used, since section
   3.6 of this spec says the value must begin with '-'.) Building it up
   gradually using entities, one can write:

  <!ENTITY minimum-data "[ a-zA-Z0-9'()+,\-./:=?]*">
  <!ENTITY owner-id   "&minimum-data;">
  <!ENTITY textclass1 "(DTD|ELEMENTS|ENTITIES|NOTATION|TEXT)">
  <!ENTITY textclass2 "(CAPACITY|CHARSET|DOCUMENT|LPD|NONSGML|SHORTREF|SUBDOC|SYNTAX)">
  <!ENTITY textclass  "(&textclass1;|&textclass2;)">
  <!ENTITY textclass  "(&textclass1;|&textclass2;)">

   It's not clear that any of the names in textclass2 make any sense
   whatever for modules intended for use in the XHTML family, so one
   might choose to omit them.

  <!ENTITY langname   "(\i\c*)">
  <!ENTITY designator "&minimum-data;">
  <!ENTITY lang-or-des "(&langname;|&designator;)">
  <!ENTITY display    "&minimum-data;">
  <!ENTITY textdesc   "&minimum-data;">

  <!ENTITY textid "&textclass; (-//)?&textdesc;//&lang-or-des;(//&display;)?">

  <!ENTITY fpi "-//&owner-id;//&textid;">

   The pattern is then quite simple:

 <xsd:simpleType name="FPI">
  <xsd:restriction base="xsd:normalizedString">
   <xsd:pattern value="&fpi;"/>
  </xsd:restriction>
 </xsd:simpleType>

2.6. FrameTarget type

   The HTML spec
   (<URL:[43]http://www.w3.org/TR/html401/types.html#h-6.16>) seems to
   want a slightly tighter definition of frame target names. Perhaps
   something like the following should be used.

 <xsd:simpleType name="FrameTarget">
  <xsd:union>
   <xsd:simpleType>
    <xsd:restriction base="xsd:NMTOKEN">
     <xsd:enumeration value="_blank"/>
     <xsd:enumeration value="_self"/>
     <xsd:enumeration value="_parent"/>
     <xsd:enumeration value="_top"/>
    </xsd:restriction>
   </xsd:simpleType>
   <xsd:simpleType>
    <xsd:restriction base="xsd:string">
     <xsd:pattern value="[a-zA-Z].*"/>
    </xsd:restriction>
   </xsd:simpleType>
  </xsd:union>
 </xsd:simpleType>

     [43] http://www.w3.org/TR/html401/types.html#h-6.16

2.7. LinkTypes type

   LinkTypes is a good example of a type with what is sometimes called
   a ‘semi-open’ list of values. Some set of well-known values is
   defined, which software is encouraged to recognize and which authors
   are encouraged to use when appropriate, but for strict validity, a
   much larger set of values is allowed.

   In such cases, it's good practice to document the recognized types
   in the type definition. Since the well known values here are case
   insensitive, that's best done with a list of patterns rather than
   with an enumeration:

 <xsd:simpleType name="KnownLinkTypes">
  <xsd:restriction base="xsd:NMTOKEN">
   <xsd:pattern value="[Aa][Ll][Tt][Ee][Rr][Nn][Aa][Tt][Ee]"/>
   <xsd:pattern value="[Ss][Tt][Yy][Ll][Ee][Ss][Hh][Ee][Ee][Tt]"/>
   <xsd:pattern value="[Ss][Tt][Aa][Rr][Tt]"/>
   <xsd:pattern value="[Nn][Ee][Xx][Tt]"/>
   <xsd:pattern value="[Pp][Rr][Ee][Vv]"/>
   <xsd:pattern value="[Cc][Oo][Nn][Tt][Ee][Nn][Tt][Ss]"/>
   <xsd:pattern value="[Ii][Nn][Dd][Ee][Xx]"/>
   <xsd:pattern value="[Gg][Ll][Oo][Ss][Ss][Aa][Rr][Yy]"/>
   <xsd:pattern value="[Cc][Oo][Pp][Yy][Rr][Ii][Gg][Hh][Tt]"/>
   <xsd:pattern value="[Cc][Hh][Aa][Pp][Tt][Ee][Rr]"/>
   <xsd:pattern value="[Ss][Ee][Cc][Tt][Ii][Oo][Nn]"/>
   <xsd:pattern value="[Ss][Uu][Bb][Ss][Ee][Cc][Tt][Ii][Oo][Nn]"/>
   <xsd:pattern value="[Aa][Pp][Pp][Ee][Nn][Dd][Ii][Xx]"/>
   <xsd:pattern value="[Hh][Ee][Ll][Pp]"/>
   <xsd:pattern value="[Bb][Oo][Oo][Kk][Mm][Aa][Rr][Kk]"/>
  </xsd:restriction>
 </xsd:simpleType>

 <xsd:simpleType name="LinkTypes">
  <xsd:union memberTypes="xh11d:KnownLinkTypes xsd:NMTOKEN"/>
 </xsd:simpleType>

2.8. Tightening other types

   If we continue in the same way, we risk belaboring our point past
   reason. So instead of commenting in detail on individual types which
   could, it seems to us, usefully be made more restrictive, or more
   informative, or both, by means of enumerations or patterns to
   recognize well known values or unions to combine subtypes (including
   more and less restrictive definitions of a datatype), we will merely
   say that we believe other types should also be given definitions
   closer to the requirements of the prose. (MultiLength, for example,
   is not really that hard to capture with a pattern.)
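
   To make that concrete, here is one possible sketch. It assumes the
   HTML 4.01 reading of MultiLength (a pixel count, a percentage, or a
   relative length of the form "i*", where the integer may be omitted)
   and is not taken from the schema documents under review:

 <xsd:simpleType name="MultiLength">
  <xsd:restriction base="xsd:token">
   <!-- pixels, e.g. "120" -->
   <xsd:pattern value="\d+"/>
   <!-- percentage, e.g. "50%" -->
   <xsd:pattern value="\d+%"/>
   <!-- relative length, e.g. "2*" or just "*" -->
   <xsd:pattern value="\d*\*"/>
  </xsd:restriction>
 </xsd:simpleType>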

2.9. Named model groups vs. substitution groups

   We reiterate our advice of four years ago: the definition of the
   XHTML vocabulary would be easier to follow, and it would be easier
   to extend it, if the schema documents used substitution groups
   wherever feasible.

   If you have had specific problems applying substitution groups to
   XHTML, we would very much like to know what they were; we can
   speculate, but would prefer to hear from you.

   Using named model groups for extensibility has a number of
   unfortunate side effects. For example, the schema includes this
   definition:

  <xs:group name="xhtml.title.content">
   <xs:sequence/>
  </xs:group>

   What's the point of that, exactly? Presumably the idea is to play a
   similar trick to what you did when this was a DTD and splice your
   own stuff in there from your own namespace. But how does using a
   group get you there? It's not impossible, but it is harder than
   necessary and you could just as easily redefine the element in
   question directly. So defining all these content groups just gums
   up the schema and makes it harder to read. (Those accustomed to
   DTD-based extension of vocabularies may have little trouble
   following the logic here, but that group may no longer be as large
   as it once was.)

   If a user wants to use XHTML and just add one little inline element
   or allow some new content in, say, the title element, the user has
   to jump through a few unnecessary hoops.

   This scenario could be better enabled even within the existing
   architecture just by adding an abstract substitution group head as a
   choice to all the named model groups.

   So even if you don't restructure the schema documents to use
   substitution groups wherever possible, you could simplify
   extensibility for users of the spec a great deal by just adding an
   abstract element to each group, or each content model where
   extensibility is an obvious requirement, to provide hooks for later
   schema authors.
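
   A rough sketch of what such a hook might look like for the title
   element follows. The element name title.extra, the extension
   element subtitle, and the prefix xh (assumed to be bound to the
   XHTML namespace) are all hypothetical; only the group name comes
   from the schema as published.

 <!-- in the XHTML schema: an abstract head, never valid in an
      instance itself, only a place for substitutions -->
 <xs:element name="title.extra" abstract="true"/>

 <xs:group name="xhtml.title.content">
  <xs:sequence>
   <xs:element ref="xh:title.extra"
               minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
 </xs:group>

 <!-- in the extender's schema, in the extender's own namespace -->
 <xs:element name="subtitle" type="xs:string"
             substitutionGroup="xh:title.extra"/>

   With this in place the extender declares one element and touches
   nothing in the XHTML schema documents; no redefinition of the group
   is needed.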

2.10. Adding attributes

   It's not clear that the way modules add attributes works. For
   example, the client side image map module adds attributes to the img
   element. All well and good, but looking at the schema I see an
   attribute group defined:

  <!-- modify img attribute definition list -->
     <xs:attributeGroup name="xhtml.img.csim.attlist">
         <xs:attribute name="usemap" type="xs:IDREF"/>
     </xs:attributeGroup>

   I can't see where this actually is used anywhere in the schema. I
   think what the module should be doing is a redefine of the groups.
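
   Such a redefine might look roughly like this. It is a sketch only:
   the schema location xhtml-image-1.xsd and the group name
   xhtml.img.attlist are assumed here for illustration and should be
   checked against the actual schema documents, and the unprefixed
   references assume that the default namespace of the schema document
   is its target namespace.

 <xs:redefine schemaLocation="xhtml-image-1.xsd">
  <xs:attributeGroup name="xhtml.img.attlist">
   <!-- the single self-reference pulls in the original attributes -->
   <xs:attributeGroup ref="xhtml.img.attlist"/>
   <!-- the attribute contributed by the client-side image map module -->
   <xs:attribute name="usemap" type="xs:IDREF"/>
  </xs:attributeGroup>
 </xs:redefine>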

2.11. A missing scenario

   One important scenario that seems to be missing is just plonking
   bits of the XHTML namespace into specific places in some other
   namespace. Maybe it's too obvious/easy, but it is actually the most
   common scenario, e.g. MyOwnLanguage has its own things, and I'll
   just put some XHTML inline elements here.

   Introducing XHTML elements into the xsd:documentation elements in a
   schema document is another instance of the scenario.
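
   For illustration, the scenario might be written along these lines.
   The target namespace, the element name note, and the schema
   location are invented for the example, and the sketch assumes that
   the imported XHTML schema provides global declarations for em and
   strong.

 <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
            xmlns:xh="http://www.w3.org/1999/xhtml"
            targetNamespace="http://example.org/MyOwnLanguage"
            elementFormDefault="qualified">

  <!-- hypothetical location of an XHTML driver schema -->
  <xs:import namespace="http://www.w3.org/1999/xhtml"
             schemaLocation="xhtml-driver.xsd"/>

  <!-- an element of MyOwnLanguage allowing text mixed with a few
       XHTML inline elements -->
  <xs:element name="note">
   <xs:complexType mixed="true">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
     <xs:element ref="xh:em"/>
     <xs:element ref="xh:strong"/>
    </xs:choice>
   </xs:complexType>
  </xs:element>
 </xs:schema>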

3. Editorial comments

   The following comments are editorial; we hope that they can be made
   without invalidating any existing reviews of the specification.

3.1. Make the introduction less DTD-specific

   Section 1 Introduction
   <URL:[44]http://www.w3.org/TR/xhtml-modularization/introduction.html>
   also
   <URL:[45]http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/introduction.html>

   sec 1.2 para 1: "These abstract modules are implemented in this
   specification using the XML Document Type Definition language, but
   an implementation using XML Schemas is expected." Read "These
   abstract modules are implemented in this specification using both
   the XML Document Type Definition language and XML Schema 1.0."?

   sec 1.3.4 para 2:

     [44] http://www.w3.org/TR/xhtml-modularization/introduction.html

[45] http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/introduction.html


     A document is an instance of one particular document type defined
     by the DTD identified in the document's prologue. Validating the
     document is the process of checking that the document complies
     with the rules in the document type definition.

   Here (as elsewhere) there are traces of DTD-only terminology. Some
   SGML experts maintain that the term "document type definition" of
   ISO 8879 and XML is defined broadly enough to include schemas
   defined with XSD or with any other language currently known to
   information technology; on that reading, the only problem with the
   paragraph just quoted is the assumption that the document and its
   DTD are associated in the document's prologue.

   Normal usage, however, uses the term "document type definition" with
   narrower scope nowadays, to mean only those schemas written using
   the bracket-bang keyword syntax of ISO 8879 and the XML spec. On
   that reading, there are several things in this paragraph that apply
   only to conventional XML DTDs, not to schemas in general:

     * In fact, any document is an instance of an infinite number of
       document types and schemas (or document type definitions), just
       as any object is contained by an infinite number of sets. This
       fact does not conflict with the equally important fact that an
       author may wish to advertise conformance to a particular schema
       or affiliation with a particular document type, either for the
       sake of tool support or for other reasons.
     * Documents may be associated with a schema by their prolog, or by
       xsi:schemaLocation hints in the document instance, or by
       out-of-band associations between document and schema (e.g. by
       parameters passed to the validator at invocation time).
     * Validation is the process of checking whether, not the process
       of ensuring that, a document complies with the rules in the
       document type definition.
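
   As an illustration of the second point, an instance might carry a
   schema location hint like the one below. This is a sketch: the hint
   pairs the XHTML namespace with the schema document at
   http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd, and whether that is the
   appropriate driver for a given document is an assumption.

 <html xmlns="http://www.w3.org/1999/xhtml"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/1999/xhtml
                           http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd">
  <head><title>Example</title></head>
  <body><p>No DOCTYPE declaration is needed for this association.</p></body>
 </html>

   The attribute value is a list of pairs, each a namespace name
   followed by a location; it is a hint, which a validator is free to
   ignore in favor of a schema supplied by other means.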

   To make this paragraph cover the current situation (where you're
   providing normative XSD schema documents as well as normative DTDs),
   you might consider saying something like the following. If you're
   willing to adopt the term "schema" as the general term for a formal
   machine-readable expression of the rules for a document type, then:

     A document may be associated with a particular document type
     defined by a schema. The document's prolog may identify a DTD, or
     xsi:schemaLocation attributes may be used to associate the
     document with a schema written in XML Schema 1.0, or the document
     may be associated with a schema by other means (e.g.
     validation-time identification of the schema by means of a
     parameter passed to a validator). Validating the document is the
     process of testing whether the document complies with the rules in
     the schema.

   Or if you'd prefer to stay with "document type definition", you
   could write:

     A document may be associated with a particular document type. The
     document's prolog may identify a DTD, or xsi:schemaLocation
     attributes may be used to associate the document with a document
     type definition written in XML Schema 1.0, or the document may be
     associated with a document type definition by other means (e.g. a
     parameter passed to a validator). Validating the document is the
     process of testing whether the document complies with the rules in
     the document type definition.

   If you stick with "document type definition", you might want to add
   something to the definition of "document type definition" in the
   glossary, e.g. by changing the sentence:

     The same markup model may be expressed by a variety of DTDs.

   to something like

     The same markup model may be expressed by a variety of document
     type definitions, written in a variety of languages, such as the
     DTD notation of XML or XML Schema 1.0.

   just to make explicit somewhere that you're using "document type
   definition" to cover rules written in a variety of languages. You
   could mention Relax NG and/or Schematron, too, if you wish.

3.2. The term PCDATA

   Section 4.2
   <URL:[46]http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/abstraction.html>
   para 1 reads in part

[46] http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/abstraction.html


     ... In these cases, the symbol used for text is PCDATA (processed
     characted data). This is a term, defined in the XML 1.0
     Recommendation, that refers to processed character data. ...

   Strictly speaking, XML 1.0 doesn't define the term; it only says

     The keyword #PCDATA derives historically from the term "parsed
     character data."

   (Note also the typo 'characted' for 'character'.)

   We'd suggest rewording to say something like

     ... In these cases, the symbol used for text is PCDATA; this is
     short for "parsed character data", denoting sequences of
     characters which are to be parsed for markup by an XML processor.
     ...

3.3. Section 4.3 Attribute Types

   Congratulations to the editors; this section is much easier to read
   and follow than is sometimes the case when specs define (or fail to
   define) fundamental types used throughout them.

   Some comments on the definitions of some of the datatypes, as found
   in
   <URL:[47]http://www.w3.org/TR/xhtml-modularization/SCHEMA/xhtml-datatypes-1.xsd>
   and other schema documents, may be found elsewhere.

[47] http://www.w3.org/TR/xhtml-modularization/SCHEMA/xhtml-datatypes-1.xsd


3.4. Length type: well done

   The definition for Length seems well done. Good work!

3.5. Shape type

   Shouldn't the overview in section 4.3 say that Shape has just the
   four values rect, circle, poly, and default?

3.6. White space in the document source

   Minor but extremely irritating:
   <URL:[48]http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/schema_module_defs.html#a_smodule_Text>
   and
   <URL:[49]http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/schema_module_defs.html#a_smodule_Presentation>
   (and presumably others) have the tabbing alignment in the schema
   messed up, making it harder to read.

     [48] http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/schema_module_defs.html#a_smodule_Text
     [49] http://www.w3.org/MarkUp/Group/2007/WD-xhtml-modularization-20070219/schema_module_defs.html#a_smodule_Presentation


4. Comments half substantive and half editorial

   The following comments may be regarded as purely editorial, or they
   may be regarded as substantive; we leave that judgment to you.

4.1. Testing the schema documents

   We endeavored to test the schema documents for syntax errors or
   other problems, but encountered some difficulty knowing where to
   start. Which file(s) should be used as the top-level driver file(s)?

   One tester reported:

   I'm using files extracted from
   <URL:[50]http://www.w3.org/TR/xhtml-modularization/xhtml-modularization.zip>.

[50] http://www.w3.org/TR/xhtml-modularization/xhtml-modularization.zip


   xhtml-framework-1.xsd seems to be the root (the first one mentioned
   in Appendix C). But it won't compile (missing many att-groups like
   "xhtml.Core.extra.attrib" and "xhtml.I18n.extra.attrib"). I can't
   tell whether this is an error or users of these schemas must provide
   definitions of those att-groups. (Looks like the latter, because one
   of the examples myml-model-1.xsd defines those missing groups.)

   I was hoping testing.xml can be a little more helpful, but
   unfortunately it refers to
   <URL:[51]file:/C:/cygwin/home/ahby/htmlwg/xhtml-modularization/SCHEM
   A/xhtml11.xsd>
   I really hope I can't access someone else's "file:/C:/".
   xhtml11.xsd doesn't exist anywhere.

[51] file://localhost/C:/cygwin/home/ahby/htmlwg/xhtml-modularization/SCHEMA/xhtml11.xsd


   So I gave up on that. Then I looked in the examples directory.
   "simpleml-1_0.xsd" doesn't refer to anything like "../". It
   redefines "xhtml.Misc.class" in
   http://www.w3.org/MarkUp/SCHEMA/xhtml-basic10.xsd. But Xerces-J
   fails to locate that group in the schema being redefined. (I found a
   Misc.class, but nothing starts with "xhtml.".) I then got many more
   errors about missing components. Similar to the ones I got from
   xhtml-framework-1.xsd, but different. (Note that these errors are
   from schema files in http://www.w3.org/MarkUp/SCHEMA/.)

   My last hope was those .html files in examples. Unfortunately all
   they gave me was more errors, both in the schema and the
   instance.

   In summary, I don't know how these files should be used, so I can't
   claim that they are broken. No useful input from me ...

   [Later information from Shane McCarron is that this spec doesn't
   provide a driver, but that
   <URL:[52]http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd> might be
   consulted as an example. To be followed up ...]

     [52] http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd

4.2. Where is the html element?

   (Possibly related to the preceding.)
   Where is the html element defined?
   After some searching, starting not from this document but from
   <URL:[53]http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd>, we found a
   definition in
   <URL:[54]http://www.w3.org/MarkUp/SCHEMA/xhtml11-model-1.xsd>.
   This may be solely an editorial issue: the abstract says

     [53] http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd
     [54] http://www.w3.org/MarkUp/SCHEMA/xhtml11-model-1.xsd

     This modularization provides a means for subsetting and extending
     XHTML, a feature needed for extending XHTML's reach onto emerging
     platforms. This specification is intended for use by language
     designers as they construct new XHTML Family Markup Languages.

   and this has led at least some readers to infer that the modules
   defined here would include everything needed for a definition of
   XHTML 1.1, including the top-level driver files.
   If the problem is editorial, the solution is also editorial: the
   spec needs to make clear(er) that no top-level driver for XHTML is
   provided. (And, for the instruction of those seeking to understand
   how to use these modules, a pointer to the XHTML 1.1 driver modules
   would be very useful. If such a pointer is already present, then let
   this note serve as a record that at least some readers didn't see
   the pointer when they needed to.)

   But the issue appears to at least some readers as at least partly
   substantive: that is, it seems to us that a specification describing
   a modular definition of the XHTML 1.1 vocabulary ought, in the
   nature of things, to include a top-level driver module which calls
   in all the others.

4.3. Case insensitivity and XML Schema patterns or enumerations

   Several of the alternative type definitions offered elsewhere in
   these comments propose to use patterns (rather than enumerations,
   as one might expect) to handle the well-known values for types which
   have well-known values. In the numerous cases in which the values
   are defined as case insensitive, the pattern for a
   (case-insensitive) value like “black” is written “<xsd:pattern
   value="[Bb][Ll][Aa][Cc][Kk]"/>”.
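
   As an illustration only, the expansion described above can be
   produced mechanically. The following Python sketch is not part of
   either specification; the function name case_insensitive_pattern is
   hypothetical, and the sketch assumes keywords made of ASCII letters
   and digits only. Python's re.fullmatch is used to stand in for the
   implicit anchoring of XML Schema patterns.

```python
import re


def case_insensitive_pattern(word: str) -> str:
    """Expand each ASCII letter of `word` into a two-letter character class.

    Hypothetical helper: digits pass through unchanged, and any other
    character is rejected rather than guessed at.
    """
    parts = []
    for ch in word:
        if not ch.isascii() or not ch.isalnum():
            raise ValueError(f"unsupported character in keyword: {ch!r}")
        if ch.isalpha():
            parts.append(f"[{ch.upper()}{ch.lower()}]")
        else:
            parts.append(ch)
    return "".join(parts)


pattern = case_insensitive_pattern("black")
print(pattern)  # [Bb][Ll][Aa][Cc][Kk]

# XML Schema patterns match the whole value, so fullmatch is the
# closest Python equivalent for a quick check.
print(bool(re.fullmatch(pattern, "BlAcK")))   # True
print(bool(re.fullmatch(pattern, "blackk")))  # False
```

   The resulting string is what would be placed in the value attribute
   of xsd:pattern; every keyword of a type needs the same expansion.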

   The regularity with which this technique must be used suggests that
   perhaps XML Schema should add a caseInsensitive flag to patterns.
   This would allow writing the pattern “<xsd:pattern value="black"
   caseInsensitive="true"/>” instead.

   Given that many regex libraries already have such flags, such an
   addition wouldn't seem to be difficult for implementors.
   Should the XML Schema Working Group consider such a change?

   And if so, what is to be done about Unicode characters for which the
   upper/lowercase mapping is not 1:1? And what should be done about
   title case?
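
   To make the last two questions concrete, the following Python sketch
   shows two such characters, relying on Python's built-in Unicode case
   mappings. It is only an illustration of the difficulty, not a
   proposal; the helper name case_variants is hypothetical.

```python
def case_variants(ch: str) -> set[str]:
    """Return the lower-, upper-, and title-case forms of one character."""
    return {ch.lower(), ch.upper(), ch.title()}


# One-to-many mapping: uppercasing the sharp s (U+00DF) yields the two
# characters "SS", so a one-class-per-character expansion cannot cover it.
sharp_s_upper = "\u00DF".upper()

# Title case: the DZ-with-caron digraph has three distinct code points,
# uppercase U+01C4, titlecase U+01C5, and lowercase U+01C6.
dz_forms = case_variants("\u01C6")

print(sharp_s_upper, len(sharp_s_upper))
print(sorted(f"U+{ord(c):04X}" for c in dz_forms))
```

   A caseInsensitive flag would have to say whether “SS” matches a
   pattern containing the sharp s, and whether all three forms of such
   a digraph are treated as equivalent.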
