From: "Jon Hanna" <[EMAIL PROTECTED]>
> Lots of different things happen that affect the whitespace of an XML
> document (whether a DOM tree is constructed or not, since it isn't the only
> legal way to process an XML document).
Of course one is not required to build an actual DOM tree. However, XML, HTML and the like are now defined in terms of the DOM, and the text/xml syntax is just one serialization of it. That serialization is the only place where whitespace normalization is defined: no such normalization occurs at the DOM level, and an XML document may be serialized with a concrete syntax other than the one assigned to the "text/xml" MIME type, registered and documented by the W3C.

When processing XML documents, the DOM is the most important part, and it is logically separate from the concrete syntax used by text XML parsers. The W3C defines very strict rules to ensure that the DOM-equivalent data is preserved. Whitespace normalization in XML documents serialized as "text/xml" is mandatory; without it, the result is not a valid "text/xml" serialization. Processing a "text/xml" document in a way that is incompatible with what a DOM tree builder would create is not conforming. If the processing differs, then the language is not XML but a derived language (for example HTML or SGML, which use more "relaxed" syntaxes).

In XML, whitespace normalization can be overridden using very precise rules, but only within the parser, not in the resulting DOM tree. So it is important to understand each step that leads from the concrete text/xml syntax to the DOM tree or its equivalents (notably the successive steps required for parsed entities, named entities, ...).

No XML application is required to use the "text/xml" MIME syntax, and examples of this exist (for example the serialization and compression formats used by WAP, MMS, NTT DoCoMo's i-mode, and SOAP).
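To make the parser-side normalization concrete, here is a minimal sketch in Python. It assumes the standard library's expat-based xml.etree.ElementTree parser, and the element name "a" and attribute name "x" are invented for the illustration. It contrasts literal whitespace in an attribute value (which the parser should normalize to spaces) with the same characters written as character references (which should survive), and it shows a CR LF line end in content being reported as a single LF.

```python
import xml.etree.ElementTree as ET


def parse_attr(xml_text):
    """Return the value of the hypothetical attribute 'x' on the root element."""
    return ET.fromstring(xml_text).get("x")


def parse_text(xml_text):
    """Return the character data directly inside the root element."""
    return ET.fromstring(xml_text).text


# Literal TAB and LF inside an attribute value: normalized by the parser.
literal = parse_attr('<a x="one\ttwo\nthree"/>')

# The same characters written as character references: kept as written.
referenced = parse_attr('<a x="one&#9;two&#10;three"/>')

# A CR LF pair in element content: passed to the application as one LF.
content = parse_text("<a>line1\r\nline2</a>")

print(repr(literal))
print(repr(referenced))
print(repr(content))
```

Once the tree is built, nothing further is normalized: whatever string the application stores in the tree stays as it is until the document is serialized again.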
If an application does not build the DOM tree, it is still required to perform namespace resolution and to resolve named entities according to the standard "text/xml" MIME rules formulated in the W3C reference, in all their facets. This is needed so that document properties interoperate independently of the character encoding used in the serialized document, or of its syntactic representation.

In my opinion, all XML-based languages should now be defined in terms of their DOM structure, and each XML application should be defined by a valid DTD, or better, by a now-standard XSD schema, that can be processed by validating parsers (parsers that absolutely need to create a DOM-like tree or a flow of tokens with strictly defined properties, value sets and behavior).

Without DOM interoperability, XML would be another imprecise language like HTML, with very little reusability because of naming conflicts. This is the most important benefit of XHTML (strictly based on XML) compared with HTML (4.x and before) and SGML (all versions), notably when a schema is explicitly specified for the document and is loaded for validation (some schemas, like that of XHTML, are normative and cannot be changed by authors).
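As an illustration of processing without a tree, the following sketch uses Python's standard xml.parsers.expat module as a streaming parser. The sample document, the namespace URI "urn:example:demo", the entity name "co" and the helper stream_events are all invented for the example. No DOM is built, yet the handlers receive namespace-resolved element names and the expanded text of the named entity.

```python
import xml.parsers.expat

# Hypothetical sample document: a default namespace plus one internal entity.
DOC = (
    '<!DOCTYPE doc [<!ENTITY co "Example Corp">]>'
    '<doc xmlns="urn:example:demo"><name>&co;</name></doc>'
)


def stream_events(xml_text):
    """Parse xml_text as a flow of events, without building any tree."""
    events = []
    # With namespace_separator set, expat reports element names as
    # "<namespace URI><separator><local name>".
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.buffer_text = True  # merge adjacent character-data chunks
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    parser.Parse(xml_text, True)
    return events


events = stream_events(DOC)
starts = [value for kind, value in events if kind == "start"]
text = "".join(value for kind, value in events if kind == "text")

print(starts)
print(text)
```

The event stream carries the same information a tree builder would store, which is the sense in which a non-DOM processor still has to behave compatibly with one.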

