On Tue, Jul 8, 2025 Leif wrote:
Here I am with a controversial subject, once again :-)
Yes, this has been discussed before. See your "RFE: Improved whitespace treatment", https://www.mail-archive.com/[email protected]/msg12213.html
In XXE (as of version 10.12), the indentation of HTML differs from that of
other formats. For instance, this is how XXE indents the following DITA code:
<body>
  <p>Foo</p>
  <p>Bar</p>
</body>
XXE also renders the above DITA code “normally”. If the same code had
been HTML, however, XXE would have rendered an empty line (in effect a
line break) between the paragraph elements. Irritating and visually
confusing!
XXE’s seeming difficulty with rendering HTML must therefore be the reason
why XXE would not have indented the above code, had it been HTML. In
that case, the “indentation” would have looked like this:
<body><p>Foo</p><p>Bar</p></body>
So far so good. However, given XXE’s difficulties with rendering HTML, I
think that there is a glitch in how XXE *indents* HTML. What happens
if the user indents each child element above in the following manner,
or if XXE opens a file with the following HTML code?
<body>
  <p>Foo</p>
  <p>Bar</p>
</body>
This happens: When the user saves the document, the indents are
converted to a single space character *between* (but not before [for
some reason] or after ...) each child element:
<body><p>Foo</p> <p>Bar</p></body>
In XXE, the result of those space characters between the elements is
that it looks as if there is an empty line before each <p> element.
It is as if the body element had <body xml:space="preserve"> (with the
exception that it would not apply to *before* the first child element,
and *after* the last child element).
For me/us, after having worked on a document for some while, these empty
lines sooner or later pop up and make the document look «funny» inside
XXE. It has often caused me to remove whitespace in the code, just to
satisfy XXE’s demands with regard to how to render things. A waste of
time, really.
So why does XXE do this? What is the benefit of replacing the indents
with a single space character? And is it even compatible with the specs?
It is one thing that Web browsers would not have any trouble with that
space character - they would not render it, whereas XXE will render it.
But why does XXE insert the space character there in the first place,
when XXE itself is the only one that has trouble with it?
For instance, MDN discusses the generic xml:space attribute [1] and says
that in the default state (which I assume is equivalent to omitting the
xml:space attribute entirely), the following happens:
1. All newline characters are removed.
2. All tab characters are converted into space characters.
3. All leading and trailing space characters are removed.
4. All contiguous space characters are collapsed into a single space
character.
If XXE, instead of the current behavior, had followed the steps
described by MDN, the result would be *no* space character between the
child elements (since, by step 3, all the space characters would have
been removed).
Which would be incorrect. Example, the author types this:
---
<p> <b>Hello</b> <i>world!</i> </p>
---
If you remove all the whitespace between the child elements, you'll get this:
---
<p><b>Hello</b><i>world!</i></p>
---
XXE (<p> may contain text; implicit xml:space=default) gives you this:
---
<p> <b>Hello</b> <i>world!</i> </p>
---
Or to put it differently, by following those steps, we
would be back to the initial one line *without* space between the child
elements.
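The four steps quoted from MDN can be sketched in Python as follows. This is only an illustration of the quoted list applied to a single text node; the function name is made up here, and it is not meant to describe how XXE or any browser is actually implemented.

```python
import re


def mdn_default_whitespace(text: str) -> str:
    """Apply the four quoted steps to one text node (hypothetical helper)."""
    # 1. Remove all newline characters (carriage returns are treated the same here).
    text = text.replace("\r", "").replace("\n", "")
    # 2. Convert all tab characters into space characters.
    text = text.replace("\t", " ")
    # 3. Remove all leading and trailing space characters.
    text = text.strip(" ")
    # 4. Collapse contiguous space characters into a single space.
    return re.sub(r" {2,}", " ", text)


# Indentation between two block elements: nothing is left.
indent_between_blocks = mdn_default_whitespace("\n  ")
# The single space between two inline elements disappears as well.
space_between_inlines = mdn_default_whitespace(" ")
# Ordinary text keeps one space between words.
plain_text = mdn_default_whitespace("  Hello \t world!\n")
```

Applied per text node, step 3 removes the indentation between the <p> elements, but it equally removes the space between <b>Hello</b> and <i>world!</i>, which is the case objected to above.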
Again, XXE does *not* promise to save the whitespace infrastructure. I
accept that. But why not, for HTML, at least «destroy» the indentation
in a way that is compatible both with (default) XML whitespace treatment
and with the way XXE’s rendering works?
By the way: The above does not describe in entirety how XXE indents
HTML. For certain elements, notably lists and the table element (and the
child elements they are made up of), line breaks between elements are
preserved. Also, the current indentation, as described above, is
meaningful if one replaces the <p> elements with inline elements, e.g.
with <span> elements (because, then the space between elements is
usually meaningful).
Solutions: I think there are 2 or 3 ways to solve the problem here:
(1) Change the way XXE renders whitespace in HTML.
(2) Or change the way XXE indents HTML.
(3) Or a combination of both.
I would be fine with either solution. But given that XXE already indents
different HTML elements in different ways, how about introducing even
more nuance, by implementing xml:space='default' compatible indentation
for child elements that are block elements, and retain the current
behavior for child elements that are inline elements?
Maybe it would not give us perfect results, but it should at least
become *more* perfect, and solve some common and irritating cases.
[1] https://developer.mozilla.org/en-US/docs/Web/SVG/Reference/Attribute/xml:space
I'm sorry but once again, we will not implement your RFE. XMLmind XML Editor being an XML editor, it simply follows "the XML rules".
* DITA <body> may not contain text data. See https://docs.oasis-open.org/dita/dita/v1.3/errata02/os/complete/part2-tech-content/contentmodels/cmtcb.html . Hence XXE feels free to indent the <p> children of a <body> when saving the DITA document and then to completely remove this indentation when reopening the saved document.
* (X)HTML 5 <body> has flow content and as such, may contain text data. See https://html.spec.whatwg.org/multipage/sections.html#the-body-element . See https://html.spec.whatwg.org/multipage/dom.html#flow-content-2 . Therefore XXE will not indent the <p> children of a <body> because any text inside a <body>, including indentation spaces, is considered to be meaningful.
Now, when you manually indent your saved XHTML5 file as follows to work around this XXE "deficiency":
<body>
  <p>Foo</p>
  <p>Bar</p>
</body>
XXE, not finding any xml:space=preserve, assumes xml:space=default.
Therefore XXE conforms to "2.10 White Space Handling",
https://www.w3.org/TR/xml/#sec-white-space , and, to make it simple,
collapses contiguous indentation spaces to 1 space char.
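The distinction between the two content models can be sketched in Python with the standard xml.etree.ElementTree module. This is a rough model of the behaviour described in this thread, not XXE's code: the function names and the allows_text flag are invented for the illustration, and the trimming before the first child and after the last child is an assumption that simply mirrors the saved <body> output reported earlier in the thread.

```python
import re
import xml.etree.ElementTree as ET

WS_RUN = re.compile(r"[ \t\r\n]+")


def collapse(text, allows_text, lstrip=False, rstrip=False):
    """Hypothetical helper: normalize one text node around child elements."""
    if text is None:
        return None
    if not allows_text and text.strip() == "":
        return None  # element-only content: whitespace is mere indentation
    text = WS_RUN.sub(" ", text)  # mixed content: each run becomes one space
    if lstrip:
        text = text.lstrip(" ")
    if rstrip:
        text = text.rstrip(" ")
    return text or None


def normalize(parent, allows_text):
    """Normalize whitespace around the children of `parent` and serialize it."""
    children = list(parent)
    parent.text = collapse(parent.text, allows_text,
                           lstrip=True, rstrip=not children)
    for child in children:
        child.tail = collapse(child.tail, allows_text,
                              rstrip=child is children[-1])
    return ET.tostring(parent, encoding="unicode")


SOURCE = "<body>\n  <p>Foo</p>\n  <p>Bar</p>\n</body>"
# <body> may contain text (XHTML5-like): one space survives between the <p>s.
as_mixed = normalize(ET.fromstring(SOURCE), allows_text=True)
# <body> may not contain text (DITA-like): the indentation is dropped entirely.
as_element_only = normalize(ET.fromstring(SOURCE), allows_text=False)
```

With allows_text=True the sketch yields <body><p>Foo</p> <p>Bar</p></body>; with allows_text=False it yields <body><p>Foo</p><p>Bar</p></body>.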
Note that it's pretty easy to solve your problem: simply replace the XHTML5 schema with a customized version where <body> may not contain text. (XHTML 1.0 Strict and XHTML 1.1 were like this.)
This should pose no interchange problem at all because (1) your customized XHTML5 schema is more restrictive than the stock one and (2) to our knowledge, there is no "official schema" for XHTML5 published by the WHATWG (see https://html.spec.whatwg.org/multipage/).
Here's how to do that. In XXE_install_dir/addon/config/xhtml/xsd/5/xhtml5.xsd replace:
---
<xs:element name="body">
  <xs:complexType>
    <xs:complexContent>
      <xs:extension base="flowContentElement">
        <xs:attributeGroup ref="bodyEventHandlerAttributes"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
---
by:
---
<xs:element name="body">
  <xs:complexType>
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:group ref="flowContent"/>
    </xs:sequence>
    <xs:attributeGroup ref="globalAttributes"/>
    <xs:attributeGroup ref="bodyEventHandlerAttributes"/>
  </xs:complexType>
</xs:element>
---
See attached xhtml5.xsd.
-- XMLmind XML Editor Support List [email protected] http://www.xmlmind.com/mailman/listinfo/xmleditor-support

