On 11.07.2025 12:19 by Hussein Shafie:

on Tue, Jul 8, 2025 Leif wrote:

For instance, MDN discusses the generic xml:space [1] attribute and says
that in the default state (which I assume is equal to just omitting the
xml:space attribute entirely), the following happens:

 1. All newline characters are removed.
 2. All tab characters are converted into space characters.
 3. All leading and trailing space characters are removed.
 4. All contiguous space characters are collapsed into a single space
    character.
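
The four steps quoted above can be sketched in a few lines of Python. This is only an illustration of the list as quoted, applied to a single text node; the function name `mdn_default_space` is hypothetical, and treating "newline" as `\n` only is an assumption.

```python
import re


def mdn_default_space(text: str) -> str:
    """Apply the four quoted steps, in order, to one text node."""
    text = text.replace("\n", "")       # 1. remove newline characters
    text = text.replace("\t", " ")      # 2. convert tabs to spaces
    text = text.strip(" ")              # 3. drop leading/trailing spaces
    text = re.sub(r" {2,}", " ", text)  # 4. collapse runs of spaces
    return text


if __name__ == "__main__":
    # A whitespace-only text node (such as the one between two child
    # elements) ends up empty, because step 3 strips everything.
    print(repr(mdn_default_space("   ")))
    print(repr(mdn_default_space("\n")))
    print(repr(mdn_default_space("  Hello \t world!\n ")))
```

Under these assumptions, any text node made up only of spaces, tabs and newlines becomes the empty string, which is why the steps as written leave nothing between adjacent child elements.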

If XXE, instead of the current behavior, had followed the steps
described by MDN, the result would be *no* space character between the
child elements (since, by step 3, all the space characters would have
been removed).

Which would be incorrect.


This is just a boring question from me: are you saying that the «algorithm» I quoted above from MDN is incorrect? Or that I read it incorrectly? As I read it, there would be no single character *within* a line after the above algorithm has been run.

But note that when I said «no single character *within* a line», I had this example in mind:

<p>
<a>b</a>
<c>d</c>
</p>

I did not say that the above would remove any spaces in this example:

<p> <a>b</a> <c>d</c> </p>


Example, the author types this:
---
<p>   <b>Hello</b>   <i>world!</i> </p>
---

If you remove all the whitespace between the child elements, you'll get this:
---
<p><b>Hello</b><i>world!</i></p>
---

XXE (<p> may contain text; implicit xml:space=default) gives you this:
---
<p> <b>Hello</b> <i>world!</i> </p>
---
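
To make the difference between the two outcomes above concrete, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. It is not XXE's implementation; the names `collapse`, `strip_all` and `normalize` are hypothetical, and it assumes that "whitespace" means space, tab, CR and LF.

```python
import re
import xml.etree.ElementTree as ET

WS_RUN = re.compile(r"[ \t\r\n]+")


def collapse(text):
    """Replace each run of whitespace with a single space."""
    return WS_RUN.sub(" ", text) if text else text


def strip_all(text):
    """Drop text nodes that consist of whitespace only."""
    if text is not None and text.strip(" \t\r\n") == "":
        return None
    return text


def normalize(xml_source, rule):
    """Apply `rule` to every text node (text and tail) and reserialize."""
    root = ET.fromstring(xml_source)
    for elem in root.iter():
        elem.text = rule(elem.text)
        elem.tail = rule(elem.tail)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    typed = "<p>   <b>Hello</b>   <i>world!</i> </p>"
    print(normalize(typed, strip_all))  # whitespace between children removed
    print(normalize(typed, collapse))   # one space kept between children

    # Newline-only indentation is a whitespace run too, so collapsing
    # turns each line break between children into one space.
    indented = "<p>\n<a>b</a>\n<c>d</c>\n</p>"
    print(normalize(indented, collapse))
```

With the collapsing rule, a line break between two child elements counts as a run of whitespace like any other, so it becomes one space rather than disappearing.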


Related to what I said above, and for the record: here you discuss what should happen to spaces within a single line. That was actually not meant to be my topic.


Now when you manually indent your saved XHTML5 file as follows to work around this XXE "deficiency":

   <body>
     <p>Foo</p>
     <p>Bar</p>
   </body>

Regarding «this XXE "deficiency"»:

Note that my focus is not improved (that is, more readable) formatting of the source code, but more reliable WYSIWYG rendering in XXE. (But your example from DITA probably demonstrates that you, too, *would* make the code more readable if possible.)

Hence, when/if the user formats the code like the above, it is actually not (at least not for me) in order to work around XXE’s deficiency, but out of (at least temporary) unawareness of XXE’s deficiency when it comes to how XXE would render the above in WYSIWYG mode.

If I wanted to work around XXE’s deficiency, I would format the code the way XXE itself does when it takes full control over the matter:

<body><p>Foo</p><p>Bar</p></body>

XXE, not finding any xml:space=preserve, assumes xml:space=default. XXE therefore conforms to "2.10 White Space Handling", https://www.w3.org/TR/xml/#sec-white-space , and, to keep it simple, collapses contiguous indentation spaces into one space character.


Having *tested* it some more, I admit that there does indeed remain a single space character. However, having *read* the spec, I have problems understanding where, in the XML spec’s whitespace removal algorithm, that single character comes from ... I guess I need more time to re-read ...


Note that it's pretty easy to solve your problem: simply replace the XHTML5 schema with a customized version where <body> may not contain text. (XHTML 1.0 Strict and XHTML 1.1 were like this.)

This should pose no interchange problem at all because
(1) your customized XHTML5 schema is more restrictive than the stock one
and
(2) to our knowledge, there is no "official schema" for XHTML5 published by the WHATWG (see https://html.spec.whatwg.org/multipage/).

Here's how to do that. In XXE_install_dir/addon/config/xhtml/xsd/5/xhtml5.xsd replace:

---
  <xs:element name="body">
    <xs:complexType>
      <xs:complexContent>
        <xs:extension base="flowContentElement">
          <xs:attributeGroup ref="bodyEventHandlerAttributes"/>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
---

by:

---
  <xs:element name="body">
    <xs:complexType>
      <xs:sequence minOccurs="0" maxOccurs="unbounded">
        <xs:group ref="flowContent"/>
      </xs:sequence>
      <xs:attributeGroup ref="globalAttributes"/>
      <xs:attributeGroup ref="bodyEventHandlerAttributes"/>
    </xs:complexType>
  </xs:element>
---

First reaction: Would it not also be possible to apply some kind of XSLT (or CSS) in order to simply set text nodes that are direct children of <body> to display:none in the editor?

Second reaction: Thanks for this solution. I think it would make sense to somehow include it as an option in the XHTML preferences, as it is easy to forget such solutions. Here are some good reasons to implement such a thing:

1. For a structured XML editor like XXE, regardless of what the HTML5
   spec permits, there is hardly ever a reason for XXE’s users to add
   text as direct children of <body>, <article>, <section>, <header>
   or <footer>.
2. It would make XXE users stop cursing the Universe (or XXE or XHTML
   rendering rules) for those empty lines that pop up.
3. There would be one less reason to discuss XXE’s source code formatting.
4. XXE’s current behavior has several times caused us to type text (and
   not just space characters) as direct children of the <body> elements
   (and other elements where such direct text nodes are illogical).
   Why? Because it is difficult to spot that you are actually typing a
   direct child text node, instead of typing inside a <p> element.
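
As a stopgap for point 4, stray text can also be detected outside the editor. The following is a minimal Python sketch, not an XXE feature: the function name `stray_text` and the `CONTAINERS` set are hypothetical, and elements are matched by local name only, ignoring the namespace.

```python
import xml.etree.ElementTree as ET

# Elements where direct child text is considered a mistake (assumption:
# the five elements named in point 1).
CONTAINERS = {"body", "article", "section", "header", "footer"}


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def stray_text(xml_source):
    """Return (element name, text) pairs for non-whitespace text found
    directly inside one of the CONTAINERS elements."""
    found = []
    for elem in ET.fromstring(xml_source).iter():
        name = local_name(elem.tag)
        if name not in CONTAINERS:
            continue
        # Direct child text is elem.text plus the tail of each child.
        chunks = [elem.text] + [child.tail for child in elem]
        for chunk in chunks:
            if chunk and chunk.strip():
                found.append((name, chunk.strip()))
    return found


if __name__ == "__main__":
    doc = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>\n'
        "<p>Foo</p>oops\n<p>Bar</p>\n</body></html>"
    )
    print(stray_text(doc))
```

Whitespace-only text nodes are deliberately ignored here, so ordinary indentation does not trigger a report.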

I would make the above XSD variant (with support for <section>, <article> etc.) the default, and let users permit text nodes as direct children if they explicitly ask for it.

Leif Halvard Silli