Here I am with a controversial subject, once again :-)

In XXE (as of version 10.12), the indentation of HTML differs from that of other formats. For instance, this is how XXE indents the following DITA code:

  <body>
    <p>Foo</p>
    <p>Bar</p>
  </body>

XXE also renders the above DITA code “normally”. If the same code had been HTML, however, XXE would have rendered an empty line (AKA a line break) between the paragraph elements. Irritating and visually confusing!

XXE’s seeming difficulty with rendering HTML must therefore be the reason why XXE would not, in fact, have indented the above code had it been HTML. In that case, the “indentation” would have looked like this:

  <body><p>Foo</p><p>Bar</p></body>

So far so good. However, given XXE’s difficulties with rendering HTML, I think that there is a glitch in how XXE *indents* HTML. What happens if the user indents each child element in the following manner, or if XXE opens a file with the following HTML code?

  <body>
    <p>Foo</p>
    <p>Bar</p>
  </body>

This happens: when the user saves the document, the indents are converted to a single space character *between* the child elements (but, for some reason, not before the first or after the last one):

  <body><p>Foo</p> <p>Bar</p></body>

In XXE, the result of those space characters between the elements is that it looks as if there is an empty line before each <p> element.

It is as if the body element had been written as <body xml:space="preserve"> (except that the effect does not apply *before* the first child element or *after* the last child element).

For me/us, after having worked on a document for a while, these empty lines sooner or later pop up and make the document look «funny» inside XXE. This has often caused me to remove whitespace in the code, just to satisfy XXE’s demands with regard to how to render things. A waste of time, really.

So why does XXE do this? What is the benefit of replacing the indents with a single space character? And is it even compatible with the specs?

It is one thing that Web browsers would not have any trouble with that space character: they would not render it, whereas XXE does. But why does XXE insert the space character there in the first place, when XXE itself is the only one that has trouble with it?

For instance, MDN discusses the generic xml:space attribute [1] and says that in the default state (which I assume is equal to omitting the xml:space attribute entirely), the following happens:

1. All newline characters are removed.
2. All tab characters are converted into space characters.
3. All leading and trailing space characters are removed.
4. All contiguous space characters are collapsed into a single space
   character.

If XXE, instead of the current behavior, had followed the steps described by MDN, the result would be *no* space character between the child elements (since, by step 3, all the space characters would have been removed). Or to put it differently, by following those steps, we would be back to the initial single line *without* space between the child elements.
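
To make the argument concrete, here is a rough Python sketch of the four steps quoted from MDN, applied to a single run of character data. It is only an illustration, not XXE code: the function name normalize_default is hypothetical, and treating "\r" as a newline character is an assumption.

```python
import re

def normalize_default(text: str) -> str:
    """Apply the four steps quoted from MDN to one run of character data."""
    # Step 1: remove newline characters (treating "\r" as one is an assumption).
    text = text.replace("\r", "").replace("\n", "")
    # Step 2: convert tab characters into space characters.
    text = text.replace("\t", " ")
    # Step 3: remove leading and trailing space characters.
    text = text.strip(" ")
    # Step 4: collapse contiguous space characters into a single space.
    return re.sub(" {2,}", " ", text)

# The whitespace between </p> and <p> in the indented example above
# is a newline followed by four spaces.
print(repr(normalize_default("\n    ")))  # ''
```

Under these assumptions, the whitespace-only text between the two <p> elements is reduced to an empty string, so no space character would remain between them.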

Again, XXE does *not* promise to preserve the whitespace infrastructure. I accept that. But why not, for HTML, at least «destroy» the indentation in a way that is compatible both with (default) XML whitespace treatment and with the way XXE’s rendering works?

By the way: the above does not describe in its entirety how XXE indents HTML. For certain elements, notably lists and the table element (and the child elements they are made up of), line breaks between elements are preserved. Also, the current indentation, as described above, is meaningful if one replaces the <p> elements with inline elements, e.g. with <span> elements (because then the space between elements is usually meaningful).

Solutions: I think there are 2 or 3 ways to solve the problem here:

    (1) Change the way XXE renders whitespace in HTML.
    (2) Or change the way XXE indents HTML.
    (3) Or a combination of both.

I would be fine with any of these solutions. But given that XXE already indents different HTML elements in different ways, how about introducing even more nuance, by implementing xml:space='default' compatible indentation for child elements that are block elements, and retaining the current behavior for child elements that are inline elements?
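
As a rough illustration of that idea (a sketch in Python, not a claim about how XXE works internally), whitespace-only text could be dropped around block children and kept around inline children. The BLOCK set and the function name strip_block_whitespace are hypothetical, and the set is deliberately incomplete.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately tiny set of block-level element names.
BLOCK = {"p", "div", "ul", "ol", "table", "pre", "blockquote"}

def strip_block_whitespace(parent: ET.Element) -> None:
    """Drop whitespace-only text around block children; keep it around inline ones."""
    children = list(parent)
    # Whitespace-only text before the first child, if that child is a block.
    if (children and children[0].tag in BLOCK
            and parent.text is not None and not parent.text.strip()):
        parent.text = None
    for i, child in enumerate(children):
        nxt = children[i + 1] if i + 1 < len(children) else None
        # The tail is the text between this child and the next sibling
        # (or the parent's end tag when there is no next sibling).
        is_ws = child.tail is not None and not child.tail.strip()
        if is_ws and child.tag in BLOCK and (nxt is None or nxt.tag in BLOCK):
            child.tail = None

body = ET.fromstring("<body>\n  <p>Foo</p>\n  <p>Bar</p>\n</body>")
strip_block_whitespace(body)
print(ET.tostring(body, encoding="unicode"))
# <body><p>Foo</p><p>Bar</p></body>
```

With inline children, e.g. two <span> elements separated by a space inside a <p>, the same function leaves the space in place, which matches the case where the space is meaningful.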

Maybe it would not give us perfect results, but it should at least become *more* perfect, and solve some common and irritating cases.

[1] https://developer.mozilla.org/en-US/docs/Web/SVG/Reference/Attribute/xml:space

Leif Halvard Silli


--
XMLmind XML Editor Support List
[email protected]
http://www.xmlmind.com/mailman/listinfo/xmleditor-support