Alexander Dupuy wrote:
> 
> Although it's not explicitly documented (that I could see), the XML
> source indentation algorithm used by XXE appears to turn on text fill
> for (and disable line breaks before children in) all elements that can
> have #PCDATA, whether they contain it or not.
> 
> This generally works well, but it breaks down a bit for some of our XML
> formats, which use modular doctypes, where an extension element is
> defined to contain ANY, and an external parameter entity provides the
> DTD subset for the elements that are contained within the extension
> element.  Using this mechanism, new DTD pieces can be composed to
> support new data structures without modifying the existing DTDs.  The
> attached DTDs and XML files demonstrate how this works.
> 
> The problem is that because the <extwrapper> element is defined as ANY,
> it can contain #PCDATA, although in practice it never does.  As a
> result, the contents of <extwrapper>s are filled even though they never
> contain #PCDATA.  As you can see in the example.xml, this can lead to
> very long lines, since there may not be any whitespace for breaking
> lines (after stripping superfluous whitespace).
> 
> Ideally, text fill would only be turned on if there were actually text
> present in the ANY item (e.g. the third <extwrapper> in example.xml).
> But if that's too tricky, I would make a case that although ANY can
> contain #PCDATA, in practice it rarely does, and that therefore elements
> with ANY syntax should not be treated as though they might contain
> #PCDATA (for indentation/text fill purposes only).

I'm sorry to have to answer no to your very clear and very
well-documented feature request.

What you request could have been implemented like this:
* When loading a document with a DTD, guess which ANY element types are
in fact element-only (as opposed to ANY element types that really hold
mixed content) and, for those types, trim superfluous whitespace.
* When saving a document with a DTD, make the same guess and, for the
element-only types, do not use the "text fill" style of output.
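
To make the guessing step concrete, here is a minimal sketch in Python of
what such a heuristic might look like. It is not XXE code. The function
names and the <note> element are hypothetical, and the set of ANY-declared
element names is passed in by hand, because Python's standard library does
not expose DTD content models.

```python
import xml.etree.ElementTree as ET


def has_real_text(elem):
    """Return True if elem directly contains non-whitespace character data."""
    if elem.text and elem.text.strip():
        return True
    # Text that follows a child element is stored in that child's .tail.
    return any(child.tail and child.tail.strip() for child in elem)


def guess_element_only(root, any_names):
    """Guess, per ANY-declared name, whether it is element-only in this document.

    any_names is an assumption of this sketch: the caller supplies the names
    of the elements that the DTD declares as ANY.
    """
    verdict = {name: True for name in any_names}
    for elem in root.iter():
        if elem.tag in verdict and has_real_text(elem):
            verdict[elem.tag] = False
    return verdict


doc = ET.fromstring(
    "<root>"
    "<extwrapper>\n  <a/>\n  <b/>\n</extwrapper>"
    "<note>some <b/> text</note>"
    "</root>"
)
result = guess_element_only(doc, {"extwrapper", "note"})
```

Note that the verdict depends only on the instances found in one particular
document: a single <extwrapper> containing text anywhere in the file flips
the guess for all of them.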

The main reason to say no is not that the implementation described above
is a lot of work, but that it is a hack. Rarely used feature or not, the
XML recommendation says that an element declared ANY can contain zero or
more child elements of any declared type, as well as character data,
whether whitespace or not.
