Hi Henri,
Henri Sivonen wrote:
On Feb 10, 2009, at 22:26, Henry S. Thompson wrote:
> And there's good reason for that: XML actually _is_ usable by
> authors and authoring well-formed XML is _not_ hard.
However, writing XML-outputting software whose output is always well-
formed even in the case of malicious input is hard.
> b) points to a piece of broken _software_;
[..]
> one article that points to a page in which someone trying to
> introduce an _intentional_ markup error made the wrong error.
It is a pretty significant problem if an attacker can intentionally
introduce a markup error into a system so that the administrator of
the system is denied service when trying to use a browser-based UI for
managing the system (and all other users are denied service, too).
> Hardly a compelling set of evidence that well-formed XML is too hard
> for ordinary mortals.
So far Philip Taylor (the author of
http://lists.w3.org/Archives/Public/www-archive/2009Feb/0058.html
) has found well-formedness holes in every XML-outputting system he
has cared to try.
He even managed to make Validator.nu produce ill-formed output. The
bug was in the Xalan serializer--a widely distributed library written
by experts. (Astral characters were serialized as two numeric
character references for the corresponding surrogates.)
I have to say this is severely overstating things. It is not clear
from the XML recommendation that such surrogate pairs are not
permitted. Several of the XML parsers I'm familiar with support that.
I wouldn't be surprised to hear about others that do not support that
and even trigger fatal errors, but could you point out some?
The XML recommendation says:
Well-formedness constraint: Legal Character
Characters referred to using character references must match the
production for Char.
While this clearly means that a lone surrogate reference such as &#xD800; is not permitted, it would be
fair to say that a pair of character references that were a valid pair
of surrogates would match the production for Char. If the
recommendation instead said "A character referred to using a character
reference must match the production for Char", then you would have a
stronger case. If this is indeed the only error you've found, then I
would say you haven't yet found an error. Or do you have a different
part of the recommendation you're reading that makes this a well-
formedness error?
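To make the two readings concrete, here is a minimal Python sketch of the
stricter, per-reference reading: each numeric character reference is checked
on its own against the Char production of XML 1.0. The helper names
(matches_char, check_references) are made up for illustration and do not come
from any parser; the example character U+1D11E is likewise my own choice. The
sketch only shows what the per-reference reading implies; it does not settle
which reading the recommendation intends.

```python
import re

def matches_char(cp):
    """Return True if code point cp matches the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

# Numeric character references: &#xHEX; or &#DECIMAL;
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

def check_references(text):
    """Hypothetical helper: list each numeric character reference in text
    together with whether its code point, taken alone, matches Char."""
    results = []
    for m in CHAR_REF.finditer(text):
        hex_digits, dec_digits = m.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        results.append((m.group(0), matches_char(cp)))
    return results

# One reference to the astral character U+1D11E.
print(check_references("&#x1D11E;"))

# The same character written as references to its UTF-16 surrogates.
print(check_references("&#xD834;&#xDD1E;"))
```

Under this reading the single reference passes and both halves of the
surrogate pair fail, since D800-DFFF falls outside every range in Char. Under
the pair-wise reading I describe above, a parser would instead combine the
two references before checking.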
Take care,
Rob