On Thu, 2005-05-05 at 02:35 +0200, Jochen Wiedmann wrote:
>Daniel Rall wrote:
...
>What does "invalid as un-encoded XML" mean? Not being within the
>encodings character set?
I was referring to characters which had not been entity-encoded using
references like &lt; or &#xffff;.

>If so, the range 0x20 to 0xff is quite arbitrarily and not even valid in
>all cases. For example, it fails for "US-ASCII" encoding. In other
>words, to me this wasn't good.

This change was only intended to catch characters invalid in XML, which
it did an incomplete job of.

...
>- Choose UTF-8 as the encoding; that means, that only very few
>  characters ('<', for example) has to be escaped.

Ideally speaking, this option also strikes me as the cleanest. Sadly,
the reality is that there are a lot of old XML-RPC clients and servers
out there in production, and that we could only offer this behavior as
a non-default configuration toggle.

>- Choose US-ASCII as the encoding. In other words, escape everything
>  beyond 0x7f.

John Wilson also made this suggestion. Given the very real inter-op
concerns we have to live with, I propose that this be the default
behavior.

>- Invent a new interface and let the user decide, for example:
>
>    public class XmlRpcEncoder {
>        String getEncoding();
>        boolean isEscaped(char pChar);
>    }

Not to over-engineer things, I also envisioned this type of solution to
implement the UTF-8 toggle discussed above.
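
To make the two options concrete, here is a rough sketch of how the
quoted XmlRpcEncoder idea could back both the US-ASCII default and the
UTF-8 toggle. It is only an illustration, not existing code in the
library: the quoted "public class" is declared as an interface here, and
AsciiEncoder, Utf8Encoder and XmlEscaper are hypothetical names. The
sketch also ignores characters that XML 1.0 forbids outright (most
control characters, unpaired surrogates), which would need separate
handling.

```java
/** The interface from the quoted proposal, declared as an interface (assumption). */
interface XmlRpcEncoder {
    String getEncoding();
    boolean isEscaped(char pChar);
}

/** Hypothetical default: escape markup characters and everything beyond 0x7f. */
final class AsciiEncoder implements XmlRpcEncoder {
    public String getEncoding() { return "US-ASCII"; }
    public boolean isEscaped(char pChar) {
        return pChar > 0x7f || XmlEscaper.isMarkup(pChar);
    }
}

/** Hypothetical non-default toggle: escape only the markup characters. */
final class Utf8Encoder implements XmlRpcEncoder {
    public String getEncoding() { return "UTF-8"; }
    public boolean isEscaped(char pChar) {
        return XmlEscaper.isMarkup(pChar);
    }
}

/** Hypothetical helper that applies an encoder's escaping policy to text. */
final class XmlEscaper {
    static boolean isMarkup(char c) {
        return c == '<' || c == '>' || c == '&';
    }

    static String escape(String text, XmlRpcEncoder encoder) {
        StringBuilder out = new StringBuilder(text.length());
        // Walk by code point so a surrogate pair becomes one reference.
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            // Simplification: characters outside the BMP are always written
            // as a numeric reference, which is valid under either encoding.
            if (Character.isSupplementaryCodePoint(cp)
                    || encoder.isEscaped((char) cp)) {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            } else {
                out.append((char) cp);
            }
        }
        return out.toString();
    }
}
```

With this sketch, XmlEscaper.escape("a<b \u00e9", new AsciiEncoder())
yields "a&#x3c;b &#xe9;", while the same call with new Utf8Encoder()
leaves the e-acute unescaped and only rewrites the '<'.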