John Wilson wrote:

> The problem with allowing arbitrary encoding is that the writer has no
> idea of what the mapping of Unicode code point to character
> encoding is. i.e. there is no way of answering the question "I have a
> Unicode code point with value X can I represent that directly in
> encoding Y?" If the answer to this question is "NO" then it has to emit
> a character reference.

For version 3, I have code ready that checks for the presence of Java 1.4. If that is available, an instance of Charset is queried.

> For maximum interoperability I would suggest we use UTF-8 but use
> character references for all values > 0x7F. This means that even if the
> other end gets the encoding wrong it will still almost certainly
> understand the characters. If the other end does not understand
> character encodings it will be very easy to see what the problem is
> (which is not quite so easy to do if it mistakes UTF-8 for ISO8859-1,
> for example)

That is, as far as I can tell, what Daniel's proposed patch does.

Jochen
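
To illustrate the two approaches discussed above, here is a minimal sketch, not the code from the version 3 work or from Daniel's patch. It assumes the java.nio.charset API that arrived with Java 1.4 (Charset, CharsetEncoder.canEncode), and for brevity it also uses String.codePointAt and StringBuilder, which need Java 5 or later. The class name CharRefSketch, the method escape, and the asciiOnly flag are hypothetical names chosen for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, for illustration only.
final class CharRefSketch {

    /**
     * Escapes text for XML character data written in the given encoding.
     * If asciiOnly is true, every code point above 0x7F becomes a numeric
     * character reference, whatever the encoding could represent.
     * Otherwise only code points the encoding cannot represent are escaped.
     */
    static String escape(String text, String encodingName, boolean asciiOnly) {
        // Ask the charset whether it can represent a given code point.
        CharsetEncoder encoder = Charset.forName(encodingName).newEncoder();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp); // 2 for surrogate pairs
            if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if ((asciiOnly && cp > 0x7F)
                    || !encoder.canEncode(text.subSequence(i, i + len))) {
                // Not directly representable (or forced): emit a reference.
                out.append("&#").append(cp).append(';');
            } else {
                out.append(text, i, i + len);
            }
            i += len;
        }
        return out.toString();
    }
}
```

With asciiOnly set to false this answers the question "can I represent code point X directly in encoding Y?" through the Charset. With asciiOnly set to true it follows the suggestion to use character references for all values above 0x7F, so the output stays readable even if the receiver guesses the encoding wrong. The sketch does not deal with control characters that are illegal in XML.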