Re: [PATCH] characters invalid for an encoding

John Wilson Fri, 06 May 2005 07:01:30 -0700


On 6 May 2005, at 12:03, Jochen Wiedmann wrote:


For version 3, I have code ready that checks for the presence of Java 1.4.
If that is available, an instance of Charset is queried.

Yes, that works fine. I'm too used to living with the need to support J2ME, so I forget the nice things in 1.4 :)
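
For illustration, a minimal sketch of what querying a Charset could look like on Java 1.4 or later. This is not the code Jochen refers to; the class and method names (EncodingCheck, canEncode) are hypothetical, and the runtime check for Java 1.4 itself is left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodingCheck {
    // Hypothetical helper: returns true if every character of text
    // can be represented in the named charset.
    static boolean canEncode(String charsetName, String text) {
        // Charset.forName throws if the charset name is illegal or unsupported.
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(text);
    }
}
```

A writer could call canEncode("ISO-8859-1", value) and fall back to a character reference whenever it returns false.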

For maximum interoperability I would suggest we use UTF-8 but use character references for all values > 0x7F. This means that even if the other end gets the encoding wrong it will still almost certainly understand the characters. If the other end does not understand character encodings it will be very easy to see what the problem is (which is not quite so easy to do if it mistakes UTF-8 for ISO-8859-1, for example).
That is, as far as I can tell, what Daniel's proposed patch does.
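
The suggestion above (UTF-8 output, with a character reference for every value above 0x7F) can be sketched roughly as follows. This is an illustration under assumptions, not Daniel's patch: the names AsciiXmlEscaper and escape are hypothetical, it needs Java 5 for codePointAt, and it makes no attempt to reject code points that XML does not allow.

```java
public class AsciiXmlEscaper {
    // Hypothetical helper: escapes markup characters and writes every
    // code point above 0x7F as a decimal character reference, so the
    // result is pure ASCII whatever encoding the receiver assumes.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);   // handles surrogate pairs
            i += Character.charCount(cp);
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
        }
        return out.toString();
    }
}
```

Because the output contains only ASCII, a receiver that mistakes UTF-8 for ISO-8859-1 still reads the same bytes the same way.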

Yes, it would appear to do this. However, it also seems to emit invalid XML code points as character references (e.g. the NULL character would be emitted as &#0;). I do not believe that the XML spec allows this. I believe that these code points cannot appear in a well-formed document in any form. The intent is to allow the consuming application to be 100% sure it never sees these characters.
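
To make the point concrete, here is a small sketch of a validity check based on the Char production of XML 1.0 (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF). The class and method names (XmlChars, isValidXmlChar, firstInvalid) are hypothetical and not part of any existing patch.

```java
public class XmlChars {
    // True if cp matches the Char production of XML 1.0.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index of the first code point that may not appear in
    // an XML 1.0 document, or -1 if the whole string is acceptable.
    static int firstInvalid(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isValidXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

A serializer could run firstInvalid over each string and report an error instead of writing a reference such as &#0; that a conforming parser would reject.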


John Wilson
The Wilson Partnership
http://www.wilson.co.uk
