Re: [PATCH] characters invalid for an encoding

Jochen Wiedmann Fri, 06 May 2005 04:04:00 -0700

John Wilson wrote:

> The problem with allowing arbitrary encodings is that the writer has no
> idea what the mapping from Unicode code point to character
> encoding is, i.e. there is no way of answering the question "I have a
> Unicode code point with value X; can I represent that directly in
> encoding Y?" If the answer to this question is "NO" then it has to emit
> a character reference.


For version 3, I have code ready that checks for the presence of Java 1.4.
If that is available, an instance of Charset is queried.
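
To illustrate the kind of query meant here, below is a minimal sketch
that uses java.nio.charset to answer "can code point X be represented
directly in encoding Y?". It is not the version 3 code. The class name
EncodableCheck and the method canEncode are made up for this example,
and it assumes a current JDK (Character.toChars needs Java 5, while
Charset and CharsetEncoder themselves exist since 1.4).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, for illustration only.
final class EncodableCheck {
    /**
     * Returns true if the given Unicode code point can be written
     * directly in the named encoding, false if a character reference
     * would be needed instead.
     */
    static boolean canEncode(String encodingName, int codePoint) {
        CharsetEncoder encoder = Charset.forName(encodingName).newEncoder();
        // Convert to a String so that supplementary code points
        // (surrogate pairs) are tested as a whole.
        String s = new String(Character.toChars(codePoint));
        return encoder.canEncode(s);
    }

    public static void main(String[] args) {
        System.out.println(canEncode("ISO-8859-1", 0x00E9)); // e with acute accent: true
        System.out.println(canEncode("US-ASCII", 0x00E9));   // false
        System.out.println(canEncode("ISO-8859-1", 0x20AC)); // euro sign: false
    }
}
```

A writer could call such a check once per character and fall back to a
character reference whenever it returns false.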


> For maximum interoperability I would suggest we use UTF-8 but use
> character references for all values > 0x7F. This means that even if the
> other end gets the encoding wrong it will still almost certainly
> understand the characters. If the other end does not understand
> character references it will be very easy to see what the problem is
> (which is not quite so easy to do if it mistakes UTF-8 for ISO-8859-1,
> for example).

That is, as far as I can tell, what Daniel's proposed patch does.
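
For readers who want to see the approach spelled out, here is a rough
sketch of escaping everything above 0x7F as a numeric character
reference. It is not taken from the patch. The class name
AsciiXmlEscaper and the method escape are invented for this example,
and the handling of the markup characters &, < and > is an assumption
about what a writer would need anyway. Control characters that XML 1.0
forbids are not dealt with here.

```java
// Hypothetical helper, for illustration only; not the code from the patch.
final class AsciiXmlEscaper {
    /**
     * Escapes XML markup characters and replaces every code point
     * above 0x7F with a decimal character reference, so the result
     * contains only ASCII.
     */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Read whole code points so surrogate pairs give one reference.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:
                    if (cp > 0x7F) {
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00FC and U+00DF are written as escapes to keep this file ASCII.
        System.out.println(escape("Gr\u00FC\u00DFe <b>"));
        // prints: Gr&#252;&#223;e &lt;b&gt;
    }
}
```

Since the output is pure ASCII, it reads the same whether the receiver
decodes it as UTF-8, ISO-8859-1 or US-ASCII.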


Jochen
