Re: New Charakter Proposal

Markus Scherer Fri, 01 Nov 2002 15:35:47 -0800

David Starner wrote:

> > Chances are nearly 100% that overlong UTF-8 was a spoofing attempt, or the result of something other than a UTF-8 encoder.
>
> With the exception of overlong sequences for null (C0 80?), which Java
> generates in an attempt to avoid true nulls.

I am aware of this one. This encoding is not UTF-8, however - it is more like CESU-8 with a 2-byte encoding for NUL. Even if some documentation claims this to be UTF-8, it isn't, and a conformant UTF-8 decoder must reject byte sequences from this beast that don't belong in UTF-8 - and the same for a CESU-8 decoder.
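
To make the rejection concrete, here is a minimal sketch in Python, using its built-in strict UTF-8 decoder as a stand-in for "a conformant UTF-8 decoder". The helper name is_strict_utf8 and the sample constants are only illustrative, and U+10400 is an arbitrary supplementary code point picked to show the CESU-8 surrogate-pair form.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True only if Python's strict decoder accepts data as UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Illustrative byte sequences (names are hypothetical, chosen for this sketch).
NUL_OVERLONG = b"\xc0\x80"                # 2-byte overlong form of U+0000
NUL_PROPER = b"\x00"                      # the only valid UTF-8 form of U+0000
U10400_CESU8 = b"\xed\xa0\x81\xed\xb0\x80"  # surrogates D801 DC00, 3 bytes each
U10400_UTF8 = b"\xf0\x90\x90\x80"         # the 4-byte UTF-8 form of U+10400

for label, raw in [
    ("overlong NUL (C0 80)", NUL_OVERLONG),
    ("plain NUL (00)", NUL_PROPER),
    ("U+10400 as surrogate pair", U10400_CESU8),
    ("U+10400 as 4-byte UTF-8", U10400_UTF8),
]:
    verdict = "accepted" if is_strict_utf8(raw) else "rejected"
    print(f"{label}: {verdict}")
```

The overlong NUL and the encoded surrogate pair are both rejected, while the one-byte NUL and the four-byte form pass. A strict CESU-8 decoder would accept the surrogate pair form but would still have to reject C0 80.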

This rather proves my point above.

markus
