Every few years this subject seems to blossom on the list. Remember that this work was done a long time ago. The Java people devised a variant of UTF-8 that would allow them to *losslessly* convert between String and a representation that C could handle. Losslessly means that, since U+0000 is legal in a String, it had to be representable anywhere in the C string without using a zero byte, which C treats as the terminator. This was done very early in the development of Java, even before there was an internationalization group at JavaSoft.
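For anyone who wants to see the difference on the wire, here is a minimal sketch. It assumes that `DataOutputStream.writeUTF` is a convenient way to get at the modified encoding (it writes a two-byte length prefix followed by the modified UTF-8 bytes); the class name `ModifiedUtf8Demo` and the helpers `modifiedUtf8` and `hex` are made up for illustration and are not part of any Java API.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {
    // Hypothetical helper: the modified UTF-8 bytes of s, as written by writeUTF.
    static byte[] modifiedUtf8(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(s);
            out.flush();
            byte[] all = buf.toByteArray();
            // Drop the two-byte length prefix that writeUTF puts in front.
            return Arrays.copyOfRange(all, 2, all.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: space-separated uppercase hex dump.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "a\0b"; // a String with U+0000 in the middle
        // Standard UTF-8 contains a zero byte, which would cut a C string short.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8))); // 61 00 62
        // The modified form encodes U+0000 as the two bytes C0 80 instead.
        System.out.println(hex(modifiedUtf8(s)));                    // 61 C0 80 62
    }
}
```

The second line of output contains no zero byte, so C code can treat the result as an ordinary NUL-terminated string and still round-trip the embedded U+0000.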
The only real problem was that they simply called this UTF-8 at the time. They have since documented, in response to requests by the Unicode Consortium, that this is a modified variant of UTF-8. It is worked too heavily into the structure of Java for them to do much beyond documenting it, and I really haven't heard of real cases where this has caused a problem. I doubt that any further discussion of this will be productive. -- Mark

