This thread was inspired by exactly that. Someone pointed me to this page, using it as "proof" that modified UTF-8 is an acceptable thing to do.
While you are well aware of the problem, the users aren't. I think it would be a good idea to add a small note saying that this feature is going to be changed in future versions of Java, or perhaps deprecated, due to its incompatibility. Just a small note, on that page and similar pages, with the phrase "This will be deprecated in the future because it currently contradicts the standard behaviour"... that would make a *huge* difference.
That aside.
I'm just curious about the \0 thing. What problems would having a \0 in UTF-8 present that are not already presented by having a \0 in ASCII? I can't see any advantage there.
The only advantage I can imagine would be using UTF-8 to store \0 in places where that previously wasn't possible. To me, that sounds like a strange way to add a feature.
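
To make the \0 point concrete, here is a rough sketch of the difference as I understand it from the DataInput documentation. It encodes the same string once as standard UTF-8 and once through DataOutputStream.writeUTF, which writes the modified form. The class name ModifiedUtf8Demo and the helpers modifiedUtf8 and hex are made up for this illustration, and StandardCharsets assumes a newer Java than the 1.5 docs linked below.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the names here are illustrative only.
class ModifiedUtf8Demo {

    // Encode a string with DataOutputStream.writeUTF and drop the
    // two-byte length prefix that writeUTF puts in front of the data.
    static byte[] modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF(s);
        out.flush();
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Render bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String s = "a\0b"; // 'a', U+0000, 'b'
        // Standard UTF-8 keeps U+0000 as a single zero byte.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8))); // 61 00 62
        // Modified UTF-8 writes U+0000 as the two-byte sequence c0 80.
        System.out.println(hex(modifiedUtf8(s)));                    // 61 c0 80 62
    }
}
```

The second form never contains a zero byte, but c0 80 is an overlong sequence that a strict UTF-8 decoder should reject, which is the incompatibility I'm complaining about.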
On 12 Nov 2004, at 23:58, A. Vine wrote:
FYI, we are well aware of this shortcoming (modified UTF-8), and with each release we try to mitigate it further. The problem is that it is so deep in the code (it has been there since Java 1.0) that it is not easy to eliminate without breaking a lot of existing stuff, something that the Java team strives to avoid.
Theodore H. Smith wrote:
http://java.sun.com/j2se/1.5.0/docs/api/java/io/DataInput.html#modified-utf-8
If only people could sue for suggesting bad coding practices ;o)
--
Theodore H. Smith - Software Developer.
http://www.elfdata.com

