Doug Ewell said:
instead of overloading the string type.  Strings are for text.  Text
does not need nulls.

Nulls are legal Unicode characters, usable in plain text, and they have been legal from the start in ASCII and in all the ISO 8-bit charset standards. Why do you want a legal Unicode string containing NULL (U+0000) *characters* to become illegal when it is converted to a C string?


A null *CHARACTER* is valid in a C string, because C does not mandate the string encoding (which varies according to locale conventions at run-time).

It just assigns a special role to the null *BYTE* as an end-of-string terminator.

There are many reasons why one would want to store null *characters* in C strings, using either a proper escaping mechanism (a transport syntax, such as transforming the 00 byte that UTF-8 generates for U+0000 into the sequence C0 80) or a different encoding scheme (standard UTF-8 does not fit here; one needs another scheme, like the Sun modified version).
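
To make the C0 80 idea concrete, here is a rough sketch in C of that kind of transport syntax. The function names mutf8_escape_nul and mutf8_unescape_nul are made up for this message and do not come from any library, and the sketch assumes the input is otherwise well-formed UTF-8 (it only touches the 00 byte and does nothing about supplementary characters, which Sun's scheme also treats differently). Since the byte C0 never occurs in well-formed UTF-8, the escape cannot collide with real text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy `len` bytes of UTF-8 (which may contain 00 bytes
   standing for U+0000) into a NUL-terminated C string, writing each 00 as the
   two bytes C0 80. Returns the number of bytes written, not counting the
   terminator, or (size_t)-1 if `dst` (capacity `cap`) is too small. */
size_t mutf8_escape_nul(const unsigned char *src, size_t len,
                        char *dst, size_t cap)
{
    size_t out = 0;
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        size_t need = (src[i] == 0x00) ? 2 : 1;
        if (out + need >= cap)          /* always keep room for the terminator */
            return (size_t)-1;
        if (src[i] == 0x00) {
            dst[out++] = (char)0xC0;    /* escaped U+0000 */
            dst[out++] = (char)0x80;
        } else {
            dst[out++] = (char)src[i];
        }
    }
    if (out >= cap)
        return (size_t)-1;
    dst[out] = '\0';                    /* the only real null BYTE: the terminator */
    return out;
}

/* Hypothetical inverse: read a NUL-terminated C string and turn each C0 80
   pair back into a single 00 byte. Returns the number of bytes written to
   `dst`, or (size_t)-1 if `dst` (capacity `cap`) is too small. */
size_t mutf8_unescape_nul(const char *src, unsigned char *dst, size_t cap)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t out = 0;
    assert(src != NULL && dst != NULL);
    while (*p != 0x00) {
        unsigned char b = *p++;
        if (b == 0xC0 && *p == 0x80) {  /* escaped U+0000 */
            b = 0x00;
            p++;
        }
        if (out >= cap)
            return (size_t)-1;
        dst[out++] = b;
    }
    return out;
}
```

With this, the three characters 'a', U+0000, 'b' become the four bytes 61 C0 80 62 plus the terminator, so strlen() and friends see the whole string, and the decoder gives back the original three bytes.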

And I don't consider this to be a "broken" encoding. It's just another encoding, fully compatible with Unicode *and* with C string conventions. Using pure UTF-8 in C strings would not conform to either Unicode or C conventions, because it would illegitimately restrict the legal embedding of U+0000 in strings...




