>> Null bytes aren't always "terminators". You can embed null bytes into data
>> and still want to do utf8 processing with it.
>
> that's questionable ... the desire to have ASCII NUL in utf-8
> sequences (without breaking the "utf-8 sequences are usable as c
> strings" property) is the main reason for the existence of "modified
> utf-8".
Admittedly, that’s the first time I’ve heard of "modified utf-8". There seem
to be different flavors for different languages (the Java one seems to be the
most prominent), which means not everyone is going to use it, because there is
no standard.

Still, U+0000 is a valid code point, and special-casing it without mentioning
it, so that callers just have to know to watch out for it, is either a bug or
a documentation error.
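
To make the difference concrete, here is a minimal sketch in C. The helpers
encode_cp() and encode_all() are hypothetical names made up for illustration;
they are not toybox functions. The "modified" flag only covers the NUL case
discussed above (U+0000 written as the overlong pair 0xC0 0x80). As far as I
can tell the Java flavor also treats characters above U+FFFF differently,
which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from toybox: encode one code point into out
 * (which needs room for 4 bytes) and return the number of bytes written.
 * When modified is nonzero, U+0000 becomes the overlong pair 0xC0 0x80
 * instead of a single zero byte. Surrogates are not rejected here. */
static size_t encode_cp(unsigned cp, int modified, unsigned char *out)
{
  assert(cp <= 0x10FFFF);
  if (!cp && modified) {
    out[0] = 0xC0;
    out[1] = 0x80;
    return 2;
  }
  if (cp < 0x80) {
    out[0] = cp;
    return 1;
  }
  if (cp < 0x800) {
    out[0] = 0xC0 | (cp >> 6);
    out[1] = 0x80 | (cp & 0x3F);
    return 2;
  }
  if (cp < 0x10000) {
    out[0] = 0xE0 | (cp >> 12);
    out[1] = 0x80 | ((cp >> 6) & 0x3F);
    out[2] = 0x80 | (cp & 0x3F);
    return 3;
  }
  out[0] = 0xF0 | (cp >> 18);
  out[1] = 0x80 | ((cp >> 12) & 0x3F);
  out[2] = 0x80 | ((cp >> 6) & 0x3F);
  out[3] = 0x80 | (cp & 0x3F);
  return 4;
}

/* Encode n code points into out, append a terminating zero byte, and
 * return the encoded length (not counting the terminator). The caller
 * must provide at least 4*n + 1 bytes. */
static size_t encode_all(const unsigned *cps, size_t n, int modified,
                         unsigned char *out)
{
  size_t len = 0;

  for (size_t i = 0; i < n; i++) len += encode_cp(cps[i], modified, out + len);
  out[len] = 0;
  return len;
}
```

Encoding the three code points 'a', U+0000, 'b' with modified set to 0 gives
3 bytes, but strlen() on the result stops at 1. With modified set to 1 the
result is 4 bytes and strlen() sees all of them, at the cost of containing a
sequence that a strict utf-8 decoder would reject as overlong.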
— Oliver Webb <[email protected]>
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net