>> Null bytes aren't always "terminators". You can embed null bytes into data
>> and still want to do utf8 processing with it.
>
> that's questionable ... the desire to have ASCII NUL in utf-8
> sequences (without breaking the "utf-8 sequences are usable as c
> strings" property) is the main reason for the existence of "modified
> utf-8".

Admittedly, that’s the first time I’ve heard of "modified utf-8". There seem
to be different flavors for different languages (the Java one seems to be the
most prominent), and since there is no standard, not everyone is going to use
it.

Still, U+0000 is a valid code point, and an undocumented special case for it
that you have to watch out for is either a bug or a documentation error.

— Oliver Webb <[email protected]>
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
