Re: Corrigendum #9

David Starner Wed, 02 Jul 2014 11:22:26 -0700

On Wed, Jul 2, 2014 at 8:02 AM, Karl Williamson <[email protected]> wrote:
> In
> UTF-8, an example would be that Sun, I'm told, and for reasons I've
> forgotten or never knew, did not want raw NUL bytes to appear in text
> streams, so used the overlong sequence \xC0\x80 to represent them; overlong
> sequences generally being considered "bad" because they could be used to
> insert malicious payloads into the input.


In C, NUL ends a string. If you have to run data that may have NUL
characters through C functions, you can't store the NULs as \0. I
might argue that 11111111b (0xFF) for 0x00 in UTF-8 would be technically
legal (the standard never specifies which bit sequences correspond to
which byte values), but \xC0\x80 would probably be more reliably
processed by existing code.
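A minimal sketch of the \xC0\x80 trick, for illustration only. The function names encode_nul_safe and decode_nul_safe are hypothetical. The sketch covers only the NUL substitution. Java's "Modified UTF-8" uses the same C0 80 form for U+0000 but also handles supplementary characters differently, and that part is not reproduced here. Every code point other than U+0000 is encoded as ordinary UTF-8, so the output never contains a 0x00 byte and survives NUL-terminated C strings:

```python
def encode_nul_safe(text: str) -> bytes:
    """Encode text as UTF-8, but write U+0000 as the overlong pair C0 80."""
    # In well-formed UTF-8 a 0x00 byte can only come from U+0000, so a plain
    # byte replacement cannot corrupt any multi-byte sequence.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_nul_safe(data: bytes) -> str:
    """Reverse encode_nul_safe: turn C0 80 back into U+0000, then decode strictly."""
    # 0xC0 never occurs in well-formed UTF-8, so any C0 80 pair must be an
    # encoded NUL; any other malformed input is left for the strict decoder to reject.
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


encoded = encode_nul_safe("a\x00b\u00e9")
assert b"\x00" not in encoded          # safe to hand to C string functions
assert decode_nul_safe(encoded) == "a\x00b\u00e9"

# A conforming UTF-8 decoder treats the overlong pair as ill-formed.
try:
    encoded.decode("utf-8")
except UnicodeDecodeError:
    print("strict UTF-8 decoding rejects C0 80")
```

The last lines show the trade-off under discussion. The data round-trips through code that knows the convention. A strict UTF-8 decoder, however, rejects the overlong pair, just as it would reject a lone 0xFF byte.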

-- 
Kie ekzistas vivo, ekzistas espero. (Where there is life, there is hope.)
_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode
