On Sun, Jan 19, 2020 at 12:20:18AM -0500, Paul Procacci wrote:
> On Sun, Jan 19, 2020 at 12:12 AM yary wrote:
>
> > In UTF-16 every character is 16 bits, so all 8 bits of zeros tells you is
> > that it's possibly a big-endian ascii character or a little-endian
> > non-ascii character at a position divisible by 256. All zeros U+0000 is
> > unicode NULL, which the windows UTF-16 C convention uses to terminate the
> > string.
>
> Perfect. Obviously didn't know that. My assumption that only the first
> byte gets checked was obviously wrong.
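To make the byte-level picture concrete, here is a minimal Python sketch that reports whether the first byte of a character's UTF-16 code unit is zero in each byte order. The helper name `first_byte_is_zero` is hypothetical, and it only handles characters that fit in a single 16-bit code unit:

```python
def first_byte_is_zero(ch: str, byteorder: str) -> bool:
    """Return True if the first byte of ch's UTF-16 code unit is 0x00.

    byteorder is "le" or "be". Hypothetical helper for illustration; it
    assumes ch is a single BMP character, i.e. exactly one 16-bit code unit.
    """
    unit = ch.encode("utf-16-" + byteorder)  # explicit byte order: no BOM
    if len(unit) != 2:
        raise ValueError("expected one character encoded as one 16-bit code unit")
    return unit[0] == 0


for ch in ("A", "\u0100", "\u0000"):
    print(f"U+{ord(ch):04X}", "be:", first_byte_is_zero(ch, "be"),
          "le:", first_byte_is_zero(ch, "le"))
```

"A" has a zero first byte only in big-endian (`00 41` vs `41 00`), while U+0100 has one only in little-endian (`00 01`), even though it is not ASCII. Only U+0000 makes both bytes zero.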
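It is correct if you're talking about UTF-8, not UTF-16 :) In UTF-8 a zero
byte can only ever be U+0000 itself, since every byte of a multi-byte
sequence has its high bit set.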
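A minimal Python sketch of the difference, assuming UTF-16LE as on Windows (both helper names are hypothetical): a byte-at-a-time scan for zero, which is what C `strlen` does, measures a NUL-terminated UTF-8 buffer correctly but stops after the first character of a UTF-16 one. UTF-16 has to be scanned one aligned 16-bit unit at a time, as `wcslen` does on Windows, where `wchar_t` is 16 bits.

```python
def utf8_strlen(buf: bytes) -> int:
    """Bytes before the first 0x00 byte, like C strlen() on UTF-8 data."""
    return buf.index(0) if 0 in buf else len(buf)


def utf16_unit_len(buf: bytes) -> int:
    """16-bit code units before the first 0x0000 unit, scanning aligned pairs."""
    for i in range(0, len(buf) - 1, 2):
        if buf[i] == 0 and buf[i + 1] == 0:
            return i // 2
    return len(buf) // 2


text = "héllo"
utf8 = text.encode("utf-8") + b"\x00"           # NUL-terminated UTF-8
utf16 = text.encode("utf-16-le") + b"\x00\x00"  # NUL-terminated UTF-16LE

print(utf8_strlen(utf8))      # 6: "é" takes two bytes, neither of them zero
print(utf8_strlen(utf16))     # 1: stops at the zero high byte of "h"
print(utf16_unit_len(utf16))  # 5: stops only at the 0x0000 terminator
```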
G'luck,
Peter
--
Peter Pentchev roam@{ringlet.net,debian.org,FreeBSD.org}
PGP key: http://people.FreeBSD.org/~roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13
