[sqlite] UTF8 and NUL

J Decker Thu, 25 Jan 2018 19:58:23 -0800

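NUL is a valid UTF-8 character,
but FF is never valid.  (By the lead-byte pattern, FE would already announce a
7-byte sequence carrying 36 bits, and FF has no defined length at all.)
Practically any byte from F8 up is invalid in UTF-8, and under RFC 3629
so are F5 through F7.
Not even the BOM uses them: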
https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
EF BB BF (decimal 239 187 191)


// (EF & 0F) << 12 | (BB - 80) << 6 | (BF - 80)  =  F000 | EC0 | 3F
( 0xFEFF )
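
The arithmetic in the comment above can be checked with a small sketch. The
function name `decode3` is hypothetical, and it handles only the 3-byte form,
with no checks for overlong encodings or surrogates:

```python
def decode3(b0: int, b1: int, b2: int) -> int:
    """Decode one 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx)."""
    if b0 & 0xF0 != 0xE0:
        raise ValueError("not a 3-byte lead byte")
    if b1 & 0xC0 != 0x80 or b2 & 0xC0 != 0x80:
        raise ValueError("bad continuation byte")
    # Keep the low 4 bits of the lead byte and the low 6 bits of each
    # continuation byte, then pack them into one code point.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)

bom = decode3(0xEF, 0xBB, 0xBF)
print(hex(bom))  # prints 0xfeff
```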


Many Windows <https://en.wikipedia.org/wiki/Microsoft_Windows> programs
(including Windows Notepad <https://en.wikipedia.org/wiki/Notepad_(Windows)>)
add the bytes 0xEF, 0xBB, 0xBF at the start of any document saved as UTF-8.

(Not that a BOM is even required, because UTF-8 is already a fixed sequence
of bytes with no byte order to mark.)
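
When reading such files, one way to cope is Python's standard `utf-8-sig`
codec, which drops a single leading BOM if present and otherwise behaves like
plain `utf-8`. A minimal sketch; the sample bytes are made up for illustration:

```python
data = b"\xef\xbb\xbfhello"  # hypothetical contents of a file saved by Notepad

# Plain "utf-8" keeps the BOM as the character U+FEFF at the front.
plain = data.decode("utf-8")

# "utf-8-sig" removes one leading BOM, if there is one.
stripped = data.decode("utf-8-sig")

print(len(plain), len(stripped))  # prints 6 5
```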
----------
But anyway, FF could be used as a string terminator instead of 00.  It is
never legal in any UTF-8 sequence.
(F8, F9, FA, FB, FC, FD, FE, FF)
F8 would start a 5-byte encoding, but that covers more code points than
Unicode has allocated.  It could potentially be useful to leave a little extra
room in sequences, so I would avoid F8 (F9, FA, FB) and stick to FC-FF for
possible control characters.
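
As a sketch of the terminator idea, assume a made-up record format in which
each string is stored as UTF-8 followed by a single FF byte. The helper names
`pack` and `unpack` are hypothetical and are not anything SQLite provides:

```python
TERM = b"\xff"  # never produced by encoding text as UTF-8

def pack(strings):
    """Concatenate strings as UTF-8, each followed by an FF terminator."""
    return b"".join(s.encode("utf-8") + TERM for s in strings)

def unpack(blob):
    """Split on FF and decode each piece; embedded NULs pass through."""
    parts = blob.split(TERM)
    if parts[-1] != b"":
        raise ValueError("missing final terminator")
    return [p.decode("utf-8") for p in parts[:-1]]

items = ["a\x00b", "", "\U0010ffff"]  # embedded NUL, empty string, highest code point
blob = pack(items)
print(unpack(blob) == items)  # prints True
```

Because the encoded form of valid text never contains FF, the split cannot
cut a character in half, and a string containing 00 round-trips intact.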
