https://en.wikipedia.org/wiki/List_of_Unicode_characters#Control_codes
Even the control codes within Unicode don't use FF.

U+009C 156 String Terminator ST
The literal bytes \xC2\x9C are the string terminator. I was thinking that
APC and ST were higher than that, more in the range of 0xF8-0xFF.
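
A quick way to double-check those byte values is a sketch like the one below. It is Python only for convenience, and `utf8_hex` is an illustrative helper name of my own, not anything from SQLite or a library. It shows that the C1 control codes (U+0080 to U+009F, which includes ST and APC) all encode as C2 followed by one continuation byte, so none of them land anywhere near 0xF8-0xFF.

```python
def utf8_hex(code_point: int) -> str:
    """Return the UTF-8 encoding of a code point as space-separated hex bytes."""
    return chr(code_point).encode("utf-8").hex(" ").upper()


print(utf8_hex(0x9C))  # ST  -> C2 9C
print(utf8_hex(0x9F))  # APC -> C2 9F

# Every C1 control code (U+0080..U+009F) is a two-byte sequence led by C2.
c1_lead_bytes = {chr(cp).encode("utf-8")[0] for cp in range(0x80, 0xA0)}
print(sorted(hex(b) for b in c1_lead_bytes))  # ['0xc2']
```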



On Thu, Jan 25, 2018 at 7:57 PM, J Decker <d3c...@gmail.com> wrote:

> NUL is a valid UTF-8 character,
> but FF is never valid (it would be like a 36-bit length specification),
> and practically any byte from F8 up is invalid in UTF-8.
> Other than the BOM
> https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
> EF BB BF 239 187 191
>
> // (EF - E0) << 12 | (BB - 80) << 6 | (BF - 80)  =  F000 | EC0 | 3F
> ( 0xFEFF )
>
>
> Many Windows <https://en.wikipedia.org/wiki/Microsoft_Windows> programs
> (including Windows Notepad
> <https://en.wikipedia.org/wiki/Notepad_(Windows)>) add the bytes 0xEF,
> 0xBB, 0xBF at the start of any document saved as UTF-8. Th
>
> (Not that a BOM is even required, because UTF-8 bytes are already in a
> fixed order.)
> ----------
> But anyway, FF could be used as a string terminator instead of 00.  It is
> never legal in any UTF-8 sequence.
> (F8,F9,FA,FB,FC,FD,FE,FF)
> F8 would be a 5-byte encoding, but that is more code points than Unicode
> has allocated.  It could be potentially useful to permit a little extra
> space in sequences, so I would avoid F8 (F9,FA,FB) and stick to FC-FF for
> possible control characters.
>
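
The claim in the quoted message can be checked by brute force, and the FF-as-terminator idea is easy to sketch. The Python below is only an illustration under stated assumptions: `bytes_used_by_utf8`, `split_ff_terminated`, and `TERMINATOR` are hypothetical names, and nothing here reflects how SQLite itself stores or terminates strings. It encodes every Unicode scalar value and records which byte values ever occur, then splits a buffer on FF.

```python
def bytes_used_by_utf8() -> set:
    """Collect every byte value that occurs in the UTF-8 encoding of any code point."""
    used = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates have no UTF-8 encoding
        used.update(chr(cp).encode("utf-8"))
    return used


# Byte values that never appear in well-formed UTF-8.
UNUSED = sorted(set(range(256)) - bytes_used_by_utf8())
print([f"{b:02X}" for b in UNUSED])
# ['C0', 'C1', 'F5', 'F6', 'F7', 'F8', 'F9', 'FA', 'FB', 'FC', 'FD', 'FE', 'FF']

TERMINATOR = b"\xff"  # hypothetical terminator byte, as suggested in the quoted message


def split_ff_terminated(buffer: bytes) -> list:
    """Split a buffer of FF-terminated UTF-8 strings into decoded strings."""
    parts = buffer.split(TERMINATOR)
    # A buffer that ends with the terminator leaves one empty trailing chunk; drop it.
    if parts and parts[-1] == b"":
        parts.pop()
    return [part.decode("utf-8") for part in parts]


# NUL can sit inside a string because it no longer marks the end.
sample = "a\x00b".encode("utf-8") + TERMINATOR + "héllo".encode("utf-8") + TERMINATOR
print(split_ff_terminated(sample))  # ['a\x00b', 'héllo']
```

Under the current UTF-8 definition the unused set is a bit wider than F8-FF: C0, C1, and F5-F7 never occur either, so FC-FF is a conservative choice within it.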
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
