Tom Hummel wrote:
Scite also doesn't recognize UTF-8 encoded files when there's no BOM. I'd
like to see Scite switch to UTF-8 mode when, while loading, it detects a
UTF-8 code point outside of the ASCII range. This would make working with
UTF-8 encoded text much, much easier.

I am not a Unicode guru, but I have just read a discussion on the Lua mailing list about the issues with UTF-8 and other encodings. One fact that came out is that, by design, there is no way to automatically detect UTF-8 encoding if there is neither a BOM nor some cookie like Python's:
# -*- encoding: utf-8 -*-
or
# coding: utf-8
or some other shebang-like marker (e.g. XML's encoding declaration).
Why? Because it is 8-bit clean, i.e. it never uses codes below 32. So if you detect some character combinations that look like a UTF-8 code point, they can actually be plain text in some 8-bit encoding. OK, that's unlikely, but not impossible (hey, I can write in my ISO-8859-1 code page: "en UTF-8, À est écrit Ã€"!). Now, if you know a good and reliable way to automatically detect UTF-8, please share it with us.
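
To make the ambiguity concrete, here is a minimal Python sketch of the kind of detection discussed above. It is not how Scite works; the function name guess_encoding, the cookie pattern (loosely modeled on Python's coding cookie) and the ISO-8859-1 fallback are all assumptions made for illustration. It trusts a BOM first, then a cookie in the first two lines, and only then falls back to checking whether the whole buffer happens to be valid UTF-8, which is a guess and not a proof.

```python
import codecs
import re

# Assumed cookie pattern, loosely modeled on Python's "coding:" comment.
COOKIE_RE = re.compile(rb"#.*?coding[:=][ \t]*([-\w.]+)")


def guess_encoding(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Hypothetical helper: return a best-guess encoding name for raw bytes."""
    # 1. A UTF-8 BOM is an explicit marker.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"

    # 2. An explicit cookie in the first two lines.
    for line in data.splitlines()[:2]:
        match = COOKIE_RE.search(line)
        if match:
            return match.group(1).decode("ascii").lower()

    # 3. Heuristic only: the bytes *could* be UTF-8 if they decode cleanly.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return fallback
    if all(byte < 0x80 for byte in data):
        return "ascii"  # pure ASCII reads the same in either encoding
    return "utf-8"      # a guess: 8-bit text can also be valid UTF-8


if __name__ == "__main__":
    # The two bytes C3 80 read as "Ã€" in Windows-1252 and as "À" in UTF-8.
    fragment = "\u00c3\u20ac".encode("cp1252")
    print(fragment, guess_encoding(fragment))

    # A longer 8-bit sentence usually contains a byte that is invalid in UTF-8.
    sentence = "en UTF-8, \u00c0 est \u00e9crit \u00c3\u20ac".encode("cp1252")
    print(guess_encoding(sentence))
```

The short fragment is reported as UTF-8 even though it was written as Windows-1252 text, which is the ambiguity described above. The full sentence falls back to the 8-bit guess, because its lone À byte (C0) is never valid in UTF-8; that is why such a check tends to work on longer files while still being only a heuristic.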

--
Philippe Lhoste
--  (near) Paris -- France
--  http://Phi.Lho.free.fr
--  --  --  --  --  --  --  --  --  --  --  --  --  --

_______________________________________________
Scite-interest mailing list
[email protected]
http://mailman.lyra.org/mailman/listinfo/scite-interest