Tom Hummel wrote:
> SciTE also doesn't recognize UTF-8 encoded files when there's no BOM. I'd
> like to see SciTE switch to UTF-8 mode when it detects, while loading, a
> UTF-8 code point which is outside of the ASCII range. This would make
> working with UTF-8 encoded text much, much easier.
I am not a Unicode guru, but I have just read a discussion on the issues
with UTF-8 and other encodings on the Lua mailing list.
One fact that came out is that there is no way, by design, to automatically
detect UTF-8 encoding if there is neither a BOM nor some cookie
like Python's:
# -*- encoding: utf-8 -*-
or
# coding: utf-8
or some other shebang-like declaration (eg. XML's encoding declaration).
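To illustrate the two explicit markers mentioned above, here is a minimal Python sketch that checks for a UTF-8 BOM or a Python-style coding cookie. The function name declared_encoding and the regular expression are my own assumptions for illustration (the pattern is modeled on Python's cookie); this is not how SciTE does it.

```python
import codecs
import re

# Assumed pattern, modeled on Python's source-encoding cookie.
# It matches both "coding: utf-8" and "encoding: utf-8".
COOKIE_RE = re.compile(rb"coding[:=]\s*([-\w.]+)")

def declared_encoding(data: bytes):
    """Return the encoding a file declares explicitly, or None."""
    # A UTF-8 BOM is the three bytes EF BB BF at the very start.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Like Python, only look for the cookie in the first two lines.
    for line in data.splitlines()[:2]:
        match = COOKIE_RE.search(line)
        if match:
            return match.group(1).decode("ascii")
    # Nothing declared: the encoding can only be guessed.
    return None
```

When this returns None, we are in the case discussed here: no BOM, no cookie, and only guessing is left.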
Why? Because UTF-8 is 8-bit clean: its multi-byte sequences use only bytes
of 128 and above, which are also legal characters in 8-bit code pages. So if
you detect some byte combinations looking like a UTF-8 code point, it can
actually be plain 8-bit text. OK, that's unlikely, but not impossible
(hey, I can write in my ISO-8859-1 code page: "en UTF-8, À est écrit À"!)
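The ambiguity can be shown with a small Python sketch. The helper looks_like_utf8 is a hypothetical heuristic of the kind requested (strict UTF-8 validation plus at least one non-ASCII byte), not an existing SciTE feature, and as the example shows it can only guess.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic guess: valid strict UTF-8 containing a non-ASCII byte."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Pure ASCII decodes fine too, but tells us nothing about the encoding.
    return any(byte >= 0x80 for byte in data)

# "À" encoded in UTF-8 is the two bytes C3 80.
utf8_bytes = "À".encode("utf-8")

# The very same bytes are also legal ISO-8859-1: "Ã" followed by the
# control character U+0080. Nothing in the bytes says which was meant.
as_latin1 = utf8_bytes.decode("iso-8859-1")

# A lone "À" in ISO-8859-1 is the byte C0, which is not valid UTF-8,
# so most real 8-bit text does fail the check.
latin1_bytes = "À".encode("iso-8859-1")
```

Here looks_like_utf8(utf8_bytes) is true and looks_like_utf8(latin1_bytes) is false, so the guess is usually right, but a file that really contains the ISO-8859-1 text in as_latin1 would be misdetected.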
Now, if you know a good and reliable way to automatically detect UTF-8,
please share it with us.
--
Philippe Lhoste
-- (near) Paris -- France
-- http://Phi.Lho.free.fr
-- -- -- -- -- -- -- -- -- -- -- -- -- --