[scite] Re: UTF-8 Encoding and BOM handling

Philippe Lhoste Thu, 14 Dec 2006 20:06:24 -0800

Tom Hummel wrote:

SciTE also doesn't recognize UTF-8 encoded files when there's no BOM. I'd
like to see SciTE switch to UTF-8 mode when, while loading, it detects a
UTF-8 code point outside the ASCII range. This would make working with
UTF-8 encoded text much, much easier.

I am not a Unicode guru, but I have just read a discussion on the issues
with UTF-8 and other encodings on the Lua mailing list.
One fact that came out is that there is no way, by design, to detect
UTF-8 encoding automatically if there is neither a BOM nor some cookie
like Python's:

# -*- encoding: utf-8 -*-
or
# coding: utf-8
or some other shebang-like code (e.g. XML's encoding declaration).
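
To illustrate the explicit markers mentioned above, here is a minimal Python sketch that looks for a UTF-8 BOM or a Python-style coding cookie. The function name `declared_encoding` and the simplified pattern are my own assumptions, only loosely modelled on the cookie syntax; this is not how SciTE does it.

```python
import codecs
import re

# Simplified, hypothetical pattern: matches both "coding: utf-8"
# and "-*- encoding: utf-8 -*-" style cookies.
COOKIE_RE = re.compile(rb"coding[:=]\s*([-\w.]+)")


def declared_encoding(data: bytes):
    """Return the encoding declared by a BOM or a cookie, else None."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # Only inspect the first two lines, as Python does for source files.
    for line in data.splitlines()[:2]:
        match = COOKIE_RE.search(line)
        if match:
            return match.group(1).decode("ascii").lower()
    return None
```

When the function returns None, nothing in the file declares its encoding, and the editor is left guessing.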

Why? Because it is 8-bit clean: its multi-byte sequences use only byte
values of 128 and above, which are also ordinary characters in 8-bit code
pages. So if you detect some byte combinations that look like a UTF-8
code point, they can actually be plain 8-bit text. OK, that's unlikely,
but not impossible
(hey, I can write in my ISO-8859-1 code page: "in UTF-8, À is written Ã€"!)

Now, if you know a good and reliable way to automatically detect UTF-8,
please share it with us.
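
To make the ambiguity concrete, here is a small Python sketch of the "valid UTF-8 with non-ASCII bytes" heuristic, with the example above as a false positive. The name `looks_like_utf8` is hypothetical, and I am assuming the Windows-1252 flavour of the code page, where byte 0x80 is the euro sign.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Guess only: non-ASCII bytes present and the data decodes as UTF-8."""
    if data.isascii():
        return False  # pure ASCII, nothing to decide
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same two bytes, read two ways.
raw = b"\xc3\x80"
as_utf8 = raw.decode("utf-8")      # one character, U+00C0
as_cp1252 = raw.decode("cp1252")   # two characters, U+00C3 then U+20AC
```

Both decodings succeed, so `looks_like_utf8(raw)` answers True even if the author really typed the two-character text in an 8-bit code page. The heuristic rejects many 8-bit files, but it cannot prove that a file is UTF-8.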


--
Philippe Lhoste
--  (near) Paris -- France
--  http://Phi.Lho.free.fr
--  --  --  --  --  --  --  --  --  --  --  --  --  --

_______________________________________________
Scite-interest mailing list
[email protected]
http://mailman.lyra.org/mailman/listinfo/scite-interest