Philippe Lhoste wrote:
Neil Hodgson wrote:
Since it looks like most people see the current Encodings menu as
unclear then, unless there are some more opinions, I will change the
File | Encodings menu to replace 8 Bit -> Default, UTF-8 -> UTF-8 with
BOM, and UTF-8 Cookie -> UTF-8.
Default is ambiguous, no? In Windows XP, it could be UTF-16.
BTW, why do you use UCS-2 instead of UTF-16? I know they are not the
same thing, although the exact nuances escape me, but I thought UCS-2
was a bit outdated.
I don't necessarily mean you should change this terminology, but I would
be happy to get an explanation to improve my knowledge. ;-)
All the Unicode 3.x/4.x stuff is downloadable, I browsed through
all of it the other day. Unicode 5.x will be released online this
month, according to the Unicode website. It's very educational, a
very good bit of reading.
From the Unicode documentation, I gather UCS-2 to mean
ISO-10646's BMP (Basic Multilingual Plane) profile, which limits
the codeset to the (approx) 16-bit codespace. For UTF-16, in order
to track Unicode's implementation, surrogate pairs are now being
defined, which means there are well-defined pairs of 2-octet code
units that encode a character beyond the BMP. So UCS-2 as defined
in ISO-10646 would not support surrogate pairs. The Unicode
documentation has a pretty clear discussion about BOMs as well.
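
To make the surrogate pair point concrete, here is a rough Python
sketch of the arithmetic as I understand it from the Unicode
documentation. The function names (to_utf16_units, fits_in_ucs2) are
made up for illustration and have nothing to do with the SciTE
source; treat it as a sketch, not a reference implementation.

```python
def to_utf16_units(code_point):
    """Return the UTF-16 code units (as ints) for one code point."""
    # Reject values outside the codespace and lone surrogate values.
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point < 0x10000:
        # Inside the BMP: one 16-bit unit, same as UCS-2.
        return [code_point]
    # Beyond the BMP: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return [high, low]


def fits_in_ucs2(text):
    """True if every character is in the BMP, so no surrogates are needed."""
    return all(ord(ch) < 0x10000 for ch in text)
```

For example, to_utf16_units(0x1D11E) gives the pair 0xD834, 0xDD1E,
while anything below 0x10000 comes back as a single unit. A strict
UCS-2 consumer has no way to represent the first case.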
As for what Win2000/XP supports, I'm not sure... I haven't seen
news that says surrogate pairs are supported. Markus Kuhn has some
definitive documents online, which will probably be closer to
actual practice, as opposed to the Unicode documents, but I
haven't read them in a long time.
--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia