Philippe Lhoste wrote:
Neil Hodgson wrote:
  Since it looks like most people see the current Encodings menu as
unclear then, unless there are some more opinions, I will change the
File | Encodings menu to replace 8 Bit -> Default, UTF-8 -> UTF-8 with
BOM, and UTF-8 Cookie -> UTF-8.

Default is ambiguous, no? In Windows XP, it could be UTF-16.
BTW, why do you use UCS-2 instead of UTF-16? I know they are not the same thing, although the exact nuances escape me, but I thought UCS-2 was a bit outdated. I don't necessarily mean you should change this terminology, but I would be happy to get an explanation to improve my knowledge. ;-)

All the Unicode 3.x/4.x material is downloadable; I browsed through all of it the other day. According to the Unicode website, Unicode 5.x will be released online this month. It's very educational, a very good bit of reading.

From the Unicode documentation, I gather that UCS-2 means ISO-10646's BMP (Basic Multilingual Plane) profile, which limits the code set to the (approximately) 16-bit code space, U+0000 to U+FFFF. UTF-16, in order to track Unicode's implementation, now defines surrogate pairs: well-defined pairs of 16-bit code units (2 octets each, 4 octets in total) that together encode a character beyond the BMP. So UCS-2 as defined in ISO-10646 would not support surrogate pairs. The Unicode documentation has a pretty clear discussion about BOMs as well.
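To make the surrogate-pair point concrete, here is a small Python sketch of my own (not anything taken from SciTE; the function names are hypothetical) showing how UTF-16 splits a code point above U+FFFF into a high and a low surrogate. A pure UCS-2 encoder only has the first branch, so it simply cannot represent something like U+1D11E (MUSICAL SYMBOL G CLEF).

```python
def encode_utf16_units(cp: int) -> list[int]:
    """Encode one Unicode scalar value as a list of 16-bit UTF-16 code units."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        # BMP character: a single code unit, identical to UCS-2.
        return [cp]
    v = cp - 0x10000             # 20-bit offset above the BMP
    high = 0xD800 + (v >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return [high, low]


def decode_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def fits_ucs2(cp: int) -> bool:
    """True if the code point is representable in UCS-2 (BMP, non-surrogate)."""
    return cp <= 0xFFFF and not (0xD800 <= cp <= 0xDFFF)


if __name__ == "__main__":
    units = encode_utf16_units(0x1D11E)
    print([f"{u:04X}" for u in units])   # ['D834', 'DD1E']
    print(fits_ucs2(0x1D11E))            # False
```

The output can be cross-checked against Python's own codec, e.g. `chr(0x1D11E).encode("utf-16-be")` yields the same four octets D8 34 DD 1E.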

As for what Win2000/XP supports, I'm not sure... I haven't seen news that says surrogate pairs are supported. Markus Kuhn has some definitive documents online, which will probably be closer to actual practice, as opposed to the Unicode documents, but I haven't read them in a long time.

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia

_______________________________________________
Scite-interest mailing list
[email protected]
http://mailman.lyra.org/mailman/listinfo/scite-interest
