Re: newbie: unicode (when used as a coding) = UTF16LE?

Markus Scherer Thu, 13 Feb 2003 10:31:38 -0800

UTF-16, UTF-16BE, and UTF-16LE are charset names that are registered with the IANA. See http://www.iana.org/assignments/character-sets

They are formally defined in RFC 2781 (e.g. ftp://ftp.rfc-editor.org/in-notes/rfc2781.txt)

UTF-32* are defined in UAX #19, as Doug wrote, and are also IANA-registered charset names.

markus

Doug Ewell wrote:

Jungshik Shin <jshin at mailaps dot org> wrote:

(Note that "UTF-16 little-endian" is not technically the
same as "UTF-16LE"; the former implies the presence of a BOM while
the latter implies that none is present.)

Where does this distinction come from?

The sources I checked were UTR #17, "Character Encoding Model," and UAX
#19, "UTF-32."  The latter does not specifically talk about UTF-16BE or
UTF-16LE, but uses the same definitions to distinguish UTF-32, UTF-32BE,
and UTF-32LE that we are using here.
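For anyone who wants to see the distinction in bytes, here is a minimal sketch using Python's built-in codecs. It only shows how one implementation treats the labels "utf-16", "utf-16-le" and "utf-16-be" (and the UTF-32 counterparts); it is not a statement of what the RFC or the UAX requires. The helper names `encode_all` and `decode_le_with_bom` are made up for this illustration.

```python
import codecs


def encode_all(text):
    """Encode text under the plain and byte-order-specific labels."""
    return {
        "utf-16": text.encode("utf-16"),        # Python prepends a BOM here
        "utf-16-le": text.encode("utf-16-le"),  # no BOM, little-endian
        "utf-16-be": text.encode("utf-16-be"),  # no BOM, big-endian
        "utf-32": text.encode("utf-32"),        # Python prepends a BOM here
        "utf-32-le": text.encode("utf-32-le"),  # no BOM, little-endian
        "utf-32-be": text.encode("utf-32-be"),  # no BOM, big-endian
    }


def decode_le_with_bom(text):
    """Decode BOM-prefixed little-endian bytes under two different labels."""
    data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    return {
        # The plain "utf-16" decoder reads the BOM as a signature and drops it.
        "utf-16": data.decode("utf-16"),
        # The "utf-16-le" decoder does not look for a signature, so the
        # leading U+FEFF stays in the decoded text.
        "utf-16-le": data.decode("utf-16-le"),
    }


if __name__ == "__main__":
    for label, raw in encode_all("A").items():
        print(f"{label:10} {raw.hex(' ')}")
    for label, decoded in decode_le_with_bom("A").items():
        print(f"{label:10} {decoded!r}")
```

In this implementation the byte-order-specific labels never write a BOM and never strip one on input, while the plain labels write one on output and consume it on input.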


Mark Davis can probably point you to other sources as well.
