Re: newbie: unicode (when used as a coding) = UTF16LE?

starner Wed, 12 Feb 2003 22:38:11 -0800

>Also it seems to me when ContentType in a html page is "unicode", IE tends to 
>understand it as UTF16LE. So it seems UTF16LE is (or was) the standard coding for 
>unicode.


Just because IE does something doesn't mean it's the standard. The whole
world doesn't run IE. The legal charset names are listed here:
<http://www.iana.org/assignments/character-sets>; in practice, the vast
majority of those shouldn't be used. "unicode" is not a legal charset name.
UTF-16BE, UTF-16LE or UTF-16 (all as specified in RFC 2781
<ftp://ftp.rfc-editor.org/in-notes/rfc2781.txt>) are the acceptable
names for UTF-16 content; UTF-8 is also legal, and usable. Sadly, many
of the other names in the file are ill-defined and/or useless. Of the
other Unicode names, UTF-7, SCSU, BOCU-1 and UTF-32* are useful in
limited contexts; the rest you should pretend don't exist. (csUnicode
exists, but is big-endian UCS-2, and shouldn't be used.)
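
To make the difference between the three UTF-16 labels concrete, here is
a small Python sketch. It is only an illustration: the codec spellings
"utf-16-le", "utf-16-be" and "utf-16" are Python's own names for these
encodings, and the sample string is an arbitrary choice of mine.

```python
import codecs

text = "A\u4e2d"  # 'A' followed by the CJK character U+4E2D

le = text.encode("utf-16-le")  # little-endian, no byte order mark
be = text.encode("utf-16-be")  # big-endian, no byte order mark
bom = text.encode("utf-16")    # byte order mark, then native byte order

print(le.hex())  # 41002d4e
print(be.hex())  # 00414e2d

# Plain "utf-16" output starts with a BOM that tells the reader which order follows.
print(bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True

# Decoding only round-trips when the label matches the bytes.
print(le.decode("utf-16-le") == text)  # True
print(bom.decode("utf-16") == text)    # True
```

The same two characters come out as different byte sequences under each
label, which is why the label has to say which one you actually sent.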

>Is it that, when people say "unicode" without UTF, they mean UTF16LE?

If people just say "unicode", you can't assume any encoding form. If a
Unix guy says "unicode", he's probably thinking UTF-8 or UTF-32. If you
mean a specific encoding, name it; if someone else doesn't name one, ask.

>I am going to design a website with unicode. I don't use UTF-8 because most are CJK 
>text thus UTF-8 html would be too fat. I should use UTF16LE, should I? 

UTF-16LE - so labeled, hyphen and all - is a perfectly acceptable encoding,
as is UTF-16BE. It's probably irrelevant which of the two you use.
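
If page size is the deciding factor, it is easy to measure rather than
guess. The following Python sketch counts bytes for the same text in
UTF-8 and UTF-16LE; the helper name sizes() and the sample strings are
made up for illustration and are not taken from any real page.

```python
def sizes(s):
    # Byte counts for the same text in UTF-8 and in UTF-16LE (no BOM).
    return len(s.encode("utf-8")), len(s.encode("utf-16-le"))

cjk = "\u65e5\u672c\u8a9e" * 100  # 300 CJK characters from the BMP
markup = "<p></p>" * 100          # 700 ASCII characters of HTML tags

print(sizes(cjk))           # (900, 600)
print(sizes(markup))        # (700, 1400)
print(sizes(cjk + markup))  # (1600, 2000)
```

CJK characters in the BMP take three bytes in UTF-8 and two in UTF-16,
while ASCII markup takes one byte in UTF-8 and two in UTF-16, so the
outcome for a real page depends on its ratio of text to tags.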
