>Also it seems to me when ContentType in a html page is "unicode", IE tends to
>understand it as UTF16LE. So it seems UTF16LE is (or was) the standard coding for
>unicode.
Just because IE does something doesn't mean it's the standard. The whole world doesn't run IE. The legal content types are listed here: <http://www.iana.org/assignments/character-sets>; in practice, the vast majority of those shouldn't be used. "unicode" is not a legal content type. UTF-16BE, UTF-16LE or UTF-16 (all as specified in RFC2781 <ftp://ftp.rfc-editor.org/in-notes/rfc2781.txt>) are the acceptable names for UTF-16 content; UTF-8 is also legal, and usable. (Sadly, many of the other names in the file are ill-defined and/or useless. Of the other Unicode names, UTF-7, SCSU, BOCU-1 and UTF-32* are useful in limited contexts; the rest you should pretend don't exist. (csUnicode exists, but is UCS2-BE, and shouldn't be used.))

>Is it that, when people say "unicode" without UTF, they mean UTF16LE?

If people just say "unicode", you can't assume any encoding form. If a Unix guy says "unicode", he's probably thinking UTF-8 or UTF-32. If you mean an encoding, name it; if someone else doesn't name one, ask.

>I am going to design a website with unicode. I don't use UTF-8 because most are CJK
>text thus UTF-8 html would be too fat. I should use UTF16LE, should I?

UTF-16LE - so labeled, hyphen and all - is a perfectly acceptable encoding, as would be UTF-16BE. It probably doesn't matter which of the two you use.
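To make the size and labeling points concrete, here is a rough Python sketch. The sample string is a made-up example, not text from the site in question, and the codec names are Python's own spellings rather than the IANA labels discussed above. It compares the byte counts of the encodings and shows that the plain "utf-16" codec prepends a byte order mark while the explicitly labeled LE/BE forms do not.

```python
import codecs

# Hypothetical sample: eight CJK characters, all in the Basic Multilingual Plane.
sample = "日本語のテキスト"

utf8 = sample.encode("utf-8")         # three bytes per character for this sample
utf16le = sample.encode("utf-16-le")  # two bytes per character, no BOM
utf16be = sample.encode("utf-16-be")  # two bytes per character, no BOM
utf16 = sample.encode("utf-16")       # BOM first, then native byte order

for label, data in [("UTF-8", utf8), ("UTF-16LE", utf16le),
                    ("UTF-16BE", utf16be), ("UTF-16", utf16)]:
    print(f"{label:9} {len(data):3} bytes")

# Only the unlabeled-byte-order form starts with a byte order mark.
has_bom = utf16[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
print("UTF-16 output starts with a BOM:", has_bom)
```

These figures apply to this sample only. A real page also contains markup, which is mostly ASCII and takes one byte per character in UTF-8 but two in UTF-16, so it may be worth measuring an actual page before deciding.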

