On Thu, 13 Feb 2003, Zhang Weiwu wrote:

> Very newbie question:
> 1) I noticed when I save a file as "unicode" in Windows 2000, or
> other editor like EditPlus, the file begins with FF FE, which looks
> like UTF16LE. Also it seems to me when ContentType in a html page is
> "unicode", IE tends to understand it as UTF16LE. So it seems UTF16LE is
> (or was) the standard coding for unicode.

What Windows or IE does makes nothing more standard-compliant than it
actually is. For Windows and MS IE running on Intel x86 machines, it may
be pretty natural to use UTF-16LE, but that does not hold for other
architecture/OS combinations.

> 2) But on the FAQ on unicode.org, it says UTF16BE is the prefered
> unicode coding.
>
> Is it that, when people say "unicode" without UTF, they mean UTF16LE?

No, UTF-16LE is just one of many Unicode transformation form(at)s.
Each UTF has its own pros and cons, and you have to choose whichever is
appropriate for your own needs.

> I am going to design a website with unicode. I don't use UTF-8 because
> most are CJK text thus UTF-8 html would be too fat. I should use UTF16LE,
> should I?

Whichever UTF you decide to use, the only thing you have to take care of
is to label it in a standard-compliant way. If you want to use UTF-16LE,
you should make sure that your web server emits the correct HTTP
Content-Type header, as follows (note that a meta tag at the beginning
of an HTML file doesn't work well for UTF-16/UTF-32):

    Content-Type: text/html; charset=UTF-16LE

On top of that, you may wish to put a BOM at the very beginning of your
UTF-16LE HTML files, although that's not necessary with the correct
Content-Type HTTP header as above.

BTW, you MUST NOT use 'charset=unicode' assuming that it'll be
interpreted as 'utf-16le'. See http://www.i18nguy.com/unicode and
http://jshin.net/i18n/utftest

Jungshik
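
P.S. To make the byte-level points above concrete, here is a minimal
Python sketch. It is only an illustration, not part of any server setup:
the helper names (encode_utf16le_with_bom, content_type_header) and the
four-character sample string are assumptions made up for this example.
It shows the FF FE mark in front of UTF-16LE data, the header value
discussed above, and the UTF-8 vs. UTF-16LE size of a short run of CJK
text.

```python
import codecs


def encode_utf16le_with_bom(text: str) -> bytes:
    """Encode text as UTF-16LE and prepend the BOM bytes FF FE."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def content_type_header(charset: str = "UTF-16LE") -> str:
    """Build the HTTP header line that labels the page's encoding."""
    return f"Content-Type: text/html; charset={charset}"


# Hypothetical sample: four CJK characters, written as escapes so the
# source file itself stays ASCII.
sample = "\u4e2d\u6587\u7f51\u9875"

payload = encode_utf16le_with_bom(sample)
utf8_size = len(sample.encode("utf-8"))       # 3 bytes per character here
utf16_size = len(sample.encode("utf-16-le"))  # 2 bytes per character here

print(payload[:2].hex(" "))   # the BOM bytes
print(content_type_header())
print(utf8_size, utf16_size)
```

Only the CJK text itself is measured here; the ASCII markup of a real
HTML page is not included in the comparison.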

