Zhang Weiwu <weiwuzhang at hotmail dot com> asked:

> Is it that, when people say "unicode" without UTF, they mean UTF16LE?

and Jungshik Shin <jshin at mailaps dot org> responded:

> No, UTF-16LE is just one of many Unicode transformation form(at)s.
> Each UTF has its own pros and cons and you have to choose
> whatever is appropriate for your own need.

but I'm not sure that answered the question Weiwu was really asking.

It is true that when Windows and other Microsoft products refer to "Unicode," without qualification, they usually mean UTF-16 little-endian. (Note that "UTF-16 little-endian" is not technically the same as "UTF-16LE"; the former implies the presence of a BOM while the latter implies that none is present.)

Despite this Microsoft convention, however, it is not true that "Unicode" automatically means UTF-16, of any type. This was once the case -- as late as TUS 3.0, we were told that "Plain Unicode text consists of sequences of 16-bit character codes" (p. 12) -- but it is no longer true. UTF-8 and UTF-32 are now on equal footing with UTF-16.

If you do include a BOM, I don't see any reason you can't send little-endian UTF-16 down the line. The "preference" of big-endian UTF-16 over little-endian has to do with the assumption to be made when no BOM is present. When there is a BOM, no assumptions are necessary; software should interpret text as BE or LE depending on the byte orientation of the BOM.

(BTW, I thought Weiwu's so-called "newbie question" was much better expressed and demonstrated better understanding of Unicode than many non-newbie questions I have seen on this list.)

-Doug Ewell
 Fullerton, California
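P.S. To make the BOM rule above concrete, here is a minimal sketch in Python. The function name decode_utf16 is hypothetical, chosen only for illustration. It assumes the input is a complete UTF-16 byte string, and it falls back to big-endian when no BOM is present, as described above.

```python
import codecs


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes (hypothetical helper, for illustration only).

    A leading BOM decides the byte order. Without a BOM, big-endian
    is assumed.
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return data[2:].decode("utf-16-be")
    # No BOM present: apply the big-endian assumption.
    return data.decode("utf-16-be")
```

With this sketch, decode_utf16(b"\xff\xfeA\x00") and decode_utf16(b"\xfe\xff\x00A") both return "A", and so does the BOM-less big-endian input b"\x00A".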

