On Fri, 19 Apr 2002, Tom Gewecke wrote:

> > With BOM at the beginning, Netscape 4.x, Netscape 6.x/Mozilla and MS
> > IE 5.x/6.x can handle them without much problem except that support
> > for characters above BMP varies from browser to browser as Tex tried to
> > demonstrate in his test pages.

The autodetection of UTF-16(LE|BE) and UTF-32(LE|BE) doesn't seem to be
as robust as I thought. I should have experimented more extensively. I
just tried UTF-16LE and UTF-16BE pages with a BOM at the beginning
(http://jshin.net/i18n/utftest/bom_utf16le.html and
http://jshin.net/i18n/utftest/bom_utf16be.html) with Netscape 6/Mozilla,
MS IE 6 and Netscape 4.x. They appeared to work fine at first, but when I
reloaded a page or went back and forth, the autodetection sometimes
failed.

> Thanks for the info! Do you know of any other utf-16 pages on the web for
> testing? I did a lot of searching and could not find any (except a case
> where the links were broken).

As you wrote and I agreed, it's not a good idea to put up a web page in
UTF-16/UTF-32, and that's why you can't find any.

> I'm using Mac OS X and it can read utf-16 ok normally, but not Texin's,
> perhaps because of the "endianness." I believe his page is LE and utf-16
> html should be BE. But my understanding of that issue is VERY limited...

Well, recently Mark et al. went to great lengths on the issue...

Anyway, I just put up a set of test pages. There are 20 combinations:

  - BOM or BOMless
  - Big endian or little endian
  - UTF-16 or UTF-32
  - Whether the HTTP server emits a Content-Type header with a MIME
    charset parameter, as below:

        Content-Type: text/html; charset=UTF-32LE

    and if so, whether or not the MIME charset name has 'BE|LE' at the
    end.

My test pages don't yet have characters beyond the BMP (I just recycled a
page I made a long time ago for Korean testing). I may add them later.
(Tex, can I use your sample page? I'd rather put up a page with some
content instead of just a list of characters.)

Jungshik Shin
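
For illustration only, here is a minimal Python sketch of how the
BOM/endianness/encoding variants listed above could be produced, and how
a BOM can be sniffed from the first bytes of a page. The names
encode_page and sniff_bom are made up for this example; this is not how
the test pages themselves were built, nor how any browser actually does
its detection.

```python
import codecs

# Byte order marks for the four encodings discussed above.
BOMS = {
    ("UTF-16", "LE"): codecs.BOM_UTF16_LE,  # FF FE
    ("UTF-16", "BE"): codecs.BOM_UTF16_BE,  # FE FF
    ("UTF-32", "LE"): codecs.BOM_UTF32_LE,  # FF FE 00 00
    ("UTF-32", "BE"): codecs.BOM_UTF32_BE,  # 00 00 FE FF
}


def encode_page(html, family, endian, with_bom):
    """Encode html as e.g. UTF-16LE, optionally prefixed with a BOM.

    Hypothetical helper. The explicit "-le"/"-be" codecs never write a
    BOM on their own, so it is prepended by hand when requested.
    """
    codec = f"{family.lower()}-{endian.lower()}"  # e.g. "utf-16-le"
    body = html.encode(codec)
    return BOMS[(family, endian)] + body if with_bom else body


def sniff_bom(data):
    """Return (family, endian) if data starts with a known BOM, else None.

    UTF-32 is checked first because the UTF-32LE BOM begins with the
    same two bytes as the UTF-16LE BOM. (A UTF-16LE page whose first
    character is U+0000 would be misread; HTML does not start that way.)
    """
    order = (("UTF-32", "LE"), ("UTF-32", "BE"),
             ("UTF-16", "LE"), ("UTF-16", "BE"))
    for key in order:
        if data.startswith(BOMS[key]):
            return key
    return None
```

For example, sniff_bom(encode_page("<html>", "UTF-32", "BE", True))
gives ("UTF-32", "BE"), while any BOMless page gives None, which is the
case where a browser has to fall back on the Content-Type charset or on
guessing.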

