On Fri, 30 Dec 2011 05:51:16 +0100, Leif Halvard Silli <xn--mlform-iua@målform.no> wrote:
> The Trident cache behaviour is a symptom of its overall UTF-16
> behaviour: Apart from reading the BOM, it doesn't do any UTF-16
> sniffing. I suspect that you want Opera/Firefox to become "as bad" at
> 'getting' the UTF-16 encoding as WebKit/IE are? (Note that WebKit is
> worse than IE - just to, once again, emphasize how difficult it is to
> replicate IE.)

How is WebKit worse than IE? And why should there be UTF-16 sniffing?
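To make the distinction concrete: "reading the BOM" only inspects the
first octets of the stream, while "UTF-16 sniffing" would also guess at
BOM-less UTF-16 from byte patterns. A rough sketch in Python - the
function names and the NUL-ratio heuristic here are my own illustration,
not any browser's actual algorithm:

    def detect_bom(data: bytes) -> str | None:
        """BOM-only detection, roughly what Trident is described as doing."""
        if data.startswith(b"\xef\xbb\xbf"):
            return "utf-8"
        if data.startswith(b"\xff\xfe"):
            return "utf-16le"
        if data.startswith(b"\xfe\xff"):
            return "utf-16be"
        return None  # no BOM: fall through to other detection

    def sniff_utf16(data: bytes) -> str | None:
        """Content sniffing: BOM-less ASCII-heavy UTF-16 shows alternating NULs."""
        sample = data[:512]
        if len(sample) < 2:
            return None
        even_nuls = sample[0::2].count(0)
        odd_nuls = sample[1::2].count(0)
        half = len(sample) / 2
        if odd_nuls > 0.4 * half and even_nuls == 0:
            return "utf-16le"  # ASCII in UTF-16LE: NULs land at odd offsets
        if even_nuls > 0.4 * half and odd_nuls == 0:
            return "utf-16be"
        return None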


> But is the little endian defaulting really important?
> Overall, proper UTF-16 treatment (read: sniffing) on IE/WebKit's part
> would probably improve the situation more.

You mean there are sites that only work in Gecko/Presto?


> I know ... And it is precisely therefore that it would have been an
> advantage for the Web to focus on *requiring* the BOM for UTF-16.

It seems simpler to focus on promoting only UTF-8.


>> Yeah, I'm going to file a new bug so we can reconsider. Although the
>> octet sequence the various BOMs represent can have legitimate meanings
>> in certain encodings,

> You mean: In addition to the BOM meaning, I suppose.

No. In e.g. windows-1258 there is no BOM and FF FE simply means U+00FF U+20AB.
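To see that concretely (a minimal Python demonstration; "cp1258" is
Python's codec name for windows-1258):

    data = b"\xff\xfe"

    # Read as windows-1258 (Vietnamese), these are two ordinary characters:
    print(data.decode("cp1258"))        # 'ÿ₫' -> U+00FF U+20AB

    # Read as UTF-16, the very same octets are a byte order mark:
    print(repr(data.decode("utf-16")))  # '' (the BOM is consumed)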


>> it seems in practice people use them for Unicode.
>> (Helped by the fact that Trident/WebKit behave this way of course.)

> Don't forget the fact that Presto/Gecko do not move the BOM into the
> <body> when you use UTF-16LE/BE, like they - per the spec of those
> encodings - should do. See:
> <http://bugzilla.validator.nu/show_bug.cgi?id=890>

Well yes, that's why I'm planning to define utf-16 more in line with
implementations (and render the current text obsolete, I suppose).
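For reference, the spec point being cited is that with the
endian-specific labels a leading FF FE / FE FF is not a byte order mark
but an ordinary U+FEFF (ZERO WIDTH NO-BREAK SPACE) that belongs to the
content - which is why it should end up in the parsed document rather
than be stripped. Python's codecs happen to mirror that distinction, so
a small sketch:

    payload = "\ufeffHi"                  # content starting with U+FEFF
    octets = payload.encode("utf-16-le")  # b'\xff\xfeH\x00i\x00'

    # Endian-specific label: FF FE is an ordinary character and survives.
    print(repr(octets.decode("utf-16-le")))  # '\ufeffHi'

    # Generic label: the same octets are consumed as the BOM.
    print(repr(octets.decode("utf-16")))     # 'Hi'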


--
Anne van Kesteren
http://annevankesteren.nl/
