On Fri, 30 Dec 2011 05:51:16 +0100, Leif Halvard Silli <xn--mlform-iua@målform.no> wrote:
> The Trident cache behaviour is a symptom of its overall UTF-16
> behaviour: Apart from reading the BOM, it doesn't do any UTF-16
> sniffing. I suspect that you want Opera/Firefox to become "as bad" at
> 'getting' the UTF-16 encoding as WebKit/IE are? (Note that WebKit is
> worse than IE - just to, once again, emphasize how difficult it is to
> replicate IE.)

How is WebKit worse than IE? And why should there be UTF-16 sniffing?
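To make the distinction concrete: "reading the BOM" only inspects the
first octets of the stream, while "UTF-16 sniffing" would also guess at
BOM-less UTF-16 from byte patterns. A rough sketch in Python - the
function names and the NUL-ratio heuristic here are my own illustration,
not any browser's actual algorithm:

    def detect_bom(data: bytes) -> str | None:
        """BOM-only detection, roughly what Trident is described as doing."""
        if data.startswith(b"\xef\xbb\xbf"):
            return "utf-8"
        if data.startswith(b"\xff\xfe"):
            return "utf-16le"
        if data.startswith(b"\xfe\xff"):
            return "utf-16be"
        return None  # no BOM: fall through to other detection

    def sniff_utf16(data: bytes) -> str | None:
        """Content sniffing: BOM-less ASCII-heavy UTF-16 shows alternating NULs."""
        sample = data[:512]
        if len(sample) < 2:
            return None
        even_nuls = sample[0::2].count(0)
        odd_nuls = sample[1::2].count(0)
        half = len(sample) / 2
        if odd_nuls > 0.4 * half and even_nuls == 0:
            return "utf-16le"  # ASCII in UTF-16LE: NULs land at odd offsets
        if even_nuls > 0.4 * half and odd_nuls == 0:
            return "utf-16be"
        return None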


> But is the little endian defaulting really important?
> Overall, proper UTF-16 treatment (read: sniffing) on IE/WebKit's part
> would probably improve the situation more.

You mean there are sites that only work in Gecko/Presto?


> I know ... And it is precisely therefore that it would have been an
> advantage for the Web to focus on *requiring* the BOM for UTF-16.

It seems simpler to focus on promoting only UTF-8.


>> Yeah, I'm going to file a new bug so we can reconsider. Although the
>> octet sequence the various BOMs represent can have legitimate meanings
>> in certain encodings,

> You mean: In addition to the BOM meaning, I suppose.

No. In e.g. windows-1258 there is no BOM and FF FE simply means U+00FF U+20AB.
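To see that concretely (a minimal Python demonstration; "cp1258" is
Python's codec name for windows-1258):

    data = b"\xff\xfe"

    # Read as windows-1258 (Vietnamese), these are two ordinary characters:
    print(data.decode("cp1258"))        # 'ÿ₫' -> U+00FF U+20AB

    # Read as UTF-16, the very same octets are a byte order mark:
    print(repr(data.decode("utf-16")))  # '' (the BOM is consumed)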


>> it seems in practice people use them for Unicode.
>> (Helped by the fact that Trident/WebKit behave this way of course.)

> Don't forget the fact that Presto/Gecko do not move the BOM into the
> <body> when you use UTF-16LE/BE, like they - per the spec of those
> encodings - should do. See:
> <http://bugzilla.validator.nu/show_bug.cgi?id=890>

Well yes, that's why I'm planning to define utf-16 more in line with
implementations (and render the current text obsolete, I suppose).
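For reference, the spec point being cited is that with the
endian-specific labels a leading FF FE / FE FF is not a byte order mark
but an ordinary U+FEFF (ZERO WIDTH NO-BREAK SPACE) that belongs to the
content - which is why it should end up in the parsed document rather
than be stripped. Python's codecs happen to mirror that distinction, so
a small sketch:

    payload = "\ufeffHi"                  # content starting with U+FEFF
    octets = payload.encode("utf-16-le")  # b'\xff\xfeH\x00i\x00'

    # Endian-specific label: FF FE is an ordinary character and survives.
    print(repr(octets.decode("utf-16-le")))  # '\ufeffHi'

    # Generic label: the same octets are consumed as the BOM.
    print(repr(octets.decode("utf-16")))     # 'Hi'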


--
Anne van Kesteren
http://annevankesteren.nl/
