Re: [whatwg] [encoding] utf-16

Anne van Kesteren Wed, 28 Dec 2011 07:14:09 -0800

On Wed, 28 Dec 2011 12:30:49 +0100, Leif Halvard Silli <xn--mlform-iua@målform.no> wrote:

I spotted a shortcoming in your testing:

I ran some utf-16 tests using 007A as input data, optionally preceded by
FFFE or FEFF, and with utf-16, utf-16le, and utf-16be declared in the
Content-Type header. For WebKit I tested both Safari 5.1.2 and Chrome
17.0.963.12. Trident is Internet Explorer 9 on Windows 7. Presto is Opera
11.60. Gecko is Nightly 12.0a1 (2011-12-26).


HTTP      BOM   Trident  WebKit  Gecko  Presto
utf-16    -     7A00     7A00    007A   007A
utf-16le  -     7A00     7A00    7A00   7A00
utf-16be  -     007A     007A    007A   007A


The above test row is not complete. You should also run a BOM-less test
using the UTF-16 label but where the 007A is represented in the
big-endian way, a bit like I did here:
<http://malform.no/testing/utf/#html-table-7>. Then you get as a result
that Opera and Firefox do not take it for granted that files sent as
'utf-16' are big-endian:

  utf-16    -     gibb*    gibb*   007A   007A

*gibb = gibberish/mojibake.

I get U+7A00 as I indicated above. Personally, I would not qualify that as gibberish. (My table is somewhat confusing, as the input 007A was meant to describe octets, but the table describes code points.)
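For reference, a minimal Python sketch (standard library codecs only) of how the two octets 00 7A turn into the code points in the table, depending on which byte order the decoder assumes:

```python
# The test input: the two octets 0x00 0x7A, with no BOM.
data = bytes([0x00, 0x7A])

# Read as little-endian, the 16-bit code unit is 0x7A00.
le = data.decode("utf-16-le")
# Read as big-endian, the 16-bit code unit is 0x007A ('z').
be = data.decode("utf-16-be")

print(f"little-endian -> U+{ord(le):04X}")  # U+7A00
print(f"big-endian    -> U+{ord(be):04X}")  # U+007A
```

U+7A00 is a valid CJK ideograph rather than a decoding error, which is why it is not necessarily "gibberish" in the technical sense; it is simply the little-endian reading of the same octets.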

Anyway, per <http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-July/021102.html>, Presto and Gecko do have some magic, but it seems better if they were the same as Trident (and WebKit).
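A minimal sketch of the Trident/WebKit-style behaviour for BOM-less input, as shown in the table above. It assumes (this is not taken from any browser's source) that a leading BOM selects the byte order and is stripped whatever the label says, and that without a BOM both 'utf-16' and 'utf-16le' mean little-endian. The function name decode_utf16_label is hypothetical:

```python
def decode_utf16_label(data: bytes, label: str) -> str:
    """Hypothetical sketch: decode bytes served with a UTF-16 label.

    Assumptions (not any browser's actual code):
    - a leading BOM picks the byte order and is removed, regardless of label;
    - without a BOM, 'utf-16be' is big-endian, while 'utf-16' and
      'utf-16le' are little-endian (the Trident/WebKit result).
    """
    # BOM U+FEFF serialized little-endian is FF FE, big-endian is FE FF.
    if data[:2] == b"\xff\xfe":
        return data[2:].decode("utf-16-le")
    if data[:2] == b"\xfe\xff":
        return data[2:].decode("utf-16-be")

    label = label.lower()
    if label not in ("utf-16", "utf-16le", "utf-16be"):
        raise ValueError(f"not a UTF-16 label: {label!r}")

    # No BOM: only an explicit 'utf-16be' label selects big-endian.
    codec = "utf-16-be" if label == "utf-16be" else "utf-16-le"
    return data.decode(codec)
```

With the test octets 00 7A and the 'utf-16' label this returns U+7A00, matching the Trident and WebKit column; prefixing FE FF makes it return U+007A for any of the three labels.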

That the BOM is removed from the output for utf-16be labelled files
means that the 'utf-16be' labelled file is nevertheless treated as
UTF-16 (per UTF-16's specification). (Otherwise, if it had not been
removed, the BOM character would have caused quirks mode.)

Taking what you did not test for into account, it would make sense if
'utf-16' continues to be treated as a label under which both big-endian
and little-endian can be expected, and thus that WebKit and IE start to
detect when UTF-16 is big-endian, even when there is no BOM.


I am not sure what you are trying to say here.


--
Anne van Kesteren
http://annevankesteren.nl/