Hi Boris,
Thanks for the feedback! Comments inline.
Boris Zbarsky wrote:
...
More precisely, what Gecko does here is to take the raw byte string and
byte-inflate it (by setting the high byte of each 16-bit code unit to 0
and the low byte to the corresponding byte of the given byte string)
before returning it to JS.
This happens to more or less match "decoding as ISO-8859-1", but not quite.
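As a minimal sketch of what "byte-inflation" means here (function name and details are illustrative, not Gecko's actual internals): each raw byte becomes one 16-bit code unit whose high byte is zero.

```javascript
// Byte-inflation sketch: each octet 0x00-0xFF maps unchanged to the
// code unit U+0000-U+00FF (high byte 0, low byte = the octet).
function byteInflate(bytes) {
  let result = "";
  for (const b of bytes) {
    result += String.fromCharCode(b); // b is 0..255, so no surrogates
  }
  return result;
}

// 0xE9 inflates to U+00E9 ("é"):
console.log(byteInflate([0x48, 0x69, 0xE9])); // "Hié"
```

This is lossless on the byte level, which is why it "more or less" coincides with decoding as ISO-8859-1.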
...
Not quite?
...
From HTTP's point of view, the header field value really is opaque, so
you can put anything there, as long as it fits into the header field
ABNF.
True; what does that mean for converting header values to 16-bit code
units in practice? Seems like byte-inflation might be the only
reasonable thing to do...
...
It at least preserves all the information that was there and would allow
a caller to re-decode as UTF-8 as a separate step.
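A sketch of that separate re-decoding step (names are illustrative): since every code unit of a byte-inflated string is at most 0xFF, the original octets can be recovered exactly and then decoded as UTF-8.

```javascript
// Recover the original octets from a byte-inflated string, then
// decode them as UTF-8 in a separate step (illustrative only).
function reDecodeAsUtf8(inflated) {
  // Each char is U+0000-U+00FF, so charCodeAt gives back the raw byte.
  const bytes = Uint8Array.from(inflated, ch => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

// "caf\u00C3\u00A9" is the byte-inflated form of UTF-8 "café"
// (0xC3 0xA9 is the UTF-8 encoding of U+00E9):
console.log(reDecodeAsUtf8("caf\u00C3\u00A9")); // "café"
```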
Of course that only helps if senders and receivers agree on the
encoding.
True, but "encoding" here needs to mean more than just "encoding of
Unicode", since one can stick arbitrary byte sequences, within the ABNF
restrictions, into the header, right?
Yes.
Right now there is no interoperable encoding, so the best thing to do in
APIs that use character sequences instead of octets is to preserve as
much information as possible.
It would be nice if we could find out whether anybody relies on the
current implementation. Maybe switch it back to byte inflation in
Mozilla trunk?
Best regards,
Julian