Hi Boris,

thanks for the feedback! Comments inline.

Boris Zbarsky wrote:
...
> More precisely, what Gecko does here is to take the raw byte string and byte-inflate it (by setting the high byte of each 16-bit code unit to 0 and the low byte to the corresponding byte of the given byte string) before returning it to JS.

> This happens to more or less match "decoding as ISO-8859-1", but not quite.
...

Not quite?
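For concreteness, here is my understanding of that inflation step, sketched in JavaScript (`byteInflate` is a made-up name, not Gecko's actual code):

```javascript
// Sketch: each octet of the raw header value becomes one 16-bit code
// unit whose high byte is zero (hypothetical helper, not Gecko code).
function byteInflate(bytes) {
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b); // 0x00-0xFF maps to U+0000-U+00FF
  }
  return out;
}
```

For octets 0x00-0xFF this is exactly "map each byte to the code point with the same value", which is why it more or less coincides with decoding as ISO-8859-1, modulo whatever corner cases you have in mind.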

...
>> From HTTP's point of view, the header field value really is opaque. So
>> you can put there anything, as long as it fits into the header field ABNF.

> True; what does that mean for converting header values to 16-bit code units in practice? Seems like byte-inflation might be the only reasonable thing to do...
...

>> It at least preserves all the information that was there and would allow a caller to re-decode as UTF-8 as a separate step.

> Of course that only helps if senders and receivers agree on the
> encoding.

>> True, but "encoding" here needs to mean more than just "encoding of Unicode", since one can just stick random byte arrays, within the ABNF restrictions, in the header, right?

> Yes.
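Sketching the "re-decode as UTF-8 as a separate step" mentioned above (hypothetical helper, using the TextDecoder API available in current browsers and Node):

```javascript
// Sketch: recover the original octets from a byte-inflated string
// (each code unit's low byte is the raw byte), then decode as UTF-8.
function reDecodeAsUtf8(inflated) {
  const bytes = Uint8Array.from(inflated, c => c.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}
```

Note that TextDecoder only throws on invalid sequences when constructed with {fatal: true}; by default invalid bytes become U+FFFD, so a caller could pick either behavior.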

Right now there is no interoperable encoding, so the best thing to do in APIs that use character sequences instead of octets is to preserve as much information as possible.
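A quick sketch of why inflation preserves information while an eager UTF-8 decode does not (made-up helper names, not any shipping implementation):

```javascript
// Byte inflation is invertible: octets -> code units -> octets.
function inflate(bytes) {
  return Array.from(bytes, b => String.fromCharCode(b)).join("");
}
function deflate(str) {
  return Uint8Array.from(str, c => c.charCodeAt(0));
}

const raw = Uint8Array.of(0xFF, 0x41, 0x00, 0x80); // not valid UTF-8
const roundTripped = deflate(inflate(raw));
// every octet survives the inflate/deflate round trip
const lossless = raw.every((b, i) => b === roundTripped[i]);

// decoding the same octets as UTF-8 replaces the invalid sequences
// with U+FFFD, destroying the original byte values
const utf8View = new TextDecoder("utf-8").decode(raw);
```

So an API that hands out inflated strings lets callers recover the exact octets later, whereas a lossy decode cannot be undone.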

It would be nice if we could find out whether anybody relies on the current implementation. Maybe switch it back to byte inflation in Mozilla trunk?

Best regards, Julian

