[ProgressEvents] How to deal with compressed transfer encodings
Hi All,

How should ProgressEvents deal with compressed transfer encodings? The problem is that the Content-Length header (if I understand things correctly) contains the encoded number of bytes, so we don't have access to the total number of bytes which will be exposed to the user until it's all downloaded.

I can see several solutions:

A) Set total to 0, and loaded to the number of decompressed bytes downloaded so far
B) Set total to the contents of the Content-Length header, and loaded to the number of compressed bytes downloaded so far
C) Like A, but also expose a percentage downloaded which is based on the compressed data

B seems spec-wise the simplest, but at least gecko doesn't expose the compressed number of bytes downloaded; I'm not sure about other HTTP libraries. It also has the downside that .loaded doesn't match .responseText.length.

C seems the most confusing for authors and the one I like the least.

/ Jonas
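For readers following along, here is a rough sketch of how options A and B would report progress, written in Python rather than browser code. It assumes a gzip-compressed body that arrives in fixed-size pieces; `simulate_progress` and its parameters are hypothetical names used only for illustration and are not taken from the draft or from any HTTP library.

```python
import gzip
import zlib


def simulate_progress(body, chunk_size):
    """Yield (option_a, option_b) after each simulated network read.

    Each option is a (loaded, total) pair:
      option A: loaded counts decompressed bytes, total is 0 (unknown)
      option B: loaded counts compressed bytes, total is Content-Length
    """
    compressed = gzip.compress(body)
    content_length = len(compressed)  # what Content-Length would carry
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    received = 0
    decoded = 0
    for start in range(0, content_length, chunk_size):
        piece = compressed[start:start + chunk_size]
        received += len(piece)
        decoded += len(decoder.decompress(piece))
        yield (decoded, 0), (received, content_length)


if __name__ == "__main__":
    for option_a, option_b in simulate_progress(b"hello world " * 1000, 16):
        print("A:", option_a, "B:", option_b)
```

Under option B a percentage can be computed at any time as loaded / total, while under option A the final size only becomes known once the last piece has been decoded.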
Re: [ProgressEvents] How to deal with compressed transfer encodings
On Tue, 23 Nov 2010 22:41:00 +0100, Jonas Sicking jo...@sicking.cc wrote:
> How should ProgressEvents deal with compressed transfer encodings? The problem is that the Content-Length header (if I understand things correctly) contains the encoded number of bytes, so we don't have access to the total number of bytes which will be exposed to the user until it's all downloaded.
>
> I can see several solutions:
>
> A) Set total to 0, and loaded to the number of decompressed bytes downloaded so far
> B) Set total to the contents of the Content-Length header and loaded the number of compressed bytes downloaded so far
> C) Like A, but also expose a percentage downloaded which is based on the compressed data
>
> B seems spec-wise the simplest, but at least gecko doesn't expose the compressed number of bytes downloaded, not sure about other HTTP libraries. It also has the downside that .loaded doesn't match .responseText.length

When compression does not come into play they will only match for certain encoding / byte streams anyway. E.g. for a UTF-8 encoded character stream with characters that take up more than one byte they will not match. I think it should be B.

> C seems the most confusing for authors and the one I like the least.

-- 
Anne van Kesteren
http://annevankesteren.nl/
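To illustrate the UTF-8 point above, a small Python sketch that compares the number of bytes on the wire with the number of UTF-16 code units a script would see as a string length. The helper name `byte_and_code_unit_lengths` and the sample string are made up for this example.

```python
def byte_and_code_unit_lengths(text):
    """Return (utf8_bytes, utf16_code_units) for the given string."""
    utf8_bytes = len(text.encode("utf-8"))
    # Each UTF-16 code unit is two bytes; "utf-16-le" adds no byte order mark.
    utf16_code_units = len(text.encode("utf-16-le")) // 2
    return utf8_bytes, utf16_code_units


if __name__ == "__main__":
    # Sample text with two 2-byte characters and one 3-byte character.
    print(byte_and_code_unit_lengths("naïve café €5"))
    # Pure ASCII is the case where both counts agree.
    print(byte_and_code_unit_lengths("plain ascii"))
```

The two counts agree only when every character fits in one UTF-8 byte, so a byte-based .loaded and a string length can differ even with no compression involved.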
Re: [ProgressEvents] How to deal with compressed transfer encodings
* Anne van Kesteren wrote:
> On Tue, 23 Nov 2010 22:41:00 +0100, Jonas Sicking jo...@sicking.cc wrote:
>> A) Set total to 0, and loaded to the number of decompressed bytes downloaded so far
>> B) Set total to the contents of the Content-Length header and loaded the number of compressed bytes downloaded so far
>> C) Like A, but also expose a percentage downloaded which is based on the compressed data
>
> When compression does not come into play they will only match for certain encoding / byte streams anyway. E.g. for a UTF-8 encoded character stream with characters that take up more than one byte they will not match. I think it should be B.

That is what the draft already requires, if by compressed Jonas means you remove all transfer encodings but retain the content encodings, and you set .total to zero if the total length is not specified. (There are even more layers of compression to consider if you don't speak plain HTTP but, say, HTTP over TLS, since TLS has its own compression layer; that would be removed as well under the current draft.)

-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Re: [ProgressEvents] How to deal with compressed transfer encodings
* Jonas Sicking wrote:
> How should ProgressEvents deal with compressed transfer encodings? The problem is that the Content-Length header (if I understand things correctly) contains the encoded number of bytes, so we don't have access to the total number of bytes which will be exposed to the user until it's all downloaded. I can see several solutions:

Well, you have some information, you encode that using a media type, then you possibly encode that using a content encoding, and then you possibly encode that using a transfer encoding. HTTP uses transfer encodings for both message framing (chunked) and transformations; they are a property of the transfer, while content encodings are part of the content.

I would suggest asking this question in terms of what .loaded should be when the download has finished. Should that be how much data has been received after the header, or how much data has been received except for framing information, or what the content developer thinks the size is, or how many bytes you will ultimately feed to, say, the HTML parser? That would be, respectively, the length of the message body, the length of the message body after removing the chunked transfer encoding, the length of the entity body, and the length of the entity body after removing content encodings.

Note that you can apply compression as both content encoding and as transfer encoding, although the latter is only supported by good HTTP implementations, like Opera's, but hey, https://bugzilla.mozilla.org/show_bug.cgi?id=68517 isn't ten years old yet.

I note that the draft actually defines this already, and I am pretty sure we discussed this already back in the day.

> B seems spec-wise the simplest, but at least gecko doesn't expose the compressed number of bytes downloaded, not sure about other HTTP libraries.
> It also has the downside that .loaded doesn't match .responseText.length

Well, to get to the length of the content in terms of UTF-16 code units you have to remove transfer encodings, content encodings, and transcode from whatever character encoding the content is in to said UTF-16 code units; that's yet another layer and not a useful one in most cases here.

-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
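The four candidate lengths described in the message above can be made concrete with a small Python sketch. It assumes a deliberately contrived response that uses gzip both as a content encoding and as a transfer encoding, followed by chunked framing, so that all four numbers differ. The helpers `chunk_encode`, `chunk_decode` and `layer_lengths` are hypothetical and handle only the simplest form of chunked bodies (no chunk extensions, no trailers).

```python
import gzip


def chunk_encode(data, size):
    """Frame data with a minimal chunked transfer coding."""
    out = b""
    for start in range(0, len(data), size):
        piece = data[start:start + size]
        out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    return out + b"0\r\n\r\n"  # zero-length chunk ends the body


def chunk_decode(body):
    """Remove the framing added by chunk_encode."""
    out = b""
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol], 16)
        if size == 0:
            return out
        start = eol + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the CRLF after the chunk data


def layer_lengths(content, chunk_size):
    """Return the four lengths, outermost layer first."""
    entity_body = gzip.compress(content)      # Content-Encoding: gzip
    te_gzipped = gzip.compress(entity_body)   # Transfer-Encoding: gzip (assumed)
    message_body = chunk_encode(te_gzipped, chunk_size)  # then chunked
    return {
        "message_body": len(message_body),
        "dechunked": len(chunk_decode(message_body)),
        "entity_body": len(entity_body),
        "decoded_content": len(content),
    }


if __name__ == "__main__":
    for name, length in layer_lengths(b"<p>hello</p>" * 500, 64).items():
        print(name, length)
```

When chunked is the only transfer coding in use, the second and third numbers coincide, which is the common case in practice.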
Re: [ProgressEvents] How to deal with compressed transfer encodings
On 11/23/10 9:31 PM, Bjoern Hoehrmann wrote:
> That is what the draft already requires, if by compressed Jonas means you remove all transfer encodings but retain the content encodings

This is actually ambiguous, since the near-total lack of server and UA support for Transfer-Encoding: gzip means that Content-Encoding: gzip is used to mean both transfer and content encoding (well, sometimes it also just means "my server is misconfigured Apache", but I assume UAs already deal with this, by and large).

-Boris