[ProgressEvents] How to deal with compressed transfer encodings

2010-11-23 Thread Jonas Sicking
Hi All,

How should ProgressEvents deal with compressed transfer encodings? The
problem is that the Content-Length header (if I understand things
correctly) contains the encoded number of bytes, so we don't have
access to the total number of bytes which will be exposed to the user
until it's all downloaded. I can see several solutions:

A) Set total to 0, and loaded to the number of decompressed bytes
downloaded so far
B) Set total to the contents of the Content-Length header and
loaded the number of compressed bytes downloaded so far
C) Like A, but also expose a percentage downloaded which is based on
the compressed data

B seems spec-wise the simplest, but at least gecko doesn't expose the
compressed number of bytes downloaded, not sure about other HTTP
libraries. It also has the downside that .loaded doesn't match
.responseText.length

C seems the most confusing for authors and the one I like the least.
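
As a rough illustration of how options A and B differ, the sketch below simulates a gzip-compressed response arriving in pieces. The function name progress_reports and the chunk size are hypothetical, and gzip stands in for whatever compression is in use; this is not how any particular browser implements progress events.

```python
import gzip
import zlib


def progress_reports(body: bytes, chunk_size: int = 64):
    """Yield ((loaded, total) for option A, (loaded, total) for option B)
    after each received piece of a gzip-compressed response."""
    compressed = gzip.compress(body)
    content_length = len(compressed)  # what Content-Length would carry
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    received = 0  # compressed bytes seen so far (option B's loaded)
    decoded = 0   # decompressed bytes produced so far (option A's loaded)
    for start in range(0, content_length, chunk_size):
        piece = compressed[start:start + chunk_size]
        received += len(piece)
        decoded += len(decompressor.decompress(piece))
        option_a = (decoded, 0)                 # total unknown, reported as 0
        option_b = (received, content_length)   # both in compressed bytes
        yield option_a, option_b


if __name__ == "__main__":
    for a, b in progress_reports(b"hello world " * 200, chunk_size=16):
        print("A: loaded=%d total=%d | B: loaded=%d total=%d" % (a + b))
```

Under option A a percentage cannot be computed because total is 0; under option B it can, but loaded counts bytes the author never sees.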

/ Jonas



Re: [ProgressEvents] How to deal with compressed transfer encodings

2010-11-23 Thread Anne van Kesteren

On Tue, 23 Nov 2010 22:41:00 +0100, Jonas Sicking jo...@sicking.cc wrote:

> How should ProgressEvents deal with compressed transfer encodings? The
> problem is that the Content-Length header (if I understand things
> correctly) contains the encoded number of bytes, so we don't have
> access to the total number of bytes which will be exposed to the user
> until it's all downloaded. I can see several solutions:
>
> A) Set total to 0, and loaded to the number of decompressed bytes
> downloaded so far
> B) Set total to the contents of the Content-Length header and
> loaded the number of compressed bytes downloaded so far
> C) Like A, but also expose a percentage downloaded which is based on
> the compressed data
>
> B seems spec-wise the simplest, but at least gecko doesn't expose the
> compressed number of bytes downloaded, not sure about other HTTP
> libraries. It also has the downside that .loaded doesn't match
> .responseText.length


When compression does not come into play they will only match for certain  
encoding / byte streams anyway. E.g. for a UTF-8 encoded character stream  
with characters that take up more than one byte they will not match. I  
think it should be B.
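
To make the mismatch concrete even without compression, here is a small sketch comparing the byte length of a UTF-8 encoded string with its length in UTF-16 code units, which is what .responseText.length counts. The helper names are hypothetical.

```python
def utf8_bytes(text: str) -> int:
    """Number of bytes on the wire if the text is sent as UTF-8."""
    return len(text.encode("utf-8"))


def utf16_code_units(text: str) -> int:
    """Number of UTF-16 code units, i.e. what a DOM string length reports."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    samples = [
        "plain ascii",          # one byte and one code unit per character
        "na\u00efve caf\u00e9",  # two characters need two UTF-8 bytes each
        "\U0001D11E",           # outside the BMP: 4 UTF-8 bytes, 2 code units
    ]
    for sample in samples:
        print(repr(sample), utf8_bytes(sample), utf16_code_units(sample))
```

Only the pure-ASCII sample has matching counts.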




> C seems the most confusing for authors and the one I like the least.



--
Anne van Kesteren
http://annevankesteren.nl/



Re: [ProgressEvents] How to deal with compressed transfer encodings

2010-11-23 Thread Bjoern Hoehrmann
* Anne van Kesteren wrote:
> On Tue, 23 Nov 2010 22:41:00 +0100, Jonas Sicking jo...@sicking.cc wrote:
>
>> A) Set total to 0, and loaded to the number of decompressed bytes
>> downloaded so far
>> B) Set total to the contents of the Content-Length header and
>> loaded the number of compressed bytes downloaded so far
>> C) Like A, but also expose a percentage downloaded which is based on
>> the compressed data
>
> When compression does not come into play they will only match for certain
> encoding / byte streams anyway. E.g. for a UTF-8 encoded character stream
> with characters that take up more than one byte they will not match. I
> think it should be B.

That is what the draft already requires, if by compressed Jonas means
you remove all transfer encodings but retain the content encodings, and
you set .total to zero if the total length is not specified. (There are
even more layers of compression to consider if you don't speak plain
HTTP but, say, HTTP over TLS, since TLS has its own compression layer;
that would be removed as well under the current draft.)
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 



Re: [ProgressEvents] How to deal with compressed transfer encodings

2010-11-23 Thread Bjoern Hoehrmann
* Jonas Sicking wrote:
> How should ProgressEvents deal with compressed transfer encodings? The
> problem is that the Content-Length header (if I understand things
> correctly) contains the encoded number of bytes, so we don't have
> access to the total number of bytes which will be exposed to the user
> until it's all downloaded. I can see several solutions:

Well, you have some information, you encode that using a media type,
then you possibly encode that using a content encoding, and then you
possibly encode that using a transfer encoding. HTTP uses transfer
encodings for both message framing (chunked) and transformations;
they are a property of the transfer, while content encodings are part
of the content.

I would suggest asking this question in terms of what .loaded should
be when the download has finished. Should that be how much data has
been received after the header, or how much data has been received
except for framing information, or what the content developer thinks
the size is, or how many bytes you will ultimately feed to, say, the
HTML parser?

That would be respectively the length of the message body, the length
of the message body after removing the chunked transfer encoding, the
length of the entity body, and the length of the entity body after
removing content encodings. Note that you can apply compression as
both content encoding and as transfer encoding, although the latter
is only supported by good HTTP implementations, like Opera's, but hey,
https://bugzilla.mozilla.org/show_bug.cgi?id=68517 isn't ten years old
yet.
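
The four candidate lengths can be sketched as follows, assuming a response sent with Content-Encoding: gzip and Transfer-Encoding: chunked. All function names are hypothetical, the de-chunker ignores chunk extensions and trailers, and because no compression is applied at the transfer layer here, the second and third lengths coincide.

```python
import gzip


def chunked(data: bytes, size: int) -> bytes:
    """Apply the chunked transfer encoding (no extensions, no trailers)."""
    out = bytearray()
    for i in range(0, len(data), size):
        piece = data[i:i + size]
        out += ("%x\r\n" % len(piece)).encode("ascii") + piece + b"\r\n"
    out += b"0\r\n\r\n"  # terminating zero-length chunk
    return bytes(out)


def dechunk(message_body: bytes) -> bytes:
    """Remove the chunked framing produced by chunked() above."""
    out = bytearray()
    pos = 0
    while True:
        eol = message_body.index(b"\r\n", pos)
        size = int(message_body[pos:eol].decode("ascii"), 16)
        if size == 0:
            break
        start = eol + 2
        out += message_body[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


def layer_lengths(document: bytes, chunk_size: int = 32) -> dict:
    entity_body = gzip.compress(document)            # content encoding
    message_body = chunked(entity_body, chunk_size)  # transfer encoding
    return {
        "message_body": len(message_body),
        "after_dechunking": len(dechunk(message_body)),
        "entity_body": len(entity_body),
        "after_content_decoding": len(gzip.decompress(entity_body)),
    }


if __name__ == "__main__":
    for name, length in layer_lengths(b"<p>hello</p>" * 100).items():
        print(name, length)
```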

I note that the draft actually defines this already, and I am pretty
sure we discussed this already back in the day.

> B seems spec-wise the simplest, but at least gecko doesn't expose the
> compressed number of bytes downloaded, not sure about other HTTP
> libraries. It also has the downside that .loaded doesn't match
> .responseText.length

Well, to get to the length of the content in terms of UTF-16 code
units you have to remove transfer encodings, content encodings, and
transcode from whatever character encoding the content is in to said
UTF-16 code units, that's yet another layer and not a useful one in
most cases here.



Re: [ProgressEvents] How to deal with compressed transfer encodings

2010-11-23 Thread Boris Zbarsky

On 11/23/10 9:31 PM, Bjoern Hoehrmann wrote:

> That is what the draft already requires, if by compressed Jonas means
> you remove all transfer encodings but retain the content encodings


This is actually ambiguous, since the near-total lack of server and UA
support for "Transfer-Encoding: gzip" means that "Content-Encoding:
gzip" is used to mean both transfer and content encoding (well,
sometimes it also just means "my server is misconfigured Apache", but I
assume UAs already deal with this, by and large).


-Boris