Edit report at http://bugs.php.net/bug.php?id=44164&edit=1
ID: 44164
Comment by: panczel dot levente at groware dot hu
Reported by: mplomer at gmx dot de
Summary: Handle "Content-Length" HTTP header when zlib.output_compression active
Status: Assigned
Type: Bug
Package: *General Issues
Operating System: *
PHP Version: 5.2.5
Assigned To: cataphract
Block user comment: N
Private report: N
New Comment:
Thanks, you are absolutely right to point out my error: my suggestion
would not work in situations where a Content-Length header is mandatory
or refers to the uncompressed body length. The partial response 206, as I
understand it, doesn't make Content-Length mandatory. In fact, the last
line could be omitted from your example and it would still be a valid
response. And since Content-Length is not mandatory in this case either,
I think my thesis still holds.
I have not found any explicit remarks in the specification on how
offsets and Content-Encoding should interact. As I see it now, all fields
are about the document-entity (the one that the script handles and knows
well) except for the Content-Encoding and Content-Length fields, which
are about the representation of the message body. So Content-Length
always gives the decimal number of octets transferred in the message
body's final byte-stream, and Content-Encoding has to be reversed before
other processing (like matching it to the requested range's size) takes
place.
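To make that reading concrete, here is a minimal sketch (not the zlib handler's actual code). The function name build_gzip_response is made up for illustration; gzencode() and gzdecode() are the standard zlib functions. It only shows that Content-Length counts the encoded octets and that the receiver undoes the coding before it looks at the entity.

```php
<?php
// Hypothetical helper: returns the headers and body for a gzip-coded response.
function build_gzip_response(string $entity): array
{
    $body = gzencode($entity); // apply the gzip content-coding
    $headers = [
        'Content-Encoding' => 'gzip',
        // Octets actually transferred, not the size of the original entity.
        'Content-Length'   => (string) strlen($body),
    ];
    return [$headers, $body];
}

// Example [A] from this thread: 8K of "A" characters.
[$headers, $body] = build_gzip_response(str_repeat('A', 8192));

// The receiver reverses the coding first; only then do entity sizes apply.
$decoded = gzdecode($body);
```

Under this reading, any range arithmetic would be done on $decoded, while Content-Length describes $body.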
For all response types where Content-Length is mandatory, I agree with
you that compression should be turned off (possibly after trying to fit
the output into the initial single buffer that I think is allocated
anyway). But we know that for a 200 response it is not mandatory, and as
I see it, neither is it for 206. So at least these responses could
follow my thesis (and any others that currently do not require a
Content-Length field).
> The problem is that zlib.output_compression is not presented as an
> output handler that rewrites the response and creates a new entity. It
> is presented as an inoffensive performance option that compresses the
> output for better performance.
Yes, it rewrites the response; but no, it does not create a new entity.
I think that's just what
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11
refers to by "without losing the identity of its underlying media
type". So to send a compressed body, one just has to adjust the
Content-Encoding field and take care that Content-Length is not invalid.
I feel that changing these headers isn't more intrusive than altering
body octets, since they do not affect other content and headers in the
message, except for Transfer-Encoding, which I suppose zlib compression
adjusts to correctly. I think chunked Transfer-Encoding is relevant for
two reasons. If received from the script, it has to be reassembled
before compression. And it might be used to maintain persistent
connections (e.g. 1 compressed buffer in each chunk) where compression
was not able to tell the Content-Length in advance.
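As a rough illustration of "1 compressed buffer in each chunk", the sketch below frames each compressed buffer as one HTTP/1.1 chunk. The names encode_chunk and chunked_gzip_body are hypothetical; deflate_init() and deflate_add() are PHP's incremental zlib functions (PHP 7.0+). This is only an assumption about how such framing could look, not what PHP or the web server actually does.

```php
<?php
// Frame one piece of data as a chunk: hex size, CRLF, data, CRLF.
function encode_chunk(string $data): string
{
    return dechex(strlen($data)) . "\r\n" . $data . "\r\n";
}

// Compress each buffer incrementally and emit it as its own chunk.
function chunked_gzip_body(array $buffers): string
{
    $ctx  = deflate_init(ZLIB_ENCODING_GZIP);
    $out  = '';
    $last = count($buffers) - 1;
    foreach ($buffers as $i => $buffer) {
        // Sync-flush between buffers so each chunk is decodable so far;
        // finish the gzip stream on the last buffer.
        $mode = ($i === $last) ? ZLIB_FINISH : ZLIB_SYNC_FLUSH;
        $compressed = deflate_add($ctx, $buffer, $mode);
        if ($compressed !== '') { // a zero-length chunk would end the body
            $out .= encode_chunk($compressed);
        }
    }
    return $out . "0\r\n\r\n"; // terminating chunk, no trailers
}

$wire = chunked_gzip_body(['hello ', 'world']);
```

The total length is never announced up front, so the connection can stay open without a Content-Length header.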
Please understand that I'm not pushing for any of these features, just
think that this topic still has potential for inspiring improvement and
finding rare bugs.
Previous Comments:
------------------------------------------------------------------------
[2010-12-17 16:35:30] [email protected]
> That's an error. Both scripts set the correct CL (that they know very
> well), just the way the specification says they SHOULD. I don't agree
> that it would be the responsibility of the script to counteract the
> setting (zlib output compression in this case) of the executing
> framework (PHP in this case). If the scripts should take care of every
> such situation then using the header() would be completely illegal,
> because a future output handler might interact with the output in such
> a way that invalidates the headers set. This isn't a portable
> philosophy since it implicitly requires the script being aware of
> every aspect of plugins and settings in PHP.
> In fact it is the zlib output handler that was setting the wrong CL
> header (by not removing the deprecated one). As I see, the handler is
> constructing a new response entity instead of the one it receives from
> the script; the consistency of this response is entirely the
> responsibility of the handler. As I understand this has now been
> patched so that the handler always removes the CL header, and by that
> it assures correctness. Note: here's no refutation of the correctness
> of the patched handler.
The problem is that zlib.output_compression is not presented as an output
handler that rewrites the response and creates a new entity. It is
presented as an inoffensive performance option that compresses the
output for better performance. And it does so, generally, without the
express assent of the programmer. The programmer can always use
ob_gzhandler to force compression.
Your thesis is that the output handler should not be deactivated;
instead it ought to remove the old header and write a new one, whenever
possible. This looks good. But consider this script:
if (empty($_SERVER["HTTP_RANGE"])) {
    $offset = 0;
}
else { //violates rfc2616, which demands ignoring the header if invalid
    preg_match("/^bytes=(\d+)-/i", $_SERVER["HTTP_RANGE"], $matches);
    if (empty($matches[1]))
        $offset = 0;
    if (is_num_int($matches[1]) && $matches[1] < $filesize &&
            $matches[1] >= 0) {
        $offset = $matches[1];
        if (@fseek($fp, $offset, SEEK_SET) != 0)
            InternalError();
        header("HTTP/1.1 206 Partial Content");
        header("Content-Range: bytes $offset-".($filesize - 1)."/$filesize");
    }
    elseif ($matches[1] > $filesize) {
        header("HTTP/1.1 416 Requested Range Not Satisfiable");
        die();
    }
    else $offset = 0;
}
$conlen = $filesize - $offset;
header("Content-Length: $conlen");
There is no way this script can work correctly under the zlib handler.
206 responses must have a Content-Length, and the offsets are calculated
from the uncompressed size, while under zlib they would have to be
calculated from the compressed size, which is obviously impossible to
know without first compressing the file.
So actually the only option is to disable the zlib output handler.
------------------------------------------------------------------------
[2010-12-15 01:25:19] panczel dot levente at groware dot hu
Sorry for not being clear enough, let me explain! To keep things simple
I'll use two examples: [A] the one above with the 8K "A"
characters and the following [B]:
<?php header("Content-Length: 0"); ?>
> The problem is not the existence of a Content-length header
I never wrote that its existence would be a problem. On the contrary: I
think its correct presence is desirable wherever possible (most possible
requests and most possible layers of the runtime environment).
> it's the fact that you're setting a content-length header indicating a
> size you cannot possibly know
That's an error. Both scripts set the correct CL (that they know very
well), just the way the specification says they SHOULD. I don't agree
that it would be the responsibility of the script to counteract the
setting (zlib output compression in this case) of the executing
framework (PHP in this case). If the scripts should take care of every
such situation then using the header() would be completely illegal,
because a future output handler might interact with the output in such a
way that invalidates the headers set. This isn't a portable philosophy
since it implicitly requires the script being aware of every aspect of
plugins and settings in PHP.
In fact it is the zlib output handler that was setting the wrong CL
header (by not removing the deprecated one). As I see, the handler is
constructing a new response entity instead of the one it receives from
the script; the consistency of this response is entirely the
responsibility of the handler. As I understand this has now been patched
so that the handler always removes the CL header, and by that it assures
correctness. Note: here's no refutation of the correctness of the
patched handler.
> Apache already adds a Content-length header when it can (i.e. for
> small responses), it's not necessary PHP does this
Didn't mean to suggest it would be necessary. It just yields better
performance (if the cost of generating the CL is not high).
> sending it on every compressed response is unpractical because it
> would require buffering the entire response
Not for every compressed response; that would be impossible e.g. for
live streams. But on the other hand ALWAYS discarding CL is the worst
one among the correct solutions. Consider example [B]: I imagine that
the script has already finished once the handler receives control, thus
it is able to see that its input (from the script) is already closed. In
this case it does not have to use buffers or make heavy computations: by
skipping compression entirely everything is set, and a correct CL is
transmitted. I'll get back to this.
> I suppose you can always
Yes, one can always make patches to avoid specific errors that a buggy
RTE produces. I just hope there's no software engineer who sees this
as a reason against fixing a bug.
Now back to example [B]. I see it's not a common use case, but I think
it sheds light on other problems too. Let's distinguish administrators,
who control the webserver (or other environment) and PHP settings but
must not edit application code, and software designers, who have to
create a versatile PHP application that can be run on any platform
efficiently without having influence on the specific settings of the
platform. So the developer doesn't say "please turn compression off" and
the admin doesn't add some new lines of code to the script.
Let's assume the zlib handler has a small buffer (probably the one it
already has, configurable with the value of zlib.output_compression).
The handler initially fills compressed output into this buffer. If it
has to flush the buffer before input EOF, then it clears the CL,
flushes, and replaces itself with the compressor-component in the
stream-chain (or does whatever else it does now to compress the response
body). Otherwise it computes and sets the correct CL and sends the
compressed body of the response.
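The proposal can be sketched roughly as follows. This is an illustration only, under several assumptions: the function name compress_with_optional_length and its return shape are made up, the "client" is simulated by a string, and the real handler lives in C inside PHP rather than in userland. deflate_init() and deflate_add() are PHP's incremental zlib functions.

```php
<?php
// Hypothetical sketch: hold compressed output in a small buffer and only
// give up on Content-Length if the buffer overflows before input EOF.
function compress_with_optional_length(iterable $pieces, int $bufferSize = 4096): array
{
    $ctx       = deflate_init(ZLIB_ENCODING_GZIP);
    $buffer    = '';    // compressed bytes not yet "sent"
    $flushed   = '';    // compressed bytes already "sent" to the client
    $streaming = false; // true once we had to flush before EOF

    foreach ($pieces as $piece) {
        $buffer .= deflate_add($ctx, $piece, ZLIB_NO_FLUSH);
        if (strlen($buffer) > $bufferSize) {
            // Buffer overflow before EOF: drop the CL and start streaming.
            $streaming = true;
            $flushed  .= $buffer;
            $buffer    = '';
        }
    }
    // Input EOF: finish the gzip stream.
    $buffer .= deflate_add($ctx, '', ZLIB_FINISH);

    return [
        // Nothing was sent early, so the full compressed length is known.
        'content_length' => $streaming ? null : strlen($buffer),
        'body'           => $flushed . $buffer,
    ];
}

$small = compress_with_optional_length(['hello ', 'world']);
```

A null content_length stands for "header removed"; a caller would then fall back to chunked transfer or closing the connection.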
In this manner correct applications can be written that have the benefit
of using CL without having to care whether zlib is enabled; software
designers can rest assured that their code is good and runs efficiently.
The admin can switch zlib on/off as he sees fit: he will neither break
the served apps, nor cripple their performance. And even better: when
the admin sees that 1% of the responses are <4K (the default zlib buffer
size), 98% are between 4K and 20K and only 1% are >20K, he can just go
"Why wouldn't I sacrifice that 16K/request RAM to have Content-Length
almost always sent to the client, in contrast to the current habit of
almost never sending it?!" ... and how right he would be. As you can
see, this solution is not only good for the 0-long [B], not only for the
8K-long [A], but possibly for any environment, since sticking with the
defaults gives a good tradeoff while maintenance personnel have the
opportunity to fine-tune this behavior without adding modifications or
posing constraints on the code.
The answer showed that my previous post wasn't verbose enough to
express my opinion: striving towards such quality solutions as sketched
in the last part _might_ be a better option than choosing the simplest
solution (as the current one is). And I'm pretty sure that the ones
who wrote the zlib handler can think of solutions that are both more
elegant and more efficient and provide the Web with as many correct CL
headers as possible.
------------------------------------------------------------------------
[2010-12-14 04:59:20] [email protected]
Sorry for the mess; I was betrayed by the browser's autocomplete.
------------------------------------------------------------------------
[2010-12-14 04:58:30] [email protected]
> Our projects make heavy use of Content-Length. Disabling it
> unnecessarily is costly on networks with large RTT.
The problem is not the existence of a Content-length header, it's the
fact that you're setting a content-length header indicating a size you
cannot possibly know. A wrong Content-length header is worse than none.
Apache already adds a Content-length header when it can (i.e. for small
responses), it's not necessary PHP does this; sending it on every
compressed response is unpractical because it would require buffering
the entire response. If you need this, I suppose you can always
explicitly start the zlib output handler and call ob_get_length.
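One possible reading of that suggestion is the nested-buffer pattern sketched below. To keep it runnable outside a web request, a gzencode() callback stands in for ob_gzhandler; that substitution and the function name capture_compressed are assumptions made for this example.

```php
<?php
// Hypothetical helper: run $emit, compress what it prints, and report the
// compressed length as seen by the outer output buffer.
function capture_compressed(callable $emit): array
{
    ob_start(); // outer buffer: collects the compressed bytes
    ob_start(static function (string $buffer): string {
        return gzencode($buffer); // stand-in for ob_gzhandler
    });
    $emit();        // the script's normal output
    ob_end_flush(); // runs the inner handler, output lands in the outer buffer
    $length = ob_get_length(); // size of the compressed body
    $body   = ob_get_clean();
    return [$length, $body];
}

[$length, $body] = capture_compressed(function () {
    echo str_repeat('A', 8192);
});
```

In a real request one would then send header("Content-Length: $length") before echoing $body; as noted above, this buffers the entire response.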
------------------------------------------------------------------------
The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/bug.php?id=44164