Edit report at https://bugs.php.net/bug.php?id=44164&edit=1
ID: 44164
Comment by: daniel at code-emitter dot com
Reported by: mplomer at gmx dot de
Summary: Handle "Content-Length" HTTP header when
zlib.output_compression active
Status: Assigned
Type: Bug
Package: *General Issues
Operating System: *
PHP Version: 5.2.5
Assigned To: cataphract
Block user comment: N
Private report: N
New Comment:
FYI: This issue is still causing problems.
http://tracker.phpbb.com/browse/PHPBB3-10648
Previous Comments:
------------------------------------------------------------------------
[2010-12-17 19:18:15] panczel dot levente at groware dot hu
Thanks, you are absolutely right to point out my error: my suggestion would not
work in situations where a Content-Length header is mandatory or refers to the
uncompressed body length. The partial response 206, as I understand it, doesn't
make Content-Length mandatory. In fact, the last line could be omitted from your
example and it would still be a valid response. But since Content-Length is not
mandatory in this case either, I think my thesis still holds.
I have not found any explicit remarks in the specification on how offsets and
Content-Encoding should interact. As I see now all fields are about the
document-entity (the one that the script handles and knows well) except for
Content-Encoding and Content-Length fields which are about the representation
of the message body. So Content-Length always shows the decimal number of
octets transferred in the message body's final byte-stream, and
Content-Encoding has to be reversed before other processing (like matching it
to the requested range's size) takes place.
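A minimal sketch of this reading, assuming the zlib extension is loaded. The variable names are made up for illustration and nothing here is taken from the PHP source; the 8K body of "A" characters mirrors example [A] mentioned elsewhere in this thread.

```php
<?php
// Entity as the script sees it: 8192 "A" characters.
$entity = str_repeat('A', 8192);

// Message body as it would travel with "Content-Encoding: gzip".
$wireBody = gzencode($entity);

// Under this reading, Content-Length counts the transferred octets,
// not the size of the entity the script produced.
$contentLength = strlen($wireBody);

// Reversing the Content-Encoding recovers the entity; its size is what
// a requested range would be matched against.
$decoded = gzdecode($wireBody);
$entityLength = strlen($decoded);
```

The point of the sketch is only that `$contentLength` and `$entityLength` are two different numbers describing two different things.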
For all response types where Content-Length is mandatory, I agree with you
that compression should be turned off (possibly after trying to fit the output
into the one initial buffer that I think is allocated anyway). But we know that
in the case of response 200 it is not mandatory, and as I see it, for 206
neither. So at least these responses could follow my thesis (and any others
that currently do not require a Content-Length field).
> The problem is the zlib.output_compression is not presented as an output
> handler that rewrites the response and creates a new entity. It is presented
> as an inoffensive performance option that compresses the output for better
> performance.
Yes, it rewrites the response; but no, it does not create a new entity. I think
that's just what
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11 refers to by
"without losing the identity of its underlying media type". So to send a
compressed body, one just has to adjust the Content-Encoding field and take
care that Content-Length is not invalid. I feel that changing these headers
isn't more intrusive than altering body octets, since they do not affect
other content and headers in the message, except for Transfer-Encoding which I
suppose that zlib compression correctly adjusts to. I think chunked
Transfer-Encoding is relevant for two reasons. If received from the script, it
has to be assembled before compression. And it might be used to maintain
persistent connections (e.g. 1 compressed buffer in each chunk) where
compression was not able to tell the Content-Length in advance.
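The "one compressed buffer in each chunk" idea could look roughly like the sketch below. It assumes the incremental zlib functions deflate_init() and deflate_add() (available in PHP 7.0 and later, so not in the 5.2.5 this report was filed against); frameChunk() and chunkedGzipBody() are hypothetical names, and in practice the webserver usually does the chunk framing itself.

```php
<?php
/**
 * Frame one buffer as an HTTP/1.1 chunk: hex size, CRLF, data, CRLF.
 */
function frameChunk(string $data): string
{
    return dechex(strlen($data)) . "\r\n" . $data . "\r\n";
}

/**
 * Compress each input buffer and emit it as its own chunk, so the total
 * length never has to be known in advance.
 */
function chunkedGzipBody(array $buffers): string
{
    $ctx = deflate_init(ZLIB_ENCODING_GZIP);
    $out = '';
    foreach ($buffers as $buffer) {
        // SYNC_FLUSH makes the compressor emit everything it has so far.
        $compressed = deflate_add($ctx, $buffer, ZLIB_SYNC_FLUSH);
        if ($compressed !== '') { // an empty chunk would end the body early
            $out .= frameChunk($compressed);
        }
    }
    // Finish the gzip stream (trailer with CRC and size).
    $out .= frameChunk(deflate_add($ctx, '', ZLIB_FINISH));

    return $out . "0\r\n\r\n"; // zero-length chunk terminates the body
}
```

The connection can stay open after the terminating chunk, which is what makes this usable for persistent connections without a Content-Length.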
Please understand that I'm not pushing for any of these features; I just think
that this topic still has potential for inspiring improvement and finding rare
bugs.
------------------------------------------------------------------------
[2010-12-17 16:35:30] [email protected]
> That's an error. Both scripts set the correct CL (that they know very well),
> just the way the specification says they SHOULD. I don't agree that it would
> be the responsibility of the script to counteract the setting (zlib output
> compression in this case) of the executing framework (PHP in this case). If
> the scripts had to take care of every such situation, then using header()
> would be completely illegal, because a future output handler might interact
> with the output in such a way that invalidates the headers set. This isn't a
> portable philosophy since it implicitly requires the script to be aware of
> every aspect of plugins and settings in PHP.
> In fact it is the zlib output handler that was setting the wrong CL header (by
> not removing the stale one). As I see it, the handler is constructing a new
> response entity instead of the one it receives from the script; the
> consistency of this response is entirely the responsibility of the handler.
> As I understand it, this has now been patched so that the handler always
> removes the CL header, and by that it ensures correctness. Note: this is not
> a refutation of the correctness of the patched handler.
The problem is the zlib.output_compression is not presented as an output
handler that rewrites the response and creates a new entity. It is presented as
an inoffensive performance option that compresses the output for better
performance. And it does so, generally, without the express assent of the
programmer. The programmer can always use ob_gzhandler to force compression.
Your thesis is that the output handler should not be deactivated; instead it
ought to remove the old header and write a new one, whenever possible. This
looks good. But consider this script:
if (empty($_SERVER["HTTP_RANGE"])) {
    $offset = 0;
}
else { //violates rfc2616, which demands ignoring the header if invalid
    preg_match("/^bytes=(\d+)-/i",$_SERVER["HTTP_RANGE"], $matches);
    if (empty($matches[1]))
        $offset = 0;
    if (is_num_int($matches[1]) && $matches[1] < $filesize && $matches[1]>=0) {
        $offset = $matches[1];
        if (@fseek($fp,$offset,SEEK_SET) != 0)
            InternalError();
        header("HTTP/1.1 206 Partial Content");
        header("Content-Range: bytes $offset-".($filesize - 1)."/$filesize");
    }
    elseif ($matches[1] > $filesize) {
        header("HTTP/1.1 416 Requested Range Not Satisfiable");
        die();
    }
    else $offset = 0;
}
$conlen = $filesize - $offset;
header("Content-Length: $conlen");
There is no way this script can work correctly under the zlib handler. 206
responses must have a Content-Length, and the offsets are calculated from the
uncompressed size, whereas under zlib they would have to be calculated from the
compressed size, which is obviously impossible to know without first
compressing the file.
So actually the only option is to disable the zlib output handler.
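For illustration only, a script that serves byte ranges could switch the handler off itself, along the lines of the sketch below. parseRangeOffset() and the $filesize value are hypothetical, only open-ended "bytes=N-" ranges are handled, and the ini_set() call is assumed to run before any output has been sent (PHP 7.1 or later for the nullable type).

```php
<?php
/**
 * Return the start offset of an open-ended "bytes=N-" Range header,
 * or null when the header is absent, malformed or beyond the file.
 */
function parseRangeOffset(?string $rangeHeader, int $filesize): ?int
{
    if ($rangeHeader === null
        || !preg_match('/^bytes=(\d+)-$/i', $rangeHeader, $m)) {
        return null;
    }
    $offset = (int) $m[1];

    return $offset < $filesize ? $offset : null;
}

$filesize = 1000; // hypothetical; a real script would use filesize()
$offset = parseRangeOffset($_SERVER['HTTP_RANGE'] ?? null, $filesize);

if ($offset !== null) {
    // Offsets and Content-Length refer to the uncompressed file, so the
    // response must not be compressed behind the script's back.
    ini_set('zlib.output_compression', '0');
    header('HTTP/1.1 206 Partial Content');
    header("Content-Range: bytes $offset-" . ($filesize - 1) . "/$filesize");
    header('Content-Length: ' . ($filesize - $offset));
}
```

This is exactly the kind of per-script workaround the other side of this discussion objects to; it is shown only to make the constraint concrete.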
------------------------------------------------------------------------
[2010-12-15 01:25:19] panczel dot levente at groware dot hu
Sorry for not being clear enough, let me explain! To keep things simple I'll
use two examples: [A] the one above with the 8K "A" characters and the
following [B]:
<?php header("Content-Length: 0"); ?>
> The problem is not the existence of a Content-length header
I never wrote that its existence would be a problem. On the contrary: I think
its correct presence is desirable wherever possible (most possible requests and
most possible layers of the runtime environment).
> it's the fact that you're setting a content-length header indicating a size
> you cannot possibly know
That's an error. Both scripts set the correct CL (that they know very well),
just the way the specification says they SHOULD. I don't agree that it would
be the responsibility of the script to counteract the setting (zlib output
compression in this case) of the executing framework (PHP in this case). If the
scripts had to take care of every such situation, then using header() would
be completely illegal, because a future output handler might interact with the
output in such a way that invalidates the headers set. This isn't a portable
philosophy since it implicitly requires the script to be aware of every aspect
of plugins and settings in PHP.
In fact it is the zlib output handler that was setting the wrong CL header (by
not removing the stale one). As I see it, the handler is constructing a new
response entity instead of the one it receives from the script; the consistency
of this response is entirely the responsibility of the handler. As I understand
it, this has now been patched so that the handler always removes the CL header,
and by that it ensures correctness. Note: this is not a refutation of the
correctness of the patched handler.
> Apache already adds a Content-length header when it can (i.e. for small
> responses), it's not necessary PHP does this
I didn't mean to suggest it would be necessary. It just yields better
performance (if the cost of generating the CL is not high).
> sending it on every compressed response is unpractical because it would
> require buffering the entire response
Not for every compressed response; that would be impossible e.g. for live
streams. But on the other hand, ALWAYS discarding CL is the worst of the
correct solutions. Consider example [B]: I imagine that the script has already
finished once the handler receives control, so the handler can see that its
input (from the script) is already closed. In this case it does not have to use
buffers or make heavy computations: by skipping compression entirely everything
is set, and a correct CL is transmitted. I'll get back to this.
> I suppose you can always
Yes, one can always make patches to avoid specific errors that a buggy RTE
produces. I just hope there's no software engineer who sees this as a reason
against fixing a bug.
Now back to example [B]. I see it's not a common use case, but I think it
sheds light on other problems too. Let's distinguish administrators, who
control the webserver (or other environment) and PHP settings but must not edit
application code, and software designers, who have to create a versatile PHP
application that can run efficiently on any platform without having any
influence on the specific settings of the platform. So the developer doesn't
say "please turn compression off" and the admin doesn't add some new lines of
code to the script. Let's assume the zlib handler has a small buffer
(probably the one it already has, configurable with the value of
zlib.output_compression). The handler initially fills compressed output into
this buffer. If it has to flush the buffer before input EOF, then it clears CL,
flushes, and replaces itself with the compressor component in the stream chain
(or does whatever else it does now to compress the response body). Otherwise
it computes and sets the correct CL and sends the compressed body of the
response.
In this manner correct applications can be written that have the benefit of
using CL without having to care whether zlib is enabled; software designers
can rest assured that their code is good and runs efficiently. The admin can
switch zlib on/off as he sees fit: he will neither break the served apps nor
cripple their performance. And even better: when the admin sees that 1% of the
responses are <4K (the default zlib buffer size), 98% are between 4K and 20K
and only 1% are >20K, he can just go "Why wouldn't I sacrifice that 16K/request
of RAM to have Content-Length almost always sent to the client, in contrast to
the current habit of almost never sending it?!" ... and how right he would be.
As you can see, this solution is good not only for the 0-long [B] and the
8K-long [A] but possibly for any environment, since sticking with the defaults
gives a good tradeoff while maintenance personnel have the opportunity to
fine-tune this behavior without modifying or constraining
the code.
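The buffer-then-decide behaviour proposed above can be sketched as follows. This is only an illustration of the idea, not how the zlib handler is implemented: compressWithOptionalLength() and $bufferLimit are hypothetical, deflate_init()/deflate_add() require PHP 7.0 or later, and for simplicity the sketch keeps the whole body in memory where a real handler would start streaming at the point of overflow.

```php
<?php
/**
 * Compress the script's output; if the compressed body never outgrows the
 * buffer, its exact length is known and can be sent as Content-Length.
 * Otherwise the length is reported as null, meaning "drop the CL header".
 *
 * @param iterable $chunks      output pieces produced by the script
 * @param int      $bufferLimit hypothetical buffer size in bytes
 * @return array   ['contentLength' => int|null, 'body' => string]
 */
function compressWithOptionalLength(iterable $chunks, int $bufferLimit): array
{
    $ctx = deflate_init(ZLIB_ENCODING_GZIP);
    $buffer = '';
    $overflowed = false;

    foreach ($chunks as $chunk) {
        $buffer .= deflate_add($ctx, $chunk, ZLIB_NO_FLUSH);
        if (strlen($buffer) > $bufferLimit) {
            // A real handler would clear CL and flush here, before EOF.
            $overflowed = true;
        }
    }
    // Input EOF reached: finish the gzip stream.
    $buffer .= deflate_add($ctx, '', ZLIB_FINISH);
    $overflowed = $overflowed || strlen($buffer) > $bufferLimit;

    return [
        'contentLength' => $overflowed ? null : strlen($buffer),
        'body'          => $buffer,
    ];
}
```

With a limit of 4096 bytes, a highly compressible 8K body like example [A] would be expected to fit and get a Content-Length, while a large incompressible body would not.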
The answer showed that my previous post wasn't verbose enough to express my
opinion: striving towards quality solutions such as the one sketched in the
last part _might_ be a better option than choosing the simplest solution (as
the current one is). And I'm pretty sure that those who wrote the zlib handler
can think of solutions that are both more elegant and more efficient and
provide the Web with as many correct CL headers as possible.
------------------------------------------------------------------------
The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
https://bugs.php.net/bug.php?id=44164