On 08/03/2010 03:27 AM, Amos Jeffries wrote:
Tsantilas Christos wrote:
Added %http::<bs and %icap::<bs logformat codes to HTTP and ICAP body
sizes received from the next HTTP hop or the ICAP server.

Logging "received message body" is useful because Squid may receive a
lot more or a lot less than it serves to the client or than the
original resource size, which may happen when handling Range requests
and partial responses, when adapting bodies, and for other reasons.

For HTTP, we define "received message body" as message body bytes that
Squid stores, merges, adapts, and/or forwards. In most cases, they are
the same as body bytes sent by the server to Squid. However, the two
bodies may differ for reasons such as errors (where the start of the
body was not found), HTTP transfer encodings (where Squid strips
chunked encoding to find the message body), and generated FTP
directory listings (that were received in a completely different
format on a control connection).

For ICAP, the "received message body" is the Encapsulated sections,
after the encapsulated HTTP body, if any, is dechunked.



src/Server.cc:

ServerStateData::addVirginReplyBody(const char *data, ssize_t len)
{
+ adjustBodyBytesRead(+len);

whats with the +?

Hi Amos,

I will respond to this specific question because the plus sign is my fault. The adjustBodyBytesRead() function accepts zero, negative, and positive deltas. The plus emphasizes that this specific delta is [probably] positive. I believe the posted patch version does not even use negative deltas, making the plus sign even more confusing. We will remove the plus sign.


* I've been a bit uncomfortable since the beginning about the
de-chunking happening before the body bytes are accounted. This
introduces some loss into the protocol accounting apparently being done.
Are those lots bytes accounted for as headers? or is the total reply
bytes value going to be larger than the sum of headers+body ?
Do we stop at de-chunking? what about un-zipping? etc...

It's your implementation but I'm of the opinion this should log the
on-wire data size of the segment used for body. Including the chunking
bytes, and excluding the Squid-added markup bytes on FTP/Gopher listings.

I am glad you raised this question because we have struggled with these issues a lot ourselves. Earlier patch versions did try to count on-the-wire body bytes before dechunking and faced many unforeseen obstacles. After many patch versions, I can identify the biggest problem as follows:

Identifying header and body boundaries is protocol/encoding/etc. specific. Counters that are specific to protocols/encodings/etc. should not use "on the wire" approach. And, vice versa, "on the wire" counters should not pretend to know where the "header", "body", or other protocol/encoding/etc-dependent parts are.

Squid generates a lot of content, under many different conditions. When I talk about protocol/encoding/etc counters, I consider generation-specific counters to belong to that "etc" category.

We may need more "one the wire" counters. We may need more protocol/encoding/etc-specific counters. The current set of counters is incomplete and inconsistent, but this patch is not meant to solve this problem.

The scope of this patch is narrow: log the size of what Squid perceives as "received response body". This is needed to better account for [small] 206 Partial Content responses [with huge resource variant lenghts].

Does this patch provide enough information to reconstruct all sizes of all protocol/encoding/etc. components? No. However, such full reconstruction is rather difficult, and it has not been requested yet. I am not sure, but I suspect that logging sizes for every possible protocol/encoding/etc. part will be impractical. The advantage of this patch is that it provides usable information at a common point for all Server-using protocols/encodings/etc.

For example, if we want to track chunked encoding overheads, we can compare raw "bytes read from the wire" counter with a sum of HTTP headers and body counters. This comparison will be accurate in most cases, but there will be exceptions such as multipart responses or generated FTP listings. Smart script can probably exclude these exceptional cases, given enough access.log fields, but it will always be difficult unless we add a specific "chunked encoding overhead" counter.

In the foreseeable future, we will add gzip encoding support. This patch will still work as intended when we do that, regardless of whether we unzip before and/or after Store. However, we may need to add more counters to track effects and effectiveness of gzip-related algorithms.

Another complication is 1xx control messages. This patch does not account for them. Some other counter, especially a "received one-the-wire bytes" counter will. I would certainly recommend adding such a counter (we already have one for ICAP), but (a) that addition is outside this patch scope and (b) that counter should not try to identify where the message body is.

If there are no strong objections, I recommend preserving the counting point of this patch (Server::addVirginBody) while acknowledging that the new counter does not count everything we might want to count. I am happy to change the logformat code from %<bs to something else if you can come up with a better name.


Thank you,

Alex.

Reply via email to