Re: [PATCH] log received body size

Alex Rousskov Tue, 03 Aug 2010 09:40:05 -0700

On 08/03/2010 03:27 AM, Amos Jeffries wrote:

Tsantilas Christos wrote:

Added %http::<bs and %icap::<bs logformat codes to HTTP and ICAP body
sizes received from the next HTTP hop or the ICAP server.


Logging "received message body" is useful because Squid may receive a
lot more or a lot less than it serves to the client or than the
original resource size, which may happen when handling Range requests
and partial responses, when adapting bodies, and for other reasons.

For HTTP, we define "received message body" as message body bytes that
Squid stores, merges, adapts, and/or forwards. In most cases, they are
the same as body bytes sent by the server to Squid. However, the two
bodies may differ for reasons such as errors (where the start of the
body was not found), HTTP transfer encodings (where Squid strips
chunked encoding to find the message body), and generated FTP
directory listings (that were received in a completely different
format on a control connection).

For ICAP, the "received message body" is the Encapsulated sections,
after the encapsulated HTTP body, if any, is dechunked.

src/Server.cc:

ServerStateData::addVirginReplyBody(const char *data, ssize_t len)
{
+ adjustBodyBytesRead(+len);

whats with the +?


Hi Amos,

I will respond to this specific question because the plus sign ismy fault. The adjustBodyBytesRead() function accepts zero, negative, andpositive deltas. The plus emphasizes that this specific delta is[probably] positive. I believe the posted patch version does not evenuse negative deltas, making the plus sign even more confusing. We willremove the plus sign.

* I've been a bit uncomfortable since the beginning about the
de-chunking happening before the body bytes are accounted. This
introduces some loss into the protocol accounting apparently being done.
Are those lots bytes accounted for as headers? or is the total reply
bytes value going to be larger than the sum of headers+body ?
Do we stop at de-chunking? what about un-zipping? etc...

It's your implementation but I'm of the opinion this should log the
on-wire data size of the segment used for body. Including the chunking
bytes, and excluding the Squid-added markup bytes on FTP/Gopher listings.

I am glad you raised this question because we have struggled with theseissues a lot ourselves. Earlier patch versions did try to counton-the-wire body bytes before dechunking and faced many unforeseenobstacles. After many patch versions, I can identify the biggest problemas follows:

Identifying header and body boundaries is protocol/encoding/etc.specific. Counters that are specific to protocols/encodings/etc. shouldnot use "on the wire" approach. And, vice versa, "on the wire" countersshould not pretend to know where the "header", "body", or otherprotocol/encoding/etc-dependent parts are.

Squid generates a lot of content, under many different conditions. WhenI talk about protocol/encoding/etc counters, I considergeneration-specific counters to belong to that "etc" category.

We may need more "one the wire" counters. We may need moreprotocol/encoding/etc-specific counters. The current set of counters isincomplete and inconsistent, but this patch is not meant to solve thisproblem.

The scope of this patch is narrow: log the size of what Squid perceivesas "received response body". This is needed to better account for[small] 206 Partial Content responses [with huge resource variant lenghts].

Does this patch provide enough information to reconstruct all sizes ofall protocol/encoding/etc. components? No. However, such fullreconstruction is rather difficult, and it has not been requested yet. Iam not sure, but I suspect that logging sizes for every possibleprotocol/encoding/etc. part will be impractical. The advantage of thispatch is that it provides usable information at a common point for allServer-using protocols/encodings/etc.

For example, if we want to track chunked encoding overheads, we cancompare raw "bytes read from the wire" counter with a sum of HTTPheaders and body counters. This comparison will be accurate in mostcases, but there will be exceptions such as multipart responses orgenerated FTP listings. Smart script can probably exclude theseexceptional cases, given enough access.log fields, but it will always bedifficult unless we add a specific "chunked encoding overhead" counter.

In the foreseeable future, we will add gzip encoding support. This patchwill still work as intended when we do that, regardless of whether weunzip before and/or after Store. However, we may need to add morecounters to track effects and effectiveness of gzip-related algorithms.

Another complication is 1xx control messages. This patch does notaccount for them. Some other counter, especially a "receivedone-the-wire bytes" counter will. I would certainly recommend addingsuch a counter (we already have one for ICAP), but (a) that addition isoutside this patch scope and (b) that counter should not try to identifywhere the message body is.

If there are no strong objections, I recommend preserving the countingpoint of this patch (Server::addVirginBody) while acknowledging that thenew counter does not count everything we might want to count. I am happyto change the logformat code from %<bs to something else if you can comeup with a better name.



Thank you,

Alex.

Re: [PATCH] log received body size

Reply via email to