On Mon, 14 Mar 2011 09:26:49 -0600, Alex Rousskov wrote:
On 03/11/2011 05:52 PM, Amos Jeffries wrote:
On 12/03/11 12:59, Alex Rousskov wrote:
Hello,

All of the following is related to the %adapt::<last_h access.log format code (possibly parameterized with the header or element name):

* squid.conf says it is the "last ICAP header received", essentially.

* LFT_ADAPTATION_LAST_HEADER takes ICAP headers from all ICAP messages
received, merges them (last header field wins if there are two ICAP
headers with a field that is named the same), and logs the matching ICAP
header field value.

* LFT_ADAPTATION_LAST_HEADER_ELEM is the same as the
LFT_ADAPTATION_LAST_HEADER above, but logs the matching ICAP header
field value element.

* LFT_ADAPTATION_LAST_ALL_HEADERS logs the last ICAP header received as
described in squid.conf.


In other words,

* squid.conf and %adapt::<last_h without parameters are about logging
the last ICAP header received;

* parameterized %adapt::<last_h merges all the received ICAP headers (later headers overwrite earlier ones if there are name conflicts) and
logs the last matching header field value/element.


I suspect %adapt::<last_h without parameters is generally useless in
access.log due to the volume of mostly irrelevant information. When
somebody is using %adapt::<last_h in production, they probably log
specific header field values or elements.

I know that folks use parameterized %adapt::<last_h to log various
transaction IDs and flags delivered by ICAP and eCAP services. For
example, an eCAP service may identify the detected Virus ID or an
authenticated user ID.

In the cases I am familiar with, the merging performed by parameterized
%adapt::<last_h is the desired/correct behavior because those Squid
admins care about the ID value and not about which adaptation service determined it. They cannot assume that the ID was determined by the last
service Squid talked to.

However, our documentation describes a different behavior (no merging, just use the last received ICAP header), and I would not be surprised if there are users who use parameterized %adapt::<last_h while relying on the documented behavior. These users probably have just one ICAP or eCAP
service or their last service always sets the header they log;
otherwise, they would be logging wrong values due to the current
implementation.


To resolve the above conflicts, we need to change documentation,
implementation, or both. I propose the following:

1. Add "%adapt::<merged_h" and document the currently implemented
merging algorithm where the last header wins conflicts but all headers can play. Make sure the code implements it in all cases (parameterized
and not).

2. Deprecate "%icap::<last_h" and remove support for "%adapt::<last_h". If somebody complaints, we can add "%adapt::<last_h" back, this time
with the right implementation.

Meanwhile, "%icap::<last_h" will default to "%adapt::<merged_h" for
backward compatibility. Since no released version has
"%adapt::<merged_h" code yet (they have "%icap::<last_h"), now is a good
time for this step.

No need for new names IMO.
 We have <last_h being documented as the specific "last received
headers" and <h being a broader "the headers".

Reusing <h sounds good!

IMO:
 * Change existing parameterized adapt::<last_h (merging) to be what
happens with parameterized adapt::<h
* Add new parameterized adapt::<last_h which only considers the last
ICAP headers. As per the documented semantics of <last_h.
 * Implement non-parameterized adapt::<h to output the merged ICAP
headers (somehow).

I am not sure I understand the first bullet correctly. I also think that
parameterized cases should just narrow down what is logged and not
change the semantics of what is logged, so we should not
consider/document them specially (except when reporting implementation
bugs). Does the end-result summary bellow match your recommendation?

1. adapt::<h logs all received headers merged into one. If two or more
responses containing header field(s) with the name NAME are received,
the header field(s) from the last received response containing header
field named NAME win, and all fields named NAME from the previously
received responses are discarded. Parameterized cases just narrow down
what is logged from those merged headers.

2. adapt::<last_h logs the header of the last received response.
Parameterized cases just narrow down what is logged from that response
header.

Yes. Exactly.

I went a little further and looked at how to accomplish it. My bullets were kind of a step-by-step todo list ending in that state.

What I meant was the current implementation of adapt::<{Foo}last_h already does what we actually want adapt::<{Foo}h to be doing. Just shuffling the logformat tags fixes that up as a first step. Followed by implementing the currently missing bits for each tag.


Amos

Reply via email to