On Mon, 14 Mar 2011 09:26:49 -0600, Alex Rousskov wrote:
On 03/11/2011 05:52 PM, Amos Jeffries wrote:
On 12/03/11 12:59, Alex Rousskov wrote:
Hello,
All of the following is related to the %adapt::<last_h
access.log
format code (possibly parameterized with the header or element
name):
* squid.conf says it is the "last ICAP header received",
essentially.
* LFT_ADAPTATION_LAST_HEADER takes ICAP headers from all ICAP
messages
received, merges them (last header field wins if there are two ICAP
headers with a field that is named the same), and logs the matching
ICAP
header field value.
* LFT_ADAPTATION_LAST_HEADER_ELEM is the same as the
LFT_ADAPTATION_LAST_HEADER above, but logs the matching ICAP header
field value element.
* LFT_ADAPTATION_LAST_ALL_HEADERS logs the last ICAP header
received as
described in squid.conf.
In other words,
* squid.conf and %adapt::<last_h without parameters are about
logging
the last ICAP header received;
* parameterized %adapt::<last_h merges all the received ICAP
headers
(later headers overwrite earlier ones if there are name conflicts)
and
logs the last matching header field value/element.
I suspect %adapt::<last_h without parameters is generally useless
in
access.log due to the volume of mostly irrelevant information. When
somebody is using %adapt::<last_h in production, they probably log
specific header field values or elements.
I know that folks use parameterized %adapt::<last_h to log various
transaction IDs and flags delivered by ICAP and eCAP services. For
example, an eCAP service may identify the detected Virus ID or an
authenticated user ID.
In the cases I am familiar with, the merging performed by
parameterized
%adapt::<last_h is the desired/correct behavior because those Squid
admins care about the ID value and not about which adaptation
service
determined it. They cannot assume that the ID was determined by the
last
service Squid talked to.
However, our documentation describes a different behavior (no
merging,
just use the last received ICAP header), and I would not be
surprised if
there are users who use parameterized %adapt::<last_h while relying
on
the documented behavior. These users probably have just one ICAP or
eCAP
service or their last service always sets the header they log;
otherwise, they would be logging wrong values due to the current
implementation.
To resolve the above conflicts, we need to change documentation,
implementation, or both. I propose the following:
1. Add "%adapt::<merged_h" and document the currently implemented
merging algorithm where the last header wins conflicts but all
headers
can play. Make sure the code implements it in all cases
(parameterized
and not).
2. Deprecate "%icap::<last_h" and remove support for
"%adapt::<last_h".
If somebody complaints, we can add "%adapt::<last_h" back, this
time
with the right implementation.
Meanwhile, "%icap::<last_h" will default to "%adapt::<merged_h" for
backward compatibility. Since no released version has
"%adapt::<merged_h" code yet (they have "%icap::<last_h"), now is a
good
time for this step.
No need for new names IMO.
We have <last_h being documented as the specific "last received
headers" and <h being a broader "the headers".
Reusing <h sounds good!
IMO:
* Change existing parameterized adapt::<last_h (merging) to be what
happens with parameterized adapt::<h
* Add new parameterized adapt::<last_h which only considers the
last
ICAP headers. As per the documented semantics of <last_h.
* Implement non-parameterized adapt::<h to output the merged ICAP
headers (somehow).
I am not sure I understand the first bullet correctly. I also think
that
parameterized cases should just narrow down what is logged and not
change the semantics of what is logged, so we should not
consider/document them specially (except when reporting
implementation
bugs). Does the end-result summary bellow match your recommendation?
1. adapt::<h logs all received headers merged into one. If two or
more
responses containing header field(s) with the name NAME are received,
the header field(s) from the last received response containing header
field named NAME win, and all fields named NAME from the previously
received responses are discarded. Parameterized cases just narrow
down
what is logged from those merged headers.
2. adapt::<last_h logs the header of the last received response.
Parameterized cases just narrow down what is logged from that
response
header.
Yes. Exactly.
I went a little further and looked at how to accomplish it. My bullets
were kind of a step-by-step todo list ending in that state.
What I meant was the current implementation of adapt::<{Foo}last_h
already does what we actually want adapt::<{Foo}h to be doing.
Just shuffling the logformat tags fixes that up as a first step.
Followed by implementing the currently missing bits for each tag.
Amos