Re: Associating accesses with cache.log entries

2009-12-17 Thread Mark Nottingham
I've put a basic proof-of-concept into a bug; see 
http://bugs.squid-cache.org/show_bug.cgi?id=2835.

It only logs client FDs, but gives output like this (with debug_options ALL,2):

2009/12/17 22:08:24 client_fd=13 ctx: enter level  0: 'client_fd=13'
2009/12/17 22:08:24 client_fd=13 Parser: retval 1: from 0-47: method 0-2; url 
4-36; version 38-46 (1/1)
2009/12/17 22:08:24 client_fd=13 The request GET http://www.mnot.net/test/ is 
ALLOWED, because it matched 'localhost'
2009/12/17 22:08:24 client_fd=13 clientCacheHit: refreshCheckHTTPStale returned 
1
2009/12/17 22:08:24 client_fd=13 peerSourceHashSelectParent: Calculating hash 
for 127.0.0.1
2009/12/17 22:08:28 client_fd=17 ctx: exit level  0
2009/12/17 22:08:28 client_fd=17 ctx: enter level  0: 'client_fd=17'
2009/12/17 22:08:28 client_fd=17 Parser: retval 1: from 0-35: method 0-2; url 
4-24; version 26-34 (1/1)
2009/12/17 22:08:28 client_fd=17 The request GET http://www.apple.com/ is 
ALLOWED, because it matched 'localhost'
2009/12/17 22:08:28 client_fd=17 clientCacheHit: refreshCheckHTTPStale returned 0
2009/12/17 22:08:28 client_fd=17 clientCacheHit: HIT
2009/12/17 22:08:28 client_fd=17 The reply for GET http://www.apple.com/ is 
ALLOWED, because it matched 'all'
2009/12/17 22:08:29  ctx: exit level  0
2009/12/17 22:08:29  The reply for GET http://www.mnot.net/test/slow.cgi is 
ALLOWED, because it matched 'all'

Feedback / sanity checking appreciated.



On 25/02/2009, at 6:40 PM, Henrik Nordstrom wrote:

 ons 2009-02-25 klockan 12:10 +1100 skrev Mark Nottingham:
 
 What am I missing? The most straightforward way that I can see to do  
 this is to add an identifier to clientHttpRequest and pass that to  
 debug where available...
 
 That is what ctx_enter is about... There is not a single location where
 ctx_enter needs to be called, there is many..
 
 Remember that Squid is a big bunch of event driven state machines, doing
 a little bit of processing at a time interleaved with many other
 unrelated things. ctx_enter indicates which state transition is
 currently being processed, ctx_leave when that state transition has
 completed waiting for next event (even if still at the same state..)
 
 So you need ctx_enter in quite many places, providing a reasonable trace
 of the processing within the state machine so far, based on whatever
 identifier the current small step is about. Each time the processing
 returns to the comm loop you are back at ctx level 0 with no context.
 Sometimes the ctx level may be quite high, having many loosely related
 state transitions in the trace, sometimes even almost completely
 unrelated requests.
 
 Most of the time the state machine starts with something directly
 related to a specific request (read/write on http sockets) however, but
 there is also many other kinds of state transitions like DNS, timers
 etc.
 
 Regards
 Henrik
 

--
Mark Nottingham   m...@yahoo-inc.com




Re: Associating accesses with cache.log entries

2009-12-17 Thread Mark Nottingham
I've made some progress on this, now adding a request identifier and putting 
that in the context (see updated patch in bug).

I'm willing to see this through, but I'd like a bit of feedback first -- 
especially around whether this (i.e., sprinkling ctx_* calls throughout the 
codebase wherever we have a callback) is the right approach, first.

Cheers,



On 17/12/2009, at 10:24 PM, Mark Nottingham wrote:

 I've put a basic proof-of-concept into a bug; see 
 http://bugs.squid-cache.org/show_bug.cgi?id=2835.
 
 It only logs client FDs, but gives output like this (with debug_options 
 ALL,2):
 
 2009/12/17 22:08:24 client_fd=13 ctx: enter level  0: 'client_fd=13'
 2009/12/17 22:08:24 client_fd=13 Parser: retval 1: from 0-47: method 0-2; 
 url 4-36; version 38-46 (1/1)
 2009/12/17 22:08:24 client_fd=13 The request GET http://www.mnot.net/test/ is 
 ALLOWED, because it matched 'localhost'
 2009/12/17 22:08:24 client_fd=13 clientCacheHit: refreshCheckHTTPStale 
 returned 1
 2009/12/17 22:08:24 client_fd=13 peerSourceHashSelectParent: Calculating hash 
 for 127.0.0.1
 2009/12/17 22:08:28 client_fd=17 ctx: exit level  0
 2009/12/17 22:08:28 client_fd=17 ctx: enter level  0: 'client_fd=17'
 2009/12/17 22:08:28 client_fd=17 Parser: retval 1: from 0-35: method 0-2; 
 url 4-24; version 26-34 (1/1)
 2009/12/17 22:08:28 client_fd=17 The request GET http://www.apple.com/ is 
 ALLOWED, because it matched 'localhost'
 2009/12/17 22:08:28 client_fd=17 clientCacheHit: refreshCheckHTTPStale 
 returned 0
 2009/12/17 22:08:28 client_fd=17 clientCacheHit: HIT
 2009/12/17 22:08:28 client_fd=17 The reply for GET http://www.apple.com/ is 
 ALLOWED, because it matched 'all'
 2009/12/17 22:08:29  ctx: exit level  0
 2009/12/17 22:08:29  The reply for GET http://www.mnot.net/test/slow.cgi is 
 ALLOWED, because it matched 'all'
 
 Feedback / sanity checking appreciated.
 
 
 
 On 25/02/2009, at 6:40 PM, Henrik Nordstrom wrote:
 
 ons 2009-02-25 klockan 12:10 +1100 skrev Mark Nottingham:
 
 What am I missing? The most straightforward way that I can see to do  
 this is to add an identifier to clientHttpRequest and pass that to  
 debug where available...
 
 That is what ctx_enter is about... There is not a single location where
 ctx_enter needs to be called, there is many..
 
 Remember that Squid is a big bunch of event driven state machines, doing
 a little bit of processing at a time interleaved with many other
 unrelated things. ctx_enter indicates which state transition is
 currently being processed, ctx_leave when that state transition has
 completed waiting for next event (even if still at the same state..)
 
 So you need ctx_enter in quite many places, providing a reasonable trace
 of the processing within the state machine so far, based on whatever
 identifier the current small step is about. Each time the processing
 returns to the comm loop you are back at ctx level 0 with no context.
 Sometimes the ctx level may be quite high, having many loosely related
 state transitions in the trace, sometimes even almost completely
 unrelated requests.
 
 Most of the time the state machine starts with something directly
 related to a specific request (read/write on http sockets) however, but
 there is also many other kinds of state transitions like DNS, timers
 etc.
 
 Regards
 Henrik
 
 
 --
 Mark Nottingham   m...@yahoo-inc.com
 
 

--
Mark Nottingham   m...@yahoo-inc.com




Re: Associating accesses with cache.log entries

2009-02-24 Thread Mark Nottingham

Not sure I follow.

ctx_enter is called relatively late, in httpProcessReplyHeader and  
destroy_MemObject; I'd think for this case we need to have a request  
id associated quite early, probably around parseHttpRequest.


Also, AFAICT Ctx_Descrs and Ctx_Current_Level isn't really a reliable  
way to identify what the context of a particular debug statement is.


What am I missing? The most straightforward way that I can see to do  
this is to add an identifier to clientHttpRequest and pass that to  
debug where available...



On 28/11/2008, at 12:02 PM, Henrik Nordstrom wrote:


fre 2008-11-28 klockan 10:34 +1100 skrev Mark Nottingham:


Agreed. How would you pass it into debug()? It looks like _db_print
already takes variable length args,


By adding it to the context already used by _db_print.. i.e. by
extending ctx_enter to take the sequence number in addition to url.

Regards
Henrik



--
Mark Nottingham   m...@yahoo-inc.com




Re: Associating accesses with cache.log entries

2009-02-24 Thread Henrik Nordstrom
ons 2009-02-25 klockan 12:10 +1100 skrev Mark Nottingham:

 What am I missing? The most straightforward way that I can see to do  
 this is to add an identifier to clientHttpRequest and pass that to  
 debug where available...

That is what ctx_enter is about... There is not a single location where
ctx_enter needs to be called, there is many..

Remember that Squid is a big bunch of event driven state machines, doing
a little bit of processing at a time interleaved with many other
unrelated things. ctx_enter indicates which state transition is
currently being processed, ctx_leave when that state transition has
completed waiting for next event (even if still at the same state..)

So you need ctx_enter in quite many places, providing a reasonable trace
of the processing within the state machine so far, based on whatever
identifier the current small step is about. Each time the processing
returns to the comm loop you are back at ctx level 0 with no context.
Sometimes the ctx level may be quite high, having many loosely related
state transitions in the trace, sometimes even almost completely
unrelated requests.

Most of the time the state machine starts with something directly
related to a specific request (read/write on http sockets) however, but
there is also many other kinds of state transitions like DNS, timers
etc.

Regards
Henrik



Re: Associating accesses with cache.log entries

2008-12-11 Thread Alex Rousskov
On Thu, 2008-11-27 at 14:21 +1100, Mark Nottingham wrote:
 I've been playing around with associating specific requests with the  
 debug output they generate

snip

 * How do people feel about putting this in cache.log all the time? I  
 don't think it'll break any scripts (there aren't many, and those that  
 are tend to grep for specific phrases, rather than do an actual parse,  
 AFAICT). Is the placement above appropriate?

Identifying HTTP transactions is a very good thing, IMO. Squid3 uses a
similar approach for async jobs and adaptation transactions. They all
have unique IDs (within a single Squid execution).

I have a script that can extract most messages related to a given job or
transaction from a level-9 cache.log. The effect is similar to follow
this TCP stream in Wireshark. Very handy for debugging!

More work is needed though to give unique IDs to HTTP transactions, add
them to headers, and to improve the script to detect relationships
between identified transactions (e.g., show me all jobs that touched
xact149674).

 * To make this really useful, it would be necessary to be able to  
 trigger debug_options (or just all debugging) based upon an ACL match.  
 However, this looks like it would require changing how debug is  
 #defined. Any comments on this?

Yes, this would be useful when, for example, debugging a known URL on a
busy cache where ALL,9 is not practical. I would try to do this by
asking the current async job to check the debugging ACL when Squid
enters the job. The debugging will stop when Squid leaves the job.

Thank you,

Alex.




Re: Associating accesses with cache.log entries

2008-12-11 Thread Amos Jeffries

Alex Rousskov wrote:

On Thu, 2008-11-27 at 14:21 +1100, Mark Nottingham wrote:
I've been playing around with associating specific requests with the  
debug output they generate


snip

* How do people feel about putting this in cache.log all the time? I  
don't think it'll break any scripts (there aren't many, and those that  
are tend to grep for specific phrases, rather than do an actual parse,  
AFAICT). Is the placement above appropriate?


Identifying HTTP transactions is a very good thing, IMO. Squid3 uses a
similar approach for async jobs and adaptation transactions. They all
have unique IDs (within a single Squid execution).

I have a script that can extract most messages related to a given job or
transaction from a level-9 cache.log. The effect is similar to follow
this TCP stream in Wireshark. Very handy for debugging!


OOoh.. can you dump that in the scripts directory for the rest of us?



More work is needed though to give unique IDs to HTTP transactions, add
them to headers, and to improve the script to detect relationships
between identified transactions (e.g., show me all jobs that touched
xact149674).

* To make this really useful, it would be necessary to be able to  
trigger debug_options (or just all debugging) based upon an ACL match.  
However, this looks like it would require changing how debug is  
#defined. Any comments on this?


Yes, this would be useful when, for example, debugging a known URL on a
busy cache where ALL,9 is not practical. I would try to do this by
asking the current async job to check the debugging ACL when Squid
enters the job. The debugging will stop when Squid leaves the job.

Thank you,

Alex.




Amos
--
Please be using
  Current Stable Squid 2.7.STABLE5 or 3.0.STABLE10
  Current Beta Squid 3.1.0.3 or 3.0.STABLE11-RC1


Re: Associating accesses with cache.log entries

2008-12-11 Thread Alex Rousskov
On Fri, 2008-12-12 at 10:23 +1300, Amos Jeffries wrote:

 OOoh.. can you dump that in the scripts directory for the rest of us?

Sure, I have been planning to polish and commit for a long time. Please
bug me if I do not do it soon.

Thank you,

Alex.




Re: Associating accesses with cache.log entries

2008-11-27 Thread Henrik Nordstrom
On tor, 2008-11-27 at 14:21 +1100, Mark Nottingham wrote:
 I've been playing around with associating specific requests with the  
 debug output they generate, with a simple patch to _db_print along  
 these lines:
 
  if (Config.Log.accesslogs  Config.Log.accesslogs-logfile) {
seqnum = LOGFILE_SEQNO(Config.Log.accesslogs-logfile);
  }
  snprintf(f, BUFSIZ, %s %i| %s,
   debugLogTime(squid_curtime),
   seqnum,
   format);

Hmm.. I don't really see how this implementation is different from a
detailed timestamp.

a) It's not associated with the current request, just the last request
logged (or perhaps the next).

b) There is no direct correlation between cache.log and access.log
timing. Requests may well have notices in cache.log long before when
they are logged.

 * Is it reasonable to hardcode this to associate the numbers with the  
 first configured access_log?

I would argue that there needs to be a request sequence number
associated early in the request processing, not related to access.log.
Available for logging in access.log but not derived from there.

 * To make this really useful, it would be necessary to be able to  
 trigger debug_options (or just all debugging) based upon an ACL match.  
 However, this looks like it would require changing how debug is  
 #defined. Any comments on this?

Time to add some lua hooks? (or similar)

Regards
Henrik


signature.asc
Description: This is a digitally signed message part


Re: Associating accesses with cache.log entries

2008-11-27 Thread Henrik Nordstrom
fre 2008-11-28 klockan 10:34 +1100 skrev Mark Nottingham:

 Agreed. How would you pass it into debug()? It looks like _db_print  
 already takes variable length args,

By adding it to the context already used by _db_print.. i.e. by
extending ctx_enter to take the sequence number in addition to url.

Regards
Henrik



Associating accesses with cache.log entries

2008-11-26 Thread Mark Nottingham
I've been playing around with associating specific requests with the  
debug output they generate, with a simple patch to _db_print along  
these lines:


if (Config.Log.accesslogs  Config.Log.accesslogs-logfile) {
  seqnum = LOGFILE_SEQNO(Config.Log.accesslogs-logfile);
}
snprintf(f, BUFSIZ, %s %i| %s,
debugLogTime(squid_curtime),
seqnum,
format);

This leverages the sequence number that's available in custom access  
logs (%sn).


It's really useful for debugging requests that are causing problems,  
etc; rather than having to correlate times and URLs, you can just  
correlate sequence numbers. It also makes it possible to automate  
debug output (which is the direction I want to take this in).


beyond the obvious cleanup that needs to happen (e.g., outputting '-'  
or blank instead of 0 if there isn't an access log line associated, a  
few questions;


* How do people feel about putting this in cache.log all the time? I  
don't think it'll break any scripts (there aren't many, and those that  
are tend to grep for specific phrases, rather than do an actual parse,  
AFAICT). Is the placement above appropriate?


* The sequence number mechanism doesn't guarantee uniqueness in the  
log file; if squid is started between rotates, it will reset the  
counters. Has fixing this been discussed?


* Is it reasonable to hardcode this to associate the numbers with the  
first configured access_log?


* To make this really useful, it would be necessary to be able to  
trigger debug_options (or just all debugging) based upon an ACL match.  
However, this looks like it would require changing how debug is  
#defined. Any comments on this?


Cheers,

--
Mark Nottingham   [EMAIL PROTECTED]




Re: Associating accesses with cache.log entries

2008-11-26 Thread Kinkie
On Thu, Nov 27, 2008 at 4:21 AM, Mark Nottingham [EMAIL PROTECTED] wrote:
 I've been playing around with associating specific requests with the debug
 output they generate, with a simple patch to _db_print along these lines:

if (Config.Log.accesslogs  Config.Log.accesslogs-logfile) {
  seqnum = LOGFILE_SEQNO(Config.Log.accesslogs-logfile);
}
snprintf(f, BUFSIZ, %s %i| %s,
debugLogTime(squid_curtime),
seqnum,
format);

 This leverages the sequence number that's available in custom access logs
 (%sn).

 It's really useful for debugging requests that are causing problems, etc;
 rather than having to correlate times and URLs, you can just correlate
 sequence numbers. It also makes it possible to automate debug output (which
 is the direction I want to take this in).

Looks interesting to me.

 beyond the obvious cleanup that needs to happen (e.g., outputting '-' or
 blank instead of 0 if there isn't an access log line associated, a few
 questions;

 * How do people feel about putting this in cache.log all the time? I don't
 think it'll break any scripts (there aren't many, and those that are tend to
 grep for specific phrases, rather than do an actual parse, AFAICT). Is the
 placement above appropriate?

I'd avoid the | character, but apart from that it makes sense to me

 * The sequence number mechanism doesn't guarantee uniqueness in the log
 file; if squid is started between rotates, it will reset the counters. Has
 fixing this been discussed?

I don't think that uniqueness has much value, correlating seqnum with
the timestamp will address any uncertain cases.

 * Is it reasonable to hardcode this to associate the numbers with the first
 configured access_log?

 * To make this really useful, it would be necessary to be able to trigger
 debug_options (or just all debugging) based upon an ACL match. However, this
 looks like it would require changing how debug is #defined. Any comments on
 this?

YES! It's something I've been thinking about for some time.
Count me in.

-- 
/kinkie