Re: [RFC] connections logging

2014-06-24 Thread Amos Jeffries
On 24/06/2014 12:26 p.m., Alex Rousskov wrote:
 On 06/20/2014 03:04 AM, Amos Jeffries wrote:
 I am playing with the idea of adding a new log file to record just the
 connections handled by Squid (rather than the HTTP request/reply
 transactions).

 This log would include connections opened but never used, port details,
 connection lifetimes, re-use counts on persistent connections, SSL/TLS
 presence/version (or not), and other connection details we find a need
 for later.
 
 
 The driver behind this is to help resolve a growing amount of user
 queries regarding happy eyeballs idle connections and broken TLS
 connections. We are also adding other potentially never-used connections
 ourselves with the standby pools.
 
 A couple of days ago, I came across another use case for logging
 connections unrelated to Happy Eyeballs. It sounds like this is going to
 be a generally useful feature. My use case is still being clarified, but
 here are the already known non-obvious requirements:
 
 1. Logging when the connection is received (unlike transactions that are
 logged when the transaction is completed).
 
 2. Logging client-to-Squid connections.
 
 3. Introduction of some kind of unique connection ID that can be
 associated with HTTP/etc transactions on that connection, correlating
 access.log and connection.log entries. [ uniqueness scope is being
 clarified, but I hope it would just be a single worker lifetime ]
 

For IDs I was thinking of just emitting the MasterXaction::id value sent
in by TcpAcceptor. Assumign this is defined as the TCP transaction (see
below).

 
 Let's coordinate our efforts!
 

Sure.

 
 The Squid native access.log appears unsuitable for adjustment given its
 widely used format and the number of details to be logged.
 
 Agreed. We should support a similarly configurable log format and other
 existing logging knobs but use a different, dedicated log (or logs if we
 want to record from-Squid connections and to-Squid connections
 differently). Reusable logformat codes such as %la and %lp should be
 reused, but new ones will probably be needed as well.
 

Ack.

 
 There is also a useful side effect of MasterXaction. At present the
 TcpAcceptor generates a MasterXaction object to record initial
 connection details for the handler, this creates a bit of complexity in
 the recipient regarding whether the MasterXactino object applies to a
 future HTTP request or one which is currently being parsed. This could
 be simplified if the connection was a transaction and the HTTP requests
 on it each a different transaction spawned from the connection
 MasterXaction when parsing a new requst.
 
 I am not sure what you mean by if the connection was a transaction.

I mean defining the TCP actions of opening a socket/connnection, its
entire lifetime, and close() as being one transaction. With the
TcpAcceptor produced MasterXaction representing that cycle.

HTTP requests arriving on the connection being separate transactions
spawned by the parsing actions. (unfortunately the read(2) does not
always nicely match a parse event identifying the exact HTTP request start).

So we have two new transaction types there. TCP transaction and HTTP
transaction, to go alongside ICAP and eCAP transaction etc.


 The MasterXaction is usually associated with at least one connection
 (via the small t transaction using that connection), and one
 connection is usually associated with at least one MasterTransaction,
 but transactions and connections are completely different concepts at
 Squid level IMO. I doubt we should try to merge the concept of a
 transaction and a connection somehow.

Generating the HTTP transaction MasterXaction object (child of
MasterXaction?) by cloning it from the TCP connection one represents
that relationship. The clone could be seeded with the parents id value
to persist that info for later use (eg your start of HTTP transaction
log entry).

 
 MasterTransaction is created at connection acceptance time because that
 is when the first HTTP/etc. transaction starts (parsing headers is not
 the first step) and, hence, that is when the master transaction starts.

Hmm. We have some issue here satisfying the people who want HTTP
transation only to represent the time of request+reply versus browser
happy eyeballs (and now Squid) opening standby connections with long
durations before first byte. That is one reason supporting separate
transaction MasterXaction for TCP and HTTP.

 
 We could make the connection acceptance event itself a transaction (an
 object of some new AcceptXaction class inherited from a generic
 Transaction class and associated with the corresponding Master
 Transaction object just like HTTP and ICAP transactions should be). I do
 not yet see what that would actually buy us except more classes to worry
 about, but, at the very least, it would not be conceptually wrong.
 

I agree accept by itself is probably useless. The longer TCP lifecycle
on the other hand (see above) has some use for 

Re: [RFC] connections logging

2014-06-24 Thread Amos Jeffries
On 24/06/2014 4:41 p.m., Kinkie wrote:
 On Tue, Jun 24, 2014 at 2:26 AM, Alex Rousskov
 rouss...@measurement-factory.com wrote:
 On 06/20/2014 03:04 AM, Amos Jeffries wrote:
 I am playing with the idea of adding a new log file to record just the
 connections handled by Squid (rather than the HTTP request/reply
 transactions).

 This log would include connections opened but never used, port details,
 connection lifetimes, re-use counts on persistent connections, SSL/TLS
 presence/version (or not), and other connection details we find a need
 for later.


 The driver behind this is to help resolve a growing amount of user
 queries regarding happy eyeballs idle connections and broken TLS
 connections. We are also adding other potentially never-used connections
 ourselves with the standby pools.

 A couple of days ago, I came across another use case for logging
 connections unrelated to Happy Eyeballs. It sounds like this is going to
 be a generally useful feature. My use case is still being clarified, but
 here are the already known non-obvious requirements:
 
 I have an use case, as well: timing logging.
 It would be useful to have a chance to log the timing of certain key
 moments in a request's processing path. Things like accept, end of
 slow acl matching, end of dns resolution(s), connected to uplink, and
 so on. This could help administrators identify congestion issues in
 their infrastructure.
 
Kinkie
 


And that use case gives us a good potential name for it. event.log ?

Amos


Re: [RFC] connections logging

2014-06-24 Thread Alex Rousskov
On 06/24/2014 12:55 AM, Amos Jeffries wrote:
 On 24/06/2014 4:41 p.m., Kinkie wrote:
 On Tue, Jun 24, 2014 at 2:26 AM, Alex Rousskov wrote:
 On 06/20/2014 03:04 AM, Amos Jeffries wrote:
 The driver behind this is to help resolve a growing amount of user
 queries regarding happy eyeballs idle connections and broken TLS
 connections. We are also adding other potentially never-used connections
 ourselves with the standby pools.

 A couple of days ago, I came across another use case for logging
 connections unrelated to Happy Eyeballs. It sounds like this is going to
 be a generally useful feature. My use case is still being clarified, but
 here are the already known non-obvious requirements:

 I have an use case, as well: timing logging.
 It would be useful to have a chance to log the timing of certain key
 moments in a request's processing path. Things like accept, end of
 slow acl matching, end of dns resolution(s), connected to uplink, and
 so on. This could help administrators identify congestion issues in
 their infrastructure.


 And that use case gives us a good potential name for it. event.log ?

I was also going to say that if we want to log more than just
connection-specific events, then we should expand the proposal to
accommodate arbitrary events. Needless to say, the initial
implementation may only support one or two event kinds, but the design
should allow for other future event kinds.

Other known use cases include logging initial HIT or MISS detection: I
had to patch Squid recently to enable that event logging (for legacy
reasons the admin could not use ToS marks).


This kind of circles back to the debugging discussion we are avoiding
on another thread -- in both cases, we need an internal API to signal
useful events and the corresponding configuration knobs to control their
logging. The line between developer-useful (i.e., debugging) events
and admin-useful events is blurry, but we do not have to define it if
we just focus on logging admin-useful stuff. I would not be surprised if
this eventually morphs into a better debugging interface as well.


Cheers,

Alex.



Re: [RFC] connections logging

2014-06-24 Thread Alex Rousskov
On 06/24/2014 12:33 AM, Amos Jeffries wrote:
 On 24/06/2014 12:26 p.m., Alex Rousskov wrote:
 On 06/20/2014 03:04 AM, Amos Jeffries wrote:
 I am playing with the idea of adding a new log file to record just the
 connections handled by Squid (rather than the HTTP request/reply
 transactions).

 This log would include connections opened but never used, port details,
 connection lifetimes, re-use counts on persistent connections, SSL/TLS
 presence/version (or not), and other connection details we find a need
 for later.


 The driver behind this is to help resolve a growing amount of user
 queries regarding happy eyeballs idle connections and broken TLS
 connections. We are also adding other potentially never-used connections
 ourselves with the standby pools.

 A couple of days ago, I came across another use case for logging
 connections unrelated to Happy Eyeballs. It sounds like this is going to
 be a generally useful feature. My use case is still being clarified, but
 here are the already known non-obvious requirements:

 1. Logging when the connection is received (unlike transactions that are
 logged when the transaction is completed).

 2. Logging client-to-Squid connections.

 3. Introduction of some kind of unique connection ID that can be
 associated with HTTP/etc transactions on that connection, correlating
 access.log and connection.log entries. [ uniqueness scope is being
 clarified, but I hope it would just be a single worker lifetime ]

 
 For IDs I was thinking of just emitting the MasterXaction::id value sent
 in by TcpAcceptor. Assumign this is defined as the TCP transaction (see
 below).

Yes, MasterXaction ID can be copied to be used as a connection ID, but
there is no good reason for such a reuse IMO. We need a Connection ID.
IMO, it would be better to just add a Connection-dedicated InstanceId
(especially if it satisfies the yet-unknown uniqueness scope requirements).

Yes, an InstanceId-based ID would work for the worker lifetime
uniqueness scope, and I hope that is all we need to support.


 I am not sure what you mean by if the connection was a transaction.
 
 I mean defining the TCP actions of opening a socket/connnection, its
 entire lifetime, and close() as being one transaction. With the
 TcpAcceptor produced MasterXaction representing that cycle.

I strongly disagree that we should change the MasterXaction definition
like that. We already have a Connection class to represent what you
want, I think. Why drag Master Transaction into this, abandoning its
current role of aggregating information about transactions caused by a
single HTTP request (or equivalent)?

HTTP transactions need a Master Transaction to correlate an HTTP request
received from a client with the corresponding HTTP request(s) sent to
the server, with the corresponding HTTP response(s) received from the
server, with the corresponding ICAP messages, DNS activity, etc. A
Master Transaction usually relates to several Connections as well, but
Connection is not a Master Transaction.


 So we have two new transaction types there. TCP transaction and HTTP
 transaction, to go alongside ICAP and eCAP transaction etc.

It is a bad idea to use the same terminology and class hierarchy for
low-level transport concepts such as TCP and for high level HTTP
transactions executed on top of a transport connection.

We should just use Connection instead of TCP transaction. If we need
TcpConnection, we can add that. The Connection hierarchy is separate
from the Xaction hierarchy, but Xactions may use Connections, of course.


 The MasterXaction is usually associated with at least one connection
 (via the small t transaction using that connection), and one
 connection is usually associated with at least one MasterTransaction,
 but transactions and connections are completely different concepts at
 Squid level IMO. I doubt we should try to merge the concept of a
 transaction and a connection somehow.

 Generating the HTTP transaction MasterXaction object (child of
 MasterXaction?) by cloning it from the TCP connection one represents
 that relationship. The clone could be seeded with the parents id value
 to persist that info for later use (eg your start of HTTP transaction
 log entry).

I am afraid we have a completely different view of what MasterXaction is
and what protocol transactions are (or should be). It is possible that I
am completely misinterpreting your statement about cloning and seeding,
but if you are calling for creation of some abstract MasterXaction class
that will be a parent for various specific FooMasterXaction classes such
as HttpMasterXaction and TcpMasterXaction, then I see no need for (and a
lot of harm from) that level of complexity.

I do not know how to avoid another prolonged discussion here, but
perhaps we can start with definitions of the proposed classes (instead
of discussing cloning and seeding which are just low-level details)? The
classes I am talking are, roughly:

* MasterXaction: [Historical/aggregated 

Re: [RFC] connections logging

2014-06-23 Thread Alex Rousskov
On 06/20/2014 03:04 AM, Amos Jeffries wrote:
 I am playing with the idea of adding a new log file to record just the
 connections handled by Squid (rather than the HTTP request/reply
 transactions).
 
 This log would include connections opened but never used, port details,
 connection lifetimes, re-use counts on persistent connections, SSL/TLS
 presence/version (or not), and other connection details we find a need
 for later.


 The driver behind this is to help resolve a growing amount of user
 queries regarding happy eyeballs idle connections and broken TLS
 connections. We are also adding other potentially never-used connections
 ourselves with the standby pools.

A couple of days ago, I came across another use case for logging
connections unrelated to Happy Eyeballs. It sounds like this is going to
be a generally useful feature. My use case is still being clarified, but
here are the already known non-obvious requirements:

1. Logging when the connection is received (unlike transactions that are
logged when the transaction is completed).

2. Logging client-to-Squid connections.

3. Introduction of some kind of unique connection ID that can be
associated with HTTP/etc transactions on that connection, correlating
access.log and connection.log entries. [ uniqueness scope is being
clarified, but I hope it would just be a single worker lifetime ]


Let's coordinate our efforts!


 The Squid native access.log appears unsuitable for adjustment given its
 widely used format and the number of details to be logged.

Agreed. We should support a similarly configurable log format and other
existing logging knobs but use a different, dedicated log (or logs if we
want to record from-Squid connections and to-Squid connections
differently). Reusable logformat codes such as %la and %lp should be
reused, but new ones will probably be needed as well.


 There is also a useful side effect of MasterXaction. At present the
 TcpAcceptor generates a MasterXaction object to record initial
 connection details for the handler, this creates a bit of complexity in
 the recipient regarding whether the MasterXactino object applies to a
 future HTTP request or one which is currently being parsed. This could
 be simplified if the connection was a transaction and the HTTP requests
 on it each a different transaction spawned from the connection
 MasterXaction when parsing a new requst.

I am not sure what you mean by if the connection was a transaction.
The MasterXaction is usually associated with at least one connection
(via the small t transaction using that connection), and one
connection is usually associated with at least one MasterTransaction,
but transactions and connections are completely different concepts at
Squid level IMO. I doubt we should try to merge the concept of a
transaction and a connection somehow.

MasterTransaction is created at connection acceptance time because that
is when the first HTTP/etc. transaction starts (parsing headers is not
the first step) and, hence, that is when the master transaction starts.

We could make the connection acceptance event itself a transaction (an
object of some new AcceptXaction class inherited from a generic
Transaction class and associated with the corresponding Master
Transaction object just like HTTP and ICAP transactions should be). I do
not yet see what that would actually buy us except more classes to worry
about, but, at the very least, it would not be conceptually wrong.


HTH,

Alex.



Re: [RFC] connections logging

2014-06-23 Thread Kinkie
On Tue, Jun 24, 2014 at 2:26 AM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 06/20/2014 03:04 AM, Amos Jeffries wrote:
 I am playing with the idea of adding a new log file to record just the
 connections handled by Squid (rather than the HTTP request/reply
 transactions).

 This log would include connections opened but never used, port details,
 connection lifetimes, re-use counts on persistent connections, SSL/TLS
 presence/version (or not), and other connection details we find a need
 for later.


 The driver behind this is to help resolve a growing amount of user
 queries regarding happy eyeballs idle connections and broken TLS
 connections. We are also adding other potentially never-used connections
 ourselves with the standby pools.

 A couple of days ago, I came across another use case for logging
 connections unrelated to Happy Eyeballs. It sounds like this is going to
 be a generally useful feature. My use case is still being clarified, but
 here are the already known non-obvious requirements:

I have an use case, as well: timing logging.
It would be useful to have a chance to log the timing of certain key
moments in a request's processing path. Things like accept, end of
slow acl matching, end of dns resolution(s), connected to uplink, and
so on. This could help administrators identify congestion issues in
their infrastructure.

   Kinkie