For starters. This is probably 3.3. We can continue to hack our way around data passing limitations in 3.2. Although with Alex emphasis on optimization going into 3.2, some of this may help that.

THE RANT:

The more I've been looking at the client-side the more I see duplicated copies and re-copies of transaction details. I know you all agree there is too much of it.

What I've seen myself...

ACLChecklist - storing a copy of as many currently known transaction details as people have asked to be checked so far.

  HttpRequest - storing copy of *almost* all transaction details.

  ClientHttpRequest - copy of the transaction details needed by client-side

ConnSateData - copy of the transaction TCP details and some HttpRequest details needed by pinning, 1xx and other weird state handling.

 AccessLogEntry - copy of all most known transaction details.

... and by "copy" I mean complete duplicate. xstrdup() galore etc.
With a bunch of code doing nothing but checking the local copy is up to date with whatever other copy or function parameter it sourced the detail from. A bunch of other code *assuming* that its getting the right details (sometimes wrongly).

* Have not yet got a good look at the reply handling path in detail yet. Overall it seems to be using the request-path objects in reverse. So no worse or better.



IDEAS:

Note that ClientHttpRequest has a member copy of AccessLogEntry. This is *already* available and unique on a per-request basis from the very start of the HTTP request arrival and parsing. Persists across the whole transaction lifetime and is used for logging at the end.

I propose that the first thing we do is clean up its internal structure design. To make sure it has all the fields we will need in the net step.

I propose then to rename as a general-purpose transaction storage area (TransactionDetails?). To avoid people ignoring it as a "logging-only" thing.

I propose then to roll each step/object along the transaction pathway to using it as their primary storage area for transaction details and history.

- incremental so can be done in the background for low impact starting immediately.
 - will soon lead to removal of several useless copies.
- will mean component/Jobs updated are guaranteed to have *all* details for the current state of the transaction available should they need it.

NOTE: little fine-detail processing pathways like ident will only need a selected refcount/cbdata/locked sub-child of the whole slab object. This is fine and will help drop dependencies. Thus the proposed modular hierarchy structure below.

To kick-start things this is what I've been thinking we need its structure to look like:

class TransactionDetails {

 class TimeDetails {
   // all the timing and wait stats we can dream up.
   // for the transaction as a whole.
   // specific times stay in their own component.
 } time;

 // Details about the TCP links used by this transaction.
 class TcpDetails {
   struct { FD, ip, port, eui, ident } client;
   struct s_ { FP, ip, port, eui, ident } server;
   vector<s_> serverHistory;
   // NP: not sure if we want a server-side history
   // if so it would go here listing all outbound attempts.
 } tcp;

 class SquidInternalDetails {
    // which worker/disker served this request?
    ext_acl; // details from external ACL tested
    auth; // details from proxy-auth helpers
    status; // status flags hit/miss/peer/aborted/timeout etc
    hier; // heirarchy details, HierarchyLogEntry
 } squid;

 // Details about the ICP used by this transaction.
 class IcpDetails {
    icp_opcode opcode;
 }

 // Details about the HTCP used by this transaction.
 class HtcpDetails {
    htcp_opcode? opcode;
 }

 // Details about the HTTP used by this transaction.
 class HttpDetails {
   // currently to be used request/reply.
   // points to the later specific objects
   HttpRequestPointer request;
   HttpReplyPointer reply;

   // specific state objects
   HttpRequestPointer original_request; // original received
   HttpRequestPointer adapted_request; // after adaptation

   HttpRequestPointer original_reply; // original received
   HttpRequestPointer adapted_reply; // after adaptation
   // NP: original reply may be nil if non-HTTP source.
   //     in which case...
   HttpRequestPointer generated_reply; // pre-adaptation.
 } http;

 // Details about the adaptation used by this transaction.
 class AdaptationDetails {
    { ...} icap; // icap state and history, pretty much as-is
    {...} ecap; // ecap state if we find anythig to log.
 } adapt;

 // Details about the FTP used by this transaction.
 class FtpDetails {
   vector<String> protoLog; // FTP msgs used in this fetch.
 }

 // ... other entries similar to FTP for gopher, wais, etc.
}

NOTE that "headers", "private" and "cache" are gone.
 - "headers" blobs are part of HttpRequest (or should be)
 - "private" is duplicate of HttpRequest details
- "cache" is split into whichever component is actually relevant for the particular field.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.12
  Beta testers wanted for 3.2.0.7 and 3.1.12.1

Reply via email to