Re: [PATCH] Support client connection annotation by helpers via clt_conn_id=ID

2014-06-23 Thread Amos Jeffries
On 23/06/2014 5:43 a.m., Tsantilas Christos wrote:
 Hi all,
 
 The attached patch replaces existin annotation values with the new one
 received from helpers.
 
 Just one question. We are documenting key-value pairs in cf.data.pre
 only for url-rewriter helpers, but annotations works for all squid helpers.
 Should we move the related url-rewriter section to a global section? If
 yes where?
 
 For example something like the following in a global section should be
 enough:
 
 The interface for all helpers has been extended to support arbitrary
 lists of key=value pairs, with the syntax  key=value . Some keys have
 special meaning to Squid, as documented here.
 
 Currently Squid understands the following optional key=value pairs
 received from URL rewriters:
  clt_conn_id=ID
 Associates the received ID with the client TCP connection.
 The clt_conn_id=ID pair is treated as a regular annotation but
 it persists across all transactions on the client connection
 rather than disappearing after the current request. A helper
 may update the client connection ID value during subsequent
 requests by returning a new ID value. To send the connection
 ID to the URL rewriter, use url_rewrite_extras:
 url_rewrite_extras clt_conn_id=%{clt_conn_id}note ...
 

Design issue:
 * can we call this client-tag= or connection-tag= or
connection-label=. We have ID used internally via InstanceId
template members, which are very different to this ID being user-assigned.


in src/Notes.cc:
* replaceOrAdd iterates over the notes list twice in the event of
removals (due to remove() iterating over it from scratch). Please fix
this to iterate over the list only once and do an in-place replace if
the key is found already there, add if end() is reached.

* remove() appears to be unnecessary after the above correction.


in Notes.*:
* findFirst() no longer makes sense. There being no second now.
 - Please rename to find().


in src/cf.data.pre:
* kv-pairs is a format syntax definition used throughout the
documentation. Please retain it. If the term needs re-defining that is
an issue outside this patch scope.

* Please reword Squid understands the following optional key=value
pairs received from URL rewriters:. Since there are optional kv-pairs
already documented in the rewrite vs redirect vs no-change syntax we
need to specify that these are optional extensions.
Perhapse this:
 In addition to the above kv-pairs Squid also understands the following
optional kv-pairs received from URL rewriters:

* Please update the kv-pair description like so (where FOO is the term
tag or label selected to replace ID by the above mentioned design issue)

  Associates a FOO with the client TCP connection.
  The FOO is treated as a regular annotation but persists
  across future requests on the client connection
  rather than just the current request.
  A helper may update the FOO during subsequent
  requests by returning a new kv-pair.


* Reference to url_rewrite_extras should be directy after the helper
input syntax line. Like so:

  [channel-ID SP] URL [SP extras]NL

see url_rewrite_extras on how to send extras with
optional values to the helper.




in src/client_side_request.cc:
* ClientRequestContext::clientRedirectDone was previously calling
SyncNotes to ensure the AccessLogEntry and HttpRequest shared a notes
list before updating the request. Order is important here because Sync
will kill Squid with an assert if the objects have different notes list
objects.
Until that is fixed please retain the order of:
 - sync first, then update the request.

* same in ClientRequestContext::clientStoreIdDone

* in ClientHttpRequest::doCallouts() please use SBuf::isEmpty() instead
of length() to check for existence.


Amos




Re: [PATCH] Support client connection annotation by helpers via clt_conn_id=ID

2014-06-23 Thread Tsantilas Christos

On 06/23/2014 09:50 AM, Amos Jeffries wrote:

On 23/06/2014 5:43 a.m., Tsantilas Christos wrote:

Hi all,

The attached patch replaces existin annotation values with the new one
received from helpers.

Just one question. We are documenting key-value pairs in cf.data.pre
only for url-rewriter helpers, but annotations works for all squid helpers.
Should we move the related url-rewriter section to a global section? If
yes where?

For example something like the following in a global section should be
enough:

The interface for all helpers has been extended to support arbitrary
lists of key=value pairs, with the syntax  key=value . Some keys have
special meaning to Squid, as documented here.

Currently Squid understands the following optional key=value pairs
received from URL rewriters:
  clt_conn_id=ID
 Associates the received ID with the client TCP connection.
 The clt_conn_id=ID pair is treated as a regular annotation but
 it persists across all transactions on the client connection
 rather than disappearing after the current request. A helper
 may update the client connection ID value during subsequent
 requests by returning a new ID value. To send the connection
 ID to the URL rewriter, use url_rewrite_extras:
 url_rewrite_extras clt_conn_id=%{clt_conn_id}note ...



Design issue:
  * can we call this client-tag= or connection-tag= or
connection-label=. We have ID used internally via InstanceId
template members, which are very different to this ID being user-assigned.


The clt_conn_id is not a client tag, but it can be a connection-tag or 
connection-label.

Lets wait Alex to answer on this.




in src/Notes.cc:
* replaceOrAdd iterates over the notes list twice in the event of
removals (due to remove() iterating over it from scratch). Please fix
this to iterate over the list only once and do an in-place replace if
the key is found already there, add if end() is reached.

* remove() appears to be unnecessary after the above correction.


I can fix a litle replaceOrAdd, but the remove() method still is needed.

The replaceOrAdd() method method will be:
void
NotePairs::replaceOrAdd(const NotePairs *src)
{
 for (std::vectorNotePairs::Entry *::const_iterator  i = 
src-entries.begin(); i != src-entries.end(); ++i) {

remove((*i)-name.termedBuf());
entries.push_back(new NotePairs::Entry((*i)-name.termedBuf(), 
(*i)-value.termedBuf()));

}
}




in Notes.*:
* findFirst() no longer makes sense. There being no second now.
  - Please rename to find().


This is not true. We can still support multiple values per key name.
I am suggesting to not touch this one.




in src/cf.data.pre:
* kv-pairs is a format syntax definition used throughout the
documentation. Please retain it. If the term needs re-defining that is
an issue outside this patch scope.

* Please reword Squid understands the following optional key=value
pairs received from URL rewriters:. Since there are optional kv-pairs
already documented in the rewrite vs redirect vs no-change syntax we
need to specify that these are optional extensions.
Perhapse this:
  In addition to the above kv-pairs Squid also understands the following
optional kv-pairs received from URL rewriters:

* Please update the kv-pair description like so (where FOO is the term
tag or label selected to replace ID by the above mentioned design issue)

   Associates a FOO with the client TCP connection.
   The FOO is treated as a regular annotation but persists
   across future requests on the client connection
   rather than just the current request.
   A helper may update the FOO during subsequent
   requests by returning a new kv-pair.


* Reference to url_rewrite_extras should be directy after the helper
input syntax line. Like so:

   [channel-ID SP] URL [SP extras]NL

 see url_rewrite_extras on how to send extras with
 optional values to the helper.




Ok for these changes





in src/client_side_request.cc:
* ClientRequestContext::clientRedirectDone was previously calling
SyncNotes to ensure the AccessLogEntry and HttpRequest shared a notes
list before updating the request. Order is important here because Sync
will kill Squid with an assert if the objects have different notes list
objects.
Until that is fixed please retain the order of:
  - sync first, then update the request.

* same in ClientRequestContext::clientStoreIdDone


It is the same, it is not make difference.
UpdateRequestNotes will just copy helper notes to the request notes. The 
SyncNotes will check if AccessLogEntry and HttpRequest object points to 
the same NotePairs object.




* in ClientHttpRequest::doCallouts() please use SBuf::isEmpty() instead
of length() to check for existence.


OK.

Lets wait others comments before I post a new patch.




Amos







Re: [PATCH] Support client connection annotation by helpers via clt_conn_id=ID

2014-06-23 Thread Alex Rousskov
On 06/23/2014 07:44 AM, Tsantilas Christos wrote:
 On 06/23/2014 09:50 AM, Amos Jeffries wrote:
 On 23/06/2014 5:43 a.m., Tsantilas Christos wrote:
   clt_conn_id=ID
  Associates the received ID with the client TCP connection.
  The clt_conn_id=ID pair is treated as a regular annotation but
  it persists across all transactions on the client connection
  rather than disappearing after the current request. A helper
  may update the client connection ID value during subsequent
  requests by returning a new ID value. To send the connection
  ID to the URL rewriter, use url_rewrite_extras:
  url_rewrite_extras clt_conn_id=%{clt_conn_id}note ...


   * can we call this client-tag= or connection-tag= or
 connection-label=. We have ID used internally via InstanceId
 template members, which are very different to this ID being
 user-assigned.


 The clt_conn_id is not a client tag, but it can be a connection-tag or
 connection-label.


client-tag does not work well indeed because the tag is specific to
the connection, not the client, for some popular definitions of a
client. For example, many differently authenticated clients may
share the same tagged connection (when that connection comes from a peer
that aggregates traffic from many end-users).

connection-tag does not work well because the proposed feature is
specific to the connection coming from an HTTP client. We may decide to
tag connections to servers later.


The internal instance IDs are not particularly relevant to this
discussion about an external interface name IMO, but yes, there are many
IDs used in Squid, for various needs. The connectionId_ name of the
new data member avoids confusion with the existing InstanceId members
IMO, but if the proposed option is renamed, we may need to rename the
data member as well. Please note that there is no client in the data
member name because the class manages a connection to a client so the
client scope should be assumed by default.


Amos, does clt_conn_tag, clt_conn_mark, clt_conn_label, or clt_conn_note
(or their longer, partially spelled out versions) sound better than
clt_conn_id to you?


HTH,

Alex.



Re: [RFC] connections logging

2014-06-23 Thread Alex Rousskov
On 06/20/2014 03:04 AM, Amos Jeffries wrote:
 I am playing with the idea of adding a new log file to record just the
 connections handled by Squid (rather than the HTTP request/reply
 transactions).
 
 This log would include connections opened but never used, port details,
 connection lifetimes, re-use counts on persistent connections, SSL/TLS
 presence/version (or not), and other connection details we find a need
 for later.


 The driver behind this is to help resolve a growing amount of user
 queries regarding happy eyeballs idle connections and broken TLS
 connections. We are also adding other potentially never-used connections
 ourselves with the standby pools.

A couple of days ago, I came across another use case for logging
connections unrelated to Happy Eyeballs. It sounds like this is going to
be a generally useful feature. My use case is still being clarified, but
here are the already known non-obvious requirements:

1. Logging when the connection is received (unlike transactions that are
logged when the transaction is completed).

2. Logging client-to-Squid connections.

3. Introduction of some kind of unique connection ID that can be
associated with HTTP/etc transactions on that connection, correlating
access.log and connection.log entries. [ uniqueness scope is being
clarified, but I hope it would just be a single worker lifetime ]


Let's coordinate our efforts!


 The Squid native access.log appears unsuitable for adjustment given its
 widely used format and the number of details to be logged.

Agreed. We should support a similarly configurable log format and other
existing logging knobs but use a different, dedicated log (or logs if we
want to record from-Squid connections and to-Squid connections
differently). Reusable logformat codes such as %la and %lp should be
reused, but new ones will probably be needed as well.


 There is also a useful side effect of MasterXaction. At present the
 TcpAcceptor generates a MasterXaction object to record initial
 connection details for the handler, this creates a bit of complexity in
 the recipient regarding whether the MasterXactino object applies to a
 future HTTP request or one which is currently being parsed. This could
 be simplified if the connection was a transaction and the HTTP requests
 on it each a different transaction spawned from the connection
 MasterXaction when parsing a new requst.

I am not sure what you mean by if the connection was a transaction.
The MasterXaction is usually associated with at least one connection
(via the small t transaction using that connection), and one
connection is usually associated with at least one MasterTransaction,
but transactions and connections are completely different concepts at
Squid level IMO. I doubt we should try to merge the concept of a
transaction and a connection somehow.

MasterTransaction is created at connection acceptance time because that
is when the first HTTP/etc. transaction starts (parsing headers is not
the first step) and, hence, that is when the master transaction starts.

We could make the connection acceptance event itself a transaction (an
object of some new AcceptXaction class inherited from a generic
Transaction class and associated with the corresponding Master
Transaction object just like HTTP and ICAP transactions should be). I do
not yet see what that would actually buy us except more classes to worry
about, but, at the very least, it would not be conceptually wrong.


HTH,

Alex.



[PATCH] Fixes to get more collapsed forwarding hits

2014-06-23 Thread Alex Rousskov
Hello,

The attached patch contains several changes to improve the
probability of getting a collapsed forwarding hit:

* Allow STORE_MEMORY_CLIENTs to open disk files if needed and possible.
STORE_*_CLIENT designation is rather buggy (several known XXXs). Some
collapsed clients are marked as STORE_MEMORY_CLIENTs (for the lack of
info at determination time) but their hit content may actually come from
a disk cache.

* Do not abandon writing a collapsed cache entry when we cannot cache
the entry in RAM if the entry can be cached on disk instead. Both shared
memory cache and the disk cache have to refuse to cache the entry for it
to become non-collapsible. This dual refusal is difficult to detect
because each cache may make the caching decision at different times.
Added StoreEntry methods to track those decisions and react to them.

* Recognize disk cache as a potential source of the collapsed entry when
the memory cache is configured. While collapsed entries would normally
be found in the shared memory cache, caching policies and other factors
may prohibit memory caching but still allow disk caching. Memory cache
is still preferred.

* Do not use unknown entry size in StoreEntry::checkTooSmall()
determination. The size of collapsed entries is often unknown, even when
they are STORE_OK (because swap_hdr_sz is unknown when the other worker
has created the cache entry).  The same code has been using this
ignore-unknowns logic for the Content-Length header value, so the
rejection of unknown entry size (added as a part of C++ conversion
without a dedicated message in r5766) could have been a typo.


If approved, I will commit the change for the last bullet separately
because it is easier to isolate and is less specific to collapsed
forwarding.


The patch does not attempt to cover all possible collapsing cases that
Squid currently misses, of course. More work remains in this area
(including a serious STORE_*_CLIENT redesign), but the proposed fixes
should make a dent.

These changes were developed for an earlier trunk version but the
trunk-ported patch posted here also passed simple lab tests.


Thank you,

Alex.
Support more collapsed forwarding hit cases:

Allow STORE_MEMORY_CLIENTs to open disk files if needed and possible.
STORE_*_CLIENT designation is rather buggy (several known XXXs). Some
collapsed clients are marked as STORE_MEMORY_CLIENTs (for the lack of info at
determination time) but their hit content may actually come from a disk cache.

Do not abandon writing a collapsed cache entry when we cannot cache the entry
in RAM if the entry can be cached on disk instead. Both shared memory cache
and the disk cache have to refuse to cache the entry for it to become
non-collapsible. This dual refusal is difficult to detect because each cache
may make the caching decision at different times. Added StoreEntry methods to
track those decisions and react to them.

Recognize disk cache as a potential source of the collapsed entry when the
memory cache is configured. While collapsed entries would normally be found in
the shared memory cache, caching policies and other factors may prohibit
memory caching but still allow disk caching. Memory cache is still preferred.


Do not use unknown entry size in StoreEntry::checkTooSmall() determination.
The size of collapsed entries is often unknown, even when they are STORE_OK
(because swap_hdr_sz is unknown when the other worker has created the cache
entry).  The same code has been using this ignore-unknowns logic for the
Content-Length header value, so the rejection of unknown entry size (added as
a part of C++ conversion without a dedicated message in r5766) could have been
a typo.

=== modified file 'src/MemStore.cc'
--- src/MemStore.cc	2014-02-21 16:14:05 +
+++ src/MemStore.cc	2014-06-23 23:34:53 +
@@ -465,40 +465,41 @@ MemStore::shouldCache(const StoreEntry 
 
 return true;
 }
 
 /// locks map anchor and preps to store the entry in shared memory
 bool
 MemStore::startCaching(StoreEntry e)
 {
 sfileno index = 0;
 Ipc::StoreMapAnchor *slot = map-openForWriting(reinterpret_castconst cache_key *(e.key), index);
 if (!slot) {
 debugs(20, 5, HERE  No room in mem-cache map to index   e);
 return false;
 }
 
 assert(e.mem_obj);
 e.mem_obj-memCache.index = index;
 e.mem_obj-memCache.io = MemObject::ioWriting;
 slot-set(e);
 map-startAppending(index);
+e.memOutDecision(true);
 return true;
 }
 
 /// copies all local data to shared memory
 void
 MemStore::copyToShm(StoreEntry e)
 {
 // prevents remote readers from getting ENTRY_FWD_HDR_WAIT entries and
 // not knowing when the wait is over
 if (EBIT_TEST(e.flags, ENTRY_FWD_HDR_WAIT)) {
 debugs(20, 5, postponing copying   e   for ENTRY_FWD_HDR_WAIT);
 return;
 }
 
 assert(map);
 assert(e.mem_obj);
 
 const int32_t index = e.mem_obj-memCache.index;
 assert(index = 0);
 Ipc::StoreMapAnchor anchor = 

Re: [RFC] connections logging

2014-06-23 Thread Kinkie
On Tue, Jun 24, 2014 at 2:26 AM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 06/20/2014 03:04 AM, Amos Jeffries wrote:
 I am playing with the idea of adding a new log file to record just the
 connections handled by Squid (rather than the HTTP request/reply
 transactions).

 This log would include connections opened but never used, port details,
 connection lifetimes, re-use counts on persistent connections, SSL/TLS
 presence/version (or not), and other connection details we find a need
 for later.


 The driver behind this is to help resolve a growing amount of user
 queries regarding happy eyeballs idle connections and broken TLS
 connections. We are also adding other potentially never-used connections
 ourselves with the standby pools.

 A couple of days ago, I came across another use case for logging
 connections unrelated to Happy Eyeballs. It sounds like this is going to
 be a generally useful feature. My use case is still being clarified, but
 here are the already known non-obvious requirements:

I have an use case, as well: timing logging.
It would be useful to have a chance to log the timing of certain key
moments in a request's processing path. Things like accept, end of
slow acl matching, end of dns resolution(s), connected to uplink, and
so on. This could help administrators identify congestion issues in
their infrastructure.

   Kinkie


Re: [PATCH] Support client connection annotation by helpers via clt_conn_id=ID

2014-06-23 Thread Amos Jeffries
On 24/06/2014 8:19 a.m., Alex Rousskov wrote:
 On 06/23/2014 07:44 AM, Tsantilas Christos wrote:
 On 06/23/2014 09:50 AM, Amos Jeffries wrote:
 On 23/06/2014 5:43 a.m., Tsantilas Christos wrote:
   clt_conn_id=ID
  Associates the received ID with the client TCP connection.
  The clt_conn_id=ID pair is treated as a regular annotation but
  it persists across all transactions on the client connection
  rather than disappearing after the current request. A helper
  may update the client connection ID value during subsequent
  requests by returning a new ID value. To send the connection
  ID to the URL rewriter, use url_rewrite_extras:
  url_rewrite_extras clt_conn_id=%{clt_conn_id}note ...
 
 
   * can we call this client-tag= or connection-tag= or
 connection-label=. We have ID used internally via InstanceId
 template members, which are very different to this ID being
 user-assigned.
 
 
 The clt_conn_id is not a client tag, but it can be a connection-tag or
 connection-label.
 
 
 client-tag does not work well indeed because the tag is specific to
 the connection, not the client, for some popular definitions of a
 client. For example, many differently authenticated clients may
 share the same tagged connection (when that connection comes from a peer
 that aggregates traffic from many end-users).


My understanding was that a major use-case for this feature was

for faking various forms of
connection-based authentication when standard HTTP authentication
cannot be used.


Allowing client A to login client B, C, ...Z etc only makes sense if
the client and user are detached concepts at this level. Client
being strictly the client-server model type of client for whom any
number of users may exist, or proxying for other further away clients
in a multi-hop setup.

Either way by using this tag the helper is assuming that it is relevant
to all future requests on the whole connection. ie that there is either
one client, or one user which the note represents.


 
 connection-tag does not work well because the proposed feature is
 specific to the connection coming from an HTTP client. We may decide to
 tag connections to servers later.
 
 
 The internal instance IDs are not particularly relevant to this
 discussion about an external interface name IMO, but yes, there are many
 IDs used in Squid, for various needs. The connectionId_ name of the
 new data member avoids confusion with the existing InstanceId members
 IMO, but if the proposed option is renamed, we may need to rename the
 data member as well. Please note that there is no client in the data
 member name because the class manages a connection to a client so the
 client scope should be assumed by default.
 
 
 Amos, does clt_conn_tag, clt_conn_mark, clt_conn_label, or clt_conn_note
 (or their longer, partially spelled out versions) sound better than
 clt_conn_id to you?

Yes. Any of them are better. tag has existing history from
exetrnal_acl_type interface that may want to be considered.

Amos



Re: [PATCH] Support client connection annotation by helpers via clt_conn_id=ID

2014-06-23 Thread Amos Jeffries
On 24/06/2014 1:44 a.m., Tsantilas Christos wrote:
 On 06/23/2014 09:50 AM, Amos Jeffries wrote:
 On 23/06/2014 5:43 a.m., Tsantilas Christos wrote:
 Hi all,

 The attached patch replaces existin annotation values with the new one
 received from helpers.

 Just one question. We are documenting key-value pairs in cf.data.pre
 only for url-rewriter helpers, but annotations works for all squid
 helpers.
 Should we move the related url-rewriter section to a global section? If
 yes where?

 For example something like the following in a global section should be
 enough:

 The interface for all helpers has been extended to support arbitrary
 lists of key=value pairs, with the syntax  key=value . Some keys have
 special meaning to Squid, as documented here.

 Currently Squid understands the following optional key=value pairs
 received from URL rewriters:
   clt_conn_id=ID
  Associates the received ID with the client TCP connection.
  The clt_conn_id=ID pair is treated as a regular annotation but
  it persists across all transactions on the client connection
  rather than disappearing after the current request. A helper
  may update the client connection ID value during subsequent
  requests by returning a new ID value. To send the connection
  ID to the URL rewriter, use url_rewrite_extras:
  url_rewrite_extras clt_conn_id=%{clt_conn_id}note ...


 Design issue:
   * can we call this client-tag= or connection-tag= or
 connection-label=. We have ID used internally via InstanceId
 template members, which are very different to this ID being
 user-assigned.
 
 The clt_conn_id is not a client tag, but it can be a connection-tag or
 connection-label.
 Lets wait Alex to answer on this.
 

See my answer to Alex. By using this note the helper is explicitly
defining that connection == 1 client.




 in src/Notes.cc:
 * replaceOrAdd iterates over the notes list twice in the event of
 removals (due to remove() iterating over it from scratch). Please fix
 this to iterate over the list only once and do an in-place replace if
 the key is found already there, add if end() is reached.

 * remove() appears to be unnecessary after the above correction.
 
 I can fix a litle replaceOrAdd, but the remove() method still is needed.
 

By what code?  I see it only being used by replaceOrAdd(), and only to
perform the unnecessary loop. It is fine as a remove() implementation,
but to me looks needless at this time.


 The replaceOrAdd() method method will be:
 void
 NotePairs::replaceOrAdd(const NotePairs *src)
 {
  for (std::vectorNotePairs::Entry *::const_iterator  i =
 src-entries.begin(); i != src-entries.end(); ++i) {
 remove((*i)-name.termedBuf());
 entries.push_back(new NotePairs::Entry((*i)-name.termedBuf(),
 (*i)-value.termedBuf()));
 }
 }
 


 in Notes.*:
 * findFirst() no longer makes sense. There being no second now.
   - Please rename to find().
 
 This is not true. We can still support multiple values per key name.
 I am suggesting to not touch this one.
 

Every helper callback is now using Update*() which uses replaceOrAddd()
to ensure uniqueness.

I see that the list output directly from the helper may contain
duplicates and might be used by the handler instead of the HttpRequest
merged list. We may want to change that to make handler behaviour match
behaviour of the ACL testing the HttpRequest list.




 in src/client_side_request.cc:
 * ClientRequestContext::clientRedirectDone was previously calling
 SyncNotes to ensure the AccessLogEntry and HttpRequest shared a notes
 list before updating the request. Order is important here because Sync
 will kill Squid with an assert if the objects have different notes list
 objects.
 Until that is fixed please retain the order of:
   - sync first, then update the request.

 * same in ClientRequestContext::clientStoreIdDone
 
 It is the same, it is not make difference.
 UpdateRequestNotes will just copy helper notes to the request notes. The
 SyncNotes will check if AccessLogEntry and HttpRequest object points to
 the same NotePairs object.

If you are going to retain it the way it is please add a note to the
AccessLogEntry::notes member indicating that it must NEVER be set other
than by calling SyncNotes() with the current transactions HttpRequest
object. Doing so will guarantee crashes after this patch goes in.

The burden will also be placed on auditors to check this with every
notes related patch. (I would rather avoid that extra work TBH).

Amos


Re: [PATCH] Fixes to get more collapsed forwarding hits

2014-06-23 Thread Amos Jeffries
On 24/06/2014 4:07 p.m., Alex Rousskov wrote:
 Hello,
 
 The attached patch contains several changes to improve the
 probability of getting a collapsed forwarding hit:
 
 * Allow STORE_MEMORY_CLIENTs to open disk files if needed and possible.
 STORE_*_CLIENT designation is rather buggy (several known XXXs). Some
 collapsed clients are marked as STORE_MEMORY_CLIENTs (for the lack of
 info at determination time) but their hit content may actually come from
 a disk cache.
 
 * Do not abandon writing a collapsed cache entry when we cannot cache
 the entry in RAM if the entry can be cached on disk instead. Both shared
 memory cache and the disk cache have to refuse to cache the entry for it
 to become non-collapsible. This dual refusal is difficult to detect
 because each cache may make the caching decision at different times.
 Added StoreEntry methods to track those decisions and react to them.

Hmm. FTR: IMHO the only reasons a response should be non-collapsible is
if it is a) private, b) authenticated to another user, c) explicitly
known non-cacheable, d) a non-matching variant of the URL, or e) we
already discarded some body bytes.

The factor of memory/disk caches not having provided cacheable size
allow/reject result yet is no reason to be non-collapsible. It is a
reason to wait on that result to determine the cacheability condition (c).

 
 * Recognize disk cache as a potential source of the collapsed entry when
 the memory cache is configured. While collapsed entries would normally
 be found in the shared memory cache, caching policies and other factors
 may prohibit memory caching but still allow disk caching. Memory cache
 is still preferred.
 
 * Do not use unknown entry size in StoreEntry::checkTooSmall()
 determination. The size of collapsed entries is often unknown, even when
 they are STORE_OK (because swap_hdr_sz is unknown when the other worker
 has created the cache entry).  The same code has been using this
 ignore-unknowns logic for the Content-Length header value, so the
 rejection of unknown entry size (added as a part of C++ conversion
 without a dedicated message in r5766) could have been a typo.
 
 
 If approved, I will commit the change for the last bullet separately
 because it is easier to isolate and is less specific to collapsed
 forwarding.
 
 
 The patch does not attempt to cover all possible collapsing cases that
 Squid currently misses, of course. More work remains in this area
 (including a serious STORE_*_CLIENT redesign), but the proposed fixes
 should make a dent.
 
 These changes were developed for an earlier trunk version but the
 trunk-ported patch posted here also passed simple lab tests.
 
 
 Thank you,
 
 Alex.
 

+1. Although I do not know enough about store yet to do much in-depth
audit of the logic changes.

Amos