Re: [PATCH] Support client connection annotation by helpers via clt_conn_id=ID
On 23/06/2014 5:43 a.m., Tsantilas Christos wrote: Hi all, The attached patch replaces existin annotation values with the new one received from helpers. Just one question. We are documenting key-value pairs in cf.data.pre only for url-rewriter helpers, but annotations works for all squid helpers. Should we move the related url-rewriter section to a global section? If yes where? For example something like the following in a global section should be enough: The interface for all helpers has been extended to support arbitrary lists of key=value pairs, with the syntax key=value . Some keys have special meaning to Squid, as documented here. Currently Squid understands the following optional key=value pairs received from URL rewriters: clt_conn_id=ID Associates the received ID with the client TCP connection. The clt_conn_id=ID pair is treated as a regular annotation but it persists across all transactions on the client connection rather than disappearing after the current request. A helper may update the client connection ID value during subsequent requests by returning a new ID value. To send the connection ID to the URL rewriter, use url_rewrite_extras: url_rewrite_extras clt_conn_id=%{clt_conn_id}note ... Design issue: * can we call this client-tag= or connection-tag= or connection-label=. We have ID used internally via InstanceId template members, which are very different to this ID being user-assigned. in src/Notes.cc: * replaceOrAdd iterates over the notes list twice in the event of removals (due to remove() iterating over it from scratch). Please fix this to iterate over the list only once and do an in-place replace if the key is found already there, add if end() is reached. * remove() appears to be unnecessary after the above correction. in Notes.*: * findFirst() no longer makes sense. There being no second now. - Please rename to find(). in src/cf.data.pre: * kv-pairs is a format syntax definition used throughout the documentation. Please retain it. If the term needs re-defining that is an issue outside this patch scope. * Please reword Squid understands the following optional key=value pairs received from URL rewriters:. Since there are optional kv-pairs already documented in the rewrite vs redirect vs no-change syntax we need to specify that these are optional extensions. Perhapse this: In addition to the above kv-pairs Squid also understands the following optional kv-pairs received from URL rewriters: * Please update the kv-pair description like so (where FOO is the term tag or label selected to replace ID by the above mentioned design issue) Associates a FOO with the client TCP connection. The FOO is treated as a regular annotation but persists across future requests on the client connection rather than just the current request. A helper may update the FOO during subsequent requests by returning a new kv-pair. * Reference to url_rewrite_extras should be directy after the helper input syntax line. Like so: [channel-ID SP] URL [SP extras]NL see url_rewrite_extras on how to send extras with optional values to the helper. in src/client_side_request.cc: * ClientRequestContext::clientRedirectDone was previously calling SyncNotes to ensure the AccessLogEntry and HttpRequest shared a notes list before updating the request. Order is important here because Sync will kill Squid with an assert if the objects have different notes list objects. Until that is fixed please retain the order of: - sync first, then update the request. * same in ClientRequestContext::clientStoreIdDone * in ClientHttpRequest::doCallouts() please use SBuf::isEmpty() instead of length() to check for existence. Amos
Re: [PATCH] Support client connection annotation by helpers via clt_conn_id=ID
On 06/23/2014 09:50 AM, Amos Jeffries wrote: On 23/06/2014 5:43 a.m., Tsantilas Christos wrote: Hi all, The attached patch replaces existin annotation values with the new one received from helpers. Just one question. We are documenting key-value pairs in cf.data.pre only for url-rewriter helpers, but annotations works for all squid helpers. Should we move the related url-rewriter section to a global section? If yes where? For example something like the following in a global section should be enough: The interface for all helpers has been extended to support arbitrary lists of key=value pairs, with the syntax key=value . Some keys have special meaning to Squid, as documented here. Currently Squid understands the following optional key=value pairs received from URL rewriters: clt_conn_id=ID Associates the received ID with the client TCP connection. The clt_conn_id=ID pair is treated as a regular annotation but it persists across all transactions on the client connection rather than disappearing after the current request. A helper may update the client connection ID value during subsequent requests by returning a new ID value. To send the connection ID to the URL rewriter, use url_rewrite_extras: url_rewrite_extras clt_conn_id=%{clt_conn_id}note ... Design issue: * can we call this client-tag= or connection-tag= or connection-label=. We have ID used internally via InstanceId template members, which are very different to this ID being user-assigned. The clt_conn_id is not a client tag, but it can be a connection-tag or connection-label. Lets wait Alex to answer on this. in src/Notes.cc: * replaceOrAdd iterates over the notes list twice in the event of removals (due to remove() iterating over it from scratch). Please fix this to iterate over the list only once and do an in-place replace if the key is found already there, add if end() is reached. * remove() appears to be unnecessary after the above correction. I can fix a litle replaceOrAdd, but the remove() method still is needed. The replaceOrAdd() method method will be: void NotePairs::replaceOrAdd(const NotePairs *src) { for (std::vectorNotePairs::Entry *::const_iterator i = src-entries.begin(); i != src-entries.end(); ++i) { remove((*i)-name.termedBuf()); entries.push_back(new NotePairs::Entry((*i)-name.termedBuf(), (*i)-value.termedBuf())); } } in Notes.*: * findFirst() no longer makes sense. There being no second now. - Please rename to find(). This is not true. We can still support multiple values per key name. I am suggesting to not touch this one. in src/cf.data.pre: * kv-pairs is a format syntax definition used throughout the documentation. Please retain it. If the term needs re-defining that is an issue outside this patch scope. * Please reword Squid understands the following optional key=value pairs received from URL rewriters:. Since there are optional kv-pairs already documented in the rewrite vs redirect vs no-change syntax we need to specify that these are optional extensions. Perhapse this: In addition to the above kv-pairs Squid also understands the following optional kv-pairs received from URL rewriters: * Please update the kv-pair description like so (where FOO is the term tag or label selected to replace ID by the above mentioned design issue) Associates a FOO with the client TCP connection. The FOO is treated as a regular annotation but persists across future requests on the client connection rather than just the current request. A helper may update the FOO during subsequent requests by returning a new kv-pair. * Reference to url_rewrite_extras should be directy after the helper input syntax line. Like so: [channel-ID SP] URL [SP extras]NL see url_rewrite_extras on how to send extras with optional values to the helper. Ok for these changes in src/client_side_request.cc: * ClientRequestContext::clientRedirectDone was previously calling SyncNotes to ensure the AccessLogEntry and HttpRequest shared a notes list before updating the request. Order is important here because Sync will kill Squid with an assert if the objects have different notes list objects. Until that is fixed please retain the order of: - sync first, then update the request. * same in ClientRequestContext::clientStoreIdDone It is the same, it is not make difference. UpdateRequestNotes will just copy helper notes to the request notes. The SyncNotes will check if AccessLogEntry and HttpRequest object points to the same NotePairs object. * in ClientHttpRequest::doCallouts() please use SBuf::isEmpty() instead of length() to check for existence. OK. Lets wait others comments before I post a new patch. Amos
Re: [PATCH] Support client connection annotation by helpers via clt_conn_id=ID
On 06/23/2014 07:44 AM, Tsantilas Christos wrote: On 06/23/2014 09:50 AM, Amos Jeffries wrote: On 23/06/2014 5:43 a.m., Tsantilas Christos wrote: clt_conn_id=ID Associates the received ID with the client TCP connection. The clt_conn_id=ID pair is treated as a regular annotation but it persists across all transactions on the client connection rather than disappearing after the current request. A helper may update the client connection ID value during subsequent requests by returning a new ID value. To send the connection ID to the URL rewriter, use url_rewrite_extras: url_rewrite_extras clt_conn_id=%{clt_conn_id}note ... * can we call this client-tag= or connection-tag= or connection-label=. We have ID used internally via InstanceId template members, which are very different to this ID being user-assigned. The clt_conn_id is not a client tag, but it can be a connection-tag or connection-label. client-tag does not work well indeed because the tag is specific to the connection, not the client, for some popular definitions of a client. For example, many differently authenticated clients may share the same tagged connection (when that connection comes from a peer that aggregates traffic from many end-users). connection-tag does not work well because the proposed feature is specific to the connection coming from an HTTP client. We may decide to tag connections to servers later. The internal instance IDs are not particularly relevant to this discussion about an external interface name IMO, but yes, there are many IDs used in Squid, for various needs. The connectionId_ name of the new data member avoids confusion with the existing InstanceId members IMO, but if the proposed option is renamed, we may need to rename the data member as well. Please note that there is no client in the data member name because the class manages a connection to a client so the client scope should be assumed by default. Amos, does clt_conn_tag, clt_conn_mark, clt_conn_label, or clt_conn_note (or their longer, partially spelled out versions) sound better than clt_conn_id to you? HTH, Alex.
Re: [RFC] connections logging
On 06/20/2014 03:04 AM, Amos Jeffries wrote: I am playing with the idea of adding a new log file to record just the connections handled by Squid (rather than the HTTP request/reply transactions). This log would include connections opened but never used, port details, connection lifetimes, re-use counts on persistent connections, SSL/TLS presence/version (or not), and other connection details we find a need for later. The driver behind this is to help resolve a growing amount of user queries regarding happy eyeballs idle connections and broken TLS connections. We are also adding other potentially never-used connections ourselves with the standby pools. A couple of days ago, I came across another use case for logging connections unrelated to Happy Eyeballs. It sounds like this is going to be a generally useful feature. My use case is still being clarified, but here are the already known non-obvious requirements: 1. Logging when the connection is received (unlike transactions that are logged when the transaction is completed). 2. Logging client-to-Squid connections. 3. Introduction of some kind of unique connection ID that can be associated with HTTP/etc transactions on that connection, correlating access.log and connection.log entries. [ uniqueness scope is being clarified, but I hope it would just be a single worker lifetime ] Let's coordinate our efforts! The Squid native access.log appears unsuitable for adjustment given its widely used format and the number of details to be logged. Agreed. We should support a similarly configurable log format and other existing logging knobs but use a different, dedicated log (or logs if we want to record from-Squid connections and to-Squid connections differently). Reusable logformat codes such as %la and %lp should be reused, but new ones will probably be needed as well. There is also a useful side effect of MasterXaction. At present the TcpAcceptor generates a MasterXaction object to record initial connection details for the handler, this creates a bit of complexity in the recipient regarding whether the MasterXactino object applies to a future HTTP request or one which is currently being parsed. This could be simplified if the connection was a transaction and the HTTP requests on it each a different transaction spawned from the connection MasterXaction when parsing a new requst. I am not sure what you mean by if the connection was a transaction. The MasterXaction is usually associated with at least one connection (via the small t transaction using that connection), and one connection is usually associated with at least one MasterTransaction, but transactions and connections are completely different concepts at Squid level IMO. I doubt we should try to merge the concept of a transaction and a connection somehow. MasterTransaction is created at connection acceptance time because that is when the first HTTP/etc. transaction starts (parsing headers is not the first step) and, hence, that is when the master transaction starts. We could make the connection acceptance event itself a transaction (an object of some new AcceptXaction class inherited from a generic Transaction class and associated with the corresponding Master Transaction object just like HTTP and ICAP transactions should be). I do not yet see what that would actually buy us except more classes to worry about, but, at the very least, it would not be conceptually wrong. HTH, Alex.
[PATCH] Fixes to get more collapsed forwarding hits
Hello, The attached patch contains several changes to improve the probability of getting a collapsed forwarding hit: * Allow STORE_MEMORY_CLIENTs to open disk files if needed and possible. STORE_*_CLIENT designation is rather buggy (several known XXXs). Some collapsed clients are marked as STORE_MEMORY_CLIENTs (for the lack of info at determination time) but their hit content may actually come from a disk cache. * Do not abandon writing a collapsed cache entry when we cannot cache the entry in RAM if the entry can be cached on disk instead. Both shared memory cache and the disk cache have to refuse to cache the entry for it to become non-collapsible. This dual refusal is difficult to detect because each cache may make the caching decision at different times. Added StoreEntry methods to track those decisions and react to them. * Recognize disk cache as a potential source of the collapsed entry when the memory cache is configured. While collapsed entries would normally be found in the shared memory cache, caching policies and other factors may prohibit memory caching but still allow disk caching. Memory cache is still preferred. * Do not use unknown entry size in StoreEntry::checkTooSmall() determination. The size of collapsed entries is often unknown, even when they are STORE_OK (because swap_hdr_sz is unknown when the other worker has created the cache entry). The same code has been using this ignore-unknowns logic for the Content-Length header value, so the rejection of unknown entry size (added as a part of C++ conversion without a dedicated message in r5766) could have been a typo. If approved, I will commit the change for the last bullet separately because it is easier to isolate and is less specific to collapsed forwarding. The patch does not attempt to cover all possible collapsing cases that Squid currently misses, of course. More work remains in this area (including a serious STORE_*_CLIENT redesign), but the proposed fixes should make a dent. These changes were developed for an earlier trunk version but the trunk-ported patch posted here also passed simple lab tests. Thank you, Alex. Support more collapsed forwarding hit cases: Allow STORE_MEMORY_CLIENTs to open disk files if needed and possible. STORE_*_CLIENT designation is rather buggy (several known XXXs). Some collapsed clients are marked as STORE_MEMORY_CLIENTs (for the lack of info at determination time) but their hit content may actually come from a disk cache. Do not abandon writing a collapsed cache entry when we cannot cache the entry in RAM if the entry can be cached on disk instead. Both shared memory cache and the disk cache have to refuse to cache the entry for it to become non-collapsible. This dual refusal is difficult to detect because each cache may make the caching decision at different times. Added StoreEntry methods to track those decisions and react to them. Recognize disk cache as a potential source of the collapsed entry when the memory cache is configured. While collapsed entries would normally be found in the shared memory cache, caching policies and other factors may prohibit memory caching but still allow disk caching. Memory cache is still preferred. Do not use unknown entry size in StoreEntry::checkTooSmall() determination. The size of collapsed entries is often unknown, even when they are STORE_OK (because swap_hdr_sz is unknown when the other worker has created the cache entry). The same code has been using this ignore-unknowns logic for the Content-Length header value, so the rejection of unknown entry size (added as a part of C++ conversion without a dedicated message in r5766) could have been a typo. === modified file 'src/MemStore.cc' --- src/MemStore.cc 2014-02-21 16:14:05 + +++ src/MemStore.cc 2014-06-23 23:34:53 + @@ -465,40 +465,41 @@ MemStore::shouldCache(const StoreEntry return true; } /// locks map anchor and preps to store the entry in shared memory bool MemStore::startCaching(StoreEntry e) { sfileno index = 0; Ipc::StoreMapAnchor *slot = map-openForWriting(reinterpret_castconst cache_key *(e.key), index); if (!slot) { debugs(20, 5, HERE No room in mem-cache map to index e); return false; } assert(e.mem_obj); e.mem_obj-memCache.index = index; e.mem_obj-memCache.io = MemObject::ioWriting; slot-set(e); map-startAppending(index); +e.memOutDecision(true); return true; } /// copies all local data to shared memory void MemStore::copyToShm(StoreEntry e) { // prevents remote readers from getting ENTRY_FWD_HDR_WAIT entries and // not knowing when the wait is over if (EBIT_TEST(e.flags, ENTRY_FWD_HDR_WAIT)) { debugs(20, 5, postponing copying e for ENTRY_FWD_HDR_WAIT); return; } assert(map); assert(e.mem_obj); const int32_t index = e.mem_obj-memCache.index; assert(index = 0); Ipc::StoreMapAnchor anchor =
Re: [RFC] connections logging
On Tue, Jun 24, 2014 at 2:26 AM, Alex Rousskov rouss...@measurement-factory.com wrote: On 06/20/2014 03:04 AM, Amos Jeffries wrote: I am playing with the idea of adding a new log file to record just the connections handled by Squid (rather than the HTTP request/reply transactions). This log would include connections opened but never used, port details, connection lifetimes, re-use counts on persistent connections, SSL/TLS presence/version (or not), and other connection details we find a need for later. The driver behind this is to help resolve a growing amount of user queries regarding happy eyeballs idle connections and broken TLS connections. We are also adding other potentially never-used connections ourselves with the standby pools. A couple of days ago, I came across another use case for logging connections unrelated to Happy Eyeballs. It sounds like this is going to be a generally useful feature. My use case is still being clarified, but here are the already known non-obvious requirements: I have an use case, as well: timing logging. It would be useful to have a chance to log the timing of certain key moments in a request's processing path. Things like accept, end of slow acl matching, end of dns resolution(s), connected to uplink, and so on. This could help administrators identify congestion issues in their infrastructure. Kinkie
Re: [PATCH] Support client connection annotation by helpers via clt_conn_id=ID
On 24/06/2014 8:19 a.m., Alex Rousskov wrote: On 06/23/2014 07:44 AM, Tsantilas Christos wrote: On 06/23/2014 09:50 AM, Amos Jeffries wrote: On 23/06/2014 5:43 a.m., Tsantilas Christos wrote: clt_conn_id=ID Associates the received ID with the client TCP connection. The clt_conn_id=ID pair is treated as a regular annotation but it persists across all transactions on the client connection rather than disappearing after the current request. A helper may update the client connection ID value during subsequent requests by returning a new ID value. To send the connection ID to the URL rewriter, use url_rewrite_extras: url_rewrite_extras clt_conn_id=%{clt_conn_id}note ... * can we call this client-tag= or connection-tag= or connection-label=. We have ID used internally via InstanceId template members, which are very different to this ID being user-assigned. The clt_conn_id is not a client tag, but it can be a connection-tag or connection-label. client-tag does not work well indeed because the tag is specific to the connection, not the client, for some popular definitions of a client. For example, many differently authenticated clients may share the same tagged connection (when that connection comes from a peer that aggregates traffic from many end-users). My understanding was that a major use-case for this feature was for faking various forms of connection-based authentication when standard HTTP authentication cannot be used. Allowing client A to login client B, C, ...Z etc only makes sense if the client and user are detached concepts at this level. Client being strictly the client-server model type of client for whom any number of users may exist, or proxying for other further away clients in a multi-hop setup. Either way by using this tag the helper is assuming that it is relevant to all future requests on the whole connection. ie that there is either one client, or one user which the note represents. connection-tag does not work well because the proposed feature is specific to the connection coming from an HTTP client. We may decide to tag connections to servers later. The internal instance IDs are not particularly relevant to this discussion about an external interface name IMO, but yes, there are many IDs used in Squid, for various needs. The connectionId_ name of the new data member avoids confusion with the existing InstanceId members IMO, but if the proposed option is renamed, we may need to rename the data member as well. Please note that there is no client in the data member name because the class manages a connection to a client so the client scope should be assumed by default. Amos, does clt_conn_tag, clt_conn_mark, clt_conn_label, or clt_conn_note (or their longer, partially spelled out versions) sound better than clt_conn_id to you? Yes. Any of them are better. tag has existing history from exetrnal_acl_type interface that may want to be considered. Amos
Re: [PATCH] Support client connection annotation by helpers via clt_conn_id=ID
On 24/06/2014 1:44 a.m., Tsantilas Christos wrote: On 06/23/2014 09:50 AM, Amos Jeffries wrote: On 23/06/2014 5:43 a.m., Tsantilas Christos wrote: Hi all, The attached patch replaces existin annotation values with the new one received from helpers. Just one question. We are documenting key-value pairs in cf.data.pre only for url-rewriter helpers, but annotations works for all squid helpers. Should we move the related url-rewriter section to a global section? If yes where? For example something like the following in a global section should be enough: The interface for all helpers has been extended to support arbitrary lists of key=value pairs, with the syntax key=value . Some keys have special meaning to Squid, as documented here. Currently Squid understands the following optional key=value pairs received from URL rewriters: clt_conn_id=ID Associates the received ID with the client TCP connection. The clt_conn_id=ID pair is treated as a regular annotation but it persists across all transactions on the client connection rather than disappearing after the current request. A helper may update the client connection ID value during subsequent requests by returning a new ID value. To send the connection ID to the URL rewriter, use url_rewrite_extras: url_rewrite_extras clt_conn_id=%{clt_conn_id}note ... Design issue: * can we call this client-tag= or connection-tag= or connection-label=. We have ID used internally via InstanceId template members, which are very different to this ID being user-assigned. The clt_conn_id is not a client tag, but it can be a connection-tag or connection-label. Lets wait Alex to answer on this. See my answer to Alex. By using this note the helper is explicitly defining that connection == 1 client. in src/Notes.cc: * replaceOrAdd iterates over the notes list twice in the event of removals (due to remove() iterating over it from scratch). Please fix this to iterate over the list only once and do an in-place replace if the key is found already there, add if end() is reached. * remove() appears to be unnecessary after the above correction. I can fix a litle replaceOrAdd, but the remove() method still is needed. By what code? I see it only being used by replaceOrAdd(), and only to perform the unnecessary loop. It is fine as a remove() implementation, but to me looks needless at this time. The replaceOrAdd() method method will be: void NotePairs::replaceOrAdd(const NotePairs *src) { for (std::vectorNotePairs::Entry *::const_iterator i = src-entries.begin(); i != src-entries.end(); ++i) { remove((*i)-name.termedBuf()); entries.push_back(new NotePairs::Entry((*i)-name.termedBuf(), (*i)-value.termedBuf())); } } in Notes.*: * findFirst() no longer makes sense. There being no second now. - Please rename to find(). This is not true. We can still support multiple values per key name. I am suggesting to not touch this one. Every helper callback is now using Update*() which uses replaceOrAddd() to ensure uniqueness. I see that the list output directly from the helper may contain duplicates and might be used by the handler instead of the HttpRequest merged list. We may want to change that to make handler behaviour match behaviour of the ACL testing the HttpRequest list. in src/client_side_request.cc: * ClientRequestContext::clientRedirectDone was previously calling SyncNotes to ensure the AccessLogEntry and HttpRequest shared a notes list before updating the request. Order is important here because Sync will kill Squid with an assert if the objects have different notes list objects. Until that is fixed please retain the order of: - sync first, then update the request. * same in ClientRequestContext::clientStoreIdDone It is the same, it is not make difference. UpdateRequestNotes will just copy helper notes to the request notes. The SyncNotes will check if AccessLogEntry and HttpRequest object points to the same NotePairs object. If you are going to retain it the way it is please add a note to the AccessLogEntry::notes member indicating that it must NEVER be set other than by calling SyncNotes() with the current transactions HttpRequest object. Doing so will guarantee crashes after this patch goes in. The burden will also be placed on auditors to check this with every notes related patch. (I would rather avoid that extra work TBH). Amos
Re: [PATCH] Fixes to get more collapsed forwarding hits
On 24/06/2014 4:07 p.m., Alex Rousskov wrote: Hello, The attached patch contains several changes to improve the probability of getting a collapsed forwarding hit: * Allow STORE_MEMORY_CLIENTs to open disk files if needed and possible. STORE_*_CLIENT designation is rather buggy (several known XXXs). Some collapsed clients are marked as STORE_MEMORY_CLIENTs (for the lack of info at determination time) but their hit content may actually come from a disk cache. * Do not abandon writing a collapsed cache entry when we cannot cache the entry in RAM if the entry can be cached on disk instead. Both shared memory cache and the disk cache have to refuse to cache the entry for it to become non-collapsible. This dual refusal is difficult to detect because each cache may make the caching decision at different times. Added StoreEntry methods to track those decisions and react to them. Hmm. FTR: IMHO the only reasons a response should be non-collapsible is if it is a) private, b) authenticated to another user, c) explicitly known non-cacheable, d) a non-matching variant of the URL, or e) we already discarded some body bytes. The factor of memory/disk caches not having provided cacheable size allow/reject result yet is no reason to be non-collapsible. It is a reason to wait on that result to determine the cacheability condition (c). * Recognize disk cache as a potential source of the collapsed entry when the memory cache is configured. While collapsed entries would normally be found in the shared memory cache, caching policies and other factors may prohibit memory caching but still allow disk caching. Memory cache is still preferred. * Do not use unknown entry size in StoreEntry::checkTooSmall() determination. The size of collapsed entries is often unknown, even when they are STORE_OK (because swap_hdr_sz is unknown when the other worker has created the cache entry). The same code has been using this ignore-unknowns logic for the Content-Length header value, so the rejection of unknown entry size (added as a part of C++ conversion without a dedicated message in r5766) could have been a typo. If approved, I will commit the change for the last bullet separately because it is easier to isolate and is less specific to collapsed forwarding. The patch does not attempt to cover all possible collapsing cases that Squid currently misses, of course. More work remains in this area (including a serious STORE_*_CLIENT redesign), but the proposed fixes should make a dent. These changes were developed for an earlier trunk version but the trunk-ported patch posted here also passed simple lab tests. Thank you, Alex. +1. Although I do not know enough about store yet to do much in-depth audit of the logic changes. Amos