2016-08-08 23:17 GMT+03:00 Amos Jeffries <squ...@treenet.co.nz>:
> s/temporary exceed/temporarily exceed/

Done.

> please remove the above/below wording.

Removed.

> the lines documenting 'die' and 'err' action look a bit squashed up.

Adjusted accordingly.

> looks wrong documentation for the overloaded() method

Removed that comment altogether, since the method looks simple and self-documented.

> please revert to sending helper::Unknown code

Reverted. Note that for helper::Unknown some callers print a somewhat misleading error message when requests are dropped, e.g.: "ERROR: storeID helper returned invalid result code. Wrong helper?"

> Also, can we please add a "wait" action as well. Which means just keep
> queueing things and ignore the overload.

I do not think we need this extra option. Currently we have a hard-coded 3 min. timeout, within which requests keep queuing, ignoring [possible] overload. Perhaps you need another option to configure this timeout? That would be a separate task, IMO.

Eduard.
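The queue accounting under discussion, and the queueFull()/overloaded() distinction the patch below settles on, can be sketched as follows. This is a minimal standalone illustration under assumed simplified types (`HelperQueueSketch` is a made-up name, not one of the real Squid classes):

```cpp
#include <cassert>

// Hypothetical sketch of the queue-accounting distinction drawn by this
// patch: queueFull() answers "would queuing one more request overload the
// helper?", while overloaded() answers "has the queue already grown past
// its configured limit?".
struct HelperQueueSketch {
    int queued = 0;      // requests currently waiting in the queue
    int queueLimit = 4;  // configured queue-size (default: 2*numberofchildren)

    // true when the queue is at or beyond its limit, i.e., accepting
    // another request would (further) overload the helper
    bool queueFull() const { return queued >= queueLimit; }

    // true only after the limit has actually been exceeded
    bool overloaded() const { return queued > queueLimit; }
};
```

Before the patch, the helper's queueFull() actually implemented the strict `>` (overloaded) test, so bypass-capable callers queued one extra request before reacting, which is the off-by-one the commit message describes.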
Make Squid death due to overloaded helpers optional.

Added on-persistent-overload=action option to helpers. Helper overload is defined as running with an overflowing queue. Persistent helper overload is [still] defined as being overloaded for more than 3 minutes.

The default behavior is unchanged(*): the Squid worker dies with a fatal error at the attempt to submit a new request to a persistently overloaded helper. This default behavior can also be configured explicitly using on-persistent-overload=die.

With on-persistent-overload=err, when dealing with a persistently overloaded helper, Squid immediately drops the helper request and sends an empty response to the caller, which should handle that empty response as an error. Squid informs the admin when it starts and when it stops dropping helper requests due to persistent overload.

The code had conflicting notions of an "overloaded helper". The external ACL helper, the URL rewriter, and the store ID code used queueFull() to test whether the new request would overflow the queue (and, hence, overload the helper), but queueFull() itself did not check whether the queue was full! It checked whether the queue was already overflowing. This confusion resulted in that code scheduling one extra helper request before enabling bypass. The code and its documentation are now more consistent (and better match the "overload" terminology used by the new configuration option, which also feels better than calling the helper "full").

(*) Resolving the above confusion resulted in minor (one-request) differences in the number of helper requests queued by Squid for external ACL, URL rewriting, and store ID helpers, with the adjusted behavior [better] matching the documentation.

=== modified file 'src/cf.data.pre'
--- src/cf.data.pre	2016-08-01 11:11:47 +0000
+++ src/cf.data.pre	2016-08-09 11:01:15 +0000
@@ -539,89 +539,107 @@
	can be used.
In practice, a %macro expands as a dash (-) if the helper request is sent before the required macro information is available to Squid. By default, Squid uses request formats provided in scheme-specific examples below (search for %credentials). The expanded key_extras value is added to the Squid credentials cache and, hence, will affect authentication. It can be used to authenticate different users with identical user names (e.g., when user authentication depends on http_port). Avoid adding frequently changing information to key_extras. For example, if you add user source IP, and it changes frequently in your environment, then max_user_ip ACL is going to treat every user+IP combination as a unique "user", breaking the ACL and wasting a lot of memory on those user records. It will also force users to authenticate from scratch whenever their IP changes.

	"realm" string
	Specifies the protection scope (aka realm name) which is to be reported to the client for the authentication scheme. It is commonly part of the text the user will see when prompted for their username and password. For Basic the default is "Squid proxy-caching web server". For Digest there is no default, this parameter is mandatory. For NTLM and Negotiate this parameter is ignored.

-	"children" numberofchildren [startup=N] [idle=N] [concurrency=N] [queue-size=N]
+	"children" numberofchildren [startup=N] [idle=N] [concurrency=N]
+		[queue-size=N] [on-persistent-overload=action]

	The maximum number of authenticator processes to spawn. If you start too few Squid will have to wait for them to process a backlog of credential verifications, slowing it down. When password verifications are done via a (slow) network you are likely to need lots of authenticator processes. The startup= and idle= options permit some skew in the exact amount run. A minimum of startup=N will begin during startup and reconfigure.
Squid will start more in groups of up to idle=N in an attempt to meet traffic needs and to keep idle=N free above those traffic needs up to the maximum. The concurrency= option sets the number of concurrent requests the helper can process. The default of 0 is used for helpers that only support one request at a time. Setting this to a number greater than 0 changes the protocol used to include a channel ID field first on the request/response line, allowing multiple requests to be sent to the same helper in parallel without waiting for the response. Concurrency must not be set unless it's known the helper supports the input format with channel-ID fields.

-	The queue-size= option sets the maximum number of queued
-	requests. If the queued requests exceed queue size for more
-	than 3 minutes then squid aborts its operation.
-	The default value is set to 2*numberofchildren/
+	The queue-size=N option sets the maximum number of queued
+	requests to N. The default maximum is 2*numberofchildren. Squid
+	is allowed to temporarily exceed the configured maximum, marking
+	the affected helper as "overloaded". If the helper overload
+	lasts more than 3 minutes, the action prescribed by the
+	on-persistent-overload option applies.
+
+	The on-persistent-overload=action option specifies Squid
+	reaction to a new helper request arriving when the helper
+	has been overloaded for more than 3 minutes already. The number
+	of queued requests determines whether the helper is overloaded
+	(see the queue-size option).
+
+	Two actions are supported:
+
+	  die	Squid worker quits. This is the default behavior.
+
+	  err	Squid treats the helper request as if it was
+		immediately submitted, and the helper immediately
+		responded with an error. This action has no effect on
+		the already queued and in-progress helper requests.

	NOTE: NTLM and Negotiate schemes do not support concurrency in the Squid code module even though some helpers can.
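For illustration, the new option could be used in squid.conf like this. This is a hypothetical fragment; the helper program path, child counts, and queue size are examples only, not taken from the patch:

```
# Allow up to 10 queued basic-auth requests; if the queue stays
# overflowed for more than 3 minutes, drop new requests with an
# error instead of killing the Squid worker:
auth_param basic program /usr/lib/squid/basic_ncsa_auth /etc/squid/passwd
auth_param basic children 5 startup=2 idle=1 queue-size=10 on-persistent-overload=err
```

With on-persistent-overload=die (or with the option omitted), the pre-patch fatal-error behavior is kept.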
IF HAVE_AUTH_MODULE_BASIC
=== Basic authentication parameters ===

	"utf8" on|off
	HTTP uses iso-latin-1 as its character set, while some authentication backends such as LDAP expect UTF-8. If this is set to on, Squid will translate the HTTP iso-latin-1 charset to UTF-8 before sending the username and password to the helper.

	"credentialsttl" timetolive
	Specifies how long squid assumes an externally validated username:password pair is valid for - in other words how often the helper program is called for that user. Set this low to force revalidation with short lived passwords. NOTE: setting this high does not impact your susceptibility to replay attacks unless you are using a one-time password system (such as SecureID). If you are using such a system, you will be vulnerable to replay attacks unless you also use the max_user_ip ACL in an http_access rule.

	"casesensitive" on|off
	Specifies if usernames are case sensitive. Most user databases are case insensitive allowing the same username to be spelled using both lower and upper case letters, but some are case

@@ -5098,117 +5116,135 @@
	startup= Sets a minimum of how many processes are to be spawned when Squid starts or reconfigures. When set to zero the first request will cause spawning of the first child process to handle it. Starting too few will cause an initial slowdown in traffic as Squid attempts to simultaneously spawn enough processes to cope.

	idle= Sets a minimum of how many processes Squid is to try and keep available at all times. When traffic begins to rise above what the existing processes can handle this many more will be spawned up to the maximum configured. A minimum setting of 1 is required.

	concurrency= The number of requests each redirector helper can handle in parallel. Defaults to 0, which indicates the redirector is an old-style single-threaded redirector.
When this directive is set to a value >= 1 then the protocol used to communicate with the helper is modified to include an ID in front of the request/response. The ID from the request must be echoed back with the response to that request.

	queue-size=N

-	Sets the maximum number of queued requests.
-	If the queued requests exceed queue size and redirector_bypass
-	configuration option is set, then redirector is bypassed. Otherwise, if
-	overloading persists squid may abort its operation.
-	The default value is set to 2*numberofchildren.
+	Sets the maximum number of queued requests to N. The default maximum
+	is 2*numberofchildren. If the queued requests exceed queue size and
+	redirector_bypass configuration option is set, then redirector is bypassed.
+	Otherwise, Squid is allowed to temporarily exceed the configured maximum,
+	marking the affected helper as "overloaded". If the helper overload lasts
+	more than 3 minutes, the action prescribed by the on-persistent-overload
+	option applies.
+
+	on-persistent-overload=action
+
+	Specifies Squid reaction to a new helper request arriving when the helper
+	has been overloaded for more than 3 minutes already. The number of queued
+	requests determines whether the helper is overloaded (see the queue-size
+	option).
+
+	Two actions are supported:
+
+	  die	Squid worker quits. This is the default behavior.
+
+	  err	Squid treats the helper request as if it was
+		immediately submitted, and the helper immediately
+		responded with an error. This action has no effect on
+		the already queued and in-progress helper requests.

DOC_END

NAME: url_rewrite_host_header redirect_rewrites_host_header
TYPE: onoff
DEFAULT: on
LOC: Config.onoff.redir_rewrites_host
DOC_START
	To preserve same-origin security policies in browsers and prevent Host: header forgery by redirectors, Squid rewrites any Host: header in redirected requests. If you are running an accelerator this may not be a wanted effect of a redirector.
This directive enables you to disable Host: alteration in reverse-proxy traffic.

	WARNING: Entries are cached on the result of the URL rewriting process, so be careful if you have domain-virtual hosts.

	WARNING: Squid and other software verify that the URL and Host match, so be careful not to relay through other proxies or inspecting firewalls with this disabled.
DOC_END

NAME: url_rewrite_access redirector_access
TYPE: acl_access
DEFAULT: none
DEFAULT_DOC: Allow, unless rules exist in squid.conf.
LOC: Config.accessList.redirector
DOC_START
	If defined, this access list specifies which requests are sent to the redirector processes.

	This clause supports both fast and slow acl types. See http://wiki.squid-cache.org/SquidFaq/SquidAcl for details.
DOC_END

NAME: url_rewrite_bypass redirector_bypass
TYPE: onoff
LOC: Config.onoff.redirector_bypass
DEFAULT: off
DOC_START
-	When this is 'on', a request will not go through the
-	redirector if all the helpers are busy. If this is 'off'
-	and the redirector queue grows too large, Squid will exit
-	with a FATAL error and ask you to increase the number of
-	redirectors. You should only enable this if the redirectors
-	are not critical to your caching system. If you use
+	When this is 'on', a request will not go through the
+	redirector if all the helpers are busy. If this is 'off' and the
+	redirector queue grows too large, the action is prescribed by the
+	on-persistent-overload option. You should only enable this if the
+	redirectors are not critical to your caching system. If you use
	redirectors for access control, and you enable this option, users may have access to pages they should not be allowed to request.

	This option sets the default queue-size option of url_rewrite_children to 0.
+
DOC_END

NAME: url_rewrite_extras
TYPE: TokenOrQuotedString
LOC: Config.redirector_extras
DEFAULT: "%>a/%>A %un %>rm myip=%la myport=%lp"
DOC_START
	Specifies a string to be appended to the request line format for the rewriter helper.
"Quoted" format values may contain spaces and logformat %macros. In theory, any logformat %macro can be used. In practice, a %macro expands as a dash (-) if the helper request is sent before the required macro information is available to Squid.
DOC_END

NAME: url_rewrite_timeout
TYPE: UrlHelperTimeout
LOC: Config.onUrlRewriteTimeout
DEFAULT: none
DEFAULT_DOC: Squid waits for the helper response forever
DOC_START
	Squid times active requests to the redirector. The timeout value and Squid's reaction to a timed-out request are configurable using the following format:

	url_rewrite_timeout timeout time-units on_timeout=<action> [response=<quoted-response>]

	supported timeout actions:
	fail	Squid returns an ERR_GATEWAY_FAILURE error page
	bypass	Do not re-write the URL

@@ -5305,92 +5341,109 @@
	startup= Sets a minimum of how many processes are to be spawned when Squid starts or reconfigures. When set to zero the first request will cause spawning of the first child process to handle it. Starting too few will cause an initial slowdown in traffic as Squid attempts to simultaneously spawn enough processes to cope.

	idle= Sets a minimum of how many processes Squid is to try and keep available at all times. When traffic begins to rise above what the existing processes can handle this many more will be spawned up to the maximum configured. A minimum setting of 1 is required.

	concurrency= The number of requests each storeID helper can handle in parallel. Defaults to 0, which indicates the helper is an old-style single-threaded program. When this directive is set to a value >= 1 then the protocol used to communicate with the helper is modified to include an ID in front of the request/response. The ID from the request must be echoed back with the response to that request.

	queue-size=N

-	Sets the maximum number of queued requests.
-	If the queued requests exceed queue size and store_id_bypass
-	configuration option is set, then storeID helper is bypassed.
Otherwise,
-	if overloading persists squid may abort its operation.
-	The default value is set to 2*numberofchildren.
+	Sets the maximum number of queued requests to N. The default maximum
+	is 2*numberofchildren. If the queued requests exceed queue size and
+	store_id_bypass configuration option is set, then the storeID helper is
+	bypassed. Otherwise, Squid is allowed to temporarily exceed the configured
+	maximum, marking the affected helper as "overloaded". If the helper
+	overload lasts more than 3 minutes, the action prescribed by the
+	on-persistent-overload option applies.
+
+	on-persistent-overload=action
+
+	Specifies Squid reaction to a new helper request arriving when the helper
+	has been overloaded for more than 3 minutes already. The number of queued
+	requests determines whether the helper is overloaded (see the queue-size
+	option).
+
+	Two actions are supported:
+
+	  die	Squid worker quits. This is the default behavior.
+
+	  err	Squid treats the helper request as if it was
+		immediately submitted, and the helper immediately
+		responded with an error. This action has no effect on
+		the already queued and in-progress helper requests.

DOC_END

NAME: store_id_access storeurl_rewrite_access
TYPE: acl_access
DEFAULT: none
DEFAULT_DOC: Allow, unless rules exist in squid.conf.
LOC: Config.accessList.store_id
DOC_START
	If defined, this access list specifies which requests are sent to the StoreID processes. By default all requests are sent.

	This clause supports both fast and slow acl types. See http://wiki.squid-cache.org/SquidFaq/SquidAcl for details.
DOC_END

NAME: store_id_bypass storeurl_rewrite_bypass
TYPE: onoff
LOC: Config.onoff.store_id_bypass
DEFAULT: on
DOC_START
-	When this is 'on', a request will not go through the
-	helper if all helpers are busy. If this is 'off'
-	and the helper queue grows too large, Squid will exit
-	with a FATAL error and ask you to increase the number of
-	helpers.
You should only enable this if the helperss
-	are not critical to your caching system. If you use
+	When this is 'on', a request will not go through the
+	helper if all helpers are busy. If this is 'off' and the helper
+	queue grows too large, the action is prescribed by the
+	on-persistent-overload option. You should only enable this if the
+	helpers are not critical to your caching system. If you use
	helpers for critical caching components, and you enable this option, users may not get objects from cache.

	This option sets the default queue-size option of store_id_children to 0.
DOC_END

COMMENT_START
 OPTIONS FOR TUNING THE CACHE
 -----------------------------------------------------------------------------
COMMENT_END

NAME: cache no_cache
TYPE: acl_access
DEFAULT: none
DEFAULT_DOC: By default, this directive is unused and has no effect.
LOC: Config.accessList.noCache
DOC_START
	Requests denied by this directive will not be served from the cache and their responses will not be stored in the cache. This directive has no effect on other transactions and on already cached responses.

	This clause supports both fast and slow acl types. See http://wiki.squid-cache.org/SquidFaq/SquidAcl for details.

	This and the two other similar caching directives listed below are checked at different transaction processing stages, have different access to response information, affect different cache operations, and differ in slow ACLs support:

	* cache: Checked before Squid makes a hit/miss determination.
=== modified file 'src/external_acl.cc' --- src/external_acl.cc 2016-03-11 15:03:20 +0000 +++ src/external_acl.cc 2016-08-09 04:41:46 +0000 @@ -586,75 +586,76 @@ if (ti != ACCESS_ALLOWED) { debugs(82, 2, HERE << acl->def->name << " user not authenticated (" << ti << ")"); return ti; } debugs(82, 3, HERE << acl->def->name << " user is authenticated."); } #endif const char *key = makeExternalAclKey(ch, acl); if (!key) { /* Not sufficient data to process */ return ACCESS_DUNNO; } entry = static_cast<ExternalACLEntry *>(hash_lookup(acl->def->cache, key)); const ExternalACLEntryPointer staleEntry = entry; if (entry != NULL && external_acl_entry_expired(acl->def, entry)) entry = NULL; if (entry != NULL && external_acl_grace_expired(acl->def, entry)) { // refresh in the background ExternalACLLookup::Start(ch, acl, true); debugs(82, 4, HERE << "no need to wait for the refresh of '" << key << "' in '" << acl->def->name << "' (ch=" << ch << ")."); } if (!entry) { debugs(82, 2, HERE << acl->def->name << "(\"" << key << "\") = lookup needed"); - if (!acl->def->theHelper->queueFull()) { + // TODO: All other helpers allow temporary overload. Should not we? + if (!acl->def->theHelper->willOverload()) { debugs(82, 2, HERE << "\"" << key << "\": queueing a call."); if (!ch->goAsync(ExternalACLLookup::Instance())) debugs(82, 2, "\"" << key << "\": no async support!"); debugs(82, 2, HERE << "\"" << key << "\": return -1."); return ACCESS_DUNNO; // expired cached or simply absent entry } else { if (!staleEntry) { debugs(82, DBG_IMPORTANT, "WARNING: external ACL '" << acl->def->name << - "' queue overload. Request rejected '" << key << "'."); + "' queue full. Request rejected '" << key << "'."); external_acl_message = "SYSTEM TOO BUSY, TRY AGAIN LATER"; return ACCESS_DUNNO; } else { debugs(82, DBG_IMPORTANT, "WARNING: external ACL '" << acl->def->name << - "' queue overload. Using stale result. '" << key << "'."); + "' queue full. Using stale result. 
'" << key << "'."); entry = staleEntry; /* Fall thru to processing below */ } } } } debugs(82, 4, HERE << "entry = { date=" << (long unsigned int) entry->date << ", result=" << entry->result << " tag=" << entry->tag << " log=" << entry->log << " }"); #if USE_AUTH debugs(82, 4, HERE << "entry user=" << entry->user); #endif external_acl_cache_touch(acl->def, entry); external_acl_message = entry->message.termedBuf(); debugs(82, 2, HERE << acl->def->name << " = " << entry->result); copyResultsFromEntry(ch->request, entry); return entry->result; } int ACLExternal::match(ACLChecklist *checklist) { allow_t answer = aclMatchExternal(data, Filled(checklist)); // convert to tri-state ACL match 1,0,-1 === modified file 'src/helper.cc' --- src/helper.cc 2016-07-29 08:31:12 +0000 +++ src/helper.cc 2016-08-09 09:20:11 +0000 @@ -18,62 +18,62 @@ #include "fd.h" #include "fde.h" #include "format/Quoting.h" #include "helper.h" #include "helper/Reply.h" #include "helper/Request.h" #include "MemBuf.h" #include "SquidConfig.h" #include "SquidIpc.h" #include "SquidMath.h" #include "SquidTime.h" #include "Store.h" #include "wordlist.h" // helper_stateful_server::data uses explicit alloc()/freeOne() */ #include "mem/Pool.h" #define HELPER_MAX_ARGS 64 /// The maximum allowed request retries. #define MAX_RETRIES 2 /// Helpers input buffer size. 
const size_t ReadBufSize(32*1024); static IOCB helperHandleRead; static IOCB helperStatefulHandleRead; static void helperServerFree(helper_server *srv); static void helperStatefulServerFree(helper_stateful_server *srv); static void Enqueue(helper * hlp, Helper::Xaction *); -static helper_server *GetFirstAvailable(helper * hlp); -static helper_stateful_server *StatefulGetFirstAvailable(statefulhelper * hlp); +static helper_server *GetFirstAvailable(const helper * hlp); +static helper_stateful_server *StatefulGetFirstAvailable(const statefulhelper * hlp); static void helperDispatch(helper_server * srv, Helper::Xaction * r); static void helperStatefulDispatch(helper_stateful_server * srv, Helper::Xaction * r); static void helperKickQueue(helper * hlp); static void helperStatefulKickQueue(statefulhelper * hlp); static void helperStatefulServerDone(helper_stateful_server * srv); static void StatefulEnqueue(statefulhelper * hlp, Helper::Xaction * r); CBDATA_CLASS_INIT(helper); CBDATA_CLASS_INIT(helper_server); CBDATA_CLASS_INIT(statefulhelper); CBDATA_CLASS_INIT(helper_stateful_server); InstanceIdDefinitions(HelperServerBase, "Hlpr"); void HelperServerBase::initStats() { stats.uses=0; stats.replies=0; stats.pending=0; stats.releases=0; stats.timedout = 0; } void HelperServerBase::closePipesSafely(const char *id_name) { #if _SQUID_WINDOWS_ shutdown(writePipe->fd, SD_BOTH); #endif @@ -352,164 +352,209 @@ commSetNonBlocking(rfd); if (wfd != rfd) commSetNonBlocking(wfd); AsyncCall::Pointer closeCall = asyncCall(5,4, "helperStatefulServerFree", cbdataDialer(helperStatefulServerFree, srv)); comm_add_close_handler(rfd, closeCall); AsyncCall::Pointer call = commCbCall(5,4, "helperStatefulHandleRead", CommIoCbPtrFun(helperStatefulHandleRead, srv)); comm_read(srv->readPipe, srv->rbuf, srv->rbuf_sz - 1, call); } hlp->last_restart = squid_curtime; safe_free(shortname); safe_free(procname); helperStatefulKickQueue(hlp); } void helper::submitRequest(Helper::Xaction *r) { helper_server 
*srv; if ((srv = GetFirstAvailable(this))) helperDispatch(srv, r); else Enqueue(this, r); - if (!queueFull()) { - full_time = 0; - } else if (!full_time) { - debugs(84, 3, id_name << " queue became full"); - full_time = squid_curtime; - } + syncQueueStats(); +} + +/// handles helperSubmit() and helperStatefulSubmit() failures +static void +SubmissionFailure(helper *hlp, HLPCB *callback, void *data) +{ + if (!hlp) + debugs(84, 3, "no helper"); + // else the helper has already reported the error + + Helper::Reply nilReply(Helper::Unknown); + callback(data, nilReply); } void helperSubmit(helper * hlp, const char *buf, HLPCB * callback, void *data) { - if (hlp == NULL) { - debugs(84, 3, "helperSubmit: hlp == NULL"); - Helper::Reply const nilReply(Helper::Unknown); - callback(data, nilReply); - return; - } - hlp->prepSubmit(); - hlp->submit(buf, callback, data); + if (!hlp || !hlp->trySubmit(buf, callback, data)) + SubmissionFailure(hlp, callback, data); } +/// whether queuing an additional request would overload the helper bool helper::queueFull() const { + return stats.queue_size >= static_cast<int>(childs.queue_size); +} + +bool +helper::overloaded() const { return stats.queue_size > static_cast<int>(childs.queue_size); } -/// prepares the helper for request submission via trySubmit() or helperSubmit() -/// currently maintains full_time and kills Squid if the helper remains full for too long +/// synchronizes queue-dependent measurements with the current queue state void +helper::syncQueueStats() +{ + if (overloaded()) { + if (overloadStart) { + debugs(84, 5, id_name << " still overloaded; dropped " << droppedRequests); + } else { + overloadStart = squid_curtime; + debugs(84, 3, id_name << " became overloaded"); + } + } else { + if (overloadStart) { + debugs(84, 5, id_name << " is no longer overloaded"); + if (droppedRequests) { + debugs(84, DBG_IMPORTANT, "helper " << id_name << + " is no longer overloaded after dropping " << droppedRequests << + " requests in " << 
(squid_curtime - overloadStart) << " seconds"); + droppedRequests = 0; + } + overloadStart = 0; + } + } +} + +/// prepares the helper for request submission +/// returns true if and only if the submission should proceed +/// may kill Squid if the helper remains overloaded for too long +bool helper::prepSubmit() { - if (!queueFull()) - full_time = 0; - else if (!full_time) // may happen here if reconfigure decreases capacity - full_time = squid_curtime; - else if (squid_curtime - full_time > 180) - fatalf("Too many queued %s requests", id_name); + // re-sync for the configuration may have changed since the last submission + syncQueueStats(); + + // Nothing special to do if the new request does not overload (i.e., the + // queue is not even full yet) or only _starts_ overloading this helper + // (i.e., the queue is currently at its limit). + if (!overloaded()) + return true; + + if (squid_curtime - overloadStart <= 180) + return true; // also OK: overload has not persisted long enough to panic + + if (childs.onPersistentOverload == Helper::ChildConfig::actDie) + fatalf("Too many queued %s requests; see on-persistent-overload.", id_name); + + if (!droppedRequests) { + debugs(84, DBG_IMPORTANT, "WARNING: dropping requests to overloaded " << + id_name << " helper configured with on-persistent-overload=err"); + } + ++droppedRequests; + debugs(84, 3, "failed to send " << droppedRequests << " helper requests to " << id_name); + return false; } bool helper::trySubmit(const char *buf, HLPCB * callback, void *data) { - prepSubmit(); - - if (queueFull()) { - debugs(84, DBG_IMPORTANT, id_name << " drops request due to a full queue"); - return false; // request was ignored - } + if (!prepSubmit()) + return false; // request was dropped submit(buf, callback, data); // will send or queue return true; // request submitted or queued } /// dispatches or enqueues a helper requests; does not enforce queue limits void helper::submit(const char *buf, HLPCB * callback, void *data) { 
Helper::Xaction *r = new Helper::Xaction(callback, data, buf); submitRequest(r); debugs(84, DBG_DATA, Raw("buf", buf, strlen(buf))); } /// lastserver = "server last used as part of a reserved request sequence" void helperStatefulSubmit(statefulhelper * hlp, const char *buf, HLPCB * callback, void *data, helper_stateful_server * lastserver) { - if (hlp == NULL) { - debugs(84, 3, "helperStatefulSubmit: hlp == NULL"); - Helper::Reply const nilReply(Helper::Unknown); - callback(data, nilReply); - return; - } - hlp->prepSubmit(); - hlp->submit(buf, callback, data, lastserver); + if (!hlp || !hlp->trySubmit(buf, callback, data, lastserver)) + SubmissionFailure(hlp, callback, data); +} + +/// If possible, submit request. Otherwise, either kill Squid or return false. +bool +statefulhelper::trySubmit(const char *buf, HLPCB * callback, void *data, helper_stateful_server *lastserver) +{ + if (!prepSubmit()) + return false; // request was dropped + + submit(buf, callback, data, lastserver); // will send or queue + return true; // request submitted or queued } void statefulhelper::submit(const char *buf, HLPCB * callback, void *data, helper_stateful_server * lastserver) { Helper::Xaction *r = new Helper::Xaction(callback, data, buf); if ((buf != NULL) && lastserver) { debugs(84, 5, "StatefulSubmit with lastserver " << lastserver); assert(lastserver->flags.reserved); assert(!lastserver->requests.size()); debugs(84, 5, "StatefulSubmit dispatching"); helperStatefulDispatch(lastserver, r); } else { helper_stateful_server *srv; if ((srv = StatefulGetFirstAvailable(this))) { helperStatefulDispatch(srv, r); } else StatefulEnqueue(this, r); } debugs(84, DBG_DATA, "placeholder: '" << r->request.placeholder << "', " << Raw("buf", buf, (!buf?0:strlen(buf)))); - if (!queueFull()) { - full_time = 0; - } else if (!full_time) { - debugs(84, 3, id_name << " queue became full"); - full_time = squid_curtime; - } + syncQueueStats(); } /** * DPW 2007-05-08 * * helperStatefulReleaseServer tells the 
helper that whoever was * using it no longer needs its services. */ void helperStatefulReleaseServer(helper_stateful_server * srv) { debugs(84, 3, HERE << "srv-" << srv->index << " flags.reserved = " << srv->flags.reserved); if (!srv->flags.reserved) return; ++ srv->stats.releases; srv->flags.reserved = false; helperStatefulServerDone(srv); } /** return a pointer to the stateful routines data area */ void * helperStatefulServerGetData(helper_stateful_server * srv) { return srv->data; } void @@ -543,60 +588,65 @@ assert(srv); Helper::Xaction *xaction = srv->requests.empty() ? NULL : srv->requests.front(); double tt = 0.001 * (xaction ? tvSubMsec(xaction->request.dispatch_time, current_time) : tvSubMsec(srv->dispatch_time, srv->answer_time)); p->appendf("%7u\t%7d\t%7d\t%11" PRIu64 "\t%11" PRIu64 "\t%11" PRIu64 "\t%c%c%c%c%c%c\t%7.3f\t%7d\t%s\n", srv->index.value, srv->readPipe->fd, srv->pid, srv->stats.uses, srv->stats.replies, srv->stats.timedout, srv->stats.pending ? 'B' : ' ', srv->flags.writing ? 'W' : ' ', srv->flags.closing ? 'C' : ' ', srv->flags.reserved ? 'R' : ' ', srv->flags.shutdown ? 'S' : ' ', xaction && xaction->request.placeholder ? 'P' : ' ', tt < 0.0 ? 0.0 : tt, (int) srv->roffset, xaction ? 
Format::QuoteMimeBlob(xaction->request.buf) : "(none)"); } p->append("\nFlags key:\n" " B\tBUSY\n" " W\tWRITING\n" " C\tCLOSING\n" " R\tRESERVED\n" " S\tSHUTDOWN PENDING\n" " P\tPLACEHOLDER\n", 101); } +bool +helper::willOverload() const { + return queueFull() && !(childs.needNew() || GetFirstAvailable(this)); +} + void helperShutdown(helper * hlp) { dlink_node *link = hlp->servers.head; while (link) { helper_server *srv; srv = (helper_server *)link->data; link = link->next; if (srv->flags.shutdown) { debugs(84, 3, "helperShutdown: " << hlp->id_name << " #" << srv->index << " has already SHUT DOWN."); continue; } assert(hlp->childs.n_active > 0); -- hlp->childs.n_active; srv->flags.shutdown = true; /* request it to shut itself down */ if (srv->flags.closing) { debugs(84, 3, "helperShutdown: " << hlp->id_name << " #" << srv->index << " is CLOSING."); continue; } if (srv->stats.pending) { debugs(84, 3, "helperShutdown: " << hlp->id_name << " #" << srv->index << " is BUSY."); continue; } debugs(84, 3, "helperShutdown: " << hlp->id_name << " #" << srv->index << " shutting down."); @@ -1159,108 +1209,107 @@ if (hlp->stats.queue_size < (int)hlp->childs.queue_size) return; if (squid_curtime - hlp->last_queue_warn < 600) return; if (shutting_down || reconfiguring) return; hlp->last_queue_warn = squid_curtime; debugs(84, DBG_CRITICAL, "WARNING: All " << hlp->childs.n_active << "/" << hlp->childs.n_max << " " << hlp->id_name << " processes are busy."); debugs(84, DBG_CRITICAL, "WARNING: " << hlp->stats.queue_size << " pending requests queued"); debugs(84, DBG_CRITICAL, "WARNING: Consider increasing the number of " << hlp->id_name << " processes in your config file."); } Helper::Xaction * helper::nextRequest() { if (queue.empty()) return nullptr; auto *r = queue.front(); queue.pop(); --stats.queue_size; return r; } static helper_server * -GetFirstAvailable(helper * hlp) +GetFirstAvailable(const helper * hlp) { dlink_node *n; helper_server *srv; helper_server *selected = NULL; 
     debugs(84, 5, "GetFirstAvailable: Running servers " << hlp->childs.n_running);
 
     if (hlp->childs.n_running == 0)
         return NULL;
 
     /* Find "least" loaded helper (approx) */
     for (n = hlp->servers.head; n != NULL; n = n->next) {
         srv = (helper_server *)n->data;
 
         if (selected && selected->stats.pending <= srv->stats.pending)
             continue;
 
         if (srv->flags.shutdown)
             continue;
 
         if (!srv->stats.pending)
             return srv;
 
         if (selected) {
             selected = srv;
             break;
         }
 
         selected = srv;
     }
 
-    /* Check for overload */
     if (!selected) {
         debugs(84, 5, "GetFirstAvailable: None available.");
         return NULL;
     }
 
     if (selected->stats.pending >= (hlp->childs.concurrency ? hlp->childs.concurrency : 1)) {
-        debugs(84, 3, "GetFirstAvailable: Least-loaded helper is overloaded!");
+        debugs(84, 3, "GetFirstAvailable: Least-loaded helper is fully loaded!");
         return NULL;
     }
 
     debugs(84, 5, "GetFirstAvailable: returning srv-" << selected->index);
 
     return selected;
 }
 
 static helper_stateful_server *
-StatefulGetFirstAvailable(statefulhelper * hlp)
+StatefulGetFirstAvailable(const statefulhelper * hlp)
 {
     dlink_node *n;
     helper_stateful_server *srv = NULL;
     debugs(84, 5, "StatefulGetFirstAvailable: Running servers " << hlp->childs.n_running);
 
     if (hlp->childs.n_running == 0)
         return NULL;
 
     for (n = hlp->servers.head; n != NULL; n = n->next) {
         srv = (helper_stateful_server *)n->data;
 
         if (srv->stats.pending)
             continue;
 
         if (srv->flags.reserved)
             continue;
 
         if (srv->flags.shutdown)
             continue;
 
         debugs(84, 5, "StatefulGetFirstAvailable: returning srv-" << srv->index);
         return srv;
     }
 
     debugs(84, 5, "StatefulGetFirstAvailable: None available.");
     return NULL;
 }
 
 static void
 helperDispatchWriteDone(const Comm::ConnectionPointer &, char *, size_t, Comm::Flag flag, int, void *data)

=== modified file 'src/helper.h'
--- src/helper.h 2016-07-29 08:31:12 +0000
+++ src/helper.h 2016-08-09 05:05:15 +0000
@@ -22,149 +22,155 @@
 #include "helper/Request.h"
 #include "ip/Address.h"
 #include "sbuf/SBuf.h"
 
 #include <list>
 #include <map>
 #include <queue>
 
 class Packable;
 class wordlist;
 
 namespace Helper
 {
 
 /// Holds the required data to serve a helper request.
 class Xaction
 {
     MEMPROXY_CLASS(Helper::Xaction);
 public:
     Xaction(HLPCB *c, void *d, const char *b): request(c, d, b) {}
     Helper::Request request;
     Helper::Reply reply;
 };
 
 }
 
 /**
  * Managers a set of individual helper processes with a common queue of requests.
  *
  * With respect to load, a helper goes through these states (roughly):
  *   idle:   no processes are working on requests (and no requests are queued);
  *   normal: some, but not all processes are working (and no requests are queued);
  *   busy:   all processes are working (and some requests are possibly queued);
- *   full:   all processes are working and at least 2*#processes requests are queued.
+ *   overloaded: a busy helper with more than queue-size requests in the queue.
  *
- * A "busy" helper queues new requests and issues a WARNING every 10 minutes or so.
- * A "full" helper either drops new requests or keeps queuing them, depending on
+ * A busy helper queues new requests and issues a WARNING every 10 minutes or so.
+ * An overloaded helper either drops new requests or keeps queuing them, depending on
  * whether the caller can handle dropped requests (trySubmit vs helperSubmit APIs).
- * An attempt to use a "full" helper that has been "full" for 3+ minutes kills worker.
- * Given enough load, all helpers except for external ACL will make such attempts.
+ * If an overloaded helper has been overloaded for 3+ minutes, an attempt to use
+ * it results in on-persistent-overload action, which may kill worker.
  */
class helper
{
    CBDATA_CLASS(helper);

public:
    inline helper(const char *name) :
        cmdline(NULL),
        id_name(name),
        ipc_type(0),
-        full_time(0),
+        droppedRequests(0),
+        overloadStart(0),
        last_queue_warn(0),
        last_restart(0),
        timeout(0),
        retryTimedOut(false),
        retryBrokenHelper(false),
        eom('\n') {
        memset(&stats, 0, sizeof(stats));
    }

    ~helper();

-    /// whether at least one more request can be successfully submitted
-    bool queueFull() const;
-
    /// \returns next request in the queue, or nil.
    Helper::Xaction *nextRequest();

-    ///< If not full, submit request. Otherwise, either kill Squid or return false.
+    /// If possible, submit request. Otherwise, either kill Squid or return false.
    bool trySubmit(const char *buf, HLPCB * callback, void *data);

    /// Submits a request to the helper or add it to the queue if none of
    /// the servers is available.
    void submitRequest(Helper::Xaction *r);

    /// Dump some stats about the helper state to a Packable object
    void packStatsInto(Packable *p, const char *label = NULL) const;

+    /// whether the helper will be in "overloaded" state after one more request
+    /// already overloaded helpers return true
+    bool willOverload() const;
+
public:
    wordlist *cmdline;
    dlink_list servers;
    std::queue<Helper::Xaction *> queue;
    const char *id_name;
    Helper::ChildConfig childs;    ///< Configuration settings for number running.
    int ipc_type;
    Ip::Address addr;
-    time_t full_time; ///< when a full helper became full (zero for non-full helpers)
+    unsigned int droppedRequests; ///< requests not sent during helper overload
+    time_t overloadStart; ///< when the helper became overloaded (zero if it is not)
    time_t last_queue_warn;
    time_t last_restart;
    time_t timeout; ///< Requests timeout
    bool retryTimedOut; ///< Whether the timed-out requests must retried
    bool retryBrokenHelper; ///< Whether the requests must retried on BH replies
    SBuf onTimedOutResponse; ///< The response to use when helper response timedout
    char eom;   ///< The char which marks the end of (response) message, normally '\n'

    struct _stats {
        int requests;
        int replies;
        int timedout;
        int queue_size;
        int avg_svc_time;
    } stats;

protected:
    friend void helperSubmit(helper * hlp, const char *buf, HLPCB * callback, void *data);
-    void prepSubmit();
+    bool queueFull() const;
+    bool overloaded() const;
+    void syncQueueStats();
+    bool prepSubmit();
    void submit(const char *buf, HLPCB * callback, void *data);
};

class statefulhelper : public helper
{
    CBDATA_CLASS(statefulhelper);

public:
    inline statefulhelper(const char *name) : helper(name), datapool(NULL) {}
    inline ~statefulhelper() {}

public:
    MemAllocator *datapool;

private:
    friend void helperStatefulSubmit(statefulhelper * hlp, const char *buf, HLPCB * callback, void *data, helper_stateful_server * lastserver);
    void submit(const char *buf, HLPCB * callback, void *data, helper_stateful_server *lastserver);
+    bool trySubmit(const char *buf, HLPCB * callback, void *data, helper_stateful_server *lastserver);
};

/**
 * Fields shared between stateless and stateful helper servers.
 */
class HelperServerBase
{
public:
    /** Closes pipes to the helper safely.
     * Handles the case where the read and write pipes are the same FD.
     *
     * \param name displayed for the helper being shutdown if logging an error
     */
    void closePipesSafely(const char *name);

    /** Closes the reading pipe.
     * If the read and write sockets are the same the write pipe will
     * also be closed. Otherwise its left open for later handling.
     *
     * \param name displayed for the helper being shutdown if logging an error
     */
    void closeWritePipeSafely(const char *name);

public:
    /// Helper program identifier; does not change when contents do,
    /// including during assignment
    const InstanceId<HelperServerBase> index;
    int pid;
    Ip::Address addr;
    Comm::ConnectionPointer readPipe;

=== modified file 'src/helper/ChildConfig.cc'
--- src/helper/ChildConfig.cc 2016-01-01 00:12:18 +0000
+++ src/helper/ChildConfig.cc 2016-08-09 04:41:46 +0000
@@ -1,120 +1,133 @@
 /*
  * Copyright (C) 1996-2016 The Squid Software Foundation and contributors
  *
  * Squid software is distributed under GPLv2+ license and includes
  * contributions from numerous individuals and organizations.
  * Please see the COPYING and CONTRIBUTORS files for details.
  */
 
 #include "squid.h"
 #include "cache_cf.h"
 #include "ConfigParser.h"
 #include "Debug.h"
 #include "globals.h"
 #include "helper/ChildConfig.h"
 #include "Parsing.h"
 
 #include <cstring>
 
 Helper::ChildConfig::ChildConfig():
     n_max(0),
     n_startup(0),
     n_idle(1),
     concurrency(0),
     n_running(0),
     n_active(0),
     queue_size(0),
+    onPersistentOverload(actDie),
     defaultQueueSize(true)
 {}
 
 Helper::ChildConfig::ChildConfig(const unsigned int m):
     n_max(m),
     n_startup(0),
     n_idle(1),
     concurrency(0),
     n_running(0),
     n_active(0),
     queue_size(2 * m),
+    onPersistentOverload(actDie),
     defaultQueueSize(true)
 {}
 
 Helper::ChildConfig &
 Helper::ChildConfig::updateLimits(const Helper::ChildConfig &rhs)
 {
     // Copy the limits only.
     // Preserve the local state values (n_running and n_active)
     n_max = rhs.n_max;
     n_startup = rhs.n_startup;
     n_idle = rhs.n_idle;
     concurrency = rhs.concurrency;
     queue_size = rhs.queue_size;
+    onPersistentOverload = rhs.onPersistentOverload;
     defaultQueueSize = rhs.defaultQueueSize;
 
     return *this;
 }
 
 int
 Helper::ChildConfig::needNew() const
 {
     /* during the startup and reconfigure use our special amount...
 */
     if (starting_up || reconfiguring)
         return n_startup;
 
     /* keep a minimum of n_idle helpers free... */
     if ( (n_active + n_idle) < n_max) return n_idle;
 
     /* dont ever start more than n_max processes. */
     return (n_max - n_active);
 }
 
 void
 Helper::ChildConfig::parseConfig()
 {
     char const *token = ConfigParser::NextToken();
 
     if (!token)
         self_destruct();
 
     /* starts with a bare number for the max... back-compatible */
     n_max = xatoui(token);
 
     if (n_max < 1) {
         debugs(0, DBG_CRITICAL, "ERROR: The maximum number of processes cannot be less than 1.");
         self_destruct();
     }
 
     /* Parse extension options */
     for (; (token = ConfigParser::NextToken()) ;) {
 
         if (strncmp(token, "startup=", 8) == 0) {
             n_startup = xatoui(token + 8);
         } else if (strncmp(token, "idle=", 5) == 0) {
             n_idle = xatoui(token + 5);
 
             if (n_idle < 1) {
                 debugs(0, DBG_CRITICAL, "WARNING OVERIDE: Using idle=0 for helpers causes request failures. Overiding to use idle=1 instead.");
                 n_idle = 1;
             }
 
         } else if (strncmp(token, "concurrency=", 12) == 0) {
             concurrency = xatoui(token + 12);
         } else if (strncmp(token, "queue-size=", 11) == 0) {
             queue_size = xatoui(token + 11);
             defaultQueueSize = false;
+        } else if (strncmp(token, "on-persistent-overload=", 23) == 0) {
+            const SBuf action(token + 23);
+            if (action.cmp("err") == 0)
+                onPersistentOverload = actErr;
+            else if (action.cmp("die") == 0)
+                onPersistentOverload = actDie;
+            else {
+                debugs(0, DBG_CRITICAL, "ERROR: Unsupported on-persistent-overloaded action: " << action);
+                self_destruct();
+            }
         } else {
             debugs(0, DBG_PARSE_NOTE(DBG_IMPORTANT), "ERROR: Undefined option: " << token << ".");
             self_destruct();
         }
     }
 
     /* simple sanity.
 */
 
     if (n_startup > n_max) {
         debugs(0, DBG_CRITICAL, "WARNING OVERIDE: Capping startup=" << n_startup << " to the defined maximum (" << n_max <<")");
         n_startup = n_max;
     }
 
     if (n_idle > n_max) {
         debugs(0, DBG_CRITICAL, "WARNING OVERIDE: Capping idle=" << n_idle << " to the defined maximum (" << n_max <<")");
         n_idle = n_max;
     }
 
     if (defaultQueueSize)
         queue_size = 2 * n_max;
 }

=== modified file 'src/helper/ChildConfig.h'
--- src/helper/ChildConfig.h 2016-01-01 00:12:18 +0000
+++ src/helper/ChildConfig.h 2016-08-09 04:41:46 +0000
@@ -63,47 +63,55 @@
      * The default value for backward compatibility the default for this is the same as maximum children.
      * For now the actual number of idle children is only reduced by a reconfigure operation. This may change.
      */
     unsigned int n_idle;
 
     /**
      * How many concurrent requests each child helper may be capable of handling.
      * Default: 0 - no concurrency possible.
      */
     unsigned int concurrency;
 
     /* derived from active operations */
 
     /**
      * Total helper children objects currently existing.
      * Produced as a side effect of starting children or their stopping.
      */
     unsigned int n_running;
 
     /**
      * Count of helper children active (not shutting down).
      * This includes both idle and in-use children.
      */
     unsigned int n_active;
 
     /**
      * The requests queue size. By default it is of size 2*n_max
      */
     unsigned int queue_size;
 
+    /// how to handle a serious problem with a helper request submission
+    enum SubmissionErrorHandlingAction {
+        actDie, ///< kill the caller process (i.e., Squid worker)
+        actErr ///< drop the request and send an error to the caller
+    };
+    /// how to handle a new request for helper that was overloaded for too long
+    SubmissionErrorHandlingAction onPersistentOverload;
+
     /**
      * True if the default queue size is used.
      * Needed in the cases where we need to adjust default queue_size in
     * special configurations, for example when redirector_bypass is used.
      */
     bool defaultQueueSize;
 };
 
 } // namespace Helper
 
 /* Legacy parser interface */
 #define parse_HelperChildConfig(c) (c)->parseConfig()
 #define dump_HelperChildConfig(e,n,c) storeAppendPrintf((e), "\n%s %d startup=%d idle=%d concurrency=%d\n", (n), (c).n_max, (c).n_startup, (c).n_idle, (c).concurrency)
 #define free_HelperChildConfig(dummy) // NO.
 
 #endif /* _SQUID_SRC_HELPER_CHILDCONFIG_H */

=== modified file 'src/redirect.cc'
--- src/redirect.cc 2016-07-25 10:22:54 +0000
+++ src/redirect.cc 2016-08-09 04:41:46 +0000
@@ -268,85 +268,87 @@
                                       http->request->method, NULL,
                                       http->getConn() != NULL && http->getConn()->clientConnection != NULL ?
                                       http->getConn()->clientConnection->remote : tmpnoaddr,
                                       http->request,
                                       NULL,
 #if USE_AUTH
                                       http->getConn() != NULL && http->getConn()->getAuth() != NULL ?
                                       http->getConn()->getAuth() : http->request->auth_user_request);
 #else
                                       NULL);
 #endif
 
         node = (clientStreamNode *)http->client_stream.tail->data;
         clientStreamRead(node, http, node->readBuffer);
         return;
     }
 
     debugs(61,6, HERE << "sending '" << buf << "' to the " << name << " helper");
     helperSubmit(hlp, buf, replyHandler, r);
 }
 
 /**** PUBLIC FUNCTIONS ****/
 
 void
 redirectStart(ClientHttpRequest * http, HLPCB * handler, void *data)
 {
     assert(http);
     assert(handler);
     debugs(61, 5, "redirectStart: '" << http->uri << "'");
 
-    if (Config.onoff.redirector_bypass && redirectors->queueFull()) {
+    // TODO: Deprecate Config.onoff.redirector_bypass in favor of either
+    // onPersistentOverload or a new onOverload option that applies to all helpers.
+    if (Config.onoff.redirector_bypass && redirectors->willOverload()) {
         /* Skip redirector if the queue is full */
         ++redirectorBypassed;
         Helper::Reply bypassReply;
         bypassReply.result = Helper::Okay;
         bypassReply.notes.add("message","URL rewrite/redirect queue too long. Bypassed.");
         handler(data, bypassReply);
         return;
     }
 
     constructHelperQuery("redirector", redirectors, redirectHandleReply, http, handler, data, redirectorExtrasFmt);
 }
 
 /**
  * Handles the StoreID feature helper starting.
  * For now it cannot be done using the redirectStart method.
  */
 void
 storeIdStart(ClientHttpRequest * http, HLPCB * handler, void *data)
 {
     assert(http);
     assert(handler);
     debugs(61, 5, "storeIdStart: '" << http->uri << "'");
 
-    if (Config.onoff.store_id_bypass && storeIds->queueFull()) {
+    if (Config.onoff.store_id_bypass && storeIds->willOverload()) {
         /* Skip StoreID Helper if the queue is full */
         ++storeIdBypassed;
         Helper::Reply bypassReply;
 
         bypassReply.result = Helper::Okay;
 
         bypassReply.notes.add("message","StoreId helper queue too long. Bypassed.");
         handler(data, bypassReply);
         return;
     }
 
     constructHelperQuery("storeId helper", storeIds, storeIdHandleReply, http, handler, data, storeIdExtrasFmt);
 }
 
 void
 redirectInit(void)
 {
     static bool init = false;
 
     if (!init) {
         Mgr::RegisterAction("redirector", "URL Redirector Stats", redirectStats, 0, 1);
         Mgr::RegisterAction("store_id", "StoreId helper Stats", storeIdStats, 0, 1);
     }
 
     if (Config.Program.redirect) {
         if (redirectors == NULL)
             redirectors = new helper("redirector");
 
         redirectors->cmdline = Config.Program.redirect;
_______________________________________________ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
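For readers skimming the patch, the overload states and the new on-persistent-overload actions can be sketched as a toy model. This is a hypothetical, heavily simplified `MiniHelper` type (its field and method names are illustrative, not Squid's): the real code tracks `overloadStart`/`droppedRequests`, consults `GetFirstAvailable()` and `childs.needNew()`, and uses the hard-coded 3-minute window mentioned in the thread.

```cpp
#include <cassert>

// Toy model of the helper overload logic described in the patch.
// All names below are hypothetical; only the decision shape mirrors the patch.
struct MiniHelper {
    unsigned queued = 0;            // requests currently waiting in the queue
    unsigned queueSize = 4;         // configured queue-size (defaults to 2 * n_max)
    bool serverAvailable = false;   // an idle server (or a new child) can take work now
    long overloadedForSec = 0;      // how long the helper has been overloaded
    bool errOnPersistentOverload = false; // on-persistent-overload=err vs die (default)

    // Would accepting one more request leave the helper overloaded?
    // Mirrors willOverload(): a full queue AND no spare server capacity.
    bool willOverload() const {
        return queued >= queueSize && !serverAvailable;
    }

    enum Outcome { Queued, Dropped, FatalDeath };

    // What a submission attempt does. Transient overload (< 3 minutes) is
    // tolerated: the request is queued anyway. Persistent overload triggers
    // the configured action.
    Outcome trySubmit() {
        if (willOverload() && overloadedForSec >= 3 * 60)
            return errOnPersistentOverload ? Dropped : FatalDeath;
        ++queued;
        return Queued;
    }
};
```

The `err` branch corresponds to Squid dropping the request and sending an empty reply to the caller; `die` reproduces the pre-patch fatal-error behavior.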