Re: [squid-dev] [PATCH] url_rewrite_timeout directive
If there is no objection, I will apply the last patch to trunk...

On 11/24/2014 02:36 PM, Tsantilas Christos wrote:

This is a new patch for the url_rewrite_timeout feature.

Changes over the last patch:
- tools/helper-mux/helper-mux is fixed to work with the new helpers request-id.
- There is now a limit on request retries; it is hardcoded to 2 retries.
- Retrying requests with BH replies on storeID and redirector helpers is now handled inside helpers.cc code.
- Other minor polishing changes.

I did not remove the on_timeout option from the url_rewrite_timeout directive. Although it can be emulated using the use_configured_response option, I believe it is a clearer configuration method. The use_configured_response option looks more like a trick, and I am sure that in 1-2 years even I, who developed the patch, will forget that such a configuration option exists.

I must note again that the default behaviour of the current helpers configuration should not change with this patch.

I hope it is OK.

Regards,
Christos

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev
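The timeout handling discussed in this thread (an on_timeout action plus a retry limit hardcoded to 2) can be modelled roughly as below. This is only an illustrative Python sketch, not the patch's C++ code in helpers.cc. The names `handle_request`, `query_helper` and the returned tuples are hypothetical, a `None` reply stands in for a helper timeout, and falling back to the fail action once retries are exhausted is an assumption that the thread does not confirm.

```python
from enum import Enum
from typing import Callable, Optional, Tuple

MAX_RETRIES = 2  # per the patch notes: the retry limit is hardcoded to 2


class OnTimeout(Enum):
    # Action names as listed for the on_timeout= option in this thread.
    FAIL = "fail"
    BYPASS = "bypass"
    RETRY = "retry"
    USE_CONFIGURED_RESPONSE = "use_configured_response"


def handle_request(
    url: str,
    query_helper: Callable[[str], Optional[str]],
    on_timeout: OnTimeout,
    configured_response: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (outcome, payload) for one rewrite request.

    query_helper is a hypothetical callable: it returns the helper reply,
    or None when the helper did not answer within the timeout.
    """
    # Only the retry action re-sends the request; the others try once.
    attempts = 1 + MAX_RETRIES if on_timeout is OnTimeout.RETRY else 1
    for _ in range(attempts):
        reply = query_helper(url)
        if reply is not None:
            return ("helper", reply)

    if on_timeout is OnTimeout.BYPASS:
        # The URL is left as it was, i.e. not rewritten.
        return ("bypass", url)
    if on_timeout is OnTimeout.USE_CONFIGURED_RESPONSE:
        return ("configured", configured_response)
    # fail, and (assumption) retry after the retry limit is reached.
    return ("fail", "ERR_GATEWAY_FAILURE")
```

With a helper that never answers, the retry action in this model makes three attempts (one initial request plus two retries) before giving up, while every other action makes exactly one.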
Re: [squid-dev] [PATCH] url_rewrite_timeout directive
On 11/18/2014 01:27 AM, Amos Jeffries wrote:
On 18/11/2014 6:10 a.m., Tsantilas Christos wrote:
On 11/16/2014 01:05 PM, Amos Jeffries wrote:
On 16/11/2014 7:38 a.m., Tsantilas Christos wrote:

For the record I am still extremely skeptical about the use-case behind this feature. Timeout is a clear sign that the helper or system behind it is broken (capacity overload / network congestion). Continuing to use such systems in a way which further increases network load is very often a bad choice. Hiding the issue is an even worse choice.

Maybe the administrator does not care about unanswered queries.

I posit that admins who truly do not care about unanswered questions will not care enough to bother configuring this feature.

So he will not do it. The default behaviour is to not use a timeout. The default behaviour is not changed.

The administrator should always care about these unanswered questions. Each one means worse proxy performance. Offering to hide them silently and automatically, without anybody having to do any work on fixing the underlying cause, implies they are not actually problems. Which is a false implication.

This option can be used to increase proxy performance. This is the purpose of this option.

Imagine a huge Squid proxy, which may have to process thousands of URLs per minute, and the administrator has decided that 10% of helper requests can fail. In this case we are giving him a way to configure the Squid behaviour: it may ignore the problem, or use a predefined answer, etc.

Now that same Squid proxy suddenly starts randomly wrongly rejecting just one request per minute out of all those thousands. Which one, why, and WTF can anybody do about it?

Only requests which timed out, and only if configured to do so...
Previous Squid would log a client request with _TIMEOUT, leave a helper in Busy or Pending state with full trace of the pending lookup in mgr reports, possibly even cache.log warnings about helpers queue length.

The patch does not change Squid behaviour if the timeout is not configured (the default).

Now all that is reduced to an overly simple aggregated "Requests timed out:" hiding in a rarely viewed manager report and a hidden level-3 debug message that lists an anonymous requestId from N identical requestIds spread over N helpers.

If configured, you will see in the mgr report a "Requests timed out:" for each running server. If you see that a server has many timed-out requests, more than the other servers, then you can kill it if you consider it a problem.

The reason the helper does not answer fast enough may be a database or an external lookup failure (for example, a categorized URL list provided as a DNS service). In these cases the system admin or service provider may prefer no answer from the helper, or a preconfigured answer, instead of waiting too long for the answer.

What to do if the internal dependencies are going badly is something that should be handled *inside the helper*. After all, Squid has absolutely no way to know if its generic lookup helper has the DNS lookup on the DB server name or the DB query itself broken. Each of which might have a different best way to respond.

The intention of the BrokenHelper code was to have the helpers explicitly inform Squid that they were in trouble and needed help shifting the load elsewhere.

Squid silently forgetting that requests have been sent and then sending *more* is a truly terrible way to fix all the cases of helper overload.

Again, the timeout is optional. It is an extra option. If the customer/Squid user wants to use the BH code, they still can. The helper can be implemented to not answer at all after a timeout period.
A policy of "if you do not answer in a day, please do not answer at all, I am not interested any more" is common in the human world, and in business.

Yes, it is also being fingered as a major reason for businesses dying off during the recent recession :-)

Well, we could have a huge discussion about the recent recession, but I am sure I have more examples than you of failing businesses in a recessionary environment!

Late respondents lost out on work and thus cashflow.

Yes, but you cannot avoid such cases. A stop-loss, after a reasonable configured timeout, is not a bad tactic in such cases.

in src/cache_cf.cc

* please move the *_url_rewrite_timeout() functions to redirect.cc - that will help reduce dependency issues when we get around to making the redirector config a standalone FooConfig object.

Not so easy, because of dependencies on time-related parsing functions. I left it for now. If required, we must move the time parse functions to a library file, for example Config.cc.

Pity. Oh well.

* it would be simpler and easier on users to omit the on_timeout=use_configured_response response= and just have the existence of a response=
Re: [squid-dev] [PATCH] url_rewrite_timeout directive
On 11/16/2014 04:05 AM, Amos Jeffries wrote:

For the record I am still extremely skeptical about the use-case behind this feature.

This is a real use case (or we would not be proposing this feature): admins want to control what happens when their helper transactions time out. Some of us may prefer to ignore timeouts completely (because they are usually benign), some may prefer to kill Squid on the first timeout (so that somebody notices and investigates sooner rather than later), and some may want to do something in-between. Since both extremes and especially the range of actions between them are valid in many environments, our personal preferences are not really that important here: We should give the admins the knobs they need to do their job the way they see fit.

Timeout is a clear sign that the helper or system behind it is broken (capacity overload / network congestion). Continuing to use such systems in a way which further increases network load is very often a bad choice. Hiding the issue is an even worse choice.

The above logic is very true in some deployment environments and completely bogus in others. Fortunately, there is no need to force everybody to use this logic or spend hours discussing the best approach to dealing with proxy errors. We can just give admins the power to decide what works best for them!

My vote on this is -0. Too many problems and still no clear use-case has been described[1] for this to be a reasonable addition.

[1] "helper is slow" and "helper times out" are not use-case descriptions. They are symptoms of an underlying problem.

They are both a use case and a symptom. I know it is difficult to imagine, but sometimes admins have to work around symptoms of the problems they cannot control or cannot fix. The ability to deal with imperfect helpers has been requested many times. Squid's inability to function when optional helpers misbehave is a serious flaw that drives users away.
As far as the overall functionality of the proposed feature is concerned, I am sure it would be welcomed by many, so I am happy to give it +1 if that is needed to offset your "they should just make perfect helpers instead" -0.

Cheers,
Alex.
Re: [squid-dev] [PATCH] url_rewrite_timeout directive
On 18/11/2014 6:10 a.m., Tsantilas Christos wrote:
On 11/16/2014 01:05 PM, Amos Jeffries wrote:
On 16/11/2014 7:38 a.m., Tsantilas Christos wrote:

Hi all,

This patch adds the url_rewrite_timeout directive.

When configured, Squid keeps track of active requests and treats timed-out requests to the redirector as failed requests.

url_rewrite_timeout format:

  url_rewrite_timeout timeout time-units on_timeout=fail|bypass|retry|use_configured_response [response=quoted-string]

The url_rewrite_timeout directive can accept the on_timeout argument to allow the user to configure the action when the helper request times out. The available actions are:

- fail: Squid returns an ERR_GATEWAY_FAILURE error page

Why? It seems to me this should be doing one of the below:
  on_timeout=use_configured_response response=ERR
  on_timeout=use_configured_response response=BH
although being able to redirect to a Squid default error page template has its attractions regardless of this. Perhaps BH with an error=ERR_GATEWAY_FAILURE kv-pair?

The on_timeout= option is easier to understand and configure. Also, this patch adds the retry operation inside helpers.cc code and makes it easier to implement similar features for other helpers too... Also, the BH retry can now be implemented using the helpers retry operation.

- bypass: the URL is not rewritten.

Identical to:
  on_timeout=use_configured_response response=OK

- retry: retry the request to the helper

Equivalent to what is supposed to happen for:
  on_timeout=use_configured_response response=BH

NP: if retry with a different helper process is not already being done on BH then it needs to be added.

It is not implemented for BH redirects. Also, although it makes sense not to use the same server for a BH response, it is not clear why that is needed for a timed-out server?

We don't know why the timeout happened. There are a few cases where it may have happened due to internal helper state.
Moving to a different helper guarantees that those cases will have changed in some ways - reducing the overall probability that it will repeat.

I believe this should have a different form. If there are many timed-out responses from a server, then do not send more requests for a while, or maybe shut it down. We can add a TODO for this one.

Both are worth doing.

- use_configured_response: use a response which can be configured using the response= option

So as you can see, with a change to make the URL-rewriter capable of redirecting to one of the built-in error page templates we could completely drop the on_timeout setting.

I still believe that on_timeout is the better option and easier to understand and configure.

Whatever it gets called, the main point was that there is no need for 2 different options.

Technical details
=================

This patch:
- adds a mechanism inside helpers.cc code to handle timeouts:
  - returns a pre-configured response on timeout,
  - or retries on timeouts,
  - or returns a timed-out (Helper::TimedOut code) response to the caller. The caller can choose to ignore the timed-out request, or produce an error.
- modifies client_side_request.cc to return the ERR_GATEWAY_FAILURE error page. Also the error detail ERR_DETAIL_REDIRECTOR_TIMEDOUT is set to identify the type of error.

For the record I am still extremely skeptical about the use-case behind this feature. Timeout is a clear sign that the helper or system behind it is broken (capacity overload / network congestion). Continuing to use such systems in a way which further increases network load is very often a bad choice. Hiding the issue is an even worse choice.

Maybe the administrator does not care about unanswered queries.

I posit that admins who truly do not care about unanswered questions will not care enough to bother configuring this feature.

The administrator should always care about these unanswered questions. Each one means worse proxy performance.
Offering to hide them silently and automatically, without anybody having to do any work on fixing the underlying cause, implies they are not actually problems. Which is a false implication.

Imagine a huge Squid proxy, which may have to process thousands of URLs per minute, and the administrator has decided that 10% of helper requests can fail. In this case we are giving him a way to configure the Squid behaviour: it may ignore the problem, or use a predefined answer, etc.

Now that same Squid proxy suddenly starts randomly wrongly rejecting just one request per minute out of all those thousands. Which one, why, and WTF can anybody do about it?

Previous Squid would log a client request with _TIMEOUT, leave a helper in Busy or Pending state with full trace of the pending lookup in mgr reports, possibly even cache.log warnings about helpers queue length.

Now all that is reduced to