Fwd: [PATCH] immortal helpers

2010-02-19 Thread Kinkie
gah! Forgot to hit reply-all


-- Forwarded message --
From: Kinkie 
Date: Sat, Feb 20, 2010 at 8:25 AM
Subject: Re: [PATCH] immortal helpers
To: Amos Jeffries 


On Sat, Feb 20, 2010 at 5:12 AM, Amos Jeffries  wrote:
> This patch adds an advanced option to the Squid helper controls which
> overrides Squid dying when helpers crash.
>
> It has been found necessary in certain corner cases with PHP helpers (which
> have system imposed limited lifetimes) where a proxy has previously been
> under some load and helpers started then are timed out later under low load
> as a bunch. Squid may die and restart.

While I don't object to the option per se, couldn't it be changed so
that it has some kind of dampeners in place? e.g. allow X restarts in
Y seconds? That would allow to serve the purpose while avoiding to
transform squid in a fork bomb..

--
   /kinkie


Re: [PATCH] immortal helpers

2010-02-19 Thread Alex Rousskov
On 02/19/2010 09:12 PM, Amos Jeffries wrote:
> This patch adds an advanced option to the Squid helper controls which
> overrides Squid dying when helpers crash.
> 
> It has been found necessary in certain corner cases with PHP helpers
> (which have system imposed limited lifetimes) where a proxy has
> previously been under some load and helpers started then are timed out
> later under low load as a bunch. Squid may die and restart.
> 
> If the proxy has been started under existing high load conditions (such
> as a backup coming online) this case may also occur shortly after
> startup. Leading to a chain reaction of restarts until load drops below
> number of helpers needed to trigger a death.
> 
> These cases depend on external forces or helper design closing the
> helpers outside Squid control.
> 
> 
> There is one known issue with this option:
> 
>   If the helpers are really dying due to some fatal issue during their
> startup the use of this option would result in Squid hanging while
> infinitely re-starting helpers and doing no request processing.

Missing squid.conf documentation update?

I think "immortal" is a little strange choice for helpers that die all
the time :-). How about "phoenix"? It is not an adjective but it fits:

> phoenix
> 
>  * A phoenix is an imaginary bird which, according to ancient stories,
> burns itself to ashes every five hundred years and is then born again.
> 
>  * If you describe someone or something as a phoenix, you mean that they
> return again after seeming to disappear or be destroyed. N-SING literary

Cheers,

Alex.


Re: [PATCH] icap_oldest_service_failure option

2010-02-19 Thread Alex Rousskov
On 02/19/2010 06:19 PM, Amos Jeffries wrote:
> Alex Rousskov wrote:
>> Added icap_oldest_service_failure option to forget old ICAP errors.
>>
>> A busy or remote ICAP server may produce a steady but shallow stream of
>> errors. Any ICAP server may become nearly unusable in a short period of
>> time, producing a burst of errors. To avoid disabling a generally usable
>> service, it is important to distinguish these two cases. Just counting
>> the number of errors and suspending the service after
>> icap_service_failure_limit is reached often either suspends the service
>> in both cases or never suspends it at all, depending on the option
>> value.
>>
>> One way to distinguish a large burst of errors from a steady but shallow
>> error stream is to forget about old errors. The added
>> icap_oldest_service_failure option instructs Squid to ignore errors that
>> are "too old" to be counted as a part of a burst.
>>
>> Another way to look at this feature is to say that the combination of
>> the old icap_service_failure_limit and the new
>> icap_oldest_service_failure limits the ICAP error _rate_. For example,
>># suspend service usage after 10 failures in 5 seconds:
>>icap_service_failure_limit 10
>>icap_oldest_service_failure 5 seconds
>>
>> Squid does not remember every transaction error that occurred within the
>> allowed "oldest error" time period. That would be result in a precise
>> but too expensive implementation, especially during error bursts on a
>> busy server. Instead, Squid divides the period in ten slots, counts the
>> number of errors that occurred in each slot, and forget the oldest
>> slot(s) as needed. Thus, the algorithm has about 90% precision as far as
>> timing of the failures is concerned. That 90% precision ought to be good
>> enough for any deployment.
>>
>> The patch is for Squid v3.1+ but we will port to trunk if approved.
>>
> 
> +1. I definitely like the idea.
> 
> Would it be possible to deflate the options a little bit though?
> 
> If what is being achieved really is meant to be a rate I would expect
> the icap_service_failure_limit could better be extended instead of a new
> option added. To make it easier for users to understand whats going to
> happen.
> 
> Something like:
>   'icap_service_failure_limit' number [ ('/'|'per') period ]

It would be a little tricky to explain that although it does look like a
rate, it is not really, because 1/sec value may have a very different
effect than 100/100sec even though the average rate is the same.  I am
happy to do it if others think one compound option is better than two
simple ones in this case.

Thank you,

Alex.


[PATCH] immortal helpers

2010-02-19 Thread Amos Jeffries
This patch adds an advanced option to the Squid helper controls which 
overrides Squid dying when helpers crash.


It has been found necessary in certain corner cases with PHP helpers 
(which have system imposed limited lifetimes) where a proxy has 
previously been under some load and helpers started then are timed out 
later under low load as a bunch. Squid may die and restart.


If the proxy has been started under existing high load conditions (such 
as a backup coming online) this case may also occur shortly after 
startup. Leading to a chain reaction of restarts until load drops below 
number of helpers needed to trigger a death.


These cases depend on external forces or helper design closing the 
helpers outside Squid control.



There is one known issue with this option:

  If the helpers are really dying due to some fatal issue during their 
startup the use of this option would result in Squid hanging while 
infinitely re-starting helpers and doing no request processing.


Amos
=== modified file 'src/HelperChildConfig.cc'
--- src/HelperChildConfig.cc	2010-02-13 09:16:30 +
+++ src/HelperChildConfig.cc	2010-02-20 01:33:07 +
@@ -4,11 +4,12 @@
 
 #include 
 
-HelperChildConfig::HelperChildConfig(const unsigned int m, const unsigned int s, const unsigned int i, const unsigned int cc) :
+HelperChildConfig::HelperChildConfig(const unsigned int m, const unsigned int s, const unsigned int i, const unsigned int cc, const unsigned int im) :
 n_max(m),
 n_startup(s),
 n_idle(i),
 concurrency(cc),
+immortal(im),
 n_running(0),
 n_active(0)
 {}
@@ -62,6 +63,8 @@
 }
 } else if (strncmp(token, "concurrency=", 12) == 0) {
 concurrency = atoi(token + 12);
+} else if (strcmp(token, "immortal") == 0) {
+immortal = 1;
 } else {
 self_destruct();
 }

=== modified file 'src/HelperChildConfig.h'
--- src/HelperChildConfig.h	2010-02-13 09:16:30 +
+++ src/HelperChildConfig.h	2010-02-20 01:42:43 +
@@ -10,7 +10,7 @@
 class HelperChildConfig
 {
 public:
-HelperChildConfig(const unsigned int m=0, const unsigned int s=0, const unsigned int i=1, const unsigned int cc=0);
+HelperChildConfig(const unsigned int m=0, const unsigned int s=0, const unsigned int i=1, const unsigned int cc=0, const unsigned int im=0);
 ~HelperChildConfig();
 HelperChildConfig &operator =(const HelperChildConfig &rhs);
 
@@ -54,6 +54,16 @@
  */
 unsigned int concurrency;
 
+/**
+ * Whether Squid should die if these helpers are closing too fast.
+ * This permits use of some script languages which have a maximum script run lifetime (ie PHP)
+ *
+ * The danger is that when used on truely broken helpers it will cause Squid to enter an
+ * infinite loop of restarting helpers.
+ * Default: 0 - Squid will shutdown if helpers close at too fast a rate.
+ */
+unsigned int immortal;
+
 /* derived from active operations */
 public:
 
@@ -72,7 +82,7 @@
 
 /* Legacy parser interface */
 #define parse_HelperChildConfig(c) (c)->parseConfig()
-#define dump_HelperChildConfig(e,n,c)  storeAppendPrintf((e), "\n%s %d startup=%d idle=%d concurrency=%d\n", (n), (c).n_max, (c).n_startup, (c).n_idle, (c).concurrency)
+#define dump_HelperChildConfig(e,n,c)  storeAppendPrintf((e), "\n%s %d startup=%d idle=%d concurrency=%d%s\n", (n), (c).n_max, (c).n_startup, (c).n_idle, (c).concurrency, ((c).immortal?" immortal":""))
 #define free_HelperChildConfig(dummy)  // NO.
 
 #endif /* _SQUID_SRC_HELPERCHILDCONFIG_H */

=== modified file 'src/helper.cc'
--- src/helper.cc	2009-12-16 03:46:59 +
+++ src/helper.cc	2010-02-20 01:38:33 +
@@ -715,8 +715,14 @@
 if (hlp->childs.needNew() > 0) {
 debugs(80, 1, "Too few " << hlp->id_name << " processes are running (need " << hlp->childs.needNew() << "/" << hlp->childs.n_max << ")");
 
-if (hlp->childs.n_active < hlp->childs.n_startup && hlp->last_restart > squid_curtime - 30)
-fatalf("The %s helpers are crashing too rapidly, need help!\n", hlp->id_name);
+if (hlp->childs.n_active < hlp->childs.n_startup && hlp->last_restart > squid_curtime - 30) {
+if (!hlp->childs.immortal) {
+fatalf("The %s helpers are crashing too rapidly, need help!\n", hlp->id_name);
+} else {
+debugs(80,0, "WARNING: The " << hlp->id_name << " helpers are crashing too rapidly. They may be broken!");
+debugs(80,0, "WARNING: Immortality override applied.");
+}
+}
 
 debugs(80, 1, "Starting new helpers");
 helperOpenServers(hlp);
@@ -775,7 +781,13 @@
 debugs(80, 1, "Too few " << hlp->id_name << " processes are running (need " << hlp->childs.needNew() << "/" << hlp->childs.n_max << ")");
 
 if (hlp->childs.n_active < hlp->childs.n_startu

Re: [PATCH] icap_oldest_service_failure option

2010-02-19 Thread Amos Jeffries

Alex Rousskov wrote:

Added icap_oldest_service_failure option to forget old ICAP errors.

A busy or remote ICAP server may produce a steady but shallow stream of
errors. Any ICAP server may become nearly unusable in a short period of
time, producing a burst of errors. To avoid disabling a generally usable
service, it is important to distinguish these two cases. Just counting
the number of errors and suspending the service after
icap_service_failure_limit is reached often either suspends the service
in both cases or never suspends it at all, depending on the option
value.

One way to distinguish a large burst of errors from a steady but shallow
error stream is to forget about old errors. The added
icap_oldest_service_failure option instructs Squid to ignore errors that
are "too old" to be counted as a part of a burst.

Another way to look at this feature is to say that the combination of
the old icap_service_failure_limit and the new
icap_oldest_service_failure limits the ICAP error _rate_. For example,
   # suspend service usage after 10 failures in 5 seconds:
   icap_service_failure_limit 10
   icap_oldest_service_failure 5 seconds

Squid does not remember every transaction error that occurred within the
allowed "oldest error" time period. That would be result in a precise
but too expensive implementation, especially during error bursts on a
busy server. Instead, Squid divides the period in ten slots, counts the
number of errors that occurred in each slot, and forget the oldest
slot(s) as needed. Thus, the algorithm has about 90% precision as far as
timing of the failures is concerned. That 90% precision ought to be good
enough for any deployment.

The patch is for Squid v3.1+ but we will port to trunk if approved.



+1. I definitely like the idea.

Would it be possible to deflate the options a little bit though?

If what is being achieved really is meant to be a rate I would expect 
the icap_service_failure_limit could better be extended instead of a new 
option added. To make it easier for users to understand whats going to 
happen.


Something like:
  'icap_service_failure_limit' number [ ('/'|'per') period ]


Amos
--
Please be using
  Current Stable Squid 2.7.STABLE8 or 3.0.STABLE24
  Current Beta Squid 3.1.0.16


Re: [PATCH] icap_oldest_service_failure option

2010-02-19 Thread Robert Collins
+1


signature.asc
Description: This is a digitally signed message part