Re: [PATCH] Bug 2680: ** helper errors after -k rotate

2009-07-16 Thread Henrik Nordstrom
tor 2009-07-16 klockan 11:13 +1000 skrev Robert Collins:

 Note that with NTLM this is realistic - there's no need for multiple
 helpers as a single helper can serve many concurrent requests.

No longer the case as we no longer do challenge reuses and the
concurrent protocol is not implemented for stateful helpers, but for
basic, digest and url rewriters it is..

Regards
Henrik



Re: [PATCH] Bug 2680: ** helper errors after -k rotate

2009-07-16 Thread Henrik Nordstrom
tor 2009-07-16 klockan 14:47 +1200 skrev Amos Jeffries:

 And the sad reality in squid-3 with _one_ helper:
  
 squid-*--1-helper---* winbindd [state1, state2]
  \---crash: all 2 of 1 helpers all RS pending state.

You can't run NTLM with just one helper, not until both the following
are true:

1. We implement the current request protocol for stateful helpers

2. Samba implements the same in ntlm_auth (iirc this has been done in
Samab4)

The reason to this is because the challenge is unique for each client
(required by NTLMv2) and the default NTLM helper protocol only supports
one challenge at a time.

But Squid should not crash horribly only because you do, only complain
about helper queue overload...

Regards
Henrik



Re: [PATCH] Bug 2680: ** helper errors after -k rotate

2009-07-16 Thread Robert Collins
On Thu, 2009-07-16 at 08:42 +0200, Henrik Nordstrom wrote:
 tor 2009-07-16 klockan 14:47 +1200 skrev Amos Jeffries:
 
  And the sad reality in squid-3 with _one_ helper:
   
  squid-*--1-helper---* winbindd [state1, state2]
   \---crash: all 2 of 1 helpers all RS pending state.
 
 You can't run NTLM with just one helper, not until both the following
 are true:
 
 1. We implement the current request protocol for stateful helpers

I thought we put a unique id on the stateful helper protocol _way_ back.
Sigh :(

 2. Samba implements the same in ntlm_auth (iirc this has been done in
 Samab4)

We'd need to ask, I can do that if we think it hasn't been done. As
samba supplies the auth helpers there shouldn't be a compatibility issue
in changing it.

-Rob


signature.asc
Description: This is a digitally signed message part


Re: [PATCH] Bug 2680: ** helper errors after -k rotate

2009-07-16 Thread Henrik Nordstrom
tor 2009-07-16 klockan 16:54 +1000 skrev Robert Collins:

 I thought we put a unique id on the stateful helper protocol _way_ back.
 Sigh :(

No, I couldn't when I implemented the concurent request protocol so it
only got done for stateless helpers. iirc I got stuck in trying to
unwind some of the dependencies in the reservation code used by the
challenge reuse layer (which is thrown out these days..).

Regards
Henrik



[PATCH] Bug 2680: ** helper errors after -k rotate

2009-07-16 Thread Amos Jeffries

Amos Jeffries wrote:

Robert Collins wrote:

On Wed, 2009-07-15 at 00:52 +0200, Henrik Nordstrom wrote:

Or 4, go back to don't strictly enforce the number of helpers?


+1

I don't know what this strictly-enforce thing is, but it sounds unneeded
as we used to fire up the right number of helpers anyway.


I stopped Squid saying:
  running: 200 of 100 XXX helpers


A Henrik said,
  people with large memory-hog helpers have issues when Squid allocates 
more than N bunches of their carefully tuned available memory to its 
helpers. This is also important in low-memory systems requiring auth.


It's a simple 'start N' call now checks the number of running helpers 
before blindly starting new ones. Making Squid actually follow its 
numerous children=N settings.



I'm fine with reverting it in 3.1. But this is a nasty mix of sync and 
async operations that does need cleaning up in 3.2. It's semi-hiding 
about 4 bugs in a helpers and auth.




Right, following up on that I had a hunch and discovered that n_active 
is already performing this duty. The bug is therefore in my earlier fix 
using n_running as a base instead of n_active.


How does this new attached patch look to you two?
It performs the calculation re-base and documents the relevant counters.

Amos
=== modified file 'src/helper.cc'
--- src/helper.cc	2009-06-14 12:35:17 +
+++ src/helper.cc	2009-07-16 10:53:22 +
@@ -103,7 +103,7 @@
 shortname = xstrdup(progname);
 
 /* dont ever start more than hlp-n_to_start processes. */
-int need_new = hlp-n_to_start - hlp-n_running;
+int need_new = hlp-n_to_start - hlp-n_active;
 
 debugs(84, 1, helperOpenServers: Starting   need_new  /  hlp-n_to_start   '  shortname  ' processes);
 
@@ -209,7 +209,8 @@
 shortname = xstrdup(progname);
 
 /* dont ever start more than hlp-n_to_start processes. */
-int need_new = hlp-n_to_start - hlp-n_running;
+/* n_active are the helpers which have not been shut down. */
+int need_new = hlp-n_to_start - hlp-n_active;
 
 debugs(84, 1, helperOpenServers: Starting   need_new  /  hlp-n_to_start   '  shortname  ' processes);
 
@@ -545,8 +546,8 @@
 
 storeAppendPrintf(sentry, program: %s\n,
   hlp-cmdline-key);
-storeAppendPrintf(sentry, number running: %d of %d\n,
-  hlp-n_running, hlp-n_to_start);
+storeAppendPrintf(sentry, number active: %d of %d (%d shutting down)\n,
+  hlp-n_active, hlp-n_to_start, (hlp-n_running - hlp-n_active) );
 storeAppendPrintf(sentry, requests sent: %d\n,
   hlp-stats.requests);
 storeAppendPrintf(sentry, replies received: %d\n,
@@ -587,7 +588,7 @@
 storeAppendPrintf(sentry,B = BUSY\n);
 storeAppendPrintf(sentry,W = WRITING\n);
 storeAppendPrintf(sentry,C = CLOSING\n);
-storeAppendPrintf(sentry,S = SHUTDOWN\n);
+storeAppendPrintf(sentry,S = SHUTDOWN PENDING\n);
 }
 
 void
@@ -598,8 +599,8 @@
 
 storeAppendPrintf(sentry, program: %s\n,
   hlp-cmdline-key);
-storeAppendPrintf(sentry, number running: %d of %d\n,
-  hlp-n_running, hlp-n_to_start);
+storeAppendPrintf(sentry, number active: %d of %d (%d shutting down)\n,
+  hlp-n_active, hlp-n_to_start, (hlp-n_running - hlp-n_active) );
 storeAppendPrintf(sentry, requests sent: %d\n,
   hlp-stats.requests);
 storeAppendPrintf(sentry, replies received: %d\n,
@@ -644,7 +645,7 @@
 storeAppendPrintf(sentry,B = BUSY\n);
 storeAppendPrintf(sentry,C = CLOSING\n);
 storeAppendPrintf(sentry,R = RESERVED or DEFERRED\n);
-storeAppendPrintf(sentry,S = SHUTDOWN\n);
+storeAppendPrintf(sentry,S = SHUTDOWN PENDING\n);
 storeAppendPrintf(sentry,P = PLACEHOLDER\n);
 }
 
@@ -671,7 +672,6 @@
 
 hlp-n_active--;
 assert(hlp-n_active = 0);
-
 srv-flags.shutdown = 1;	/* request it to shut itself down */
 
 if (srv-flags.closing) {

=== modified file 'src/helper.h'
--- src/helper.h	2009-04-07 13:16:59 +
+++ src/helper.h	2009-07-16 10:51:42 +
@@ -58,9 +58,9 @@
 dlink_list servers;
 dlink_list queue;
 const char *id_name;
-int n_to_start;
-int n_running;
-int n_active;
+int n_to_start;   /// Configuration setting of how many helper children should be running
+int n_running;/// Total helper children objects currently existing
+int n_active; /// Count of helper children active (not shutting down)
 int ipc_type;
 IpAddress addr;
 unsigned int concurrency;
@@ -80,9 +80,9 @@
 dlink_list servers;
 dlink_list queue;
 const char *id_name;
-int n_to_start;
-int n_running;
-int n_active;
+int n_to_start;   /// Configuration setting of how many helper children should be running
+int n_running;/// Total helper children objects currently 

Re: [PATCH] Bug 2680: ** helper errors after -k rotate

2009-07-16 Thread Henrik Nordstrom
tor 2009-07-16 klockan 23:15 +1200 skrev Amos Jeffries:

 Right, following up on that I had a hunch and discovered that n_active 
 is already performing this duty. The bug is therefore in my earlier fix 
 using n_running as a base instead of n_active.
 
 How does this new attached patch look to you two?

Looks fine to me, assuming rotate still starts a new set of helpers
(wasn't clear to me). Will fix the too many helpers have crashed, need
to start new ones case, and leave the rotate  reconfigure case alone.

With this we can also start new helpers more or less immediately (at
some seconds delay just in case) when seeing them crash...

Regards
Henrik



Re: [PATCH] Bug 2680: ** helper errors after -k rotate

2009-07-16 Thread Robert Collins
On Thu, 2009-07-16 at 09:11 +0200, Henrik Nordstrom wrote:
 tor 2009-07-16 klockan 16:54 +1000 skrev Robert Collins:
 
  I thought we put a unique id on the stateful helper protocol _way_ back.
  Sigh :(
 
 No, I couldn't when I implemented the concurent request protocol so it
 only got done for stateless helpers. iirc I got stuck in trying to
 unwind some of the dependencies in the reservation code used by the
 challenge reuse layer (which is thrown out these days..).

Well thats hopeful, sounds like it would be easier now :) That reuse
stuff was such a hack.

-Rob


signature.asc
Description: This is a digitally signed message part


Re: [PATCH] Bug 2680: ** helper errors after -k rotate

2009-07-16 Thread Amos Jeffries

Henrik Nordstrom wrote:

tor 2009-07-16 klockan 23:15 +1200 skrev Amos Jeffries:

Right, following up on that I had a hunch and discovered that n_active 
is already performing this duty. The bug is therefore in my earlier fix 
using n_running as a base instead of n_active.


How does this new attached patch look to you two?


Looks fine to me, assuming rotate still starts a new set of helpers
(wasn't clear to me).


Yes, no longer touching mainRotate() at all with these.


Will fix the too many helpers have crashed, need
to start new ones case, and leave the rotate  reconfigure case alone.

With this we can also start new helpers more or less immediately (at
some seconds delay just in case) when seeing them crash...


Ah, good point. :)

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE6 or 3.0.STABLE16
  Current Beta Squid 3.1.0.9


Re: [PATCH] Bug 2680: ** helper errors after -k rotate

2009-07-15 Thread Amos Jeffries

Robert Collins wrote:

On Wed, 2009-07-15 at 00:52 +0200, Henrik Nordstrom wrote:

Or 4, go back to don't strictly enforce the number of helpers?


+1

I don't know what this strictly-enforce thing is, but it sounds unneeded
as we used to fire up the right number of helpers anyway.


I stopped Squid saying:
  running: 200 of 100 XXX helpers


A Henrik said,
  people with large memory-hog helpers have issues when Squid allocates 
more than N bunches of their carefully tuned available memory to its 
helpers. This is also important in low-memory systems requiring auth.


It's a simple 'start N' call now checks the number of running helpers 
before blindly starting new ones. Making Squid actually follow its 
numerous children=N settings.



I'm fine with reverting it in 3.1. But this is a nasty mix of sync and 
async operations that does need cleaning up in 3.2. It's semi-hiding 
about 4 bugs in a helpers and auth.


Amos
--
Please be using
  Current Stable Squid 2.7.STABLE6 or 3.0.STABLE16
  Current Beta Squid 3.1.0.9


Re: [PATCH] Bug 2680: ** helper errors after -k rotate

2009-07-15 Thread Bundle Buggy

Bundle Buggy has detected this merge request.

For details, see: 
http://bundlebuggy.aaronbentley.com/project/squid/request/%3C4A584880.2080707%40treenet.co.nz%3E

Project: Squid


Re: [PATCH] Bug 2680: ** helper errors after -k rotate

2009-07-15 Thread Robert Collins
On Thu, 2009-07-16 at 14:08 +1200, Amos Jeffries wrote:
 
 Both reconfigure and helper recovery use startHelpers() where the
 limit
 needs to take place.
 The DOS bug fix broke *rotate* (reconfigure has an async step added by
 Alex
 that prevents it being a problem).

s/rotate/reconfigure then :) In my mind one is a subset of the other.

  If someone is running hundreds of helpers on openwrt/olpc then
 things
  are broken already :). I'd really suggest that such environments
  pipeline through a single helper rather than many concurrent
 helpers.
  Such platorms are single core and you'll get better usage of memory
  doing many requests in a single helper than one request each to many
  helpers.
 
 lol, NTLM concurrent? try it!

I did. IIRC the winbindd is fully capable of handling multiple
overlapping requests, and each NTLM helper is *solely* a thunk layer
between squid's format and the winbindd *state*.

ASCII art time, 3 requests:
Multiple helpers:
   /--1-helper--\
squid-*---2-helper---* winbindd [state1, state2, state3]
   \--3-helper--/
One helper:
squid-*--1-helper---* winbindd [state1, state2, state3]

-Rob



signature.asc
Description: This is a digitally signed message part


Re: [PATCH] Bug 2680: ** helper errors after -k rotate

2009-07-15 Thread Adrian Chadd
NOte that winbind has a hard coded limit that is by default very low.

Opening 2n ntlm_auth helpers may make things blow up in horrible ways.



Adrian

2009/7/16 Robert Collins robe...@robertcollins.net:
 On Thu, 2009-07-16 at 14:08 +1200, Amos Jeffries wrote:

 Both reconfigure and helper recovery use startHelpers() where the
 limit
 needs to take place.
 The DOS bug fix broke *rotate* (reconfigure has an async step added by
 Alex
 that prevents it being a problem).

 s/rotate/reconfigure then :) In my mind one is a subset of the other.

  If someone is running hundreds of helpers on openwrt/olpc then
 things
  are broken already :). I'd really suggest that such environments
  pipeline through a single helper rather than many concurrent
 helpers.
  Such platorms are single core and you'll get better usage of memory
  doing many requests in a single helper than one request each to many
  helpers.

 lol, NTLM concurrent? try it!

 I did. IIRC the winbindd is fully capable of handling multiple
 overlapping requests, and each NTLM helper is *solely* a thunk layer
 between squid's format and the winbindd *state*.

 ASCII art time, 3 requests:
 Multiple helpers:
       /--1-helper--\
 squid-*---2-helper---* winbindd [state1, state2, state3]
       \--3-helper--/
 One helper:
 squid-*--1-helper---* winbindd [state1, state2, state3]

 -Rob




Re: [PATCH] Bug 2680: ** helper errors after -k rotate

2009-07-15 Thread Amos Jeffries
On Thu, 16 Jul 2009 12:19:55 +1000, Robert Collins
robe...@robertcollins.net wrote:
 On Thu, 2009-07-16 at 14:08 +1200, Amos Jeffries wrote:
 
 Both reconfigure and helper recovery use startHelpers() where the
 limit
 needs to take place.
 The DOS bug fix broke *rotate* (reconfigure has an async step added by
 Alex
 that prevents it being a problem).
 
 s/rotate/reconfigure then :) In my mind one is a subset of the other.
 
  If someone is running hundreds of helpers on openwrt/olpc then
 things
  are broken already :). I'd really suggest that such environments
  pipeline through a single helper rather than many concurrent
 helpers.
  Such platorms are single core and you'll get better usage of memory
  doing many requests in a single helper than one request each to many
  helpers.
 
 lol, NTLM concurrent? try it!
 
 I did. IIRC the winbindd is fully capable of handling multiple
 overlapping requests, and each NTLM helper is *solely* a thunk layer
 between squid's format and the winbindd *state*.
 
 ASCII art time, 3 requests:
 Multiple helpers:
/--1-helper--\
 squid-*---2-helper---* winbindd [state1, state2, state3]
\--3-helper--/
 One helper:
 squid-*--1-helper---* winbindd [state1, state2, state3]
 
 -Rob

And the sad reality in squid-3 with _one_ helper:
 
squid-*--1-helper---* winbindd [state1, state2]
 \---crash: all 2 of 1 helpers all RS pending state.

But thats bug 2648. With a useful perl client script provided to replicate.

Amos



Re: [PATCH] Bug 2680: ** helper errors after -k rotate

2009-07-14 Thread Henrik Nordstrom
lör 2009-07-11 klockan 20:08 +1200 skrev Amos Jeffries:
 It's most visible in rotate because Squid is intended to keep running 
 with a hot-swap of its logs. Previously the sequence was causing two 
 full sets of helpers to be started, and a period of overlap before the 
 async closure of the old set happened.

Yes, this is what Squid-2 does. It schedules the existing set to be
closed (no more new requests assigned to them), and starts a new set.
The old set dies as soon as it's completed doing what it's doing at the
moment.. (including statful associations).

 But now that we are enforcing the number of helpers started to prevent 
 memory bloat the overlap prevents the new set from starting at all.

Hmm.. why?

haven't been much of a problem in all the years we have been doing
this.. except possibly for some poorly configured SquidGuard setups with
large text lists without db backends...

 The safety checks in helper itself which prevent live restart attempts 
 are tuned to only do so on incremental helper deaths, not to wholesale 
 100% loss of helpers.

Yes.

 Our options are:
   1 don't shutdown helpers on rotate (this ha been another open bug)
   2 make helpers code extremely aggressive to keep helpers running (at 
 cost of being able to detect failures)
   3 make mainRotate async

Or 4, go back to don't strictly enforce the number of helpers?

 Is there something I've missed that means some helpers MUST be restarted 
 on rotate?

Helpers are restarted because their stderr is connected to cache.log in
append mode.

Regards
Henrik



Re: [PATCH] Bug 2680: ** helper errors after -k rotate

2009-07-14 Thread Robert Collins
On Wed, 2009-07-15 at 00:52 +0200, Henrik Nordstrom wrote:
 
 Or 4, go back to don't strictly enforce the number of helpers?

+1

I don't know what this strictly-enforce thing is, but it sounds unneeded
as we used to fire up the right number of helpers anyway.

-Rob


signature.asc
Description: This is a digitally signed message part


[PATCH] Bug 2680: ** helper errors after -k rotate

2009-07-11 Thread Amos Jeffries

http://www.squid-cache.org/bugs/show_bug.cgi?id=2680

I've tracked this one down to the helper shutdown being async and used 
from mainRotate() which is a sync operation.


We are probably hitting the same issue on reconfigure and shutdown.

It's most visible in rotate because Squid is intended to keep running 
with a hot-swap of its logs. Previously the sequence was causing two 
full sets of helpers to be started, and a period of overlap before the 
async closure of the old set happened.


But now that we are enforcing the number of helpers started to prevent 
memory bloat the overlap prevents the new set from starting at all.


The safety checks in helper itself which prevent live restart attempts 
are tuned to only do so on incremental helper deaths, not to wholesale 
100% loss of helpers.


Our options are:
 1 don't shutdown helpers on rotate (this ha been another open bug)
 2 make helpers code extremely aggressive to keep helpers running (at 
cost of being able to detect failures)

 3 make mainRotate async

(2) and (3) wil need to be done eventually, however as far as I can see 
there are only cons involved with keeping the status-quo of current 
rotate behavior:

 * extra work for Squid restarting stuff.
 * stateful helpers may be dropped mid-action leading to client errors.
 (from a simple log rotation!!)
 * longer time for the operation to complete back to a stable request 
processing state.



Attached is a patch for those worst affected which removes the helpers 
from rotate sequence (option 1).  I've checked and the only ways I can 
see at this point for helper to do any logging is via syslog or stderr 
relay back to Squid.


Is there something I've missed that means some helpers MUST be restarted 
on rotate?


Amos
--
Please be using
  Current Stable Squid 2.7.STABLE6 or 3.0.STABLE16
  Current Beta Squid 3.1.0.9
=== modified file 'src/main.cc'
--- src/main.cc	2009-06-28 08:11:27 +
+++ src/main.cc	2009-07-11 07:31:13 +
@@ -775,15 +775,6 @@
 static void
 mainRotate(void)
 {
-icmpEngine.Close();
-#if USE_DNSSERVERS
-
-dnsShutdown();
-#endif
-
-redirectShutdown();
-authenticateShutdown();
-externalAclShutdown();
 _db_rotate_log();		/* cache.log */
 storeDirWriteCleanLogs(1);
 storeLogRotate();		/* store.log */
@@ -791,19 +782,8 @@
 useragentRotateLog();	/* useragent.log */
 refererRotateLog();		/* referer.log */
 #if WIP_FWD_LOG
-
 fwdLogRotate();
 #endif
-
-icmpEngine.Open();
-#if USE_DNSSERVERS
-
-dnsInit();
-#endif
-
-redirectInit();
-authenticateInit(Config.authConfiguration);
-externalAclInit();
 }
 
 static void