Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-03-24 Thread Amos Jeffries
[re-send now the mailer is workign again in case you didnt get the first
one]

On 7/03/2015 7:38 a.m., Alex Rousskov wrote:
> On 01/21/2015 05:03 AM, Amos Jeffries wrote:
>> On 22/01/2015 12:57 a.m., Tsantilas Christos wrote:
>>
>>> I am posting a new patch.
>>
>>> This patch include fixes to follow Squid Coding Style but also have
>>> a fix for a small bug: In the previous patches I posted the pid
>>> creation done by master process but the pid file was not removed by
>>> master process. This patch fixes master process to remove pid file
>>> on exit.
>>
>>
>> +1.
> 
> 
> Amos,
> 
> This patch went into trunk (r13867 and r13868). Do you plan on
> merging it (together with the pending "start workers as root" follow up)
> into v3.5 branch? It is your call, of course, but it is essentially a
> bug fix (SMP Squid does not restart reliably on busy servers without
> these changes). We can provide a v3.5-specific [combined] patch if that
> will make a positive decision easier for you.
> 
> 
> Thank you,
> 
> Alex.
> 

It is a bug fix on the one hand, but on the other its also a
design/behaviour change affecting everybody.

The choice is in the grey area for me, which means I'm treating it
conservatively and not to risking a backport destabilizing 3.5 unless we
have a clear need for it.
 There has not been much in the way of feedback regarding restart
relibility from the mailing lists and bugzilla so far. If you are seeing
a different view with more pressure, sure it can go back. Otherwise it
can wait a few more months for trunk to branch.

Amos

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-03-07 Thread Eliezer Croitoru
I do not understand fully the patch but get the idea and since 3.5.2 is 
very stable for the public use as a simple forward proxy with ICAP and 
external_acl helpers it seems to that this patch will add for the 
stability of the 3.5 branch.


I had a scenario of a development server testing which I needed to 
wait\kill couple times while not knowing how long it would take for the 
squid different instances to shutdown, and pgrep or "ps aux|grep squid" 
should not be something that will monitor squid.


So:
+ 1

Eliezer

On 06/03/2015 20:38, Alex Rousskov wrote:

Amos,

 This patch went into trunk (r13867 and r13868). Do you plan on
merging it (together with the pending "start workers as root" follow up)
into v3.5 branch? It is your call, of course, but it is essentially a
bug fix (SMP Squid does not restart reliably on busy servers without
these changes). We can provide a v3.5-specific [combined] patch if that
will make a positive decision easier for you.


Thank you,

Alex.


___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-03-06 Thread Alex Rousskov
On 01/21/2015 05:03 AM, Amos Jeffries wrote:
> On 22/01/2015 12:57 a.m., Tsantilas Christos wrote:
> 
>> I am posting a new patch.
> 
>> This patch include fixes to follow Squid Coding Style but also have
>> a fix for a small bug: In the previous patches I posted the pid
>> creation done by master process but the pid file was not removed by
>> master process. This patch fixes master process to remove pid file
>> on exit.
> 
> 
> +1.


Amos,

This patch went into trunk (r13867 and r13868). Do you plan on
merging it (together with the pending "start workers as root" follow up)
into v3.5 branch? It is your call, of course, but it is essentially a
bug fix (SMP Squid does not restart reliably on busy servers without
these changes). We can provide a v3.5-specific [combined] patch if that
will make a positive decision easier for you.


Thank you,

Alex.
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-01-21 Thread Amos Jeffries
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 22/01/2015 12:57 a.m., Tsantilas Christos wrote:
> 
> I am posting a new patch.
> 
> This patch include fixes to follow Squid Coding Style but also have
> a fix for a small bug: In the previous patches I posted the pid
> creation done by master process but the pid file was not removed by
> master process. This patch fixes master process to remove pid file
> on exit.
> 

+1.

Amos

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (MingW32)

iQEcBAEBAgAGBQJUv5WJAAoJELJo5wb/XPRjT3gIALl6BodMJe1ejq3JxYujzMNd
7T2Ktw27WH/uYoo0a5xuQtivpkHoSKdenHkhLer6rSoogastIC9Zb34crKfq3jdF
HR2UYJyd/SYt55+3K9v+NQXnpE0nq4ZKrSIDwpYW6+s01/x1kRX/g/VorKkpd736
PIR/NhDKuHFUtY3ZzkpNZvNlsTd7ocRpbYbCqm2mWo624rQ+pipXsOeR7/l4DRhm
qOYzsG2F4IzkUM+P9qX5GjG0ag1w56FUIJWbFyLOzACqnL9/XsIFd9607oFdvJyD
hgViW6mHMCh3/nu9t9Mc2SOMO0Q2tfKlfyAwsfEJoai1RA9/8zAWmQqw1LSTBcg=
=Wc8J
-END PGP SIGNATURE-
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-01-21 Thread Tsantilas Christos


I am posting a new patch.

This patch include fixes to follow Squid Coding Style but also have a 
fix for a small bug:
 In the previous patches I posted the pid creation done by master 
process but the pid file was not removed by master process. This patch 
fixes master process to remove pid file on exit.





On 01/21/2015 12:17 PM, Amos Jeffries wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 21/01/2015 10:57 p.m., Tsantilas Christos wrote:

On 01/20/2015 02:55 AM, Alex Rousskov wrote:

On 01/16/2015 08:51 AM, Amos Jeffries wrote:

On 16/01/2015 11:29 a.m., Alex Rousskov wrote:

In SMP, there is only one Coordinator process, created by
the Master process.



All SMP kids (Coordinator, workers, and diskers) are started
by the Master process. There are no multiple levels as far as
kid startup and waiting are concerned and, hence, there is no
"level deeper than the master can see".




Hmm, okay. Then I have no problem per-se to this change of
esponsibility.


Great, thank you.



I do still think the coordinator needs to remain active until
last out of the kids though, so they can still use it to
coordinate during their shutdowns. Having it be the first up
and last down would solve a few architectural problems where
kids need to to collaborate on things, like log rotations or
broadcasting their availability.


Agreed!



This patch does not prevent coordinator process exit before the
workers, but also current squid does not guarantees that the
coordinator will exit after workers. I agree that we need to
implement it, but looks that this is out of the scope of this
patch.

If there is not any objection I will apply this patch as is for
now.



Okay, lets give it a try.

Can you please though make sure all the new functions use the Squid
coding style instead of the one you keep slipping in. Squid style is
this in .cc files:

  type
  functionName(...)
  {
 ... code ...
  }

Cheers
Amos


Moved PID file management from Coordinator to Master.

This move is the first step necessary to avoid the following race condition
among PID file deletion and shared segment creation/destruction in SMP Squid:

  O1) The old Squid Coordinator removes its PID file and quits.
  N1) The system script notices Coordinator death and starts the new Squid.
  N2) Shared segments are created by the new Master process.
  O2) Shared segments are removed by the old Master process.
  N3) New worker/disker processes fail due to missing segments.

TODO: The second step (not a part of this change) is to delete shared memory
segments before PID file is deleted (all in the Master process after this
change).


Now the Master process receives signals and is responsible for forwarding them
to the kids.

The kids does not install default signal handler for shudown signals (SIGINT,
SIGTERM) after a signal received. If a second shutdown signal is received then
squid imediatelly terminates the event loop and exits.

When the "kill-parent-hack" is enabled the kids are sending the kill signal
to master process and master process forward it to other kids too.

Also a small regression added: The PID file can no longer be renamed using
hot reconfiguration. A full Squid restart is now required for that.

This is a Measurement Factory project.

=== modified file 'src/ipc/Kid.cc'
--- src/ipc/Kid.cc	2015-01-13 07:25:36 +
+++ src/ipc/Kid.cc	2015-01-21 11:47:29 +
@@ -33,41 +33,42 @@
 badFailures(0),
 pid(-1),
 startTime(0),
 isRunning(false),
 status(0)
 {
 }
 
 /// called when this kid got started, records PID
 void Kid::start(pid_t cpid)
 {
 assert(!running());
 assert(cpid > 0);
 
 isRunning = true;
 pid = cpid;
 time(&startTime);
 }
 
 /// called when kid terminates, sets exiting status
-void Kid::stop(status_type theExitStatus)
+void
+Kid::stop(PidStatus const theExitStatus)
 {
 assert(running());
 assert(startTime != 0);
 
 isRunning = false;
 
 time_t stop_time;
 time(&stop_time);
 if ((stop_time - startTime) < fastFailureTimeLimit)
 ++badFailures;
 else
 badFailures = 0; // the failures are not "frequent" [any more]
 
 status = theExitStatus;
 }
 
 /// returns true if tracking of kid is stopped
 bool Kid::running() const
 {
 return isRunning;

=== modified file 'src/ipc/Kid.h'
--- src/ipc/Kid.h	2015-01-13 07:25:36 +
+++ src/ipc/Kid.h	2015-01-13 16:35:58 +
@@ -1,60 +1,56 @@
 /*
  * Copyright (C) 1996-2015 The Squid Software Foundation and contributors
  *
  * Squid software is distributed under GPLv2+ license and includes
  * contributions from numerous individuals and organizations.
  * Please see the COPYING and CONTRIBUTORS files for details.
  */
 
 #ifndef SQUID_IPC_KID_H
 #define SQUID_IPC_KID_H
 
 #include "SquidString.h"
+#include "tools.h"
 
 /// Squid child, including current forked process info and
 /// info persistent across restarts
 class Kid
 {
 public:
-#if _SQUID_NEXT_
-typedef union wait status_type;
-#else
-   

Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-01-21 Thread Tsantilas Christos

On 01/21/2015 12:17 PM, Amos Jeffries wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 21/01/2015 10:57 p.m., Tsantilas Christos wrote:

On 01/20/2015 02:55 AM, Alex Rousskov wrote:

On 01/16/2015 08:51 AM, Amos Jeffries wrote:

On 16/01/2015 11:29 a.m., Alex Rousskov wrote:

In SMP, there is only one Coordinator process, created by
the Master process.



All SMP kids (Coordinator, workers, and diskers) are started
by the Master process. There are no multiple levels as far as
kid startup and waiting are concerned and, hence, there is no
"level deeper than the master can see".




Hmm, okay. Then I have no problem per-se to this change of
esponsibility.


Great, thank you.



I do still think the coordinator needs to remain active until
last out of the kids though, so they can still use it to
coordinate during their shutdowns. Having it be the first up
and last down would solve a few architectural problems where
kids need to to collaborate on things, like log rotations or
broadcasting their availability.


Agreed!



This patch does not prevent coordinator process exit before the
workers, but also current squid does not guarantees that the
coordinator will exit after workers. I agree that we need to
implement it, but looks that this is out of the scope of this
patch.

If there is not any objection I will apply this patch as is for
now.



Okay, lets give it a try.

Can you please though make sure all the new functions use the Squid
coding style instead of the one you keep slipping in. Squid style is
this in .cc files:

  type
  functionName(...)
  {
 ... code ...
  }


Hmm...
I found 1-2 cases... I will fix while applying to trunk 




Cheers
Amos
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (MingW32)

iQEcBAEBAgAGBQJUv3zOAAoJELJo5wb/XPRjJtMIAJV3sKXDHeOq+E3HqVGjrDFc
ZlJt7k76LIHCGZmfkh+1fPv4wuUtWKVzW2TfHMGP1HI2bqzq6hBTKe+uVmqi3Xzm
yXWHSymbXJKlWW7/lBLY7innpNjLyiE3Jv46gto1I6R79eXipiVYUehSVhx0FGL6
yB6yli6RTLEqxZlhI/tCHvfVh1Y0Wp3r0+yJJmW4POiF6S9PSnFJE+PzlEsWJ5L1
s6KuMlQ90jHAhLOsfJWrTE1bo/xJskVKkOT6KTTO7DxK0AyFxUvxsXVwT8I1h8oo
8qIHVjPEOYFuoejowH6+rn0nDDcFNyt47O/POIl7KJ1SvO1Vkx6lfDTPmvz7uMA=
=L5r1
-END PGP SIGNATURE-
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev



___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-01-21 Thread Amos Jeffries
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 21/01/2015 10:57 p.m., Tsantilas Christos wrote:
> On 01/20/2015 02:55 AM, Alex Rousskov wrote:
>> On 01/16/2015 08:51 AM, Amos Jeffries wrote:
>>> On 16/01/2015 11:29 a.m., Alex Rousskov wrote:
 In SMP, there is only one Coordinator process, created by
 the Master process.
>> 
 All SMP kids (Coordinator, workers, and diskers) are started
 by the Master process. There are no multiple levels as far as
 kid startup and waiting are concerned and, hence, there is no
 "level deeper than the master can see".
>> 
>> 
>>> Hmm, okay. Then I have no problem per-se to this change of 
>>> esponsibility.
>> 
>> Great, thank you.
>> 
>> 
>>> I do still think the coordinator needs to remain active until
>>> last out of the kids though, so they can still use it to
>>> coordinate during their shutdowns. Having it be the first up
>>> and last down would solve a few architectural problems where
>>> kids need to to collaborate on things, like log rotations or
>>> broadcasting their availability.
>> 
>> Agreed!
> 
> 
> This patch does not prevent coordinator process exit before the
> workers, but also current squid does not guarantees that the
> coordinator will exit after workers. I agree that we need to
> implement it, but looks that this is out of the scope of this
> patch.
> 
> If there is not any objection I will apply this patch as is for
> now.
> 

Okay, lets give it a try.

Can you please though make sure all the new functions use the Squid
coding style instead of the one you keep slipping in. Squid style is
this in .cc files:

 type
 functionName(...)
 {
... code ...
 }

Cheers
Amos
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (MingW32)

iQEcBAEBAgAGBQJUv3zOAAoJELJo5wb/XPRjJtMIAJV3sKXDHeOq+E3HqVGjrDFc
ZlJt7k76LIHCGZmfkh+1fPv4wuUtWKVzW2TfHMGP1HI2bqzq6hBTKe+uVmqi3Xzm
yXWHSymbXJKlWW7/lBLY7innpNjLyiE3Jv46gto1I6R79eXipiVYUehSVhx0FGL6
yB6yli6RTLEqxZlhI/tCHvfVh1Y0Wp3r0+yJJmW4POiF6S9PSnFJE+PzlEsWJ5L1
s6KuMlQ90jHAhLOsfJWrTE1bo/xJskVKkOT6KTTO7DxK0AyFxUvxsXVwT8I1h8oo
8qIHVjPEOYFuoejowH6+rn0nDDcFNyt47O/POIl7KJ1SvO1Vkx6lfDTPmvz7uMA=
=L5r1
-END PGP SIGNATURE-
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-01-21 Thread Tsantilas Christos

On 01/20/2015 02:55 AM, Alex Rousskov wrote:

On 01/16/2015 08:51 AM, Amos Jeffries wrote:

On 16/01/2015 11:29 a.m., Alex Rousskov wrote:

In SMP, there is only one Coordinator process, created by the
Master process.



All SMP kids (Coordinator, workers, and diskers) are started by
the Master process. There are no multiple levels as far as kid
startup and waiting are concerned and, hence, there is no "level
deeper than the master can see".




Hmm, okay. Then I have no problem per-se to this change of esponsibility.


Great, thank you.



I do still think the coordinator needs to remain active until last out
of the kids though, so they can still use it to coordinate during
their shutdowns. Having it be the first up and last down would solve a
few architectural problems where kids need to to collaborate on
things, like log rotations or broadcasting their availability.


Agreed!



This patch does not prevent coordinator process exit before the workers, 
but also current squid does not guarantees that the coordinator will 
exit after workers.
I agree that we need to implement it, but looks that this is out of the 
scope of this patch.


If there is not any objection I will apply this patch as is for now.






Alex.
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev



___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-01-19 Thread Alex Rousskov
On 01/16/2015 08:51 AM, Amos Jeffries wrote:
> On 16/01/2015 11:29 a.m., Alex Rousskov wrote:
>> In SMP, there is only one Coordinator process, created by the
>> Master process.

>> All SMP kids (Coordinator, workers, and diskers) are started by
>> the Master process. There are no multiple levels as far as kid
>> startup and waiting are concerned and, hence, there is no "level
>> deeper than the master can see".


> Hmm, okay. Then I have no problem per-se to this change of esponsibility.

Great, thank you.


> I do still think the coordinator needs to remain active until last out
> of the kids though, so they can still use it to coordinate during
> their shutdowns. Having it be the first up and last down would solve a
> few architectural problems where kids need to to collaborate on
> things, like log rotations or broadcasting their availability.

Agreed!


Alex.
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-01-19 Thread Tsantilas Christos

On 01/16/2015 05:51 PM, Amos Jeffries wrote:



Hmm, okay. Then I have no problem per-se to this change of esponsibility.

I do still think the coordinator needs to remain active until last out
of the kids though, so they can still use it to coordinate during
their shutdowns. Having it be the first up and last down would solve a
few architectural problems where kids need to to collaborate on
things, like log rotations or broadcasting their availability.


I am not sure we are able to build something simple if we try to 
implement it.


The easier way is to delay sending the kill signal to coordinator until 
all other kids has exited.  The coordinator process in this case should 
exit immediately after the kill signal received.


Or we can just delay sending kill signal to coordinator process for 1 
second. This is will give it enough time to exit later that the other kids.


The type of kids is not known by the master process. But I suppose we 
can find it using the kid name.


Any ideas?



Amos


___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-01-16 Thread Amos Jeffries
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 16/01/2015 11:29 a.m., Alex Rousskov wrote:
> On 01/14/2015 03:09 AM, Amos Jeffries wrote:
>>> On 01/14/2015 11:25 AM, Amos Jeffries wrote:
 Does the master process get exit status of *all* worker
 processes and the sub-childs down N levels? It was my
 understanding that in SMP each worker disker etc is a fork()
 and the child becomes new coordinator.
> 
> 
> Hi Amos,
> 
> In SMP, there is only one Coordinator process, created by the
> Master process.
> 
> 
>> I suspect we will find that some diskers etc are in fact spawned
>> by either coordinator or a worker and one level deeper than the
>> master can see.
> 
> 
> All SMP kids (Coordinator, workers, and diskers) are started by
> the Master process. There are no multiple levels as far as kid
> startup and waiting are concerned and, hence, there is no "level
> deeper than the master can see".
> 
> Needless to say, there are processes that are not SMP kids. For
> example, all helpers do not run Squid code and, hence, they are not
> kids and are not affected by the proposed fix.
> 
> 
> HTH,
> 
> Alex.
> 


Hmm, okay. Then I have no problem per-se to this change of esponsibility.

I do still think the coordinator needs to remain active until last out
of the kids though, so they can still use it to coordinate during
their shutdowns. Having it be the first up and last down would solve a
few architectural problems where kids need to to collaborate on
things, like log rotations or broadcasting their availability.

Amos
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (MingW32)

iQEcBAEBAgAGBQJUuTOSAAoJELJo5wb/XPRj76UIAJk+f4O4aiIp4Hmw7bZpD5zV
3MGcai6k7kTG8/4ncvnsZCT9lG1shgA41JeNpoUzCMvo9kfRJ7wHYpwXZT0UNeEW
mwE2TZXjwt1cWewyfwnb24Pph5vCBEhI7wqgANH3/+duZkAYzHuEl09G62LEpAR3
KBrmAJUP468brWXMLdDAJAUNo4c1e5U74vqI2k2Dnn25NjCxg1yLLx3RB4oDNGqI
WbLAR8hR4ni4gl5jkJwdFb9yFgROaciQxVrLchj/ub73j2yoPF4hBgwwx2hola+f
rJDRDiQugVbTlDl4jyv3QjRc3NLMA27qZKV0DCIxEIYZ7F+3niG4VVTo6mQhgEo=
=Naou
-END PGP SIGNATURE-
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-01-15 Thread Alex Rousskov
On 01/14/2015 03:09 AM, Amos Jeffries wrote:
>> On 01/14/2015 11:25 AM, Amos Jeffries wrote:
>>> Does the master process get exit status of *all* worker processes
>>> and the sub-childs down N levels? It was my understanding that in
>>> SMP each worker disker etc is a fork() and the child becomes new
>>> coordinator.


Hi Amos,

In SMP, there is only one Coordinator process, created by the Master
process.


> I suspect we will find that some diskers etc are in fact spawned by
> either coordinator or a worker and one level deeper than the master
> can see.


All SMP kids (Coordinator, workers, and diskers) are started by the
Master process. There are no multiple levels as far as kid startup and
waiting are concerned and, hence, there is no "level deeper than the
master can see".

Needless to say, there are processes that are not SMP kids. For example,
all helpers do not run Squid code and, hence, they are not kids and are
not affected by the proposed fix.


HTH,

Alex.
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-01-15 Thread Tsantilas Christos
---END PGP SIGNATURE-
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev



Moved PID file management from Coordinator to Master.

This move is the first step necessary to avoid the following race condition
among PID file deletion and shared segment creation/destruction in SMP Squid:

  O1) The old Squid Coordinator removes its PID file and quits.
  N1) The system script notices Coordinator death and starts the new Squid.
  N2) Shared segments are created by the new Master process.
  O2) Shared segments are removed by the old Master process.
  N3) New worker/disker processes fail due to missing segments.

TODO: The second step (not a part of this change) is to delete shared memory
segments before PID file is deleted (all in the Master process after this
change).


Now the Master process receives signals and is responsible for forwarding them
to the kids.

The kids does not install default signal handler for shudown signals (SIGINT,
SIGTERM) after a signal received. If a second shutdown signal is received then
squid imediatelly terminates the event loop and exits.

When the "kill-parent-hack" is enabled the kids are sending the kill signal
to master process and master process forward it to other kids too.

Also a small regression added: The PID file can no longer be renamed using
hot reconfiguration. A full Squid restart is now required for that.

This is a Measurement Factory project.

=== modified file 'src/ipc/Kid.cc'
--- src/ipc/Kid.cc	2015-01-13 07:25:36 +
+++ src/ipc/Kid.cc	2015-01-13 16:35:58 +
@@ -33,41 +33,41 @@
 badFailures(0),
 pid(-1),
 startTime(0),
 isRunning(false),
 status(0)
 {
 }
 
 /// called when this kid got started, records PID
 void Kid::start(pid_t cpid)
 {
 assert(!running());
 assert(cpid > 0);
 
 isRunning = true;
 pid = cpid;
 time(&startTime);
 }
 
 /// called when kid terminates, sets exiting status
-void Kid::stop(status_type theExitStatus)
+void Kid::stop(PidStatus const theExitStatus)
 {
 assert(running());
 assert(startTime != 0);
 
 isRunning = false;
 
 time_t stop_time;
 time(&stop_time);
 if ((stop_time - startTime) < fastFailureTimeLimit)
 ++badFailures;
 else
 badFailures = 0; // the failures are not "frequent" [any more]
 
 status = theExitStatus;
 }
 
 /// returns true if tracking of kid is stopped
 bool Kid::running() const
 {
 return isRunning;

=== modified file 'src/ipc/Kid.h'
--- src/ipc/Kid.h	2015-01-13 07:25:36 +
+++ src/ipc/Kid.h	2015-01-13 16:35:58 +
@@ -1,60 +1,56 @@
 /*
  * Copyright (C) 1996-2015 The Squid Software Foundation and contributors
  *
  * Squid software is distributed under GPLv2+ license and includes
  * contributions from numerous individuals and organizations.
  * Please see the COPYING and CONTRIBUTORS files for details.
  */
 
 #ifndef SQUID_IPC_KID_H
 #define SQUID_IPC_KID_H
 
 #include "SquidString.h"
+#include "tools.h"
 
 /// Squid child, including current forked process info and
 /// info persistent across restarts
 class Kid
 {
 public:
-#if _SQUID_NEXT_
-typedef union wait status_type;
-#else
-typedef int status_type;
-#endif
 
 /// keep restarting until the number of bad failures exceed this limit
 enum { badFailureLimit = 4 };
 
 /// slower start failures are not "frequent enough" to be counted as "bad"
 enum { fastFailureTimeLimit = 10 }; // seconds
 
 public:
 Kid();
 
 Kid(const String& kid_name);
 
 /// called when this kid got started, records PID
 void start(pid_t cpid);
 
 /// called when kid terminates, sets exiting status
-void stop(status_type exitStatus);
+void stop(PidStatus const exitStatus);
 
 /// returns true if tracking of kid is stopped
 bool running() const;
 
 /// returns true if master should restart this kid
 bool shouldRestart() const;
 
 /// returns current pid for a running kid and last pid for a stopped kid
 pid_t getPid() const;
 
 /// whether the failures are "repeated and frequent"
 bool hopeless() const;
 
 /// returns true if the process terminated normally
 bool calledExit() const;
 
 /// returns the exit status of the process
 int exitStatus() const;
 
 /// whether the process exited with a given exit status code
@@ -67,39 +63,39 @@
 bool signaled() const;
 
 /// returns the number of the signal that caused the kid to terminate
 int termSignal() const;
 
 /// whether the process was terminated by a given signal
 bool signaled(int sgnl) const;
 
 /// returns kid name
 const String& name() const;
 
 private:
 // Information preserved across restarts
 String theName; ///< process name
 int badFailures; ///< number of "repeated fr

Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-01-14 Thread Amos Jeffries
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 14/01/2015 10:54 p.m., Tsantilas Christos wrote:
> On 01/14/2015 11:25 AM, Amos Jeffries wrote:
>> On 14/01/2015 7:37 a.m., Tsantilas Christos wrote:
>>> On 01/12/2015 07:22 PM, Amos Jeffries wrote: On 12/01/2015
>>> 6:02
> ...
>>> The Master process has no way to know if the workers are
>>> exiting early with no clients, or aborting on worker-specific 
>>> shutdown_timeout values. But the coordinator can receive a 
>>> terminated message from them over SMP sockets.
>>> 
 We can use exit status.
>> 
>> Does the master process get exit status of *all* worker processes
>> and the sub-childs down N levels? It was my understanding that in
>> SMP each worker disker etc is a fork() and the child becomes new
>> coordinator.
>> 
> 
> Currently master process does not check exist status for any kid
> process and from what I know it can not get exit status down N
> levels. But it can get exit status of processes forked from master
> process, the coordinator and workers. This is should be enough. Why
> do we need to get exit status for sub-childs? If a worker process
> exited normally means that the sub-childs exited normally or
> without huge problems. Do you worry for still-running sub-childs
> after a worker process exited? Is it possible?

I suspect we will find that some diskers etc are in fact spawned by
either coordinator or a worker and one level deeper than the master
can see.

Amos
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (MingW32)

iQEcBAEBAgAGBQJUtkBIAAoJELJo5wb/XPRj6hIIAIvq9pA7/CdXz7ssgvlJhyer
P42VIFRVOV+0FOaUsQCGRN5tU/lhpdAcjDZdOZrodEkds3RGWm2lu+AbWL4K0BV/
tgiEF7c4wtKeW7AbwSdwnXCozWZ13vT1nuVA0Mw2inmJvgn6U8CjarN+nF+aWYIY
O6QYxcwWcfILaVmHHJFd2k9RWEMc+qDGC1tFosZK6hCu/mRs2DQu0nsLPKoA1fQ6
WGdLXM1ePEItSAQrHBw3uv2RG6vAYBuYUl3LHkVYkqCc269cFeKxbldf52MLiS5p
VrhYRGHPREAHcNJAwmnpm4YbEAO1g2RwsWU9mHPZP/YYA1kGaYV4LTZINh6s5HY=
=uMno
-END PGP SIGNATURE-
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-01-14 Thread Tsantilas Christos

On 01/14/2015 11:25 AM, Amos Jeffries wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 14/01/2015 7:37 a.m., Tsantilas Christos wrote:

On 01/12/2015 07:22 PM, Amos Jeffries wrote: On 12/01/2015 6:02

...

The Master process has no way to know if the workers are exiting
early with no clients, or aborting on worker-specific
shutdown_timeout values. But the coordinator can receive a
terminated message from them over SMP sockets.


We can use exit status.


Does the master process get exit status of *all* worker processes and
the sub-childs down N levels? It was my understanding that in SMP each
worker disker etc is a fork() and the child becomes new coordinator.



Currently master process does not check exist status for any kid process 
and from what I know it can not get exit status down N levels.
But it can get exit status of processes forked from master process, the 
coordinator and workers.
This is should be enough. Why do we need to get exit status for 
sub-childs? If a worker process exited normally means that the 
sub-childs exited normally or without huge problems.
Do you worry for still-running sub-childs after a worker process exited? 
Is it possible?





Amos


___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-01-14 Thread Amos Jeffries
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 14/01/2015 7:37 a.m., Tsantilas Christos wrote:
> On 01/12/2015 07:22 PM, Amos Jeffries wrote: On 12/01/2015 6:02
> a.m., Tsantilas Christos wrote:
 Hi all, this patch moves pid file managment from coordinator 
 process to master process.
 
 This move is the first step necessary to avoid the following
 race condition among PID file deletion and shared segment 
 creation/destruction in SMP Squid:
 
 O1) The old Squid Coordinator removes its PID file and quits.
 N1) The system script notices Coordinator death and starts
 the new Squid. N2) Shared segments are created by the new
 Master process. O2) Shared segments are removed by the old
 Master process. N3) New worker/disker processes fail due to
 missing segments.
 
> 
> The Coordinator needs to continue coordinating activities over the
> SMP sockets until the workers are all shutdown and SMP sockets
> closed, only then should it do O2 and O1 (in that order).
> 
> The planned behaviour for worker shutdown is to: W1) early client
> FD closures into the beginning of the shutdown_timeout period W2)
> on each client closure or connection going idle, close it W3) at
> end of shutdown_timeout OR last client disconnect, release all 
> resources.
> 
> In that design the AsyncEngine still runs right up until the queue 
> completes draining. Using SMP sockets to inform Coordinator about 
> clean shutdown at the end.
> 
>> My sense is that the exit status can provide the same
>> functionality and also is easier to be implemented. If the worker
>> aborted early by a segfault then I am doubt that it will be able
>> to send a message to an SMP socket.
> 
> The Master process has no way to know if the workers are exiting
> early with no clients, or aborting on worker-specific
> shutdown_timeout values. But the coordinator can receive a
> terminated message from them over SMP sockets.
> 
>> We can use exit status.

Does the master process get exit status of *all* worker processes and
the sub-childs down N levels? It was my understanding that in SMP each
worker disker etc is a fork() and the child becomes new coordinator.


Amos
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (MingW32)

iQEcBAEBAgAGBQJUtjYjAAoJELJo5wb/XPRjWN0IAJUR13a2a45DT/XKqlYVYXGb
BoqPtKRUQQQsSrAqxJVrl59UXgY+yB38h7QWUrdBUV/+QaZvEhjMwVSLsL8qHbrM
bMd0HQLa5G8W9nuKQnuI+3iwUQj0F9+7Z45yGKFimJRi7sDGRkHH5dTl/zJqNLgu
oYJYZ3nB6QYsLV15SVf/cpBxrhELKZCfvb5Fcoy8LZojUfK9O0gXmdmBIJJtJyoa
1u9TbKsxtAdJOiUuhGcruLApj2bHoQL6IwEm9soV2IYOUxCamFKyaz4BqmUzmp05
txaYvQ3GbMc9k5IGGuXZHNiEVyNjUSg/97qXL150lYpKu7IYCJXqCj4PqwMu7NU=
=qDv3
-END PGP SIGNATURE-
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-01-13 Thread Tsantilas Christos

On 01/12/2015 07:22 PM, Amos Jeffries wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 12/01/2015 6:02 a.m., Tsantilas Christos wrote:

Hi all, this patch moves pid file managment from coordinator
process to master process.

This move is the first step necessary to avoid the following race
condition among PID file deletion and shared segment
creation/destruction in SMP Squid:

O1) The old Squid Coordinator removes its PID file and quits. N1)
The system script notices Coordinator death and starts the new
Squid. N2) Shared segments are created by the new Master process.
O2) Shared segments are removed by the old Master process. N3) New
worker/disker processes fail due to missing segments.



The Coordinator needs to continue coordinating activities over the SMP
sockets until the workers are all shutdown and SMP sockets closed,
only then should it do O2 and O1 (in that order).

The planned behaviour for worker shutdown is to:
  W1) early client FD closures into the beginning of the
shutdown_timeout period
  W2) on each client closure or connection going idle, close it
  W3) at end of shutdown_timeout OR last client disconnect, release all
resources.

In that design the AsyncEngine still runs right up until the queue
completes draining. Using SMP sockets to inform Coordinator about
clean shutdown at the end.


My sense is that the exit status can provide the same functionality and 
also is easier to be implemented.
If the worker aborted early by a segfault then I am doubt that it will 
be able to send a message to an SMP socket.



The Master process has no way to know if the workers are exiting early
with no clients, or aborting on worker-specific shutdown_timeout
values. But the coordinator can receive a terminated message from them
over SMP sockets.


We can use exit status.





TODO: The second step (not a part of this change) is to delete
shared memory segments before PID file is deleted (all in the
Master process after this change).

Now the Master process receives signals and is responsible for
forwarding them to the kids.


The command line control process also used manually for the -k options
  to send signals also thinks of itself as Master.

How does this new closing of SMP sockets interact with that other
meaning of Master process?


The master process is the simplest squid process. I believe that it is 
the best process for doing the cleanup.








Please for more informations read the patch preamble.

This is a Measurement Factory project


Some extra notes/ideas --

1) Multiple shutdown signals received by squid

In current squid when coordinator received a shutdown signal, then
replaced shutdown signal handlers with the default handlers. This
is has as result when a second shutdown signal received then the
coordinator process died immediately, without forwarding shutdown
signal to kids. The shutdown of the other kids are finished as
normal.

This patch when master process receives a shutdown signal forward
it to kids and master process is ready to receive a second shutdown
signal. When a second shutdown signal received to master and this
forwarded to kids then the kids died immediately.


Plan was to pass the signal to workers again where they kick off their
own shutdown_timeout event handlers immediately instead of hard
killing workers.


So do you believe that the workers should not restore default  handlers 
for shutdown signals. Am I correct?
It is easy to be implemented, it already implemented for 
"kill-parent-hack" where the master process is constrained to send 
multiple kill signals to kids.





FWIW: Ubuntu Gentoo, and RHEL people are enjoying their patches that
just ignore the repeated signals.




2) The system admin shows a blocked kid (infinity loop or not
responding). He kill with the hand.

Current squid does not restart the kids killed by a TERM or KILL
signal (squid considers it as normal kid shutdown). This patch does
not change this behaviour. The admin is still able to kill with a
"kill -11" and in this case the kid will restarted.

My opinion is that squid should restart kids in these cases. Should
not restart a kid only when a shutdown requested from system admin,
or when the kids dying very fast (hopeless()==true ).


TERM and KILL received by the workers often *are* signals sent by the
system admin, or scripts on their behalf. That may decrease in
popularity though when we fix the normal shutdown process issues. For
a while longer we have to take the current reality.


ok.







In related topics, I have been trying to figure out a --foreground
command line option that operates like -N but does not disable SMP,
just makes Coordinator == Master. But understanding the SMP
complexities have been blocking me so far.  Are you able and
interested in taking that forward?


I do not know :-)
I must ask Alex for this.

Regards,
Christos




Amos
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (MingW32)

iQEcBAEBAgAGBQJUtALXAAoJELJo

Re: [squid-dev] Moved PID file management from Coordinator to Master

2015-01-12 Thread Amos Jeffries
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 12/01/2015 6:02 a.m., Tsantilas Christos wrote:
> Hi all, this patch moves pid file managment from coordinator
> process to master process.
> 
> This move is the first step necessary to avoid the following race 
> condition among PID file deletion and shared segment 
> creation/destruction in SMP Squid:
> 
> O1) The old Squid Coordinator removes its PID file and quits. N1)
> The system script notices Coordinator death and starts the new
> Squid. N2) Shared segments are created by the new Master process. 
> O2) Shared segments are removed by the old Master process. N3) New
> worker/disker processes fail due to missing segments.
> 

The Coordinator needs to continue coordinating activities over the SMP
sockets until the workers are all shutdown and SMP sockets closed,
only then should it do O2 and O1 (in that order).

The planned behaviour for worker shutdown is to:
 W1) early client FD closures into the beginning of the
shutdown_timeout period
 W2) on each client closure or connection going idle, close it
 W3) at end of shutdown_timeout OR last client disconnect, release all
resources.

In that design the AsyncEngine still runs right up until the queue
completes draining. Using SMP sockets to inform Coordinator about
clean shutdown at the end.
The Master process has no way to know if the workers are exiting early
with no clients, or aborting on worker-specific shutdown_timeout
values. But the coordinator can receive a terminated message from them
over SMP sockets.


> TODO: The second step (not a part of this change) is to delete
> shared memory segments before PID file is deleted (all in the
> Master process after this change).
> 
> Now the Master process receives signals and is responsible for 
> forwarding them to the kids.

The command line control process also used manually for the -k options
 to send signals also thinks of itself as Master.

How does this new closing of SMP sockets interact with that other
meaning of Master process?



> 
> Please for more informations read the patch preamble.
> 
> This is a Measurement Factory project
> 
> 
> Some extra notes/ideas --
> 
> 1) Multiple shutdown signals received by squid
> 
> In current squid when coordinator received a shutdown signal, then 
> replaced shutdown signal handlers with the default handlers. This
> is has as result when a second shutdown signal received then the
> coordinator process died immediately, without forwarding shutdown
> signal to kids. The shutdown of the other kids are finished as
> normal.
> 
> This patch when master process receives a shutdown signal forward
> it to kids and master process is ready to receive a second shutdown
> signal. When a second shutdown signal received to master and this
> forwarded to kids then the kids died immediately.

Plan was to pass the signal to workers again where they kick off their
own shutdown_timeout event handlers immediately instead of hard
killing workers.

FWIW: Ubuntu Gentoo, and RHEL people are enjoying their patches that
just ignore the repeated signals.


> 
> 2) The system admin shows a blocked kid (infinity loop or not 
> responding). He kill with the hand.
> 
> Current squid does not restart the kids killed by a TERM or KILL 
> signal (squid considers it as normal kid shutdown). This patch does
> not change this behaviour. The admin is still able to kill with a
> "kill -11" and in this case the kid will restarted.
> 
> My opinion is that squid should restart kids in these cases. Should
> not restart a kid only when a shutdown requested from system admin,
> or when the kids dying very fast (hopeless()==true ).

TERM and KILL received by the workers often *are* signals sent by the
system admin, or scripts on their behalf. That may decrease in
popularity though when we fix the normal shutdown process issues. For
a while longer we have to take the current reality.




In related topics, I have been trying to figure out a --foreground
command line option that operates like -N but does not disable SMP,
just makes Coordinator == Master. But understanding the SMP
complexities have been blocking me so far.  Are you able and
interested in taking that forward?

Amos
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (MingW32)

iQEcBAEBAgAGBQJUtALXAAoJELJo5wb/XPRjwKsIAMPzuOaxvC7WpBHOpQZpG1IZ
1tgbtosaJu3JweE7At729HLL34mR+YagaJbTz4xF6c2mkpLxxYioT6IzSxKc6YCD
mYJr8WU8uuJVI662u7w+3UyLVLI+c3vIwrw8d8NDZaKyAkOIn//Xks9YIG7h+xse
ooK/AAhMaADiS5S1FqY9OM3Q5Pn0nI3R91EpzGIeL1U5bG+43GYiOic3YSKgxSzq
8Q3YemiLj7ex00ZBtCbQ955bB8Zz1Q9I8hWgXdAFHgQKrjNmjdUDHqEg5M6E33zf
Gwpr6M3bO1gbtp7ize9vX7YxIlUjK6TUsbOFPlt9QJYEzzVxoqcgzy0lavVEiXE=
=5Qeg
-END PGP SIGNATURE-
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


[squid-dev] Moved PID file management from Coordinator to Master

2015-01-11 Thread Tsantilas Christos

Hi all,
 this patch moves pid file managment from coordinator process to master 
process.


This move is the first step necessary to avoid the following race 
condition among PID file deletion and shared segment 
creation/destruction in SMP Squid:


  O1) The old Squid Coordinator removes its PID file and quits.
  N1) The system script notices Coordinator death and starts the new Squid.
  N2) Shared segments are created by the new Master process.
  O2) Shared segments are removed by the old Master process.
  N3) New worker/disker processes fail due to missing segments.

TODO: The second step (not a part of this change) is to delete shared 
memory segments before PID file is deleted (all in the Master process 
after this change).


Now the Master process receives signals and is responsible for 
forwarding them to the kids.


Please for more informations read the patch preamble.

This is a Measurement Factory project


Some extra notes/ideas
--

1) Multiple shutdown signals received by squid

 In current squid when coordinator received a shutdown signal, then 
replaced shutdown signal handlers with the default handlers. This is has 
as result when a second shutdown signal received then the coordinator 
process died immediately, without forwarding shutdown signal to kids. 
The shutdown of the other kids are finished as normal.


  This patch when master process receives a shutdown signal forward it 
to kids and master process is ready to receive a second shutdown signal. 
When a second shutdown signal received to master and this forwarded to 
kids then the kids died immediately.



2) The system admin shows a blocked kid (infinity loop or not 
responding). He kill with the hand.


  Current squid does not restart the kids killed by a TERM or KILL 
signal (squid considers it as normal kid shutdown).
This patch does not change this behaviour. The admin is still able to 
kill with a "kill -11" and in this case the kid will restarted.


My opinion is that squid should restart kids in these cases. Should not 
restart a kid only when a shutdown requested from system admin, or when 
the kids dying very fast (hopeless()==true ).


Regards,
   Christos
Moved PID file management from Coordinator to Master.

This move is the first step necessary to avoid the following race condition
among PID file deletion and shared segment creation/destruction in SMP Squid:

  O1) The old Squid Coordinator removes its PID file and quits.
  N1) The system script notices Coordinator death and starts the new Squid.
  N2) Shared segments are created by the new Master process.
  O2) Shared segments are removed by the old Master process.
  N3) New worker/disker processes fail due to missing segments.

TODO: The second step (not a part of this change) is to delete shared memory
segments before PID file is deleted (all in the Master process after this
change).


Now the Master process receives signals and is responsible for forwarding them
to the kids.

When the "kill-parent-hack" is enabled the kids are sending the kill signal
to master process and master process forward it to other kids too. In this
case the kids does not install default signal handler for TERM and KILL signals
after receives a shutdown signal.

Also a small regression added: The PID file can no longer be renamed using
hot reconfiguration. A full Squid restart is now required for that.

This is a Measurement Factory project.

=== modified file 'src/ipc/Kid.cc'
--- src/ipc/Kid.cc	2014-12-20 12:12:02 +
+++ src/ipc/Kid.cc	2015-01-08 15:26:48 +
@@ -33,41 +33,41 @@
 badFailures(0),
 pid(-1),
 startTime(0),
 isRunning(false),
 status(0)
 {
 }
 
 /// called when this kid got started, records PID
 void Kid::start(pid_t cpid)
 {
 assert(!running());
 assert(cpid > 0);
 
 isRunning = true;
 pid = cpid;
 time(&startTime);
 }
 
 /// called when kid terminates, sets exiting status
-void Kid::stop(status_type theExitStatus)
+void Kid::stop(PidStatus const theExitStatus)
 {
 assert(running());
 assert(startTime != 0);
 
 isRunning = false;
 
 time_t stop_time;
 time(&stop_time);
 if ((stop_time - startTime) < fastFailureTimeLimit)
 ++badFailures;
 else
 badFailures = 0; // the failures are not "frequent" [any more]
 
 status = theExitStatus;
 }
 
 /// returns true if tracking of kid is stopped
 bool Kid::running() const
 {
 return isRunning;

=== modified file 'src/ipc/Kid.h'
--- src/ipc/Kid.h	2014-12-20 12:12:02 +
+++ src/ipc/Kid.h	2015-01-08 15:26:32 +
@@ -1,60 +1,56 @@
 /*
  * Copyright (C) 1996-2014 The Squid Software Foundation and contributors
  *
  * Squid software is distributed under GPLv2+ license and includes
  * contributions from numerous individuals and organizations.
  * Please see the COPYING and CONTRIBUTORS files for details.
  */
 
 #ifndef SQUID_IPC_KID_H
 #define SQUID_IPC_KID_H
 
 #include "SquidString.h"
+#include "tools.h"
 
 /// Squi