Re: [squid-dev] Moved PID file management from Coordinator to Master
[re-send now the mailer is workign again in case you didnt get the first one] On 7/03/2015 7:38 a.m., Alex Rousskov wrote: > On 01/21/2015 05:03 AM, Amos Jeffries wrote: >> On 22/01/2015 12:57 a.m., Tsantilas Christos wrote: >> >>> I am posting a new patch. >> >>> This patch include fixes to follow Squid Coding Style but also have >>> a fix for a small bug: In the previous patches I posted the pid >>> creation done by master process but the pid file was not removed by >>> master process. This patch fixes master process to remove pid file >>> on exit. >> >> >> +1. > > > Amos, > > This patch went into trunk (r13867 and r13868). Do you plan on > merging it (together with the pending "start workers as root" follow up) > into v3.5 branch? It is your call, of course, but it is essentially a > bug fix (SMP Squid does not restart reliably on busy servers without > these changes). We can provide a v3.5-specific [combined] patch if that > will make a positive decision easier for you. > > > Thank you, > > Alex. > It is a bug fix on the one hand, but on the other its also a design/behaviour change affecting everybody. The choice is in the grey area for me, which means I'm treating it conservatively and not to risking a backport destabilizing 3.5 unless we have a clear need for it. There has not been much in the way of feedback regarding restart relibility from the mailing lists and bugzilla so far. If you are seeing a different view with more pressure, sure it can go back. Otherwise it can wait a few more months for trunk to branch. Amos ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] Moved PID file management from Coordinator to Master
I do not understand fully the patch but get the idea and since 3.5.2 is very stable for the public use as a simple forward proxy with ICAP and external_acl helpers it seems to that this patch will add for the stability of the 3.5 branch. I had a scenario of a development server testing which I needed to wait\kill couple times while not knowing how long it would take for the squid different instances to shutdown, and pgrep or "ps aux|grep squid" should not be something that will monitor squid. So: + 1 Eliezer On 06/03/2015 20:38, Alex Rousskov wrote: Amos, This patch went into trunk (r13867 and r13868). Do you plan on merging it (together with the pending "start workers as root" follow up) into v3.5 branch? It is your call, of course, but it is essentially a bug fix (SMP Squid does not restart reliably on busy servers without these changes). We can provide a v3.5-specific [combined] patch if that will make a positive decision easier for you. Thank you, Alex. ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] Moved PID file management from Coordinator to Master
On 01/21/2015 05:03 AM, Amos Jeffries wrote: > On 22/01/2015 12:57 a.m., Tsantilas Christos wrote: > >> I am posting a new patch. > >> This patch include fixes to follow Squid Coding Style but also have >> a fix for a small bug: In the previous patches I posted the pid >> creation done by master process but the pid file was not removed by >> master process. This patch fixes master process to remove pid file >> on exit. > > > +1. Amos, This patch went into trunk (r13867 and r13868). Do you plan on merging it (together with the pending "start workers as root" follow up) into v3.5 branch? It is your call, of course, but it is essentially a bug fix (SMP Squid does not restart reliably on busy servers without these changes). We can provide a v3.5-specific [combined] patch if that will make a positive decision easier for you. Thank you, Alex. ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] Moved PID file management from Coordinator to Master
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 22/01/2015 12:57 a.m., Tsantilas Christos wrote: > > I am posting a new patch. > > This patch include fixes to follow Squid Coding Style but also have > a fix for a small bug: In the previous patches I posted the pid > creation done by master process but the pid file was not removed by > master process. This patch fixes master process to remove pid file > on exit. > +1. Amos -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.22 (MingW32) iQEcBAEBAgAGBQJUv5WJAAoJELJo5wb/XPRjT3gIALl6BodMJe1ejq3JxYujzMNd 7T2Ktw27WH/uYoo0a5xuQtivpkHoSKdenHkhLer6rSoogastIC9Zb34crKfq3jdF HR2UYJyd/SYt55+3K9v+NQXnpE0nq4ZKrSIDwpYW6+s01/x1kRX/g/VorKkpd736 PIR/NhDKuHFUtY3ZzkpNZvNlsTd7ocRpbYbCqm2mWo624rQ+pipXsOeR7/l4DRhm qOYzsG2F4IzkUM+P9qX5GjG0ag1w56FUIJWbFyLOzACqnL9/XsIFd9607oFdvJyD hgViW6mHMCh3/nu9t9Mc2SOMO0Q2tfKlfyAwsfEJoai1RA9/8zAWmQqw1LSTBcg= =Wc8J -END PGP SIGNATURE- ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] Moved PID file management from Coordinator to Master
I am posting a new patch. This patch include fixes to follow Squid Coding Style but also have a fix for a small bug: In the previous patches I posted the pid creation done by master process but the pid file was not removed by master process. This patch fixes master process to remove pid file on exit. On 01/21/2015 12:17 PM, Amos Jeffries wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 21/01/2015 10:57 p.m., Tsantilas Christos wrote: On 01/20/2015 02:55 AM, Alex Rousskov wrote: On 01/16/2015 08:51 AM, Amos Jeffries wrote: On 16/01/2015 11:29 a.m., Alex Rousskov wrote: In SMP, there is only one Coordinator process, created by the Master process. All SMP kids (Coordinator, workers, and diskers) are started by the Master process. There are no multiple levels as far as kid startup and waiting are concerned and, hence, there is no "level deeper than the master can see". Hmm, okay. Then I have no problem per-se to this change of esponsibility. Great, thank you. I do still think the coordinator needs to remain active until last out of the kids though, so they can still use it to coordinate during their shutdowns. Having it be the first up and last down would solve a few architectural problems where kids need to to collaborate on things, like log rotations or broadcasting their availability. Agreed! This patch does not prevent coordinator process exit before the workers, but also current squid does not guarantees that the coordinator will exit after workers. I agree that we need to implement it, but looks that this is out of the scope of this patch. If there is not any objection I will apply this patch as is for now. Okay, lets give it a try. Can you please though make sure all the new functions use the Squid coding style instead of the one you keep slipping in. Squid style is this in .cc files: type functionName(...) { ... code ... } Cheers Amos Moved PID file management from Coordinator to Master. This move is the first step necessary to avoid the following race condition among PID file deletion and shared segment creation/destruction in SMP Squid: O1) The old Squid Coordinator removes its PID file and quits. N1) The system script notices Coordinator death and starts the new Squid. N2) Shared segments are created by the new Master process. O2) Shared segments are removed by the old Master process. N3) New worker/disker processes fail due to missing segments. TODO: The second step (not a part of this change) is to delete shared memory segments before PID file is deleted (all in the Master process after this change). Now the Master process receives signals and is responsible for forwarding them to the kids. The kids does not install default signal handler for shudown signals (SIGINT, SIGTERM) after a signal received. If a second shutdown signal is received then squid imediatelly terminates the event loop and exits. When the "kill-parent-hack" is enabled the kids are sending the kill signal to master process and master process forward it to other kids too. Also a small regression added: The PID file can no longer be renamed using hot reconfiguration. A full Squid restart is now required for that. This is a Measurement Factory project. === modified file 'src/ipc/Kid.cc' --- src/ipc/Kid.cc 2015-01-13 07:25:36 + +++ src/ipc/Kid.cc 2015-01-21 11:47:29 + @@ -33,41 +33,42 @@ badFailures(0), pid(-1), startTime(0), isRunning(false), status(0) { } /// called when this kid got started, records PID void Kid::start(pid_t cpid) { assert(!running()); assert(cpid > 0); isRunning = true; pid = cpid; time(&startTime); } /// called when kid terminates, sets exiting status -void Kid::stop(status_type theExitStatus) +void +Kid::stop(PidStatus const theExitStatus) { assert(running()); assert(startTime != 0); isRunning = false; time_t stop_time; time(&stop_time); if ((stop_time - startTime) < fastFailureTimeLimit) ++badFailures; else badFailures = 0; // the failures are not "frequent" [any more] status = theExitStatus; } /// returns true if tracking of kid is stopped bool Kid::running() const { return isRunning; === modified file 'src/ipc/Kid.h' --- src/ipc/Kid.h 2015-01-13 07:25:36 + +++ src/ipc/Kid.h 2015-01-13 16:35:58 + @@ -1,60 +1,56 @@ /* * Copyright (C) 1996-2015 The Squid Software Foundation and contributors * * Squid software is distributed under GPLv2+ license and includes * contributions from numerous individuals and organizations. * Please see the COPYING and CONTRIBUTORS files for details. */ #ifndef SQUID_IPC_KID_H #define SQUID_IPC_KID_H #include "SquidString.h" +#include "tools.h" /// Squid child, including current forked process info and /// info persistent across restarts class Kid { public: -#if _SQUID_NEXT_ -typedef union wait status_type; -#else -
Re: [squid-dev] Moved PID file management from Coordinator to Master
On 01/21/2015 12:17 PM, Amos Jeffries wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 21/01/2015 10:57 p.m., Tsantilas Christos wrote: On 01/20/2015 02:55 AM, Alex Rousskov wrote: On 01/16/2015 08:51 AM, Amos Jeffries wrote: On 16/01/2015 11:29 a.m., Alex Rousskov wrote: In SMP, there is only one Coordinator process, created by the Master process. All SMP kids (Coordinator, workers, and diskers) are started by the Master process. There are no multiple levels as far as kid startup and waiting are concerned and, hence, there is no "level deeper than the master can see". Hmm, okay. Then I have no problem per-se to this change of esponsibility. Great, thank you. I do still think the coordinator needs to remain active until last out of the kids though, so they can still use it to coordinate during their shutdowns. Having it be the first up and last down would solve a few architectural problems where kids need to to collaborate on things, like log rotations or broadcasting their availability. Agreed! This patch does not prevent coordinator process exit before the workers, but also current squid does not guarantees that the coordinator will exit after workers. I agree that we need to implement it, but looks that this is out of the scope of this patch. If there is not any objection I will apply this patch as is for now. Okay, lets give it a try. Can you please though make sure all the new functions use the Squid coding style instead of the one you keep slipping in. Squid style is this in .cc files: type functionName(...) { ... code ... } Hmm... I found 1-2 cases... I will fix while applying to trunk Cheers Amos -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.22 (MingW32) iQEcBAEBAgAGBQJUv3zOAAoJELJo5wb/XPRjJtMIAJV3sKXDHeOq+E3HqVGjrDFc ZlJt7k76LIHCGZmfkh+1fPv4wuUtWKVzW2TfHMGP1HI2bqzq6hBTKe+uVmqi3Xzm yXWHSymbXJKlWW7/lBLY7innpNjLyiE3Jv46gto1I6R79eXipiVYUehSVhx0FGL6 yB6yli6RTLEqxZlhI/tCHvfVh1Y0Wp3r0+yJJmW4POiF6S9PSnFJE+PzlEsWJ5L1 s6KuMlQ90jHAhLOsfJWrTE1bo/xJskVKkOT6KTTO7DxK0AyFxUvxsXVwT8I1h8oo 8qIHVjPEOYFuoejowH6+rn0nDDcFNyt47O/POIl7KJ1SvO1Vkx6lfDTPmvz7uMA= =L5r1 -END PGP SIGNATURE- ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] Moved PID file management from Coordinator to Master
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 21/01/2015 10:57 p.m., Tsantilas Christos wrote: > On 01/20/2015 02:55 AM, Alex Rousskov wrote: >> On 01/16/2015 08:51 AM, Amos Jeffries wrote: >>> On 16/01/2015 11:29 a.m., Alex Rousskov wrote: In SMP, there is only one Coordinator process, created by the Master process. >> All SMP kids (Coordinator, workers, and diskers) are started by the Master process. There are no multiple levels as far as kid startup and waiting are concerned and, hence, there is no "level deeper than the master can see". >> >> >>> Hmm, okay. Then I have no problem per-se to this change of >>> esponsibility. >> >> Great, thank you. >> >> >>> I do still think the coordinator needs to remain active until >>> last out of the kids though, so they can still use it to >>> coordinate during their shutdowns. Having it be the first up >>> and last down would solve a few architectural problems where >>> kids need to to collaborate on things, like log rotations or >>> broadcasting their availability. >> >> Agreed! > > > This patch does not prevent coordinator process exit before the > workers, but also current squid does not guarantees that the > coordinator will exit after workers. I agree that we need to > implement it, but looks that this is out of the scope of this > patch. > > If there is not any objection I will apply this patch as is for > now. > Okay, lets give it a try. Can you please though make sure all the new functions use the Squid coding style instead of the one you keep slipping in. Squid style is this in .cc files: type functionName(...) { ... code ... } Cheers Amos -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.22 (MingW32) iQEcBAEBAgAGBQJUv3zOAAoJELJo5wb/XPRjJtMIAJV3sKXDHeOq+E3HqVGjrDFc ZlJt7k76LIHCGZmfkh+1fPv4wuUtWKVzW2TfHMGP1HI2bqzq6hBTKe+uVmqi3Xzm yXWHSymbXJKlWW7/lBLY7innpNjLyiE3Jv46gto1I6R79eXipiVYUehSVhx0FGL6 yB6yli6RTLEqxZlhI/tCHvfVh1Y0Wp3r0+yJJmW4POiF6S9PSnFJE+PzlEsWJ5L1 s6KuMlQ90jHAhLOsfJWrTE1bo/xJskVKkOT6KTTO7DxK0AyFxUvxsXVwT8I1h8oo 8qIHVjPEOYFuoejowH6+rn0nDDcFNyt47O/POIl7KJ1SvO1Vkx6lfDTPmvz7uMA= =L5r1 -END PGP SIGNATURE- ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] Moved PID file management from Coordinator to Master
On 01/20/2015 02:55 AM, Alex Rousskov wrote: On 01/16/2015 08:51 AM, Amos Jeffries wrote: On 16/01/2015 11:29 a.m., Alex Rousskov wrote: In SMP, there is only one Coordinator process, created by the Master process. All SMP kids (Coordinator, workers, and diskers) are started by the Master process. There are no multiple levels as far as kid startup and waiting are concerned and, hence, there is no "level deeper than the master can see". Hmm, okay. Then I have no problem per-se to this change of esponsibility. Great, thank you. I do still think the coordinator needs to remain active until last out of the kids though, so they can still use it to coordinate during their shutdowns. Having it be the first up and last down would solve a few architectural problems where kids need to to collaborate on things, like log rotations or broadcasting their availability. Agreed! This patch does not prevent coordinator process exit before the workers, but also current squid does not guarantees that the coordinator will exit after workers. I agree that we need to implement it, but looks that this is out of the scope of this patch. If there is not any objection I will apply this patch as is for now. Alex. ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] Moved PID file management from Coordinator to Master
On 01/16/2015 08:51 AM, Amos Jeffries wrote: > On 16/01/2015 11:29 a.m., Alex Rousskov wrote: >> In SMP, there is only one Coordinator process, created by the >> Master process. >> All SMP kids (Coordinator, workers, and diskers) are started by >> the Master process. There are no multiple levels as far as kid >> startup and waiting are concerned and, hence, there is no "level >> deeper than the master can see". > Hmm, okay. Then I have no problem per-se to this change of esponsibility. Great, thank you. > I do still think the coordinator needs to remain active until last out > of the kids though, so they can still use it to coordinate during > their shutdowns. Having it be the first up and last down would solve a > few architectural problems where kids need to to collaborate on > things, like log rotations or broadcasting their availability. Agreed! Alex. ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] Moved PID file management from Coordinator to Master
On 01/16/2015 05:51 PM, Amos Jeffries wrote: Hmm, okay. Then I have no problem per-se to this change of esponsibility. I do still think the coordinator needs to remain active until last out of the kids though, so they can still use it to coordinate during their shutdowns. Having it be the first up and last down would solve a few architectural problems where kids need to to collaborate on things, like log rotations or broadcasting their availability. I am not sure we are able to build something simple if we try to implement it. The easier way is to delay sending the kill signal to coordinator until all other kids has exited. The coordinator process in this case should exit immediately after the kill signal received. Or we can just delay sending kill signal to coordinator process for 1 second. This is will give it enough time to exit later that the other kids. The type of kids is not known by the master process. But I suppose we can find it using the kid name. Any ideas? Amos ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] Moved PID file management from Coordinator to Master
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 16/01/2015 11:29 a.m., Alex Rousskov wrote: > On 01/14/2015 03:09 AM, Amos Jeffries wrote: >>> On 01/14/2015 11:25 AM, Amos Jeffries wrote: Does the master process get exit status of *all* worker processes and the sub-childs down N levels? It was my understanding that in SMP each worker disker etc is a fork() and the child becomes new coordinator. > > > Hi Amos, > > In SMP, there is only one Coordinator process, created by the > Master process. > > >> I suspect we will find that some diskers etc are in fact spawned >> by either coordinator or a worker and one level deeper than the >> master can see. > > > All SMP kids (Coordinator, workers, and diskers) are started by > the Master process. There are no multiple levels as far as kid > startup and waiting are concerned and, hence, there is no "level > deeper than the master can see". > > Needless to say, there are processes that are not SMP kids. For > example, all helpers do not run Squid code and, hence, they are not > kids and are not affected by the proposed fix. > > > HTH, > > Alex. > Hmm, okay. Then I have no problem per-se to this change of esponsibility. I do still think the coordinator needs to remain active until last out of the kids though, so they can still use it to coordinate during their shutdowns. Having it be the first up and last down would solve a few architectural problems where kids need to to collaborate on things, like log rotations or broadcasting their availability. Amos -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.22 (MingW32) iQEcBAEBAgAGBQJUuTOSAAoJELJo5wb/XPRj76UIAJk+f4O4aiIp4Hmw7bZpD5zV 3MGcai6k7kTG8/4ncvnsZCT9lG1shgA41JeNpoUzCMvo9kfRJ7wHYpwXZT0UNeEW mwE2TZXjwt1cWewyfwnb24Pph5vCBEhI7wqgANH3/+duZkAYzHuEl09G62LEpAR3 KBrmAJUP468brWXMLdDAJAUNo4c1e5U74vqI2k2Dnn25NjCxg1yLLx3RB4oDNGqI WbLAR8hR4ni4gl5jkJwdFb9yFgROaciQxVrLchj/ub73j2yoPF4hBgwwx2hola+f rJDRDiQugVbTlDl4jyv3QjRc3NLMA27qZKV0DCIxEIYZ7F+3niG4VVTo6mQhgEo= =Naou -END PGP SIGNATURE- ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] Moved PID file management from Coordinator to Master
On 01/14/2015 03:09 AM, Amos Jeffries wrote: >> On 01/14/2015 11:25 AM, Amos Jeffries wrote: >>> Does the master process get exit status of *all* worker processes >>> and the sub-childs down N levels? It was my understanding that in >>> SMP each worker disker etc is a fork() and the child becomes new >>> coordinator. Hi Amos, In SMP, there is only one Coordinator process, created by the Master process. > I suspect we will find that some diskers etc are in fact spawned by > either coordinator or a worker and one level deeper than the master > can see. All SMP kids (Coordinator, workers, and diskers) are started by the Master process. There are no multiple levels as far as kid startup and waiting are concerned and, hence, there is no "level deeper than the master can see". Needless to say, there are processes that are not SMP kids. For example, all helpers do not run Squid code and, hence, they are not kids and are not affected by the proposed fix. HTH, Alex. ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] Moved PID file management from Coordinator to Master
---END PGP SIGNATURE- ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev Moved PID file management from Coordinator to Master. This move is the first step necessary to avoid the following race condition among PID file deletion and shared segment creation/destruction in SMP Squid: O1) The old Squid Coordinator removes its PID file and quits. N1) The system script notices Coordinator death and starts the new Squid. N2) Shared segments are created by the new Master process. O2) Shared segments are removed by the old Master process. N3) New worker/disker processes fail due to missing segments. TODO: The second step (not a part of this change) is to delete shared memory segments before PID file is deleted (all in the Master process after this change). Now the Master process receives signals and is responsible for forwarding them to the kids. The kids does not install default signal handler for shudown signals (SIGINT, SIGTERM) after a signal received. If a second shutdown signal is received then squid imediatelly terminates the event loop and exits. When the "kill-parent-hack" is enabled the kids are sending the kill signal to master process and master process forward it to other kids too. Also a small regression added: The PID file can no longer be renamed using hot reconfiguration. A full Squid restart is now required for that. This is a Measurement Factory project. === modified file 'src/ipc/Kid.cc' --- src/ipc/Kid.cc 2015-01-13 07:25:36 + +++ src/ipc/Kid.cc 2015-01-13 16:35:58 + @@ -33,41 +33,41 @@ badFailures(0), pid(-1), startTime(0), isRunning(false), status(0) { } /// called when this kid got started, records PID void Kid::start(pid_t cpid) { assert(!running()); assert(cpid > 0); isRunning = true; pid = cpid; time(&startTime); } /// called when kid terminates, sets exiting status -void Kid::stop(status_type theExitStatus) +void Kid::stop(PidStatus const theExitStatus) { assert(running()); assert(startTime != 0); isRunning = false; time_t stop_time; time(&stop_time); if ((stop_time - startTime) < fastFailureTimeLimit) ++badFailures; else badFailures = 0; // the failures are not "frequent" [any more] status = theExitStatus; } /// returns true if tracking of kid is stopped bool Kid::running() const { return isRunning; === modified file 'src/ipc/Kid.h' --- src/ipc/Kid.h 2015-01-13 07:25:36 + +++ src/ipc/Kid.h 2015-01-13 16:35:58 + @@ -1,60 +1,56 @@ /* * Copyright (C) 1996-2015 The Squid Software Foundation and contributors * * Squid software is distributed under GPLv2+ license and includes * contributions from numerous individuals and organizations. * Please see the COPYING and CONTRIBUTORS files for details. */ #ifndef SQUID_IPC_KID_H #define SQUID_IPC_KID_H #include "SquidString.h" +#include "tools.h" /// Squid child, including current forked process info and /// info persistent across restarts class Kid { public: -#if _SQUID_NEXT_ -typedef union wait status_type; -#else -typedef int status_type; -#endif /// keep restarting until the number of bad failures exceed this limit enum { badFailureLimit = 4 }; /// slower start failures are not "frequent enough" to be counted as "bad" enum { fastFailureTimeLimit = 10 }; // seconds public: Kid(); Kid(const String& kid_name); /// called when this kid got started, records PID void start(pid_t cpid); /// called when kid terminates, sets exiting status -void stop(status_type exitStatus); +void stop(PidStatus const exitStatus); /// returns true if tracking of kid is stopped bool running() const; /// returns true if master should restart this kid bool shouldRestart() const; /// returns current pid for a running kid and last pid for a stopped kid pid_t getPid() const; /// whether the failures are "repeated and frequent" bool hopeless() const; /// returns true if the process terminated normally bool calledExit() const; /// returns the exit status of the process int exitStatus() const; /// whether the process exited with a given exit status code @@ -67,39 +63,39 @@ bool signaled() const; /// returns the number of the signal that caused the kid to terminate int termSignal() const; /// whether the process was terminated by a given signal bool signaled(int sgnl) const; /// returns kid name const String& name() const; private: // Information preserved across restarts String theName; ///< process name int badFailures; ///< number of "repeated fr
Re: [squid-dev] Moved PID file management from Coordinator to Master
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 14/01/2015 10:54 p.m., Tsantilas Christos wrote: > On 01/14/2015 11:25 AM, Amos Jeffries wrote: >> On 14/01/2015 7:37 a.m., Tsantilas Christos wrote: >>> On 01/12/2015 07:22 PM, Amos Jeffries wrote: On 12/01/2015 >>> 6:02 > ... >>> The Master process has no way to know if the workers are >>> exiting early with no clients, or aborting on worker-specific >>> shutdown_timeout values. But the coordinator can receive a >>> terminated message from them over SMP sockets. >>> We can use exit status. >> >> Does the master process get exit status of *all* worker processes >> and the sub-childs down N levels? It was my understanding that in >> SMP each worker disker etc is a fork() and the child becomes new >> coordinator. >> > > Currently master process does not check exist status for any kid > process and from what I know it can not get exit status down N > levels. But it can get exit status of processes forked from master > process, the coordinator and workers. This is should be enough. Why > do we need to get exit status for sub-childs? If a worker process > exited normally means that the sub-childs exited normally or > without huge problems. Do you worry for still-running sub-childs > after a worker process exited? Is it possible? I suspect we will find that some diskers etc are in fact spawned by either coordinator or a worker and one level deeper than the master can see. Amos -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.22 (MingW32) iQEcBAEBAgAGBQJUtkBIAAoJELJo5wb/XPRj6hIIAIvq9pA7/CdXz7ssgvlJhyer P42VIFRVOV+0FOaUsQCGRN5tU/lhpdAcjDZdOZrodEkds3RGWm2lu+AbWL4K0BV/ tgiEF7c4wtKeW7AbwSdwnXCozWZ13vT1nuVA0Mw2inmJvgn6U8CjarN+nF+aWYIY O6QYxcwWcfILaVmHHJFd2k9RWEMc+qDGC1tFosZK6hCu/mRs2DQu0nsLPKoA1fQ6 WGdLXM1ePEItSAQrHBw3uv2RG6vAYBuYUl3LHkVYkqCc269cFeKxbldf52MLiS5p VrhYRGHPREAHcNJAwmnpm4YbEAO1g2RwsWU9mHPZP/YYA1kGaYV4LTZINh6s5HY= =uMno -END PGP SIGNATURE- ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] Moved PID file management from Coordinator to Master
On 01/14/2015 11:25 AM, Amos Jeffries wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 14/01/2015 7:37 a.m., Tsantilas Christos wrote: On 01/12/2015 07:22 PM, Amos Jeffries wrote: On 12/01/2015 6:02 ... The Master process has no way to know if the workers are exiting early with no clients, or aborting on worker-specific shutdown_timeout values. But the coordinator can receive a terminated message from them over SMP sockets. We can use exit status. Does the master process get exit status of *all* worker processes and the sub-childs down N levels? It was my understanding that in SMP each worker disker etc is a fork() and the child becomes new coordinator. Currently master process does not check exist status for any kid process and from what I know it can not get exit status down N levels. But it can get exit status of processes forked from master process, the coordinator and workers. This is should be enough. Why do we need to get exit status for sub-childs? If a worker process exited normally means that the sub-childs exited normally or without huge problems. Do you worry for still-running sub-childs after a worker process exited? Is it possible? Amos ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] Moved PID file management from Coordinator to Master
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 14/01/2015 7:37 a.m., Tsantilas Christos wrote: > On 01/12/2015 07:22 PM, Amos Jeffries wrote: On 12/01/2015 6:02 > a.m., Tsantilas Christos wrote: Hi all, this patch moves pid file managment from coordinator process to master process. This move is the first step necessary to avoid the following race condition among PID file deletion and shared segment creation/destruction in SMP Squid: O1) The old Squid Coordinator removes its PID file and quits. N1) The system script notices Coordinator death and starts the new Squid. N2) Shared segments are created by the new Master process. O2) Shared segments are removed by the old Master process. N3) New worker/disker processes fail due to missing segments. > > The Coordinator needs to continue coordinating activities over the > SMP sockets until the workers are all shutdown and SMP sockets > closed, only then should it do O2 and O1 (in that order). > > The planned behaviour for worker shutdown is to: W1) early client > FD closures into the beginning of the shutdown_timeout period W2) > on each client closure or connection going idle, close it W3) at > end of shutdown_timeout OR last client disconnect, release all > resources. > > In that design the AsyncEngine still runs right up until the queue > completes draining. Using SMP sockets to inform Coordinator about > clean shutdown at the end. > >> My sense is that the exit status can provide the same >> functionality and also is easier to be implemented. If the worker >> aborted early by a segfault then I am doubt that it will be able >> to send a message to an SMP socket. > > The Master process has no way to know if the workers are exiting > early with no clients, or aborting on worker-specific > shutdown_timeout values. But the coordinator can receive a > terminated message from them over SMP sockets. > >> We can use exit status. Does the master process get exit status of *all* worker processes and the sub-childs down N levels? It was my understanding that in SMP each worker disker etc is a fork() and the child becomes new coordinator. Amos -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.22 (MingW32) iQEcBAEBAgAGBQJUtjYjAAoJELJo5wb/XPRjWN0IAJUR13a2a45DT/XKqlYVYXGb BoqPtKRUQQQsSrAqxJVrl59UXgY+yB38h7QWUrdBUV/+QaZvEhjMwVSLsL8qHbrM bMd0HQLa5G8W9nuKQnuI+3iwUQj0F9+7Z45yGKFimJRi7sDGRkHH5dTl/zJqNLgu oYJYZ3nB6QYsLV15SVf/cpBxrhELKZCfvb5Fcoy8LZojUfK9O0gXmdmBIJJtJyoa 1u9TbKsxtAdJOiUuhGcruLApj2bHoQL6IwEm9soV2IYOUxCamFKyaz4BqmUzmp05 txaYvQ3GbMc9k5IGGuXZHNiEVyNjUSg/97qXL150lYpKu7IYCJXqCj4PqwMu7NU= =qDv3 -END PGP SIGNATURE- ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] Moved PID file management from Coordinator to Master
On 01/12/2015 07:22 PM, Amos Jeffries wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 12/01/2015 6:02 a.m., Tsantilas Christos wrote: Hi all, this patch moves pid file managment from coordinator process to master process. This move is the first step necessary to avoid the following race condition among PID file deletion and shared segment creation/destruction in SMP Squid: O1) The old Squid Coordinator removes its PID file and quits. N1) The system script notices Coordinator death and starts the new Squid. N2) Shared segments are created by the new Master process. O2) Shared segments are removed by the old Master process. N3) New worker/disker processes fail due to missing segments. The Coordinator needs to continue coordinating activities over the SMP sockets until the workers are all shutdown and SMP sockets closed, only then should it do O2 and O1 (in that order). The planned behaviour for worker shutdown is to: W1) early client FD closures into the beginning of the shutdown_timeout period W2) on each client closure or connection going idle, close it W3) at end of shutdown_timeout OR last client disconnect, release all resources. In that design the AsyncEngine still runs right up until the queue completes draining. Using SMP sockets to inform Coordinator about clean shutdown at the end. My sense is that the exit status can provide the same functionality and also is easier to be implemented. If the worker aborted early by a segfault then I am doubt that it will be able to send a message to an SMP socket. The Master process has no way to know if the workers are exiting early with no clients, or aborting on worker-specific shutdown_timeout values. But the coordinator can receive a terminated message from them over SMP sockets. We can use exit status. TODO: The second step (not a part of this change) is to delete shared memory segments before PID file is deleted (all in the Master process after this change). Now the Master process receives signals and is responsible for forwarding them to the kids. The command line control process also used manually for the -k options to send signals also thinks of itself as Master. How does this new closing of SMP sockets interact with that other meaning of Master process? The master process is the simplest squid process. I believe that it is the best process for doing the cleanup. Please for more informations read the patch preamble. This is a Measurement Factory project Some extra notes/ideas -- 1) Multiple shutdown signals received by squid In current squid when coordinator received a shutdown signal, then replaced shutdown signal handlers with the default handlers. This is has as result when a second shutdown signal received then the coordinator process died immediately, without forwarding shutdown signal to kids. The shutdown of the other kids are finished as normal. This patch when master process receives a shutdown signal forward it to kids and master process is ready to receive a second shutdown signal. When a second shutdown signal received to master and this forwarded to kids then the kids died immediately. Plan was to pass the signal to workers again where they kick off their own shutdown_timeout event handlers immediately instead of hard killing workers. So do you believe that the workers should not restore default handlers for shutdown signals. Am I correct? It is easy to be implemented, it already implemented for "kill-parent-hack" where the master process is constrained to send multiple kill signals to kids. FWIW: Ubuntu Gentoo, and RHEL people are enjoying their patches that just ignore the repeated signals. 2) The system admin shows a blocked kid (infinity loop or not responding). He kill with the hand. Current squid does not restart the kids killed by a TERM or KILL signal (squid considers it as normal kid shutdown). This patch does not change this behaviour. The admin is still able to kill with a "kill -11" and in this case the kid will restarted. My opinion is that squid should restart kids in these cases. Should not restart a kid only when a shutdown requested from system admin, or when the kids dying very fast (hopeless()==true ). TERM and KILL received by the workers often *are* signals sent by the system admin, or scripts on their behalf. That may decrease in popularity though when we fix the normal shutdown process issues. For a while longer we have to take the current reality. ok. In related topics, I have been trying to figure out a --foreground command line option that operates like -N but does not disable SMP, just makes Coordinator == Master. But understanding the SMP complexities have been blocking me so far. Are you able and interested in taking that forward? I do not know :-) I must ask Alex for this. Regards, Christos Amos -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.22 (MingW32) iQEcBAEBAgAGBQJUtALXAAoJELJo
Re: [squid-dev] Moved PID file management from Coordinator to Master
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 12/01/2015 6:02 a.m., Tsantilas Christos wrote: > Hi all, this patch moves pid file managment from coordinator > process to master process. > > This move is the first step necessary to avoid the following race > condition among PID file deletion and shared segment > creation/destruction in SMP Squid: > > O1) The old Squid Coordinator removes its PID file and quits. N1) > The system script notices Coordinator death and starts the new > Squid. N2) Shared segments are created by the new Master process. > O2) Shared segments are removed by the old Master process. N3) New > worker/disker processes fail due to missing segments. > The Coordinator needs to continue coordinating activities over the SMP sockets until the workers are all shutdown and SMP sockets closed, only then should it do O2 and O1 (in that order). The planned behaviour for worker shutdown is to: W1) early client FD closures into the beginning of the shutdown_timeout period W2) on each client closure or connection going idle, close it W3) at end of shutdown_timeout OR last client disconnect, release all resources. In that design the AsyncEngine still runs right up until the queue completes draining. Using SMP sockets to inform Coordinator about clean shutdown at the end. The Master process has no way to know if the workers are exiting early with no clients, or aborting on worker-specific shutdown_timeout values. But the coordinator can receive a terminated message from them over SMP sockets. > TODO: The second step (not a part of this change) is to delete > shared memory segments before PID file is deleted (all in the > Master process after this change). > > Now the Master process receives signals and is responsible for > forwarding them to the kids. The command line control process also used manually for the -k options to send signals also thinks of itself as Master. How does this new closing of SMP sockets interact with that other meaning of Master process? > > Please for more informations read the patch preamble. > > This is a Measurement Factory project > > > Some extra notes/ideas -- > > 1) Multiple shutdown signals received by squid > > In current squid when coordinator received a shutdown signal, then > replaced shutdown signal handlers with the default handlers. This > is has as result when a second shutdown signal received then the > coordinator process died immediately, without forwarding shutdown > signal to kids. The shutdown of the other kids are finished as > normal. > > This patch when master process receives a shutdown signal forward > it to kids and master process is ready to receive a second shutdown > signal. When a second shutdown signal received to master and this > forwarded to kids then the kids died immediately. Plan was to pass the signal to workers again where they kick off their own shutdown_timeout event handlers immediately instead of hard killing workers. FWIW: Ubuntu Gentoo, and RHEL people are enjoying their patches that just ignore the repeated signals. > > 2) The system admin shows a blocked kid (infinity loop or not > responding). He kill with the hand. > > Current squid does not restart the kids killed by a TERM or KILL > signal (squid considers it as normal kid shutdown). This patch does > not change this behaviour. The admin is still able to kill with a > "kill -11" and in this case the kid will restarted. > > My opinion is that squid should restart kids in these cases. Should > not restart a kid only when a shutdown requested from system admin, > or when the kids dying very fast (hopeless()==true ). TERM and KILL received by the workers often *are* signals sent by the system admin, or scripts on their behalf. That may decrease in popularity though when we fix the normal shutdown process issues. For a while longer we have to take the current reality. In related topics, I have been trying to figure out a --foreground command line option that operates like -N but does not disable SMP, just makes Coordinator == Master. But understanding the SMP complexities have been blocking me so far. Are you able and interested in taking that forward? Amos -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.22 (MingW32) iQEcBAEBAgAGBQJUtALXAAoJELJo5wb/XPRjwKsIAMPzuOaxvC7WpBHOpQZpG1IZ 1tgbtosaJu3JweE7At729HLL34mR+YagaJbTz4xF6c2mkpLxxYioT6IzSxKc6YCD mYJr8WU8uuJVI662u7w+3UyLVLI+c3vIwrw8d8NDZaKyAkOIn//Xks9YIG7h+xse ooK/AAhMaADiS5S1FqY9OM3Q5Pn0nI3R91EpzGIeL1U5bG+43GYiOic3YSKgxSzq 8Q3YemiLj7ex00ZBtCbQ955bB8Zz1Q9I8hWgXdAFHgQKrjNmjdUDHqEg5M6E33zf Gwpr6M3bO1gbtp7ize9vX7YxIlUjK6TUsbOFPlt9QJYEzzVxoqcgzy0lavVEiXE= =5Qeg -END PGP SIGNATURE- ___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
[squid-dev] Moved PID file management from Coordinator to Master
Hi all, this patch moves pid file managment from coordinator process to master process. This move is the first step necessary to avoid the following race condition among PID file deletion and shared segment creation/destruction in SMP Squid: O1) The old Squid Coordinator removes its PID file and quits. N1) The system script notices Coordinator death and starts the new Squid. N2) Shared segments are created by the new Master process. O2) Shared segments are removed by the old Master process. N3) New worker/disker processes fail due to missing segments. TODO: The second step (not a part of this change) is to delete shared memory segments before PID file is deleted (all in the Master process after this change). Now the Master process receives signals and is responsible for forwarding them to the kids. Please for more informations read the patch preamble. This is a Measurement Factory project Some extra notes/ideas -- 1) Multiple shutdown signals received by squid In current squid when coordinator received a shutdown signal, then replaced shutdown signal handlers with the default handlers. This is has as result when a second shutdown signal received then the coordinator process died immediately, without forwarding shutdown signal to kids. The shutdown of the other kids are finished as normal. This patch when master process receives a shutdown signal forward it to kids and master process is ready to receive a second shutdown signal. When a second shutdown signal received to master and this forwarded to kids then the kids died immediately. 2) The system admin shows a blocked kid (infinity loop or not responding). He kill with the hand. Current squid does not restart the kids killed by a TERM or KILL signal (squid considers it as normal kid shutdown). This patch does not change this behaviour. The admin is still able to kill with a "kill -11" and in this case the kid will restarted. My opinion is that squid should restart kids in these cases. Should not restart a kid only when a shutdown requested from system admin, or when the kids dying very fast (hopeless()==true ). Regards, Christos Moved PID file management from Coordinator to Master. This move is the first step necessary to avoid the following race condition among PID file deletion and shared segment creation/destruction in SMP Squid: O1) The old Squid Coordinator removes its PID file and quits. N1) The system script notices Coordinator death and starts the new Squid. N2) Shared segments are created by the new Master process. O2) Shared segments are removed by the old Master process. N3) New worker/disker processes fail due to missing segments. TODO: The second step (not a part of this change) is to delete shared memory segments before PID file is deleted (all in the Master process after this change). Now the Master process receives signals and is responsible for forwarding them to the kids. When the "kill-parent-hack" is enabled the kids are sending the kill signal to master process and master process forward it to other kids too. In this case the kids does not install default signal handler for TERM and KILL signals after receives a shutdown signal. Also a small regression added: The PID file can no longer be renamed using hot reconfiguration. A full Squid restart is now required for that. This is a Measurement Factory project. === modified file 'src/ipc/Kid.cc' --- src/ipc/Kid.cc 2014-12-20 12:12:02 + +++ src/ipc/Kid.cc 2015-01-08 15:26:48 + @@ -33,41 +33,41 @@ badFailures(0), pid(-1), startTime(0), isRunning(false), status(0) { } /// called when this kid got started, records PID void Kid::start(pid_t cpid) { assert(!running()); assert(cpid > 0); isRunning = true; pid = cpid; time(&startTime); } /// called when kid terminates, sets exiting status -void Kid::stop(status_type theExitStatus) +void Kid::stop(PidStatus const theExitStatus) { assert(running()); assert(startTime != 0); isRunning = false; time_t stop_time; time(&stop_time); if ((stop_time - startTime) < fastFailureTimeLimit) ++badFailures; else badFailures = 0; // the failures are not "frequent" [any more] status = theExitStatus; } /// returns true if tracking of kid is stopped bool Kid::running() const { return isRunning; === modified file 'src/ipc/Kid.h' --- src/ipc/Kid.h 2014-12-20 12:12:02 + +++ src/ipc/Kid.h 2015-01-08 15:26:32 + @@ -1,60 +1,56 @@ /* * Copyright (C) 1996-2014 The Squid Software Foundation and contributors * * Squid software is distributed under GPLv2+ license and includes * contributions from numerous individuals and organizations. * Please see the COPYING and CONTRIBUTORS files for details. */ #ifndef SQUID_IPC_KID_H #define SQUID_IPC_KID_H #include "SquidString.h" +#include "tools.h" /// Squi