On 22.11.2013 12:49, David Timothy Strauss wrote:
> It is the responsibility of whatever sends the watchdog to ensure
> everything's healthy, however necessary. It would be silly to spawn a
> thread and have it blindly report health to watchdog. The point is for
> that thread to do proper checks but ensure reports go in at the right
> intervals.

I know that, but the *how* is the question.

You can check whatever you like internally, but that does not mean that,
at the end of the day, the service responds correctly to a client
connection over the network. You only know that once you go through the
same stack, which means making a network connection.

I spent hundreds of hours debugging dbmail for upstream to find spinlocks
and other problems that only happen in rare situations. One of them took
16 hours of stress testing with debug logging enabled before it happened,
while on the real server it took a few minutes to be triggered by a random
client action. That's the difference between theory and real workload.

Your internal checks are mostly theory, because in case of a bug you have
undefined behavior, and what you want to achieve with the watchdog is to
catch this undefined behavior and restart the service. In the rare cases
where the watchdog should restart the service, this will likely not work
unless the check went through the complete code path of a client. In the
case of an IMAP server, you can enter the spin loop anywhere from accepting
the connection to listing folders or receiving a message, and it may depend
on a buffer overflow under high concurrency, with different threads
touching each other in an unexpected way.

Been there, nearly died debugging it and capturing data for upstream.

> On Nov 22, 2013 7:50 PM, "Reindl Harald" <h.rei...@thelounge.net> wrote:
>
> On 22.11.2013 03:04, salil GK wrote:
> > Thanks a lot David
> >
> > On 22 November 2013 06:44, David Timothy Strauss
> > <da...@davidstrauss.net> wrote:
> >
> > On Thu, Nov 21, 2013 at 4:57 PM, salil GK <gksa...@gmail.com> wrote:
> > > What happens is - my process may be busy with some other activity,
> > > during which time it will fail to send the periodic message to
> > > systemd. After a while it will come out of its loop and be ready to
> > > serve. But during this time the system would have already marked the
> > > process as failed.
> >
> > Then you need to either use another thread, refactor to make a tighter
> > event loop, or increase the watchdog time. Drifting in and out of
> > tolerance with watchdog is not a safe strategy.
>
> The problem I see with "use another thread" is that this thread can
> happily work and send its keep-alive, but that does not mean at the end
> that the service itself is working OK and responsive, because both are
> running isolated.
>
> In case of network services it would be pretty cool if systemd watchdog
> could be configured to connect to the service every n seconds and, if
> there is no response, restart it, because this would monitor the real
> service without the need for external tools.
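
To make the two positions in this thread concrete, here is a minimal sketch of a keep-alive thread that reports to the watchdog only after it has gone through the network stack like a real client. It is an illustration under assumptions, not something systemd provides: the helper names (`sd_notify`, `service_answers`, `watchdog_loop`), the host, the port, and the expected greeting prefix `* OK` are all hypothetical choices. It relies only on the documented notification protocol (a datagram sent to the socket named in `$NOTIFY_SOCKET`, and the timeout exported in `$WATCHDOG_USEC` when `WatchdogSec=` is set).

```python
import os
import socket
import threading
import time


def sd_notify(message: str) -> bool:
    """Send one datagram to systemd's notify socket.

    Returns False when NOTIFY_SOCKET is unset (not started by systemd).
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True


def service_answers(host: str, port: int, expected: bytes, timeout: float) -> bool:
    """Connect like a client and compare the first bytes of the greeting.

    The whole check (connect plus read) is bounded by `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            data = b""
            while len(data) < len(expected):
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False
                conn.settimeout(remaining)
                chunk = conn.recv(len(expected) - len(data))
                if not chunk:  # peer closed before sending a full greeting
                    return False
                data += chunk
            return data == expected
    except OSError:  # refused, unreachable, or timed out
        return False


def watchdog_loop(host: str, port: int, stop: threading.Event,
                  expected: bytes = b"* OK") -> None:
    """Send WATCHDOG=1 only while the service really answers a connection."""
    usec = int(os.environ.get("WATCHDOG_USEC", "0"))
    if usec <= 0:
        return  # watchdog not enabled for this unit
    interval = usec / 1_000_000 / 2  # report at half the configured timeout
    while not stop.is_set():
        if service_answers(host, port, expected, timeout=interval / 2):
            sd_notify("WATCHDOG=1")
        # On failure nothing is sent, so systemd's own timeout expires.
        stop.wait(interval)
```

A service could start this with something like `threading.Thread(target=watchdog_loop, args=("127.0.0.1", 143, threading.Event()), daemon=True).start()`. The greeting check only exercises accept and the first write; covering deeper code paths such as login or folder listing would need a longer scripted session, and the probe itself adds load and one connection slot to the service.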
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel