Hello Laurent and Hoël,

Thanks so much for the quick replies! I went with your suggested approach of using the timeout-kill file and the finish script, which works great. :) We were concerned that a different approach would be needed if the daemon is killed by the kernel’s OOM killer, but I believe the finish script accounts for this possibility. And for the record, the daemon does terminate its children to the best of its ability during its shutdown ;)

> So an example finish script could be:
>
>#!/bin/sh
>sleep 2
>kill -9 -- -"$4"

I’m not sure this is correct: the finish script is only run once the process is down, so the waiting period is actually determined by the value in the timeout-kill file (or potentially another s6-svc -k command) and not by the sleep in the finish script, as far as I can tell. Also, I don’t think the -- separating options and arguments is needed in kill; it seems to interpret the group id as an argument anyway, as long as I’ve supplied a signal.

Thank you guys so much,
Tom

From: [email protected] <[email protected]> on behalf of Laurent Bercot <[email protected]>
Date: Wednesday, 5. November 2025 at 22:49
To: [email protected] <[email protected]>
Subject: Re: Kill process group after a timeout window

Hi Tom,

>def restart_daemon():
>    # kill demon gracefully
>    system("/opt/s6/bin/s6-svc -T 180000 -wd -d /projectname/service/servicename")
>    # kill demon
>    system("/opt/s6/bin/s6-svc -k /projectname/service/servicename")
>    # restart demon
>    system("/opt/s6/bin/s6-svc -T 180000 -wu -u /projectname/service/servicename")

Note that s6 *can* send a graceful kill first and a violent kill later: look for the timeout-kill file in
https://skarnet.org/software/s6/servicedir.html

However, the kill signal is only sent to the main process, not to the process group, so this feature is only useful when the main process itself doesn't exit gracefully on receipt of its down-signal. Which is not the case here.

>I upgraded s6 to the most recent version 2.13.2.0, which supports the “-K”
>option (for kill the whole process group) and changed the corresponding line
>in the function.
>In order to test the change I replaced the demon with a process that does
>nothing but spawn a child process which does nothing except sleep endlessly.
>When executing the s6-svc commands in sequence the daemon’s child did not,
>however, get killed.

What happened here is that your main process successfully terminated on your initial SIGTERM. The service was then considered down. When you tried s6-svc -K, no signal was sent to the process group, because s6-supervise only sends signals when the service is *up* - otherwise it considers there's nothing to send the signal to! So s6-svc -K had no effect.

>Ideally we could send SIGTERM with a timeout and then send SIGKILL to the whole
>process group in one operation so that the process group is not forgotten. Do
>you have any suggestions?

As Hoël said, the problem here is that your daemon behaves *almost* correctly, with a main process that dies gracefully when told to - but sometimes leaves behind children that don't die. So it's not a signalling problem, but a *cleanup* problem, and indeed, cleaning up is the job of a finish script.

So what you want is to write a ./finish script for your service - s6-supervise will run it as soon as your main process dies. What it needs to do is give children some time to die, then send a SIGKILL to the process group to clean up.

The pgid is given as the 4th argument to the finish script, as documented on (same page)
https://skarnet.org/software/s6/servicedir.html

So an example finish script could be:

#!/bin/sh
sleep 2
kill -9 -- -"$4"

if you want to give 2 seconds for children to exit gracefully. (don't forget the - before "$4", this is what tells the kill command to kill a process group rather than a single process.)

Note that if you want to give them 5 seconds or more, you will need to adjust the authorized lifetime of a finish script via the timeout-finish file. (echo 30000 > timeout-finish to allow a finish script to run for 30 seconds before being killed by s6-supervise.)

And that, normally, should solve your issue.

Mitigation for misbehaving daemons is an area where s6 doesn't shine for its clarity / ease of use, because these are difficult to do portably, and are added as afterthoughts - sorry about that. The process group mitigation was a recent addition, and I'm glad it's going to see some use. The next version of s6 will also have mitigation for another common misbehaviour 🙂

--
 Laurent
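
For reference, the pieces discussed in this thread (a timeout-kill file, a timeout-finish file, and a finish script that kills the process group) could be put in place with something like the sketch below. It is only an illustration: the helper name install_cleanup, its four arguments, and the extra sanity check on "$4" are assumptions made here, not part of s6 or of anyone's actual setup. The file names and the meaning of "$4" are taken from the messages above. Note that timeout-kill bounds how long the main process gets after its down-signal, while the sleep in finish is a separate grace period for children that are still around once the main process is gone.

```shell
#!/bin/sh
# Sketch only: install_cleanup and its parameter names are hypothetical.
# The file names (timeout-kill, timeout-finish, finish) and the use of "$4"
# as the process group id come from the thread above.
install_cleanup() {
    svcdir=$1     # s6 service directory
    kill_ms=$2    # ms the main process gets after the down-signal before SIGKILL
    grace_s=$3    # seconds leftover children get once the main process is dead
    finish_ms=$4  # ms s6-supervise lets the finish script run

    # The finish script must be allowed to outlive its own sleep.
    [ "$finish_ms" -gt $((grace_s * 1000)) ] || return 1

    printf '%s\n' "$kill_ms" > "$svcdir/timeout-kill" || return 1
    printf '%s\n' "$finish_ms" > "$svcdir/timeout-finish" || return 1

    # Unquoted heredoc: $grace_s is filled in now, while \$4 and \$pgid stay
    # literal and are expanded only when s6-supervise runs the finish script.
    cat > "$svcdir/finish" <<EOF || return 1
#!/bin/sh
pgid=\$4
# Assumption: do nothing unless the pgid is a number greater than 1.
case "\$pgid" in
    ''|*[!0-9]*) exit 0 ;;
esac
[ "\$pgid" -gt 1 ] || exit 0
sleep $grace_s
kill -9 -- -"\$pgid" 2>/dev/null
exit 0
EOF
    chmod +x "$svcdir/finish"
}
```

For example, install_cleanup /projectname/service/servicename 180000 2 5000 would give the main process 180 seconds to exit (the value is borrowed from the -T option in restart_daemon purely as an illustration), then give leftover children 2 more seconds before the whole group is SIGKILLed. The helper refuses a grace period that does not fit inside the timeout-finish value, which is the constraint Laurent points out above.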
