Bug#505608: runit: stopped runsv processes not responding to TERM signals
On Mon, Aug 11, 2014 at 06:52:33PM +, Gerrit Pape wrote:
> In your case the log service is still running, and so runsv is still waiting for it. The best solution is to make your log service detect that stdin is closed and exit as there's no more data to log.

Hm. The log service of cereal makes heavy use of shell redirection:

    #!/bin/sh -e
    # The cereal scripts were written by
    # Jameson Graef Rollins jroll...@finestructure.net
    # and
    # Daniel Kahn Gillmor d...@fifthhorseman.net.
    #
    # They are Copyright 2007, and are all released under the GPL, version 3
    # or later.

    exec 2>&1

    SHAREDIR=/usr/share/cereal
    export SHAREDIR
    . $SHAREDIR/common

    LOGUSER=$(cat ../env/LOGUSER)
    LOGGROUP=$(cat ../env/LOGGROUP)

    check_user $LOGUSER
    check_group $LOGGROUP

    exec chpst -u $LOGUSER:$LOGGROUP svlogd -tt ./main < ../socket

I guess that this was written with the old documented behavior in mind.

Reassigning this bug to cereal.

Greetings
Marc

-- 
Marc Haber         | I don't trust Computers. They | Mail address in header
Leimen, Germany    | lose things.    Winona Ryder  | Fon: *49 6224 1600402
Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600420

-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#505608: runit: stopped runsv processes not responding to TERM signals
On Tue, Aug 12, 2014 at 08:30:05AM +0200, Marc Haber wrote:
> Reassigning this bug to cereal.

There is already a corresponding bug against cereal (#485599), so this bug can accordingly be closed.

Greetings
Marc
Bug#505608: runit: stopped runsv processes not responding to TERM signals
On Fri, Aug 08, 2014 at 11:45:53AM +0200, Marc Haber wrote:
> I still can reproduce the behavior on a wheezy system. Here is the output of sv stat:
>
> sudo sv stat /var/lib/cereal/sessions/sw01
> down: /var/lib/cereal/sessions/sw01: 82s; run: log: (pid 25393) 76s

Hi Marc,

when runsv is told to exit either through sv exit, or when receiving TERM, which also happens when the service directory symlink is removed from /etc/service/ (e.g. through update-service --remove), then from runsv(8):

    If the service is running, send it a TERM signal, and then a CONT
    signal. Do not restart the service. If the service is down, and no
    log service exists, runsv exits. If the service is down and a log
    service exists, runsv closes the standard input of the log service,
    and waits for it to terminate. If the log service is down, runsv
    exits.

In your case the log service is still running, and so runsv is still waiting for it. The best solution is to make your log service detect that stdin is closed and exit as there's no more data to log.

Does this fix your problem?

Regards, Gerrit.

PS: before runit 2.1.2 the sv(8) man page was wrong, stating that a TERM signal is sent to the log service, but actually only stdin is closed so that the log service has the chance to process everything that shall be logged.
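The behaviour described above can be illustrated with a minimal sketch, assuming a log service that simply reads lines from its standard input. The function name log_until_eof and its output file argument are hypothetical and not part of runit or cereal; a real log service would normally exec svlogd instead. The sketch only shows that a reader attached to the pipe sees end-of-file once the writer closes it, and can then exit:

```shell
#!/bin/sh
# Hypothetical stand-in for a log service (not part of runit or cereal).
# Appends every line read from stdin to the file named in $1 and returns
# as soon as stdin reaches end-of-file.
log_until_eof() {
    out=$1
    while IFS= read -r line; do
        printf '%s\n' "$line" >>"$out"
    done
    # read failed: the writer closed the pipe, so nothing is left to log.
    return 0
}
```

A log run script that redirects its stdin to some other source would not notice runsv closing the pipe, which would be consistent with the symptom reported in this bug.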
Bug#505608: runit: stopped runsv processes not responding to TERM signals
Hi,

I still can reproduce the behavior on a wheezy system. Here is the output of sv stat:

    sudo sv stat /var/lib/cereal/sessions/sw01
    down: /var/lib/cereal/sessions/sw01: 82s; run: log: (pid 25393) 76s

The process list says:

    └─runsv,25378 cereal.sw01
        └─run,25393 -e ./run

So I suspect that it is not stuck in the finish state.

Greetings
Marc

On Fri, Aug 01, 2014 at 09:55:01AM +, Gerrit Pape wrote:
> Hi, thanks for this and sorry for the late reply. I guess the service is hanging in the finish state, if so, ./finish should be fixed. runsv will not exit unless the ./finish script has done its job and terminated.
>
> The output of 'sv stat /var/lib/cereal/sessions/sw01' would be interesting in this situation, and tell us whether the service is in the finish state. Do you still have these services in operation and can check?
>
> Regards, Gerrit.
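The reading of the sv stat line above (service down, log service still running, no finish state) can be checked mechanically. The following is a minimal sketch; the function name sv_stat_states is hypothetical, and the line format it assumes is simply the one shown in the sample output in this message, with the state keyword before the first colon and the optional log part after "; ":

```shell
#!/bin/sh
# Hypothetical helper: extract the state keyword of the service and of its
# log service from a single 'sv stat' line passed as $1.
# Assumed format (from the sample in this thread):
#   <state>: <dir>: ...; <state>: log: ...
sv_stat_states() {
    line=$1
    main=${line%%:*}                # text before the first colon
    case $line in
        *"; "*)
            rest=${line#*"; "}      # part after the first "; "
            log=${rest%%:*}
            ;;
        *)
            log=none                # no log service reported
            ;;
    esac
    printf 'service=%s log=%s\n' "$main" "$log"
}

sv_stat_states 'down: /var/lib/cereal/sessions/sw01: 82s; run: log: (pid 25393) 76s'
# prints: service=down log=run
```

A service state of "finish" here would have pointed at a hanging ./finish script; "down" with the log still in "run" points at the log service instead.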
Bug#505608: runit: stopped runsv processes not responding to TERM signals
On Tue, Nov 03, 2009 at 01:54:10PM +0100, Marc Haber wrote:
> I can reproduce the issue with the instructions given by Jameson.
>
> On Mon, Oct 12, 2009 at 09:39:42AM +, Gerrit Pape wrote:
> > Hi Jameson, if you still have this problem, please tar the service directory, and mail the tar archive to this bug report, I'll take a look then.
>
> Attached.
>
> $ pstree -apl 2856
> runsvdir,2856 -P /etc/service log:\040..
> $ sudo update-service --add /var/lib/cereal/sessions/sw01 cereal.sw01
> Service cereal.sw01 added.
> $ pstree -apl 2856
> runsvdir,2856 -P /etc/service log:\040..
>   └─runsv,3113 cereal.sw01
>       └─run,3114 -e ./run
> $ sudo update-service --remove /var/lib/cereal/sessions/sw01 cereal.sw01
> Service cereal.sw01 removed, the service daemon received the TERM and CONT signals.
> $ pstree -apl 2856
> runsvdir,2856 -P /etc/service log:\040..
>   └─runsv,3113 cereal.sw01
>       └─run,3114 -e ./run
> $
>
> I would have expected processes 3113 and 3114 to die.
>
> Please note that you will probably have to install cereal to reproduce the issue as cereal makes heavy use of out-of-tree symlinks in its service directories.

Hi, thanks for this and sorry for the late reply. I guess the service is hanging in the finish state; if so, ./finish should be fixed. runsv will not exit unless the ./finish script has done its job and terminated.

The output of 'sv stat /var/lib/cereal/sessions/sw01' would be interesting in this situation, and tell us whether the service is in the finish state. Do you still have these services in operation and can check?

Regards, Gerrit.
Bug#505608: runit: stopped runsv processes not responding to TERM signals
Hi,

I can reproduce the issue with the instructions given by Jameson.

On Mon, Oct 12, 2009 at 09:39:42AM +, Gerrit Pape wrote:
> Hi Jameson, if you still have this problem, please tar the service directory, and mail the tar archive to this bug report, I'll take a look then.

Attached.

    $ pstree -apl 2856
    runsvdir,2856 -P /etc/service log:\040..
    $ sudo update-service --add /var/lib/cereal/sessions/sw01 cereal.sw01
    Service cereal.sw01 added.
    $ pstree -apl 2856
    runsvdir,2856 -P /etc/service log:\040..
      └─runsv,3113 cereal.sw01
          └─run,3114 -e ./run
    $ sudo update-service --remove /var/lib/cereal/sessions/sw01 cereal.sw01
    Service cereal.sw01 removed, the service daemon received the TERM and CONT signals.
    $ pstree -apl 2856
    runsvdir,2856 -P /etc/service log:\040..
      └─runsv,3113 cereal.sw01
          └─run,3114 -e ./run
    $

I would have expected processes 3113 and 3114 to die.

Please note that you will probably have to install cereal to reproduce the issue as cereal makes heavy use of out-of-tree symlinks in its service directories.

Greetings
Marc

-- 
Marc Haber         | I don't trust Computers. They | Mail address in header
Mannheim, Germany  | lose things.    Winona Ryder  | Fon: *49 621 72739834
Nordisch by Nature | How to make an American Quilt | Fax: *49 621 72739835

sw01.tar.gz
Description: Binary data
Bug#505608: runit: stopped runsv processes not responding to TERM signals
tags 505608 + moreinfo
quit

On Wed, Nov 19, 2008 at 12:02:16PM -0500, Jameson Graef Rollins wrote:
> Hey, Gerrit. Thanks so much for the response. I can definitely reproduce the issue reliably, but interestingly, only with cereal sessions.
>
> Since I'm not sure exactly how to pass you one of these cereal service directories, is it possible for you to create a service directory from the cereal package itself? If not, please let me know what the best way for me to pass you a service directory is. Maybe I can just tar it and send it to you via email.

Hi Jameson, if you still have this problem, please tar the service directory, and mail the tar archive to this bug report, I'll take a look then.

Regards, Gerrit.
Bug#505608: runit: stopped runsv processes not responding to TERM signals
On Thu, Nov 13, 2008 at 02:41:39PM -0500, Jameson Graef Rollins wrote:
> Hello, Gerrit. I'm encountering a problem with runit that I'm hoping you can help me with. It appears that stopped runsv processes are not responding to TERM signals. The big problem that this is causing is that services removed with update-service --remove do not die:
>
> servo:~ 1$ sudo update-service --add /var/lib/cereal/sessions/foo cereal.foo
> Service cereal.foo added.
> servo:~ 0$ pidof runsv
> 27144
> servo:~ 0$ sudo svstat /etc/service/cereal.foo
> /etc/service/cereal.foo: down 51 seconds
> servo:~ 0$ sudo update-service --remove /var/lib/cereal/sessions/foo cereal.foo
> Service cereal.foo removed, the service daemon received the TERM and CONT signals.
> servo:~ 0$ sudo svstat /etc/service/cereal.foo
> /etc/service/cereal.foo: unable to chdir: file does not exist
> servo:~ 0$ ps -eFH | grep [r]unsv
> root 3276 1 03032 0 Oct24 ? 00:00:05 runsvdir -P /etc/service log: ...
> root 27144 3276 02736 0 14:30 ? 00:00:00 runsv cereal.foo
> servo:~ 0$ pidof runsv
> 27144
> servo:~ 0$ sudo kill -TERM $(pidof runsv)
> servo:~ 0$ pidof runsv
> 27144
> servo:~ 0$ sudo kill -KILL $(pidof runsv)
> servo:~ 0$ pidof runsv
> servo:~ 1$

Hi Jameson, see the runsv(8) man page:

    SIGNALS
        If runsv receives a TERM signal, it acts as if the character x
        was written to the control pipe.

and

    x   Exit. If the service is running, send it a TERM signal, and
        then a CONT signal. Do not restart the service. If the service
        is down, and no log service exists, runsv exits. If the service
        is down and a log service exists, runsv closes the standard
        input of the log service, and waits for it to terminate. If the
        log service is down, runsv exits. This command is ignored if it
        is given to service/log/supervise/control.

If runsv receives a TERM signal, it sends the service daemon a TERM signal, and waits for it to terminate. As long as the service daemon doesn't terminate, runsv will be running too; I guess that's what's happening with your services.

IMO the correct fix is to make sure that the service daemon properly reacts to a TERM signal. If that isn't feasible, you can tell runsv to send a KILL signal (see CUSTOMIZE CONTROL), or use something like 'sv force-stop /var/lib/cereal/sessions/foo'.

...Having written that, I see your service is down when removing it

> servo:~ 0$ sudo svstat /etc/service/cereal.foo
> /etc/service/cereal.foo: down 51 seconds

Hmm. If you can reproduce the issue reliably, can you make the service directory available, so that I can try to reproduce on my systems?

Thanks, Gerrit.
Bug#505608: runit: stopped runsv processes not responding to TERM signals
On Wed, Nov 19, 2008 at 01:30:04PM +, Gerrit Pape wrote:
> ...Having written that, I see your service is down when removing it
>
> servo:~ 0$ sudo svstat /etc/service/cereal.foo
> /etc/service/cereal.foo: down 51 seconds
>
> Hmm. If you can reproduce the issue reliably, can you make the service directory available, so that I can try to reproduce on my systems?

Hey, Gerrit. Thanks so much for the response. I can definitely reproduce the issue reliably, but interestingly, only with cereal sessions.

I realize now that the problem is not just when the service is down. Here is an example of an attempt to remove a running cereal service on one of my servers:

    rukh:~ 0# ps -eFH | grep [c]ereal.hydra1
    root 2349 2337 02724 0 Nov13 ? 00:00:00 runsv cereal.hydra1
    1000 6423 2349 0 5929 1624 0 Nov13 ? 00:00:03 /usr/bin/SCREEN -D -m -L -c /etc/cereal/screenrc -s /bin/false -S cereal:hydra1 -t hydra1 /dev/ttyS8 115200
    rukh:~ 0# update-service --remove /var/lib/cereal/sessions/hydra1 cereal.hydra1
    Service cereal.hydra1 removed, the service daemon received the TERM and CONT signals.
    rukh:~ 0# ps -eFH | grep [c]ereal.hydra1
    root 2349 2337 02728 0 Nov13 ? 00:00:00 runsv cereal.hydra1
    rukh:~ 0# kill 2349
    rukh:~ 0# ps -eFH | grep [c]ereal.hydra1
    root 2349 2337 02728 0 Nov13 ? 00:00:00 runsv cereal.hydra1
    rukh:~ 0# kill -9 2349
    rukh:~ 0# ps -eFH | grep [c]ereal.hydra1
    rukh:~ 1#

Note that the SCREEN process (which was exec'd by the service run script) is running at first, the service is removed, the SCREEN process stops, but the runsv does *not* stop until I send it a KILL signal.

But here is the really interesting test. First, I can create a very simple dummy service that does stop properly:

    rukh:/tmp/cdtemp.BKnOUC 0# cat <<EOF > foo/run
    #!/bin/bash
    while true; do
        sleep 1
    done
    EOF
    rukh:/tmp/cdtemp.BKnOUC 0# chmod 755 foo/run
    rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.foo
    rukh:/tmp/cdtemp.BKnOUC 1# update-service --add /tmp/cdtemp.BKnOUC/foo test.foo
    Service test.foo added.
    rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.foo
    root 31859 2337 02724 0 11:42 ? 00:00:00 runsv test.foo
    rukh:/tmp/cdtemp.BKnOUC 0# update-service --remove /tmp/cdtemp.BKnOUC/foo test.foo
    Service test.foo removed, the service daemon received the TERM and CONT signals.
    rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.foo
    rukh:/tmp/cdtemp.BKnOUC 1#

Now, I create a new cereal session, copy its service directory to a temporary location, replace its run script with the same dummy run script as above, and then try to add and remove it:

    rukh:/tmp/cdtemp.BKnOUC 0# cereal-admin create hydra1 /dev/ttyS8 115200 gecoadmin adm
    Created session 'hydra1':
    --f hydra1 /dev/ttyS8 115200 gecoadmin adm
    Service cereal.hydra1 added.
    rukh:/tmp/cdtemp.BKnOUC 0# cp -a /var/lib/cereal/sessions/hydra1 .
    rukh:/tmp/cdtemp.BKnOUC 0# cp foo/run hydra1/run
    cp: overwrite `hydra1/run'? y
    rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.hydra1
    rukh:/tmp/cdtemp.BKnOUC 1# update-service --add /tmp/cdtemp.BKnOUC/hydra1 test.hydra1
    Service test.hydra1 added.
    rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.hydra1
    root 32008 2337 02724 0 11:47 ? 00:00:00 runsv test.hydra1
    rukh:/tmp/cdtemp.BKnOUC 0# update-service --remove /tmp/cdtemp.BKnOUC/hydra1 test.hydra1
    Service test.hydra1 removed, the service daemon received the TERM and CONT signals.
    rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.hydra1
    root 32008 2337 02724 0 11:47 ? 00:00:00 runsv test.hydra1
    rukh:/tmp/cdtemp.BKnOUC 0# kill 32008
    rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.hydra1
    root 32008 2337 02724 0 11:47 ? 00:00:00 runsv test.hydra1
    rukh:/tmp/cdtemp.BKnOUC 0# kill -9 32008
    rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.hydra1
    rukh:/tmp/cdtemp.BKnOUC 1#

Notice that the service does *NOT* terminate.

I now believe that this problem must have something to do with how the service directory is laid out. Could there be something in the service directory that would prevent the runsv process from accepting the TERM? For instance, there is a 'down' file in the cereal service directory. That shouldn't affect this, but is it possible that it does? Could there be something else that we're doing in cereal that is causing this problem?

Since I'm not sure exactly how to pass you one of these cereal service directories, is it possible for you to create a service directory from the cereal package itself? If not, please let me know what the best way for me to pass you a service directory is. Maybe I can just tar it and send it to you via email.

Thanks again for your help with this issue. Please let me know what else I should do.

jamie.

signature.asc
Description: Digital signature
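When comparing a service directory that stops cleanly with one that does not, it can help to list which of the entries runsv looks at are present in each. The following is a minimal sketch; the function name describe_service_dir is hypothetical, and the choice of entries to check (run, finish, down, log/run) is an assumption about what is worth comparing, not something taken from cereal:

```shell
#!/bin/sh
# Hypothetical helper: report which well-known service directory entries
# exist in the directory given as $1. It only checks for existence and
# does not interpret or execute anything.
describe_service_dir() {
    dir=$1
    for item in run finish down log/run; do
        if [ -e "$dir/$item" ]; then
            printf '%s: present\n' "$item"
        else
            printf '%s: absent\n' "$item"
        fi
    done
}
```

Running it on both the dummy foo directory and the copied hydra1 directory would show at a glance whether the failing one carries a log service or a finish script that the working one lacks.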
Bug#505608: runit: stopped runsv processes not responding to TERM signals
Package: runit
Version: 2.0.0-1
Severity: normal

Hello, Gerrit. I'm encountering a problem with runit that I'm hoping you can help me with. It appears that stopped runsv processes are not responding to TERM signals. The big problem that this is causing is that services removed with update-service --remove do not die:

    servo:~ 1$ sudo update-service --add /var/lib/cereal/sessions/foo cereal.foo
    Service cereal.foo added.
    servo:~ 0$ pidof runsv
    27144
    servo:~ 0$ sudo svstat /etc/service/cereal.foo
    /etc/service/cereal.foo: down 51 seconds
    servo:~ 0$ sudo update-service --remove /var/lib/cereal/sessions/foo cereal.foo
    Service cereal.foo removed, the service daemon received the TERM and CONT signals.
    servo:~ 0$ sudo svstat /etc/service/cereal.foo
    /etc/service/cereal.foo: unable to chdir: file does not exist
    servo:~ 0$ ps -eFH | grep [r]unsv
    root 3276 1 03032 0 Oct24 ? 00:00:05 runsvdir -P /etc/service log: ...
    root 27144 3276 02736 0 14:30 ? 00:00:00 runsv cereal.foo
    servo:~ 0$ pidof runsv
    27144
    servo:~ 0$ sudo kill -TERM $(pidof runsv)
    servo:~ 0$ pidof runsv
    27144
    servo:~ 0$ sudo kill -KILL $(pidof runsv)
    servo:~ 0$ pidof runsv
    servo:~ 1$

I emphasize that this only seems to be a problem for *stopped* services, since running services seem to go away fine:

    servo:~ 0$ sudo update-service --add /var/lib/cereal/sessions/foo cereal.foo
    Service cereal.foo added.
    servo:~ 0$ sudo sv start cereal.foo
    ok: run: cereal.foo: (pid 26515) 0s, normally down
    servo:~ 0$ sudo update-service --remove /var/lib/cereal/sessions/foo cereal.foo
    Service cereal.foo removed, the service daemon received the TERM and CONT signals.
    servo:~ 0$ pidof runsv
    servo:~ 1$

Any idea what the problem could be? I only just noticed this today, so I'm not sure how long this has been a problem. It definitely hasn't always been a problem.

Please let me know if there's anything else I can do to help figure this out. And thanks, as always, for maintaining such a great program.

jamie.

-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (500, 'testing'), (200, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.26-1-686 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

runit depends on no packages.

Versions of packages runit recommends:
pn  fgetty       none  (no description available)

Versions of packages runit suggests:
pn  runit-run    none  (no description available)
pn  socklog-run  none  (no description available)

-- no debconf information
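The manual sequence in the report (send TERM, check with pidof, fall back to KILL) can be wrapped in a small helper when reproducing the problem repeatedly. This is only a sketch under stated assumptions: the function name term_and_wait and the default timeout of 5 seconds are hypothetical, and for a runsv process owned by root the kill calls would need to run with sufficient privileges (for example under sudo):

```shell
#!/bin/sh
# Hypothetical helper: send TERM to the process with pid $1, then poll once
# per second for up to $2 seconds (default 5) until it has gone away.
# Returns 0 if the process exited, 1 if it is still alive after the timeout.
term_and_wait() {
    pid=$1
    timeout=${2:-5}
    kill -TERM "$pid" 2>/dev/null || true
    waited=0
    # kill -0 sends no signal; it only tests whether the pid can be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A return value of 1 for a runsv pid would correspond to the situation described in this report, where only a subsequent KILL removed the process.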