Re: [systemd-devel] No signal sent to stop service
Hello David.

On Tue, Aug 11, 2020 at 02:33:11PM +1200, David Cunningham wrote:
> The problem is most likely with systemd thinking the program is stopped
> because "systemctl status" reports:
> Aug 10 03:57:32 myhost systemd[1]: product_routed.service: Main process exited, code=exited, status=1/FAILURE
> Aug 10 03:57:32 myhost systemd[1]: product_routed.service: Failed with result 'exit-code'.

This means there is a mismatch between what the service considers its
main PID (17824) and what systemd tracks -- the tracked process apparently
terminated with a failure exit code.

> 1:name=systemd:/user.slice/user-0.slice/session-623.scope
> 0::/user.slice/user-0.slice/session-623.scope

This suggests that the alleged main process (from the PID file) was either
migrated out of the service's cgroup into a session scope (by pam_systemd;
this can happen when a daemon switches uid by calling into PAM, such as
with su(do)), or it was started directly in the user session.

My suggestion is to check whether MainPID (next time please share the full
`systemctl status` output) matches the contents of your PID file (while
the service is "stoppable" and afterwards).

Second, it's worth reviewing what happens around the time when the "Main
process exited" message appears (you can increase PID 1 verbosity with
`systemd-analyze set-log-level debug` in order to rule out a systemd issue).

One idea is that someone starts another service instance from their user
session, which breaks the original instance, and the new one is not tracked
by systemd.

HTH,
Michal

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
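The MainPID check suggested above could be sketched in shell roughly as
follows. This is an illustration only and not part of the original thread:
the function name pids_match is hypothetical, and the commented usage
assumes the unit name and PIDFile path from the unit file quoted in this
thread, plus a systemd recent enough to support `systemctl show --value`.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): compare the PID systemd tracks
# for a unit with the PID recorded in the daemon's PID file.
#   $1 = MainPID as reported by systemd
#   $2 = contents of the PID file
pids_match() {
    main_pid=$1
    file_pid=$2
    # MainPID=0 means systemd currently tracks no main process for the unit.
    if [ -z "$main_pid" ] || [ "$main_pid" = "0" ]; then
        echo "no-main-pid"
        return 1
    fi
    if [ "$main_pid" = "$file_pid" ]; then
        echo "match"
        return 0
    fi
    echo "mismatch"
    return 1
}

# Usage on a live system (unit name and path assumed from this thread):
#   main_pid=$(systemctl show -p MainPID --value product_routed.service)
#   file_pid=$(cat /var/run/product/routed.pid)
#   pids_match "$main_pid" "$file_pid"
```

Running the check once while the service is still "stoppable" and again
after the problem appears should show whether the two PIDs have diverged.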
Re: [systemd-devel] No signal sent to stop service
Hello Lennart and Michal,

Thank you for your replies. The cgroup file is below - can you please
advise what is the relevant part to check?

The problem is most likely with systemd thinking the program is stopped
because "systemctl status" reports:

Aug 10 03:57:32 myhost systemd[1]: product_routed.service: Main process exited, code=exited, status=1/FAILURE
Aug 10 03:57:32 myhost systemd[1]: product_routed.service: Failed with result 'exit-code'.

We will look into that, thank you.

# cat /proc/17824/cgroup
12:memory:/
11:pids:/user.slice/user-0.slice/session-623.scope
10:rdma:/
9:hugetlb:/
8:blkio:/
7:devices:/user.slice
6:cpuset:/
5:net_cls,net_prio:/
4:freezer:/
3:perf_event:/
2:cpu,cpuacct:/user.slice/user-0.slice/session-623.scope
1:name=systemd:/user.slice/user-0.slice/session-623.scope
0::/user.slice/user-0.slice/session-623.scope

On Tue, 11 Aug 2020 at 03:08, Lennart Poettering wrote:

> On Do, 06.08.20 13:59, David Cunningham (dcunning...@voisonics.com) wrote:
>
> > Hello,
> >
> > I'm developing a service called product_routed which is managed by systemd.
> > The service can normally be stopped with "service product_routed stop" or
> > "systemctl stop product_routed", however for some reason after the service
> > has been running for a while (a few days or more) the stop command no
> > longer works. Can anyone help me find why?
> >
> > When the application stop works initially (for the first day or two) we see
> > a TERM signal sent to the application, as confirmed by logging in the
> > application itself (which is written in perl), and is reported by "strace
> > -p -e 'trace=!all'". However once the problem starts no signal is
> > sent to the application at all when "service product_routed stop" or
> > "systemctl stop product_routed" is run.
>
> Note that on systemd for a unit that is already stopped issuing
> another "systemctl stop" is a NOP and doesn't result in another SIGTERM
> being sent.
>
> So, when you issue your second "systemctl stop", is the service
> actually running in systemd's eyes? (i.e. what does "systemctl status"
> say about the service?)
>
> > The systemd file is as below, and we've confirmed that the PIDFile contains
> > the correct PID when the stop is attempted. Would anyone have any
> > suggestions on how to debug this? Thank you in advance.
> >
> > # cat /etc/systemd/system/product_routed.service
> > [Unit]
> > Description=Product routing daemon
> > After=syslog.target network.target mysql.service
> >
> > [Service]
> > Type=forking
> > ExecStart=/opt/product/current/bin/routed
> > PIDFile=/var/run/product/routed.pid
> > Restart=on-abnormal
> > RestartSec=1
> > LimitSTACK=infinity
> > LimitNOFILE=65535
> > LimitNPROC=65535
> >
> > [Install]
> > WantedBy=multi-user.target
>
> Please provide the "systemctl status" output when this happens.
>
> Lennart
>
> --
> Lennart Poettering, Berlin

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
Re: [systemd-devel] No signal sent to stop service
On Do, 06.08.20 13:59, David Cunningham (dcunning...@voisonics.com) wrote:

> Hello,
>
> I'm developing a service called product_routed which is managed by systemd.
> The service can normally be stopped with "service product_routed stop" or
> "systemctl stop product_routed", however for some reason after the service
> has been running for a while (a few days or more) the stop command no
> longer works. Can anyone help me find why?
>
> When the application stop works initially (for the first day or two) we see
> a TERM signal sent to the application, as confirmed by logging in the
> application itself (which is written in perl), and is reported by "strace
> -p -e 'trace=!all'". However once the problem starts no signal is
> sent to the application at all when "service product_routed stop" or
> "systemctl stop product_routed" is run.

Note that on systemd, issuing another "systemctl stop" for a unit that is
already stopped is a NOP and doesn't result in another SIGTERM being sent.

So, when you issue your second "systemctl stop", is the service
actually running in systemd's eyes? (i.e. what does "systemctl status"
say about the service?)

> The systemd file is as below, and we've confirmed that the PIDFile contains
> the correct PID when the stop is attempted. Would anyone have any
> suggestions on how to debug this? Thank you in advance.
>
> # cat /etc/systemd/system/product_routed.service
> [Unit]
> Description=Product routing daemon
> After=syslog.target network.target mysql.service
>
> [Service]
> Type=forking
> ExecStart=/opt/product/current/bin/routed
> PIDFile=/var/run/product/routed.pid
> Restart=on-abnormal
> RestartSec=1
> LimitSTACK=infinity
> LimitNOFILE=65535
> LimitNPROC=65535
>
> [Install]
> WantedBy=multi-user.target

Please provide the "systemctl status" output when this happens.

Lennart

--
Lennart Poettering, Berlin
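One way to answer the "is it running in systemd's eyes?" question from a
script is sketched below. It is an illustration under assumptions, not
something posted in the thread: the function name unit_is_stopped is
hypothetical, and it treats the ActiveState values "inactive" and "failed"
as the "already stopped" case described above.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): classify a unit's ActiveState.
#   $1 = ActiveState string as reported by systemd
# "inactive" and "failed" both mean systemd considers the unit stopped,
# so a further "systemctl stop" has nothing left to signal.
unit_is_stopped() {
    case $1 in
        inactive|failed)
            echo "stopped"
            return 0
            ;;
        *)
            echo "not-stopped"
            return 1
            ;;
    esac
}

# Usage on a live system (unit name assumed from this thread):
#   state=$(systemctl show -p ActiveState --value product_routed.service)
#   unit_is_stopped "$state"
```

If this reports "stopped" while the daemon process is still alive, systemd
and the application disagree about which process is the service.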
Re: [systemd-devel] No signal sent to stop service
Hi David.

On Thu, Aug 06, 2020 at 01:59:03PM +1200, David Cunningham wrote:
> The systemd file is as below, and we've confirmed that the PIDFile contains
> the correct PID when the stop is attempted. Would anyone have any
> suggestions on how to debug this? Thank you in advance.

Is the given process running under the expected cgroup (check
/proc/$PID/cgroup)? Note that the default KillMode=control-group would not
necessarily kill the PIDFile process (see systemd.kill(5)).

HTH,
Michal
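The cgroup check could be scripted along these lines. This is a rough
sketch, not part of the original thread: both function names are
hypothetical, and it assumes that a process belonging to the service has a
systemd cgroup path ending in the unit name (for a system service, typically
/system.slice/product_routed.service).

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread). Both read the contents of a
# /proc/<pid>/cgroup file on standard input.

# Print the path systemd uses to track the process: the "name=systemd"
# hierarchy if present, otherwise the unified "0::" entry.
systemd_cgroup_path() {
    awk '
        /^[0-9]+:name=systemd:/ {
            sub(/^[^:]*:[^:]*:/, "")   # drop "ID:controllers:", keep path
            print
            found = 1
            exit
        }
        /^0::/ {
            unified = $0
            sub(/^0::/, "", unified)
            have_unified = 1
        }
        END { if (!found && have_unified) print unified }
    '
}

# $1 = unit name. Report whether the path lies inside that unit's cgroup.
pid_in_unit_cgroup() {
    path=$(systemd_cgroup_path)
    case $path in
        */"$1"|*/"$1"/*)
            echo "in-unit"
            return 0
            ;;
        *)
            echo "elsewhere: $path"
            return 1
            ;;
    esac
}

# Usage on a live system (unit name and PID file path assumed from this thread):
#   pid=$(cat /var/run/product/routed.pid)
#   pid_in_unit_cgroup product_routed.service < "/proc/$pid/cgroup"
```

An "elsewhere" result pointing at a session scope would mean the process is
outside the cgroup that KillMode=control-group acts on.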
[systemd-devel] No signal sent to stop service
Hello,

I'm developing a service called product_routed which is managed by systemd.
The service can normally be stopped with "service product_routed stop" or
"systemctl stop product_routed", however for some reason after the service
has been running for a while (a few days or more) the stop command no
longer works. Can anyone help me find why?

When the application stop works initially (for the first day or two) we see
a TERM signal sent to the application, as confirmed by logging in the
application itself (which is written in perl), and is reported by "strace
-p -e 'trace=!all'". However once the problem starts no signal is
sent to the application at all when "service product_routed stop" or
"systemctl stop product_routed" is run.

The systemd file is as below, and we've confirmed that the PIDFile contains
the correct PID when the stop is attempted. Would anyone have any
suggestions on how to debug this? Thank you in advance.

# cat /etc/systemd/system/product_routed.service
[Unit]
Description=Product routing daemon
After=syslog.target network.target mysql.service

[Service]
Type=forking
ExecStart=/opt/product/current/bin/routed
PIDFile=/var/run/product/routed.pid
Restart=on-abnormal
RestartSec=1
LimitSTACK=infinity
LimitNOFILE=65535
LimitNPROC=65535

[Install]
WantedBy=multi-user.target

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782