Re: [systemd-devel] No signal sent to stop service

2020-08-13 Thread Michal Koutný
Hello David.

On Tue, Aug 11, 2020 at 02:33:11PM +1200, David Cunningham 
 wrote:
> The problem is most likely with systemd thinking the program is stopped
> because "systemctl status" reports:
> Aug 10 03:57:32 myhost systemd[1]: product_routed.service: Main process
> exited, code=exited, status=1/FAILURE
> Aug 10 03:57:32 myhost systemd[1]: product_routed.service: Failed with
> result 'exit-code'.
This means there is a mismatch between what the service considers its
main PID (17824) and what systemd tracks -- the tracked process
apparently terminated with a failure exit code.

> 1:name=systemd:/user.slice/user-0.slice/session-623.scope
> 0::/user.slice/user-0.slice/session-623.scope
This suggests that the alleged main process (from the PID file) was
migrated out of the service's cgroup into a session scope (by pam_systemd;
this can happen when the daemon switches UID by calling into PAM, such as
with su or sudo), or that it was started directly in the user session.

My suggestion is to check whether MainPID (next time please share the full
`systemctl status` output) matches the contents of your PID file (while
the service is "stoppable" and afterwards).
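
A minimal sketch of that comparison, assuming the unit name and PID file
path from the unit file quoted in this thread. The helper names
(compare_pids, main_pid, check) are made up for illustration; only
`systemctl show -p MainPID` is a real systemd interface here.

```shell
#!/bin/sh
# Unit name and PID file path as quoted in this thread; override via environment.
UNIT="${UNIT:-product_routed.service}"
PIDFILE="${PIDFILE:-/var/run/product/routed.pid}"

# compare_pids TRACKED RECORDED (hypothetical helper):
# prints "match" only if both values are non-empty and identical.
compare_pids() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

# main_pid prints the PID systemd tracks for the unit.
# "systemctl show -p MainPID" prints "MainPID=<n>"; the prefix is stripped.
main_pid() {
    out=$(systemctl show -p MainPID "$UNIT") || return 1
    echo "${out#MainPID=}"
}

# check prints both PIDs and whether they agree.
check() {
    tracked=$(main_pid)
    recorded=$(tr -d '[:space:]' < "$PIDFILE")
    echo "systemd MainPID: $tracked, PID file: $recorded -> $(compare_pids "$tracked" "$recorded")"
}
```

Calling `check` once while stopping still works and once after it stops
working should show whether the two values drift apart. A MainPID of 0
means systemd currently tracks no main process for the unit.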

Second, it's worth reviewing what happens around the time when the "Main
process exited" message appears (you can increase PID 1 verbosity with
`systemd-analyze set-log-level debug` in order to rule out a systemd
issue).
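
One possible way to wrap this, as a sketch only: raise the log level,
wait for the problem to reappear, read what PID 1 logged about the unit,
then lower the level again. The function names and the DRY_RUN switch are
assumptions made for illustration, and restoring to "info" assumes the
system was using systemd's default log level beforehand.

```shell
#!/bin/sh
# Unit name as quoted in this thread; override via environment.
UNIT="${UNIT:-product_routed.service}"

# run executes its arguments, or only prints them when DRY_RUN is set
# (hypothetical convenience for reviewing the commands first).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# debug_on raises PID 1's log level; debug_off sets it back to "info".
debug_on()  { run systemd-analyze set-log-level debug; }
debug_off() { run systemd-analyze set-log-level info; }

# unit_log SINCE: show journal entries for the unit from SINCE onwards,
# e.g. unit_log "1 hour ago". Messages from PID 1 about the unit are included.
unit_log() {
    run journalctl -u "$UNIT" --since "$1" --no-pager
}
```

Debug logging from PID 1 is verbose, so it is probably best left on only
for the window in which the "Main process exited" message is expected.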

One idea is that someone starts another service instance from their user
session which breaks the original instance and the new one is not
tracked by systemd.

HTH,
Michal


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] No signal sent to stop service

2020-08-10 Thread David Cunningham
Hello Lennart and Michal,

Thank you for your replies. The cgroup file is below - can you please
advise what is the relevant part to check?

The problem is most likely with systemd thinking the program is stopped
because "systemctl status" reports:
Aug 10 03:57:32 myhost systemd[1]: product_routed.service: Main process
exited, code=exited, status=1/FAILURE
Aug 10 03:57:32 myhost systemd[1]: product_routed.service: Failed with
result 'exit-code'.

We will look into that, thank you.

# cat /proc/17824/cgroup
12:memory:/
11:pids:/user.slice/user-0.slice/session-623.scope
10:rdma:/
9:hugetlb:/
8:blkio:/
7:devices:/user.slice
6:cpuset:/
5:net_cls,net_prio:/
4:freezer:/
3:perf_event:/
2:cpu,cpuacct:/user.slice/user-0.slice/session-623.scope
1:name=systemd:/user.slice/user-0.slice/session-623.scope
0::/user.slice/user-0.slice/session-623.scope


On Tue, 11 Aug 2020 at 03:08, Lennart Poettering 
wrote:

> On Thu, 06.08.20 13:59, David Cunningham (dcunning...@voisonics.com) wrote:
>
> > Hello,
> >
> > I'm developing a service called product_routed which is managed by systemd.
> > The service can normally be stopped with "service product_routed stop" or
> > "systemctl stop product_routed", however for some reason after the service
> > has been running for a while (a few days or more) the stop command no
> > longer works. Can anyone help me find why?
> >
> > When the application stop works initially (for the first day or two) we see
> > a TERM signal sent to the application, as confirmed by logging in the
> > application itself (which is written in perl), and is reported by "strace
> > -p  -e 'trace=!all'". However once the problem starts no signal is
> > sent to the application at all when "service product_routed stop" or
> > "systemctl stop product_routed" is run.
>
> Note that in systemd, issuing another "systemctl stop" for a unit that
> is already stopped is a NOP and doesn't result in another SIGTERM
> being sent.
>
> So, when you issue your second "systemctl stop", is the service
> actually running in systemd's eyes? (i.e. what does "systemctl status"
> say about the service?)
>
> > The systemd file is as below, and we've confirmed that the PIDFile contains
> > the correct PID when the stop is attempted. Would anyone have any
> > suggestions on how to debug this? Thank you in advance.
> >
> > # cat /etc/systemd/system/product_routed.service
> > [Unit]
> > Description=Product routing daemon
> > After=syslog.target network.target mysql.service
> >
> > [Service]
> > Type=forking
> > ExecStart=/opt/product/current/bin/routed
> > PIDFile=/var/run/product/routed.pid
> > Restart=on-abnormal
> > RestartSec=1
> > LimitSTACK=infinity
> > LimitNOFILE=65535
> > LimitNPROC=65535
> >
> > [Install]
> > WantedBy=multi-user.target
>
> Please provide the "systemctl status" output when this happens.
>
> Lennart
>
> --
> Lennart Poettering, Berlin
>


-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782


Re: [systemd-devel] No signal sent to stop service

2020-08-10 Thread Lennart Poettering
On Thu, 06.08.20 13:59, David Cunningham (dcunning...@voisonics.com) wrote:

> Hello,
>
> I'm developing a service called product_routed which is managed by systemd.
> The service can normally be stopped with "service product_routed stop" or
> "systemctl stop product_routed", however for some reason after the service
> has been running for a while (a few days or more) the stop command no
> longer works. Can anyone help me find why?
>
> When the application stop works initially (for the first day or two) we see
> a TERM signal sent to the application, as confirmed by logging in the
> application itself (which is written in perl), and is reported by "strace
> -p  -e 'trace=!all'". However once the problem starts no signal is
> sent to the application at all when "service product_routed stop" or
> "systemctl stop product_routed" is run.

Note that in systemd, issuing another "systemctl stop" for a unit that
is already stopped is a NOP and doesn't result in another SIGTERM
being sent.

So, when you issue your second "systemctl stop", is the service
actually running in systemd's eyes? (i.e. what does "systemctl status"
say about the service?)
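
A rough sketch of that check, using the unit name from the quoted unit
file. The helper names are hypothetical, and treating both "inactive" and
"failed" as the "already stopped" case is an assumption of this sketch;
`systemctl show -p ActiveState` is the only systemd interface used.

```shell
#!/bin/sh
# Unit name as quoted in this thread; override via environment.
UNIT="${UNIT:-product_routed.service}"

# stop_is_noop STATE (hypothetical helper): prints "yes" when the unit is
# already stopped in systemd's eyes, so a stop request sends no signal.
stop_is_noop() {
    case "$1" in
        inactive|failed) echo "yes" ;;
        *)               echo "no" ;;
    esac
}

# active_state prints the unit's ActiveState.
# "systemctl show -p ActiveState" prints "ActiveState=<state>".
active_state() {
    out=$(systemctl show -p ActiveState "$UNIT") || return 1
    echo "${out#ActiveState=}"
}

# report prints the state and whether "systemctl stop" would do anything.
report() {
    state=$(active_state)
    echo "$UNIT is $state; stop is a no-op: $(stop_is_noop "$state")"
}
```

Running `report` right before the failing "systemctl stop" would show
whether systemd still considers the service running at that point.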

> The systemd file is as below, and we've confirmed that the PIDFile contains
> the correct PID when the stop is attempted. Would anyone have any
> suggestions on how to debug this? Thank you in advance.
>
> # cat /etc/systemd/system/product_routed.service
> [Unit]
> Description=Product routing daemon
> After=syslog.target network.target mysql.service
>
> [Service]
> Type=forking
> ExecStart=/opt/product/current/bin/routed
> PIDFile=/var/run/product/routed.pid
> Restart=on-abnormal
> RestartSec=1
> LimitSTACK=infinity
> LimitNOFILE=65535
> LimitNPROC=65535
>
> [Install]
> WantedBy=multi-user.target

Please provide the "systemctl status" output when this happens.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] No signal sent to stop service

2020-08-10 Thread Michal Koutný
Hi David.

On Thu, Aug 06, 2020 at 01:59:03PM +1200, David Cunningham 
 wrote:
> The systemd file is as below, and we've confirmed that the PIDFile contains
> the correct PID when the stop is attempted. Would anyone have any
> suggestions on how to debug this? Thank you in advance.
Is the given process running under the expected cgroup
(check /proc/$PID/cgroup)?
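
As an illustration, the check could be scripted roughly like this. The
function names are made up, and the sketch assumes the process is tracked
either via the "name=systemd" hierarchy or, on a unified setup, via the
"0::" line of the cgroup file.

```shell
#!/bin/sh
# cgroup_path FILE: print the cgroup path relevant for systemd's tracking.
# Prefers the "name=systemd" hierarchy line, else falls back to the "0::" line.
cgroup_path() {
    path=$(sed -n 's/^[0-9]*:name=systemd://p' "$1")
    [ -n "$path" ] || path=$(sed -n 's/^0:://p' "$1")
    echo "$path"
}

# in_unit_cgroup FILE UNIT: prints "yes" if that path ends in UNIT or lies
# below it, "no" otherwise (for example when it is a session scope).
in_unit_cgroup() {
    case "$(cgroup_path "$1")" in
        */"$2"|*/"$2"/*) echo "yes" ;;
        *)               echo "no" ;;
    esac
}
```

For example, `in_unit_cgroup "/proc/$PID/cgroup" product_routed.service`
with $PID taken from the PID file. A "no" means the process is no longer
in the cgroup systemd created for the service.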

Note that the default KillMode=control-group would not necessarily kill
the PIDFile process (see systemd.kill(5)).

HTH,
Michal




[systemd-devel] No signal sent to stop service

2020-08-05 Thread David Cunningham
Hello,

I'm developing a service called product_routed which is managed by systemd.
The service can normally be stopped with "service product_routed stop" or
"systemctl stop product_routed", however for some reason after the service
has been running for a while (a few days or more) the stop command no
longer works. Can anyone help me find why?

When the application stop works initially (for the first day or two) we see
a TERM signal sent to the application, as confirmed by logging in the
application itself (which is written in perl), and is reported by "strace
-p  -e 'trace=!all'". However once the problem starts no signal is
sent to the application at all when "service product_routed stop" or
"systemctl stop product_routed" is run.

The systemd file is as below, and we've confirmed that the PIDFile contains
the correct PID when the stop is attempted. Would anyone have any
suggestions on how to debug this? Thank you in advance.

# cat /etc/systemd/system/product_routed.service
[Unit]
Description=Product routing daemon
After=syslog.target network.target mysql.service

[Service]
Type=forking
ExecStart=/opt/product/current/bin/routed
PIDFile=/var/run/product/routed.pid
Restart=on-abnormal
RestartSec=1
LimitSTACK=infinity
LimitNOFILE=65535
LimitNPROC=65535

[Install]
WantedBy=multi-user.target


-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782