Bug#781209: postinst execution order bug confuses systemd
Hi,

Okay, so after doing some testing in a virtual machine I'm confident that simply adding the symlink is enough to make the transition from a sysv-managed starter to the native service work as expected. I uploaded version 5.2.1-6 with this fix:

https://anonscm.debian.org/cgit/pkg-swan/strongswan.git/commit/?id=1b7c683a32c62b6e08ad7bf5af39b9f4edd634f3

Thanks for your help, Michael!

Cheers,
--
Romain Francoise rfranco...@debian.org
http://people.debian.org/~rfrancoise/

--
To UNSUBSCRIBE, email to debian-bugs-rc-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#781209: postinst execution order bug confuses systemd
On Thu, Mar 26, 2015 at 09:36:32PM +0100, Michael Biebl wrote:
> So I decided to ship a /lib/systemd/system/network-manager.service symlink pointing at NetworkManager.service:
> http://anonscm.debian.org/cgit/pkg-utopia/network-manager.git/tree/debian/rules#n64

Why do you have a call to dh_systemd_start for the NetworkManager service there? Shouldn't this simply work via dh_installinit using the symlink for the old name?

--
Romain Francoise rfranco...@debian.org
http://people.debian.org/~rfrancoise/
Bug#781209: postinst execution order bug confuses systemd
On 01.04.2015 at 18:23, Romain Francoise wrote:
> On Thu, Mar 26, 2015 at 09:36:32PM +0100, Michael Biebl wrote:
>> So I decided to ship a /lib/systemd/system/network-manager.service symlink pointing at NetworkManager.service:
>> http://anonscm.debian.org/cgit/pkg-utopia/network-manager.git/tree/debian/rules#n64
>
> Why do you have a call to dh_systemd_start for the NetworkManager service there? Shouldn't this simply work via dh_installinit using the symlink for the old name?

It looks like it is indeed not needed. This change was added by bigon; maybe he just added it for symmetry with the dh_systemd_enable calls? I've CCed him, maybe he remembers the details.

--
Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth?
Bug#781209: postinst execution order bug confuses systemd
On Sun, Mar 29, 2015 at 06:45:45PM +0200, Michael Biebl wrote:
> Can you be more specific, what you have in mind here?

Never mind, I found a machine with Network Manager installed and got the answer to my question: with the symlink, systemd uses the target of the symlink as the real service and adds the name of the symlink to Names=, just like in the Alias case.

Thanks,
--
Romain Francoise rfranco...@debian.org
http://people.debian.org/~rfrancoise/
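The symlink behaviour described above can be tried out without touching a live system. The following is only a minimal sketch: it builds the link in a scratch directory that stands in for /lib/systemd/system, and it assumes the old name is ipsec.service (taken from the Alias= line in the original report) rather than quoting what the 5.2.1-6 package actually ships. The placeholder unit content is made up for illustration.

```shell
#!/bin/sh
set -e

# Scratch directory standing in for /lib/systemd/system (assumption:
# nothing here touches the real unit directory).
unit_dir="$(mktemp -d)"

# Placeholder unit file; the real strongswan.service has different content.
cat > "$unit_dir/strongswan.service" <<'EOF'
[Unit]
Description=placeholder unit for illustration only
EOF

# Ship the old name as a relative symlink to the new unit.
ln -s strongswan.service "$unit_dir/ipsec.service"

# Confirm where the old name points.
link_target="$(readlink "$unit_dir/ipsec.service")"
echo "ipsec.service -> $link_target"

# On a host running systemd with the real package installed, the merged
# names could then be inspected with:
#   systemctl show -p Names strongswan.service
```

The link is relative so that it keeps resolving wherever the package's files end up being staged during the build.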
Bug#781209: postinst execution order bug confuses systemd
Hi Romain,

On 29.03.2015 at 12:56, Romain Francoise wrote:
> On Thu, Mar 26, 2015 at 09:36:32PM +0100, Michael Biebl wrote:
>> You could also ship the alias/symlink in the package, and not create it via Alias=
>> Actually, that's what I would suggest to do anyway to align the old and new name.
>
> Thanks for the suggestion, that would probably be more reliable indeed. Can you confirm that systemd is smart enough to recognize the two units as the same service, even if only the second one is enabled?

Can you be more specific about what you have in mind here?
Bug#781209: postinst execution order bug confuses systemd
On Thu, Mar 26, 2015 at 09:36:32PM +0100, Michael Biebl wrote:
> You could also ship the alias/symlink in the package, and not create it via Alias=
> Actually, that's what I would suggest to do anyway to align the old and new name.

Thanks for the suggestion, that would probably be more reliable indeed. Can you confirm that systemd is smart enough to recognize the two units as the same service, even if only the second one is enabled?

--
Romain Francoise rfranco...@debian.org
http://people.debian.org/~rfrancoise/
Bug#781209: postinst execution order bug confuses systemd
Hi,

Thank you for this detailed report, and sorry for the inconvenience...

On Thu, Mar 26, 2015 at 04:29:02AM +0200, Faidon Liambotis wrote:
> The package's postinst, however, is buggy: it does not use dh_installinit but calls invoke-rc.d ipsec manually. That would have been fine, but invoke-rc.d ipsec is called *before* the dh_systemd_enable/deb-systemd-helper bits. This means that invoke-rc.d ipsec start runs before the systemd unit is properly installed, which in turn confuses the hell out of systemd (as, among others, it expects a Type=simple unit), as evidenced by the following commands run in sequence:
> [...]
> # ipsec stop
> Stopping strongSwan IPsec...
> # grep systemd /var/log/syslog | tail -3
> Mar 26 01:02:15 curium systemd[1]: Assertion 'path' failed at ../src/shared/cgroup-util.c:913, function cg_is_empty_recursive(). Aborting.
> Mar 26 01:02:15 curium systemd[1]: Caught ABRT, dumped core as pid 6916.
> Mar 26 01:02:15 curium systemd[1]: Freezing execution.

Ouch, that's quite nasty. :(

Moving the invoke-rc.d call below the debhelper marker would take care of this particular situation; however, looking at other packages, there's also the upgrade case to take into consideration: if it's already running we should shut down the sysvinit-controlled daemon before restarting it controlled by systemd. At least that's what openssh-server does.

--
Romain Francoise rfranco...@debian.org
http://people.debian.org/~rfrancoise/
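To make the ordering point concrete, here is a rough sketch of a postinst in which the service start comes only after the unit has been registered. It is not the actual strongswan-starter script: the run helper, the DRY_RUN variable and the postinst_configure function are hypothetical names introduced for illustration, and the explicit deb-systemd-helper line merely stands in for whatever debhelper would generate at the #DEBHELPER# marker. By default it only prints the commands it would run. The upgrade case mentioned above (stopping an already running sysvinit-controlled daemon first) is deliberately left out.

```shell
#!/bin/sh
# Hypothetical postinst skeleton for illustration only.
set -e

# DRY_RUN defaults to 1, so the sketch prints its plan instead of acting.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

postinst_configure() {
    # Step 1: register the unit. In a real package this is where the
    # debhelper-generated code from the #DEBHELPER# marker would sit.
    run deb-systemd-helper enable strongswan.service

    # Step 2: only now start the daemon through the init script name.
    run invoke-rc.d ipsec start
}

plan="$(postinst_configure)"
printf '%s\n' "$plan"
```

Swapping the two steps in postinst_configure reproduces the ordering that the report identifies as the trigger for the problem.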
Bug#781209: postinst execution order bug confuses systemd
On Thu, 26 Mar 2015 10:41:55 +0100 Romain Francoise rfranco...@debian.org wrote:
> Hi,
>
> Thank you for this detailed report, and sorry for the inconvenience...
>
> On Thu, Mar 26, 2015 at 04:29:02AM +0200, Faidon Liambotis wrote:
>> The package's postinst, however, is buggy: it does not use dh_installinit but calls invoke-rc.d ipsec manually. That would have been fine, but invoke-rc.d ipsec is called *before* the dh_systemd_enable/deb-systemd-helper bits. This means that invoke-rc.d ipsec start runs before the systemd unit is properly installed, which in turn confuses the hell out of systemd (as, among others, it expects a Type=simple unit), as evidenced by the following commands run in sequence:
>> [...]
>> # ipsec stop
>> Stopping strongSwan IPsec...
>> # grep systemd /var/log/syslog | tail -3
>> Mar 26 01:02:15 curium systemd[1]: Assertion 'path' failed at ../src/shared/cgroup-util.c:913, function cg_is_empty_recursive(). Aborting.
>> Mar 26 01:02:15 curium systemd[1]: Caught ABRT, dumped core as pid 6916.
>> Mar 26 01:02:15 curium systemd[1]: Freezing execution.
>
> Ouch, that's quite nasty. :(
>
> Moving the invoke-rc.d call below the debhelper marker would take care of this particular situation, however looking at other packages there's also the upgrade case to take into consideration: if it's already running we should shut down the sysvinit-controlled daemon before restarting it controlled by systemd. At least that's what openssh-server does.

You could also ship the alias/symlink in the package, and not create it via Alias=
Actually, that's what I would suggest to do anyway to align the old and new name.

The network-manager package is in a similar situation. Upstream ships a NetworkManager.service unit file and the sysv init script in Debian is called /etc/init.d/network-manager.

So I decided to ship a /lib/systemd/system/network-manager.service symlink pointing at NetworkManager.service:
http://anonscm.debian.org/cgit/pkg-utopia/network-manager.git/tree/debian/rules#n64
Bug#781209: postinst execution order bug confuses systemd
Package: strongswan-starter
Version: 5.2.1-5
Severity: grave

strongswan-starter currently ships:
- /etc/init.d/ipsec
- /lib/systemd/system/strongswan.service

With the latter containing Alias=ipsec.service and also calling the ipsec binary with --nofork as an (implicit) Type=simple unit. This is all a bit confusing at first but pretty sane in general and the strongswan rename is a nice move (and also consistent with Ubuntu).

The package's postinst, however, is buggy: it does not use dh_installinit but calls invoke-rc.d ipsec manually. That would have been fine, but invoke-rc.d ipsec is called *before* the dh_systemd_enable/deb-systemd-helper bits. This means that invoke-rc.d ipsec start runs before the systemd unit is properly installed, which in turn confuses the hell out of systemd (as, among others, it expects a Type=simple unit), as evidenced by the following commands run in sequence:

# apt-get install strongswan
[...]
# systemctl status strongswan
● strongswan.service - strongSwan IPsec IKEv1/IKEv2 daemon using ipsec.conf
   Loaded: loaded (/lib/systemd/system/strongswan.service; enabled)
   Active: active (running) since Thu 2015-03-26 00:50:42 UTC; 6min ago
   CGroup: /system.slice/ipsec.service
           ├─5150 /usr/lib/ipsec/starter --daemon charon
           └─5151 /usr/lib/ipsec/charon --use-syslog

[note how starter has been called without --nofork and there is a CGroup called ipsec.service, despite the unit being called strongswan.service]

# systemctl restart strongswan
# systemctl status strongswan
● strongswan.service - strongSwan IPsec IKEv1/IKEv2 daemon using ipsec.conf
   Loaded: loaded (/lib/systemd/system/strongswan.service; enabled)
   Active: inactive (dead) since Thu 2015-03-26 01:00:59 UTC; 2s ago
  Process: 5783 ExecStart=/usr/sbin/ipsec start --nofork (code=exited, status=0/SUCCESS)
 Main PID: 5783 (code=exited, status=0/SUCCESS)

Mar 26 01:00:59 curium systemd[1]: Started strongSwan IPsec IKEv1/IKEv2 daemon using ipsec.conf.
Mar 26 01:00:59 curium ipsec_starter[5783]: Starting strongSwan 5.2.1 IPsec [starter]...
Mar 26 01:00:59 curium ipsec_starter[5783]: charon is already running (/var/run/charon.pid exists) -- skipping daemon start
Mar 26 01:00:59 curium ipsec[5783]: Starting strongSwan 5.2.1 IPsec [starter]...
Mar 26 01:00:59 curium ipsec[5783]: charon is already running (/var/run/charon.pid exists) -- skipping daemon start
Mar 26 01:00:59 curium ipsec[5783]: starter is already running (/var/run/starter.charon.pid exists) -- no fork done

[note the inactive/dead after a restart!]

# ps aux |grep ipsec
root 5150 0.0 0.0 17144 968 ? Ss 00:50 0:00 /usr/lib/ipsec/starter --daemon charon
root 5151 0.0 0.0 1275680 5416 ? Ssl 00:50 0:00 /usr/lib/ipsec/charon --use-syslog

Those are lingering/orphan processes, unmanaged by systemd.

This won't happen every time -- it's a race but reproducible, I've managed to recreate it 5 times here already on two different servers. 19 times out of 20, no process will stay behind; ipsec won't be running at all, which is also a bug. The remaining 1 time, though, the service stays out of systemd's control and remains unmanageable; systemd thinks it's dead but it really is running. This a) is confusing to the sysadmin, b) means that reloads will fail, c) means that a package removal won't actually stop the daemons, and d) means that tools such as puppet will try to restart it again and again but fail to do so.

More importantly, though, it triggers a secondary bug in systemd itself. Continuing right from the execution path above:

# ipsec stop
Stopping strongSwan IPsec...
# grep systemd /var/log/syslog | tail -3
Mar 26 01:02:15 curium systemd[1]: Assertion 'path' failed at ../src/shared/cgroup-util.c:913, function cg_is_empty_recursive(). Aborting.
Mar 26 01:02:15 curium systemd[1]: Caught ABRT, dumped core as pid 6916.
Mar 26 01:02:15 curium systemd[1]: Freezing execution.
# systemctl status
^C

At that point, the system barely works; systemctl etc. are not responding. I'll be filing the latter separately against systemd. However, strongswan's postinst is buggy nevertheless and creates a situation uncommon enough to trigger this cascaded failure.

Regards,
Faidon
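The telltale symptom in this report is a unit whose CGroup line names a different unit than the one being queried. As a small illustration, the check can be scripted against captured status text. This is only a sketch: cgroup_unit is a hypothetical helper, the status excerpt is copied from the report (header line omitted), and the parsing assumes the CGroup line layout shown there.

```shell
#!/bin/sh
set -e

# Excerpt of the `systemctl status strongswan` output from the report.
status_output='   Loaded: loaded (/lib/systemd/system/strongswan.service; enabled)
   Active: active (running) since Thu 2015-03-26 00:50:42 UTC; 6min ago
   CGroup: /system.slice/ipsec.service'

# Hypothetical helper: print the last path component of the CGroup line.
cgroup_unit() {
    printf '%s\n' "$1" | sed -n 's|^ *CGroup: .*/\([^/]*\)$|\1|p'
}

unit="strongswan.service"
actual="$(cgroup_unit "$status_output")"

# Report when the control group does not carry the unit's own name.
if [ "$actual" != "$unit" ]; then
    echo "mismatch: $unit is tracked in cgroup $actual"
fi
```

A mismatch like this only flags the state for a closer look; by itself it does not prove that the daemon has escaped systemd's control.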