Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi Felipe,

Felipe Sateler wrote:
> > > Upstream asks if cgroup is in v2-mode in the affected systems.
> With `findmnt -R /sys/fs/cgroup`. It should list controllers in the cgroup
> or cgroup2 filesystems.

root@lorenz:~# findmnt -R /sys/fs/cgroup
TARGET                   SOURCE  FSTYPE  OPTIONS
/sys/fs/cgroup           tmpfs   tmpfs   rw,nosuid,nodev,noexec,mode=755
├─/sys/fs/cgroup/unified cgroup2 cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate
└─/sys/fs/cgroup/elogind cgroup  cgroup  rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/elogind/elogind-cgroups-agent,name=elogind

On Mon, Feb 4, 2019 at 16:11, Gedalya wrote:
> > This currently looks like this now (after I have uninstalled
> > cgroupfs-mount):
> >
> > → findmnt -R /sys/fs/cgroup
> > TARGET                   SOURCE  FSTYPE  OPTIONS
> > /sys/fs/cgroup           tmpfs   tmpfs   rw,nosuid,nodev,noexec,mode=755
> > ├─/sys/fs/cgroup/unified cgroup2 cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate
> > └─/sys/fs/cgroup/elogind cgroup  cgroup  rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/elogind
> >
> > But I have to assume that these mount points were at least present
> > before I uninstalled cgroupfs-mount, too.
> >
> > Regards, Axel
>
> I can say I've always had this issue with only cgroup2 mounted and no
> cgroup (what you might call "v1")
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
> This currently looks like this now (after I have uninstalled
> cgroupfs-mount):
>
> → findmnt -R /sys/fs/cgroup
> TARGET                   SOURCE  FSTYPE  OPTIONS
> /sys/fs/cgroup           tmpfs   tmpfs   rw,nosuid,nodev,noexec,mode=755
> ├─/sys/fs/cgroup/unified cgroup2 cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate
> └─/sys/fs/cgroup/elogind cgroup  cgroup  rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/elogind
>
> But I have to assume that these mount points were at least present
> before I uninstalled cgroupfs-mount, too.
>
> Regards, Axel

I can say I've always had this issue with only cgroup2 mounted and no
cgroup (what you might call "v1").
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi Felipe,

Felipe Sateler wrote:
> > > Upstream asks if cgroup is in v2-mode in the affected systems.
> >
> > How do I recognize this? I have no idea of how to check that.
>
> With `findmnt -R /sys/fs/cgroup`. It should list controllers in the cgroup
> or cgroup2 filesystems.

Thanks! This currently looks like this now (after I have uninstalled
cgroupfs-mount):

→ findmnt -R /sys/fs/cgroup
TARGET                   SOURCE  FSTYPE  OPTIONS
/sys/fs/cgroup           tmpfs   tmpfs   rw,nosuid,nodev,noexec,mode=755
├─/sys/fs/cgroup/unified cgroup2 cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate
└─/sys/fs/cgroup/elogind cgroup  cgroup  rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/elogind

But I have to assume that these mount points were at least present
before I uninstalled cgroupfs-mount, too.

Regards, Axel

--
 ,''`.  | Axel Beckert, https://people.debian.org/~abe/
 : :' : | Debian Developer, ftp.ch.debian.org Admin
 `. `'  | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5
   `-   | 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
On Mon, Feb 4, 2019 at 11:34 AM Axel Beckert wrote:
> Hi Felipe,
>
> Felipe Sateler wrote:
> > Upstream asks if cgroup is in v2-mode in the affected systems.
>
> How do I recognize this? I have no idea of how to check that.

With `findmnt -R /sys/fs/cgroup`. It should list controllers in the cgroup
or cgroup2 filesystems.

--
Regards,
Felipe Sateler
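A complementary check (a sketch of my own, not from this thread): the cgroup mode can also be read off the filesystem type mounted at /sys/fs/cgroup, which is what findmnt reports in its FSTYPE column.

```shell
# Distinguish unified (v2) from hybrid/legacy (v1) cgroup setups by the
# filesystem type at /sys/fs/cgroup: cgroup2fs on pure-v2 systems,
# tmpfs on the hybrid layout shown in the findmnt output above.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo none)
case "$fstype" in
    cgroup2fs) mode="unified (pure cgroup v2)" ;;
    tmpfs)     mode="hybrid or legacy (v1 hierarchies mounted below)" ;;
    *)         mode="no cgroup filesystem mounted at /sys/fs/cgroup" ;;
esac
echo "$mode"
```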
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi Felipe,

Felipe Sateler wrote:
> Upstream asks if cgroup is in v2-mode in the affected systems.

How do I recognize this? I have no idea of how to check that.

Regards, Axel

--
 ,''`.  | Axel Beckert, https://people.debian.org/~abe/
 : :' : | Debian Developer, ftp.ch.debian.org Admin
 `. `'  | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5
   `-   | 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
On Sun, Feb 3, 2019 at 6:41 PM Felipe Sateler wrote:
> Control: forwarded -1 https://github.com/systemd/systemd/issues/11645
>
> I have forwarded the bug upstream, and proposed two solutions. If upstream
> likes one, we can apply that in the Debian package.

Upstream asks if cgroup is in v2-mode in the affected systems. This
might cause the detection logic to get tripped up. If you can report
back whether that is the case, preferably directly on the upstream
issue, it would be great.

--
Regards,
Felipe Sateler
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Control: forwarded -1 https://github.com/systemd/systemd/issues/11645

I have forwarded the bug upstream and proposed two solutions. If
upstream likes one, we can apply that in the Debian package.

Regards
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi Nicolas,

Nicolas Cavallari wrote:
> udev is started in rcS.d, way before elogind, which is in rc2.d.
>
> The result is that, at boot, udev clearly logs that it does
> not detect cgroups, so it will not go on its SIGKILL spree.
>
> But after elogind is started, the cgroups are created.
> udev really needs to be restarted after that point to
> have it detect the cgroups and trigger the bug.

Indeed, thanks! That explains why it didn't happen directly after
reboot, but only after udev had been restarted at least once.

Maybe uninstalling cgroupfs-mount removes something that should have
been kept intact when elogind is also installed/running, and having
either cgroupfs-mount or elogind installed triggers the issue.

Regards, Axel

--
 ,''`.  | Axel Beckert, https://people.debian.org/~abe/
 : :' : | Debian Developer, ftp.ch.debian.org Admin
 `. `'  | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5
   `-   | 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
On 03/02/2019 13:32, Axel Beckert wrote:
> Hi Nicolas!
>
> Nicolas Cavallari wrote:
> > I do not have cgroupfs-mount installed, but I have elogind.
>
> Interesting. I have elogind installed, too, and I also have that
> mountpoint at /sys/fs/cgroup/elogind, but nevertheless, uninstalling
> cgroupfs-mount sufficed for me. IIRC I didn't do a reboot since then,
> though.

I forgot another important piece of information (because I lost the
original response while trying to reproduce the bug):

udev is started in rcS.d, way before elogind, which is in rc2.d.

The result is that, at boot, udev clearly logs that it does not detect
cgroups, so it will not go on its SIGKILL spree.

But after elogind is started, the cgroups are created. udev really
needs to be restarted after that point to have it detect the cgroups
and trigger the bug.

That, and udev must detect that its parent PID is 1, which can happen
quickly when launched by start-stop-daemon --background.
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi Nicolas!

Nicolas Cavallari wrote:
> I do not have cgroupfs-mount installed, but I have elogind.

Interesting. I have elogind installed, too, and I also have that
mountpoint at /sys/fs/cgroup/elogind, but nevertheless, uninstalling
cgroupfs-mount sufficed for me. IIRC I didn't do a reboot since then,
though.

> A typical /proc/self/cgroup is:
>
> 1:name=elogind:/
> 0::/

Mine looks very similar, yet not identical:

1:name=elogind:/581
0::/

No idea what the 581 refers to. It's not the process ID of
elogind-daemon.

> So it is my understanding from the source code that manager->cgroup
> should contain '/'.
>
> A debugging session breaking on on_post() very unhelpfully indicates
> that 'manager', 'manager->cgroup', 'userdata' and other helpful
> variables have been optimized out...

Yay! (And thanks for testing!)

> (I only use elogind to satisfy the overly broad dependencies of
> libpolkit-qt5-1-1, but that is another bug, #794537).

Overly broad dependencies on libpam-logind and the like seem to be
rather common these days in Debian. :-(

Regards, Axel

--
 ,''`.  | Axel Beckert, https://people.debian.org/~abe/
 : :' : | Debian Developer, ftp.ch.debian.org Admin
 `. `'  | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5
   `-   | 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Package: udev
Version: 240-5
Followup-For: Bug #918764

I do not have cgroupfs-mount installed, but I have elogind. It
apparently mounts /sys/fs/cgroup/unified, so this is enough for udev to
think it is running under systemd.

A typical /proc/self/cgroup is:

1:name=elogind:/
0::/

So it is my understanding from the source code that manager->cgroup
should contain '/'.

A debugging session breaking on on_post() very unhelpfully indicates
that 'manager', 'manager->cgroup', 'userdata' and other helpful
variables have been optimized out...

(I only use elogind to satisfy the overly broad dependencies of
libpolkit-qt5-1-1, but that is another bug, #794537).

-- Package-specific info:

-- System Information:
Debian Release: buster/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'oldoldstable'), (500, 'unstable'), (500, 'oldstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.19.0-2-amd64 (SMP w/2 CPU cores)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to fr_FR.UTF-8), LANGUAGE=fr_FR.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to fr_FR.UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)
LSM: AppArmor: enabled

Versions of packages udev depends on:
ii  adduser      3.118
ii  libacl1      2.2.52-3+b1
ii  libblkid1    2.33.1-0.1
ii  libc6        2.28-5
ii  libkmod2     25-2
ii  libselinux1  2.8-1+b1
ii  libudev1     240-5
ii  lsb-base     10.2018112800
ii  util-linux   2.33.1-0.1

udev recommends no packages.

udev suggests no packages.

Versions of packages udev is related to:
pn  systemd

-- Configuration Files:
/etc/init.d/udev changed [not included]
/etc/udev/udev.conf changed [not included]

-- debconf information excluded
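For reference, the /proc/self/cgroup lines quoted above follow the format documented in cgroups(7), hierarchy-ID:controller-list:path, where an empty controller list marks the v2 unified hierarchy. A small sketch (my own, for illustration) to print a process's memberships readably:

```shell
# Print each cgroup membership of the current process.
# cgroups(7): each line is hierarchy-ID:controller-list:path;
# "0::/..." is the cgroup v2 (unified) hierarchy, while named v1
# hierarchies look like "1:name=elogind:/...".
count=0
while IFS=: read -r id controllers path; do
    printf 'hierarchy %s [%s] -> %s\n' "$id" "${controllers:-v2-unified}" "$path"
    count=$((count + 1))
done < /proc/self/cgroup
```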
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi Felipe,

thanks for looking into this. I've made a very quick test of the patch
you provided this morning, patching the systemd 240-5 source and
rebuilding the whole thing. It works for me.

Lorenz
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
On Wed, Jan 30, 2019 at 7:47 PM Axel Beckert wrote:
> Hi Felipe,
>
> a short reply with that information I can gather without much effort:
>
> Felipe Sateler wrote:
> > > But we also had reports where this happened with systemd, so this
> > > doesn't seem to depend on the init system (at most on the init
> > > system's default features) and hence also the package
> > > cgroupfs-mount can't be held guilty for this.
> >
> > Can you point me at one? (sorry, I'm late to this bug and currently
> > ENOTIME to read the entire backlog). It seems this should not happen
> > on systemd systems, because systemd properly isolates udev into its
> > own cgroup when starting.
>
> See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=918764#122

Ah, thanks. That example is running with systemd as PID 1, but not
running udev as a systemd-managed daemon. This is good, because it
means the diagnosis has not been refuted.

--
Regards,
Felipe Sateler
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi Felipe,

a short reply with that information I can gather without much effort:

Felipe Sateler wrote:
> > But we also had reports where this happened with systemd, so this
> > doesn't seem to depend on the init system (at most on the init
> > system's default features) and hence also the package cgroupfs-mount
> > can't be held guilty for this.
>
> Can you point me at one? (sorry, I'm late to this bug and currently
> ENOTIME to read the entire backlog). It seems this should not happen on
> systemd systems, because systemd properly isolates udev into its own
> cgroup when starting.

See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=918764#122

Regards, Axel

--
 ,''`.  | Axel Beckert, https://people.debian.org/~abe/
 : :' : | Debian Developer, ftp.ch.debian.org Admin
 `. `'  | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5
   `-   | 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
On Tue, Jan 29, 2019 at 9:39 PM Axel Beckert wrote:
> Then I uninstalled not all of them at once but each of them one by
> one. And the one after whose purging
>
>   # service udev restart
>   # udevadm control --reload-rules
>
> didn't kill my processes anymore was cgroupfs-mount.
>
> So for some reason this killing only seems to happen on my box if
> cgroupfs-mount is installed. Then again, this is only necessary if
> systemd is not installed.

Thanks everyone for the detailed debugging. This appears to be the
culprit: udev tries to detect if it is running under systemd, and if so
will kill its entire cgroup (to clean up leftover processes). Looks
like cgroupfs-mount is fooling udev into thinking it is running under
systemd.

Could someone attach gdb to udev, break on the function `on_post`,
trigger the bug, and report what `manager->cgroup` contains?

> But we also had reports where this happened with systemd, so this
> doesn't seem to depend on the init system (at most on the init
> system's default features) and hence also the package cgroupfs-mount
> can't be held guilty for this.

Can you point me at one? (sorry, I'm late to this bug and currently
ENOTIME to read the entire backlog). It seems this should not happen on
systemd systems, because systemd properly isolates udev into its own
cgroup when starting.

> Which IMHO again leaves either src:systemd or src:linux as rc-buggy
> package.

I think something like this might be sufficient to work around the bug
on sysvinit systems:

% git diff
diff --git a/src/udev/udevd.c b/src/udev/udevd.c
index fb8724ea87..a03b65a773 100644
--- a/src/udev/udevd.c
+++ b/src/udev/udevd.c
@@ -1814,7 +1814,7 @@ static int run(int argc, char *argv[]) {

         dev_setup(NULL, UID_INVALID, GID_INVALID);

-        if (getppid() == 1) {
+        if (getppid() == 1 && sd_booted()) {
                 /* get our own cgroup, we regularly kill everything udev has left behind
                    we only do this on systemd systems, and only if we are directly spawned
                    by PID1. otherwise we are not guaranteed to have a dedicated cgroup */

--
Regards,
Felipe Sateler
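As an aside, sd_booted(3) is documented to check simply whether /run/systemd/system/ exists as a directory, so the same guard the patch adds can be expressed in shell (a sketch for illustration, not part of the patch):

```shell
# The check sd_booted(3) performs, expressed in shell: systemd as PID 1
# creates the /run/systemd/system directory early during boot.
if [ -d /run/systemd/system ]; then
    booted=yes
else
    booted=no
fi
echo "booted with systemd: $booted"
```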
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi,

On Wed, Jan 30, 2019 at 11:26, Axel Beckert wrote:
> His "Actually I'm wrong on this" mail
> (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=918764#127) was
> (and actually still is) confusing me.

I'm wrong on the claim that, for example, I can crash a VT session by
executing a 'udevadm' command from the graphic session, or vice versa.
To be more clear, when you read message #122, just ignore these lines:

> ..Final Bonus Weirdness:
> if you start udevd in background in the VT session, then go to the
> graphic session and prompt a udevadm command from there, it's the VT
> session that gets crashed.

The rest of the message, i.e. how to trigger the bug and the commit
that introduced this bug in Debian's systemd, still holds true to me.

> But some of the details from his first mail which were not cited in
> his "Actually I'm wrong on this" mail (mainly "This was introduced in
> commit e803efca") tell me that this _is_ actually a bug in the udev
> package.

That commit triggered the bug in Debian, but the bug itself was already
in the code since at least systemd v232-15. According to my
experiments, the culprit is the following:

> when udevd is run in background and it's not detached with its own
> '--daemon' option, then a udevadm command is enough to kill everything.

It does not make any sense to me, but that's it.

Regards, Lorenz
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Control: severity -1 critical
Control: found -1 239-8

Hi,

Gedalya wrote:
> > > Moving udev into its own special cgroup didn't change anything: udev
> > > is still running, same PID, and the same goes for ntpd.
> > > Everything else is killed.
> >
> > And here you gave me the right hint: cgroups!
>
> OK, Lorenz discovered this first :-),

Just re-read his mails, especially
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=918764#122.

His "Actually I'm wrong on this" mail
(https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=918764#127) was (and
actually still is) confusing me.

But some of the details from his first mail which were not cited in his
"Actually I'm wrong on this" mail (mainly "This was introduced in
commit e803efca") tell me that this _is_ actually a bug in the udev
package.

Hence back to RC severity: several users are affected and reverting a
commit in the udev package is said to fix the issue.

Michael: Any progress on being able to reproduce the issue?

Regards, Axel

--
 ,''`.  | Axel Beckert, https://people.debian.org/~abe/
 : :' : | Debian Developer, ftp.ch.debian.org Admin
 `. `'  | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5
   `-   | 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
On Wed, 30 Jan 2019 01:37:14 +0100 Axel Beckert wrote:
> > Moving udev into its own special cgroup didn't change anything: udev
> > is still running, same PID, and the same goes for ntpd.
> > Everything else is killed.
>
> And here you gave me the right hint: cgroups!

OK, Lorenz discovered this first :-). cgroups are used by systemd to
implement the concepts of user sessions and slices.
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Control: tag -1 - moreinfo

Hi!

Gedalya wrote:
> > It just happened again, triggered by wireshark-dkms
>
> s/wireshark/wireguard/

Ehm, yes. :-)

> Also, on my router, ntpd is moved to another cgroup (for routing
> purposes). This is done in cgroup2, old cgroup is not mounted at
> all. ntpd remains the only running process apart from init.
>
> Moving udev into its own special cgroup didn't change anything: udev
> is still running, same PID, and the same goes for ntpd.
> Everything else is killed.

And here you gave me the right hint: cgroups!

I don't have systemd, but I wanted to play with anbox, which pulled in
lxc and cgroupfs-mount besides others. And while I was able to kill all
processes with "udevadm control --reload-rules" just beforehand,
removing these packages made the issue vanish for me:

[REMOVE, NOT USED] bridge-utils:amd64 1.6-2
[REMOVE, NOT USED] dnsmasq-base:amd64 2.80-1
[REMOVE, NOT USED] liblinux-lvm-perl:amd64 0.17-2
[REMOVE, NOT USED] redir:amd64 3.2-1
[REMOVE, DEPENDENCIES] anbox:amd64 0.0~git20181210-1
[REMOVE, DEPENDENCIES] liblxc1:amd64 1:3.1.0+really3.0.3-2
[REMOVE, DEPENDENCIES] lxc:amd64 1:3.1.0+really3.0.3-2
[REMOVE, DEPENDENCIES] lxctl:amd64 0.3.1+debian-4
[REMOVE, DEPENDENCIES] python3-lxc:amd64 1:3.0.3-1
[REMOVE, DEPENDENCIES] vagrant-lxc:amd64 1.4.3-1
[REMOVE] cgroupfs-mount:amd64 1.4

(from /var/log/aptitude)

So I tried to figure out which of these packages actually trigger the
change and installed them again one by one. In between each package
installation run I did the following three commands to see if anything
killed my SSH session:

# udevadm control --reload-rules
# service udev restart
# udevadm control --reload-rules

But none did directly. The issue came back later, though, and I
rebooted.

Then I uninstalled them not all at once but each of them one by one.
And the one after whose purging

# service udev restart
# udevadm control --reload-rules

didn't kill my processes anymore was cgroupfs-mount.

So for some reason this killing only seems to happen on my box if
cgroupfs-mount is installed. Then again, this package is only necessary
if systemd is not installed. But we also had reports where this
happened with systemd, so this doesn't seem to depend on the init
system (at most on the init system's default features) and hence also
the package cgroupfs-mount can't be held guilty for this.

Which IMHO again leaves either src:systemd or src:linux as the RC-buggy
package.

I've allowed myself to remove at least the moreinfo tag as there are
now multiple hints on how to reproduce this issue.

Will now run my system without cgroupfs-mount and see if I run into
this issue again soon.

Regards, Axel

--
 ,''`.  | Axel Beckert, https://people.debian.org/~abe/
 : :' : | Debian Developer, ftp.ch.debian.org Admin
 `. `'  | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5
   `-   | 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Control: found -1 240-5

Hi again,

Axel Beckert wrote:
> Another case which predates my original bug report by a few weeks
> (week before christmas or maybe even mid-december), but which I now
> noticed had the exact same symptoms:
>
> I have three screens connected to my workstation:
>
> 1x DisplayPort
> 1x HDMI
> 1x DVI-D
>
> All three screens are powered via the same switchable power strip.
> Everything was fine when I powered all three screens off by turning
> off the power strip.
>
> But when I powered them on again by turning on the power strip again,
> the machine showed the same symptoms. I remember that because I was
> confused that the machine just "crashed" the moment I wanted to use
> it again after a long weekend where I was only logged in remotely, if
> at all.

This is now confirmed to be the same issue. If I power on my three
screens via a mechanical power switch in the power strip, it kills all
my processes. (Maybe only one of them is the cause; please ping me if
this might be relevant, otherwise I won't test more details.)

> I also plan to check if my "does not happen in the first 20 minutes
> after reboot" is actually the "does not happen until udev is restarted
> at least once after reboot" mentioned by some other victim of this
> bug.

Also verified. "udevadm control --reload-rules" doesn't kill processes
directly after reboot (tested with 4.20-1~exp1), but as soon as I've
called "service udev restart", the next "udevadm control --reload-rules"
kills all processes again.

> And if that's the case and powering on a monitor triggers it, too, it
> looks to me as if this is indeed a bug in udev.

Looks like it, yes, but in the meanwhile I was also able to stop this
issue by uninstalling some other packages. I still haven't figured out
which package exactly is relevant, though. Will send another mail with
details once I've figured them out.

But I already want to say thanks to Gedalya for giving the right hints
on that! :-)

Regards, Axel

--
 ,''`.  | Axel Beckert, https://people.debian.org/~abe/
 : :' : | Debian Developer, ftp.ch.debian.org Admin
 `. `'  | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5
   `-   | 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
On Sun, 27 Jan 2019 03:20:46 +0100 Lorenz wrote:
> Non-systemd users can work around this by using the init script from
> systemd 239-7:
>
> https://salsa.debian.org/systemd-team/systemd/blob/debian/239-7/debian/udev.init
>
> or by editing the current init script, replacing the --background
> option with ' -- --daemonize'. (Some other adjustments are needed to
> the script.)
>
> Be aware that in both cases this will reintroduce bug #791944

Yes, brilliant! Thank you so much.
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
> It just happened again, triggered by wireshark-dkms

s/wireshark/wireguard/

Coincidentally, I'm also using wireguard on my router. But I haven't
been able to reproduce this simply by installing wireguard and setting
up a few interfaces. Wait for the end.

Also, on my router, ntpd is moved to another cgroup (for routing
purposes). This is done in cgroup2; the old cgroup is not mounted at
all. ntpd remains the only running process apart from init.

Moving udev into its own special cgroup didn't change anything: udev is
still running, same PID, and the same goes for ntpd. Everything else is
killed.

If I create a new cgroup, restart udev, and then move all processes
into the new cgroup (except for init, udev, and ntp), nothing happens
when I run the trigger. If I move the processes into the new cgroup
before I restart udev, udev still kills everything.

Now, I tried commenting out all my cgroup2-related stuff and not
mounting it at all, and peace came upon the earth and there was no more
death. So now I know why it's only on my router.

One more small observation: when udev does kill all processes, looking
at the serial console I see getty being killed and respawned again
after a few seconds. After the second time, I can actually log in.

I wonder, humbly, where does this idea of killing all processes even
come from? Is there a good reason for this code path to exist in udev,
let alone what triggers it?

Thank you all!
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi,

two more situations where this happens for me:

Axel Beckert wrote:
> I have no idea why this is happening, but several packages use "udevadm
> control --reload-rules" in their postinst (e.g. fuse) and if that's
> run, all processes except init are instantly killed […]

It just happened again, triggered by wireshark-dkms (aptitude upgrade
ran via ssh):

--snip--
Version: 0.0.20190123
Kernel: 4.18.0-3-amd64 (x86_64)
 - Status: Before uninstall, this module version was ACTIVE on this kernel.

wireguard.ko:
 - Uninstallation
   - Deleting from: /lib/modules/4.18.0-3-amd64/updates/dkms/
rmdir: failed to remove 'updates/dkms': Directory not empty
 - Original module
   - No original module was found for this module on this kernel.
   - Use the dkms install command to reinstall any previous module version.

depmod...

DKMS: uninstall completed.

------------------------------
Deleting module version: 0.0.20190123
completely from the DKMS tree.
------------------------------
Done.
Loading new wireguard-0.0.20190123 DKMS files...
Building for 4.18.0-3-amd64 4.20.0-trunk-amd64
Building initial module for 4.18.0-3-amd64
Done.

wireguard.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.18.0-3-amd64/updates/dkms/

depmod...

DKMS: install completed.
Building initial module for 4.20.0-trunk-amd64
packet_write_wait: Connection to UNKNOWN port 65535: Broken pipe
--snap--

So for some reason it was only triggered when building the module for
4.20.0-trunk-amd64, but not for 4.18.0-3-amd64 (which was running at
that time).

Another case which predates my original bug report by a few weeks (week
before christmas or maybe even mid-december), but which I now noticed
had the exact same symptoms:

I have three screens connected to my workstation:

1x DisplayPort
1x HDMI
1x DVI-D

All three screens are powered via the same switchable power strip.
Everything was fine when I powered all three screens off by turning off
the power strip.

But when I powered them on again by turning on the power strip again,
the machine showed the same symptoms. I remember that because I was
confused that the machine just "crashed" the moment I wanted to use it
again after a long weekend where I was only logged in remotely, if at
all.

Will verify this suspected relation as soon as I'm back home again. It
now doesn't matter anymore, as there is no X session anymore that I'd
have to rebuild upon relogin; I'll have to do that anyway.

I also intend to power on the three screens one by one to see which
type of screen triggers it. (My guess would be DisplayPort.)

I also plan to check if my "does not happen in the first 20 minutes
after reboot" is actually the "does not happen until udev is restarted
at least once after reboot" mentioned by some other victim of this bug.

And if that's the case and powering on a monitor triggers it, too, it
looks to me as if this is indeed a bug in udev. The only other
possibility I see with these symptoms would be a bug in the kernel's
device handling since at least 4.18.x.

And I think this issue should go back to RC severity, now that at least
3 or 4 persons are affected, independent of it being a bug in udev or
somewhere else.

Regards, Axel

--
 ,''`.  | Axel Beckert, https://people.debian.org/~abe/
 : :' : | Debian Developer, ftp.ch.debian.org Admin
 `. `'  | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5
   `-   | 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
> > when udevd is run in background and it's not detached with its own
> > '--daemonize' option,

Sorry, udev's option is '--daemon', not --daemonize.

> ..Final Bonus Weirdness:
> if you start udevd in background in the VT session, then go to the
> graphic session and prompt a udevadm command from there, it's the VT
> session that gets crashed.

Actually I'm wrong on this; it looks like switching from VT to graphic
session (and vice versa) with systemd triggers the restart of udev, and
this is enough to crash the session. No need to prompt any 'udevadm'
command in this case.

Regards, Lorenz
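To summarize the two init-script variants discussed here (command lines assembled from the messages in this thread; a sketch, not checked against the actual udev init script):

```shell
# Buggy: start-stop-daemon forks udevd into the background itself, so
# udevd is reparented to PID 1 early and its getppid() == 1 check
# enables the cgroup-killing code path.
buggy="start-stop-daemon --start --exec /lib/systemd/systemd-udevd --background"
# Workaround: let udevd detach itself via its own --daemon option, so
# the getppid() check still sees the init script as the parent.
fixed="start-stop-daemon --start --exec /lib/systemd/systemd-udevd -- --daemon"
printf 'before: %s\nafter:  %s\n' "$buggy" "$fixed"
```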
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi,

I managed to reproduce this with systemd, in a VirtualBox VM. Here are
the steps, after booting with systemd as init:

* stop udev

# systemctl stop systemd-udevd

make sure there is no 'systemd-udevd' process listed in pstree

* start udev in background like this

# setsid --fork /lib/systemd/systemd-udevd

OR like this

# start-stop-daemon --start --name systemd-udevd --exec /lib/systemd/systemd-udevd --background

in any case make sure udevd is running

* try one of the following (wait a few seconds after the prompt)

# udevadm trigger --type=subsystems --action=add
# udevadm trigger --type=devices --action=add
# udevadm settle

As a result all processes belonging to your session are killed and
restarted; if you are doing this in a VT you get kicked out and the
screen is cleared; if you are doing this in a graphical session you get
the login screen of your display manager (I use slim + lxqt).

This is a bit different from what happens when init is not systemd, in
that I believe systemd-logind is constraining the killing within the
session (or the slice), but ...

Final Bonus Weirdness: if you start udevd in background in the VT
session, then go to the graphic session and prompt a udevadm command
from there, it's the VT session that gets crashed.

This was introduced in commit e803efca
https://salsa.debian.org/systemd-team/systemd/commit/e803efca59978aa5bb1d8806247f986d0c0f7e67

when udevd is run in background and it's not detached with its own
'--daemonize' option, then a udevadm command is enough to kill
everything. The commit uses 'start-stop-daemon' with the '--background'
option, so it triggered a bug that was already in the code since
who-knows-when: I can reproduce this also with another VM (stretch)
that has systemd 232-25.

Non-systemd users can work around this by using the init script from
systemd 239-7

https://salsa.debian.org/systemd-team/systemd/blob/debian/239-7/debian/udev.init

or by editing the current init script, replacing the --background
option with ' -- --daemonize'. (Some other adjustments are needed to
the script.) Be aware that in both cases this will reintroduce bug
#791944.

Dear systemd maintainers, can you still not reproduce this? Can you
please say something?

Lorenz

On Thu, Jan 24, 2019 at 08:36, Gedalya wrote:
> Hi,
>
> With the help of snapshot.d.o, I've found that the problem first
> appeared in 239-8.
>
> I've also been able to trigger it by restarting udev and running
> 'udevadm control --log-priority=debug'.
>
> Still no insight on what is the factor causing this to happen on some
> machines and not on others.
>
> Regards,
>
> Gedalya
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi, With the help of snapshot.d.o, I've found that the problem first appeared in 239-8. I've also been able to trigger it by restarting udev and running 'udevadm control --log-priority=debug'. Still no insight on what is the factor causing this to happen on some machines and not on others. Regards, Gedalya
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi,

I have been experiencing this issue on my local router. First it was a virtual machine running under KVM, with two PPPoE connections, several VLANs, no RAID, no (internal) LVM. After a while I migrated the functionality to the appropriate bare metal. I reinstalled and reconfigured from scratch as opposed to cloning the filesystem. Still no RAID, no LVM. I'm using sysvinit-core.

What triggers it for me is usually one of the ISPs terminating the ppp connection. All processes get killed, and init re-launches gettys.

I have other machines using sysvinit-core but I'm unable to reproduce this issue on those. It just happened on the router VM and continues to happen on the physical router.

In my case it seems to happen only after udev is restarted (as would sometimes happen during a dist-upgrade). I can freshly boot the machine, restart udev, and then either bring down or up any network interface, or indeed run 'udevadm control --reload-rules', and it happens. Without first restarting udev, I haven't been able to reliably reproduce this.

It's been going on for a long time, and it is very hard to diagnose since nothing is logged, nothing is in dmesg, and I lose network access and don't normally have console access. All I can say is it has spanned several kernel versions and been going on definitely since systemd 239-?. Now I've attached a serial console cable so I'm more able to investigate.

I welcome your suggestions.

Thank you all,
Gedalya
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi Michael, Axel,

I had a system crash yesterday, during an upgrade. udev, grub and mdadm were all involved in the upgrade. The system went down during the postinst stage, leaving some packages unconfigured. Even if I can't reproduce it by running 'udevadm control --reload-rules', I think it's the same problem. Also, I had 2 similar crashes during an upgrade in October and December 2018. Looking at apt logs I found that both udev and grub were involved in those upgrades.

I can reproduce the crash with the following:

# dpkg -i udev_240-4_amd64.deb mdadm_4.1-1_amd64.deb

This works as well (in crashing my system):

# dpkg -i udev_240-4_amd64.deb grub-pc_2.02+dfsg1-10_amd64.deb

while both of the following don't lead to a crash:

# dpkg -i mdadm_4.1-1_amd64.deb grub-pc_2.02+dfsg1-10_amd64.deb
# dpkg -i udev_240-4_amd64.deb && dpkg -i grub-pc_2.02+dfsg1-10_amd64.deb

> Is this specific to version 240-2?
> Could you try downgrading udev to either 239-15 or 240-1 and report back
> with the results.

I downgraded udev to 239-7, which looks safe to me, while udev 239-8 is affected; I'm currently with 239-8.

> Anyway, I'm taking Dmitry into Cc since sysvinit-core's init is the
> only process which survives this issue and hence might be involved.

I don't have sysvinit-core installed; init is runit.

> So I wonder what part of my setup causes this:
>
> * 2x LVM on LUKS on MD RAID1 (2 spinning HDDs and 2 SSDs)
> * an (internal) USB 3.0 SD card reader which lets LVM throw warnings
>   about "no medium found" for all devices from /dev/sde to /dev/sdk or so.
> * Three screens (1x HDMI, 1x DP, 1x DVI-D)
> * Logitech USB dongle with Solaar

I have:

* runit as PID 1
* /home is on a RAID mirror
* mdadm is installed
* systemd is not installed
* kernel is 4.18.0-1-amd64

> I can't reproduce this problem.
> Neither with a 4.18 (4.18.20-2), 4.19 (4.19.12-1) or 4.20 (4.20-1~exp1)
> kernel. Tested both with systemd as PID 1 and inside a VM with sysvinit
> as PID 1.
That's probably not enough; I suspect you also need one (or more) of the following:

* no systemd installed
* mdadm installed
* a RAID setup (although I'm not sure this one is feasible in VirtualBox)

Anything else I can do to help solve this?

Thanks,
Lorenz

On Wed, 16 Jan 2019 at 15:49, Dmitry Bogatov wrote:
>
> [ More eyes is better, so please use sysvi...@packages.debian.org
> instead personally me for sysvinit-related issues. I read list
> carefully. ]
>
> [2019-01-15 16:17] Axel Beckert
> > Anyway, I'm taking Dmitry into Cc since sysvinit-core's init is the
> > only process which survives this issue and hence might be involved.
> > (Dmitry: Please tell me if I should rather send this to the
> > mailing-list.)
> >
> > I will probably also check if an earlier sysvinit version, like e.g.
> > 2.88dsf-59.11 (as 2.88dsf-60 IIRC had some issues of its own), makes
> > the issue go away, just to be sure (like with udev 239-15).
>
> Yes, please compare with sysvinit-core=2.88dsf-59.9 (version from
> stable), but I doubt it have something to do with sysvinit, since the
> only way sysvinit interacts with udev is 'Should-Start: udev'
> dependency of some bin:initscripts.
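The suspected preconditions above can be checked quickly on a candidate machine. A read-only sketch, assuming a Debian-style system with dpkg and /proc/mdstat; elsewhere it simply reports the negative cases:

```shell
# Sketch: report the three suspected preconditions.
# All commands are read-only; 'dpkg -s' exits non-zero when a package
# is not installed (or when dpkg itself is absent).
check_preconditions() {
  dpkg -s systemd >/dev/null 2>&1 && echo "systemd: installed" || echo "systemd: not installed"
  dpkg -s mdadm   >/dev/null 2>&1 && echo "mdadm: installed"   || echo "mdadm: not installed"
  grep -q '^md' /proc/mdstat 2>/dev/null && echo "RAID: arrays present" || echo "RAID: none"
}
check_preconditions
```

A crash-prone machine per Lorenz's hypothesis would report "systemd: not installed", "mdadm: installed" and "RAID: arrays present".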
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
[ More eyes is better, so please use sysvi...@packages.debian.org instead personally me for sysvinit-related issues. I read list carefully. ] [2019-01-15 16:17] Axel Beckert > Anyway, I'm taking Dmitry into Cc since sysvinit-core's init is the > only process which survives this issue and hence might be involved. > (Dmitry: Please tell me if I should rather send this to the > mailing-list.) > > I will probably also check if an earlier sysvinit version, like e.g. > 2.88dsf-59.11 (as 2.88dsf-60 IIRC had some issues of its own), makes > the issue go away, just to be sure (like with udev 239-15). Yes, please compare with sysvinit-core=2.88dsf-59.9 (version from stable), but I doubt it have something to do with sysvinit, since the only way sysvinit interacts with udev is 'Should-Start: udev' dependency of some bin:initscripts.
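For readers unfamiliar with the dependency Dmitry mentions: 'Should-Start: udev' lives in the LSB header of an init script. An illustrative fragment only; the service name and the other fields here are made up:

```shell
### BEGIN INIT INFO
# Provides:          example-daemon
# Required-Start:    $local_fs $remote_fs
# Required-Stop:     $local_fs $remote_fs
# Should-Start:      udev
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: illustrative initscript header
### END INIT INFO
```

'Should-Start' only influences boot ordering (start after udev if udev is present); it is a much weaker coupling than a runtime dependency, which supports Dmitry's doubt that sysvinit is involved.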
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
On 15.01.19 at 14:28, Michael Biebl wrote:
> This all sounds like udevd is not the root cause of all this, tbh,
> especially since you also reproduced it with 239.
> I think "udevadm trigger" triggers something which then causes another
> component of your system to act this way.

Something which seems rather obvious: have you checked whether your system is running out of memory and the kernel OOM killer kicks in?

-- 
Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth?
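Michael's question can be answered from the kernel log. A sketch of what one would grep for; the sample log line below is made up, and in practice you would feed the pattern real dmesg output:

```shell
# Sketch: patterns the OOM killer typically leaves in the kernel log.
# On a real machine run:  dmesg | grep -E "$oom_pattern"
oom_pattern='Out of memory: Kill process|oom-kill|invoked oom-killer'
sample='Out of memory: Kill process 1234 (chromium) score 900 or sacrifice child'
printf '%s\n' "$sample" | grep -Eq "$oom_pattern" && echo "OOM killer was active"
# prints: OOM killer was active
```

An empty result from the real dmesg grep (as Axel reports below) argues against the OOM-killer theory, since OOM kills are always logged by the kernel.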
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi Michael, Michael Biebl wrote: > Something, which seems rather obvious: > Have you checked if your system is running out of memory and the kernel > OOM-killer kicks in? I'm _very_ sure this isn't the case: a) the machine has 64 GB of RAM. b) I can even reproduce it after most processes are killed and only a few hundred megs of RAM are used (including cache) according to htop: Mem[320M/62.6G] c) I would have seen this in dmesg. d) If I've seen OOM on any of my Debian Sid machines recently, it was always chromium which was (partially) killed and nothing else. (Ok, no real reason, just a common observation of the past year. :-) So sorry, no, it's unfortunately not that easy. ;-) Regards, Axel -- ,''`. | Axel Beckert , https://people.debian.org/~abe/ : :' : | Debian Developer, ftp.ch.debian.org Admin `. `' | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5 `-| 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi Michael,

Michael Biebl wrote:
> I'm downgrading this to non-RC, as I'm not convinced this is actually a
> bug in udev

Fair enough. (Actually I had already thought about this, too.) While it still fulfils the requirements for "critical" (as it crashes everything except the kernel, i.e. the whole system), it also seems to affect only a very small number of users (namely 1), as so far no other user has appeared and said "me too".

Btw, one thing which seems to speak against a hardware issue is that the kernel logs nothing relevant in dmesg and easily survives the issue.

> This all sounds like udevd is not the root cause of all this, tbh,
> especially since you also reproduced it with 239.

*nod*

> I think "udevadm trigger" triggers something which then causes another
> component of your system to act this way.

Yes, meanwhile this is my suspicion, too. The big question is: what is it?

So if udev isn't the real cause, there's one other obvious package which could be involved: sysvinit-core. Let's see if there's any obvious relation between any of the sysvinit uploads and the start of my workstation becoming unstable (sic!).

Looking at the logs from uptimed, it either started around the 25th of November 2018, the 9th of December or the 29th of December. While I can't remember the exact reasons for the reboots before Christmas, this now looks as if the machine became rather unstable more than a month ago.

[...]
    25    89 days, 01:57:45 | Linux 4.17.0-1-amd64     Mon Jul  9 21:42:02 2018
    26    48 days, 20:51:31 | Linux 4.18.0-2-amd64     Sat Oct  6 23:56:49 2018
    27     1 day , 00:02:04 | Linux 4.18.0-3-amd64     Sun Nov 25 01:34:26 2018
    28    12 days, 20:14:25 | Linux 4.18.0-3-amd64     Mon Nov 26 02:43:26 2018
    29     7 days, 19:01:13 | Linux 4.19.0-trunk-amd64 Sun Dec  9 00:04:33 2018
    30     6 days, 17:01:35 | Linux 4.18.0-3-amd64     Mon Dec 17 00:29:24 2018
    31     0 days, 02:01:22 | Linux 4.19.0-1-amd64     Sun Dec 23 18:13:43 2018
    32     6 days, 14:47:09 | Linux 4.18.0-3-amd64     Sun Dec 23 21:17:11 2018
    33     0 days, 22:25:36 | Linux 4.18.0-3-amd64     Wed Jan  9 07:35:56 2019
    34     1 day , 02:08:31 | Linux 4.19.0-1-amd64     Thu Jan 10 06:23:03 2019
    35     2 days, 07:23:41 | Linux 4.18.0-3-amd64     Sat Jan 12 04:09:30 2019
    36     0 days, 11:01:45 | Linux 4.20.0-trunk-amd64 Mon Jan 14 20:53:17 2019
->  37     0 days, 07:11:47 | Linux 4.20.0-trunk-amd64 Tue Jan 15 08:08:38 2019

Please note that the box didn't log any uptime between the 29th of December (23rd of December + 6 days) and the 9th of January (next boot). The 29th of December is when I became aware of the issue (because the SSH connection died in the midst of a dist-upgrade and from then on I just got "connection refused" although the machine still answered pings, because sshd had died as well), and the 9th of January is when I was back home and started to dig into the issue.

Now comparing with the upload times of sysvinit:

sysvinit (2.93-3) unstable; urgency=medium
 -- Dmitry Bogatov  Sat, 05 Jan 2019 11:21:53 +

sysvinit (2.93-2) unstable; urgency=medium
 -- Dmitry Bogatov  Thu, 27 Dec 2018 09:49:41 +

sysvinit (2.93-1) unstable; urgency=medium
 -- Dmitry Bogatov  Tue, 04 Dec 2018 04:23:18 +

sysvinit (2.92~beta-2) unstable; urgency=medium
 -- Dmitry Bogatov  Fri, 23 Nov 2018 16:45:40 +

sysvinit (2.92~beta-1) unstable; urgency=medium
 -- Dmitry Bogatov  Thu, 22 Nov 2018 16:13:55 +

sysvinit (2.91-1) experimental; urgency=medium
 -- Dmitry Bogatov  Thu, 15 Nov 2018 15:43:24 +

(IIRC I installed sysvinit 2.91-1 from Debian Experimental back then, too.)
At least I don't see an obvious correlation to e.g. the new upstream releases (or even uploads) of sysvinit. Then again, this issue doesn't need to exactly relate to the upload or install times of sysvinit (can get the exact upgrade times of sysvinit or udev from the logs, if interested), but only appears if a package maintainer script calls "udevadm control --reload-rules" like e.g. fuse. Anyway, I'm taking Dmitry into Cc since sysvinit-core's init is the only process which survives this issue and hence might be involved. (Dmitry: Please tell me if I should rather send this to the mailing-list.) I will probably also check if an earlier sysvinit version, like e.g. 2.88dsf-59.11 (as 2.88dsf-60 IIRC had some issues of its own), makes the issue go away, just to be sure (like with udev 239-15). > >> Can you also try to run udevd in debug mode to get a log from udevd (see > >> /etc/udev/udev.conf) and also an strace of the udevadm command. > > > > I think I alread sent the strace, but forgot the debug mode. Enabled > > that when starting to write this mail, but it's currently caught by > > rsyslogd's rate-limiting, see above. > > It's the kernel, which does the rate limiting. Add > log_buf_len=1M printk.devkmsg=on > to the kernel command line to turn off the ratelimiting and increase the > ring buffer. Thanks for that hint! Added it to
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Control: severity -1 important

I'm downgrading this to non-RC, as I'm not convinced this is actually a bug in udev.

On 15.01.19 at 09:08, Axel Beckert wrote:
> Control: found -1 239-15
>
> Hi Michael,
>
> Michael Biebl wrote:
>> On 12.01.19 at 01:02, Axel Beckert wrote:
>>> Control: reopen -1
>>> Control: found -1 240-3
>>
>>> *sigh* I'm sorry to say, but it just happened again with udev 240-3
>>> and kernel 4.20-1~exp1.
>>
>> Would be good to know, if it also happens with 239-15 or if it's caused
>> by some other update.
>
> I just downloaded udev and libudev1 239-15 from
> https://snapshot.debian.org/, installed it and immediately afterwards
> ran "udevadm control --reload-rules" and everything was gone again.
> (Was running under kernel 4.20-1~exp1.)
>
> After that I rebooted into 4.20-1~exp1 with the downgraded udev again,
> ran "udevadm control --reload-rules" directly after reboot and nothing
> (unexpected) happened.
>
> Then I started to write this mail, doing not much more than logging in
> on the text console (despite X was running) using zsh, running
> ssh-agent, ssh-add and then ssh via autossh to connect to a screen
> session on some other host to write the mail with mutt.
> After about 20 minutes of uptime I was about to send that mail and
> thought I should just try running "udevadm control --reload-rules"
> once again: and it killed all processes again -- systemd-udevd has
> been started again, too:
>
> ~ # ps auxwww | fgrep udev
> root 10696 0.0 0.0 13152 3592 ?    S  08:27 0:00 /lib/systemd/systemd-udevd
> root 11582 0.0 0.0  8144  892 tty1 R+ 08:33 0:00 grep -F --color=auto udev
> ~ # uptime
> 08:33:49 up 25 min, 2 users, load average: 0.07, 0.15, 0.34
>
> Unfortunately there is not much in the syslog (had to start rsyslog
> first again, too):
>
> Jan 15 08:37:16 c6 kernel: [ 1717.930890] printk: systemd-udevd: 159 output
> lines suppressed due to ratelimiting
>
> I didn't get rsyslog to drop the rate-limiting though, and I'm a little
> bit in a hurry at the moment.
>
> Will report back later.
>
> I must admit that I also had one crash/process killing yesterday where
> I can't say what triggered it. aptitude just finished starting up in
> TUI mode (inside screen started via ssh from remote) and I was
> starting to browse through the package list while the connection
> suddenly was lost (likely due to a killed sshd).

This all sounds like udevd is not the root cause of all this, tbh, especially since you also reproduced it with 239. I think "udevadm trigger" triggers something which then causes another component of your system to act this way.

> Some other facts gathered recently:
>
> * With udev 239-15 the bootup lag is gone even without the "sleep 5".
>
> * The "sleep 5" helped on another box (EeePC 900A with sysvinit) where
>   drivers weren't loaded anymore and had to be specified manually in
>   /etc/modules until the "sleep 5" was added.
>
> * memtest86 and memtest86+ just show an empty screen. Will try again
>   with grub's graphical mode disabled just to make sure the issue is
>   not triggered by some memory fault. Question would be then why I
>   could (within a reboot where it happened) reliably reproduce the
>   issue again and again.
> Will report any findings on this front.
>
> * As alternative I ran memtester for one night, no issues found. (Not
>   sure if it was able to test everything as the affected box has 64 GB
>   of RAM.)
>
> * If the issue happens while using X, there's no chance to switch back
>   to the text console with the getty to login again. The machine needs
>   a hard reboot via reset or power button then.
>
>> Tbh, udevd or udevadm control --reload killing all processes, sounds
>> pretty strange.
>
> Definitely.
>
>> Can you also try to run udevd in debug mode to get a log from udevd (see
>> /etc/udev/udev.conf) and also an strace of the udevadm command.
>
> I think I already sent the strace, but forgot the debug mode. Enabled
> that when starting to write this mail, but it's currently caught by
> rsyslogd's rate-limiting, see above.

It's the kernel which does the rate limiting. Add

log_buf_len=1M printk.devkmsg=on

to the kernel command line to turn off the ratelimiting and increase the ring buffer.
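On a machine booting via GRUB, parameters like these would typically be appended to GRUB_CMDLINE_LINUX in /etc/default/grub, followed by update-grub. A sketch of that edit as a previewable text transform; the file path and its current contents are assumptions:

```shell
# Hypothetical helper: append the two parameters to a GRUB_CMDLINE_LINUX
# line. Review the output, then apply the same sed to /etc/default/grub
# and run update-grub.
add_kernel_params() {
  printf '%s\n' "$1" | sed 's/^\(GRUB_CMDLINE_LINUX="[^"]*\)"/\1 log_buf_len=1M printk.devkmsg=on"/'
}
add_kernel_params 'GRUB_CMDLINE_LINUX="quiet"'
# prints: GRUB_CMDLINE_LINUX="quiet log_buf_len=1M printk.devkmsg=on"
```

The change takes effect on the next boot; for a one-off test the same parameters can also be typed onto the kernel line from the GRUB edit menu.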
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Control: found -1 239-15

Hi Michael,

Michael Biebl wrote:
> On 12.01.19 at 01:02, Axel Beckert wrote:
> > Control: reopen -1
> > Control: found -1 240-3
>
> > *sigh* I'm sorry to say, but it just happened again with udev 240-3
> > and kernel 4.20-1~exp1.
>
> Would be good to know, if it also happens with 239-15 or if it's caused
> by some other update.

I just downloaded udev and libudev1 239-15 from https://snapshot.debian.org/, installed it and immediately afterwards ran "udevadm control --reload-rules" and everything was gone again. (Was running under kernel 4.20-1~exp1.)

After that I rebooted into 4.20-1~exp1 with the downgraded udev again, ran "udevadm control --reload-rules" directly after reboot and nothing (unexpected) happened.

Then I started to write this mail, doing not much more than logging in on the text console (despite X was running) using zsh, running ssh-agent, ssh-add and then ssh via autossh to connect to a screen session on some other host to write the mail with mutt.

After about 20 minutes of uptime I was about to send that mail and thought I should just try running "udevadm control --reload-rules" once again: and it killed all processes again -- systemd-udevd has been started again, too:

~ # ps auxwww | fgrep udev
root 10696 0.0 0.0 13152 3592 ?    S  08:27 0:00 /lib/systemd/systemd-udevd
root 11582 0.0 0.0  8144  892 tty1 R+ 08:33 0:00 grep -F --color=auto udev
~ # uptime
08:33:49 up 25 min, 2 users, load average: 0.07, 0.15, 0.34

Unfortunately there is not much in the syslog (I had to start rsyslog again first, too):

Jan 15 08:37:16 c6 kernel: [ 1717.930890] printk: systemd-udevd: 159 output lines suppressed due to ratelimiting

I didn't get rsyslog to drop the rate-limiting though, and I'm a little bit in a hurry at the moment.

Will report back later.

I must admit that I also had one crash/process killing yesterday where I can't say what triggered it.
aptitude had just finished starting up in TUI mode (inside screen, started via ssh from remote) and I was starting to browse through the package list when the connection suddenly was lost (likely due to a killed sshd).

Some other facts gathered recently:

* With udev 239-15 the bootup lag is gone even without the "sleep 5".

* The "sleep 5" helped on another box (EeePC 900A with sysvinit) where drivers weren't loaded anymore and had to be specified manually in /etc/modules until the "sleep 5" was added.

* memtest86 and memtest86+ just show an empty screen. Will try again with grub's graphical mode disabled just to make sure the issue is not triggered by some memory fault. The question would then be why I could (within a reboot where it happened) reliably reproduce the issue again and again. Will report any findings on this front.

* As an alternative I ran memtester for one night, no issues found. (Not sure if it was able to test everything as the affected box has 64 GB of RAM.)

* If the issue happens while using X, there's no chance to switch back to the text console with the getty to login again. The machine needs a hard reboot via the reset or power button then.

> Tbh, udevd or udevadm control --reload killing all processes, sounds
> pretty strange.

Definitely.

> Can you also try to run udevd in debug mode to get a log from udevd (see
> /etc/udev/udev.conf) and also an strace of the udevadm command.

I think I already sent the strace, but forgot the debug mode. Enabled that when starting to write this mail, but it's currently caught by rsyslogd's rate-limiting, see above.

Regards, Axel

-- 
,''`. | Axel Beckert , https://people.debian.org/~abe/
: :' : | Debian Developer, ftp.ch.debian.org Admin
`. `' | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5
`-| 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
On Sat, 12 Jan 2019 14:54:09 +0100 Axel Beckert wrote:
> I don't have local access to the affected machine for the weekend and
> hence won't be able to test reboots before Monday again, though.
>
> I'm though also keen to know if a downgrade to udev 239 will make it
> more stable again, so I'll definitely test that.

Please make sure to also test 240-4.

Regards,
Michael
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi Michael, Michael Biebl wrote: > Please also tar up /etc/udev/rules.d and /lib/udev/rules.d and attach it > to the bug report. Attached. I don't have local access to the affected machine for the weekend and hence won't be able to test reboots before Monday again, though. I'm though also keen to know if a downgrade to udev 239 will make it more stable again, so I'll definitely test that. Regards, Axel -- ,''`. | Axel Beckert , https://people.debian.org/~abe/ : :' : | Debian Developer, ftp.ch.debian.org Admin `. `' | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5 `-| 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE debian-bug-918764-udev-rules.d.tar.xz Description: Binary data
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Please also tar up /etc/udev/rules.d and /lib/udev/rules.d and attach it to the bug report.
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
On 12.01.19 at 01:02, Axel Beckert wrote:
> Control: reopen -1
> Control: found -1 240-3

> *sigh* I'm sorry to say, but it just happened again with udev 240-3
> and kernel 4.20-1~exp1.

It would be good to know if it also happens with 239-15 or if it's caused by some other update.

Tbh, udevd or "udevadm control --reload" killing all processes sounds pretty strange.

Can you also try to run udevd in debug mode to get a log from udevd (see /etc/udev/udev.conf), and also get an strace of the udevadm command?

Thanks,
Michael
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Control: reopen -1
Control: found -1 240-3

Hi Michael,

Michael Biebl wrote:
> > I luckily can no more reproduce this with udev 240-3 and kernel
> > 4.19.13-1 or 4.20-1~exp1.
>
> Ok, thanks for testing.

*sigh* I'm sorry to say, but it just happened again with udev 240-3 and kernel 4.20-1~exp1.

> > Since I can no more reproduce this with the versions above and I'm
> > also affected by (parts of) #918590 which make every reboot taking
> > about 8 to 9 minutes due to waiting for LVM (
>
> If I would have to guess, I'd say it's another instance of
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=908796
>
> You could try to add a sleep 5 at around line 179 in /etc/init.d/udev
> right before the udevadm calls.

Did that. And since the grub update (which calls LVM commands en masse) took ages, I thought that I could maybe speed that up by restarting udev with the modified init script: "service udev restart" clearly didn't call "sleep 5" and exited successfully within a second or so. So I did a "service udev stop" and a few seconds later a "service udev start". After that, 5 seconds of nothing, then I got about a dozen lines of warnings (something about mtp devices with I/O errors, maybe related to my USB card reader without cards in it), and then I was suddenly back on my console login's getty. Afterwards only init and the gettys were running.

Not sure if the still-running grub package configuration triggered it, or part of the udev init script. Another "service udev stop" and "service udev start" though didn't trigger the issue again.

Will try "udevadm control --reload-rules" once grub has finished updating and then will reboot, see if that changes anything (startup delays or process killings), and report more details.

Regards, Axel

-- 
,''`. | Axel Beckert , https://people.debian.org/~abe/
: :' : | Debian Developer, ftp.ch.debian.org Admin
`. `' | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5
`-| 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
On Thu, 10 Jan 2019 06:55:32 +0100 Axel Beckert wrote:
> Version: 240-3
>
> Hi Michael,
>
> I luckily can no more reproduce this with udev 240-3 and kernel
> 4.19.13-1 or 4.20-1~exp1.

Ok, thanks for testing.

> Since I can no more reproduce this with the versions above and I'm
> also affected by (parts of) #918590 which make every reboot taking
> about 8 to 9 minutes due to waiting for LVM (

If I had to guess, I'd say it's another instance of https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=908796

You could try to add a "sleep 5" at around line 179 in /etc/init.d/udev, right before the udevadm calls. Would be interesting to know if that fixes the mdadm issue.
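Michael's suggestion amounts to inserting a delay before the udevadm calls in the init script. A sketch of the transformation on a sample line; the real script's contents and line number may differ, so it is safer to edit the file by hand after looking at it:

```shell
# Hypothetical illustration: emit 'sleep 5' immediately before any line
# that invokes 'udevadm trigger'. The sample input line is made up.
insert_sleep() {
  printf '%s\n' "$1" | awk '/udevadm trigger/ { print "sleep 5" } { print }'
}
insert_sleep 'udevadm trigger --action=add'
# prints:
#   sleep 5
#   udevadm trigger --action=add
```

The delay gives slow-to-settle device stacks (LVM on MD RAID, in Axel's case) a moment before the coldplug trigger fires, which is why it was suggested as a probe for bug #908796.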
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
On 09.01.19 at 07:32, Axel Beckert wrote:
> Package: udev
> Version: 240-2

Is this specific to version 240-2? Could you try downgrading udev to either 239-15 or 240-1 and report back with the results?

Thanks,
Michael
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi Michael, thanks for having looked into it. Michael Biebl wrote: > I can't reproduce this problem. Ok, good to know that it doesn't seem to affect more or less standard installations. So I wonder what part of my setup causes this: * 2x LVM on LUKS on MD RAID1 (2 spinning HDDs and 2 SSDs) * an (internal) USB 3.0 SD card reader which lets LVM throw warnings about "no medium found" for all devices from /dev/sde to /dev/sdk or so. * Three screens (1x HDMI, 1x DP, 1x DVI-D) * Logitech USB dongle with Solaar > Neither with a 4.18 (4.18.20-2), 4.19 (4.19.12-1) or 4.20 (4.20-1~exp1) > kernel. Tested both with systemd as PID 1 and inside a VM with sysvinit > as PID 1. JFTR: I have sysvinit as PID 1. Will try to reboot into 4.19.13 and 4.20 later today to see if the problem is still reproducible. (Didn't find anything obvious in the 4.19.13 changelog, but a few things which might have affected my setup.) Regards, Axel -- ,''`. | Axel Beckert , https://people.debian.org/~abe/ : :' : | Debian Developer, ftp.ch.debian.org Admin `. `' | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5 `-| 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Control: tags -1 moreinfo unreproducible

On 09.01.19 at 11:16, Axel Beckert wrote:
> Hi,
>
> Axel Beckert wrote:
>> I have no idea why this is happening, but several packages use "udevadm
>> control --reload-rules" in their postinst (e.g. fuse) and if that's run,
>> all process except init are instantly killed (reproducibly; the gettys
>> seem to be respawned by init then, so I can login locally again)
>
> This seems to be kernel-dependent:
>
>> Kernel: Linux 4.19.0-1-amd64 (SMP w/8 CPU cores)
>
> This was the 4.19.12 kernel, i.e. not the most recent package because
> I couldn't update it before getting the system back in a usable state.
>
> With linux-image-4.18.0-3-amd64 the issue is not present and both,
> "udevadm control --reload-rules" as well as configuring fuse worked
> again.
>
> So I'm no more sure if this is an issue in udev or the Linux 4.19.
> Haven't yet tried the current 4.19.13 or 4.20. Will do that within the
> next few days.

I can't reproduce this problem. Neither with a 4.18 (4.18.20-2), 4.19 (4.19.12-1) nor 4.20 (4.20-1~exp1) kernel. Tested both with systemd as PID 1 and inside a VM with sysvinit as PID 1.
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Hi, Axel Beckert wrote: > I have no idea why this is happening, but several packages use "udevadm > control --reload-rules" in their postinst (e.g. fuse) and if that's run, > all process except init are instantly killed (reproducibly; the gettys > seem to be respawned by init then, so I can login locally again) This seems to be kernel-dependent: > Kernel: Linux 4.19.0-1-amd64 (SMP w/8 CPU cores) This was the 4.19.12 kernel, i.e. not the most recent package because I couldn't update it before getting the system back in a usable state. With linux-image-4.18.0-3-amd64 the issue is not present and both, "udevadm control --reload-rules" as well as configuring fuse worked again. So I'm no more sure if this is an issue in udev or the Linux 4.19. Haven't yet tried the current 4.19.13 or 4.20. Will do that within the next few days. Regards, Axel -- ,''`. | Axel Beckert , https://people.debian.org/~abe/ : :' : | Debian Developer, ftp.ch.debian.org Admin `. `' | 4096R: 2517 B724 C5F6 CA99 5329 6E61 2FF9 CD59 6126 16B5 `-| 1024D: F067 EA27 26B9 C3FC 1486 202E C09E 1D89 9593 0EDE
Bug#918764: udev: "udevadm control --reload-rules" kills all processes except init
Package: udev
Version: 240-2
Severity: critical
Justification: Breaks whole system

Hi,

I have no idea why this is happening, but several packages use "udevadm control --reload-rules" in their postinst (e.g. fuse) and if that's run, all processes except init are instantly killed (reproducibly; the gettys seem to be respawned by init then, so I can log in locally again), and since sshd is killed too, the system is usually no longer accessible from remote (hence the severity "critical") despite still responding to pings. There's nothing in dmesg. And corekeeper didn't catch any core either.

Some more details which might be helpful:

* The udev daemon itself also vanishes, and this happens independently of the udev daemon being running or not (i.e. "service udev start" before calling udevadm doesn't prevent this from happening).

* The other two udevadm commands from fuse's postinst don't seem to trigger this: udevadm test --action -p $(udevadm info -q path -n /dev/fuse)

* It first happened when I installed linux-image-4.20-trunk-amd64 from experimental about a week ago or so. Since I was away from home until yesterday evening, I could only recently start to debug this.
Here's an strace of that command: execve("/sbin/udevadm", ["udevadm", "control", "--reload-rules"], 0x7ffd60743730 /* 39 vars */) = 0 brk(NULL) = 0x561a4e851000 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=446764, ...}) = 0 mmap(NULL, 446764, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f52728ca000 close(3)= 0 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260A\2\0\0\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=1824496, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f52728c8000 mmap(NULL, 1837056, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5272707000 mprotect(0x7f5272729000, 1658880, PROT_NONE) = 0 mmap(0x7f5272729000, 1343488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22000) = 0x7f5272729000 mmap(0x7f5272871000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16a000) = 0x7f5272871000 mmap(0x7f52728be000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b6000) = 0x7f52728be000 mmap(0x7f52728c4000, 14336, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f52728c4000 close(3)= 0 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libkmod.so.2", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0206\0\0\0\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0644, st_size=100400, ...}) = 0 mmap(NULL, 102472, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f52726ed000 mprotect(0x7f52726f, 86016, PROT_NONE) = 0 mmap(0x7f52726f, 61440, PROT_READ|PROT_EXEC, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7f52726f mmap(0x7f52726ff000, 20480, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x12000) = 0x7f52726ff000 mmap(0x7f5272705000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7f5272705000 close(3)= 0 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libacl.so.1", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\37\0\0\0\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0644, st_size=35488, ...}) = 0 mmap(NULL, 2130592, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f52724e4000 mprotect(0x7f52724eb000, 2097152, PROT_NONE) = 0 mmap(0x7f52726eb000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7000) = 0x7f52726eb000 close(3)= 0 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libblkid.so.1", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\257\0\0\0\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0644, st_size=343008, ...}) = 0 mmap(NULL, 345896, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f527248f000 mprotect(0x7f5272499000, 282624, PROT_NONE) = 0 mmap(0x7f5272499000, 212992, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xa000) = 0x7f5272499000 mmap(0x7f52724cd000, 65536, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3e000) = 0x7f52724cd000 mmap(0x7f52724de000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4e000) =