Great question! I am very interested in detecting systemd crashes too, since I have experienced them recently and have been asked to come up with a way to react when a PID 1 crash happens. In fact, in my recent experience, a journald crash alone was enough to leave the system in an unreliable, degraded state in which some top-level applications worked while others didn't.

So, adding to David's first question, I need to detect systemd and journald crashes and then trigger a `systemctl reboot --force --force` command.

I have also read that the Linux Magic System Request Key (SysRq) can help in such scenarios, but I don't know how it works.

Any help would be much appreciated. Thank you.

Some related links:
https://news.ycombinator.com/item?id=19023695
https://news.ycombinator.com/item?id=36873927
https://www.kernel.org/doc/html/latest/admin-guide/sysrq.html

On Sat, 3 Feb 2024 at 16:14, David Timber (<d...@dev.snart.me>) wrote:

> Systemd crashed on me the other day. I was writing up some Systemd units
> and testing them out by running daemon-reload every time I wanted to
> test them. Not the best way to go about it, I know. My bad for abusing
> Systemd to the point of crashing. Perhaps it was just a bit flip that
> caused this.
>
> systemd[2368]: Assertion 'path_is_absolute(p)' failed at src/basic/chase.c:628, function chase(). Aborting.
> systemd[1]: Assertion 'path_is_absolute(p)' failed at src/basic/chase.c:628, function chase(). Aborting.
> systemd[1]: Caught <ABRT> from our own process.
> systemd-coredump[32497]: Due to PID 1 having crashed coredump collection will now be turned off.
> systemd-coredump[32497]: [🡕] Process 32496 (systemd) of user 0 dumped core.
> systemd[1]: Caught <ABRT>, dumped core as pid 32496.
> systemd[1]: Freezing execution.
>
> ...
>
> systemd-journald[871]: Failed to send stream file descriptor to service manager: Transport endpoint is not connected
>
> I didn't even bother trying to produce a stack trace. I can get on that
> if anyone wants it. My machine started doing some weird things: Firefox
> was not able to do Ajax properly while still being able to go to a new
> page, Chromium was not able to create a new tab while all the text
> editors worked just fine, and all the systemctl commands timed out. So
> basically, I was using Linux without fork(). Anyway.
>
> Well, I think any software can crash for any reason whatsoever. The
> problem with Systemd I realised from this incident is that I had no way
> of knowing that Systemd had crashed until I opened up the journal and
> kernel logs and saw that Systemd had crashed some time ago. In this
> particular incident, Systemd caught the signal and decided to just
> freeze. No idea why you'd want that, because if it had just crashed, the
> kernel would have just panicked and I would have realised something went
> wrong.
>
> 1: So I decided that I need some sort of "watchdog" that warns me when
> something like this happens. Using dbus to poll the status of the
> Systemd process, it could be a GUI app running under a seat, just a
> daemon that writes a warning message using `wall`, or one that sends
> mail using a primed-up MUA process. I wonder if someone already had the
> same idea and went on to make one.
>
> 2: How do I get Systemd to freeze to test such a program? I mean, if I
> kill Systemd, the kernel would crash, so I have to somehow tell Systemd
> to freeze?
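
To make the "watchdog" idea from point 1 and the SysRq question more concrete, here is a minimal sketch in Python, not a tested solution. It assumes that a frozen PID 1 shows up the way David describes, as systemctl commands timing out, so it probes with `systemctl is-system-running` under a timeout rather than talking to dbus directly. It only covers PID 1, not journald. The function names (`pid1_responsive`, `sysrq`, `emergency_reboot`, `watch`) and the timeout, interval and delay values are made up for illustration. The reaction writes the SysRq command characters `s` (sync), `u` (remount read-only) and `b` (reboot) to `/proc/sysrq-trigger`, as described in the kernel SysRq guide linked above; that requires root.

```python
import subprocess
import time

SYSRQ_TRIGGER = "/proc/sysrq-trigger"  # kernel interface, writable by root only


def pid1_responsive(timeout=10.0, run=subprocess.run):
    """Return True if systemctl gets any answer from the manager in time.

    Assumption: a frozen PID 1 makes systemctl hang, so only a timeout
    (or failing to start systemctl at all) counts as "unresponsive".
    A non-zero exit status such as "degraded" still counts as an answer.
    """
    try:
        run(
            ["systemctl", "is-system-running"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return True


def sysrq(key, path=SYSRQ_TRIGGER):
    """Send a single SysRq command character to the kernel."""
    with open(path, "w") as fh:
        fh.write(key)


def emergency_reboot(write=sysrq, sleep=time.sleep):
    """Sync, remount read-only, then reboot, without involving PID 1."""
    for key in ("s", "u", "b"):
        write(key)
        sleep(2)  # arbitrary pause so sync/remount can finish before 'b'


def watch(interval=30.0, max_failures=3, probe=pid1_responsive,
          react=emergency_reboot, sleep=time.sleep):
    """Poll PID 1 and react after max_failures consecutive failed probes."""
    failures = 0
    while True:
        failures = 0 if probe() else failures + 1
        if failures >= max_failures:
            react()
            return
        sleep(interval)
```

Calling `watch()` as root from a long-running process would start the polling loop. The `react` parameter is there so the reboot can be swapped for something gentler, such as a function that runs `wall`, or for `systemctl reboot --force --force`, which, as far as I understand from systemctl(1), is carried out by systemctl itself without contacting the manager when `--force` is given twice. Requiring several consecutive failures is a guess at how to avoid rebooting on a single slow reply.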