Re: [systemd-devel] Detecting Systemd crash

Álvaro Cebrián Juan Sat, 03 Feb 2024 07:56:38 -0800

Great question!

I am also very interested in detecting systemd crashes, since I have
experienced them recently and have been asked to come up with a way to
react when a PID 1 crash happens.
In fact, in my recent experience, a journald crash was enough to leave
the system in an unreliable/degraded state in which some top-level
applications worked while others didn't.


So, adding to David's first question, I need to detect systemd and journald
crashes and then trigger a `systemctl reboot --force --force` command.
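
As a rough starting point, here is a minimal Python sketch of such a detector.
It assumes that a frozen PID 1 shows up as `systemctl is-system-running`
either hanging (David reports systemctl commands timing out) or printing
`unknown`/`offline`. The function names, the 10-second timeout and that
interpretation are my own assumptions, not a documented crash indicator.

```python
import subprocess

# States that `systemctl is-system-running` prints when it cannot reach a
# working manager. Treating them as "crashed" is an assumption of this sketch.
UNREACHABLE_STATES = {"unknown", "offline"}


def manager_responds(probe_cmd=("systemctl", "is-system-running"), timeout_s=10.0):
    """Return True if the probe finishes in time and reports a reachable manager."""
    try:
        result = subprocess.run(
            list(probe_cmd),
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,  # "degraded" exits non-zero but PID 1 is still alive
        )
    except subprocess.TimeoutExpired:
        return False  # probe hung: the manager is probably frozen
    except OSError:
        return False  # probe binary missing or not executable
    return result.stdout.strip() not in UNREACHABLE_STATES


def force_reboot():
    """Run the reboot command from this mail; systemctl does the reboot itself."""
    subprocess.run(["systemctl", "reboot", "--force", "--force"], check=False)
```

A timer or cron job could call `manager_responds()` periodically and call
`force_reboot()` only after several consecutive failures, to avoid rebooting
on a single slow response. This sketch does not cover journald.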

I have also read that the Linux Magic System Request Key (SysRq) can help in
such scenarios, but I don't know how it works.
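
For reference, the kernel documentation linked below describes a
`/proc/sysrq-trigger` file: root writes a single command character to it and
the kernel performs that function directly, without involving PID 1. There,
`s` syncs filesystems, `u` remounts them read-only and `b` reboots
immediately. A minimal Python sketch, where the function name and the pause
parameter are hypothetical and I have not tested it against a frozen systemd:

```python
import time

SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def send_sysrq(keys, trigger_path=SYSRQ_TRIGGER, pause_s=0.0):
    """Write each SysRq command character with its own write() call."""
    # buffering=0 makes every write() below a separate system call, so the
    # kernel sees one command character at a time.
    with open(trigger_path, "wb", buffering=0) as trigger:
        for key in keys:
            trigger.write(key.encode("ascii"))
            time.sleep(pause_s)  # give sync/remount some time to complete
```

Run as root, `send_sysrq("sub", pause_s=2)` would sync, remount read-only and
then reboot; the call is not expected to return once `b` is written.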

Any help would be much appreciated.
Thank you.

Some related links:
https://news.ycombinator.com/item?id=19023695
https://news.ycombinator.com/item?id=36873927
https://www.kernel.org/doc/html/latest/admin-guide/sysrq.html


On Sat, 3 Feb 2024 at 16:14, David Timber (<d...@dev.snart.me>) wrote:

> Systemd crashed on me the other day. I was writing up some Systemd units
> and testing them out by daemon-reload every time I wanted to test them
> out. Not the best way to go about it, I know. My bad abusing Systemd to
> the point of crashing. Perhaps it was just a bit flip that caused this.
>
>     systemd[2368]: Assertion 'path_is_absolute(p)' failed at
>     src/basic/chase.c:628, function chase(). Aborting.
>     systemd[1]: Assertion 'path_is_absolute(p)' failed at
>     src/basic/chase.c:628, function chase(). Aborting.
>     systemd[1]: Caught <ABRT> from our own process.
>     systemd-coredump[32497]: Due to PID 1 having crashed coredump
>     collection will now be turned off.
>     systemd-coredump[32497]: [🡕] Process 32496 (systemd) of user 0
>     dumped core.
>     systemd[1]: Caught <ABRT>, dumped core as pid 32496.
>     systemd[1]: Freezing execution.
>
>     ...
>
>     systemd-journald[871]: Failed to send stream file descriptor to
>     service manager: Transport endpoint is not connected
>
> I didn't even bother trying to produce a stack trace. I can get on that if
> anyone wants it. My machine started doing some weird things like Firefox
> not being able to do Ajax properly whilst being able to go to a new
> page, Chromium not being able to create a new tab whilst all the text
> editors worked just fine, all the systemctl commands timing out. So
> basically, I was using Linux without fork(). Anyway.
> Well, I think any software can crash for any reason whatsoever. The
> problem with Systemd I realised from this incident is that I had no way
> of knowing that Systemd had crashed until I opened up the journal and
> kernel logs and saw that Systemd had crashed some time ago. In this
> particular incident, Systemd caught the signal and decided to just
> freeze. No idea why you'd want that because if it had just crashed, the
> kernel would have just panicked and I would have realised something went
> wrong.
>
> 1: So I decided that I need some sort of "watchdog" that warns me when
> something like this happens. Using dbus to poll the status of the
> Systemd process, it could be a GUI app running under a seat, just a
> daemon that writes a warning message using `wall` or just send mail
> using a primed up MUA process. I wonder if someone already had the same
> idea and went on to make one.
>
> 2: How do I get Systemd to freeze to test such a program? I mean, if I
> kill Systemd, the kernel would crash so I have to somehow tell Systemd
> to freeze?
>
>
