Re: [systemd-devel] Triggering the HW Watchdog

2018-02-27 Thread D.S. Ljungmark
Partially,

   It shows that systemd is handling the watchdog as I expect it to
here, but it also means that the "dysfunctional" times where the
system isn't resetting properly are _not_ due to the watchdog triggering;
according to systemd, the system is "normal" at those times.

Which is a worse case for me, since it's harder to debug.

So, conclusion:
 systemd seems to handle the watchdog properly
 systemd seems to not die when we expect it to, leaving us with
more debugging to do.

I hope that makes sense.


On Tue, Feb 27, 2018 at 5:34 PM, Mantas Mikulėnas  wrote:
> On Tue, Feb 27, 2018 at 6:25 PM, D.S. Ljungmark  wrote:
>>
>>
>>
>> On 27/02/18 15:21, Lennart Poettering wrote:
>> > On Di, 27.02.18 15:12, D.S. Ljungmark (ljungm...@modio.se) wrote:
>> >
>> >>> I figure you can send SIGSTOP to PID 1, no? (there are some signals
>> >>> the kernel blocks for PID 1, but I think SIGSTOP is not among them,
>> >>> please try)
>> >>
>> >> It seems that SIGSTOP is being filtered, because nothing appears to
>> >> happen, and the system certainly isn't rebooting.
>> >
>> > You should be able to trigger an abort in PID 1 by sending it SIGABRT
>> > or SIGQUIT or so. If PID 1 aborts it will actually enter a freeze loop
>> > in which it stops pinging the hw watchdog.
>> >
>> > Lennart
>>
>>
>> ABRT works,  or well..
>>
>> systemd[1]: Caught <ABRT>, core dump failed (child 3844, code=killed,
>> status=6/ABRT).
>>
>> And then a broadcast, freezing execution
>>
>>
>> And after that, what I was afraid of:
>>
>> [25417.186351] watchdog: watchdog0: watchdog did not stop!
>>
>
> Isn't that exactly the result you asked for?
>
> --
> Mantas Mikulėnas
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Triggering the HW Watchdog

2018-02-27 Thread Mantas Mikulėnas
On Tue, Feb 27, 2018 at 6:25 PM, D.S. Ljungmark  wrote:

>
>
> On 27/02/18 15:21, Lennart Poettering wrote:
> > On Di, 27.02.18 15:12, D.S. Ljungmark (ljungm...@modio.se) wrote:
> >
> >>> I figure you can send SIGSTOP to PID 1, no? (there are some signals
> >>> the kernel blocks for PID 1, but I think SIGSTOP is not among them,
> >>> please try)
> >>
> >> It seems that SIGSTOP is being filtered, because nothing appears to
> >> happen, and the system certainly isn't rebooting.
> >
> > You should be able to trigger an abort in PID 1 by sending it SIGABRT
> > or SIGQUIT or so. If PID 1 aborts it will actually enter a freeze loop
> > in which it stops pinging the hw watchdog.
> >
> > Lennart
>
>
> ABRT works,  or well..
>
> systemd[1]: Caught <ABRT>, core dump failed (child 3844, code=killed,
> status=6/ABRT).
>
> And then a broadcast, freezing execution
>
>
> And after that, what I was afraid of:
>
> [25417.186351] watchdog: watchdog0: watchdog did not stop!
>
>
Isn't that exactly the result you asked for?

-- 
Mantas Mikulėnas


Re: [systemd-devel] Triggering the HW Watchdog

2018-02-27 Thread D.S. Ljungmark


On 27/02/18 15:21, Lennart Poettering wrote:
> On Di, 27.02.18 15:12, D.S. Ljungmark (ljungm...@modio.se) wrote:
> 
>>> I figure you can send SIGSTOP to PID 1, no? (there are some signals
>>> the kernel blocks for PID 1, but I think SIGSTOP is not among them,
>>> please try)
>>
>> It seems that SIGSTOP is being filtered, because nothing appears to
>> happen, and the system certainly isn't rebooting.
> 
> You should be able to trigger an abort in PID 1 by sending it SIGABRT
> or SIGQUIT or so. If PID 1 aborts it will actually enter a freeze loop
> in which it stops pinging the hw watchdog.
> 
> Lennart


ABRT works,  or well..

systemd[1]: Caught <ABRT>, core dump failed (child 3844, code=killed,
status=6/ABRT).

And then a broadcast, freezing execution


And after that, what I was afraid of:

[25417.186351] watchdog: watchdog0: watchdog did not stop!


Well, that gives me a tool to debug this with, Thank you!
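
For anyone repeating this test: the kernel line "watchdog did not stop!" indicates the watchdog device was released while still armed, so the hardware timer keeps counting down. A minimal sketch for inspecting that state from userspace follows. It assumes the kernel exposes the watchdog sysfs attributes (CONFIG_WATCHDOG_SYSFS) under /sys/class/watchdog/watchdog0; the function name is hypothetical, and some attributes (for example timeleft) may be missing depending on the driver.

```python
from pathlib import Path


def read_watchdog_state(sysfs_dir: str = "/sys/class/watchdog/watchdog0") -> dict:
    """Read a few watchdog attributes from sysfs.

    Assumption: the kernel was built with CONFIG_WATCHDOG_SYSFS, so these
    files exist. Any attribute that cannot be read is reported as None.
    """
    attrs = ("identity", "state", "timeout", "timeleft", "nowayout")
    result = {}
    for name in attrs:
        path = Path(sysfs_dir) / name
        try:
            result[name] = path.read_text().strip()
        except OSError:
            # Attribute not provided by this driver, or sysfs support is absent.
            result[name] = None
    return result


if __name__ == "__main__":
    for key, value in read_watchdog_state().items():
        print(f"{key}: {value}")
```

Running this before and after sending the signal to PID 1 should show whether the device is still active and, where the driver supports it, how much time is left before the reset.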


//D.S

-- 
8362 CB14 98AD 11EF CEB6  FA81 FCC3 7674 449E 3CFC


Re: [systemd-devel] Triggering the HW Watchdog

2018-02-27 Thread D.S. Ljungmark
( re-send as I forgot the list )

On 27/02/18 13:20, Lennart Poettering wrote:
> On Di, 27.02.18 12:44, D.S. Ljungmark (ljungm...@modio.se) wrote:
>
>> Hi list!
>>
>>  We're using systemd to control the hardware watchdog, and would want to
>> induce fail state to _verify_ that the shutdown/reboot process works as
>> expected.
>>
>> How do we make systemd "fail" to ping the watchdog?
>
> I figure you can send SIGSTOP to PID 1, no? (there are some signals
> the kernel blocks for PID 1, but I think SIGSTOP is not among them,
> please try)

It seems that SIGSTOP is being filtered, because nothing appears to
happen, and the system certainly isn't rebooting.


>> How do we control which states ( root fs not available, etc) cause
>> systemd to not ping the hardware watchdog?
>
> The watchdog is for detecting software hanging. Root fs not being
> available does not really qualify as "software hanging". If you want
> to reboot the machine if it fails to bring everything up, then use
> JobTimeoutAction= on some suitable unit, for example local-fs.target
> or multi-user.target.
>
> Lennart

Thanks,
  I'm trying to get to a state where the machine fails over and triggers
the watchdog on known failures, rather than triggering the rescue shell or
similar.


I'll try JobTimeoutAction= on multi-user.target.

//D.S.


-- 
8362 CB14 98AD 11EF CEB6  FA81 FCC3 7674 449E 3CFC


Re: [systemd-devel] Triggering the HW Watchdog

2018-02-27 Thread Lennart Poettering
On Di, 27.02.18 15:12, D.S. Ljungmark (ljungm...@modio.se) wrote:

> > I figure you can send SIGSTOP to PID 1, no? (there are some signals
> > the kernel blocks for PID 1, but I think SIGSTOP is not among them,
> > please try)
> 
> It seems that SIGSTOP is being filtered, because nothing appears to
> happen, and the system certainly isn't rebooting.

You should be able to trigger an abort in PID 1 by sending it SIGABRT
or SIGQUIT or so. If PID 1 aborts it will actually enter a freeze loop
in which it stops pinging the hw watchdog.
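
A minimal sketch of that test, assuming it runs as root on a machine that is allowed to reset. The function name and the dry-run default are illustrative choices, not part of systemd; with dry_run=False the call is expected to freeze PID 1, after which the hardware watchdog should fire.

```python
import os
import signal


def abort_pid1(dry_run: bool = True) -> str:
    """Send SIGABRT to PID 1 so that systemd aborts and stops pinging the watchdog.

    Hypothetical helper: dry_run defaults to True so that importing or
    casually running this file cannot take the machine down.
    """
    description = f"send {signal.SIGABRT.name} to PID 1"
    if dry_run:
        return f"dry run: would {description}"
    # Requires root. Expect the system to freeze and, if the hardware
    # watchdog is armed, to be reset once its timeout expires.
    os.kill(1, signal.SIGABRT)
    return f"done: {description}"


if __name__ == "__main__":
    print(abort_pid1(dry_run=True))
```

The shell equivalent is a plain kill with the ABRT signal against PID 1; the wrapper only adds a guard against running it by accident.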

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] Triggering the HW Watchdog

2018-02-27 Thread Lennart Poettering
On Di, 27.02.18 12:44, D.S. Ljungmark (ljungm...@modio.se) wrote:

> Hi list!
> 
>  We're using systemd to control the hardware watchdog, and would want to
> induce fail state to _verify_ that the shutdown/reboot process works as
> expected.
> 
> How do we make systemd "fail" to ping the watchdog?

I figure you can send SIGSTOP to PID 1, no? (there are some signals
the kernel blocks for PID 1, but I think SIGSTOP is not among them,
please try)

> How do we control which states ( root fs not available, etc) cause
> systemd to not ping the hardware watchdog?

The watchdog is for detecting software hanging. Root fs not being
available does not really qualify as "software hanging". If you want
to reboot the machine if it fails to bring everything up, then use
JobTimeoutAction= on some suitable unit, for example local-fs.target
or multi-user.target.
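
A minimal sketch of such a setup, rendering a drop-in for the target. JobTimeoutSec= and JobTimeoutAction= are [Unit] settings; the 10min timeout, the reboot-force action, the drop-in file name, and the helper function are assumptions chosen for illustration and should be adapted to the system.

```python
def render_job_timeout_dropin(timeout: str = "10min", action: str = "reboot-force") -> str:
    """Return the text of a drop-in that reboots if the unit's job times out.

    The default values are illustrative assumptions, not recommendations.
    """
    return (
        "[Unit]\n"
        f"JobTimeoutSec={timeout}\n"
        f"JobTimeoutAction={action}\n"
    )


if __name__ == "__main__":
    # Hypothetical drop-in location for multi-user.target.
    print("# /etc/systemd/system/multi-user.target.d/job-timeout.conf")
    print(render_job_timeout_dropin(), end="")
```

After writing the file, a daemon-reload is needed for systemd to pick it up. JobTimeoutAction= only takes effect together with a JobTimeoutSec= value, which is why both are set here.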

Lennart

-- 
Lennart Poettering, Red Hat


[systemd-devel] Triggering the HW Watchdog

2018-02-27 Thread D.S. Ljungmark
Hi list!

 We're using systemd to control the hardware watchdog, and would like to
induce a failure state to _verify_ that the shutdown/reboot process works as
expected.

How do we make systemd "fail" to ping the watchdog?

How do we control which states ( root fs not available, etc) cause
systemd to not ping the hardware watchdog?

//D.S.
-- 
8362 CB14 98AD 11EF CEB6  FA81 FCC3 7674 449E 3CFC