Hi, @Julien Grall

On Tue, Mar 11, 2025 at 11:04 PM Julien Grall <jul...@xen.org> wrote:
>
> Hi Mykola,
>
> On 05/03/2025 09:11, Mykola Kvach wrote:
> > From: Mykola Kvach <mykola_kv...@epam.com>
> >
> > This patch implements suspend/resume helpers for the watchdog.
> > While a domain is suspended its watchdogs must be paused. Otherwise,
> > if the domain stays in the suspend state for a longer period of time
> > compared to the watchdog period, the domain would be shutdown on resume.
> > Proper solution to this problem is to stop (suspend) the watchdog timers
> > after the domain suspends and to restart (resume) the watchdog timers
> > before the domain resumes. The suspend/resume of watchdog timers is done
> > in Xen and is invisible to the guests.
> >
> > Signed-off-by: Mirela Simonovic <mirela.simono...@aggios.com>
> > Signed-off-by: Saeed Nowshadi <saeed.nowsh...@xilinx.com>
> > Signed-off-by: Mykyta Poturai <mykyta_potu...@epam.com>
> > Signed-off-by: Mykola Kvach <mykola_kv...@epam.com>
> > ---
> > Changes in v3:
> > - cover the code with CONFIG_SYSTEM_SUSPEND
> >
> > Changes in v2:
> > - drop suspended field from timer structure
> > - drop the call of watchdog_domain_resume from ctxt_switch_to
> > ---
> >   xen/common/sched/core.c | 39 +++++++++++++++++++++++++++++++++++++++
> >   xen/include/xen/sched.h |  9 +++++++++
> >   2 files changed, 48 insertions(+)
> >
> > diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> > index b1c6b6b9fa..6c2231826a 100644
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> > @@ -1605,6 +1605,45 @@ void watchdog_domain_destroy(struct domain *d)
> >           kill_timer(&d->watchdog_timer[i].timer);
> >   }
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
>
> The config option is Arm specific, yet this is common code. As x86,
> already suspend/resume, then shouldn't the config option be common?
>
> But more importantly, why do we need to save/restore the watchdogs for
> Arm but not x86? Is this a latent issue or design choice?

I’ve looked into this a bit. Here's what I've found:

A watchdog timer is created and initialized (but not started) for each
domain during the domain_create call. This timer can be triggered either by
the Linux kernel (I refer to Linux kernel–based operating systems because I
have access to the code and can confirm that the Xen watchdog timer is
implemented there. I don’t have knowledge about the existence or
implementation of the Xen watchdog driver in other operating systems) Xen
watchdog driver or by the xenwatchdogd service.

Case 1: Watchdog started by the Linux kernel driver (I hope that operating
systems not based on the Linux kernel also implement proper handling for
the Xen watchdog timer driver during suspend and resume)
When the Xen watchdog is started by the Linux kernel driver, everything
works as expected. The driver correctly handles system suspend events:
https://elixir.bootlin.com/linux/v6.14.8/source/drivers/watchdog/xen_wdt.c#L169

Case 2: Watchdog started by xenwatchdogd service
However, when the watchdog is started by the xenwatchdogd service, neither
the underlying OS nor the daemon takes care of stopping the watchdog timer
during suspend:

https://elixir.bootlin.com/xen/v4.20.0/source/tools/hotplug/Linux/init.d/xen-watchdog.in

https://elixir.bootlin.com/xen/v4.20.0/source/tools/hotplug/NetBSD/rc.d/xen-watchdog

Behavior on x86 during suspend:
- Linux guest is configured with xenwatchdogd, and the Xen watchdog is
started at boot
- the OS initiates suspend (we request)
- at that moment, there's an active watchdog timer in Xen for the domain,
set to, say, 15 seconds
- after suspend preparations, domain_shutdown() is called with the
SHUTDOWN_suspend argument inside Xen hypervisor internals
- inside this function, the is_shutting_down flag is set in the domain
structure
- when the watchdog timer expires, the Xen handler skips the reset action
because the domain is marked as shutting down:

https://elixir.bootlin.com/xen/v4.20.0/source/xen/common/sched/core.c#L1539

So far, everything behaves correctly.

*BUT* *for the second case there is another flow. The domain starts
resuming from suspend. As part of the resume process, the is_shutting_down
flag inside the domain is cleared, which re-enables normal watchdog
behavior. However, the watchdog timer—set before suspend—has nearly
expired. Because the OS and its user-space services (such as the watchdog
pinging daemon) have not yet fully resumed and restarted, the watchdog
timeout occurs before the ping can be sent. As a result, the watchdog
triggers a reset or shutdown (as far as i know can't be another action of
watchdog expiry, but we aren't interested in these options right now)
before the service has a chance to take control again — effectively making
a clean resume impossible.*

It's also unclear how common this situation is on x86 systems —
specifically, whether xenwatchdogd is typically used in domU or dom0, or
whether the kernel driver is more commonly relied upon instead.
---

In my opinion, since the guest OS is the one starting the Xen watchdog, it
should also manage its state during suspend/resume transitions. The general
expectation across architectures is that the OS handles device state
management when transitioning between power states. This includes stopping
or parking watchdog timers during suspend.
I think proper handling should be added to the relevant services to avoid
unexpected triggers.

What’s your take on this?

>
> > +
> > +void watchdog_domain_suspend(struct domain *d)
> > +{
> > +    unsigned int i;
> > +
> > +    spin_lock(&d->watchdog_lock);
> > +
> > +    for ( i = 0; i < NR_DOMAIN_WATCHDOG_TIMERS; i++ )
> > +    {
> > +        if ( test_bit(i, &d->watchdog_inuse_map) )
> > +        {
> > +            stop_timer(&d->watchdog_timer[i].timer);
> > +        }
> > +    }
> > +
> > +    spin_unlock(&d->watchdog_lock);
> > +}
> > +
> > +void watchdog_domain_resume(struct domain *d)
> > +{
> > +    unsigned int i;
> > +
> > +    spin_lock(&d->watchdog_lock);
> > +
> > +    for ( i = 0; i < NR_DOMAIN_WATCHDOG_TIMERS; i++ )
> > +    {
> > +        if ( test_bit(i, &d->watchdog_inuse_map) )
> > +        {
> > +            set_timer(&d->watchdog_timer[i].timer,
> > +                      NOW() + SECONDS(d->watchdog_timer[i].timeout));
> > +        }
> > +    }
> > +
> > +    spin_unlock(&d->watchdog_lock);
> > +}
> > +
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
> > +
> >   /*
> >    * Pin a vcpu temporarily to a specific CPU (or restore old pinning
state if
> >    * cpu is NR_CPUS).
> > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> > index d0d10612ce..caab4aad93 100644
> > --- a/xen/include/xen/sched.h
> > +++ b/xen/include/xen/sched.h
> > @@ -1109,6 +1109,15 @@ void scheduler_disable(void);
> >   void watchdog_domain_init(struct domain *d);
> >   void watchdog_domain_destroy(struct domain *d);
> >
> > +#ifdef CONFIG_SYSTEM_SUSPEND
> > +/*
> > + * Suspend/resume watchdogs of domain (while the domain is suspended
its
> > + * watchdogs should be on pause)
> > + */
> > +void watchdog_domain_suspend(struct domain *d);
> > +void watchdog_domain_resume(struct domain *d);
> > +#endif /* CONFIG_SYSTEM_SUSPEND */
> > +
> >   /*
> >    * Use this check when the following are both true:
> >    *  - Using this feature or interface requires full access to the
hardware
>
> Cheers,
>
> --
> Julien Grall
>

Kind regards,
Mykola

Reply via email to