On 2025-12-01 17:16, Marek Marczykowski-Górecki wrote:
On Mon, Dec 01, 2025 at 01:20:40PM -0500, Jason Andryuk wrote:
On 2025-11-29 21:03, Marek Marczykowski-Górecki wrote:
On Wed, Nov 19, 2025 at 05:47:29PM -0500, Jason Andryuk wrote:
The goal is to fix s2idle and S3 for Xen PV devices.
Can you give a little more context of this? We do have working S3 in
qubes with no need for such change. We trigger it via the toolstack
(libxl_domain_suspend_only()).
Are you talking about guest-initiated suspend here?
This is intended to help domU s2idle/S3 and resume. I guess that is what
you mean by guest-initiated? The domU can use 'echo mem > /sys/power/state'
to enter s2idle/S3. We also have the domU react to the ACPI sleep button
from `xl trigger $dom sleep`.
Ok, so this is indeed a different path than we use in Qubes OS.
AIUI, libxl_domain_suspend_only() triggers xenstore writes which Linux
drivers/xen/manage.c:do_suspend() acts on. `xl save/suspend/migrate` all
use this path.
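For reference, that path is driven by a xenstore watch; roughly (a condensed sketch of the "control/shutdown" handling in drivers/xen/manage.c, not a verbatim copy, and the helper name here is just illustrative):

/* The guest watches "control/shutdown"; the toolstack writing "suspend"
 * there is what lands in do_suspend(), which drives the
 * PMSG_FREEZE/THAW/RESTORE sequence. */
static void control_shutdown_sketch(const char *value)
{
        if (strcmp(value, "suspend") == 0)
                do_suspend();                   /* xl save/suspend/migrate */
        else if (strcmp(value, "poweroff") == 0)
                orderly_poweroff(false);
        /* ... "reboot", "halt", etc. handled similarly */
}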
The terminology gets confusing. Xen uses "suspend" for
save/suspend/migrate, but the Linux power management code uses
freeze/thaw/restore for those operations. AIUI, Linux's
PMSG_SUSPEND/.suspend is for power management within a single boot
(s2idle/S3).
Indeed it gets confusing...
When you call libxl_domain_suspend_only()/libxl_domain_resume(), you pass
suspend_cancel==1.
 * 1. (fast=1) Resume the guest without resetting the domain environment.
 *    The guest's call to SCHEDOP_shutdown(SHUTDOWN_suspend) will return 1.
That ends up in Linux do_suspend() as si.cancelled = 1, which resumes devices
via PMSG_THAW -> .thaw -> xenbus_dev_cancel(), a no-op. So it does not
change the PV devices.
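In sketch form (condensed from drivers/xen/manage.c:do_suspend(); error
handling and the stop_machine() plumbing are omitted, so treat it as an
illustration rather than a quote):

static void do_suspend_sketch(void)
{
        struct suspend_info si = { .cancelled = 1 };

        dpm_suspend_start(PMSG_FREEZE);   /* .freeze on the PV frontends */

        /* xen_suspend() runs here; si.cancelled stays 1 when
         * SCHEDOP_shutdown(SHUTDOWN_suspend) returns 1, i.e. the
         * toolstack resumed the domain without resetting it. */

        dpm_resume_end(si.cancelled ? PMSG_THAW : PMSG_RESTORE);
        /* PMSG_THAW -> .thaw = xenbus_dev_cancel(), a no-op, so the
         * frontends are left as they were. */
}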
We needed guest user space to perform actions before entering s2idle.
libxl_domain_suspend_only() triggers the Linux kernel path which does not
notify user space. The ACPI sleep/power button events let user space perform
actions (lock and blank the screen) before entering the idle state.
I see. In our case, we have our own userspace hook that gets called
before (if relevant - in most cases it isn't).
We also have kinda working (host) s2idle. You may want to take a look at this
work (some/most of it was posted upstream, but not all got
committed/reviewed):
https://github.com/QubesOS/qubes-issues/issues/6411#issuecomment-1538089344
https://github.com/QubesOS/qubes-linux-kernel/pull/910 (some patches
changed since that PR, see the current main too).
This would not affect host s2idle - it changes PV frontend devices.
Do you call libxl_domain_suspend_only() on all domUs and then put dom0 into s0ix?
Yes, exactly.
A domain resuming
from S3 or s2idle disconnects its PV devices during resume. The
backends are not expecting this and do not reconnect.
b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for suspend/
resume/chkpt") changed xen_suspend()/do_suspend() from
PMSG_SUSPEND/PMSG_RESUME to PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE, but the
suspend/resume callbacks remained.
.freeze/restore are used with hibernation, where Linux restarts in a new
place in the future. .suspend/resume are used for power management
within a single boot.
The current behavior of the callbacks works for an xl save/restore or
live migration, where the domain is restored/migrated to a new location
and connects to a not-already-connected backend.
Change xenbus_pm_ops to use .freeze/thaw/restore and drop the
.suspend/resume hooks. This matches the use in drivers/xen/manage.c for
save/restore and live migration. With .suspend/resume empty, PV devices
are left connected during s2idle and S3, so they are untouched and keep
working after resume.
Is that intended? While it might work for suspend by chance(*), I'm
pretty sure not disconnecting + reconnecting PV devices across
save/restore/live migration will break them.
save/restore/live migration keep using .freeze/thaw/restore, which
disconnects and reconnects today. Nothing changes there as
xen_suspend()/do_suspend() call the power management code with
PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE.
This patch makes .suspend/resume no-ops for PMSG_SUSPEND/PMSG_RESUME. When
a domU goes into s2idle/S3, the backend state remains connected. With this
patch, when the domU wakes up, the frontends do nothing and remain
connected.
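Concretely, the PM core picks the dev_pm_ops callback from the event,
roughly like this (condensed from the selection logic in drivers/base/power/,
not a verbatim copy):

typedef int (*pm_callback_t)(struct device *);

static pm_callback_t pick_callback_sketch(const struct dev_pm_ops *ops,
                                          pm_message_t state)
{
        switch (state.event) {
        case PM_EVENT_SUSPEND:  /* s2idle/S3: echo mem > /sys/power/state */
                return ops->suspend;
        case PM_EVENT_RESUME:   /* wakeup from s2idle/S3 */
                return ops->resume;
        case PM_EVENT_FREEZE:   /* xen save/migrate via do_suspend() */
                return ops->freeze;
        case PM_EVENT_THAW:     /* cancelled suspend */
                return ops->thaw;
        case PM_EVENT_RESTORE:  /* restore/migrate on the new host */
                return ops->restore;
        default:
                return NULL;
        }
}
/* With .suspend/.resume gone from xenbus_pm_ops, PM_EVENT_SUSPEND and
 * PM_EVENT_RESUME find no callback and the frontend is simply skipped. */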
This explanation makes sense.
(*) and even that I'm not sure - with driver domains, depending on
suspend order this feels like it might result in a deadlock...
I'm not sure. I don't think this patch changes anything with respect to
them.
Thanks for testing.
Maybe the commit messages should change to highlight this is for domU PV
devices? struct xen_bus_type xenbus_backend does not define dev_pm_ops.
Good idea.
Regards,
Jason
Signed-off-by: Jason Andryuk <[email protected]>
---
drivers/xen/xenbus/xenbus_probe_frontend.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
index 6d1819269cbe..199917b6f77c 100644
--- a/drivers/xen/xenbus/xenbus_probe_frontend.c
+++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
@@ -148,11 +148,9 @@ static void xenbus_frontend_dev_shutdown(struct device *_dev)
}
static const struct dev_pm_ops xenbus_pm_ops = {
- .suspend = xenbus_dev_suspend,
- .resume = xenbus_frontend_dev_resume,
.freeze = xenbus_dev_suspend,
.thaw = xenbus_dev_cancel,
- .restore = xenbus_dev_resume,
+ .restore = xenbus_frontend_dev_resume,
I was double checking before sending a v2, and I have questions about
this. I purposely switched the .restore callback since
xenbus_frontend_dev_resume() handles the extra case. It was added in:
commit 2abb274629614bef4044a0b98ada42e977feadfd
Author: Aurelien Chartier <[email protected]>
Date:   Tue May 28 18:09:56 2013 +0100

    xenbus: delay xenbus frontend resume if xenstored is not running

    If the xenbus frontend is located in a domain running xenstored, the
    device resume is hanging because it is happening before the process
    resume. This patch adds extra logic to the resume code to check if we
    are the domain running xenstored and delay the resume if needed.
It came after b3e96c0c7562, so .freeze/thaw/restore were already present
for domU xen_suspend() handling. So the case it fixes must have been
going through the .resume (PMSG_RESUME) handler.
This is for the "domain running xenstored", so dom0. So maybe this was
called for a dom0 S3 suspend/resume? But, as stated above, this patch
changes PV frontends. Maybe the change was for XenClient/OpenXT -
OpenXT has netfront in dom0 connected to the network driver domain. But
without netback changes, I don't think that would work today? As it is,
S3 in OpenXT has been disabled for years as broken.
Ok, yes, dom0 S3:
https://lore.kernel.org/xen-devel/[email protected]/
> This patch series fixes the S3 resume of dom0 or a Xenstore stub
> domain running a frontend over xenbus (xen-netfront in my use case).
>
> As device resume is happening before process resume, the xenbus
> frontend resume is hanging if xenstored is not running, thus causing
> a deadlock. This patch series is fixing that issue by bypassing the
> xenbus frontend resume when we are running in dom0 or a Xenstore stub
> domain.
I don't think setting .restore = xenbus_frontend_dev_resume will break
anything for a domU. It handles a case which doesn't trigger for a
domU. With .resume removed, a frontend in dom0 will not be touched during
S3 resume, so that at least cannot hang.
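For context, the extra case xenbus_frontend_dev_resume() handles looks
roughly like this (condensed sketch; the workqueue/work item names are
illustrative, not the exact ones in the file):

static int frontend_dev_resume_sketch(struct device *dev)
{
        /* If this frontend lives in the domain running xenstored, the
         * xenstored process is still frozen when device resume runs, so
         * defer the real resume to a workqueue that runs after processes
         * are thawed. */
        if (xen_store_domain_type == XS_LOCAL) {
                queue_work(frontend_resume_wq, &frontend_resume_work);
                return 0;
        }

        /* Normal domU case: talk to xenstore/the backend immediately. */
        return xenbus_dev_resume(dev);
}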
Does this all sound okay? Does anyone think I am missing anything?
Regards,
Jason