On Fri, Aug 22, 2025 at 05:27:20PM +0200, Jürgen Groß wrote:
> On 22.08.25 16:42, Marek Marczykowski-Górecki wrote:
> > On Fri, Aug 22, 2025 at 04:39:33PM +0200, Marek Marczykowski-Górecki wrote:
> > > Hi,
> > >
> > > When suspending domU I get the following issue:
> > >
> > > Freezing user space processes
> > > Freezing user space processes failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
> > > task:xl state:D stack:0 pid:466 tgid:466 ppid:1
> > > task_flags:0x400040 flags:0x00004006
> > > Call Trace:
> > > <TASK>
> > > __schedule+0x2f3/0x780
> > > schedule+0x27/0x80
> > > schedule_preempt_disabled+0x15/0x30
> > > __mutex_lock.constprop.0+0x49f/0x880
> > > unregister_xenbus_watch+0x216/0x230
> > > xenbus_write_watch+0xb9/0x220
> > > xenbus_file_write+0x131/0x1b0
> > > vfs_writev+0x26c/0x3d0
> > > ? do_writev+0xeb/0x110
> > > do_writev+0xeb/0x110
> > > do_syscall_64+0x84/0x2c0
> > > ? do_syscall_64+0x200/0x2c0
> > > ? generic_handle_irq+0x3f/0x60
> > > ? syscall_exit_work+0x108/0x140
> > > ? do_syscall_64+0x200/0x2c0
> > > ? __irq_exit_rcu+0x4c/0xe0
> > > entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > RIP: 0033:0x79b618138642
> > > RSP: 002b:00007fff9a192fc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
> > > RAX: ffffffffffffffda RBX: 00000000024fd490 RCX: 000079b618138642
> > > RDX: 0000000000000003 RSI: 00007fff9a193120 RDI: 0000000000000014
> > > RBP: 00007fff9a193000 R08: 0000000000000000 R09: 0000000000000000
> > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
> > > R13: 00007fff9a193120 R14: 0000000000000003 R15: 0000000000000000
> > > </TASK>
> > > OOM killer enabled.
> > > Restarting tasks: Starting
> > > Restarting tasks: Done
> > > xen:manage: do_suspend: freeze processes failed -16
> > >
> > > The process in question is `xl devd` daemon. It's a domU serving a
> > > xenvif backend.
> > >
> > > I noticed it on 6.16.1, but looking at earlier test logs I see it with
> > > 6.16-rc6 already (but interestingly, not 6.16-rc2 yet? feels weird given
> > > seemingly no relevant changes between rc2 and rc6).
> >
> > I forgot to include link for (a little) more details:
> > https://github.com/QubesOS/qubes-linux-kernel/pull/1157
> >
> > Especially, there is another call trace with panic_on_warn enabled -
> > slightly different, but looks related.
> >
>
> I'm pretty sure the PV variant for suspending is just wrong: it is calling
> dpm_suspend_start() from do_suspend() without taking the required
> system_transition_mutex, resulting in the WARN() in pm_restrict_gfp_mask().
>
> It might be as easy as just adding the mutex() call to do_suspend(), but I'm
> really not sure that will be a proper fix.

Hm, this might explain the second call trace, but not the freeze failure
quoted here above, I think?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab