On 22.08.25 16:42, Marek Marczykowski-Górecki wrote:
On Fri, Aug 22, 2025 at 04:39:33PM +0200, Marek Marczykowski-Górecki wrote:
Hi,

When suspending domU I get the following issue:

     Freezing user space processes
     Freezing user space processes failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
     task:xl              state:D stack:0     pid:466   tgid:466   ppid:1      task_flags:0x400040 flags:0x00004006
     Call Trace:
      <TASK>
      __schedule+0x2f3/0x780
      schedule+0x27/0x80
      schedule_preempt_disabled+0x15/0x30
      __mutex_lock.constprop.0+0x49f/0x880
      unregister_xenbus_watch+0x216/0x230
      xenbus_write_watch+0xb9/0x220
      xenbus_file_write+0x131/0x1b0
      vfs_writev+0x26c/0x3d0
      ? do_writev+0xeb/0x110
      do_writev+0xeb/0x110
      do_syscall_64+0x84/0x2c0
      ? do_syscall_64+0x200/0x2c0
      ? generic_handle_irq+0x3f/0x60
      ? syscall_exit_work+0x108/0x140
      ? do_syscall_64+0x200/0x2c0
      ? __irq_exit_rcu+0x4c/0xe0
      entry_SYSCALL_64_after_hwframe+0x76/0x7e
     RIP: 0033:0x79b618138642
     RSP: 002b:00007fff9a192fc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
     RAX: ffffffffffffffda RBX: 00000000024fd490 RCX: 000079b618138642
     RDX: 0000000000000003 RSI: 00007fff9a193120 RDI: 0000000000000014
     RBP: 00007fff9a193000 R08: 0000000000000000 R09: 0000000000000000
     R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
     R13: 00007fff9a193120 R14: 0000000000000003 R15: 0000000000000000
      </TASK>
     OOM killer enabled.
     Restarting tasks: Starting
     Restarting tasks: Done
     xen:manage: do_suspend: freeze processes failed -16

The process in question is the `xl devd` daemon. The domU being suspended
serves a xenvif backend.

I noticed it on 6.16.1, but looking at earlier test logs I see it with
6.16-rc6 already (interestingly, not yet with 6.16-rc2, which feels weird
given seemingly no relevant changes between rc2 and rc6).

I forgot to include a link with (a little) more detail:
https://github.com/QubesOS/qubes-linux-kernel/pull/1157

In particular, there is another call trace there, captured with
panic_on_warn enabled. It is slightly different, but looks related.


I'm pretty sure the PV variant for suspending is just wrong: it is calling
dpm_suspend_start() from do_suspend() without taking the required
system_transition_mutex, resulting in the WARN() in pm_restrict_gfp_mask().

It might be as easy as just taking that mutex in do_suspend(), but I'm
really not sure that would be a proper fix.
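
To illustrate the locking problem described above, here is a minimal
user-space sketch in C. It is not kernel code: every `sim_*` name is a
hypothetical stand-in invented for this example, the mutex is a plain flag,
and the use of a trylock that returns -EBUSY is an assumption, not a proposed
patch. It only models the claim that the suspend path warns when the caller
does not hold the transition mutex.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for a kernel mutex (not struct mutex). */
struct sim_mutex {
    bool locked;
};

/* Stand-in for system_transition_mutex. */
static struct sim_mutex sim_system_transition_mutex;

/* Counts how often the simulated WARN() fired. */
static int sim_warn_count;

static bool sim_mutex_trylock(struct sim_mutex *m)
{
    if (m->locked)
        return false;
    m->locked = true;
    return true;
}

static void sim_mutex_unlock(struct sim_mutex *m)
{
    assert(m->locked); /* unlocking an unheld mutex is a bug */
    m->locked = false;
}

/* Models the check from the mail: warn if the mutex is not held. */
static void sim_pm_restrict_gfp_mask(void)
{
    if (!sim_system_transition_mutex.locked) {
        sim_warn_count++;
        fprintf(stderr, "WARN: transition mutex not held\n");
    }
}

/* Models dpm_suspend_start() reaching the check above. */
static void sim_dpm_suspend_start(void)
{
    sim_pm_restrict_gfp_mask();
}

/* The situation described in the mail: no mutex taken, so the WARN fires. */
static void sim_do_suspend_unlocked(void)
{
    sim_dpm_suspend_start();
}

/* One possible shape of the idea: take the mutex around the suspend call. */
static int sim_do_suspend_locked(void)
{
    if (!sim_mutex_trylock(&sim_system_transition_mutex))
        return -EBUSY; /* assumption: another transition is in progress */

    sim_dpm_suspend_start();
    sim_mutex_unlock(&sim_system_transition_mutex);
    return 0;
}
```

Calling `sim_do_suspend_unlocked()` increments `sim_warn_count`, while
`sim_do_suspend_locked()` does not. Whether the real do_suspend() should
block, use a trylock, or needs further changes is exactly the open question.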


Juergen
