On 12/2/25 9:30 PM, Stefan Hajnoczi wrote:
> On Tue, Dec 02, 2025 at 06:31:19PM +0200, Andrey Drobyshev wrote:
>> Commit 772f86839f ("scripts/qemu-gdb: Support coroutine dumps in
>> coredumps") introduced coroutine traces in coredumps using raw stack
>> unwinding. While this works, this approach does not show the function
>> arguments in the corresponding stack frames.
>>
>> As an alternative, we can obtain saved registers from the coroutine's
>> jmpbuf, patch them into the coredump's struct elf_prstatus in place, and
>> execute another gdb subprocess to get a backtrace from the patched
>> temporary coredump.
>>
>> While providing more detailed info, this alternative approach is more
>> invasive, as it might corrupt the coredump file. We do take precautions
>> by saving the original register values into a separate binary blob
>> /path/to/coredump.ptregs, so that they can be restored in the next GDB
>> session. Still, instead of making it the new default, let's keep raw
>> unwinding the default behaviour, and add a '--detailed' option to the
>> 'qemu bt' and 'qemu coroutine' commands to enable the new behaviour.
>>
>> This is how it looks:
>>
>> (gdb) qemu coroutine 0x7fda9335a508
>> #0 0x5602bdb41c26 in qemu_coroutine_switch<+214> () at
>> ../util/coroutine-ucontext.c:321
>> #1 0x5602bdb3e8fe in qemu_aio_coroutine_enter<+493> () at
>> ../util/qemu-coroutine.c:293
>> #2 0x5602bdb3c4eb in co_schedule_bh_cb<+538> () at ../util/async.c:547
>> #3 0x5602bdb3b518 in aio_bh_call<+119> () at ../util/async.c:172
>> #4 0x5602bdb3b79a in aio_bh_poll<+457> () at ../util/async.c:219
>> #5 0x5602bdb10f22 in aio_poll<+1201> () at ../util/aio-posix.c:719
>> #6 0x5602bd8fb1ac in iothread_run<+123> () at ../iothread.c:63
>> #7 0x5602bdb18a24 in qemu_thread_start<+355> () at
>> ../util/qemu-thread-posix.c:393
>>
>> (gdb) qemu coroutine 0x7fda9335a508 --detailed
>> patching core file /tmp/tmpq4hmk2qc
>> found "CORE" at 0x10c48
>> assume pt_regs at 0x10cbc
>> write r15 at 0x10cbc
>> write r14 at 0x10cc4
>> write r13 at 0x10ccc
>> write r12 at 0x10cd4
>> write rbp at 0x10cdc
>> write rbx at 0x10ce4
>> write rip at 0x10d3c
>> write rsp at 0x10d54
>>
>> #0 0x00005602bdb41c26 in qemu_coroutine_switch (from_=0x7fda9335a508,
>> to_=0x7fda8400c280, action=COROUTINE_ENTER) at
>> ../util/coroutine-ucontext.c:321
>> #1 0x00005602bdb3e8fe in qemu_aio_coroutine_enter (ctx=0x5602bf7147c0,
>> co=0x7fda8400c280) at ../util/qemu-coroutine.c:293
>> #2 0x00005602bdb3c4eb in co_schedule_bh_cb (opaque=0x5602bf7147c0) at
>> ../util/async.c:547
>> #3 0x00005602bdb3b518 in aio_bh_call (bh=0x5602bf714a40) at
>> ../util/async.c:172
>> #4 0x00005602bdb3b79a in aio_bh_poll (ctx=0x5602bf7147c0) at
>> ../util/async.c:219
>> #5 0x00005602bdb10f22 in aio_poll (ctx=0x5602bf7147c0, blocking=true) at
>> ../util/aio-posix.c:719
>> #6 0x00005602bd8fb1ac in iothread_run (opaque=0x5602bf42b100) at
>> ../iothread.c:63
>> #7 0x00005602bdb18a24 in qemu_thread_start (args=0x5602bf7164a0) at
>> ../util/qemu-thread-posix.c:393
>> #8 0x00007fda9e89f7f2 in start_thread (arg=<optimized out>) at
>> pthread_create.c:443
>> #9 0x00007fda9e83f450 in clone3 () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
>>
>> CC: Vladimir Sementsov-Ogievskiy <[email protected]>
>> CC: Peter Xu <[email protected]>
>> Originally-by: Vladimir Sementsov-Ogievskiy <[email protected]>
>> Signed-off-by: Andrey Drobyshev <[email protected]>
>> ---
>> scripts/qemugdb/coroutine.py | 243 +++++++++++++++++++++++++++++++++--
>> 1 file changed, 233 insertions(+), 10 deletions(-)
>>
>> diff --git a/scripts/qemugdb/coroutine.py b/scripts/qemugdb/coroutine.py
>> index e98fc48a4b..280c02c12d 100644
>> --- a/scripts/qemugdb/coroutine.py
>> +++ b/scripts/qemugdb/coroutine.py
>> @@ -10,9 +10,116 @@
>> # or later. See the COPYING file in the top-level directory.
>>
>> import gdb
>> +import os
>> +import pty
>> +import re
>> +import struct
>> +import textwrap
>> +
>> +from collections import OrderedDict
>> +from copy import deepcopy
>>
>> VOID_PTR = gdb.lookup_type('void').pointer()
>>
>> +# Registers in the same order they're present in ELF coredump file.
>> +# See asm/ptrace.h
>> +PT_REGS = ['r15', 'r14', 'r13', 'r12', 'rbp', 'rbx', 'r11', 'r10', 'r9',
>> + 'r8', 'rax', 'rcx', 'rdx', 'rsi', 'rdi', 'orig_rax', 'rip', 'cs',
>> + 'eflags', 'rsp', 'ss']
>> +
>> +coredump = None
>> +
>> +
>> +class Coredump:
>> + _ptregs_suff = '.ptregs'
>> +
>> + def __init__(self, coredump, executable):
>> + gdb.events.exited.connect(self._cleanup)
>
> It's not clear to me that this cleanup mechanism is reliable:
>
> - The restore_regs() method is called from invoke(), but not in a
> `finally` block that would guarantee it runs even when an exception is
> thrown. Maybe _cleanup() can be called without a prior restore_regs()
> call. It would be inconvenient to lose the original register values.
>
Agreed. We might as well put the restore_regs() call into a `finally`
block to make sure it runs in any case, like this:

    try:
        while True:
            co = co_cast(co_ptr)
            co_ptr = co["base"]["caller"]
            if co_ptr == 0:
                break
            gdb.write("\nCoroutine at " + str(co_ptr) + ":\n")
            bt_jmpbuf(coroutine_to_jmpbuf(co_ptr), detailed=detailed)
    finally:
        coredump.restore_regs()

We should probably also call restore_regs() during the cleanup if the
dirty flag is set.
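
To illustrate the dirty-flag idea, here is a rough standalone sketch. It
is not the code from the patch: the class name CoredumpSketch, the _dirty
attribute and the patch_regs() signature are made up for the example, and
it works on an ordinary file rather than a real coredump.

```python
class CoredumpSketch:
    """Toy stand-in for the Coredump class; patches bytes in a plain file."""

    def __init__(self, path):
        self._path = path
        self._offset = None
        self._orig = None     # bytes that patch_regs() overwrote
        self._dirty = False   # hypothetical flag: file differs from original

    def patch_regs(self, offset, data):
        with open(self._path, 'r+b') as f:
            f.seek(offset)
            self._orig = f.read(len(data))
            self._offset = offset
            # Mark dirty before writing, so a failed write is still undone.
            self._dirty = True
            f.seek(offset)
            f.write(data)

    def restore_regs(self):
        # No-op unless a patch is outstanding, so calling it twice is safe.
        if not self._dirty:
            return
        with open(self._path, 'r+b') as f:
            f.seek(self._offset)
            f.write(self._orig)
        self._dirty = False

    def _cleanup(self):
        # Safety net in case invoke() never reached restore_regs().
        self.restore_regs()
```

Since restore_regs() is idempotent here, calling it both from the
`finally` block and from _cleanup() is harmless.
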
> - I'm not sure if gdb.events.exited (when GDB's inferior terminates) is
> the correct event to ensure cleanup. The worst case is that the
> temporary file is leaked, which is not a serious problem.
>
Hmm, indeed, this callback isn't called on signals. I guess we can just
call atexit.register(self._cleanup) instead. This seems to handle both
normal and abnormal exits (except SIGKILL, of course).
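
Something along these lines, as a standalone sketch only. The class name
TempCoreSketch and the `path` attribute are placeholders, not names from
the patch. Note that in a plain Python process atexit handlers are skipped
when the process dies from a signal Python doesn't handle or exits via
os._exit(), so the behaviour under GDB is worth double-checking.

```python
import atexit
import os
import tempfile


class TempCoreSketch:
    """Toy stand-in: owns a temporary file and removes it at exit."""

    def __init__(self):
        fd, self.path = tempfile.mkstemp(prefix='core-sketch-')
        os.close(fd)
        # Runs when the Python interpreter shuts down normally.
        atexit.register(self._cleanup)

    def _cleanup(self):
        # Idempotent: an explicit call followed by the atexit call is fine.
        if self.path is not None and os.path.exists(self.path):
            os.unlink(self.path)
        self.path = None
```
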
> But then this is a debugging script and it's probably fine:
>
> Reviewed-by: Stefan Hajnoczi <[email protected]>