Re: [PATCH RFC] util/error.c: Print backtrace on error

Pierrick Bouvier Wed, 06 Aug 2025 13:27:54 -0700

On 8/6/25 4:34 AM, Daniel P. Berrangé wrote:

On Wed, Aug 06, 2025 at 12:11:38PM +0100, Alex Bennée wrote:

Daniel P. Berrangé <[email protected]> writes:

On Tue, Aug 05, 2025 at 07:57:38PM +0300, Manos Pitsidianakis wrote:

On Tue, Aug 5, 2025 at 7:49 PM Daniel P. Berrangé <[email protected]> wrote:



Was there a specific place where you found things hard to debug
from the error message alone ?  I'm sure we have plenty of examples
of errors that can be improved, but wondering if there are some
general patterns we're doing badly that would be a good win
to improve ?


Some months ago I was debugging a MemoryRegion use-after-free and used
this code to figure out that the free was called from RCU context
instead of the main thread.

I ran into something similar recently [1], and it was a pain toreproduce it. Luckily, I caught it using rr and could debug it, but itwould have been much easier to just get a backtrace of the crash.

In this case, it was a segmentation fault, which is not covered bycurrent patch. Which brings me the thought I share at the end of this email.

[1]https://lore.kernel.org/qemu-devel/[email protected]/T/#

We give useful names to many (but not neccessarily all) threads that we
spawn. Perhaps we should call pthread_getname_np() to fetch the current
thread name, and used that as a prefix on the error message we print
out, as a bit of extra context ?


Do we always have sensible names for threads or only if we enable the
option?


I was surprised to discover we don't name threads by default, only if we
add '-name debug-threads=yes'.  I'm struggling to understand why we would
ever want thread naming disabled, if an OS supports it ?

I'm inclined to deprecate 'debug-threads' and always set the names when
available.

Obviously not as much info as a full stack trace, but that is something
we could likely enable unconditionally without any overheads to worry
about, so a likely incremental wni.


The place where it comes in useful is when we get bug reports from users
who have crashed QEMU in a embedded docker container and can't give us a
reasonable reproducer. If we can encourage such users to enable this
option (or maybe make it part of --enable-debug-info) then we could get
a slightly more useful backtrace for those bugs.


The challenge is whether this build option would be enabled widely
enough to make a significant difference ?

For developers working on crashes/bug fix, it's definitely a goodaddition (could come with --enable-debug for sure). It's something wecould enable in CI by default too. Usually, with sanitizers, thereported stacktrace is enough to get a rough idea of what the problemis, without having to use any debugger.

I don't think we could really enable this in any distro builds, as
this is way too noisy to have turned on unconditionally at build
time for all users. Most containers are going to be consuming
distro builds, with relatively few building custom QEMU themselves
IME.  We might have better luck if this was a runtime option to
the -msg arg.

Regarding the outside world and users, I share Daniel's opinion that itwould be too verbose if a backtrace is emitted with every fatal errormessage.

However, I think it could have *incredible* value if we reported thisbacktrace when QEMU segfaults, which is always something exceptional.

In this case, we could always enable this.
It's not covered by the current patch, maybe it could be a great addition?

Regarding binary size increase due to -rdynamic, I already know somepeople won't like it, so I'm not sure how we can ensure to have usefulsymbols in distributed binaries, which is a harder debate than enablingbacktraces on segfaults or not.

With regards,
Daniel

Re: [PATCH RFC] util/error.c: Print backtrace on error

Reply via email to