Pierrick Bouvier <pierrick.bouv...@linaro.org> writes: > On 8/6/25 4:34 AM, Daniel P. Berrangé wrote: >> On Wed, Aug 06, 2025 at 12:11:38PM +0100, Alex Bennée wrote: >>> Daniel P. Berrangé <berra...@redhat.com> writes: >>> >>>> On Tue, Aug 05, 2025 at 07:57:38PM +0300, Manos Pitsidianakis wrote: >>>>> On Tue, Aug 5, 2025 at 7:49 PM Daniel P. Berrangé <berra...@redhat.com> >>>>> wrote: >>>>>> >>>>>> >>>>>> Was there a specific place where you found things hard to debug >>>>>> from the error message alone ? I'm sure we have plenty of examples >>>>>> of errors that can be improved, but wondering if there are some >>>>>> general patterns we're doing badly that would be a good win >>>>>> to improve ? >>>>> >>>>> Some months ago I was debugging a MemoryRegion use-after-free and used >>>>> this code to figure out that the free was called from RCU context >>>>> instead of the main thread. >>>> > > I ran into something similar recently [1], and it was a pain to reproduce it. > Luckily, I caught it using rr and could debug it, but it would have been much > easier to just get a backtrace of the crash. > > In this case, it was a segmentation fault, which is not covered by current > patch. Which brings me the thought I share at the end of this email. > > [1] > https://lore.kernel.org/qemu-devel/173c1c78-1432-48a4-8251-65c65568c...@linaro.org/T/# > >>>> We give useful names to many (but not neccessarily all) threads that we >>>> spawn. Perhaps we should call pthread_getname_np() to fetch the current >>>> thread name, and used that as a prefix on the error message we print >>>> out, as a bit of extra context ? >>> >>> Do we always have sensible names for threads or only if we enable the >>> option? >> >> I was surprised to discover we don't name threads by default, only if we >> add '-name debug-threads=yes'. I'm struggling to understand why we would >> ever want thread naming disabled, if an OS supports it ? >> I'm inclined to deprecate 'debug-threads' and always set the names when >> available.
On POSIX, thread naming uses pthread_setname_np(), which is a GNU extension. Can't see drawbacks; just use it when available. On Windows, thread naming appears to use a dynamically loaded SetThreadDescription(). Any drawbacks? I'm a Windows ignoramus... >>>> Obviously not as much info as a full stack trace, but that is something >>>> we could likely enable unconditionally without any overheads to worry >>>> about, so a likely incremental wni. >>> >>> The place where it comes in useful is when we get bug reports from users >>> who have crashed QEMU in a embedded docker container and can't give us a >>> reasonable reproducer. If we can encourage such users to enable this >>> option (or maybe make it part of --enable-debug-info) then we could get >>> a slightly more useful backtrace for those bugs. >> The challenge is whether this build option would be enabled widely >> enough to make a significant difference ? >> > > For developers working on crashes/bug fix, it's definitely a good addition > (could come with --enable-debug for sure). It's something we could enable in > CI by default too. Usually, with sanitizers, the reported stacktrace is > enough to get a rough idea of what the problem is, without having to use any > debugger. > >> I don't think we could really enable this in any distro builds, as >> this is way too noisy to have turned on unconditionally at build >> time for all users. Most containers are going to be consuming >> distro builds, with relatively few building custom QEMU themselves >> IME. We might have better luck if this was a runtime option to >> the -msg arg. >> > > Regarding the outside world and users, I share Daniel's opinion that it would > be too verbose if a backtrace is emitted with every fatal error message. Yes, that's out of the question. We can debate backtrace on internal errors, such as hitting &error_abort, or more generally abort(). Need to demonstrate it adds value to simply dumping core, which we get for free. > However, I think it could have *incredible* value if we reported this > backtrace when QEMU segfaults, which is always something exceptional. This would be a best effort. The program is already out of order, and printing may or may not work. Avoiding printf() and memory allocation would improve the odds. > In this case, we could always enable this. > It's not covered by the current patch, maybe it could be a great addition? > > Regarding binary size increase due to -rdynamic, I already know some people > won't like it, so I'm not sure how we can ensure to have useful symbols in > distributed binaries, which is a harder debate than enabling backtraces on > segfaults or not. 1. Core dumps may take disk space! Let's disable them. 2. My programs crash! I need to know why. 3. I know! Let's make all the program bigger! SCNR ;)