Pierrick Bouvier <pierrick.bouv...@linaro.org> writes:

> On 8/6/25 4:34 AM, Daniel P. Berrangé wrote:
>> On Wed, Aug 06, 2025 at 12:11:38PM +0100, Alex Bennée wrote:
>>> Daniel P. Berrangé <berra...@redhat.com> writes:
>>>
>>>> On Tue, Aug 05, 2025 at 07:57:38PM +0300, Manos Pitsidianakis wrote:
>>>>> On Tue, Aug 5, 2025 at 7:49 PM Daniel P. Berrangé <berra...@redhat.com> 
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Was there a specific place where you found things hard to debug
>>>>>> from the error message alone ?  I'm sure we have plenty of examples
>>>>>> of errors that can be improved, but wondering if there are some
>>>>>> general patterns we're doing badly that would be a good win
>>>>>> to improve ?
>>>>>
>>>>> Some months ago I was debugging a MemoryRegion use-after-free and used
>>>>> this code to figure out that the free was called from RCU context
>>>>> instead of the main thread.
>>>>
>
> I ran into something similar recently [1], and it was a pain to reproduce it. 
> Luckily, I caught it using rr and could debug it, but it would have been much 
> easier to just get a backtrace of the crash.
>
> In this case, it was a segmentation fault, which is not covered by current 
> patch. Which brings me the thought I share at the end of this email.
>
> [1] 
> https://lore.kernel.org/qemu-devel/173c1c78-1432-48a4-8251-65c65568c...@linaro.org/T/#
>
>>>> We give useful names to many (but not neccessarily all) threads that we
>>>> spawn. Perhaps we should call pthread_getname_np() to fetch the current
>>>> thread name, and used that as a prefix on the error message we print
>>>> out, as a bit of extra context ?
>>>
>>> Do we always have sensible names for threads or only if we enable the
>>> option?
>>
>> I was surprised to discover we don't name threads by default, only if we
>> add '-name debug-threads=yes'.  I'm struggling to understand why we would
>> ever want thread naming disabled, if an OS supports it ?
>> I'm inclined to deprecate 'debug-threads' and always set the names when
>> available.

On POSIX, thread naming uses pthread_setname_np(), which is a GNU
extension.  Can't see drawbacks; just use it when available.

On Windows, thread naming appears to use a dynamically loaded
SetThreadDescription().  Any drawbacks?  I'm a Windows ignoramus...

>>>> Obviously not as much info as a full stack trace, but that is something
>>>> we could likely enable unconditionally without any overheads to worry
>>>> about, so a likely incremental wni.
>>>
>>> The place where it comes in useful is when we get bug reports from users
>>> who have crashed QEMU in a embedded docker container and can't give us a
>>> reasonable reproducer. If we can encourage such users to enable this
>>> option (or maybe make it part of --enable-debug-info) then we could get
>>> a slightly more useful backtrace for those bugs.
>> The challenge is whether this build option would be enabled widely
>> enough to make a significant difference ?
>>
>
> For developers working on crashes/bug fix, it's definitely a good addition 
> (could come with --enable-debug for sure). It's something we could enable in 
> CI by default too. Usually, with sanitizers, the reported stacktrace is 
> enough to get a rough idea of what the problem is, without having to use any 
> debugger.
>
>> I don't think we could really enable this in any distro builds, as
>> this is way too noisy to have turned on unconditionally at build
>> time for all users. Most containers are going to be consuming
>> distro builds, with relatively few building custom QEMU themselves
>> IME.  We might have better luck if this was a runtime option to
>> the -msg arg.
>>
>
> Regarding the outside world and users, I share Daniel's opinion that it would 
> be too verbose if a backtrace is emitted with every fatal error message.

Yes, that's out of the question.  We can debate backtrace on internal
errors, such as hitting &error_abort, or more generally abort().  Need
to demonstrate it adds value to simply dumping core, which we get for
free.

> However, I think it could have *incredible* value if we reported this 
> backtrace when QEMU segfaults, which is always something exceptional.

This would be a best effort.  The program is already out of order, and
printing may or may not work.  Avoiding printf() and memory allocation
would improve the odds.

> In this case, we could always enable this.
> It's not covered by the current patch, maybe it could be a great addition?
>
> Regarding binary size increase due to -rdynamic, I already know some people 
> won't like it, so I'm not sure how we can ensure to have useful symbols in 
> distributed binaries, which is a harder debate than enabling backtraces on 
> segfaults or not.

1. Core dumps may take disk space!  Let's disable them.

2. My programs crash!  I need to know why.

3. I know!  Let's make all the program bigger!

SCNR ;)


Reply via email to