On Mon, 22 Sept 2025 at 16:23, Markus Armbruster <[email protected]> wrote:
>
> Alessandro Ratti <[email protected]> writes:
>
> > On Mon, 22 Sept 2025 at 14:29, Markus Armbruster <[email protected]> wrote:
> >>
> >> Alex Bennée <[email protected]> writes:
> >>
> >> > Daniel P. Berrangé <[email protected]> writes:
> >> >
> >> >> On Mon, Sep 22, 2025 at 12:37:57PM +0200, Markus Armbruster wrote:
> >> >>> Alessandro Ratti <[email protected]> writes:
> >> >>>
> >> >>> > Replace virtio_error() with a macro that automatically prepends the
> >> >>> > calling function name to error messages. This provides better context
> >> >>> > for debugging virtio issues by showing exactly which function
> >> >>> > encountered the error.
> >> >>> >
> >> >>> > Before: "Invalid queue size: 1024"
> >> >>> > After:  "virtio_queue_set_num: Invalid queue size: 1024"
> >> >>> >
> >> >>> > The implementation uses a macro to insert __func__ at compile time,
> >> >>> > avoiding any runtime overhead while providing more specific error
> >> >>> > context than a generic "virtio:" prefix.
> >> >>>
> >> >>> A need for function names and such in error messages suggests the error
> >> >>> messages are crap.
> >> >>
> >> >> I pretty much agree. If we take that view forwards, then I think our
> >> >> coding guidelines should explicitly state something like
> >> >>
> >> >>  "Function names must never be included in error messages.
> >> >>
> >> >>   The messages need to be sufficiently descriptive in their
> >> >>   text, such that including function names is redundant"
> >>
> >> I'm in favor.
> >>
> >> > Ahh I missed the fact this ends up as an error_report. I think having
> >> > function names in debug output is fine.
> >>
> >> No argument!
> >>
> >> > It does however miss important information like which VirtIO device is
> >> > actually failing, despite having vdev passed down to the function.
> >>
> >> Yes, which device failed should definitely be reported.
> >>
> >> [...]
> >
> > Hi Markus, Alex, Daniel,
> >
> > Thanks again for the thoughtful feedback and for helping me see the bigger
> > picture. I now fully agree that adding function names to error messages (via
> > __func__) doesn't really address the core issue, and I appreciate the
> > push to rethink how error reporting can better serve both users and 
> > developers.
> >
> > I've taken a first stab at improving one of the messages in
> > virtio_init_region_cache(), following your suggestions.
> >
> > Here's the updated call:
> >
> > ---8<--- Example diff --8<---
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -240,6 +240,7 @@ void virtio_init_region_cache(VirtIODevice *vdev, int n)
> >      VirtQueue *vq = &vdev->vq[n];
> >      VRingMemoryRegionCaches *old = vq->vring.caches;
> >      VRingMemoryRegionCaches *new = NULL;
> > +    DeviceState *dev = DEVICE(vdev);
> >      hwaddr addr, size;
> >      int64_t len;
> >      bool packed;
> > @@ -264,7 +265,10 @@ void virtio_init_region_cache(VirtIODevice *vdev, int 
> > n)
> >      len = address_space_cache_init(&new->used, vdev->dma_as,
> >                                     vq->vring.used, size, true);
> >      if (len < size) {
> > -        virtio_error(vdev, "Cannot map used");
> > +        virtio_error(vdev,
> > +                    "Failed to map used ring for device %s - "
> > +                    "possible guest misconfiguration or insufficient 
> > memory",
> > +                    qdev_get_dev_path(dev));
> >          goto err_used;
> >      }
> >
> > With this change, the error output now reads:
> >
> >     qemu-system-x86_64: Failed to map used ring for device
> > 0000:00:04.0 - possible guest misconfiguration or insufficient memory
> >
> > This feels like a clear improvement — it gives context (what failed),
> > identifies the device, and hints at likely causes.
>
> It's *much* better!
>
> Developers will appreciate "Failed to map used ring for device".  By
> itself it would still be gobbledygook for users, but together with the
> "possible guest misconfiguration or insufficient memory" clue it's fine.
>
> Perhaps we can still improve on "device 0000:00:04.0".  The device's ID
> is a good way to identify it to the user, because it's chosen by the
> user, and unique (among devices).  Sadly, devices without ID exist.  We
> fall back to canonical QOM path in places.  Have a look at
> qdev_get_human_name() to see whether it works here.

I experimented with qdev_get_human_name(), but it usually returns paths like:

  /machine/peripheral-anon/device[0]/virtio-backend

…which seems less user-friendly than the PCI address provided by
qdev_get_dev_path().
For now, I'm sticking to using the device ID when set (e.g. via -device…,id=foo)
and falling back to qdev_get_dev_path() otherwise — which provides predictable
output for both PCI and non-PCI devices.

> Should we spell out that the device is now broken (whatever vdev->broken
> means)?

That's a good point — I'll leave that for a possible follow-up so we can keep
this patch focused on improving error clarity and device identification first.

I'll submit a new patch shortly based on this discussion, with the
debug scaffolding removed. I'll include a link to this thread in the cover
letter for continuity.

> > If this is more in line with what you'd expect, I'd be happy to submit a new
> > patch focused solely on improving a few key virtio error messages in this
> > direction, starting with the worst offenders in virtio_init_region_cache().
> >
> > Thanks again for your time and guidance — I'm learning a lot from this 
> > process.
>
> You're welcome!  And thank you for accepting my little rant as
> constructive criticism :)

:)
Thanks again for your time and guidance — it's really helping me learn the ropes
here.

Best regards,
Alessandro

Reply via email to