Kevin Wolf <[email protected]> writes:

> Am 10.03.2026 um 14:32 hat Markus Armbruster geschrieben:
>> This is now commit 544ddbb6373d61292a0e2dc269809cd6bd5edec6.  I'm not
>> objecting.  Just a few remarks.  I dropped qemu-stable@ from cc.
>> 
>> Kevin Wolf <[email protected]> writes:
>> 
>> > Commit 2155d2dd introduced rate limiting for BLOCK_IO_ERROR to emit an
>> > event only once a second. This makes sense for cases in which the guest
>> > keeps running and can submit more requests that would possibly also fail
>> > because there is a problem with the backend.
>> >
>> > However, if the error policy is configured so that the VM is stopped on
>> > errors, this is both unnecessary because stopping the VM means that the
>> > guest can't issue more requests and in fact harmful because stopping the
>> > VM is an important state change that management tools need to keep track
>> > of even if it happens more than once in a given second. If an event is
>> > dropped, the management tool would see a VM randomly going to paused
>> > state without an associated error, so it has a hard time deciding how to
>> > handle the situation.
>> >
>> > This patch disables rate limiting for action=stop by not relying on the
>> > event type alone any more in monitor_qapi_event_queue_no_reenter(), but
>> > checking action for BLOCK_IO_ERROR, too. If the error is reported to the
>> > guest or ignored, the rate limiting stays in place.
>> >
>> > Fixes: 2155d2dd7f73 ('block-backend: per-device throttling of 
>> > BLOCK_IO_ERROR reports')
>> > Signed-off-by: Kevin Wolf <[email protected]>
>> > ---
>> >  qapi/block-core.json |  2 +-
>> >  monitor/monitor.c    | 21 ++++++++++++++++++++-
>> >  2 files changed, 21 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/qapi/block-core.json b/qapi/block-core.json
>> > index b66bf316e2f..da0b36a3751 100644
>> > --- a/qapi/block-core.json
>> > +++ b/qapi/block-core.json
>> > @@ -5794,7 +5794,7 @@
>> >  # .. note:: If action is "stop", a `STOP` event will eventually follow
>> >  #    the `BLOCK_IO_ERROR` event.
>> >  #
>> > -# .. note:: This event is rate-limited.
>> > +# .. note:: This event is rate-limited, except if action is "stop".
>> >  #
>> >  # Since: 0.13
>> >  #
>> > diff --git a/monitor/monitor.c b/monitor/monitor.c
>> > index 1273eb72605..37fa674cfe6 100644
>> > --- a/monitor/monitor.c
>> > +++ b/monitor/monitor.c
>> > @@ -367,14 +367,33 @@ monitor_qapi_event_queue_no_reenter(QAPIEvent event, 
>> > QDict *qdict)
>> >  {
>> >      MonitorQAPIEventConf *evconf;
>> >      MonitorQAPIEventState *evstate;
>> > +    bool throttled;
>> >  
>> >      assert(event < QAPI_EVENT__MAX);
>> >      evconf = &monitor_qapi_event_conf[event];
>> >      trace_monitor_protocol_event_queue(event, qdict, evconf->rate);
>> > +    throttled = evconf->rate;
>> > +
>> > +    /*
>> > +     * Rate limit BLOCK_IO_ERROR only for action != "stop".
>> > +     *
>> > +     * If the VM is stopped after an I/O error, this is important 
>> > information
>> > +     * for the management tool to keep track of the state of QEMU and we 
>> > can't
>> > +     * merge any events. At the same time, stopping the VM means that the 
>> > guest
>> > +     * can't send additional requests and the number of events is already
>> > +     * limited, so we can do without rate limiting.
>> > +     */
>> > +    if (event == QAPI_EVENT_BLOCK_IO_ERROR) {
>> > +        QDict *data = qobject_to(QDict, qdict_get(qdict, "data"));
>> > +        const char *action = qdict_get_str(data, "action");
>> > +        if (!strcmp(action, "stop")) {
>> > +            throttled = false;
>> > +        }
>> > +    }
>> 
>> Having event-specific logic in the general event emission function is
>> ugly.
>> 
>> Before the patch, the "throttle this event?" logic is coded in one
>> place: table monitor_qapi_event_conf[].
>> 
>> The table maps from event kind (enum QAPIEvent) to the minimum time
>> between two events.  Non-zero specifies a rate limit, zero makes it
>> unlimited.
>> 
>> Aside: as far as I can tell, we've only ever used one
>> MonitorQAPIEventConf: { .rate = 1000 * SCALE_MS }.  Could be dumbed down
>> to bool.
>> 
>> This is insufficient for QAPIEvent QAPI_EVENT_BLOCK_IO_ERROR, where the
>> desired rate depends on event data, not just the QAPIEvent.
>> 
>> The patch gives us the desired rate, but it splits the logic between the
>> map and the map's user.
>> 
>> I think the cleaner solution is to make the map more capable: have it
>> maps from the entire event, not just its kind.
>> 
>> An obvious way to do that would be a table of function pointers
>> 
>>     bool (*)(QAPIEvent, QDict *)
>> 
>> Null means unlimited.
>> 
>> The table entry for BLOCK_IO_ERROR returns false for unlimited when
>> "action" is "stop", else true.
>> 
>> The table entries for the other rate-limited events simply return true.
>> 
>> Thoughts?
>
> Like you, I often prefer data to code. However, if the data becomes just
> a table of function pointers to trivial functions, wouldn't it be better
> to just have it as code in the first place?
>
> It could be a helper function with the same signature as you proposed
> above and it would simply be a switch statement returning true for a few
> event types, looking at the QDict for BLOCK_IO_ERROR and returning false
> for default. Seems a lot simpler than an explicit table of function
> pointers.

You're right.


Reply via email to