On 6/17/25 3:03 PM, Daniel Borkmann wrote:
> On 6/17/25 1:59 PM, Ilya Maximets wrote:
>> On 6/4/25 1:29 PM, Daniel Borkmann wrote:
>>> While testing, it turned out that upon error in the queue creation loop,
>>> we never trigger the af_xdp_cleanup() handler. This is because we pass
>>> errp instead of a local err pointer into the various AF_XDP setup functions
>>> instead of a scheme like:
>>>
>>>      bool fn(..., Error **errp)
>>>      {
>>>          Error *err = NULL;
>>>
>>>          foo(arg, &err);
>>>          if (err) {
>>>              handle the error...
>>>              error_propagate(errp, err);
>>>              return false;
>>>          }
>>>          ...
>>>      }
>>>
>>> With a conversion into the above format, the af_xdp_cleanup() handler is
>>> called as expected.
>>
>> How exactly this prevents calling the cleanup function?  I don't see the
>> errp being checked anywhere in the qemu_del_net_client() path.
>>
>> Could you provide a more detailed call sequence description where the cleanup
>> is not called?
>>
>> I agree thought that the local err variable is actually unused.  We should
>> be able to just remove it and remove the error_propagate() call as well.
> 
> Ok, I basically manually injected an error in 
> af_xdp_{umem_create,socket_create,
> update_xsk_map} and noticed that in fact none of the af_xdp_cleanup() callback
> was called and qemu was exiting right away.
> 
>  From reading up on the qemu error handling patterns that should be used via
> include/qapi/error.h I noticed that this was due to passing in errp directly
> rather than a local error variable as done in many other places in qemu code.

Hmm, you're right.  I can reproduce this issue.

I think, this fix should be a first patch of the set and it should have
a Fixes tag on it, so it can be backported, if necessary.

> 
>>> Also, making sure the XDP program will be removed does
>>> require to set s->n_queues to i + 1 since the test is nc->queue_index ==
>>> s->n_queues - 1, where nc->queue_index was set to i earlier.
>>
>> The idea behind 'i' instead of 'i + 1' was that if either 
>> af_xdp_umem_create()
>> or af_xdp_socket_create() fails, we do not have xdp_flags initialized on the
>> last queue.  And without it we can't remove the program, so we remove it 
>> while
>> destroying the last actually configured queue.  And this is OK, because the
>> failed queue was not added to the program, and if the af_xdp_socket_create()
>> fails for the very first queue, then we don't have a program loaded at all.
>>
>> With the new changes in this patch set, we have an extra function that can 
>> fail,
>> which is a new af_xdp_update_xsk_map(), and only if this one fails, we need 
>> to
>> remove the program while cleaning up the current failed queue, since it was
>> already created and xdp_flags are available.
>>
>> If we get this patch as-is and the af_xdp_socket_create() fails, we will not
>> remove the program, AFAICT.
> 
> I'll double check this concern and see if it can be solved (iirc we do test 
> for
> s->xdp_flags in the cleanup callback)..

Yes, we check 'nc->queue_index == s->n_queues - 1 && s->xdp_flags'.  And if the
last queue doesn't have xdp_flags (because it wasn't fully created), then the
program will not be detached.

But it seems that code is broken anyway even on the current main, because we're
setting n_queues into the last queue that never has xdp_flags on failure.

To fix that we need to do something like:

  uint32_t xdp_flags = 0;

  ...
  for each queue {
    if (failed) {
      s->n_queues = i + 1;
      s->xdp_flags = xdp_flags;
    }
    xdp_flags = s->xdp_flags;
  }

This way we'll have xdp_flags set for the last queue and have a proper queue 
number
and will be able to call bpf_xdp_detach().

One thing I do not understand is why the program is actually getting removed 
even
if we do not call bpf_xdp_detach()...  We're not calling this function pretty 
much
at all, and yet, when qemu fails, there is no program left behind...  It looks 
like
the program gets automatically detached when we call xsk_socket__delete() for 
the
last successfully configured queue.  Which is strange, I don't remember it doing
this before.

> in case of xsk map, it should not detach
> anything from an XDP program PoV (given inhibit) but rather it should remove
> prior installed xsk sockets from the xsk map to not leave them around.
> 
> Best,
> Daniel


Reply via email to