On 6/17/25 11:41 PM, Ilya Maximets wrote: > On 6/17/25 3:03 PM, Daniel Borkmann wrote: >> On 6/17/25 1:59 PM, Ilya Maximets wrote: >>> On 6/4/25 1:29 PM, Daniel Borkmann wrote: >>>> While testing, it turned out that upon error in the queue creation loop, >>>> we never trigger the af_xdp_cleanup() handler. This is because we pass >>>> errp instead of a local err pointer into the various AF_XDP setup functions >>>> instead of a scheme like: >>>> >>>> bool fn(..., Error **errp) >>>> { >>>> Error *err = NULL; >>>> >>>> foo(arg, &err); >>>> if (err) { >>>> handle the error... >>>> error_propagate(errp, err); >>>> return false; >>>> } >>>> ... >>>> } >>>> >>>> With a conversion into the above format, the af_xdp_cleanup() handler is >>>> called as expected. >>> >>> How exactly this prevents calling the cleanup function? I don't see the >>> errp being checked anywhere in the qemu_del_net_client() path. >>> >>> Could you provide a more detailed call sequence description where the >>> cleanup >>> is not called? >>> >>> I agree thought that the local err variable is actually unused. We should >>> be able to just remove it and remove the error_propagate() call as well. >> >> Ok, I basically manually injected an error in >> af_xdp_{umem_create,socket_create, >> update_xsk_map} and noticed that in fact none of the af_xdp_cleanup() >> callback >> was called and qemu was exiting right away. >> >> From reading up on the qemu error handling patterns that should be used via >> include/qapi/error.h I noticed that this was due to passing in errp directly >> rather than a local error variable as done in many other places in qemu code. > > Hmm, you're right. I can reproduce this issue. > > I think, this fix should be a first patch of the set and it should have > a Fixes tag on it, so it can be backported, if necessary. > >> >>>> Also, making sure the XDP program will be removed does >>>> require to set s->n_queues to i + 1 since the test is nc->queue_index == >>>> s->n_queues - 1, where nc->queue_index was set to i earlier. >>> >>> The idea behind 'i' instead of 'i + 1' was that if either >>> af_xdp_umem_create() >>> or af_xdp_socket_create() fails, we do not have xdp_flags initialized on the >>> last queue. And without it we can't remove the program, so we remove it >>> while >>> destroying the last actually configured queue. And this is OK, because the >>> failed queue was not added to the program, and if the af_xdp_socket_create() >>> fails for the very first queue, then we don't have a program loaded at all. >>> >>> With the new changes in this patch set, we have an extra function that can >>> fail, >>> which is a new af_xdp_update_xsk_map(), and only if this one fails, we need >>> to >>> remove the program while cleaning up the current failed queue, since it was >>> already created and xdp_flags are available. >>> >>> If we get this patch as-is and the af_xdp_socket_create() fails, we will not >>> remove the program, AFAICT. >> >> I'll double check this concern and see if it can be solved (iirc we do test >> for >> s->xdp_flags in the cleanup callback).. > > Yes, we check 'nc->queue_index == s->n_queues - 1 && s->xdp_flags'. And if > the > last queue doesn't have xdp_flags (because it wasn't fully created), then the > program will not be detached. > > But it seems that code is broken anyway even on the current main, because > we're > setting n_queues into the last queue that never has xdp_flags on failure. > > To fix that we need to do something like: > > uint32_t xdp_flags = 0; > > ... > for each queue { > if (failed) { > s->n_queues = i + 1; > s->xdp_flags = xdp_flags; > } > xdp_flags = s->xdp_flags; > } > > This way we'll have xdp_flags set for the last queue and have a proper queue > number > and will be able to call bpf_xdp_detach(). > > One thing I do not understand is why the program is actually getting removed > even > if we do not call bpf_xdp_detach()... We're not calling this function pretty > much > at all, and yet, when qemu fails, there is no program left behind... It > looks like > the program gets automatically detached when we call xsk_socket__delete() for > the > last successfully configured queue. Which is strange, I don't remember it > doing > this before.
OK, I figured this one out. This is a "new" behavior in libxdp 1.3.0: https://github.com/xdp-project/xdp-tools/commit/38c2914988fd5c1ef65f2381fc8af9f3e8404e2b We require libxdp >= 1.4.0 for QEMU to build though, so we may probably just drop the bpf_xdp_detach() call together with the n_queues hack. We're not loading the program from QEMU side, so not removing it manually makes sense. Should be able to drop the n_queues field from the AFXDPState as well. > >> in case of xsk map, it should not detach >> anything from an XDP program PoV (given inhibit) but rather it should remove >> prior installed xsk sockets from the xsk map to not leave them around. >> >> Best, >> Daniel >