On 5/12/25 4:23 PM, Daniel Borkmann wrote:
> On 5/12/25 2:03 PM, Ilya Maximets wrote:
>> On 5/9/25 4:05 PM, Daniel Borkmann wrote:
>>> On 5/9/25 12:53 AM, Ilya Maximets wrote:
>>>> On 5/8/25 2:34 PM, Daniel Borkmann wrote:
>>>>> Extend inhibit=on setting with the option to specify a pinned XSK map
>>>>> path along with a starting index (default 0) to push the created XSK
>>>>> sockets into. Example usage:
>>>>>
>>>>>   # ./build/qemu-system-x86_64 [...] \
>>>>>     -netdev af-xdp,ifname=eth0,id=net0,mode=native,queues=2,inhibit=on,map-path=/sys/fs/bpf/foo,map-start-index=2 \
>>>>>     -device virtio-net-pci,netdev=net0 [...]
>>>>>
>>>>> This is useful for the case where an existing XDP program with an XSK map
>>>>> is present on the AF_XDP supported phys device and the XSK map is not yet
>>>>> populated. Qemu will then push the XSK sockets into the specified map.
>>>>
>>>> Thanks for the patch!
>>>>
>>>> Could you, please, explain the use case a little more? Is this patch
>>>> aiming to improve usability? Do you have a specific use case in mind?
>>>
>>> The use case we have is basically that the phys NIC has an XDP program
>>> already attached which redirects into the xsk map (e.g. installed from a
>>> separate control plane), the xsk map got pinned during that process into
>>> the bpf fs, and now qemu is launched; it creates the xsk sockets and then
>>> places them into the map by gathering the map fd from the pinned bpf fs
>>> file.
>>
>> OK. That's what I thought. Would be good to expand the commit message
>> a bit explaining the use case.
>
> Ack, I already adjusted locally. Planning to send v2 ~today with your
> feedback incorporated. Much appreciated!
>
>>>> The main idea behind 'inhibit' is that the qemu doesn't need to have a lot
>>>> of privileges to use the pre-loaded program and the pre-created sockets.
>>>> But creating the sockets and setting them into a map doesn't allow us to
>>>> run without privileges, IIUC. May be worth mentioning at least in the
>>>> commit message.
>>>
>>> Yes, privileges for the above use case are still needed. Will clarify in
>>> the description.
>>
>> OK.
>>
>>>> Also, isn't map-start-index the same thing as start-queue? Do we need
>>>> both of them?
>>>
>>> I'd say yes, given it does not have to be an exact mapping wrt queue index
>>> to map slot. The default is 0 though, and I expect this to be the most
>>> used scenario.
>>
>> I'm still not sure about this. For example, libxdp treats queue id as a map
>> index. And this value is actually not being used much in libxdp when the
>> program load is inhibited. I see that with a custom XDP program the indexes
>> inside the map may not directly correspond to queues in the device, and, in
>> fact, may have no relation to the actual queues in the device at all.
>
> Right, that's correct.
>
>> However, we're still calling them "queues" from the QEMU interface (as in
>> the "queues" parameter of the net/af-xdp device), and QEMU will just treat
>> every slot in the BPF map as a separate queue, as this BPF map is
>> essentially the network device that QEMU is working with; it doesn't
>> actually know what's behind it.
>>
>> So, I think, it should be appropriate to simplify the interface and
>> just use the existing start-queue configuration knob for this.
>>
>> What do you think?
>
> I was thinking of an example like the below (plainly taken from the XDP
> example programs at github.com/xdp-project/bpf-examples).
>
> struct {
>     __uint(type, BPF_MAP_TYPE_XSKMAP);
>     __uint(max_entries, MAX_SOCKS);
>     __uint(key_size, sizeof(int));
>     __uint(value_size, sizeof(int));
> } xsks_map SEC(".maps");
>
> int num_socks = 0;
> static unsigned int rr;
>
> SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
> {
>     rr = (rr + 1) & (num_socks - 1);
>     return bpf_redirect_map(&xsks_map, rr, XDP_DROP);
> }
>
> If we'd just reuse the start-queue configuration knob for this, then it
> wouldn't work. So I think having the flexibility of where to place the
> sockets in the map would make sense. But I can also drop that part if you
> think it does not warrant the extra knob and align to start-queue; then the
> map always needs to be of the same size as the combined NIC queues.
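For context, the mechanism being discussed — getting the map fd from the
pinned bpf fs path and writing the created XSK socket fds into the map
starting at some configurable index — would presumably boil down to
something like the libbpf-based sketch below (map_pin_path, map_start_index
and xsk_fds are illustrative names here, not the actual patch code):

  /* Rough sketch, assuming libbpf. */
  #include <bpf/bpf.h>

  static int push_xsks_into_pinned_map(const char *map_pin_path,
                                       int map_start_index,
                                       const int *xsk_fds, int n_socks)
  {
      /* fd of the XSKMAP pinned by the external control plane. */
      int map_fd = bpf_obj_get(map_pin_path);
      int i;

      if (map_fd < 0)
          return -1;

      for (i = 0; i < n_socks; i++) {
          /* Map slot to fill; not necessarily the device queue id. */
          int key = map_start_index + i;

          if (bpf_map_update_elem(map_fd, &key, &xsk_fds[i], 0))
              return -1;
      }
      return 0;
  }

The open question is which option should supply the starting value for
'key' above: the new map-start-index knob or the existing start-queue one.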
I'm a little confused here. The 'start-queue' is not used for anything
important, AFAICT, in case of inhibit=on. So why would re-using it instead
of adding a new config option reduce the number of available use cases?

Best regards, Ilya Maximets.