Re: [PATCH 3/5] qmp: Added the helper stamp check.
On Wed, Mar 22, 2023 at 03:26:59PM +0200, Andrew Melnichenko wrote: > Hi all, > I've researched an issue a bit. The solution with passing eBPF blob > and loading in the Libvirt looks promising. > Overall, the possible solution looks like this: > * Libvirt checks virtio-net properties and understands that eBPF > steering may be required. > * Libvirt requests eBPF blob through QMP. > * Libvirt loads blob for virtio-net and passes fds from eBPF to QEMU. > > I think that it's a good idea to pass only eBPF blob without > additional metainformation. Most metainfo that we need could be > retrieved from eBPF blob, and the only question is to pass fds > sequence to QEMU. > I propose to pass them as they appear in the blob itself, like > "virtio-net-pci,ebpf_rss_fds=,,,...". Using ',' for separating FDs is a bad idea, because ',' is already used for separating QemuOpts arguments. With -netdev we use ':' for spearating FDs with vhostfds= and fds= arguments, so I'd suggest following that practice. > Also, I think it's a good idea to make a "general" QMP request for > eBPF blobs. Something like "request_ebpf "(g.e "request_ebpf > virtio-net-rss"). That's reasonable as a future proofing idea I think. With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH 3/5] qmp: Added the helper stamp check.
Hi all, I've researched an issue a bit. The solution with passing eBPF blob and loading in the Libvirt looks promising. Overall, the possible solution looks like this: * Libvirt checks virtio-net properties and understands that eBPF steering may be required. * Libvirt requests eBPF blob through QMP. * Libvirt loads blob for virtio-net and passes fds from eBPF to QEMU. I think that it's a good idea to pass only eBPF blob without additional metainformation. Most metainfo that we need could be retrieved from eBPF blob, and the only question is to pass fds sequence to QEMU. I propose to pass them as they appear in the blob itself, like "virtio-net-pci,ebpf_rss_fds=,,,...". Also, I think it's a good idea to make a "general" QMP request for eBPF blobs. Something like "request_ebpf "(g.e "request_ebpf virtio-net-rss"). I'll prepare new RFC patches if you have questions or something to discuss, please let me know. On Thu, Mar 2, 2023 at 12:40 AM Toke Høiland-Jørgensen wrote: > > Daniel P. Berrangé writes: > > > On Wed, Mar 01, 2023 at 03:53:47PM +0100, Toke Høiland-Jørgensen wrote: > >> Daniel P. Berrangé writes: > >> > >> > On Tue, Feb 28, 2023 at 11:21:56PM +0100, Toke Høiland-Jørgensen wrote: > >> >> Daniel P. Berrangé writes: > >> >> > >> >> > On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen > >> >> > wrote: > >> >> >> Daniel P. Berrangé writes: > >> >> >> > >> >> >> Just to interject a note on this here: the skeleton code is mostly a > >> >> >> convenience feature used to embed BPF programs into the calling > >> >> >> binary. > >> >> >> It is perfectly possible to just have the BPF object file itself > >> >> >> reside > >> >> >> directly in the file system and just use the regular libbpf APIs to > >> >> >> load > >> >> >> it. Some things get a bit more cumbersome (mostly setting values of > >> >> >> global variables, if the BPF program uses those). > >> >> >> > >> >> >> So the JSON example above could just be a regular compiled-from-clang > >> >> >> BPF object file, and the management program can load that, inspect > >> >> >> its > >> >> >> contents using the libbpf APIs and pass the file descriptors on to > >> >> >> Qemu. > >> >> >> It's even possible to embed version information into this so that > >> >> >> Qemu > >> >> >> can check if it understands the format and bail out if it doesn't - > >> >> >> just > >> >> >> stick a version field in the configuration map as the first entry :) > >> >> > > >> >> > If all you have is the BPF object file is it possible to interrogate > >> >> > it to get a list of all the maps, and get FDs associated for them ? > >> >> > I had a look at the libbpf API and wasn't sure about that, it seemed > >> >> > like you had to know the required maps upfront ? If it is possible > >> >> > to auto-discover everything you need, soley from the BPF object file > >> >> > as input, then just dealing with that in isolation would feel simpler. > >> >> > >> >> It is. You load the object file, and bpf_object__for_each_map() lets you > >> >> discover which maps it contains, with the different bpf_map__*() APIs > >> >> telling you the properties of that map (and you can modify them too > >> >> before loading the object if needed). > >> >> > >> >> The only thing that's not in the object file is any initial data you > >> >> want to put into the map(s). But except for read-only maps that can be > >> >> added by userspace after loading the maps, so you could just let Qemu do > >> >> that... > >> >> > >> >> > It occurrs to me that exposing the BPF program as data rather than > >> >> > via binary will make more practical to integrate this into KubeVirt's > >> >> > architecture. In their deployment setup both QEMU and libvirt are > >> >> > running unprivileged inside a container. For any advanced nmetworking > >> >> > a completely separate component creates the TAP device and passes it > >> >> > into the container running QEMU. I don't think that the separate > >> >> > precisely matched helper binary would be something they can use, but > >> >> > it might be possible to expose a data file providing the BPF program > >> >> > blob and describing its maps. > >> >> > >> >> Well, "a data file providing the BPF program blob and describing its > >> >> maps" is basically what a BPF .o file is. It just happens to be encoded > >> >> in ELF format :) > >> >> > >> >> You can embed it into some other data structure and have libbpf load it > >> >> from a blob in memory as well as from the filesystem, though; that is > >> >> basically what the skeleton file does (notice the big character string > >> >> at the end, that's just the original .o file contents). > >> > > >> > Ok, in that case I'm really wondering why any of this helper program > >> > stuff was proposed. I recall the rationale was that it was impossible > >> > for an external program to load the BPF object on behalf of QEMU, > >> > because it would not know how todo that without QEMU specific > >> > knowledge. > >> > >>
Re: [PATCH 3/5] qmp: Added the helper stamp check.
Daniel P. Berrangé writes: > On Wed, Mar 01, 2023 at 03:53:47PM +0100, Toke Høiland-Jørgensen wrote: >> Daniel P. Berrangé writes: >> >> > On Tue, Feb 28, 2023 at 11:21:56PM +0100, Toke Høiland-Jørgensen wrote: >> >> Daniel P. Berrangé writes: >> >> >> >> > On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote: >> >> >> Daniel P. Berrangé writes: >> >> >> >> >> >> Just to interject a note on this here: the skeleton code is mostly a >> >> >> convenience feature used to embed BPF programs into the calling binary. >> >> >> It is perfectly possible to just have the BPF object file itself reside >> >> >> directly in the file system and just use the regular libbpf APIs to >> >> >> load >> >> >> it. Some things get a bit more cumbersome (mostly setting values of >> >> >> global variables, if the BPF program uses those). >> >> >> >> >> >> So the JSON example above could just be a regular compiled-from-clang >> >> >> BPF object file, and the management program can load that, inspect its >> >> >> contents using the libbpf APIs and pass the file descriptors on to >> >> >> Qemu. >> >> >> It's even possible to embed version information into this so that Qemu >> >> >> can check if it understands the format and bail out if it doesn't - >> >> >> just >> >> >> stick a version field in the configuration map as the first entry :) >> >> > >> >> > If all you have is the BPF object file is it possible to interrogate >> >> > it to get a list of all the maps, and get FDs associated for them ? >> >> > I had a look at the libbpf API and wasn't sure about that, it seemed >> >> > like you had to know the required maps upfront ? If it is possible >> >> > to auto-discover everything you need, soley from the BPF object file >> >> > as input, then just dealing with that in isolation would feel simpler. >> >> >> >> It is. You load the object file, and bpf_object__for_each_map() lets you >> >> discover which maps it contains, with the different bpf_map__*() APIs >> >> telling you the properties of that map (and you can modify them too >> >> before loading the object if needed). >> >> >> >> The only thing that's not in the object file is any initial data you >> >> want to put into the map(s). But except for read-only maps that can be >> >> added by userspace after loading the maps, so you could just let Qemu do >> >> that... >> >> >> >> > It occurrs to me that exposing the BPF program as data rather than >> >> > via binary will make more practical to integrate this into KubeVirt's >> >> > architecture. In their deployment setup both QEMU and libvirt are >> >> > running unprivileged inside a container. For any advanced nmetworking >> >> > a completely separate component creates the TAP device and passes it >> >> > into the container running QEMU. I don't think that the separate >> >> > precisely matched helper binary would be something they can use, but >> >> > it might be possible to expose a data file providing the BPF program >> >> > blob and describing its maps. >> >> >> >> Well, "a data file providing the BPF program blob and describing its >> >> maps" is basically what a BPF .o file is. It just happens to be encoded >> >> in ELF format :) >> >> >> >> You can embed it into some other data structure and have libbpf load it >> >> from a blob in memory as well as from the filesystem, though; that is >> >> basically what the skeleton file does (notice the big character string >> >> at the end, that's just the original .o file contents). >> > >> > Ok, in that case I'm really wondering why any of this helper program >> > stuff was proposed. I recall the rationale was that it was impossible >> > for an external program to load the BPF object on behalf of QEMU, >> > because it would not know how todo that without QEMU specific >> > knowledge. >> >> I'm not sure either. Was there some bits that initially needed to be set >> before the program was loaded (read-only maps or something)? Also, >> upstream does encourage the use of skeletons for embedding into >> applications, so it's not an unreasonable thing to start with if you >> don't have the kind of deployment constraints that Qemu does in this >> case. >> >> > It looks like we can simply expose the BPF object blob to mgmt apps >> > directly and get rid of this helper program entirely. >> >> I believe so, yes. You'd still need to be sure that the BPF object file >> itself comes from a trusted place, but hopefully it should be enough to >> load it from a known filesystem path? (Sorry if this is a stupid >> question, I only have a fuzzy idea of how all the pieces fit together >> here). > > It could be from a well known location on the filesystem, but might > be better to make it possible to query it from QMP, which is mostly > safe *provided* you've not yet started guest CPUs running. It could > be queried at startup and then cached for future use. Right, I don't have a strong opinion about the exact mechanism, just wanted to convey a general "loading an
Re: [PATCH 3/5] qmp: Added the helper stamp check.
On Wed, Mar 01, 2023 at 03:53:47PM +0100, Toke Høiland-Jørgensen wrote: > Daniel P. Berrangé writes: > > > On Tue, Feb 28, 2023 at 11:21:56PM +0100, Toke Høiland-Jørgensen wrote: > >> Daniel P. Berrangé writes: > >> > >> > On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote: > >> >> Daniel P. Berrangé writes: > >> >> > >> >> Just to interject a note on this here: the skeleton code is mostly a > >> >> convenience feature used to embed BPF programs into the calling binary. > >> >> It is perfectly possible to just have the BPF object file itself reside > >> >> directly in the file system and just use the regular libbpf APIs to load > >> >> it. Some things get a bit more cumbersome (mostly setting values of > >> >> global variables, if the BPF program uses those). > >> >> > >> >> So the JSON example above could just be a regular compiled-from-clang > >> >> BPF object file, and the management program can load that, inspect its > >> >> contents using the libbpf APIs and pass the file descriptors on to Qemu. > >> >> It's even possible to embed version information into this so that Qemu > >> >> can check if it understands the format and bail out if it doesn't - just > >> >> stick a version field in the configuration map as the first entry :) > >> > > >> > If all you have is the BPF object file is it possible to interrogate > >> > it to get a list of all the maps, and get FDs associated for them ? > >> > I had a look at the libbpf API and wasn't sure about that, it seemed > >> > like you had to know the required maps upfront ? If it is possible > >> > to auto-discover everything you need, soley from the BPF object file > >> > as input, then just dealing with that in isolation would feel simpler. > >> > >> It is. You load the object file, and bpf_object__for_each_map() lets you > >> discover which maps it contains, with the different bpf_map__*() APIs > >> telling you the properties of that map (and you can modify them too > >> before loading the object if needed). > >> > >> The only thing that's not in the object file is any initial data you > >> want to put into the map(s). But except for read-only maps that can be > >> added by userspace after loading the maps, so you could just let Qemu do > >> that... > >> > >> > It occurrs to me that exposing the BPF program as data rather than > >> > via binary will make more practical to integrate this into KubeVirt's > >> > architecture. In their deployment setup both QEMU and libvirt are > >> > running unprivileged inside a container. For any advanced nmetworking > >> > a completely separate component creates the TAP device and passes it > >> > into the container running QEMU. I don't think that the separate > >> > precisely matched helper binary would be something they can use, but > >> > it might be possible to expose a data file providing the BPF program > >> > blob and describing its maps. > >> > >> Well, "a data file providing the BPF program blob and describing its > >> maps" is basically what a BPF .o file is. It just happens to be encoded > >> in ELF format :) > >> > >> You can embed it into some other data structure and have libbpf load it > >> from a blob in memory as well as from the filesystem, though; that is > >> basically what the skeleton file does (notice the big character string > >> at the end, that's just the original .o file contents). > > > > Ok, in that case I'm really wondering why any of this helper program > > stuff was proposed. I recall the rationale was that it was impossible > > for an external program to load the BPF object on behalf of QEMU, > > because it would not know how todo that without QEMU specific > > knowledge. > > I'm not sure either. Was there some bits that initially needed to be set > before the program was loaded (read-only maps or something)? Also, > upstream does encourage the use of skeletons for embedding into > applications, so it's not an unreasonable thing to start with if you > don't have the kind of deployment constraints that Qemu does in this > case. > > > It looks like we can simply expose the BPF object blob to mgmt apps > > directly and get rid of this helper program entirely. > > I believe so, yes. You'd still need to be sure that the BPF object file > itself comes from a trusted place, but hopefully it should be enough to > load it from a known filesystem path? (Sorry if this is a stupid > question, I only have a fuzzy idea of how all the pieces fit together > here). It could be from a well known location on the filesystem, but might be better to make it possible to query it from QMP, which is mostly safe *provided* you've not yet started guest CPUs running. It could be queried at startup and then cached for future use. With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange
Re: [PATCH 3/5] qmp: Added the helper stamp check.
Daniel P. Berrangé writes: > On Tue, Feb 28, 2023 at 11:21:56PM +0100, Toke Høiland-Jørgensen wrote: >> Daniel P. Berrangé writes: >> >> > On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote: >> >> Daniel P. Berrangé writes: >> >> >> >> Just to interject a note on this here: the skeleton code is mostly a >> >> convenience feature used to embed BPF programs into the calling binary. >> >> It is perfectly possible to just have the BPF object file itself reside >> >> directly in the file system and just use the regular libbpf APIs to load >> >> it. Some things get a bit more cumbersome (mostly setting values of >> >> global variables, if the BPF program uses those). >> >> >> >> So the JSON example above could just be a regular compiled-from-clang >> >> BPF object file, and the management program can load that, inspect its >> >> contents using the libbpf APIs and pass the file descriptors on to Qemu. >> >> It's even possible to embed version information into this so that Qemu >> >> can check if it understands the format and bail out if it doesn't - just >> >> stick a version field in the configuration map as the first entry :) >> > >> > If all you have is the BPF object file is it possible to interrogate >> > it to get a list of all the maps, and get FDs associated for them ? >> > I had a look at the libbpf API and wasn't sure about that, it seemed >> > like you had to know the required maps upfront ? If it is possible >> > to auto-discover everything you need, soley from the BPF object file >> > as input, then just dealing with that in isolation would feel simpler. >> >> It is. You load the object file, and bpf_object__for_each_map() lets you >> discover which maps it contains, with the different bpf_map__*() APIs >> telling you the properties of that map (and you can modify them too >> before loading the object if needed). >> >> The only thing that's not in the object file is any initial data you >> want to put into the map(s). But except for read-only maps that can be >> added by userspace after loading the maps, so you could just let Qemu do >> that... >> >> > It occurrs to me that exposing the BPF program as data rather than >> > via binary will make more practical to integrate this into KubeVirt's >> > architecture. In their deployment setup both QEMU and libvirt are >> > running unprivileged inside a container. For any advanced nmetworking >> > a completely separate component creates the TAP device and passes it >> > into the container running QEMU. I don't think that the separate >> > precisely matched helper binary would be something they can use, but >> > it might be possible to expose a data file providing the BPF program >> > blob and describing its maps. >> >> Well, "a data file providing the BPF program blob and describing its >> maps" is basically what a BPF .o file is. It just happens to be encoded >> in ELF format :) >> >> You can embed it into some other data structure and have libbpf load it >> from a blob in memory as well as from the filesystem, though; that is >> basically what the skeleton file does (notice the big character string >> at the end, that's just the original .o file contents). > > Ok, in that case I'm really wondering why any of this helper program > stuff was proposed. I recall the rationale was that it was impossible > for an external program to load the BPF object on behalf of QEMU, > because it would not know how todo that without QEMU specific > knowledge. I'm not sure either. Was there some bits that initially needed to be set before the program was loaded (read-only maps or something)? Also, upstream does encourage the use of skeletons for embedding into applications, so it's not an unreasonable thing to start with if you don't have the kind of deployment constraints that Qemu does in this case. > It looks like we can simply expose the BPF object blob to mgmt apps > directly and get rid of this helper program entirely. I believe so, yes. You'd still need to be sure that the BPF object file itself comes from a trusted place, but hopefully it should be enough to load it from a known filesystem path? (Sorry if this is a stupid question, I only have a fuzzy idea of how all the pieces fit together here). -Toke
Re: [PATCH 3/5] qmp: Added the helper stamp check.
On Wed, Mar 01, 2023 at 08:49:42AM +0200, Yuri Benditovich wrote: > On Tue, Feb 28, 2023 at 8:05 PM Daniel P. Berrangé > wrote: > > > On Tue, Feb 28, 2023 at 11:56:27AM +0200, Yuri Benditovich wrote: > > > On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé > > > > > wrote: > > > > > > > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote: > > > > > Added a function to check the stamp in the helper. > > > > > eBPF helper should have a special symbol that generates during the > > build. > > > > > QEMU checks the helper and determines that it fits, so the helper > > > > > will produce proper output. > > > > > > > > I think this is quite limiting for in place upgrades. > > > > > > > > Consider this scenario > > > > > > > > * Host has QEMU 8.1.0 installed > > > > * VM is running QEMU 8.1.0 > > > > * QEMU 8.1.1 is released with a bug fix in the EBF program > > > > * Host is upgraded to QEMU 8.1.1 > > > > * User attempts to hotplug a NIC to the running VM > > > > > > > > IIUC this last step is going to fail because we'll be loading > > > > the EBF program from 8.1.1 and so its hash is different from > > > > that expected by the QEMU 8.1.0 that the pre-existing VM is > > > > running. > > > > > > > > Indeed we did not take in account the in-place upgrade. > > > > > > > > > > > > > If some changes to the EBF program are not going to be back > > > > compatible from the POV of the QEMU process, should we instead > > > > be versioning the EBF program. eg so new QEMU will ship both > > > > the old and new versions of the EBF program. > > > > > > This does not seem to be an elegant option: QEMU theoretically can > > include > > > different eBPF programs but it hardly can interface with each one of > > them. > > > The code of QEMU (access to eBPF maps etc) includes header files which > > eBPF > > > of the day is being built with them. > > > > > > I see 2 options to address this issue (of course there are more) > > > 1. Build and install qemu-rss-helper- executable. Libvirt will > > always > > > have a correct name, so for the running instance it will use > > > qemu-rss-helper-, for the new instance it will use > > > qemu-rss-helper- > > > > We'll get an ever growing number of program variants we need to > > build & distribute with each new QEMU release. > > > > New release of the qemu-rss-helper- will be created in fact only > when the eBPF binary is updated. > This does not happen on each release. But yes, this looks like versioning > of all the shared libraries. > > > > > > > 2. Build the helper executable and link it inside qemu as a blob. Libvirt > > > will always retrieve the executable to the temporary file name and use > > it. > > > So the retrieved helper will always be compatible with QEMU. I'm not sure > > > what is the most portable way to do that. > > > > QEMU is considered an untrusted process, so there's no way we're going > > to ask it to give us an ELF binary and then execute that in privileged > > context. > > > > > Does one of these seem suitable? > > > > Neither feels very appealing to me. > > > > I've been trying to understand the eBPF code we're dealing with in a > > little more detail. > > > > IIUC, QEMU, or rather the virtio-net driver needs to receive one FD > > for the BPF program, and one or more FDs for the BPF maps that the > > program uses. Currently it uses 3 maps, so needs 3 map FDs on top of > > the program FD. > > > > The helper program that is proposed here calls ebpf_rss_load() to > > load the program and get back a struct which gives access to the > > 4 FDs, which are then sent to the mgmt app, which forwards them > > onto QEMU. > > > > The ebpf_rss_load() method is making use of various structs that > > are specific to the RSS program implementation, but does not seems > > to do anything especially interesting. It calls into rss_bpf__open() > > which eventually gets around to calling rss_bpf__create_skeleton > > which is where the interesting stuff happens. > > > > This rss_bpf__create_skeleton() method is implemented in terms of > > totally generic libbpf APIs, and has the actual blob that is the > > BPF program. > > > > Looking at what this does, I feel it should be trivial for a mgmt > > app to implement equivalent logic to rss_bpf__create_skeleton in a > > generic manner, if we could just expose the program blob and the > > map names to the mgmt app. eg a simple json file > > > > { > > "maps": [ > > "tap_rss_map_configurations", > > "tap_rss_map_indirection_table", > > "tap_rss_map_toeplitz_key", > > ], > > "program": "the big blob encoded in base64..." > > } > > > > if we installed that file are /usr/share/qemu/bpf/net-rss.json > > then when a QEMU process is started, the mgmt app capture the > > data in this JSON file. It now has enough info to create the > > EBPF programs needed and pass the FDs over to QEMU. This would > > be robust against QEMU software upgrades, and not tied to the > > specific
Re: [PATCH 3/5] qmp: Added the helper stamp check.
On Tue, Feb 28, 2023 at 11:21:56PM +0100, Toke Høiland-Jørgensen wrote: > Daniel P. Berrangé writes: > > > On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote: > >> Daniel P. Berrangé writes: > >> > >> Just to interject a note on this here: the skeleton code is mostly a > >> convenience feature used to embed BPF programs into the calling binary. > >> It is perfectly possible to just have the BPF object file itself reside > >> directly in the file system and just use the regular libbpf APIs to load > >> it. Some things get a bit more cumbersome (mostly setting values of > >> global variables, if the BPF program uses those). > >> > >> So the JSON example above could just be a regular compiled-from-clang > >> BPF object file, and the management program can load that, inspect its > >> contents using the libbpf APIs and pass the file descriptors on to Qemu. > >> It's even possible to embed version information into this so that Qemu > >> can check if it understands the format and bail out if it doesn't - just > >> stick a version field in the configuration map as the first entry :) > > > > If all you have is the BPF object file is it possible to interrogate > > it to get a list of all the maps, and get FDs associated for them ? > > I had a look at the libbpf API and wasn't sure about that, it seemed > > like you had to know the required maps upfront ? If it is possible > > to auto-discover everything you need, soley from the BPF object file > > as input, then just dealing with that in isolation would feel simpler. > > It is. You load the object file, and bpf_object__for_each_map() lets you > discover which maps it contains, with the different bpf_map__*() APIs > telling you the properties of that map (and you can modify them too > before loading the object if needed). > > The only thing that's not in the object file is any initial data you > want to put into the map(s). But except for read-only maps that can be > added by userspace after loading the maps, so you could just let Qemu do > that... > > > It occurrs to me that exposing the BPF program as data rather than > > via binary will make more practical to integrate this into KubeVirt's > > architecture. In their deployment setup both QEMU and libvirt are > > running unprivileged inside a container. For any advanced nmetworking > > a completely separate component creates the TAP device and passes it > > into the container running QEMU. I don't think that the separate > > precisely matched helper binary would be something they can use, but > > it might be possible to expose a data file providing the BPF program > > blob and describing its maps. > > Well, "a data file providing the BPF program blob and describing its > maps" is basically what a BPF .o file is. It just happens to be encoded > in ELF format :) > > You can embed it into some other data structure and have libbpf load it > from a blob in memory as well as from the filesystem, though; that is > basically what the skeleton file does (notice the big character string > at the end, that's just the original .o file contents). Ok, in that case I'm really wondering why any of this helper program stuff was proposed. I recall the rationale was that it was impossible for an external program to load the BPF object on behalf of QEMU, because it would not know how todo that without QEMU specific knowledge. It looks like we can simply expose the BPF object blob to mgmt apps directly and get rid of this helper program entirely. With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH 3/5] qmp: Added the helper stamp check.
On Tue, Feb 28, 2023 at 8:05 PM Daniel P. Berrangé wrote: > On Tue, Feb 28, 2023 at 11:56:27AM +0200, Yuri Benditovich wrote: > > On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé > > > wrote: > > > > > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote: > > > > Added a function to check the stamp in the helper. > > > > eBPF helper should have a special symbol that generates during the > build. > > > > QEMU checks the helper and determines that it fits, so the helper > > > > will produce proper output. > > > > > > I think this is quite limiting for in place upgrades. > > > > > > Consider this scenario > > > > > > * Host has QEMU 8.1.0 installed > > > * VM is running QEMU 8.1.0 > > > * QEMU 8.1.1 is released with a bug fix in the EBF program > > > * Host is upgraded to QEMU 8.1.1 > > > * User attempts to hotplug a NIC to the running VM > > > > > > IIUC this last step is going to fail because we'll be loading > > > the EBF program from 8.1.1 and so its hash is different from > > > that expected by the QEMU 8.1.0 that the pre-existing VM is > > > running. > > > > > > Indeed we did not take in account the in-place upgrade. > > > > > > > > > If some changes to the EBF program are not going to be back > > > compatible from the POV of the QEMU process, should we instead > > > be versioning the EBF program. eg so new QEMU will ship both > > > the old and new versions of the EBF program. > > > > This does not seem to be an elegant option: QEMU theoretically can > include > > different eBPF programs but it hardly can interface with each one of > them. > > The code of QEMU (access to eBPF maps etc) includes header files which > eBPF > > of the day is being built with them. > > > > I see 2 options to address this issue (of course there are more) > > 1. Build and install qemu-rss-helper- executable. Libvirt will > always > > have a correct name, so for the running instance it will use > > qemu-rss-helper-, for the new instance it will use > > qemu-rss-helper- > > We'll get an ever growing number of program variants we need to > build & distribute with each new QEMU release. > New release of the qemu-rss-helper- will be created in fact only when the eBPF binary is updated. This does not happen on each release. But yes, this looks like versioning of all the shared libraries. > > > 2. Build the helper executable and link it inside qemu as a blob. Libvirt > > will always retrieve the executable to the temporary file name and use > it. > > So the retrieved helper will always be compatible with QEMU. I'm not sure > > what is the most portable way to do that. > > QEMU is considered an untrusted process, so there's no way we're going > to ask it to give us an ELF binary and then execute that in privileged > context. > > > Does one of these seem suitable? > > Neither feels very appealing to me. > > I've been trying to understand the eBPF code we're dealing with in a > little more detail. > > IIUC, QEMU, or rather the virtio-net driver needs to receive one FD > for the BPF program, and one or more FDs for the BPF maps that the > program uses. Currently it uses 3 maps, so needs 3 map FDs on top of > the program FD. > > The helper program that is proposed here calls ebpf_rss_load() to > load the program and get back a struct which gives access to the > 4 FDs, which are then sent to the mgmt app, which forwards them > onto QEMU. > > The ebpf_rss_load() method is making use of various structs that > are specific to the RSS program implementation, but does not seems > to do anything especially interesting. It calls into rss_bpf__open() > which eventually gets around to calling rss_bpf__create_skeleton > which is where the interesting stuff happens. > > This rss_bpf__create_skeleton() method is implemented in terms of > totally generic libbpf APIs, and has the actual blob that is the > BPF program. > > Looking at what this does, I feel it should be trivial for a mgmt > app to implement equivalent logic to rss_bpf__create_skeleton in a > generic manner, if we could just expose the program blob and the > map names to the mgmt app. eg a simple json file > > { > "maps": [ > "tap_rss_map_configurations", > "tap_rss_map_indirection_table", > "tap_rss_map_toeplitz_key", > ], > "program": "the big blob encoded in base64..." > } > > if we installed that file are /usr/share/qemu/bpf/net-rss.json > then when a QEMU process is started, the mgmt app capture the > data in this JSON file. It now has enough info to create the > EBPF programs needed and pass the FDs over to QEMU. This would > be robust against QEMU software upgrades, and not tied to the > specific EBPF program imlp. We can add or remove maps / change > their names etc any time, as the details in the JSON descriptor > can be updated. This avoids need for any special helper program > to be provided by QEMU with the problems that is throwing up > for us. > If I understand correctly, the libvirt will have
Re: [PATCH 3/5] qmp: Added the helper stamp check.
Daniel P. Berrangé writes: > On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote: >> Daniel P. Berrangé writes: >> >> Just to interject a note on this here: the skeleton code is mostly a >> convenience feature used to embed BPF programs into the calling binary. >> It is perfectly possible to just have the BPF object file itself reside >> directly in the file system and just use the regular libbpf APIs to load >> it. Some things get a bit more cumbersome (mostly setting values of >> global variables, if the BPF program uses those). >> >> So the JSON example above could just be a regular compiled-from-clang >> BPF object file, and the management program can load that, inspect its >> contents using the libbpf APIs and pass the file descriptors on to Qemu. >> It's even possible to embed version information into this so that Qemu >> can check if it understands the format and bail out if it doesn't - just >> stick a version field in the configuration map as the first entry :) > > If all you have is the BPF object file is it possible to interrogate > it to get a list of all the maps, and get FDs associated for them ? > I had a look at the libbpf API and wasn't sure about that, it seemed > like you had to know the required maps upfront ? If it is possible > to auto-discover everything you need, soley from the BPF object file > as input, then just dealing with that in isolation would feel simpler. It is. You load the object file, and bpf_object__for_each_map() lets you discover which maps it contains, with the different bpf_map__*() APIs telling you the properties of that map (and you can modify them too before loading the object if needed). The only thing that's not in the object file is any initial data you want to put into the map(s). But except for read-only maps that can be added by userspace after loading the maps, so you could just let Qemu do that... > It occurrs to me that exposing the BPF program as data rather than > via binary will make more practical to integrate this into KubeVirt's > architecture. In their deployment setup both QEMU and libvirt are > running unprivileged inside a container. For any advanced nmetworking > a completely separate component creates the TAP device and passes it > into the container running QEMU. I don't think that the separate > precisely matched helper binary would be something they can use, but > it might be possible to expose a data file providing the BPF program > blob and describing its maps. Well, "a data file providing the BPF program blob and describing its maps" is basically what a BPF .o file is. It just happens to be encoded in ELF format :) You can embed it into some other data structure and have libbpf load it from a blob in memory as well as from the filesystem, though; that is basically what the skeleton file does (notice the big character string at the end, that's just the original .o file contents). -Toke
Re: [PATCH 3/5] qmp: Added the helper stamp check.
On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote: > Daniel P. Berrangé writes: > > Just to interject a note on this here: the skeleton code is mostly a > convenience feature used to embed BPF programs into the calling binary. > It is perfectly possible to just have the BPF object file itself reside > directly in the file system and just use the regular libbpf APIs to load > it. Some things get a bit more cumbersome (mostly setting values of > global variables, if the BPF program uses those). > > So the JSON example above could just be a regular compiled-from-clang > BPF object file, and the management program can load that, inspect its > contents using the libbpf APIs and pass the file descriptors on to Qemu. > It's even possible to embed version information into this so that Qemu > can check if it understands the format and bail out if it doesn't - just > stick a version field in the configuration map as the first entry :) If all you have is the BPF object file is it possible to interrogate it to get a list of all the maps, and get FDs associated for them ? I had a look at the libbpf API and wasn't sure about that, it seemed like you had to know the required maps upfront ? If it is possible to auto-discover everything you need, soley from the BPF object file as input, then just dealing with that in isolation would feel simpler. With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH 3/5] qmp: Added the helper stamp check.
On Tue, Feb 28, 2023 at 06:04:51PM +, Daniel P. Berrangé wrote: > On Tue, Feb 28, 2023 at 11:56:27AM +0200, Yuri Benditovich wrote: > > On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé > > wrote: > > > > > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote: > > > > Added a function to check the stamp in the helper. > > > > eBPF helper should have a special symbol that generates during the > > > > build. > > > > QEMU checks the helper and determines that it fits, so the helper > > > > will produce proper output. > > > > > > I think this is quite limiting for in place upgrades. > > > > > > Consider this scenario > > > > > > * Host has QEMU 8.1.0 installed > > > * VM is running QEMU 8.1.0 > > > * QEMU 8.1.1 is released with a bug fix in the EBF program > > > * Host is upgraded to QEMU 8.1.1 > > > * User attempts to hotplug a NIC to the running VM > > > > > > IIUC this last step is going to fail because we'll be loading > > > the EBF program from 8.1.1 and so its hash is different from > > > that expected by the QEMU 8.1.0 that the pre-existing VM is > > > running. > > > > > > Indeed we did not take in account the in-place upgrade. > > > > > > > > > If some changes to the EBF program are not going to be back > > > compatible from the POV of the QEMU process, should we instead > > > be versioning the EBF program. eg so new QEMU will ship both > > > the old and new versions of the EBF program. > > > > This does not seem to be an elegant option: QEMU theoretically can include > > different eBPF programs but it hardly can interface with each one of them. > > The code of QEMU (access to eBPF maps etc) includes header files which eBPF > > of the day is being built with them. > > > > I see 2 options to address this issue (of course there are more) > > 1. Build and install qemu-rss-helper- executable. Libvirt will always > > have a correct name, so for the running instance it will use > > qemu-rss-helper-, for the new instance it will use > > qemu-rss-helper- > > We'll get an ever growing number of program variants we need to > build & distribute with each new QEMU release. > > > 2. Build the helper executable and link it inside qemu as a blob. Libvirt > > will always retrieve the executable to the temporary file name and use it. > > So the retrieved helper will always be compatible with QEMU. I'm not sure > > what is the most portable way to do that. > > QEMU is considered an untrusted process, so there's no way we're going > to ask it to give us an ELF binary and then execute that in privileged > context. > > > Does one of these seem suitable? > > Neither feels very appealing to me. > > I've been trying to understand the eBPF code we're dealing with in a > little more detail. > > IIUC, QEMU, or rather the virtio-net driver needs to receive one FD > for the BPF program, and one or more FDs for the BPF maps that the > program uses. Currently it uses 3 maps, so needs 3 map FDs on top of > the program FD. > > The helper program that is proposed here calls ebpf_rss_load() to > load the program and get back a struct which gives access to the > 4 FDs, which are then sent to the mgmt app, which forwards them > onto QEMU. > > The ebpf_rss_load() method is making use of various structs that > are specific to the RSS program implementation, but does not seems > to do anything especially interesting. It calls into rss_bpf__open() > which eventually gets around to calling rss_bpf__create_skeleton > which is where the interesting stuff happens. > > This rss_bpf__create_skeleton() method is implemented in terms of > totally generic libbpf APIs, and has the actual blob that is the > BPF program. > > Looking at what this does, I feel it should be trivial for a mgmt > app to implement equivalent logic to rss_bpf__create_skeleton in a > generic manner, if we could just expose the program blob and the > map names to the mgmt app. eg a simple json file > > { > "maps": [ > "tap_rss_map_configurations", > "tap_rss_map_indirection_table", > "tap_rss_map_toeplitz_key", > ], > "program": "the big blob encoded in base64..." > } > > if we installed that file are /usr/share/qemu/bpf/net-rss.json > then when a QEMU process is started, the mgmt app capture the > data in this JSON file. It now has enough info to create the > EBPF programs needed and pass the FDs over to QEMU. This would > be robust against QEMU software upgrades, and not tied to the > specific EBPF program imlp. We can add or remove maps / change > their names etc any time, as the details in the JSON descriptor > can be updated. This avoids need for any special helper program > to be provided by QEMU with the problems that is throwing up > for us. It occurrs to me that exposing the BPF program as data rather than via binary will make more practical to integrate this into KubeVirt's architecture. In their deployment setup both QEMU and libvirt are running unprivileged inside a
Re: [PATCH 3/5] qmp: Added the helper stamp check.
Daniel P. Berrangé writes: > On Tue, Feb 28, 2023 at 11:56:27AM +0200, Yuri Benditovich wrote: >> On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé >> wrote: >> >> > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote: >> > > Added a function to check the stamp in the helper. >> > > eBPF helper should have a special symbol that generates during the build. >> > > QEMU checks the helper and determines that it fits, so the helper >> > > will produce proper output. >> > >> > I think this is quite limiting for in place upgrades. >> > >> > Consider this scenario >> > >> > * Host has QEMU 8.1.0 installed >> > * VM is running QEMU 8.1.0 >> > * QEMU 8.1.1 is released with a bug fix in the EBF program >> > * Host is upgraded to QEMU 8.1.1 >> > * User attempts to hotplug a NIC to the running VM >> > >> > IIUC this last step is going to fail because we'll be loading >> > the EBF program from 8.1.1 and so its hash is different from >> > that expected by the QEMU 8.1.0 that the pre-existing VM is >> > running. >> > >> > Indeed we did not take in account the in-place upgrade. >> >> >> >> > If some changes to the EBF program are not going to be back >> > compatible from the POV of the QEMU process, should we instead >> > be versioning the EBF program. eg so new QEMU will ship both >> > the old and new versions of the EBF program. >> >> This does not seem to be an elegant option: QEMU theoretically can include >> different eBPF programs but it hardly can interface with each one of them. >> The code of QEMU (access to eBPF maps etc) includes header files which eBPF >> of the day is being built with them. >> >> I see 2 options to address this issue (of course there are more) >> 1. Build and install qemu-rss-helper- executable. Libvirt will always >> have a correct name, so for the running instance it will use >> qemu-rss-helper-, for the new instance it will use >> qemu-rss-helper- > > We'll get an ever growing number of program variants we need to > build & distribute with each new QEMU release. > >> 2. Build the helper executable and link it inside qemu as a blob. Libvirt >> will always retrieve the executable to the temporary file name and use it. >> So the retrieved helper will always be compatible with QEMU. I'm not sure >> what is the most portable way to do that. > > QEMU is considered an untrusted process, so there's no way we're going > to ask it to give us an ELF binary and then execute that in privileged > context. > >> Does one of these seem suitable? > > Neither feels very appealing to me. > > I've been trying to understand the eBPF code we're dealing with in a > little more detail. > > IIUC, QEMU, or rather the virtio-net driver needs to receive one FD > for the BPF program, and one or more FDs for the BPF maps that the > program uses. Currently it uses 3 maps, so needs 3 map FDs on top of > the program FD. > > The helper program that is proposed here calls ebpf_rss_load() to > load the program and get back a struct which gives access to the > 4 FDs, which are then sent to the mgmt app, which forwards them > onto QEMU. > > The ebpf_rss_load() method is making use of various structs that > are specific to the RSS program implementation, but does not seems > to do anything especially interesting. It calls into rss_bpf__open() > which eventually gets around to calling rss_bpf__create_skeleton > which is where the interesting stuff happens. > > This rss_bpf__create_skeleton() method is implemented in terms of > totally generic libbpf APIs, and has the actual blob that is the > BPF program. > > Looking at what this does, I feel it should be trivial for a mgmt > app to implement equivalent logic to rss_bpf__create_skeleton in a > generic manner, if we could just expose the program blob and the > map names to the mgmt app. eg a simple json file > > { > "maps": [ > "tap_rss_map_configurations", > "tap_rss_map_indirection_table", > "tap_rss_map_toeplitz_key", > ], > "program": "the big blob encoded in base64..." > } > > if we installed that file are /usr/share/qemu/bpf/net-rss.json > then when a QEMU process is started, the mgmt app capture the > data in this JSON file. It now has enough info to create the > EBPF programs needed and pass the FDs over to QEMU. This would > be robust against QEMU software upgrades, and not tied to the > specific EBPF program imlp. We can add or remove maps / change > their names etc any time, as the details in the JSON descriptor > can be updated. This avoids need for any special helper program > to be provided by QEMU with the problems that is throwing up > for us. Just to interject a note on this here: the skeleton code is mostly a convenience feature used to embed BPF programs into the calling binary. It is perfectly possible to just have the BPF object file itself reside directly in the file system and just use the regular libbpf APIs to load it. Some things get a bit more cumbersome (mostly setting
Re: [PATCH 3/5] qmp: Added the helper stamp check.
On Tue, Feb 28, 2023 at 11:56:27AM +0200, Yuri Benditovich wrote: > On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé > wrote: > > > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote: > > > Added a function to check the stamp in the helper. > > > eBPF helper should have a special symbol that generates during the build. > > > QEMU checks the helper and determines that it fits, so the helper > > > will produce proper output. > > > > I think this is quite limiting for in place upgrades. > > > > Consider this scenario > > > > * Host has QEMU 8.1.0 installed > > * VM is running QEMU 8.1.0 > > * QEMU 8.1.1 is released with a bug fix in the EBF program > > * Host is upgraded to QEMU 8.1.1 > > * User attempts to hotplug a NIC to the running VM > > > > IIUC this last step is going to fail because we'll be loading > > the EBF program from 8.1.1 and so its hash is different from > > that expected by the QEMU 8.1.0 that the pre-existing VM is > > running. > > > > Indeed we did not take in account the in-place upgrade. > > > > > If some changes to the EBF program are not going to be back > > compatible from the POV of the QEMU process, should we instead > > be versioning the EBF program. eg so new QEMU will ship both > > the old and new versions of the EBF program. > > This does not seem to be an elegant option: QEMU theoretically can include > different eBPF programs but it hardly can interface with each one of them. > The code of QEMU (access to eBPF maps etc) includes header files which eBPF > of the day is being built with them. > > I see 2 options to address this issue (of course there are more) > 1. Build and install qemu-rss-helper- executable. Libvirt will always > have a correct name, so for the running instance it will use > qemu-rss-helper-, for the new instance it will use > qemu-rss-helper- We'll get an ever growing number of program variants we need to build & distribute with each new QEMU release. > 2. Build the helper executable and link it inside qemu as a blob. Libvirt > will always retrieve the executable to the temporary file name and use it. > So the retrieved helper will always be compatible with QEMU. I'm not sure > what is the most portable way to do that. QEMU is considered an untrusted process, so there's no way we're going to ask it to give us an ELF binary and then execute that in privileged context. > Does one of these seem suitable? Neither feels very appealing to me. I've been trying to understand the eBPF code we're dealing with in a little more detail. IIUC, QEMU, or rather the virtio-net driver needs to receive one FD for the BPF program, and one or more FDs for the BPF maps that the program uses. Currently it uses 3 maps, so needs 3 map FDs on top of the program FD. The helper program that is proposed here calls ebpf_rss_load() to load the program and get back a struct which gives access to the 4 FDs, which are then sent to the mgmt app, which forwards them onto QEMU. The ebpf_rss_load() method is making use of various structs that are specific to the RSS program implementation, but does not seems to do anything especially interesting. It calls into rss_bpf__open() which eventually gets around to calling rss_bpf__create_skeleton which is where the interesting stuff happens. This rss_bpf__create_skeleton() method is implemented in terms of totally generic libbpf APIs, and has the actual blob that is the BPF program. Looking at what this does, I feel it should be trivial for a mgmt app to implement equivalent logic to rss_bpf__create_skeleton in a generic manner, if we could just expose the program blob and the map names to the mgmt app. eg a simple json file { "maps": [ "tap_rss_map_configurations", "tap_rss_map_indirection_table", "tap_rss_map_toeplitz_key", ], "program": "the big blob encoded in base64..." } if we installed that file are /usr/share/qemu/bpf/net-rss.json then when a QEMU process is started, the mgmt app capture the data in this JSON file. It now has enough info to create the EBPF programs needed and pass the FDs over to QEMU. This would be robust against QEMU software upgrades, and not tied to the specific EBPF program imlp. We can add or remove maps / change their names etc any time, as the details in the JSON descriptor can be updated. This avoids need for any special helper program to be provided by QEMU with the problems that is throwing up for us. What am I missing ? This seems pretty straightforward to achieve from what I see. With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH 3/5] qmp: Added the helper stamp check.
On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé wrote: > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote: > > Added a function to check the stamp in the helper. > > eBPF helper should have a special symbol that generates during the build. > > QEMU checks the helper and determines that it fits, so the helper > > will produce proper output. > > I think this is quite limiting for in place upgrades. > > Consider this scenario > > * Host has QEMU 8.1.0 installed > * VM is running QEMU 8.1.0 > * QEMU 8.1.1 is released with a bug fix in the EBF program > * Host is upgraded to QEMU 8.1.1 > * User attempts to hotplug a NIC to the running VM > > IIUC this last step is going to fail because we'll be loading > the EBF program from 8.1.1 and so its hash is different from > that expected by the QEMU 8.1.0 that the pre-existing VM is > running. > > Indeed we did not take in account the in-place upgrade. > If some changes to the EBF program are not going to be back > compatible from the POV of the QEMU process, should we instead > be versioning the EBF program. eg so new QEMU will ship both > the old and new versions of the EBF program. > > This does not seem to be an elegant option: QEMU theoretically can include different eBPF programs but it hardly can interface with each one of them. The code of QEMU (access to eBPF maps etc) includes header files which eBPF of the day is being built with them. I see 2 options to address this issue (of course there are more) 1. Build and install qemu-rss-helper- executable. Libvirt will always have a correct name, so for the running instance it will use qemu-rss-helper-, for the new instance it will use qemu-rss-helper- 2. Build the helper executable and link it inside qemu as a blob. Libvirt will always retrieve the executable to the temporary file name and use it. So the retrieved helper will always be compatible with QEMU. I'm not sure what is the most portable way to do that. Daniel, Does one of these seem suitable? > With regards, > Daniel > -- > |: https://berrange.com -o- > https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- > https://fstop138.berrange.com :| > |: https://entangle-photo.org-o- > https://www.instagram.com/dberrange :| > >
Re: [PATCH 3/5] qmp: Added the helper stamp check.
Andrew Melnichenko writes: > Hi all, > > On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé > wrote: >> >> On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote: >> > Added a function to check the stamp in the helper. >> > eBPF helper should have a special symbol that generates during the build. >> > QEMU checks the helper and determines that it fits, so the helper >> > will produce proper output. >> >> I think this is quite limiting for in place upgrades. >> >> Consider this scenario >> >> * Host has QEMU 8.1.0 installed >> * VM is running QEMU 8.1.0 >> * QEMU 8.1.1 is released with a bug fix in the EBF program >> * Host is upgraded to QEMU 8.1.1 >> * User attempts to hotplug a NIC to the running VM >> >> IIUC this last step is going to fail because we'll be loading >> the EBF program from 8.1.1 and so its hash is different from >> that expected by the QEMU 8.1.0 that the pre-existing VM is >> running. >> >> If some changes to the EBF program are not going to be back >> compatible from the POV of the QEMU process, should we instead >> be versioning the EBF program. eg so new QEMU will ship both >> the old and new versions of the EBF program. >> > > I think it's too complicated to maintain backward compatibility with > eBPF programs in "one package". > As we can see, the eBPF skeleton may be changed not only by bugfix but > with changes in libbpf(g.e. libbpf 1.0.1+). Hmm, what change in libbpf 1.0.1 affects the skeleton format? > This may lead to issues when with a newer environment you can't > consistently recreate the "old" skeleton. This should be detectable, though? As in, if you know that a new version breaks compatibility you just bump the version number regardless of the underlying application version? -Toke
Re: [PATCH 3/5] qmp: Added the helper stamp check.
Hi all, On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé wrote: > > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote: > > Added a function to check the stamp in the helper. > > eBPF helper should have a special symbol that generates during the build. > > QEMU checks the helper and determines that it fits, so the helper > > will produce proper output. > > I think this is quite limiting for in place upgrades. > > Consider this scenario > > * Host has QEMU 8.1.0 installed > * VM is running QEMU 8.1.0 > * QEMU 8.1.1 is released with a bug fix in the EBF program > * Host is upgraded to QEMU 8.1.1 > * User attempts to hotplug a NIC to the running VM > > IIUC this last step is going to fail because we'll be loading > the EBF program from 8.1.1 and so its hash is different from > that expected by the QEMU 8.1.0 that the pre-existing VM is > running. > > If some changes to the EBF program are not going to be back > compatible from the POV of the QEMU process, should we instead > be versioning the EBF program. eg so new QEMU will ship both > the old and new versions of the EBF program. > I think it's too complicated to maintain backward compatibility with eBPF programs in "one package". As we can see, the eBPF skeleton may be changed not only by bugfix but with changes in libbpf(g.e. libbpf 1.0.1+). This may lead to issues when with a newer environment you can't consistently recreate the "old" skeleton. To solve the case - we need to "save" the halper from an old package somehow. > > With regards, > Daniel > -- > |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o-https://fstop138.berrange.com :| > |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| >
Re: [PATCH 3/5] qmp: Added the helper stamp check.
On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote: > Added a function to check the stamp in the helper. > eBPF helper should have a special symbol that generates during the build. > QEMU checks the helper and determines that it fits, so the helper > will produce proper output. I think this is quite limiting for in place upgrades. Consider this scenario * Host has QEMU 8.1.0 installed * VM is running QEMU 8.1.0 * QEMU 8.1.1 is released with a bug fix in the EBF program * Host is upgraded to QEMU 8.1.1 * User attempts to hotplug a NIC to the running VM IIUC this last step is going to fail because we'll be loading the EBF program from 8.1.1 and so its hash is different from that expected by the QEMU 8.1.0 that the pre-existing VM is running. If some changes to the EBF program are not going to be back compatible from the POV of the QEMU process, should we instead be versioning the EBF program. eg so new QEMU will ship both the old and new versions of the EBF program. With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
[PATCH 3/5] qmp: Added the helper stamp check.
Added a function to check the stamp in the helper. eBPF helper should have a special symbol that generates during the build. QEMU checks the helper and determines that it fits, so the helper will produce proper output. Signed-off-by: Andrew Melnychenko --- meson.build| 10 + monitor/meson.build| 1 + monitor/qemu-ebpf-rss-helper-stamp-utils.c | 322 + monitor/qemu-ebpf-rss-helper-stamp-utils.h | 39 +++ 4 files changed, 372 insertions(+) create mode 100644 monitor/qemu-ebpf-rss-helper-stamp-utils.c create mode 100644 monitor/qemu-ebpf-rss-helper-stamp-utils.h diff --git a/meson.build b/meson.build index a76c855312..b409912aed 100644 --- a/meson.build +++ b/meson.build @@ -2868,6 +2868,16 @@ foreach d : hx_headers endforeach genh += hxdep +ebpf_rss_helper_stamp = custom_target( +'qemu-ebpf-rss-helper-stamp.h', +output : 'qemu-ebpf-rss-helper-stamp.h', +input : 'ebpf/rss.bpf.skeleton.h', +command : [python, '-c', 'import hashlib; print(\'#define QEMU_EBPF_RSS_HELPER_STAMP qemuEbpfRssHelperStamp_{}\'.format(hashlib.sha1(open(\'@INPUT@\', \'rb\').read()).hexdigest()))'], +capture: true, +) + +genh += ebpf_rss_helper_stamp + ### # Collect sources # ### diff --git a/monitor/meson.build b/monitor/meson.build index ccb4d1a8e6..36de73414b 100644 --- a/monitor/meson.build +++ b/monitor/meson.build @@ -6,6 +6,7 @@ softmmu_ss.add(files( 'hmp.c', )) softmmu_ss.add([spice_headers, files('qmp-cmds.c')]) +softmmu_ss.add(files('qemu-ebpf-rss-helper-stamp-utils.c')) specific_ss.add(when: 'CONFIG_SOFTMMU', if_true: [files( 'hmp-cmds-target.c', 'hmp-target.c'), spice]) diff --git a/monitor/qemu-ebpf-rss-helper-stamp-utils.c b/monitor/qemu-ebpf-rss-helper-stamp-utils.c new file mode 100644 index 00..23efc36ef0 --- /dev/null +++ b/monitor/qemu-ebpf-rss-helper-stamp-utils.c @@ -0,0 +1,322 @@ +/* + * QEMU helper stamp check utils. + * + * Developed by Daynix Computing LTD (http://www.daynix.com) + * + * Authors: + * Andrew Melnychenko + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + * Description: This file mostly implements helper stamp checking. + * The stamp is implemented in a similar way as in qemu modules. + * The helper should contain a specific symbol. + * Not in a similar way is symbol checking - here we parse + * the ELF file. For now, only eBPF helper contains + * the stamp, and the stamp is generated from + * sha1 ebpf/rss.bpf.skeleton.h (see meson.build). + */ + +#include "qemu/osdep.h" +#include "elf.h" +#include "qemu-ebpf-rss-helper-stamp-utils.h" + +#include + +#ifdef CONFIG_LINUX + +static void *file_allocate_and_read(int fd, off_t off, size_t size) +{ +void *data; +int err; + +if (fd < 0) { +return NULL; +} + +err = lseek(fd, off, SEEK_SET); +if (err < 0) { +return NULL; +} + +data = g_new0(char, size); +if (data == NULL) { +return NULL; +} + +err = read(fd, data, size); +if (err < 0) { +g_free(data); +return NULL; +} + +return data; +} + +static Elf64_Shdr *elf64_get_section_table(int fd, Elf64_Ehdr *elf_header) +{ +if (elf_header == NULL) { +return NULL; +} +return (Elf64_Shdr *)file_allocate_and_read(fd, elf_header->e_shoff, + elf_header->e_shnum * elf_header->e_shentsize); +} + +static Elf32_Shdr *elf32_get_section_table(int fd, Elf32_Ehdr *elf_header) +{ +if (elf_header == NULL) { +return NULL; +} +return (Elf32_Shdr *)file_allocate_and_read(fd, elf_header->e_shoff, + elf_header->e_shnum * elf_header->e_shentsize); +} + +static void *elf64_get_section_data(int fd, const Elf64_Shdr* section_header) +{ +if (fd < 0 || section_header == NULL) { +return NULL; +} +return file_allocate_and_read(fd, section_header->sh_offset, + section_header->sh_size); +} + +static void *elf32_get_section_data(int fd, const Elf32_Shdr* section_header) +{ +if (fd < 0 || section_header == NULL) { +return NULL; +} +return file_allocate_and_read(fd, section_header->sh_offset, + section_header->sh_size); +} + +static bool elf64_check_symbol_in_symbol_table(int fd, + Elf64_Shdr *section_table, + Elf64_Shdr *symbol_section, + const char *symbol) +{ +Elf64_Sym *symbol_table; +char *string_table; +uint32_t i; +bool ret = false; + +symbol_table = (Elf64_Sym *) elf64_get_section_data(fd, symbol_section); +if (symbol_table == NULL) { +return
[PATCH 3/5] qmp: Added the helper stamp check.
Added function to check the stamp in the helper. eBPF helper should have a special symbol that generates during build. QEMU checks the helper and determinates that it fits, so the helper will produce proper output. Signed-off-by: Andrew Melnychenko --- meson.build | 10 + monitor/meson.build | 1 + monitor/qemu-helper-stamp-utils.c | 297 ++ monitor/qemu-helper-stamp-utils.h | 24 +++ 4 files changed, 332 insertions(+) create mode 100644 monitor/qemu-helper-stamp-utils.c create mode 100644 monitor/qemu-helper-stamp-utils.h diff --git a/meson.build b/meson.build index 626cf932c1..257e51d91b 100644 --- a/meson.build +++ b/meson.build @@ -1757,6 +1757,16 @@ foreach d : hx_headers endforeach genh += hxdep +helper_stamp = custom_target( +'qemu-helper-stamp.h', +output : 'qemu-helper-stamp.h', +input : 'ebpf/rss.bpf.skeleton.h', +command : [python, '-c', 'import hashlib; print(\'#define QEMU_HELPER_STAMP qemuHelperStamp_{}\'.format(hashlib.sha1(open(\'@INPUT@\', \'rb\').read()).hexdigest()))'], +capture: true, +) + +genh += helper_stamp + ### # Collect sources # ### diff --git a/monitor/meson.build b/monitor/meson.build index 6d00985ace..2b6b39549b 100644 --- a/monitor/meson.build +++ b/monitor/meson.build @@ -5,5 +5,6 @@ softmmu_ss.add(files( 'hmp.c', )) softmmu_ss.add([spice_headers, files('qmp-cmds.c')]) +softmmu_ss.add(files('qemu-helper-stamp-utils.c')) specific_ss.add(when: 'CONFIG_SOFTMMU', if_true: [files('misc.c'), spice]) diff --git a/monitor/qemu-helper-stamp-utils.c b/monitor/qemu-helper-stamp-utils.c new file mode 100644 index 00..d34c3b94c5 --- /dev/null +++ b/monitor/qemu-helper-stamp-utils.c @@ -0,0 +1,297 @@ +/* + * QEMU helper stamp check utils. + * + * Developed by Daynix Computing LTD (http://www.daynix.com) + * + * Authors: + * Andrew Melnychenko + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + * Description: This file mostly implements helper stamp checking. + * The stamp is implemented in a similar way as in qemu modules. + * The helper should contain a specific symbol. + * Not in a similar way is symbol checking - here we parse + * the ELF file. For now(10.07.2021), only eBPF helper contains + * the stamp, and the stamp is generated from + * sha1 ebpf/rss.bpf.skeleton.h (see meson.build). + */ + +#include "qemu/osdep.h" +#include "elf.h" +#include "qemu-helper-stamp-utils.h" + +#include + +#ifdef CONFIG_LINUX + +static void *file_allocate_and_read(int fd, off_t off, size_t size) +{ +void *data; +int err; + +if (fd < 0) { +return NULL; +} + +err = lseek(fd, off, SEEK_SET); +if (err < 0) { +return NULL; +} + +data = g_new0(char, size); +if (data == NULL) { +return NULL; +} + +err = read(fd, data, size); +if (err < 0) { +g_free(data); +return NULL; +} + +return data; +} + +static Elf64_Shdr *elf64_get_section_table(int fd, Elf64_Ehdr *elf_header) +{ +if (elf_header == NULL) { +return NULL; +} +return (Elf64_Shdr *)file_allocate_and_read(fd, elf_header->e_shoff, + elf_header->e_shnum * elf_header->e_shentsize); +} + +static Elf32_Shdr *elf32_get_section_table(int fd, Elf32_Ehdr *elf_header) +{ +if (elf_header == NULL) { +return NULL; +} +return (Elf32_Shdr *)file_allocate_and_read(fd, elf_header->e_shoff, + elf_header->e_shnum * elf_header->e_shentsize); +} + +static void *elf64_get_section_data(int fd, const Elf64_Shdr* section_header) +{ +if (fd < 0 || section_header == NULL) { +return NULL; +} +return file_allocate_and_read(fd, section_header->sh_offset, + section_header->sh_size); +} + +static void *elf32_get_section_data(int fd, const Elf32_Shdr* section_header) +{ +if (fd < 0 || section_header == NULL) { +return NULL; +} +return file_allocate_and_read(fd, section_header->sh_offset, + section_header->sh_size); +} + +static bool elf64_check_symbol_in_symbol_table(int fd, + Elf64_Shdr *section_table, + Elf64_Shdr *symbol_section, + const char *symbol) +{ +Elf64_Sym *symbol_table; +char *string_table; +uint32_t i; +bool ret = false; + +symbol_table = (Elf64_Sym *) elf64_get_section_data(fd, symbol_section); +if (symbol_table == NULL) { +return false; +} + +string_table = (char *) elf64_get_section_data( +fd, section_table + symbol_section->sh_link); +if (string_table == NULL) { +