Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-03-22 Thread Daniel P . Berrangé
On Wed, Mar 22, 2023 at 03:26:59PM +0200, Andrew Melnichenko wrote:
> Hi all,
> I've researched an issue a bit. The solution with passing eBPF blob
> and loading in the Libvirt looks promising.
> Overall, the possible solution looks like this:
>  * Libvirt checks virtio-net properties and understands that eBPF
> steering may be required.
>  * Libvirt requests eBPF blob through QMP.
>  * Libvirt loads blob for virtio-net and passes fds from eBPF to QEMU.
> 
> I think that it's a good idea to pass only eBPF blob without
> additional metainformation. Most metainfo that we need could be
> retrieved from eBPF blob, and the only question is to pass fds
> sequence to QEMU.
> I propose to pass them as they appear in the blob itself, like
> "virtio-net-pci,ebpf_rss_fds=,,,...".

Using ',' for separating FDs is a bad idea, because ',' is already
used for separating QemuOpts arguments.

With -netdev we use ':' for spearating FDs with vhostfds= and fds=
arguments, so I'd suggest following that practice.

> Also, I think it's a good idea to make a "general" QMP request for
> eBPF blobs. Something like "request_ebpf "(g.e "request_ebpf
> virtio-net-rss").

That's reasonable as a future proofing idea I think.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-03-22 Thread Andrew Melnichenko
Hi all,
I've researched an issue a bit. The solution with passing eBPF blob
and loading in the Libvirt looks promising.
Overall, the possible solution looks like this:
 * Libvirt checks virtio-net properties and understands that eBPF
steering may be required.
 * Libvirt requests eBPF blob through QMP.
 * Libvirt loads blob for virtio-net and passes fds from eBPF to QEMU.

I think that it's a good idea to pass only eBPF blob without
additional metainformation. Most metainfo that we need could be
retrieved from eBPF blob, and the only question is to pass fds
sequence to QEMU.
I propose to pass them as they appear in the blob itself, like
"virtio-net-pci,ebpf_rss_fds=,,,...".
Also, I think it's a good idea to make a "general" QMP request for
eBPF blobs. Something like "request_ebpf "(g.e "request_ebpf
virtio-net-rss").

I'll prepare new RFC patches if you have questions or something to
discuss, please let me know.

On Thu, Mar 2, 2023 at 12:40 AM Toke Høiland-Jørgensen  wrote:
>
> Daniel P. Berrangé  writes:
>
> > On Wed, Mar 01, 2023 at 03:53:47PM +0100, Toke Høiland-Jørgensen wrote:
> >> Daniel P. Berrangé  writes:
> >>
> >> > On Tue, Feb 28, 2023 at 11:21:56PM +0100, Toke Høiland-Jørgensen wrote:
> >> >> Daniel P. Berrangé  writes:
> >> >>
> >> >> > On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen 
> >> >> > wrote:
> >> >> >> Daniel P. Berrangé  writes:
> >> >> >>
> >> >> >> Just to interject a note on this here: the skeleton code is mostly a
> >> >> >> convenience feature used to embed BPF programs into the calling 
> >> >> >> binary.
> >> >> >> It is perfectly possible to just have the BPF object file itself 
> >> >> >> reside
> >> >> >> directly in the file system and just use the regular libbpf APIs to 
> >> >> >> load
> >> >> >> it. Some things get a bit more cumbersome (mostly setting values of
> >> >> >> global variables, if the BPF program uses those).
> >> >> >>
> >> >> >> So the JSON example above could just be a regular compiled-from-clang
> >> >> >> BPF object file, and the management program can load that, inspect 
> >> >> >> its
> >> >> >> contents using the libbpf APIs and pass the file descriptors on to 
> >> >> >> Qemu.
> >> >> >> It's even possible to embed version information into this so that 
> >> >> >> Qemu
> >> >> >> can check if it understands the format and bail out if it doesn't - 
> >> >> >> just
> >> >> >> stick a version field in the configuration map as the first entry :)
> >> >> >
> >> >> > If all you have is the BPF object file is it possible to interrogate
> >> >> > it to get a list of all the maps, and get FDs associated for them ?
> >> >> > I had a look at the libbpf API and wasn't sure about that, it seemed
> >> >> > like you had to know the required maps upfront ?  If it is possible
> >> >> > to auto-discover everything you need, soley from the BPF object file
> >> >> > as input, then just dealing with that in isolation would feel simpler.
> >> >>
> >> >> It is. You load the object file, and bpf_object__for_each_map() lets you
> >> >> discover which maps it contains, with the different bpf_map__*() APIs
> >> >> telling you the properties of that map (and you can modify them too
> >> >> before loading the object if needed).
> >> >>
> >> >> The only thing that's not in the object file is any initial data you
> >> >> want to put into the map(s). But except for read-only maps that can be
> >> >> added by userspace after loading the maps, so you could just let Qemu do
> >> >> that...
> >> >>
> >> >> > It occurrs to me that exposing the BPF program as data rather than
> >> >> > via binary will make more practical to integrate this into KubeVirt's
> >> >> > architecture. In their deployment setup both QEMU and libvirt are
> >> >> > running unprivileged inside a container. For any advanced nmetworking
> >> >> > a completely separate component creates the TAP device and passes it
> >> >> > into the container running QEMU. I don't think that the separate
> >> >> > precisely matched helper binary would be something they can use, but
> >> >> > it might be possible to expose a data file providing the BPF program
> >> >> > blob and describing its maps.
> >> >>
> >> >> Well, "a data file providing the BPF program blob and describing its
> >> >> maps" is basically what a BPF .o file is. It just happens to be encoded
> >> >> in ELF format :)
> >> >>
> >> >> You can embed it into some other data structure and have libbpf load it
> >> >> from a blob in memory as well as from the filesystem, though; that is
> >> >> basically what the skeleton file does (notice the big character string
> >> >> at the end, that's just the original .o file contents).
> >> >
> >> > Ok, in that case I'm really wondering why any of this helper program
> >> > stuff was proposed. I recall the rationale was that it was impossible
> >> > for an external program to load the BPF object on behalf of QEMU,
> >> > because it would not know how todo that without QEMU specific
> >> > knowledge.
> >>
> >> 

Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-03-01 Thread Toke Høiland-Jørgensen
Daniel P. Berrangé  writes:

> On Wed, Mar 01, 2023 at 03:53:47PM +0100, Toke Høiland-Jørgensen wrote:
>> Daniel P. Berrangé  writes:
>> 
>> > On Tue, Feb 28, 2023 at 11:21:56PM +0100, Toke Høiland-Jørgensen wrote:
>> >> Daniel P. Berrangé  writes:
>> >> 
>> >> > On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote:
>> >> >> Daniel P. Berrangé  writes:
>> >> >> 
>> >> >> Just to interject a note on this here: the skeleton code is mostly a
>> >> >> convenience feature used to embed BPF programs into the calling binary.
>> >> >> It is perfectly possible to just have the BPF object file itself reside
>> >> >> directly in the file system and just use the regular libbpf APIs to 
>> >> >> load
>> >> >> it. Some things get a bit more cumbersome (mostly setting values of
>> >> >> global variables, if the BPF program uses those).
>> >> >> 
>> >> >> So the JSON example above could just be a regular compiled-from-clang
>> >> >> BPF object file, and the management program can load that, inspect its
>> >> >> contents using the libbpf APIs and pass the file descriptors on to 
>> >> >> Qemu.
>> >> >> It's even possible to embed version information into this so that Qemu
>> >> >> can check if it understands the format and bail out if it doesn't - 
>> >> >> just
>> >> >> stick a version field in the configuration map as the first entry :)
>> >> >
>> >> > If all you have is the BPF object file is it possible to interrogate
>> >> > it to get a list of all the maps, and get FDs associated for them ?
>> >> > I had a look at the libbpf API and wasn't sure about that, it seemed
>> >> > like you had to know the required maps upfront ?  If it is possible
>> >> > to auto-discover everything you need, soley from the BPF object file
>> >> > as input, then just dealing with that in isolation would feel simpler.
>> >> 
>> >> It is. You load the object file, and bpf_object__for_each_map() lets you
>> >> discover which maps it contains, with the different bpf_map__*() APIs
>> >> telling you the properties of that map (and you can modify them too
>> >> before loading the object if needed).
>> >> 
>> >> The only thing that's not in the object file is any initial data you
>> >> want to put into the map(s). But except for read-only maps that can be
>> >> added by userspace after loading the maps, so you could just let Qemu do
>> >> that...
>> >> 
>> >> > It occurrs to me that exposing the BPF program as data rather than
>> >> > via binary will make more practical to integrate this into KubeVirt's
>> >> > architecture. In their deployment setup both QEMU and libvirt are
>> >> > running unprivileged inside a container. For any advanced nmetworking
>> >> > a completely separate component creates the TAP device and passes it
>> >> > into the container running QEMU. I don't think that the separate
>> >> > precisely matched helper binary would be something they can use, but
>> >> > it might be possible to expose a data file providing the BPF program
>> >> > blob and describing its maps.
>> >> 
>> >> Well, "a data file providing the BPF program blob and describing its
>> >> maps" is basically what a BPF .o file is. It just happens to be encoded
>> >> in ELF format :)
>> >> 
>> >> You can embed it into some other data structure and have libbpf load it
>> >> from a blob in memory as well as from the filesystem, though; that is
>> >> basically what the skeleton file does (notice the big character string
>> >> at the end, that's just the original .o file contents).
>> >
>> > Ok, in that case I'm really wondering why any of this helper program
>> > stuff was proposed. I recall the rationale was that it was impossible
>> > for an external program to load the BPF object on behalf of QEMU,
>> > because it would not know how todo that without QEMU specific
>> > knowledge.
>> 
>> I'm not sure either. Was there some bits that initially needed to be set
>> before the program was loaded (read-only maps or something)? Also,
>> upstream does encourage the use of skeletons for embedding into
>> applications, so it's not an unreasonable thing to start with if you
>> don't have the kind of deployment constraints that Qemu does in this
>> case.
>> 
>> > It looks like we can simply expose the BPF object blob to mgmt apps
>> > directly and get rid of this helper program entirely.
>> 
>> I believe so, yes. You'd still need to be sure that the BPF object file
>> itself comes from a trusted place, but hopefully it should be enough to
>> load it from a known filesystem path? (Sorry if this is a stupid
>> question, I only have a fuzzy idea of how all the pieces fit together
>> here).
>
> It could be from a well known location on the filesystem, but might
> be better to make it possible to query it from QMP, which is mostly
> safe *provided* you've not yet started guest CPUs running. It could
> be queried at startup and then cached for future use.

Right, I don't have a strong opinion about the exact mechanism, just
wanted to convey a general "loading an 

Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-03-01 Thread Daniel P . Berrangé
On Wed, Mar 01, 2023 at 03:53:47PM +0100, Toke Høiland-Jørgensen wrote:
> Daniel P. Berrangé  writes:
> 
> > On Tue, Feb 28, 2023 at 11:21:56PM +0100, Toke Høiland-Jørgensen wrote:
> >> Daniel P. Berrangé  writes:
> >> 
> >> > On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote:
> >> >> Daniel P. Berrangé  writes:
> >> >> 
> >> >> Just to interject a note on this here: the skeleton code is mostly a
> >> >> convenience feature used to embed BPF programs into the calling binary.
> >> >> It is perfectly possible to just have the BPF object file itself reside
> >> >> directly in the file system and just use the regular libbpf APIs to load
> >> >> it. Some things get a bit more cumbersome (mostly setting values of
> >> >> global variables, if the BPF program uses those).
> >> >> 
> >> >> So the JSON example above could just be a regular compiled-from-clang
> >> >> BPF object file, and the management program can load that, inspect its
> >> >> contents using the libbpf APIs and pass the file descriptors on to Qemu.
> >> >> It's even possible to embed version information into this so that Qemu
> >> >> can check if it understands the format and bail out if it doesn't - just
> >> >> stick a version field in the configuration map as the first entry :)
> >> >
> >> > If all you have is the BPF object file is it possible to interrogate
> >> > it to get a list of all the maps, and get FDs associated for them ?
> >> > I had a look at the libbpf API and wasn't sure about that, it seemed
> >> > like you had to know the required maps upfront ?  If it is possible
> >> > to auto-discover everything you need, soley from the BPF object file
> >> > as input, then just dealing with that in isolation would feel simpler.
> >> 
> >> It is. You load the object file, and bpf_object__for_each_map() lets you
> >> discover which maps it contains, with the different bpf_map__*() APIs
> >> telling you the properties of that map (and you can modify them too
> >> before loading the object if needed).
> >> 
> >> The only thing that's not in the object file is any initial data you
> >> want to put into the map(s). But except for read-only maps that can be
> >> added by userspace after loading the maps, so you could just let Qemu do
> >> that...
> >> 
> >> > It occurrs to me that exposing the BPF program as data rather than
> >> > via binary will make more practical to integrate this into KubeVirt's
> >> > architecture. In their deployment setup both QEMU and libvirt are
> >> > running unprivileged inside a container. For any advanced nmetworking
> >> > a completely separate component creates the TAP device and passes it
> >> > into the container running QEMU. I don't think that the separate
> >> > precisely matched helper binary would be something they can use, but
> >> > it might be possible to expose a data file providing the BPF program
> >> > blob and describing its maps.
> >> 
> >> Well, "a data file providing the BPF program blob and describing its
> >> maps" is basically what a BPF .o file is. It just happens to be encoded
> >> in ELF format :)
> >> 
> >> You can embed it into some other data structure and have libbpf load it
> >> from a blob in memory as well as from the filesystem, though; that is
> >> basically what the skeleton file does (notice the big character string
> >> at the end, that's just the original .o file contents).
> >
> > Ok, in that case I'm really wondering why any of this helper program
> > stuff was proposed. I recall the rationale was that it was impossible
> > for an external program to load the BPF object on behalf of QEMU,
> > because it would not know how todo that without QEMU specific
> > knowledge.
> 
> I'm not sure either. Was there some bits that initially needed to be set
> before the program was loaded (read-only maps or something)? Also,
> upstream does encourage the use of skeletons for embedding into
> applications, so it's not an unreasonable thing to start with if you
> don't have the kind of deployment constraints that Qemu does in this
> case.
> 
> > It looks like we can simply expose the BPF object blob to mgmt apps
> > directly and get rid of this helper program entirely.
> 
> I believe so, yes. You'd still need to be sure that the BPF object file
> itself comes from a trusted place, but hopefully it should be enough to
> load it from a known filesystem path? (Sorry if this is a stupid
> question, I only have a fuzzy idea of how all the pieces fit together
> here).

It could be from a well known location on the filesystem, but might
be better to make it possible to query it from QMP, which is mostly
safe *provided* you've not yet started guest CPUs running. It could
be queried at startup and then cached for future use.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange 

Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-03-01 Thread Toke Høiland-Jørgensen
Daniel P. Berrangé  writes:

> On Tue, Feb 28, 2023 at 11:21:56PM +0100, Toke Høiland-Jørgensen wrote:
>> Daniel P. Berrangé  writes:
>> 
>> > On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote:
>> >> Daniel P. Berrangé  writes:
>> >> 
>> >> Just to interject a note on this here: the skeleton code is mostly a
>> >> convenience feature used to embed BPF programs into the calling binary.
>> >> It is perfectly possible to just have the BPF object file itself reside
>> >> directly in the file system and just use the regular libbpf APIs to load
>> >> it. Some things get a bit more cumbersome (mostly setting values of
>> >> global variables, if the BPF program uses those).
>> >> 
>> >> So the JSON example above could just be a regular compiled-from-clang
>> >> BPF object file, and the management program can load that, inspect its
>> >> contents using the libbpf APIs and pass the file descriptors on to Qemu.
>> >> It's even possible to embed version information into this so that Qemu
>> >> can check if it understands the format and bail out if it doesn't - just
>> >> stick a version field in the configuration map as the first entry :)
>> >
>> > If all you have is the BPF object file is it possible to interrogate
>> > it to get a list of all the maps, and get FDs associated for them ?
>> > I had a look at the libbpf API and wasn't sure about that, it seemed
>> > like you had to know the required maps upfront ?  If it is possible
>> > to auto-discover everything you need, soley from the BPF object file
>> > as input, then just dealing with that in isolation would feel simpler.
>> 
>> It is. You load the object file, and bpf_object__for_each_map() lets you
>> discover which maps it contains, with the different bpf_map__*() APIs
>> telling you the properties of that map (and you can modify them too
>> before loading the object if needed).
>> 
>> The only thing that's not in the object file is any initial data you
>> want to put into the map(s). But except for read-only maps that can be
>> added by userspace after loading the maps, so you could just let Qemu do
>> that...
>> 
>> > It occurrs to me that exposing the BPF program as data rather than
>> > via binary will make more practical to integrate this into KubeVirt's
>> > architecture. In their deployment setup both QEMU and libvirt are
>> > running unprivileged inside a container. For any advanced nmetworking
>> > a completely separate component creates the TAP device and passes it
>> > into the container running QEMU. I don't think that the separate
>> > precisely matched helper binary would be something they can use, but
>> > it might be possible to expose a data file providing the BPF program
>> > blob and describing its maps.
>> 
>> Well, "a data file providing the BPF program blob and describing its
>> maps" is basically what a BPF .o file is. It just happens to be encoded
>> in ELF format :)
>> 
>> You can embed it into some other data structure and have libbpf load it
>> from a blob in memory as well as from the filesystem, though; that is
>> basically what the skeleton file does (notice the big character string
>> at the end, that's just the original .o file contents).
>
> Ok, in that case I'm really wondering why any of this helper program
> stuff was proposed. I recall the rationale was that it was impossible
> for an external program to load the BPF object on behalf of QEMU,
> because it would not know how todo that without QEMU specific
> knowledge.

I'm not sure either. Was there some bits that initially needed to be set
before the program was loaded (read-only maps or something)? Also,
upstream does encourage the use of skeletons for embedding into
applications, so it's not an unreasonable thing to start with if you
don't have the kind of deployment constraints that Qemu does in this
case.

> It looks like we can simply expose the BPF object blob to mgmt apps
> directly and get rid of this helper program entirely.

I believe so, yes. You'd still need to be sure that the BPF object file
itself comes from a trusted place, but hopefully it should be enough to
load it from a known filesystem path? (Sorry if this is a stupid
question, I only have a fuzzy idea of how all the pieces fit together
here).

-Toke




Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-03-01 Thread Daniel P . Berrangé
On Wed, Mar 01, 2023 at 08:49:42AM +0200, Yuri Benditovich wrote:
> On Tue, Feb 28, 2023 at 8:05 PM Daniel P. Berrangé 
> wrote:
> 
> > On Tue, Feb 28, 2023 at 11:56:27AM +0200, Yuri Benditovich wrote:
> > > On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé  > >
> > > wrote:
> > >
> > > > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
> > > > > Added a function to check the stamp in the helper.
> > > > > eBPF helper should have a special symbol that generates during the
> > build.
> > > > > QEMU checks the helper and determines that it fits, so the helper
> > > > > will produce proper output.
> > > >
> > > > I think this is quite limiting for in place upgrades.
> > > >
> > > > Consider this scenario
> > > >
> > > >  * Host has QEMU 8.1.0 installed
> > > >  * VM is running QEMU 8.1.0
> > > >  * QEMU 8.1.1 is released with a bug fix in the EBF program
> > > >  * Host is upgraded to QEMU 8.1.1
> > > >  * User attempts to hotplug a NIC to the running VM
> > > >
> > > > IIUC this last step is going to fail because we'll be loading
> > > > the EBF program from 8.1.1 and so its hash is different from
> > > > that expected by the QEMU 8.1.0 that the pre-existing VM is
> > > > running.
> > > >
> > > >   Indeed we did not take in account the in-place upgrade.
> > >
> > >
> > >
> > > > If some changes to the EBF program are not going to be back
> > > > compatible from the POV of the QEMU process, should we instead
> > > > be versioning the EBF program. eg so new QEMU will ship both
> > > > the old and new versions of the EBF program.
> > >
> > > This does not seem to be an elegant option: QEMU theoretically can
> > include
> > > different eBPF programs but it hardly can interface with each one of
> > them.
> > > The code of QEMU (access to eBPF maps etc) includes header files which
> > eBPF
> > > of the day is being built with them.
> > >
> > > I see 2 options to address this issue (of course there are more)
> > > 1. Build and install qemu-rss-helper- executable. Libvirt will
> > always
> > > have a correct name, so for the running instance it will use
> > > qemu-rss-helper-, for the new instance it will use
> > > qemu-rss-helper-
> >
> > We'll get an ever growing number of program variants we need to
> > build & distribute with each new QEMU release.
> >
> 
> New release of the qemu-rss-helper- will be created in fact only
> when the eBPF binary is updated.
> This does not happen on each release. But yes, this looks like versioning
> of all the shared libraries.
> 
> 
> >
> > > 2. Build the helper executable and link it inside qemu as a blob. Libvirt
> > > will always retrieve the executable to the temporary file name and use
> > it.
> > > So the retrieved helper will always be compatible with QEMU. I'm not sure
> > > what is the most portable way to do that.
> >
> > QEMU is considered an untrusted process, so there's no way we're going
> > to ask it to give us an ELF binary and then execute that in privileged
> > context.
> >
> > > Does one of these seem suitable?
> >
> > Neither feels very appealing to me.
> >
> > I've been trying to understand the eBPF code we're dealing with in a
> > little more detail.
> >
> > IIUC, QEMU, or rather the virtio-net  driver needs to receive one FD
> > for the BPF program, and one or more FDs for the BPF maps that the
> > program uses. Currently it uses 3 maps, so needs 3 map FDs on top of
> > the program FD.
> >
> > The helper program that is proposed here calls ebpf_rss_load() to
> > load the program and get back a struct which gives access to the
> > 4 FDs, which are then sent to the mgmt app, which forwards them
> > onto QEMU.
> >
> > The ebpf_rss_load() method is making use of various structs that
> > are specific to the RSS program implementation, but does not seems
> > to do anything especially interesting.  It calls into rss_bpf__open()
> > which eventually gets around to calling rss_bpf__create_skeleton
> > which is where the interesting stuff happens.
> >
> > This rss_bpf__create_skeleton() method is implemented in terms of
> > totally generic libbpf APIs, and has the actual blob that is the
> > BPF program.
> >
> > Looking at what this does, I feel it should be trivial for a mgmt
> > app to implement equivalent logic to rss_bpf__create_skeleton in a
> > generic manner, if we could just expose the program blob and the
> > map names to the mgmt app. eg a simple json file
> >
> >   {
> >  "maps": [
> > "tap_rss_map_configurations",
> > "tap_rss_map_indirection_table",
> > "tap_rss_map_toeplitz_key",
> >  ],
> >  "program": "the big blob encoded in base64..."
> >   }
> >
> > if we installed that file are /usr/share/qemu/bpf/net-rss.json
> > then when a QEMU process is started, the mgmt app capture the
> > data in this JSON file. It now has enough info to create the
> > EBPF programs needed and pass the FDs over to QEMU. This would
> > be robust against QEMU software upgrades, and not tied to the
> > specific 

Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-03-01 Thread Daniel P . Berrangé
On Tue, Feb 28, 2023 at 11:21:56PM +0100, Toke Høiland-Jørgensen wrote:
> Daniel P. Berrangé  writes:
> 
> > On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote:
> >> Daniel P. Berrangé  writes:
> >> 
> >> Just to interject a note on this here: the skeleton code is mostly a
> >> convenience feature used to embed BPF programs into the calling binary.
> >> It is perfectly possible to just have the BPF object file itself reside
> >> directly in the file system and just use the regular libbpf APIs to load
> >> it. Some things get a bit more cumbersome (mostly setting values of
> >> global variables, if the BPF program uses those).
> >> 
> >> So the JSON example above could just be a regular compiled-from-clang
> >> BPF object file, and the management program can load that, inspect its
> >> contents using the libbpf APIs and pass the file descriptors on to Qemu.
> >> It's even possible to embed version information into this so that Qemu
> >> can check if it understands the format and bail out if it doesn't - just
> >> stick a version field in the configuration map as the first entry :)
> >
> > If all you have is the BPF object file is it possible to interrogate
> > it to get a list of all the maps, and get FDs associated for them ?
> > I had a look at the libbpf API and wasn't sure about that, it seemed
> > like you had to know the required maps upfront ?  If it is possible
> > to auto-discover everything you need, soley from the BPF object file
> > as input, then just dealing with that in isolation would feel simpler.
> 
> It is. You load the object file, and bpf_object__for_each_map() lets you
> discover which maps it contains, with the different bpf_map__*() APIs
> telling you the properties of that map (and you can modify them too
> before loading the object if needed).
> 
> The only thing that's not in the object file is any initial data you
> want to put into the map(s). But except for read-only maps that can be
> added by userspace after loading the maps, so you could just let Qemu do
> that...
> 
> > It occurrs to me that exposing the BPF program as data rather than
> > via binary will make more practical to integrate this into KubeVirt's
> > architecture. In their deployment setup both QEMU and libvirt are
> > running unprivileged inside a container. For any advanced nmetworking
> > a completely separate component creates the TAP device and passes it
> > into the container running QEMU. I don't think that the separate
> > precisely matched helper binary would be something they can use, but
> > it might be possible to expose a data file providing the BPF program
> > blob and describing its maps.
> 
> Well, "a data file providing the BPF program blob and describing its
> maps" is basically what a BPF .o file is. It just happens to be encoded
> in ELF format :)
> 
> You can embed it into some other data structure and have libbpf load it
> from a blob in memory as well as from the filesystem, though; that is
> basically what the skeleton file does (notice the big character string
> at the end, that's just the original .o file contents).

Ok, in that case I'm really wondering why any of this helper program
stuff was proposed. I recall the rationale was that it was impossible
for an external program to load the BPF object on behalf of QEMU,
because it would not know how todo that without QEMU specific
knowledge.

It looks like we can simply expose the BPF object blob to mgmt apps
directly and get rid of this helper program entirely.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-02-28 Thread Yuri Benditovich
On Tue, Feb 28, 2023 at 8:05 PM Daniel P. Berrangé 
wrote:

> On Tue, Feb 28, 2023 at 11:56:27AM +0200, Yuri Benditovich wrote:
> > On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé  >
> > wrote:
> >
> > > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
> > > > Added a function to check the stamp in the helper.
> > > > eBPF helper should have a special symbol that generates during the
> build.
> > > > QEMU checks the helper and determines that it fits, so the helper
> > > > will produce proper output.
> > >
> > > I think this is quite limiting for in place upgrades.
> > >
> > > Consider this scenario
> > >
> > >  * Host has QEMU 8.1.0 installed
> > >  * VM is running QEMU 8.1.0
> > >  * QEMU 8.1.1 is released with a bug fix in the EBF program
> > >  * Host is upgraded to QEMU 8.1.1
> > >  * User attempts to hotplug a NIC to the running VM
> > >
> > > IIUC this last step is going to fail because we'll be loading
> > > the EBF program from 8.1.1 and so its hash is different from
> > > that expected by the QEMU 8.1.0 that the pre-existing VM is
> > > running.
> > >
> > >   Indeed we did not take in account the in-place upgrade.
> >
> >
> >
> > > If some changes to the EBF program are not going to be back
> > > compatible from the POV of the QEMU process, should we instead
> > > be versioning the EBF program. eg so new QEMU will ship both
> > > the old and new versions of the EBF program.
> >
> > This does not seem to be an elegant option: QEMU theoretically can
> include
> > different eBPF programs but it hardly can interface with each one of
> them.
> > The code of QEMU (access to eBPF maps etc) includes header files which
> eBPF
> > of the day is being built with them.
> >
> > I see 2 options to address this issue (of course there are more)
> > 1. Build and install qemu-rss-helper- executable. Libvirt will
> always
> > have a correct name, so for the running instance it will use
> > qemu-rss-helper-, for the new instance it will use
> > qemu-rss-helper-
>
> We'll get an ever growing number of program variants we need to
> build & distribute with each new QEMU release.
>

New release of the qemu-rss-helper- will be created in fact only
when the eBPF binary is updated.
This does not happen on each release. But yes, this looks like versioning
of all the shared libraries.


>
> > 2. Build the helper executable and link it inside qemu as a blob. Libvirt
> > will always retrieve the executable to the temporary file name and use
> it.
> > So the retrieved helper will always be compatible with QEMU. I'm not sure
> > what is the most portable way to do that.
>
> QEMU is considered an untrusted process, so there's no way we're going
> to ask it to give us an ELF binary and then execute that in privileged
> context.
>
> > Does one of these seem suitable?
>
> Neither feels very appealing to me.
>
> I've been trying to understand the eBPF code we're dealing with in a
> little more detail.
>
> IIUC, QEMU, or rather the virtio-net  driver needs to receive one FD
> for the BPF program, and one or more FDs for the BPF maps that the
> program uses. Currently it uses 3 maps, so needs 3 map FDs on top of
> the program FD.
>
> The helper program that is proposed here calls ebpf_rss_load() to
> load the program and get back a struct which gives access to the
> 4 FDs, which are then sent to the mgmt app, which forwards them
> onto QEMU.
>
> The ebpf_rss_load() method is making use of various structs that
> are specific to the RSS program implementation, but does not seems
> to do anything especially interesting.  It calls into rss_bpf__open()
> which eventually gets around to calling rss_bpf__create_skeleton
> which is where the interesting stuff happens.
>
> This rss_bpf__create_skeleton() method is implemented in terms of
> totally generic libbpf APIs, and has the actual blob that is the
> BPF program.
>
> Looking at what this does, I feel it should be trivial for a mgmt
> app to implement equivalent logic to rss_bpf__create_skeleton in a
> generic manner, if we could just expose the program blob and the
> map names to the mgmt app. eg a simple json file
>
>   {
>  "maps": [
> "tap_rss_map_configurations",
> "tap_rss_map_indirection_table",
> "tap_rss_map_toeplitz_key",
>  ],
>  "program": "the big blob encoded in base64..."
>   }
>
> if we installed that file are /usr/share/qemu/bpf/net-rss.json
> then when a QEMU process is started, the mgmt app capture the
> data in this JSON file. It now has enough info to create the
> EBPF programs needed and pass the FDs over to QEMU. This would
> be robust against QEMU software upgrades, and not tied to the
> specific EBPF program imlp. We can add or remove maps / change
> their names etc any time, as the details in the JSON descriptor
> can be updated.  This avoids need for any special helper program
> to be provided by QEMU with the problems that is throwing up
> for us.
>

If I understand correctly, the libvirt will have 

Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-02-28 Thread Toke Høiland-Jørgensen
Daniel P. Berrangé  writes:

> On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote:
>> Daniel P. Berrangé  writes:
>> 
>> Just to interject a note on this here: the skeleton code is mostly a
>> convenience feature used to embed BPF programs into the calling binary.
>> It is perfectly possible to just have the BPF object file itself reside
>> directly in the file system and just use the regular libbpf APIs to load
>> it. Some things get a bit more cumbersome (mostly setting values of
>> global variables, if the BPF program uses those).
>> 
>> So the JSON example above could just be a regular compiled-from-clang
>> BPF object file, and the management program can load that, inspect its
>> contents using the libbpf APIs and pass the file descriptors on to Qemu.
>> It's even possible to embed version information into this so that Qemu
>> can check if it understands the format and bail out if it doesn't - just
>> stick a version field in the configuration map as the first entry :)
>
> If all you have is the BPF object file is it possible to interrogate
> it to get a list of all the maps, and get FDs associated for them ?
> I had a look at the libbpf API and wasn't sure about that, it seemed
> like you had to know the required maps upfront ?  If it is possible
> to auto-discover everything you need, soley from the BPF object file
> as input, then just dealing with that in isolation would feel simpler.

It is. You load the object file, and bpf_object__for_each_map() lets you
discover which maps it contains, with the different bpf_map__*() APIs
telling you the properties of that map (and you can modify them too
before loading the object if needed).

The only thing that's not in the object file is any initial data you
want to put into the map(s). But except for read-only maps that can be
added by userspace after loading the maps, so you could just let Qemu do
that...

> It occurrs to me that exposing the BPF program as data rather than
> via binary will make more practical to integrate this into KubeVirt's
> architecture. In their deployment setup both QEMU and libvirt are
> running unprivileged inside a container. For any advanced nmetworking
> a completely separate component creates the TAP device and passes it
> into the container running QEMU. I don't think that the separate
> precisely matched helper binary would be something they can use, but
> it might be possible to expose a data file providing the BPF program
> blob and describing its maps.

Well, "a data file providing the BPF program blob and describing its
maps" is basically what a BPF .o file is. It just happens to be encoded
in ELF format :)

You can embed it into some other data structure and have libbpf load it
from a blob in memory as well as from the filesystem, though; that is
basically what the skeleton file does (notice the big character string
at the end, that's just the original .o file contents).

-Toke




Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-02-28 Thread Daniel P . Berrangé
On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote:
> Daniel P. Berrangé  writes:
> 
> Just to interject a note on this here: the skeleton code is mostly a
> convenience feature used to embed BPF programs into the calling binary.
> It is perfectly possible to just have the BPF object file itself reside
> directly in the file system and just use the regular libbpf APIs to load
> it. Some things get a bit more cumbersome (mostly setting values of
> global variables, if the BPF program uses those).
> 
> So the JSON example above could just be a regular compiled-from-clang
> BPF object file, and the management program can load that, inspect its
> contents using the libbpf APIs and pass the file descriptors on to Qemu.
> It's even possible to embed version information into this so that Qemu
> can check if it understands the format and bail out if it doesn't - just
> stick a version field in the configuration map as the first entry :)

If all you have is the BPF object file is it possible to interrogate
it to get a list of all the maps, and get FDs associated for them ?
I had a look at the libbpf API and wasn't sure about that, it seemed
like you had to know the required maps upfront ?  If it is possible
to auto-discover everything you need, soley from the BPF object file
as input, then just dealing with that in isolation would feel simpler.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-02-28 Thread Daniel P . Berrangé
On Tue, Feb 28, 2023 at 06:04:51PM +, Daniel P. Berrangé wrote:
> On Tue, Feb 28, 2023 at 11:56:27AM +0200, Yuri Benditovich wrote:
> > On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé 
> > wrote:
> > 
> > > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
> > > > Added a function to check the stamp in the helper.
> > > > eBPF helper should have a special symbol that generates during the 
> > > > build.
> > > > QEMU checks the helper and determines that it fits, so the helper
> > > > will produce proper output.
> > >
> > > I think this is quite limiting for in place upgrades.
> > >
> > > Consider this scenario
> > >
> > >  * Host has QEMU 8.1.0 installed
> > >  * VM is running QEMU 8.1.0
> > >  * QEMU 8.1.1 is released with a bug fix in the EBF program
> > >  * Host is upgraded to QEMU 8.1.1
> > >  * User attempts to hotplug a NIC to the running VM
> > >
> > > IIUC this last step is going to fail because we'll be loading
> > > the EBF program from 8.1.1 and so its hash is different from
> > > that expected by the QEMU 8.1.0 that the pre-existing VM is
> > > running.
> > >
> > >   Indeed we did not take in account the in-place upgrade.
> > 
> > 
> > 
> > > If some changes to the EBF program are not going to be back
> > > compatible from the POV of the QEMU process, should we instead
> > > be versioning the EBF program. eg so new QEMU will ship both
> > > the old and new versions of the EBF program.
> > 
> > This does not seem to be an elegant option: QEMU theoretically can include
> > different eBPF programs but it hardly can interface with each one of them.
> > The code of QEMU (access to eBPF maps etc) includes header files which eBPF
> > of the day is being built with them.
> > 
> > I see 2 options to address this issue (of course there are more)
> > 1. Build and install qemu-rss-helper- executable. Libvirt will always
> > have a correct name, so for the running instance it will use
> > qemu-rss-helper-, for the new instance it will use
> > qemu-rss-helper-
> 
> We'll get an ever growing number of program variants we need to
> build & distribute with each new QEMU release.
> 
> > 2. Build the helper executable and link it inside qemu as a blob. Libvirt
> > will always retrieve the executable to the temporary file name and use it.
> > So the retrieved helper will always be compatible with QEMU. I'm not sure
> > what is the most portable way to do that.
> 
> QEMU is considered an untrusted process, so there's no way we're going
> to ask it to give us an ELF binary and then execute that in privileged
> context.
> 
> > Does one of these seem suitable?
> 
> Neither feels very appealing to me.
> 
> I've been trying to understand the eBPF code we're dealing with in a
> little more detail.
> 
> IIUC, QEMU, or rather the virtio-net  driver needs to receive one FD
> for the BPF program, and one or more FDs for the BPF maps that the
> program uses. Currently it uses 3 maps, so needs 3 map FDs on top of
> the program FD.
> 
> The helper program that is proposed here calls ebpf_rss_load() to
> load the program and get back a struct which gives access to the
> 4 FDs, which are then sent to the mgmt app, which forwards them
> onto QEMU.
> 
> The ebpf_rss_load() method is making use of various structs that
> are specific to the RSS program implementation, but does not seems
> to do anything especially interesting.  It calls into rss_bpf__open()
> which eventually gets around to calling rss_bpf__create_skeleton
> which is where the interesting stuff happens.
> 
> This rss_bpf__create_skeleton() method is implemented in terms of
> totally generic libbpf APIs, and has the actual blob that is the
> BPF program.
> 
> Looking at what this does, I feel it should be trivial for a mgmt
> app to implement equivalent logic to rss_bpf__create_skeleton in a
> generic manner, if we could just expose the program blob and the
> map names to the mgmt app. eg a simple json file
> 
>   {
>  "maps": [
> "tap_rss_map_configurations",
>   "tap_rss_map_indirection_table",
>   "tap_rss_map_toeplitz_key",
>  ],
>  "program": "the big blob encoded in base64..."
>   }
> 
> if we installed that file are /usr/share/qemu/bpf/net-rss.json
> then when a QEMU process is started, the mgmt app capture the
> data in this JSON file. It now has enough info to create the
> EBPF programs needed and pass the FDs over to QEMU. This would
> be robust against QEMU software upgrades, and not tied to the
> specific EBPF program imlp. We can add or remove maps / change
> their names etc any time, as the details in the JSON descriptor
> can be updated.  This avoids need for any special helper program
> to be provided by QEMU with the problems that is throwing up
> for us.

It occurrs to me that exposing the BPF program as data rather than
via binary will make more practical to integrate this into KubeVirt's
architecture. In their deployment setup both QEMU and libvirt are
running unprivileged inside a 

Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-02-28 Thread Toke Høiland-Jørgensen
Daniel P. Berrangé  writes:

> On Tue, Feb 28, 2023 at 11:56:27AM +0200, Yuri Benditovich wrote:
>> On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé 
>> wrote:
>> 
>> > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
>> > > Added a function to check the stamp in the helper.
>> > > eBPF helper should have a special symbol that generates during the build.
>> > > QEMU checks the helper and determines that it fits, so the helper
>> > > will produce proper output.
>> >
>> > I think this is quite limiting for in place upgrades.
>> >
>> > Consider this scenario
>> >
>> >  * Host has QEMU 8.1.0 installed
>> >  * VM is running QEMU 8.1.0
>> >  * QEMU 8.1.1 is released with a bug fix in the EBF program
>> >  * Host is upgraded to QEMU 8.1.1
>> >  * User attempts to hotplug a NIC to the running VM
>> >
>> > IIUC this last step is going to fail because we'll be loading
>> > the EBF program from 8.1.1 and so its hash is different from
>> > that expected by the QEMU 8.1.0 that the pre-existing VM is
>> > running.
>> >
>> >   Indeed we did not take in account the in-place upgrade.
>> 
>> 
>> 
>> > If some changes to the EBF program are not going to be back
>> > compatible from the POV of the QEMU process, should we instead
>> > be versioning the EBF program. eg so new QEMU will ship both
>> > the old and new versions of the EBF program.
>> 
>> This does not seem to be an elegant option: QEMU theoretically can include
>> different eBPF programs but it hardly can interface with each one of them.
>> The code of QEMU (access to eBPF maps etc) includes header files which eBPF
>> of the day is being built with them.
>> 
>> I see 2 options to address this issue (of course there are more)
>> 1. Build and install qemu-rss-helper- executable. Libvirt will always
>> have a correct name, so for the running instance it will use
>> qemu-rss-helper-, for the new instance it will use
>> qemu-rss-helper-
>
> We'll get an ever growing number of program variants we need to
> build & distribute with each new QEMU release.
>
>> 2. Build the helper executable and link it inside qemu as a blob. Libvirt
>> will always retrieve the executable to the temporary file name and use it.
>> So the retrieved helper will always be compatible with QEMU. I'm not sure
>> what is the most portable way to do that.
>
> QEMU is considered an untrusted process, so there's no way we're going
> to ask it to give us an ELF binary and then execute that in privileged
> context.
>
>> Does one of these seem suitable?
>
> Neither feels very appealing to me.
>
> I've been trying to understand the eBPF code we're dealing with in a
> little more detail.
>
> IIUC, QEMU, or rather the virtio-net  driver needs to receive one FD
> for the BPF program, and one or more FDs for the BPF maps that the
> program uses. Currently it uses 3 maps, so needs 3 map FDs on top of
> the program FD.
>
> The helper program that is proposed here calls ebpf_rss_load() to
> load the program and get back a struct which gives access to the
> 4 FDs, which are then sent to the mgmt app, which forwards them
> onto QEMU.
>
> The ebpf_rss_load() method is making use of various structs that
> are specific to the RSS program implementation, but does not seems
> to do anything especially interesting.  It calls into rss_bpf__open()
> which eventually gets around to calling rss_bpf__create_skeleton
> which is where the interesting stuff happens.
>
> This rss_bpf__create_skeleton() method is implemented in terms of
> totally generic libbpf APIs, and has the actual blob that is the
> BPF program.
>
> Looking at what this does, I feel it should be trivial for a mgmt
> app to implement equivalent logic to rss_bpf__create_skeleton in a
> generic manner, if we could just expose the program blob and the
> map names to the mgmt app. eg a simple json file
>
>   {
>  "maps": [
> "tap_rss_map_configurations",
>   "tap_rss_map_indirection_table",
>   "tap_rss_map_toeplitz_key",
>  ],
>  "program": "the big blob encoded in base64..."
>   }
>
> if we installed that file are /usr/share/qemu/bpf/net-rss.json
> then when a QEMU process is started, the mgmt app capture the
> data in this JSON file. It now has enough info to create the
> EBPF programs needed and pass the FDs over to QEMU. This would
> be robust against QEMU software upgrades, and not tied to the
> specific EBPF program imlp. We can add or remove maps / change
> their names etc any time, as the details in the JSON descriptor
> can be updated.  This avoids need for any special helper program
> to be provided by QEMU with the problems that is throwing up
> for us.

Just to interject a note on this here: the skeleton code is mostly a
convenience feature used to embed BPF programs into the calling binary.
It is perfectly possible to just have the BPF object file itself reside
directly in the file system and just use the regular libbpf APIs to load
it. Some things get a bit more cumbersome (mostly setting 

Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-02-28 Thread Daniel P . Berrangé
On Tue, Feb 28, 2023 at 11:56:27AM +0200, Yuri Benditovich wrote:
> On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé 
> wrote:
> 
> > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
> > > Added a function to check the stamp in the helper.
> > > eBPF helper should have a special symbol that generates during the build.
> > > QEMU checks the helper and determines that it fits, so the helper
> > > will produce proper output.
> >
> > I think this is quite limiting for in place upgrades.
> >
> > Consider this scenario
> >
> >  * Host has QEMU 8.1.0 installed
> >  * VM is running QEMU 8.1.0
> >  * QEMU 8.1.1 is released with a bug fix in the EBF program
> >  * Host is upgraded to QEMU 8.1.1
> >  * User attempts to hotplug a NIC to the running VM
> >
> > IIUC this last step is going to fail because we'll be loading
> > the EBF program from 8.1.1 and so its hash is different from
> > that expected by the QEMU 8.1.0 that the pre-existing VM is
> > running.
> >
> >   Indeed we did not take in account the in-place upgrade.
> 
> 
> 
> > If some changes to the EBF program are not going to be back
> > compatible from the POV of the QEMU process, should we instead
> > be versioning the EBF program. eg so new QEMU will ship both
> > the old and new versions of the EBF program.
> 
> This does not seem to be an elegant option: QEMU theoretically can include
> different eBPF programs but it hardly can interface with each one of them.
> The code of QEMU (access to eBPF maps etc) includes header files which eBPF
> of the day is being built with them.
> 
> I see 2 options to address this issue (of course there are more)
> 1. Build and install qemu-rss-helper- executable. Libvirt will always
> have a correct name, so for the running instance it will use
> qemu-rss-helper-, for the new instance it will use
> qemu-rss-helper-

We'll get an ever growing number of program variants we need to
build & distribute with each new QEMU release.

> 2. Build the helper executable and link it inside qemu as a blob. Libvirt
> will always retrieve the executable to the temporary file name and use it.
> So the retrieved helper will always be compatible with QEMU. I'm not sure
> what is the most portable way to do that.

QEMU is considered an untrusted process, so there's no way we're going
to ask it to give us an ELF binary and then execute that in privileged
context.

> Does one of these seem suitable?

Neither feels very appealing to me.

I've been trying to understand the eBPF code we're dealing with in a
little more detail.

IIUC, QEMU, or rather the virtio-net  driver needs to receive one FD
for the BPF program, and one or more FDs for the BPF maps that the
program uses. Currently it uses 3 maps, so needs 3 map FDs on top of
the program FD.

The helper program that is proposed here calls ebpf_rss_load() to
load the program and get back a struct which gives access to the
4 FDs, which are then sent to the mgmt app, which forwards them
onto QEMU.

The ebpf_rss_load() method is making use of various structs that
are specific to the RSS program implementation, but does not seems
to do anything especially interesting.  It calls into rss_bpf__open()
which eventually gets around to calling rss_bpf__create_skeleton
which is where the interesting stuff happens.

This rss_bpf__create_skeleton() method is implemented in terms of
totally generic libbpf APIs, and has the actual blob that is the
BPF program.

Looking at what this does, I feel it should be trivial for a mgmt
app to implement equivalent logic to rss_bpf__create_skeleton in a
generic manner, if we could just expose the program blob and the
map names to the mgmt app. eg a simple json file

  {
 "maps": [
"tap_rss_map_configurations",
"tap_rss_map_indirection_table",
"tap_rss_map_toeplitz_key",
 ],
 "program": "the big blob encoded in base64..."
  }

if we installed that file are /usr/share/qemu/bpf/net-rss.json
then when a QEMU process is started, the mgmt app capture the
data in this JSON file. It now has enough info to create the
EBPF programs needed and pass the FDs over to QEMU. This would
be robust against QEMU software upgrades, and not tied to the
specific EBPF program imlp. We can add or remove maps / change
their names etc any time, as the details in the JSON descriptor
can be updated.  This avoids need for any special helper program
to be provided by QEMU with the problems that is throwing up
for us.

What am I missing ?  This seems pretty straightforward to
achieve from what I see.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-02-28 Thread Yuri Benditovich
On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé 
wrote:

> On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
> > Added a function to check the stamp in the helper.
> > eBPF helper should have a special symbol that generates during the build.
> > QEMU checks the helper and determines that it fits, so the helper
> > will produce proper output.
>
> I think this is quite limiting for in place upgrades.
>
> Consider this scenario
>
>  * Host has QEMU 8.1.0 installed
>  * VM is running QEMU 8.1.0
>  * QEMU 8.1.1 is released with a bug fix in the EBF program
>  * Host is upgraded to QEMU 8.1.1
>  * User attempts to hotplug a NIC to the running VM
>
> IIUC this last step is going to fail because we'll be loading
> the EBF program from 8.1.1 and so its hash is different from
> that expected by the QEMU 8.1.0 that the pre-existing VM is
> running.
>
>   Indeed we did not take in account the in-place upgrade.



> If some changes to the EBF program are not going to be back
> compatible from the POV of the QEMU process, should we instead
> be versioning the EBF program. eg so new QEMU will ship both
> the old and new versions of the EBF program.
>
>
This does not seem to be an elegant option: QEMU theoretically can include
different eBPF programs but it hardly can interface with each one of them.
The code of QEMU (access to eBPF maps etc) includes header files which eBPF
of the day is being built with them.

I see 2 options to address this issue (of course there are more)
1. Build and install qemu-rss-helper- executable. Libvirt will always
have a correct name, so for the running instance it will use
qemu-rss-helper-, for the new instance it will use
qemu-rss-helper-
2. Build the helper executable and link it inside qemu as a blob. Libvirt
will always retrieve the executable to the temporary file name and use it.
So the retrieved helper will always be compatible with QEMU. I'm not sure
what is the most portable way to do that.

Daniel,
Does one of these seem suitable?


> With regards,
> Daniel
> --
> |: https://berrange.com  -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-
> https://www.instagram.com/dberrange :|
>
>


Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-02-27 Thread Toke Høiland-Jørgensen
Andrew Melnichenko  writes:

> Hi all,
>
> On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé  
> wrote:
>>
>> On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
>> > Added a function to check the stamp in the helper.
>> > eBPF helper should have a special symbol that generates during the build.
>> > QEMU checks the helper and determines that it fits, so the helper
>> > will produce proper output.
>>
>> I think this is quite limiting for in place upgrades.
>>
>> Consider this scenario
>>
>>  * Host has QEMU 8.1.0 installed
>>  * VM is running QEMU 8.1.0
>>  * QEMU 8.1.1 is released with a bug fix in the EBF program
>>  * Host is upgraded to QEMU 8.1.1
>>  * User attempts to hotplug a NIC to the running VM
>>
>> IIUC this last step is going to fail because we'll be loading
>> the EBF program from 8.1.1 and so its hash is different from
>> that expected by the QEMU 8.1.0 that the pre-existing VM is
>> running.
>>
>> If some changes to the EBF program are not going to be back
>> compatible from the POV of the QEMU process, should we instead
>> be versioning the EBF program. eg so new QEMU will ship both
>> the old and new versions of the EBF program.
>>
>
> I think it's too complicated to maintain backward compatibility with
> eBPF programs in "one package".
> As we can see, the eBPF skeleton may be changed not only by bugfix but
> with changes in libbpf(g.e. libbpf 1.0.1+).

Hmm, what change in libbpf 1.0.1 affects the skeleton format?

> This may lead to issues when with a newer environment you can't
> consistently recreate the "old" skeleton.

This should be detectable, though? As in, if you know that a new version
breaks compatibility you just bump the version number regardless of the
underlying application version?

-Toke




Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-02-26 Thread Andrew Melnichenko
Hi all,

On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé  wrote:
>
> On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
> > Added a function to check the stamp in the helper.
> > eBPF helper should have a special symbol that generates during the build.
> > QEMU checks the helper and determines that it fits, so the helper
> > will produce proper output.
>
> I think this is quite limiting for in place upgrades.
>
> Consider this scenario
>
>  * Host has QEMU 8.1.0 installed
>  * VM is running QEMU 8.1.0
>  * QEMU 8.1.1 is released with a bug fix in the EBF program
>  * Host is upgraded to QEMU 8.1.1
>  * User attempts to hotplug a NIC to the running VM
>
> IIUC this last step is going to fail because we'll be loading
> the EBF program from 8.1.1 and so its hash is different from
> that expected by the QEMU 8.1.0 that the pre-existing VM is
> running.
>
> If some changes to the EBF program are not going to be back
> compatible from the POV of the QEMU process, should we instead
> be versioning the EBF program. eg so new QEMU will ship both
> the old and new versions of the EBF program.
>

I think it's too complicated to maintain backward compatibility with
eBPF programs in "one package".
As we can see, the eBPF skeleton may be changed not only by bugfix but
with changes in libbpf(g.e. libbpf 1.0.1+).
This may lead to issues when with a newer environment you can't
consistently recreate the "old" skeleton.

To solve the case - we need to "save" the halper from an old package somehow.

>
> With regards,
> Daniel
> --
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
>



Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-02-20 Thread Daniel P . Berrangé
On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
> Added a function to check the stamp in the helper.
> eBPF helper should have a special symbol that generates during the build.
> QEMU checks the helper and determines that it fits, so the helper
> will produce proper output.

I think this is quite limiting for in place upgrades.

Consider this scenario

 * Host has QEMU 8.1.0 installed
 * VM is running QEMU 8.1.0
 * QEMU 8.1.1 is released with a bug fix in the EBF program
 * Host is upgraded to QEMU 8.1.1
 * User attempts to hotplug a NIC to the running VM

IIUC this last step is going to fail because we'll be loading
the EBF program from 8.1.1 and so its hash is different from
that expected by the QEMU 8.1.0 that the pre-existing VM is
running.

If some changes to the EBF program are not going to be back
compatible from the POV of the QEMU process, should we instead
be versioning the EBF program. eg so new QEMU will ship both
the old and new versions of the EBF program. 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH 3/5] qmp: Added the helper stamp check.

2023-02-19 Thread Andrew Melnychenko
Added a function to check the stamp in the helper.
eBPF helper should have a special symbol that generates during the build.
QEMU checks the helper and determines that it fits, so the helper
will produce proper output.

Signed-off-by: Andrew Melnychenko 
---
 meson.build|  10 +
 monitor/meson.build|   1 +
 monitor/qemu-ebpf-rss-helper-stamp-utils.c | 322 +
 monitor/qemu-ebpf-rss-helper-stamp-utils.h |  39 +++
 4 files changed, 372 insertions(+)
 create mode 100644 monitor/qemu-ebpf-rss-helper-stamp-utils.c
 create mode 100644 monitor/qemu-ebpf-rss-helper-stamp-utils.h

diff --git a/meson.build b/meson.build
index a76c855312..b409912aed 100644
--- a/meson.build
+++ b/meson.build
@@ -2868,6 +2868,16 @@ foreach d : hx_headers
 endforeach
 genh += hxdep
 
+ebpf_rss_helper_stamp = custom_target(
+'qemu-ebpf-rss-helper-stamp.h',
+output : 'qemu-ebpf-rss-helper-stamp.h',
+input : 'ebpf/rss.bpf.skeleton.h',
+command : [python, '-c', 'import hashlib; print(\'#define 
QEMU_EBPF_RSS_HELPER_STAMP 
qemuEbpfRssHelperStamp_{}\'.format(hashlib.sha1(open(\'@INPUT@\', 
\'rb\').read()).hexdigest()))'],
+capture: true,
+)
+
+genh += ebpf_rss_helper_stamp
+
 ###
 # Collect sources #
 ###
diff --git a/monitor/meson.build b/monitor/meson.build
index ccb4d1a8e6..36de73414b 100644
--- a/monitor/meson.build
+++ b/monitor/meson.build
@@ -6,6 +6,7 @@ softmmu_ss.add(files(
   'hmp.c',
 ))
 softmmu_ss.add([spice_headers, files('qmp-cmds.c')])
+softmmu_ss.add(files('qemu-ebpf-rss-helper-stamp-utils.c'))
 
 specific_ss.add(when: 'CONFIG_SOFTMMU',
if_true: [files( 'hmp-cmds-target.c', 'hmp-target.c'), spice])
diff --git a/monitor/qemu-ebpf-rss-helper-stamp-utils.c 
b/monitor/qemu-ebpf-rss-helper-stamp-utils.c
new file mode 100644
index 00..23efc36ef0
--- /dev/null
+++ b/monitor/qemu-ebpf-rss-helper-stamp-utils.c
@@ -0,0 +1,322 @@
+/*
+ * QEMU helper stamp check utils.
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Description: This file mostly implements helper stamp checking.
+ *  The stamp is implemented in a similar way as in qemu modules.
+ *  The helper should contain a specific symbol.
+ *  Not in a similar way is symbol checking - here we parse
+ *  the ELF file. For now, only eBPF helper contains
+ *  the stamp, and the stamp is generated from
+ *  sha1 ebpf/rss.bpf.skeleton.h (see meson.build).
+ */
+
+#include "qemu/osdep.h"
+#include "elf.h"
+#include "qemu-ebpf-rss-helper-stamp-utils.h"
+
+#include 
+
+#ifdef CONFIG_LINUX
+
+static void *file_allocate_and_read(int fd, off_t off, size_t size)
+{
+void *data;
+int err;
+
+if (fd < 0) {
+return NULL;
+}
+
+err = lseek(fd, off, SEEK_SET);
+if (err < 0) {
+return NULL;
+}
+
+data = g_new0(char, size);
+if (data == NULL) {
+return NULL;
+}
+
+err = read(fd, data, size);
+if (err < 0) {
+g_free(data);
+return NULL;
+}
+
+return data;
+}
+
+static Elf64_Shdr *elf64_get_section_table(int fd, Elf64_Ehdr *elf_header)
+{
+if (elf_header == NULL) {
+return NULL;
+}
+return (Elf64_Shdr *)file_allocate_and_read(fd, elf_header->e_shoff,
+ elf_header->e_shnum * elf_header->e_shentsize);
+}
+
+static Elf32_Shdr *elf32_get_section_table(int fd, Elf32_Ehdr *elf_header)
+{
+if (elf_header == NULL) {
+return NULL;
+}
+return (Elf32_Shdr *)file_allocate_and_read(fd, elf_header->e_shoff,
+ elf_header->e_shnum * elf_header->e_shentsize);
+}
+
+static void *elf64_get_section_data(int fd, const Elf64_Shdr* section_header)
+{
+if (fd < 0 || section_header == NULL) {
+return NULL;
+}
+return file_allocate_and_read(fd, section_header->sh_offset,
+  section_header->sh_size);
+}
+
+static void *elf32_get_section_data(int fd, const Elf32_Shdr* section_header)
+{
+if (fd < 0 || section_header == NULL) {
+return NULL;
+}
+return file_allocate_and_read(fd, section_header->sh_offset,
+  section_header->sh_size);
+}
+
+static bool elf64_check_symbol_in_symbol_table(int fd,
+   Elf64_Shdr *section_table,
+   Elf64_Shdr *symbol_section,
+   const char *symbol)
+{
+Elf64_Sym *symbol_table;
+char *string_table;
+uint32_t i;
+bool ret = false;
+
+symbol_table = (Elf64_Sym *) elf64_get_section_data(fd, symbol_section);
+if (symbol_table == NULL) {
+return 

[PATCH 3/5] qmp: Added the helper stamp check.

2021-07-13 Thread Andrew Melnychenko
Added function to check the stamp in the helper.
eBPF helper should have a special symbol that generates during build.
QEMU checks the helper and determinates that it fits, so the helper
will produce proper output.

Signed-off-by: Andrew Melnychenko 
---
 meson.build   |  10 +
 monitor/meson.build   |   1 +
 monitor/qemu-helper-stamp-utils.c | 297 ++
 monitor/qemu-helper-stamp-utils.h |  24 +++
 4 files changed, 332 insertions(+)
 create mode 100644 monitor/qemu-helper-stamp-utils.c
 create mode 100644 monitor/qemu-helper-stamp-utils.h

diff --git a/meson.build b/meson.build
index 626cf932c1..257e51d91b 100644
--- a/meson.build
+++ b/meson.build
@@ -1757,6 +1757,16 @@ foreach d : hx_headers
 endforeach
 genh += hxdep
 
+helper_stamp = custom_target(
+'qemu-helper-stamp.h',
+output : 'qemu-helper-stamp.h',
+input : 'ebpf/rss.bpf.skeleton.h',
+command : [python, '-c', 'import hashlib; print(\'#define 
QEMU_HELPER_STAMP qemuHelperStamp_{}\'.format(hashlib.sha1(open(\'@INPUT@\', 
\'rb\').read()).hexdigest()))'],
+capture: true,
+)
+
+genh += helper_stamp
+
 ###
 # Collect sources #
 ###
diff --git a/monitor/meson.build b/monitor/meson.build
index 6d00985ace..2b6b39549b 100644
--- a/monitor/meson.build
+++ b/monitor/meson.build
@@ -5,5 +5,6 @@ softmmu_ss.add(files(
   'hmp.c',
 ))
 softmmu_ss.add([spice_headers, files('qmp-cmds.c')])
+softmmu_ss.add(files('qemu-helper-stamp-utils.c'))
 
 specific_ss.add(when: 'CONFIG_SOFTMMU', if_true: [files('misc.c'), spice])
diff --git a/monitor/qemu-helper-stamp-utils.c 
b/monitor/qemu-helper-stamp-utils.c
new file mode 100644
index 00..d34c3b94c5
--- /dev/null
+++ b/monitor/qemu-helper-stamp-utils.c
@@ -0,0 +1,297 @@
+/*
+ * QEMU helper stamp check utils.
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Description: This file mostly implements helper stamp checking.
+ *  The stamp is implemented in a similar way as in qemu modules.
+ *  The helper should contain a specific symbol.
+ *  Not in a similar way is symbol checking - here we parse
+ *  the ELF file. For now(10.07.2021), only eBPF helper contains
+ *  the stamp, and the stamp is generated from
+ *  sha1 ebpf/rss.bpf.skeleton.h (see meson.build).
+ */
+
+#include "qemu/osdep.h"
+#include "elf.h"
+#include "qemu-helper-stamp-utils.h"
+
+#include 
+
+#ifdef CONFIG_LINUX
+
+static void *file_allocate_and_read(int fd, off_t off, size_t size)
+{
+void *data;
+int err;
+
+if (fd < 0) {
+return NULL;
+}
+
+err = lseek(fd, off, SEEK_SET);
+if (err < 0) {
+return NULL;
+}
+
+data = g_new0(char, size);
+if (data == NULL) {
+return NULL;
+}
+
+err = read(fd, data, size);
+if (err < 0) {
+g_free(data);
+return NULL;
+}
+
+return data;
+}
+
+static Elf64_Shdr *elf64_get_section_table(int fd, Elf64_Ehdr *elf_header)
+{
+if (elf_header == NULL) {
+return NULL;
+}
+return (Elf64_Shdr *)file_allocate_and_read(fd, elf_header->e_shoff,
+ elf_header->e_shnum * elf_header->e_shentsize);
+}
+
+static Elf32_Shdr *elf32_get_section_table(int fd, Elf32_Ehdr *elf_header)
+{
+if (elf_header == NULL) {
+return NULL;
+}
+return (Elf32_Shdr *)file_allocate_and_read(fd, elf_header->e_shoff,
+ elf_header->e_shnum * elf_header->e_shentsize);
+}
+
+static void *elf64_get_section_data(int fd, const Elf64_Shdr* section_header)
+{
+if (fd < 0 || section_header == NULL) {
+return NULL;
+}
+return file_allocate_and_read(fd, section_header->sh_offset,
+  section_header->sh_size);
+}
+
+static void *elf32_get_section_data(int fd, const Elf32_Shdr* section_header)
+{
+if (fd < 0 || section_header == NULL) {
+return NULL;
+}
+return file_allocate_and_read(fd, section_header->sh_offset,
+  section_header->sh_size);
+}
+
+static bool elf64_check_symbol_in_symbol_table(int fd,
+   Elf64_Shdr *section_table,
+   Elf64_Shdr *symbol_section,
+   const char *symbol)
+{
+Elf64_Sym *symbol_table;
+char *string_table;
+uint32_t i;
+bool ret = false;
+
+symbol_table = (Elf64_Sym *) elf64_get_section_data(fd, symbol_section);
+if (symbol_table == NULL) {
+return false;
+}
+
+string_table = (char *) elf64_get_section_data(
+fd, section_table + symbol_section->sh_link);
+if (string_table == NULL) {
+