interface hotplug q35 machine type
Hello,

I see in libvirt's documentation [0] that a q35 machine type supports, by default, at most one hotplugged PCIe device, and that users must provision root ports in advance according to how many interfaces they expect to hotplug:

"""
Slots on the pcie-root controller do not support hotplug, so the device will be hotplugged into the pcie-root-port controller. If you plan to hotplug more than a single PCI Express device, you should add a suitable number of pcie-root-port controllers when defining the guest: for example, add

```xml
<controller type='pci' model='pcie-root-port'/>
<controller type='pci' model='pcie-root-port'/>
<controller type='pci' model='pcie-root-port'/>
```

if you expect to hotplug up to three PCI Express devices, either emulated or assigned from the host.
"""

Is there any alternative to this? For our use case, I'm considering mimicking OpenStack's implementation [1] and exposing a knob that indicates the number of PCIe root ports to use when defining the domain.

I wonder how open the community would be to a machine type alias that would provide a "better" default - in the sense that it would have more root-port controllers.

[0] - https://libvirt.org/pci-hotplug.html#x86_64-q35
[1] - https://blueprints.launchpad.net/nova/+spec/configure-amount-of-pcie-ports
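For illustration, once the extra pcie-root-port controllers are defined, hotplugging an interface into the running guest is just a live attach. A minimal sketch (the domain name `mydomain` and the `default` libvirt network are assumptions):

```
$ cat > hotplug-iface.xml <<'EOF'
<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
</interface>
EOF
# each hotplugged device consumes one free pcie-root-port slot
$ virsh attach-device mydomain hotplug-iface.xml --live
```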
consume existing tap device when libvirt / qemu run as different users
Hello,

I'm having some doubts about consuming an existing - already configured - tap device from libvirt (with the `managed='no'` attribute set).

In KubeVirt, we want the consumer side of the tap device to run without the NET_ADMIN capability, which requires the UID / GID of the tap creator and opener to match, as per the kernel code in [0]. As such, we create the tap device with the qemu user / group, on behalf of qemu, which will ultimately be the tap consumer.

This leads me to the question: why is libvirt opening the tap device and calling `ioctl(..., TUNSETIFF, ...)` on it when it already exists - [1] & [2]? Why can't the already configured tap device be left alone, letting qemu consume it?

The above is problematic for KubeVirt, since our setup currently has libvirt running as root while qemu runs as a different user, which is preventing us from removing NET_ADMIN.

Thanks in advance for your time,
Miguel

[0] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/tun.c?id=4ef8451b332662d004df269d4cdeb7d9f31419b5#n574
[1] - https://github.com/libvirt/libvirt/blob/99a1cfc43889c6d425a64013a12b234dde8cff1e/src/qemu/qemu_interface.c#L453
[2] - https://github.com/libvirt/libvirt/blob/v6.0.0/src/util/virnetdevtap.c#L274
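For reference, a minimal sketch of the create side described above - pre-creating a persistent tap device owned by the qemu user, so that a later opener with matching UID/GID passes the tun.c check in [0]. The device name `tap0` and uid/gid 107 are assumptions, and the raw ioctls stand in for whatever library actually performs this:

```go
package main

import (
	"os"
	"unsafe"

	"golang.org/x/sys/unix"
)

// ifreq mirrors the layout the kernel expects for TUNSETIFF.
type ifreq struct {
	name  [unix.IFNAMSIZ]byte // silently truncated if too long
	flags uint16
	_     [22]byte // pad to sizeof(struct ifreq)
}

// ioctlInt issues an ioctl whose argument is a plain integer.
func ioctlInt(fd uintptr, req uint, val int) error {
	if _, _, errno := unix.Syscall(unix.SYS_IOCTL, fd, uintptr(req), uintptr(val)); errno != 0 {
		return errno
	}
	return nil
}

// createTap pre-creates a persistent tap device owned by uid/gid, so a
// process running under those credentials can later open it without
// CAP_NET_ADMIN.
func createTap(name string, uid, gid int) error {
	f, err := os.OpenFile("/dev/net/tun", os.O_RDWR, 0)
	if err != nil {
		return err
	}
	defer f.Close()

	var ifr ifreq
	copy(ifr.name[:], name)
	ifr.flags = uint16(unix.IFF_TAP | unix.IFF_NO_PI | unix.IFF_VNET_HDR)
	if _, _, errno := unix.Syscall(unix.SYS_IOCTL, f.Fd(), unix.TUNSETIFF,
		uintptr(unsafe.Pointer(&ifr))); errno != 0 {
		return errno
	}
	// Hand the device to the qemu user/group - these are the fields the
	// kernel compares against the opener's credentials - and keep the
	// device around after this fd is closed.
	if err := ioctlInt(f.Fd(), unix.TUNSETOWNER, uid); err != nil {
		return err
	}
	if err := ioctlInt(f.Fd(), unix.TUNSETGROUP, gid); err != nil {
		return err
	}
	return ioctlInt(f.Fd(), unix.TUNSETPERSIST, 1)
}

func main() {
	if err := createTap("tap0", 107, 107); err != nil {
		panic(err)
	}
}
```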
Re: consuming pre-created tap - with multiqueue
On Wed, Sep 23, 2020 at 6:59 PM Daniel P. Berrangé wrote:
>
> On Wed, Sep 23, 2020 at 05:44:28PM +0100, Daniel P. Berrangé wrote:
> > On Tue, Sep 22, 2020 at 01:48:08PM +0200, Miguel Duarte de Mora Barroso wrote:
> > > Hello,
> > >
> > > On KubeVirt, we are trying to pre-create a tap device, then instruct
> > > libvirt to consume it (via the type=ethernet, managed='no' attributes).
> > >
> > > It works as expected, **unless** we create a multi-queue tap device.
> > >
> > > The difference when creating the tap device is that we set the
> > > multi-queue flag; libvirt throws the following error when consuming it:
> > >
> > > ```
> > > LibvirtError(Code=38, Domain=0, Message='Unable to create tap device
> > > tap0: Invalid argument')
> > > ```
> > >
> > > After digging a bit in the libvirt code (we're using libvirt 6.0.0), I
> > > see this in the logs (immediately before the error):
> > >
> > > {"component":"virt-launcher","level":"info","msg":"Enabling IFF_VNET_HDR","pos":"virNetDevProbeVnetHdr:190","subcomponent":"libvirt","thread":"33","timestamp":"2020-09-22T10:34:29.335000Z"}
> > >
> > > I do not understand how it can try to set the VNET_HDR flag, since I
> > > have not set it when I created the device, which, as per [0], should
> > > only happen when requested. Here's the tap device I'm creating (output
> > > of `ip tuntap show`):
> > >
> > > - tap0: tap persist 0x100 user 107 group 107
> >
> > IIUC the kernel code correctly, the VNET_HDR flag is not required
> > to be set when you first create the tap device - it appears to be
> > permitted any time you open a file descriptor for it.
> >
> > AFAIK, there is no problem with VNET_HDR, as it is a standard flag
> > we've set on all tap devices on Linux for 10 years.
>
> Looking at the kernel code, you need to set the MULTI_QUEUE flag
> at the time you create the device and also set it when opening the
> device. In tun_set_iff():
>
>     if (!!(ifr->ifr_flags & IFF_MULTI_QUEUE) !=
>         !!(tun->flags & IFF_MULTI_QUEUE))
>             return -EINVAL;
>
> so if you've configured QEMU to use multiqueue, then you need to use:
>
>     $ ip tuntap add dev mytap mode tap vnet_hdr multi_queue
>
> actually vnet_hdr doesn't matter, as that can be set on the fly,
> but multi_queue is mandatory. Without it, I get the same EINVAL
> error as you mention.

Right, sorry for the noise; I found out one of our tests was abusing the API, which caused the tap device to be created with multi-queue; then, when attempting to consume it, we were requesting a single vcpu.

My bad for not spotting this sooner. Thanks for the reply.

> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
consuming pre-created tap - with multiqueue
Hello,

On KubeVirt, we are trying to pre-create a tap device, then instruct libvirt to consume it (via the type=ethernet, managed='no' attributes).

It works as expected, **unless** we create a multi-queue tap device.

The difference when creating the tap device is that we set the multi-queue flag; libvirt throws the following error when consuming it:

```
LibvirtError(Code=38, Domain=0, Message='Unable to create tap device
tap0: Invalid argument')
```

After digging a bit in the libvirt code (we're using libvirt 6.0.0), I see this in the logs (immediately before the error):

{"component":"virt-launcher","level":"info","msg":"Enabling IFF_VNET_HDR","pos":"virNetDevProbeVnetHdr:190","subcomponent":"libvirt","thread":"33","timestamp":"2020-09-22T10:34:29.335000Z"}

I do not understand how it can try to set the VNET_HDR flag, since I have not set it when I created the device, which, as per [0], should only happen when requested. Here's the tap device I'm creating (output of `ip tuntap show`):

- tap0: tap persist 0x100 user 107 group 107

I'm confused by this, since [1] suggests that only one flag can be used (persist *or* vnet_hdr). Also, from my limited understanding, in order to open a pre-created tap device, it must be set to persistent.

Am I right (in the sense that these flags cannot co-exist)? If so, why would libvirt be setting the VNET_HDR flag on the tap device?

[0] - https://github.com/libvirt/libvirt/blob/v6.0.0/src/util/virnetdevtap.c#L261
[1] - https://github.com/libvirt/libvirt/blob/v6.0.0/src/util/virnetdevtap.c#L209
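For reference, the consuming side in the domain XML ends up looking roughly like the snippet below (device name and queue count are assumptions). The key detail, per the reply above, is that the `queues` value has to be consistent with the flags the tap device was created with - a multi-queue tap cannot be consumed as single-queue, and vice versa:

```xml
<interface type='ethernet'>
  <!-- managed='no': libvirt opens tap0 but does not create/delete it -->
  <target dev='tap0' managed='no'/>
  <model type='virtio'/>
  <!-- requests IFF_MULTI_QUEUE; must match how tap0 was created -->
  <driver name='vhost' queues='2'/>
</interface>
```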
Re: plug pre-created tap devices to libvirt guests
On Tue, Jun 30, 2020 at 12:59 PM Miguel Duarte de Mora Barroso wrote:
>
> On Mon, Apr 6, 2020 at 4:03 PM Laine Stump wrote:
> >
> > On 4/6/20 9:54 AM, Daniel P. Berrangé wrote:
> > > On Mon, Apr 06, 2020 at 03:47:01PM +0200, Miguel Duarte de Mora Barroso wrote:
> > >> Hi all,
> > >>
> > >> I'm aware that it is possible to plug pre-created macvtap devices into
> > >> libvirt guests - tracked in RFE [0].
> > >>
> > >> My interpretation of the wording in [1] and [2] is that it is also
> > >> possible to plug pre-created tap devices into libvirt guests - that
> > >> would be a requirement to allow kubevirt to run with fewer capabilities
> > >> in the pods that encapsulate the VMs.
> > >>
> > >> I took a look at the libvirt code ([3] & [4]), and, from my limited
> > >> understanding, I got the impression that plugging existing interfaces
> > >> via `managed='no'` is only possible for macvtap interfaces.
> > >
> > > No, it works for standard tap devices as well.
> >
> > The reason the BZs and commit logs talk mostly about macvtap rather than
> > tap is because 1) that's what kubevirt people had asked for and 2) it
> > already *mostly* worked for tap devices, so most of the work was related
> > to macvtap (my memory is already fuzzy, but I think there were a couple of
> > privileged operations we still tried to do for standard tap devices even
> > if they were precreated; standard disclaimer: I often misremember, so
> > this memory could be wrong! But definitely precreated tap devices do work).
>
> It's been a while since I started this thread, but lately I've
> understood better how tap devices work, and that new insight makes me
> wonder about a couple of things.
>
> Our ultimate goal in KubeVirt is to consume a pre-created tap device
> from a kubernetes pod that doesn't have the NET_ADMIN capability.
>
> After looking at the current libvirt code, I don't think that is
> currently supported, since we'll *always* enter the
> `virNetDevTapCreate` function in [1] (I'm interested in the *tap*
> scenario).
>
> The tap device is effectively created in that function - [2] - by
> opening the clone device (/dev/net/tun) and calling `ioctl(fd,
> TUNSETIFF, ...)` on it. AFAIK, both of those operations *require* the
> NET_ADMIN capability. If I'm correct, this means that the current
> libvirt implementation makes our goals impossible to achieve.
>
> I'd first like to know if I read the code correctly, and if I'm
> ultimately right - i.e. since libvirt's implementation for consuming a
> pre-existent tap device opens the clone device, wherever libvirt runs
> will *always* require the NET_ADMIN capability.

Adding the links that I've forgotten to add ...

[0] - https://bugzilla.redhat.com/show_bug.cgi?id=1723367
[1] - https://github.com/libvirt/libvirt/blob/master/src/qemu/qemu_interface.c#L443
[2] - https://github.com/libvirt/libvirt/blob/master/src/util/virnetdevtap.c#L243

> > I think though that when someone from kubevirt actually tried using a
> > precreated macvtap device, they found that their precreated device
> > wasn't visible at all to the unprivileged libvirtd in the pod, because
> > it was in a different network namespace, or something like that. So
> > there may still be more work to do (or, again, my info might be out of
> > date and they figured out a proper solution).
> >
> > >> Would you be able to shed some light on this? Is it possible on
> > >> libvirt-5.6.0 to plug pre-created tap devices into libvirt guests?
> > >>
> > >> [0] - https://bugzilla.redhat.com/show_bug.cgi?id=1723367
> > >
> > > This links to the following message, which illustrates how to use
> > > pre-created tap and macvtap devices:
> > >
> > > https://www.redhat.com/archives/libvir-list/2019-August/msg01256.html
> > >
> > > Laine: it would be useful to add something like this short guide to the
> > > knowledge base docs
> >
> > You mean the wiki? Sure, I can do that.
> >
> > (BTW - that was admirable reading / searching / responding - 7 minutes
> > and it wasn't even your patch! How do you do that? :-))
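As a side note on the open side: if I read tun.c correctly, re-attaching to an *existing* persistent tap via TUNSETIFF does not require CAP_NET_ADMIN as long as the caller's euid/egid match the device's owner/group - only creating a brand-new device does. A minimal sketch of what the unprivileged consumer would do, reusing the `ifreq` layout and ioctl pattern from the tap-creation sketch earlier on this page (the name and flags must match the ones used at creation time):

```go
// attachTap opens an existing persistent tap device by name. This goes
// through the same TUNSETIFF path as creation, but the kernel routes it
// to the existing device and checks owner/group instead of CAP_NET_ADMIN.
func attachTap(name string) (*os.File, error) {
	f, err := os.OpenFile("/dev/net/tun", os.O_RDWR, 0)
	if err != nil {
		return nil, err
	}
	var ifr ifreq
	copy(ifr.name[:], name)
	// Must match the flags the device was created with.
	ifr.flags = uint16(unix.IFF_TAP | unix.IFF_NO_PI | unix.IFF_VNET_HDR)
	if _, _, errno := unix.Syscall(unix.SYS_IOCTL, f.Fd(), unix.TUNSETIFF,
		uintptr(unsafe.Pointer(&ifr))); errno != 0 {
		f.Close()
		return nil, errno
	}
	return f, nil
}
```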
Re: plug pre-created tap devices to libvirt guests
On Mon, Apr 6, 2020 at 4:03 PM Laine Stump wrote:
>
> On 4/6/20 9:54 AM, Daniel P. Berrangé wrote:
> > On Mon, Apr 06, 2020 at 03:47:01PM +0200, Miguel Duarte de Mora Barroso wrote:
> >> Hi all,
> >>
> >> I'm aware that it is possible to plug pre-created macvtap devices into
> >> libvirt guests - tracked in RFE [0].
> >>
> >> My interpretation of the wording in [1] and [2] is that it is also
> >> possible to plug pre-created tap devices into libvirt guests - that
> >> would be a requirement to allow kubevirt to run with fewer capabilities
> >> in the pods that encapsulate the VMs.
> >>
> >> I took a look at the libvirt code ([3] & [4]), and, from my limited
> >> understanding, I got the impression that plugging existing interfaces
> >> via `managed='no'` is only possible for macvtap interfaces.
> >
> > No, it works for standard tap devices as well.
>
> The reason the BZs and commit logs talk mostly about macvtap rather than
> tap is because 1) that's what kubevirt people had asked for and 2) it
> already *mostly* worked for tap devices, so most of the work was related
> to macvtap (my memory is already fuzzy, but I think there were a couple of
> privileged operations we still tried to do for standard tap devices even
> if they were precreated; standard disclaimer: I often misremember, so
> this memory could be wrong! But definitely precreated tap devices do work).

It's been a while since I started this thread, but lately I've understood better how tap devices work, and that new insight makes me wonder about a couple of things.

Our ultimate goal in KubeVirt is to consume a pre-created tap device from a kubernetes pod that doesn't have the NET_ADMIN capability.

After looking at the current libvirt code, I don't think that is currently supported, since we'll *always* enter the `virNetDevTapCreate` function in [1] (I'm interested in the *tap* scenario).

The tap device is effectively created in that function - [2] - by opening the clone device (/dev/net/tun) and calling `ioctl(fd, TUNSETIFF, ...)` on it. AFAIK, both of those operations *require* the NET_ADMIN capability. If I'm correct, this means that the current libvirt implementation makes our goals impossible to achieve.

I'd first like to know if I read the code correctly, and if I'm ultimately right - i.e. since libvirt's implementation for consuming a pre-existent tap device opens the clone device, wherever libvirt runs will *always* require the NET_ADMIN capability.

> I think though that when someone from kubevirt actually tried using a
> precreated macvtap device, they found that their precreated device
> wasn't visible at all to the unprivileged libvirtd in the pod, because
> it was in a different network namespace, or something like that. So
> there may still be more work to do (or, again, my info might be out of
> date and they figured out a proper solution).
>
> >> Would you be able to shed some light on this? Is it possible on
> >> libvirt-5.6.0 to plug pre-created tap devices into libvirt guests?
> >>
> >> [0] - https://bugzilla.redhat.com/show_bug.cgi?id=1723367
> >
> > This links to the following message, which illustrates how to use
> > pre-created tap and macvtap devices:
> >
> > https://www.redhat.com/archives/libvir-list/2019-August/msg01256.html
> >
> > Laine: it would be useful to add something like this short guide to the
> > knowledge base docs
>
> You mean the wiki? Sure, I can do that.
>
> (BTW - that was admirable reading / searching / responding - 7 minutes
> and it wasn't even your patch! How do you do that? :-))
Re: sync guest time
On Thu, Apr 30, 2020 at 2:15 PM Daniel P. Berrangé wrote:
>
> On Thu, Apr 30, 2020 at 01:52:12PM +0200, Miguel Duarte de Mora Barroso wrote:
> > Hi,
> >
> > I'm seeing the following issue when attempting to update the guest's
> > clock on a running fc32 guest (using the guest agent):
> >
> > ```
> > [root@virt-launcher-vmi-masquerade-mh2xm /]# virsh domtime 1 --pretty
> > Time: 2020-04-30 23:27:29
> > [root@virt-launcher-vmi-masquerade-mh2xm /]# virsh domtime 1 --sync
> > error: internal error: unable to execute QEMU agent command
> > 'guest-set-time': hwclock failed to set hardware clock to system time
> > ```
>
> This error is ultimately coming from the QEMU guest agent inside
> your guest. It spawns "hwclock" and this is failing for some reason.
> You'll probably need to debug this inside the guest - strace the
> QEMU guest agent, see where it fails, and then file a bug against
> the distro for it.

Eventually I found out that if I make the call *without* specifying the `libvirt.DOMAIN_TIME_SYNC` flag, this works as I intend.

I've read the docs and could not understand what the purpose of this flag is. It reads "Re-sync domain time from domain's RTC" in [0]. It begs the question: if I'm setting the time to a fixed instant (which I am), why would I want it to sync with the domain's RTC?

Is there any obvious issue that will appear from calling `virDomainSetTime` (defined at [1]) without the DOMAIN_TIME_SYNC flag specified? I'm not sure if removing DOMAIN_TIME_SYNC is a fix, an ugly hack, or a disaster waiting to happen.

> > # now, this one works.
> > [root@virt-launcher-vmi-masquerade-mh2xm /]# virsh domtime 1 --now
> >
> > [root@virt-launcher-vmi-masquerade-44v2x /]# virsh domtime 1 --pretty
> > Time: 2020-04-30 11:15:45
>
> This doesn't run hwclock as it's merely reading the current time.
>
> > Is there any workaround I could try? Am I doing something wrong here?
>
> I don't think you're doing anything wrong. This just looks like a guest
> OS bug to me.

Thanks for the prompt reply!

[0] - https://github.com/libvirt/libvirt/blob/bef10f6eaa93db649c36468143ce6556444a2e25/include/libvirt/libvirt-domain.h#L4712
[1] - https://github.com/libvirt/libvirt/blob/master/src/libvirt-domain.c#L11292

> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
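For anyone hitting the same thing from the Go bindings: setting an explicit instant amounts to calling SetTime with flags=0 rather than DOMAIN_TIME_SYNC. A minimal sketch (the connection URI and domain id are illustrative):

```go
package main

import (
	"time"

	libvirt "github.com/libvirt/libvirt-go"
)

func main() {
	conn, err := libvirt.NewConnect("qemu:///system")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	dom, err := conn.LookupDomainById(1)
	if err != nil {
		panic(err)
	}
	defer dom.Free()

	// Push an explicit wall-clock instant to the guest agent instead of
	// passing DOMAIN_TIME_SYNC (which would re-sync from the domain's
	// RTC, running hwclock inside the guest).
	now := time.Now()
	if err := dom.SetTime(now.Unix(), uint(now.Nanosecond()), 0); err != nil {
		panic(err)
	}
}
```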
sync guest time
Hi,

I'm seeing the following issue when attempting to update the guest's clock on a running fc32 guest (using the guest agent):

```
[root@virt-launcher-vmi-masquerade-mh2xm /]# virsh domtime 1 --pretty
Time: 2020-04-30 23:27:29
[root@virt-launcher-vmi-masquerade-mh2xm /]# virsh domtime 1 --sync
error: internal error: unable to execute QEMU agent command
'guest-set-time': hwclock failed to set hardware clock to system time

# now, this one works.
[root@virt-launcher-vmi-masquerade-mh2xm /]# virsh domtime 1 --now

[root@virt-launcher-vmi-masquerade-44v2x /]# virsh domtime 1 --pretty
Time: 2020-04-30 11:15:45
```

This is the simplest reproducer I could come up with; the original issue is this call to libvirt's `SetTime` in [0].

Is there any workaround I could try? Am I doing something wrong here?

[0] - https://github.com/kubevirt/kubevirt/blob/6bb516148ce4c29825ae74f473e0220753c3767d/pkg/virt-launcher/virtwrap/manager.go#L533
plug pre-created tap devices to libvirt guests
Hi all,

I'm aware that it is possible to plug pre-created macvtap devices into libvirt guests - tracked in RFE [0].

My interpretation of the wording in [1] and [2] is that it is also possible to plug pre-created tap devices into libvirt guests - that would be a requirement to allow kubevirt to run with fewer capabilities in the pods that encapsulate the VMs.

I took a look at the libvirt code ([3] & [4]), and, from my limited understanding, I got the impression that plugging existing interfaces via `managed='no'` is only possible for macvtap interfaces.

Would you be able to shed some light on this? Is it possible on libvirt-5.6.0 to plug pre-created tap devices into libvirt guests?

[0] - https://bugzilla.redhat.com/show_bug.cgi?id=1723367
[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1723367#c2
[2] - https://bugzilla.redhat.com/show_bug.cgi?id=1723367#c3
[3] - https://github.com/libvirt/libvirt/blob/master/src/qemu/qemu_interface.c#L434
[4] - https://github.com/libvirt/libvirt/blob/master/src/qemu/qemu_interface.c#L443
Re: Create VM w/ cache=none on tmpfs
On Fri, Mar 27, 2020 at 1:12 PM Daniel P. Berrangé wrote:
>
> On Fri, Mar 27, 2020 at 12:31:07PM +0100, Miguel Duarte de Mora Barroso wrote:
> > Hi,
> >
> > I've seen that in the past, libvirt couldn't start VMs with the
> > 'cache=none' configuration when the disk image was stored on a file
> > system that doesn't support direct I/O [0].
> >
> > On the KubeVirt project, we have some storage tests on a particular
> > provider which does just that - try to create / start a VM whose disk
> > is on tmpfs and whose definition features 'cache=none'.
> >
> > The behavior we're seeing is that libvirt throws this warning:
> >
> > ```
> > Unexpected Warning event received:
> > testvmig4zsxc2f8swkxv22xkhx2vrb4ppqbfdfgfgqh5gq8plqzrv5,853ff3d9-70d4-43c5-b9ff-4d5815ea557d:
> > server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10,
> > Message='internal error: process exited while connecting to monitor:
> > 2020-03-25T10:09:21.656238Z qemu-kvm: -drive
> > file=/var/run/kubevirt-ephemeral-disks/disk-data/disk0/disk.qcow2,format=qcow2,if=none,id=drive-ua-disk0,cache=none:
> > file system may not support O_DIRECT
> > 2020-03-25T10:09:21.656391Z qemu-kvm: -drive
> > file=/var/run/kubevirt-ephemeral-disks/disk-data/disk0/disk.qcow2,format=qcow2,if=none,id=drive-ua-disk0,cache=none:
> > Could not open backing file: Could not open
> > '/var/run/kubevirt-private/vmi-disks/disk0/disk.img': Invalid
> > argument')"
> > ```
> >
> > But actually proceeds, and is able to start the VM - though it seems it
> > coerces the cache value to writethrough.
>
> Are you sure it is really continuing to start the VM - those log messages
> strongly suggest that is not the case.

Actually no, you're right; it fails, and does not proceed. kubevirt itself re-queues the request, this time without specifying that attribute (I have yet to figure out why / where). Two different requests are clearly seen in the libvirt logs in the pod that encapsulates libvirt.

Thanks for the reply.

> "internal error: process exited while connecting to monitor:"
>
> means libvirt lost its connection to the QMP monitor, which means QEMU
> has shutdown.
>
> Similarly, the error message about being unable to open the disk is
> something that QEMU treats as a fatal error AFAIK.
>
> > Is this the expected behavior? i.e. cache=none can't be used when
> > the disk images are on a tmpfs file system? I know it was, not sure
> > about now (libvirt-5.6.0-7) ...
>
> Yes, cache=none is incompatible with tmpfs, so you'll need to pick
> a different cache setting.
>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
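For completeness, the workaround is simply not to ask for O_DIRECT on tmpfs-backed storage; in domain XML terms, that means a different `cache` value on the disk driver, for example (paths are illustrative):

```xml
<disk type='file' device='disk'>
  <!-- writethrough avoids O_DIRECT, which tmpfs cannot provide -->
  <driver name='qemu' type='qcow2' cache='writethrough'/>
  <source file='/var/run/kubevirt-ephemeral-disks/disk-data/disk0/disk.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
```

A quick way to check whether a given filesystem supports direct I/O at all is `dd if=/dev/zero of=<path> bs=4k count=1 oflag=direct`, which fails with "Invalid argument" on tmpfs.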
Create VM w/ cache=none on tmpfs
Hi,

I've seen that in the past, libvirt couldn't start VMs with the 'cache=none' configuration when the disk image was stored on a file system that doesn't support direct I/O [0].

On the KubeVirt project, we have some storage tests on a particular provider which does just that - try to create / start a VM whose disk is on tmpfs and whose definition features 'cache=none'.

The behavior we're seeing is that libvirt throws this warning:

```
Unexpected Warning event received:
testvmig4zsxc2f8swkxv22xkhx2vrb4ppqbfdfgfgqh5gq8plqzrv5,853ff3d9-70d4-43c5-b9ff-4d5815ea557d:
server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10,
Message='internal error: process exited while connecting to monitor:
2020-03-25T10:09:21.656238Z qemu-kvm: -drive
file=/var/run/kubevirt-ephemeral-disks/disk-data/disk0/disk.qcow2,format=qcow2,if=none,id=drive-ua-disk0,cache=none:
file system may not support O_DIRECT
2020-03-25T10:09:21.656391Z qemu-kvm: -drive
file=/var/run/kubevirt-ephemeral-disks/disk-data/disk0/disk.qcow2,format=qcow2,if=none,id=drive-ua-disk0,cache=none:
Could not open backing file: Could not open
'/var/run/kubevirt-private/vmi-disks/disk0/disk.img': Invalid
argument')"
```

But it actually proceeds and is able to start the VM - though it seems to coerce the cache value to writethrough.

Is this the expected behavior? i.e. cache=none can't be used when the disk images are on a tmpfs file system? I know it was, not sure about now (libvirt-5.6.0-7) ...

[0] - https://bugs.launchpad.net/nova/+bug/959637