[virtio-dev] RE: [PATCH] pci-iov: Add support for unmanaged SR-IOV
> From: Alex Williamson [mailto:alex.william...@redhat.com] > Sent: Saturday, March 3, 2018 2:14 AM > > On Fri, 2 Mar 2018 06:54:17 + > "Tian, Kevin" wrote: > > > > From: Alex Williamson > > > Sent: Friday, March 2, 2018 4:22 AM > > > > > > > > I am pretty sure that you are describing is true of some, but not for > > > > all. I think the Amazon solutions and the virtio solution are doing > > > > hard partitioning of the part. I will leave it to those guys to speak > > > > for themselves since I don't know anything about the hardware design > > > > of those parts. > > > > > > I think we'd need device specific knowledge and enablement to be able > > > to take advantage of any hardware partitioning, otherwise we need to > > > assume the pf is privileged, as implemented in other sriov devices. > > > > > > I'm also trying to imagine whether there's a solution via the new > > > vfio/mdev interface, where the mdev vendor driver would bind to the > pf > > > and effectively present itself as the mdev device. The vendor driver > > > could provide sriov capabilities and bear the burden of ensuring that > > > the pf is used cooperatively. The only existing mdev vendor drivers are > > > vGPUs and rely on on-device DMA translation and isolation, such as > > > through GTTs, but there have been some thoughts on providing IOMMU > > > based > > > isolation of mdev/sriov mixed devices (assuming DMA is even required > > > for userspace management of the pf in this use case). [Cc Kirti] > > > Thanks, > > > > > > > Hope not distracting this thread, but above sounds like an interesting > > idea. Actually we ever brainstormed similar thought for another > > potential usage - supporting VF live migration. We are already working > > on an generic extension to allow state save/restore of mdev instance. > > If vendor driver could further wrap pf as a mdev instance, it could > > leverage the same framework for a clean state migration on VF. based > > on mmap callback the vendor driver can easily switch back-and-forth > > between pass through and trap/emulation of the VF resources. Of > > course doing so alone doesn't address all the demands of VF live > > migration (e.g. dirty page tracking still requires other techniques), > > but it does pave a way toward a general framework to support VF > > live migration. > > > > If above is feasible, finally we could use one mdev framework to > > manage both mdev and pf/vf assignment, while providing added > > values which are difficult to achieve today. :-) > > mdev drivers may be the first to support migration, but I wonder if a > full mdev implementation is necessary for it. Once the vfio api is > define, device specific extensions to vfio-pci might be able to > implement migration more directly. Thanks, > > Alex yes technically a full mdev implementation is not necessary. If device specific extensions will be placed within vfio module, it's obviously straightforward. What I thought earlier was in case vfio wants to stay device-agnostic then we probably want device specific extensions in vendor driver which is loaded but in a dummy mode which simply do basic PCI initialization as vfio-pci and then wrap vf as mdev (since vfio-pci is not the vf driver in this scenario). It's especially useful for vendor drivers which aim to support both mdev and sr-iov by sharing common state mgmt. knowledge, but looks an overkill to other vendor drivers. Possibly finally we'll allow both - simple device extensions in vfio-pci and complex device extensions in vendor drivers through vfio-mdev. Thanks Kevin - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH 1/3] pci-iov: Add support for unmanaged SR-IOV
On Fri, Mar 2, 2018 at 3:59 PM, Alex Williamson wrote: > On Fri, 02 Mar 2018 15:44:25 -0800 > Alexander Duyck wrote: > >> From: Alexander Duyck >> >> This patch is meant to add some basic functionality to support for SR-IOV >> on devices when the VFs are not managed by the kernel. The functions >> provided here can be used by drivers such as vfio-pci and virtio to enable >> SR-IOV on devices that are either managed by userspace, or by some sort of >> firmware entity respectively. >> >> A new sysfs value called sriov_unmanaged_autoprobe has been added. This >> value is used as the drivers_autoprobe setting of the VFs when they are >> being managed by an external entity such as userspace or device firmware >> instead of being managed by the kernel. >> >> One side effect of this change is that the sriov_drivers_autoprobe and >> sriov_unmanaged_autoprobe will only apply their updates when SR-IOV is >> disabled. Attempts to update them when SR-IOV is in use will only update >> the local value and will not update sriov->autoprobe. >> >> Signed-off-by: Alexander Duyck >> --- >> Documentation/ABI/testing/sysfs-bus-pci | 17 ++ >> drivers/pci/iov.c | 37 >> +++ >> drivers/pci/pci-driver.c|2 +- >> drivers/pci/pci-sysfs.c | 29 >> drivers/pci/pci.h |4 +++ >> include/linux/pci.h |1 + >> 6 files changed, 88 insertions(+), 2 deletions(-) >> >> diff --git a/Documentation/ABI/testing/sysfs-bus-pci >> b/Documentation/ABI/testing/sysfs-bus-pci >> index 44d4b2be92fd..ff0b6c19cb1a 100644 >> --- a/Documentation/ABI/testing/sysfs-bus-pci >> +++ b/Documentation/ABI/testing/sysfs-bus-pci >> @@ -323,3 +323,20 @@ Description: >> >> This is similar to /sys/bus/pci/drivers_autoprobe, but >> affects only the VFs associated with a specific PF. >> + >> +What:/sys/bus/pci/devices/.../sriov_unmanaged_autoprobe >> +Date:March 2018 >> +Contact: Alexander Duyck >> +Description: >> + This file is associated with the PF of a device that >> + supports SR-IOV. It determines whether newly-enabled VFs >> + are immediately bound to a driver when the PF driver does >> + not manage the VFs itself. It initially contains 0, which >> + means the kernel will not automatically bind VFs to a driver. >> + If an application writes 1 to the file before enabling VFs, >> + the kernel will bind VFs to a compatible driver immediately >> + after they are enabled. >> + >> + This overrides /sys/bus/pci/devices/.../sriov_drivers_autoprobe >> + when a PF driver is not present to manage a device, or the PF >> + driver does not provide functionality to support SR-IOV. > > > Given a pf, how does a user determine whether it is managed or unmanaged > and therefore which autoprobe attributes are in effect? Thanks, > > Alex Basically it comes down to what driver is loaded on it. For now vfio-pci and virtio would be the only two using the "unmanaged" version of things. Really you don't know which autoprobe is in effect until SR-IOV is enabled by whatever driver. As such you should really be setting both the drivers_autoprobe and the unmanaged_autoprobe based on the expected use case. - Alex - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On Fri, Mar 2, 2018 at 3:12 PM, Samudrala, Sridhar wrote: > On 3/2/2018 1:11 PM, Siwei Liu wrote: >> >> On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala >> wrote: >>> >>> This patch enables virtio_net to switch over to a VF datapath when a VF >>> netdev is present with the same MAC address. It allows live migration >>> of a VM with a direct attached VF without the need to setup a bond/team >>> between a VF and virtio net device in the guest. >>> >>> The hypervisor needs to enable only one datapath at any time so that >>> packets don't get looped back to the VM over the other datapath. When a >>> VF >>> is plugged, the virtio datapath link state can be marked as down. The >>> hypervisor needs to unplug the VF device from the guest on the source >>> host >>> and reset the MAC filter of the VF to initiate failover of datapath to >>> virtio before starting the migration. After the migration is completed, >>> the destination hypervisor sets the MAC filter on the VF and plugs it >>> back >>> to the guest to switch over to VF datapath. >>> >>> When BACKUP feature is enabled, an additional netdev(bypass netdev) is >>> created that acts as a master device and tracks the state of the 2 lower >>> netdevs. The original virtio_net netdev is marked as 'backup' netdev and >>> a >>> passthru device with the same MAC is registered as 'active' netdev. >>> >>> This patch is based on the discussion initiated by Jesse on this thread. >>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2 >>> >>> Signed-off-by: Sridhar Samudrala >>> Signed-off-by: Alexander Duyck >>> Reviewed-by: Jesse Brandeburg >>> --- >>> drivers/net/virtio_net.c | 683 >>> ++- >>> 1 file changed, 682 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >>> index bcd13fe906ca..f2860d86c952 100644 >>> --- a/drivers/net/virtio_net.c >>> +++ b/drivers/net/virtio_net.c >>> @@ -30,6 +30,8 @@ >>> #include >>> #include >>> #include >>> +#include >>> +#include >>> #include >>> #include >>> >>> @@ -206,6 +208,9 @@ struct virtnet_info { >>> u32 speed; >>> >>> unsigned long guest_offloads; >>> + >>> + /* upper netdev created when BACKUP feature enabled */ >>> + struct net_device *bypass_netdev; >>> }; >>> >>> struct padded_vnet_hdr { >>> @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, >>> struct netdev_bpf *xdp) >>> } >>> } >>> >>> +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf, >>> + size_t len) >>> +{ >>> + struct virtnet_info *vi = netdev_priv(dev); >>> + int ret; >>> + >>> + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP)) >>> + return -EOPNOTSUPP; >>> + >>> + ret = snprintf(buf, len, "_bkup"); >>> + if (ret >= len) >>> + return -EOPNOTSUPP; >>> + >>> + return 0; >>> +} >>> + >> >> What if the systemd/udevd is not new enough to enforce the >> n naming? Would virtio_bypass get a different name >> than the original virtio_net? Should we detect this earlier and fall >> back to legacy mode without creating the bypass netdev and ensalving >> the VF? > > > If udev doesn't support renaming of the devices, then the upper bypass > device > should get the original name and the lower virtio netdev will get the next > name. If you got two virtio-net's (say e.g. eth0 and eth1) before the update, the virtio-bypass interface on the first virtio gets eth0 and the backup netdev would get eth1? Then the IP address originally on eth1 gets configurd on the backup interface? > Hopefully the distros updating the kernel will also move to the new > systemd/udev. This is not reliable. I'd opt for a new udev API for this, and fall back to what it was if don't see fit. > > > >> >>> static const struct net_device_ops virtnet_netdev = { >>> .ndo_open= virtnet_open, >>> .ndo_stop= virtnet_close, >>> @@ -2253,6 +2274,7 @@ static const struct net_device_ops virtnet_netdev = >>> { >>> .ndo_xdp_xmit = virtnet_xdp_xmit, >>> .ndo_xdp_flush = virtnet_xdp_flush, >>> .ndo_features_check = passthru_features_check, >>> + .ndo_get_phys_port_name = virtnet_get_phys_port_name, >>> }; >>> >>> static void virtnet_config_changed_work(struct work_struct *work) >>> @@ -2647,6 +2669,653 @@ static int virtnet_validate(struct virtio_device >>> *vdev) >>> return 0; >>> } >>> >>> +/* START of functions supporting VIRTIO_NET_F_BACKUP feature. >>> + * When BACKUP feature is enabled, an additional netdev(bypass netdev) >>> + * is created that acts as a master device and tracks the state of the >>> + * 2 lower netdevs. The original virtio_net netdev is registered as >>> + * 'backup' netdev and a passthru device with the same MAC is registered >>> + * as 'active' netdev. >>> + *
[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On Fri, Mar 2, 2018 at 1:36 PM, Michael S. Tsirkin wrote: > On Fri, Mar 02, 2018 at 01:11:56PM -0800, Siwei Liu wrote: >> On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala >> wrote: >> > This patch enables virtio_net to switch over to a VF datapath when a VF >> > netdev is present with the same MAC address. It allows live migration >> > of a VM with a direct attached VF without the need to setup a bond/team >> > between a VF and virtio net device in the guest. >> > >> > The hypervisor needs to enable only one datapath at any time so that >> > packets don't get looped back to the VM over the other datapath. When a VF >> > is plugged, the virtio datapath link state can be marked as down. The >> > hypervisor needs to unplug the VF device from the guest on the source host >> > and reset the MAC filter of the VF to initiate failover of datapath to >> > virtio before starting the migration. After the migration is completed, >> > the destination hypervisor sets the MAC filter on the VF and plugs it back >> > to the guest to switch over to VF datapath. >> > >> > When BACKUP feature is enabled, an additional netdev(bypass netdev) is >> > created that acts as a master device and tracks the state of the 2 lower >> > netdevs. The original virtio_net netdev is marked as 'backup' netdev and a >> > passthru device with the same MAC is registered as 'active' netdev. >> > >> > This patch is based on the discussion initiated by Jesse on this thread. >> > https://marc.info/?l=linux-virtualization&m=151189725224231&w=2 >> > >> > Signed-off-by: Sridhar Samudrala >> > Signed-off-by: Alexander Duyck >> > Reviewed-by: Jesse Brandeburg >> > --- >> > drivers/net/virtio_net.c | 683 >> > ++- >> > 1 file changed, 682 insertions(+), 1 deletion(-) >> > >> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >> > index bcd13fe906ca..f2860d86c952 100644 >> > --- a/drivers/net/virtio_net.c >> > +++ b/drivers/net/virtio_net.c >> > @@ -30,6 +30,8 @@ >> > #include >> > #include >> > #include >> > +#include >> > +#include >> > #include >> > #include >> > >> > @@ -206,6 +208,9 @@ struct virtnet_info { >> > u32 speed; >> > >> > unsigned long guest_offloads; >> > + >> > + /* upper netdev created when BACKUP feature enabled */ >> > + struct net_device *bypass_netdev; >> > }; >> > >> > struct padded_vnet_hdr { >> > @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, >> > struct netdev_bpf *xdp) >> > } >> > } >> > >> > +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf, >> > + size_t len) >> > +{ >> > + struct virtnet_info *vi = netdev_priv(dev); >> > + int ret; >> > + >> > + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP)) >> > + return -EOPNOTSUPP; >> > + >> > + ret = snprintf(buf, len, "_bkup"); >> > + if (ret >= len) >> > + return -EOPNOTSUPP; >> > + >> > + return 0; >> > +} >> > + >> >> What if the systemd/udevd is not new enough to enforce the >> n naming? Would virtio_bypass get a different name >> than the original virtio_net? > > You mean people using ethX names? Any hardware config change breaks > these, I don't think that can be helped. I don't like the way to rely on .ndo_get_phys_port_name - it's fragile and it does not completely solve the problem it tries to address. Imagine what can end up with if getting an old udevd, or users already have exsiting explicit udev rules around phys_port_name. It does not give you the an ack in saying "yes, I know you're the bypass and you're the backup, please continue and I will give you both correct names", or an unacknowlegment saying "no, I don't know what these extra interfaces are, please go back and leave the VF device alone". We need new udev API for both feature negotiation and naming, or may even completely hide the lower interfaces. > >> Should we detect this earlier and fall >> back to legacy mode without creating the bypass netdev and ensalving >> the VF? > > I don't think we can do this with existing kernel/userspace APIs. That's why I ever said to make udev aware of this new type of combined device instead of doing hacks here and there around. Regards, -Siwei > > -- > MST - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] [PATCH 3/3] virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices
From: Alexander Duyck Hardware-realized virtio_pci devices can implement SR-IOV, so this patch enables its use. The device in question is an upcoming Intel NIC that implements both a virtio_net PF and virtio_net VFs. These are hardware realizations of what has been up to now been a software interface. The device in question has the following 4-part PCI IDs: PF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 15fe VF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 05fe The patch currently needs no check for device ID, because the callback will never be made for devices that do not assert the capability or when run on a platform incapable of SR-IOV. One reason for this patch is because the hardware requires the vendor ID of a VF to be the same as the vendor ID of the PF that created it. So it seemed logical to simply have a fully-functioning virtio_net PF create the VFs. This patch makes that possible. Signed-off-by: Mark Rustad Signed-off-by: Alexander Duyck --- drivers/virtio/virtio_pci_common.c |4 1 file changed, 4 insertions(+) diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index 48d4d1cf1cb6..ca1549393255 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -584,6 +584,7 @@ static void virtio_pci_remove(struct pci_dev *pci_dev) else virtio_pci_modern_remove(vp_dev); + pci_disable_sriov(pci_dev); pci_disable_device(pci_dev); put_device(dev); } @@ -596,6 +597,9 @@ static void virtio_pci_remove(struct pci_dev *pci_dev) #ifdef CONFIG_PM_SLEEP .driver.pm = &virtio_pci_pm_ops, #endif +#ifdef CONFIG_PCI_IOV + .sriov_configure = pci_sriov_configure_unmanaged, +#endif }; module_pci_driver(virtio_pci_driver); - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] [PATCH 2/3] vfio: Add support for unmanaged or userspace managed SR-IOV
From: Alexander Duyck This patch is meant to allow assignment of an SR-IOV enabled PF, as in VFs have been generated, with vfio-pci. My understanding is the primary use case for this is something like DPDK running the PF while the VFs are all assigned to guests. A secondary effect of this is that it provides an interface through which it would be possible to enable SR-IOV on drivers that may not have a physical function that actually manages the device. Enabling SR-IOV should be pretty straight forward. As long as there are no userspace processes currently controlling the interface the number of VFs can be changed, and VFs will be generated without drivers being loaded on the host. Once the userspace process begins controlling the interface the number of VFs cannot be updated via the sysfs until the control is released. Note the VFs will have drivers load on them in the host if the sriov_unmanaged_autoprobe is updated to a value of 1. However the behavior of the VFs in such a setup cannot be guaranteed as the PF will not be available until the userspace process starts and begins to manage the device. For now I am leaving the value as locked when the PF is being controlled from userspace as a form of synchronization. Basically this way we cannot have the number of VFs change out from under the process so it should not require any notification framework, and the configuration can just be read out via configuration space accesses. Signed-off-by: Alexander Duyck --- drivers/vfio/pci/vfio_pci.c | 57 +++ 1 file changed, 57 insertions(+) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index b0f759476900..3023bda39aa9 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -1224,6 +1224,8 @@ static void vfio_pci_remove(struct pci_dev *pdev) VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM); } + pci_disable_sriov(pdev); + if (!disable_idle_d3) pci_set_power_state(pdev, PCI_D0); } @@ -1260,12 +1262,67 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, .error_detected = vfio_pci_aer_err_detected, }; +static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn) +{ + struct vfio_pci_device *vdev; + struct vfio_device *device; + int err; + + device = vfio_device_get_from_dev(&pdev->dev); + if (device == NULL) + return -ENODEV; + + vdev = vfio_device_data(device); + if (vdev == NULL) { + vfio_device_put(device); + return -ENODEV; + } + + /* +* If a userspace process is already using this device just return +* busy and don't allow for any changes. +*/ + if (vdev->refcnt) { + pci_warn(pdev, +"PF is currently in use, blocked until released by user\n"); + return -EBUSY; + } + + err = pci_sriov_configure_unmanaged(pdev, nr_virtfn); + if (err <= 0) + return err; + + /* +* We are now leaving VFs in the control of some unknown PF entity. +* +* Best case is a well behaved userspace PF is expected and any VMs +* that the VFs will be assigned to are dependent on the userspace +* entity anyway. An example being NFV where maybe the PF is acting +* as an accelerated interface for a firewall or switch. +* +* Worst case is somebody really messed up and just enabled SR-IOV +* on a device they were planning to assign to a VM somwhere. +* +* In either case it is probably best for us to set the taint flag +* and warn the user since this could get really ugly really quick +* if this wasn't what they were planning to do. +*/ + add_taint(TAINT_USER, LOCKDEP_STILL_OK); + pci_warn(pdev, +"Adding kernel taint for vfio-pci now managing SR-IOV PF device\n"); + + return nr_virtfn; +} + static struct pci_driver vfio_pci_driver = { .name = "vfio-pci", .id_table = NULL, /* only dynamic ids */ .probe = vfio_pci_probe, .remove = vfio_pci_remove, .err_handler= &vfio_err_handlers, +#ifdef CONFIG_PCI_IOV + .sriov_configure = vfio_pci_sriov_configure, +#endif }; struct vfio_devices { - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] [PATCH 0/3] pci-iov: Add support for unmanaged SR-IOV
This series is meant to add support for SR-IOV on devices when the VFs are not managed by the kernel. Examples of recent patches attempting to do this include: virto - https://patchwork.kernel.org/patch/10241225/ pci-stub - https://patchwork.kernel.org/patch/10109935/ vfio - https://patchwork.kernel.org/patch/10103353/ uio - https://patchwork.kernel.org/patch/9974031/ Since this is quickly blowing up into a multi-driver problem it is probably best to implement this solution as generically as possible. This series is an attempt to do that. What we do with this patch set is provide a generic framework to enable SR-IOV in the case that the PF driver doesn't support managing the VFs itself. I based my patch set originally on the patch by Mark Rustad but there isn't much left after going through and cleaning out the bits that were no longer needed, and after incorporating the feedback from David Miller. At this point the only items to be fully reused was his patch description which is now present in patch 3 of the set. I have included the authors of the original 4 patches above in the Cc here. My hope is to get feedback and/or review on if this works for their use cases. My hope is that for now the pci-stub and uio driver approaches can be addressed using the current patch that enables vfio-pci support. The only limitation is that it is also setting the taint flag until we have a better solution. v2: Reduced scope back to just virtio_pci and vfio-pci Broke into 3 patch set from single patch Changed autoprobe behavior to always set when num_vfs is set non-zero Cc: Mark Rustad Cc: Maximilian Heyne Cc: Liang-Min Wang Cc: David Woodhouse --- Alexander Duyck (3): pci-iov: Add support for unmanaged SR-IOV vfio: Add support for unmanaged or userspace managed SR-IOV virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices Documentation/ABI/testing/sysfs-bus-pci | 17 + drivers/pci/iov.c | 37 drivers/pci/pci-driver.c|2 + drivers/pci/pci-sysfs.c | 29 drivers/pci/pci.h |4 ++ drivers/vfio/pci/vfio_pci.c | 57 +++ drivers/virtio/virtio_pci_common.c |4 ++ include/linux/pci.h |1 + 8 files changed, 149 insertions(+), 2 deletions(-) -- - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] [PATCH 1/3] pci-iov: Add support for unmanaged SR-IOV
From: Alexander Duyck This patch is meant to add some basic functionality to support for SR-IOV on devices when the VFs are not managed by the kernel. The functions provided here can be used by drivers such as vfio-pci and virtio to enable SR-IOV on devices that are either managed by userspace, or by some sort of firmware entity respectively. A new sysfs value called sriov_unmanaged_autoprobe has been added. This value is used as the drivers_autoprobe setting of the VFs when they are being managed by an external entity such as userspace or device firmware instead of being managed by the kernel. One side effect of this change is that the sriov_drivers_autoprobe and sriov_unmanaged_autoprobe will only apply their updates when SR-IOV is disabled. Attempts to update them when SR-IOV is in use will only update the local value and will not update sriov->autoprobe. Signed-off-by: Alexander Duyck --- Documentation/ABI/testing/sysfs-bus-pci | 17 ++ drivers/pci/iov.c | 37 +++ drivers/pci/pci-driver.c|2 +- drivers/pci/pci-sysfs.c | 29 drivers/pci/pci.h |4 +++ include/linux/pci.h |1 + 6 files changed, 88 insertions(+), 2 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index 44d4b2be92fd..ff0b6c19cb1a 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -323,3 +323,20 @@ Description: This is similar to /sys/bus/pci/drivers_autoprobe, but affects only the VFs associated with a specific PF. + +What: /sys/bus/pci/devices/.../sriov_unmanaged_autoprobe +Date: March 2018 +Contact: Alexander Duyck +Description: + This file is associated with the PF of a device that + supports SR-IOV. It determines whether newly-enabled VFs + are immediately bound to a driver when the PF driver does + not manage the VFs itself. It initially contains 0, which + means the kernel will not automatically bind VFs to a driver. + If an application writes 1 to the file before enabling VFs, + the kernel will bind VFs to a compatible driver immediately + after they are enabled. + + This overrides /sys/bus/pci/devices/.../sriov_drivers_autoprobe + when a PF driver is not present to manage a device, or the PF + driver does not provide functionality to support SR-IOV. diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 677924ae0350..3dcec1fa86bd 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -446,6 +446,7 @@ static int sriov_init(struct pci_dev *dev, int pos) pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, &iov->vf_device); iov->pgsz = pgsz; iov->self = dev; + iov->autoprobe = true; iov->drivers_autoprobe = true; pci_read_config_dword(dev, pos + PCI_SRIOV_CAP, &iov->cap); pci_read_config_byte(dev, pos + PCI_SRIOV_FUNC_LINK, &iov->link); @@ -683,6 +684,9 @@ int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn) if (!dev->is_physfn) return -ENOSYS; + /* Update autoprobe setting to reflect managed device */ + dev->sriov->autoprobe = dev->sriov->drivers_autoprobe; + return sriov_enable(dev, nr_virtfn); } EXPORT_SYMBOL_GPL(pci_enable_sriov); @@ -807,3 +811,36 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev) return dev->sriov->total_VFs; } EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs); + +/** + * pci_sriov_configure_unmanaged - helper to configure unmanaged SR-IOV + * @dev: the PCI device + * @nr_virtfn: number of virtual functions to enable, 0 to disable + * + * Used to provide generic enable/disable SR-IOV option for devices + * that do not manage the VFs generated by their driver, or have no + * driver present. + */ +int pci_sriov_configure_unmanaged(struct pci_dev *dev, int nr_virtfn) +{ + int err; + + might_sleep(); + + if (!dev->is_physfn) + return -ENODEV; + + if (!nr_virtfn) { + sriov_disable(dev); + + return 0; + } + + /* Update autoprobe setting to reflect unmanaged device */ + dev->sriov->autoprobe = dev->sriov->unmanaged_autoprobe; + + err = sriov_enable(dev, nr_virtfn); + + return err ? err : nr_virtfn; +} +EXPORT_SYMBOL_GPL(pci_sriov_configure_unmanaged); diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 3bed6beda051..2cc68dff6130 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -398,7 +398,7 @@ void __weak pcibios_free_irq(struct pci_dev *dev) #ifdef CONFIG_PCI_IOV static inline bool pci_device_can_probe(struct pci_dev *pdev) { - return (!pdev->is_virtfn || pdev->ph
Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On 3/2/2018 1:11 PM, Siwei Liu wrote: On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala wrote: This patch enables virtio_net to switch over to a VF datapath when a VF netdev is present with the same MAC address. It allows live migration of a VM with a direct attached VF without the need to setup a bond/team between a VF and virtio net device in the guest. The hypervisor needs to enable only one datapath at any time so that packets don't get looped back to the VM over the other datapath. When a VF is plugged, the virtio datapath link state can be marked as down. The hypervisor needs to unplug the VF device from the guest on the source host and reset the MAC filter of the VF to initiate failover of datapath to virtio before starting the migration. After the migration is completed, the destination hypervisor sets the MAC filter on the VF and plugs it back to the guest to switch over to VF datapath. When BACKUP feature is enabled, an additional netdev(bypass netdev) is created that acts as a master device and tracks the state of the 2 lower netdevs. The original virtio_net netdev is marked as 'backup' netdev and a passthru device with the same MAC is registered as 'active' netdev. This patch is based on the discussion initiated by Jesse on this thread. https://marc.info/?l=linux-virtualization&m=151189725224231&w=2 Signed-off-by: Sridhar Samudrala Signed-off-by: Alexander Duyck Reviewed-by: Jesse Brandeburg --- drivers/net/virtio_net.c | 683 ++- 1 file changed, 682 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index bcd13fe906ca..f2860d86c952 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -30,6 +30,8 @@ #include #include #include +#include +#include #include #include @@ -206,6 +208,9 @@ struct virtnet_info { u32 speed; unsigned long guest_offloads; + + /* upper netdev created when BACKUP feature enabled */ + struct net_device *bypass_netdev; }; struct padded_vnet_hdr { @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp) } } +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf, + size_t len) +{ + struct virtnet_info *vi = netdev_priv(dev); + int ret; + + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP)) + return -EOPNOTSUPP; + + ret = snprintf(buf, len, "_bkup"); + if (ret >= len) + return -EOPNOTSUPP; + + return 0; +} + What if the systemd/udevd is not new enough to enforce the n naming? Would virtio_bypass get a different name than the original virtio_net? Should we detect this earlier and fall back to legacy mode without creating the bypass netdev and ensalving the VF? If udev doesn't support renaming of the devices, then the upper bypass device should get the original name and the lower virtio netdev will get the next name. Hopefully the distros updating the kernel will also move to the new systemd/udev. static const struct net_device_ops virtnet_netdev = { .ndo_open= virtnet_open, .ndo_stop= virtnet_close, @@ -2253,6 +2274,7 @@ static const struct net_device_ops virtnet_netdev = { .ndo_xdp_xmit = virtnet_xdp_xmit, .ndo_xdp_flush = virtnet_xdp_flush, .ndo_features_check = passthru_features_check, + .ndo_get_phys_port_name = virtnet_get_phys_port_name, }; static void virtnet_config_changed_work(struct work_struct *work) @@ -2647,6 +2669,653 @@ static int virtnet_validate(struct virtio_device *vdev) return 0; } +/* START of functions supporting VIRTIO_NET_F_BACKUP feature. + * When BACKUP feature is enabled, an additional netdev(bypass netdev) + * is created that acts as a master device and tracks the state of the + * 2 lower netdevs. The original virtio_net netdev is registered as + * 'backup' netdev and a passthru device with the same MAC is registered + * as 'active' netdev. + */ + +/* bypass state maintained when BACKUP feature is enabled */ +struct virtnet_bypass_info { + /* passthru netdev with same MAC */ + struct net_device __rcu *active_netdev; + + /* virtio_net netdev */ + struct net_device __rcu *backup_netdev; + + /* active netdev stats */ + struct rtnl_link_stats64 active_stats; + + /* backup netdev stats */ + struct rtnl_link_stats64 backup_stats; + + /* aggregated stats */ + struct rtnl_link_stats64 bypass_stats; + + /* spinlock while updating stats */ + spinlock_t stats_lock; +}; + +static void virtnet_bypass_child_open(struct net_device *dev, + struct net_device *child_netdev) +{ + int err = dev_open(child_netdev); + + if (err) + netdev_warn(dev, "unable to open slave: %s: %
Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On Fri, Mar 2, 2018 at 1:31 PM, Michael S. Tsirkin wrote: > On Fri, Mar 02, 2018 at 12:44:56PM -0800, Siwei Liu wrote: >> On Fri, Mar 2, 2018 at 12:10 PM, Michael S. Tsirkin wrote: >> > On Fri, Mar 02, 2018 at 11:52:27AM -0800, Samudrala, Sridhar wrote: >> >> >> >> >> >> On 3/2/2018 11:41 AM, Michael S. Tsirkin wrote: >> >> > On Fri, Mar 02, 2018 at 07:26:25AM -0800, Alexander Duyck wrote: >> >> > > The design limits things to a 1:1 relationship since we just have the >> >> > > child and backup pointers, but I don't think I am seeing exception >> >> > > handling to prevent us from overwriting the child pointers so there >> >> > > may be a leak there. >> >> > > >> >> > > Thanks. >> >> > > >> >> > > - Alex >> >> > In fact maintaining a list in that case would be nicer, and >> >> > just use an arbitrary one. >> >> > E.g. one can see how a user wanting to swap device 1 for device 2 >> >> > might first add device 2 with same MAC then drop device 1. >> >> >> >> It should be possible to swap VF1 with VF2 by >> >> 1.- enabling virtio link >> >> 2.- unplugging VF1 >> >> 3.- plugging VF2 >> >> 4.- disabling virtio link >> >> >> > >> > True, but it isn't hard to avoid breakage if user >> > swapped steps 2 and 3. No need to make it more >> > fragile that it has to be. >> >> The migration case, VF2 is associated with another PF on another >> machine (destination), I wonder how it is possible. > > E.g. you want to remove the PF so you unplug the VF > then add another VF of the same PF. > >> Even with local plugging of VF2 on the same PF, the MAC address >> requirement (VF1's == VF2's) would fail the MAC address assignment on >> VF2. >> >> -Siwei > > Why would it fail? These are separate cards. OK. I realized that you may talk about assigning a VF on a diffferent PF (VF1 on PF1 while VF2 on PF2). And we might assign a pass-through device rather than a VF. Yes, it's indeed possible that may happen but I take it as a further step down (another patch maybe) as it would involve changes to notify the network with gratuituious ARP and/or unsolicited ND advertisement of the MAC address association with the new port. -Siwei > >> > >> > -- >> > MST >> > >> > - >> > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org >> > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org >> > - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On Fri, Mar 02, 2018 at 01:11:56PM -0800, Siwei Liu wrote: > On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala > wrote: > > This patch enables virtio_net to switch over to a VF datapath when a VF > > netdev is present with the same MAC address. It allows live migration > > of a VM with a direct attached VF without the need to setup a bond/team > > between a VF and virtio net device in the guest. > > > > The hypervisor needs to enable only one datapath at any time so that > > packets don't get looped back to the VM over the other datapath. When a VF > > is plugged, the virtio datapath link state can be marked as down. The > > hypervisor needs to unplug the VF device from the guest on the source host > > and reset the MAC filter of the VF to initiate failover of datapath to > > virtio before starting the migration. After the migration is completed, > > the destination hypervisor sets the MAC filter on the VF and plugs it back > > to the guest to switch over to VF datapath. > > > > When BACKUP feature is enabled, an additional netdev(bypass netdev) is > > created that acts as a master device and tracks the state of the 2 lower > > netdevs. The original virtio_net netdev is marked as 'backup' netdev and a > > passthru device with the same MAC is registered as 'active' netdev. > > > > This patch is based on the discussion initiated by Jesse on this thread. > > https://marc.info/?l=linux-virtualization&m=151189725224231&w=2 > > > > Signed-off-by: Sridhar Samudrala > > Signed-off-by: Alexander Duyck > > Reviewed-by: Jesse Brandeburg > > --- > > drivers/net/virtio_net.c | 683 > > ++- > > 1 file changed, 682 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index bcd13fe906ca..f2860d86c952 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -30,6 +30,8 @@ > > #include > > #include > > #include > > +#include > > +#include > > #include > > #include > > > > @@ -206,6 +208,9 @@ struct virtnet_info { > > u32 speed; > > > > unsigned long guest_offloads; > > + > > + /* upper netdev created when BACKUP feature enabled */ > > + struct net_device *bypass_netdev; > > }; > > > > struct padded_vnet_hdr { > > @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, > > struct netdev_bpf *xdp) > > } > > } > > > > +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf, > > + size_t len) > > +{ > > + struct virtnet_info *vi = netdev_priv(dev); > > + int ret; > > + > > + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP)) > > + return -EOPNOTSUPP; > > + > > + ret = snprintf(buf, len, "_bkup"); > > + if (ret >= len) > > + return -EOPNOTSUPP; > > + > > + return 0; > > +} > > + > > What if the systemd/udevd is not new enough to enforce the > n naming? Would virtio_bypass get a different name > than the original virtio_net? You mean people using ethX names? Any hardware config change breaks these, I don't think that can be helped. > Should we detect this earlier and fall > back to legacy mode without creating the bypass netdev and ensalving > the VF? I don't think we can do this with existing kernel/userspace APIs. -- MST - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On Fri, Mar 02, 2018 at 12:56:21PM -0800, Samudrala, Sridhar wrote: > > > On 3/2/2018 12:44 PM, Siwei Liu wrote: > > On Fri, Mar 2, 2018 at 12:10 PM, Michael S. Tsirkin wrote: > > > On Fri, Mar 02, 2018 at 11:52:27AM -0800, Samudrala, Sridhar wrote: > > > > > > > > On 3/2/2018 11:41 AM, Michael S. Tsirkin wrote: > > > > > On Fri, Mar 02, 2018 at 07:26:25AM -0800, Alexander Duyck wrote: > > > > > > The design limits things to a 1:1 relationship since we just have > > > > > > the > > > > > > child and backup pointers, but I don't think I am seeing exception > > > > > > handling to prevent us from overwriting the child pointers so there > > > > > > may be a leak there. > > > > > > > > > > > > Thanks. > > > > > > > > > > > > - Alex > > > > > In fact maintaining a list in that case would be nicer, and > > > > > just use an arbitrary one. > > > > > E.g. one can see how a user wanting to swap device 1 for device 2 > > > > > might first add device 2 with same MAC then drop device 1. > > > > It should be possible to swap VF1 with VF2 by > > > > 1.- enabling virtio link > > > > 2.- unplugging VF1 > > > > 3.- plugging VF2 > > > > 4.- disabling virtio link > > > > > > > True, but it isn't hard to avoid breakage if user > > > swapped steps 2 and 3. No need to make it more > > > fragile that it has to be. > > The migration case, VF2 is associated with another PF on another > > machine (destination), I wonder how it is possible. > > > > Even with local plugging of VF2 on the same PF, the MAC address > > requirement (VF1's == VF2's) would fail the MAC address assignment on > > VF2. > > > > > I didn't include updating the MAC filter step in the above sequence. > So definitely plugging 2 VFs with the same MAC address will be an issue. If these are two separate PFs then I don't see why - each has its own MAC filter. > Here is the more complete sequence of steps that are required to > enable live migration. Replacing VF1 by VF2 is not about migration. It's to remove PF from host e.g. for service. -- MST - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On Fri, Mar 02, 2018 at 12:44:56PM -0800, Siwei Liu wrote: > On Fri, Mar 2, 2018 at 12:10 PM, Michael S. Tsirkin wrote: > > On Fri, Mar 02, 2018 at 11:52:27AM -0800, Samudrala, Sridhar wrote: > >> > >> > >> On 3/2/2018 11:41 AM, Michael S. Tsirkin wrote: > >> > On Fri, Mar 02, 2018 at 07:26:25AM -0800, Alexander Duyck wrote: > >> > > The design limits things to a 1:1 relationship since we just have the > >> > > child and backup pointers, but I don't think I am seeing exception > >> > > handling to prevent us from overwriting the child pointers so there > >> > > may be a leak there. > >> > > > >> > > Thanks. > >> > > > >> > > - Alex > >> > In fact maintaining a list in that case would be nicer, and > >> > just use an arbitrary one. > >> > E.g. one can see how a user wanting to swap device 1 for device 2 > >> > might first add device 2 with same MAC then drop device 1. > >> > >> It should be possible to swap VF1 with VF2 by > >> 1.- enabling virtio link > >> 2.- unplugging VF1 > >> 3.- plugging VF2 > >> 4.- disabling virtio link > >> > > > > True, but it isn't hard to avoid breakage if user > > swapped steps 2 and 3. No need to make it more > > fragile that it has to be. > > The migration case, VF2 is associated with another PF on another > machine (destination), I wonder how it is possible. E.g. you want to remove the PF so you unplug the VF then add another VF of the same PF. > Even with local plugging of VF2 on the same PF, the MAC address > requirement (VF1's == VF2's) would fail the MAC address assignment on > VF2. > > -Siwei Why would it fail? These are separate cards. > > > > -- > > MST > > > > - > > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org > > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org > > - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala wrote: > This patch enables virtio_net to switch over to a VF datapath when a VF > netdev is present with the same MAC address. It allows live migration > of a VM with a direct attached VF without the need to setup a bond/team > between a VF and virtio net device in the guest. > > The hypervisor needs to enable only one datapath at any time so that > packets don't get looped back to the VM over the other datapath. When a VF > is plugged, the virtio datapath link state can be marked as down. The > hypervisor needs to unplug the VF device from the guest on the source host > and reset the MAC filter of the VF to initiate failover of datapath to > virtio before starting the migration. After the migration is completed, > the destination hypervisor sets the MAC filter on the VF and plugs it back > to the guest to switch over to VF datapath. > > When BACKUP feature is enabled, an additional netdev(bypass netdev) is > created that acts as a master device and tracks the state of the 2 lower > netdevs. The original virtio_net netdev is marked as 'backup' netdev and a > passthru device with the same MAC is registered as 'active' netdev. > > This patch is based on the discussion initiated by Jesse on this thread. > https://marc.info/?l=linux-virtualization&m=151189725224231&w=2 > > Signed-off-by: Sridhar Samudrala > Signed-off-by: Alexander Duyck > Reviewed-by: Jesse Brandeburg > --- > drivers/net/virtio_net.c | 683 > ++- > 1 file changed, 682 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index bcd13fe906ca..f2860d86c952 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -30,6 +30,8 @@ > #include > #include > #include > +#include > +#include > #include > #include > > @@ -206,6 +208,9 @@ struct virtnet_info { > u32 speed; > > unsigned long guest_offloads; > + > + /* upper netdev created when BACKUP feature enabled */ > + struct net_device *bypass_netdev; > }; > > struct padded_vnet_hdr { > @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, struct > netdev_bpf *xdp) > } > } > > +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf, > + size_t len) > +{ > + struct virtnet_info *vi = netdev_priv(dev); > + int ret; > + > + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP)) > + return -EOPNOTSUPP; > + > + ret = snprintf(buf, len, "_bkup"); > + if (ret >= len) > + return -EOPNOTSUPP; > + > + return 0; > +} > + What if the systemd/udevd is not new enough to enforce the n naming? Would virtio_bypass get a different name than the original virtio_net? Should we detect this earlier and fall back to legacy mode without creating the bypass netdev and ensalving the VF? > static const struct net_device_ops virtnet_netdev = { > .ndo_open= virtnet_open, > .ndo_stop= virtnet_close, > @@ -2253,6 +2274,7 @@ static const struct net_device_ops virtnet_netdev = { > .ndo_xdp_xmit = virtnet_xdp_xmit, > .ndo_xdp_flush = virtnet_xdp_flush, > .ndo_features_check = passthru_features_check, > + .ndo_get_phys_port_name = virtnet_get_phys_port_name, > }; > > static void virtnet_config_changed_work(struct work_struct *work) > @@ -2647,6 +2669,653 @@ static int virtnet_validate(struct virtio_device > *vdev) > return 0; > } > > +/* START of functions supporting VIRTIO_NET_F_BACKUP feature. > + * When BACKUP feature is enabled, an additional netdev(bypass netdev) > + * is created that acts as a master device and tracks the state of the > + * 2 lower netdevs. The original virtio_net netdev is registered as > + * 'backup' netdev and a passthru device with the same MAC is registered > + * as 'active' netdev. > + */ > + > +/* bypass state maintained when BACKUP feature is enabled */ > +struct virtnet_bypass_info { > + /* passthru netdev with same MAC */ > + struct net_device __rcu *active_netdev; > + > + /* virtio_net netdev */ > + struct net_device __rcu *backup_netdev; > + > + /* active netdev stats */ > + struct rtnl_link_stats64 active_stats; > + > + /* backup netdev stats */ > + struct rtnl_link_stats64 backup_stats; > + > + /* aggregated stats */ > + struct rtnl_link_stats64 bypass_stats; > + > + /* spinlock while updating stats */ > + spinlock_t stats_lock; > +}; > + > +static void virtnet_bypass_child_open(struct net_device *dev, > + struct net_device *child_netdev) > +{ > + int err = dev_open(child_netdev); > + > + if (err) > + netdev_warn(dev, "unable to open slave: %s: %d\n", > + child_netdev->name, err); > +} > +
Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On 3/2/2018 12:44 PM, Siwei Liu wrote: On Fri, Mar 2, 2018 at 12:10 PM, Michael S. Tsirkin wrote: On Fri, Mar 02, 2018 at 11:52:27AM -0800, Samudrala, Sridhar wrote: On 3/2/2018 11:41 AM, Michael S. Tsirkin wrote: On Fri, Mar 02, 2018 at 07:26:25AM -0800, Alexander Duyck wrote: The design limits things to a 1:1 relationship since we just have the child and backup pointers, but I don't think I am seeing exception handling to prevent us from overwriting the child pointers so there may be a leak there. Thanks. - Alex In fact maintaining a list in that case would be nicer, and just use an arbitrary one. E.g. one can see how a user wanting to swap device 1 for device 2 might first add device 2 with same MAC then drop device 1. It should be possible to swap VF1 with VF2 by 1.- enabling virtio link 2.- unplugging VF1 3.- plugging VF2 4.- disabling virtio link True, but it isn't hard to avoid breakage if user swapped steps 2 and 3. No need to make it more fragile that it has to be. The migration case, VF2 is associated with another PF on another machine (destination), I wonder how it is possible. Even with local plugging of VF2 on the same PF, the MAC address requirement (VF1's == VF2's) would fail the MAC address assignment on VF2. I didn't include updating the MAC filter step in the above sequence. So definitely plugging 2 VFs with the same MAC address will be an issue. Here is the more complete sequence of steps that are required to enable live migration. Source Hypervisor - Bring up the virtio link - Hot Unplug VF from the VM - Delete FDB entry for the MAC on the Bridge associated with virtio/tap - Remove the MAC filter associated with the VF - Migrate VM to destination Destination Hypervisor (after migration is completed) - Set MAC filter for the VF - Hot Plug VF to the VM - Bring down the virtio link Thanks Sridhar - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On Fri, Mar 2, 2018 at 11:42 AM, Michael S. Tsirkin wrote: > On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote: >> >Yeah, this code essentially calls out the "shareable" code with a >> >comment at the start and end of the section what defines the >> >virtio_bypass functionality. It would just be a matter of mostly >> >cutting and pasting to put it into a separate driver module. >> >> Please put it there and unite the use of it with netvsc. > > Surely, adding this to other drivers (e.g. might this be handy for xen > too?) can be left for a separate patchset. Let's get one device merged > first. Agreed. -Siwei > > -- > MST > > - > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org > - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On Fri, Mar 2, 2018 at 12:10 PM, Michael S. Tsirkin wrote: > On Fri, Mar 02, 2018 at 11:52:27AM -0800, Samudrala, Sridhar wrote: >> >> >> On 3/2/2018 11:41 AM, Michael S. Tsirkin wrote: >> > On Fri, Mar 02, 2018 at 07:26:25AM -0800, Alexander Duyck wrote: >> > > The design limits things to a 1:1 relationship since we just have the >> > > child and backup pointers, but I don't think I am seeing exception >> > > handling to prevent us from overwriting the child pointers so there >> > > may be a leak there. >> > > >> > > Thanks. >> > > >> > > - Alex >> > In fact maintaining a list in that case would be nicer, and >> > just use an arbitrary one. >> > E.g. one can see how a user wanting to swap device 1 for device 2 >> > might first add device 2 with same MAC then drop device 1. >> >> It should be possible to swap VF1 with VF2 by >> 1.- enabling virtio link >> 2.- unplugging VF1 >> 3.- plugging VF2 >> 4.- disabling virtio link >> > > True, but it isn't hard to avoid breakage if user > swapped steps 2 and 3. No need to make it more > fragile that it has to be. The migration case, VF2 is associated with another PF on another machine (destination), I wonder how it is possible. Even with local plugging of VF2 on the same PF, the MAC address requirement (VF1's == VF2's) would fail the MAC address assignment on VF2. -Siwei > > -- > MST > > - > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org > - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH 4/4] virtio-net: add linkspeed and duplex settings to virtio-net
On Fri, Mar 02, 2018 at 11:59:00AM -0500, Jason Baron wrote: > On 03/02/2018 02:14 AM, Jason Wang wrote: > > > > > > On 2018年03月02日 11:46, Jason Baron wrote: > >> Although linkspeed and duplex can be set in a linux guest via 'ethtool > >> -s', > >> this requires custom ethtool commands for virtio-net by default. > >> > >> Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows > >> the hypervisor to export a linkspeed and duplex setting. The user can > >> subsequently overwrite it later if desired via: 'ethtool -s'. > >> > >> Linkspeed and duplex settings can be set as: > >> '-device virtio-net,speed=1,duplex=full' > > > > I was thinking whether or not it's better to decide the duplex by the > > type of backends. > > > > E.g userspace and vhost-kernel implement a in fact half duplex. But dpdk > > implement a full duplex. > > Interesting - could this be derived only from the backend 'type'. IE: > NET_CLIENT_DRIVER_TAP, NET_CLIENT_DRIVER_VHOST_USER... > > > I was also thinking this could be specified as 'duplex=backend', in > addition to the proposed 'duplex=full' or 'duplex=half'? > > Thanks, > > -Jason I'd say it would make more sense to teach backends to obey what's specified by the user. E.g. if vhost gets a duplex config, create two threads. But I think all that's for future, we can just fake it for now - the current uses don't seem to particularly care about whether virtio actually is or isn't a duplex. > > > > Thanks > > > >> > >> where speed is [0...INT_MAX], and duplex is ["half"|"full"]. > >> > >> Signed-off-by: Jason Baron > >> Cc: "Michael S. Tsirkin" > >> Cc: Jason Wang > >> Cc:virtio-dev@lists.oasis-open.org > >> --- > > - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH 4/4] virtio-net: add linkspeed and duplex settings to virtio-net
On Fri, Mar 02, 2018 at 03:14:01PM +0800, Jason Wang wrote: > > > On 2018年03月02日 11:46, Jason Baron wrote: > > Although linkspeed and duplex can be set in a linux guest via 'ethtool -s', > > this requires custom ethtool commands for virtio-net by default. > > > > Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows > > the hypervisor to export a linkspeed and duplex setting. The user can > > subsequently overwrite it later if desired via: 'ethtool -s'. > > > > Linkspeed and duplex settings can be set as: > > '-device virtio-net,speed=1,duplex=full' > > I was thinking whether or not it's better to decide the duplex by the type > of backends. > > E.g userspace and vhost-kernel implement a in fact half duplex. But dpdk > implement a full duplex. > > Thanks OTOH it's a priority for some people to be able to support migration between different backend types. Breaking that won't be nice. > > > > where speed is [0...INT_MAX], and duplex is ["half"|"full"]. > > > > Signed-off-by: Jason Baron > > Cc: "Michael S. Tsirkin" > > Cc: Jason Wang > > Cc:virtio-dev@lists.oasis-open.org > > --- - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On Fri, Mar 02, 2018 at 11:52:27AM -0800, Samudrala, Sridhar wrote: > > > On 3/2/2018 11:41 AM, Michael S. Tsirkin wrote: > > On Fri, Mar 02, 2018 at 07:26:25AM -0800, Alexander Duyck wrote: > > > The design limits things to a 1:1 relationship since we just have the > > > child and backup pointers, but I don't think I am seeing exception > > > handling to prevent us from overwriting the child pointers so there > > > may be a leak there. > > > > > > Thanks. > > > > > > - Alex > > In fact maintaining a list in that case would be nicer, and > > just use an arbitrary one. > > E.g. one can see how a user wanting to swap device 1 for device 2 > > might first add device 2 with same MAC then drop device 1. > > It should be possible to swap VF1 with VF2 by > 1.- enabling virtio link > 2.- unplugging VF1 > 3.- plugging VF2 > 4.- disabling virtio link > True, but it isn't hard to avoid breakage if user swapped steps 2 and 3. No need to make it more fragile that it has to be. -- MST - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] [PATCH] README.md: add hints for contributors
Add links to admin repo and mailing lists. Signed-off-by: Michael S. Tsirkin --- README.md | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 7dadb68..a4fdc4e 100644 --- a/README.md +++ b/README.md @@ -27,13 +27,19 @@ + Further Description of this Repository +Providing Feedback +See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio#feedback";> +https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio#feedback +Note for Contributors +Contributors should review TC specific +process rules under "Further Description of this Repository" +in https://github.com/oasis-tcs/virtio-admin";>https://github.com/oasis-tcs/virtio-admin. -[Any narrative content may be provided here by the TC, for example, if the Members wish to provide an extended statement of purpose.] - - - Contact Please send questions or comments about https://www.oasis-open.org/resources/tcadmin/github-repositories-for-oasis-tc-members-chartered-work";>OASIS TC GitHub repositories to mailto:ro...@oasis-open.org";>Robin Cover and mailto:chet.ens...@oasis-open.org";>Chet Ensign. For questions about content in this repository, please contact the TC Chair or Co-Chairs as listed on the the virtio TC's https://www.oasis-open.org/committees/virtio/";>home page. -- MST - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On 3/2/2018 11:41 AM, Michael S. Tsirkin wrote: On Fri, Mar 02, 2018 at 07:26:25AM -0800, Alexander Duyck wrote: The design limits things to a 1:1 relationship since we just have the child and backup pointers, but I don't think I am seeing exception handling to prevent us from overwriting the child pointers so there may be a leak there. Thanks. - Alex In fact maintaining a list in that case would be nicer, and just use an arbitrary one. E.g. one can see how a user wanting to swap device 1 for device 2 might first add device 2 with same MAC then drop device 1. It should be possible to swap VF1 with VF2 by - enabling virtio link - unplugging VF1 - plugging VF2 - disabling virtio link - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote: > >Yeah, this code essentially calls out the "shareable" code with a > >comment at the start and end of the section what defines the > >virtio_bypass functionality. It would just be a matter of mostly > >cutting and pasting to put it into a separate driver module. > > Please put it there and unite the use of it with netvsc. Surely, adding this to other drivers (e.g. might this be handy for xen too?) can be left for a separate patchset. Let's get one device merged first. -- MST - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On Fri, Mar 02, 2018 at 07:26:25AM -0800, Alexander Duyck wrote: > The design limits things to a 1:1 relationship since we just have the > child and backup pointers, but I don't think I am seeing exception > handling to prevent us from overwriting the child pointers so there > may be a leak there. > > Thanks. > > - Alex In fact maintaining a list in that case would be nicer, and just use an arbitrary one. E.g. one can see how a user wanting to swap device 1 for device 2 might first add device 2 with same MAC then drop device 1. -- MST - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v3 2/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
On Fri, Mar 02, 2018 at 04:47:29PM +0800, Wei Wang wrote: > diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h > index af49e19..16a2aae 100644 > --- a/include/sysemu/balloon.h > +++ b/include/sysemu/balloon.h ... > +typedef void (QEMUBalloonFreePageStart)(void *opaque); > +typedef void (QEMUBalloonFreePageStop)(void *opaque); So I think the rule is that no bitmap sync must happen between these two, otherwise a hint might arrive and override the sync output. Should be documented I think. -- MST - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v3 2/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
On Fri, Mar 02, 2018 at 04:47:29PM +0800, Wei Wang wrote: > The new feature enables the virtio-balloon device to receive hints of > guest free pages from the free page vq. Callers call the > free_page_start API to start the reporting, which creates a thread to > poll for free page hints. The free_page_stop API stops the reporting and > makes the thread exit. > > Signed-off-by: Wei Wang > Signed-off-by: Liang Li > CC: Michael S. Tsirkin > CC: Dr. David Alan Gilbert > CC: Juan Quintela > --- > balloon.c | 49 +-- > hw/virtio/virtio-balloon.c | 172 > +--- > include/hw/virtio/virtio-balloon.h | 14 +- > include/standard-headers/linux/virtio_balloon.h | 7 + > include/sysemu/balloon.h| 15 ++- > 5 files changed, 228 insertions(+), 29 deletions(-) > > diff --git a/balloon.c b/balloon.c > index d8dd6fe..b0b0749 100644 > --- a/balloon.c > +++ b/balloon.c > @@ -36,6 +36,9 @@ > > static QEMUBalloonEvent *balloon_event_fn; > static QEMUBalloonStatus *balloon_stat_fn; > +static QEMUBalloonFreePageSupport *balloon_free_page_support_fn; > +static QEMUBalloonFreePageStart *balloon_free_page_start_fn; > +static QEMUBalloonFreePageStop *balloon_free_page_stop_fn; > static void *balloon_opaque; > static bool balloon_inhibited; > > @@ -64,19 +67,42 @@ static bool have_balloon(Error **errp) > return true; > } > > -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func, > - QEMUBalloonStatus *stat_func, void *opaque) > +bool balloon_free_page_support(void) > { > -if (balloon_event_fn || balloon_stat_fn || balloon_opaque) { > -/* We're already registered one balloon handler. How many can > - * a guest really have? > - */ > -return -1; > +return balloon_free_page_support_fn && > + balloon_free_page_support_fn(balloon_opaque); > +} > + > +void balloon_free_page_start(void) > +{ > +balloon_free_page_start_fn(balloon_opaque); > +} > + > +void balloon_free_page_stop(void) > +{ > +balloon_free_page_stop_fn(balloon_opaque); > +} > + > +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn, > + QEMUBalloonStatus *stat_fn, > + QEMUBalloonFreePageSupport > *free_page_support_fn, > + QEMUBalloonFreePageStart *free_page_start_fn, > + QEMUBalloonFreePageStop *free_page_stop_fn, > + void *opaque) > +{ > +if (balloon_event_fn || balloon_stat_fn || balloon_free_page_support_fn > || > +balloon_free_page_start_fn || balloon_free_page_stop_fn || > +balloon_opaque) { > +/* We already registered one balloon handler. */ > +return; > } > -balloon_event_fn = event_func; > -balloon_stat_fn = stat_func; > + > +balloon_event_fn = event_fn; > +balloon_stat_fn = stat_fn; > +balloon_free_page_support_fn = free_page_support_fn; > +balloon_free_page_start_fn = free_page_start_fn; > +balloon_free_page_stop_fn = free_page_stop_fn; > balloon_opaque = opaque; > -return 0; > } > > void qemu_remove_balloon_handler(void *opaque) > @@ -86,6 +112,9 @@ void qemu_remove_balloon_handler(void *opaque) > } > balloon_event_fn = NULL; > balloon_stat_fn = NULL; > +balloon_free_page_support_fn = NULL; > +balloon_free_page_start_fn = NULL; > +balloon_free_page_stop_fn = NULL; > balloon_opaque = NULL; > } > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > index 4822449..4607879 100644 > --- a/hw/virtio/virtio-balloon.c > +++ b/hw/virtio/virtio-balloon.c > @@ -31,6 +31,7 @@ > > #include "hw/virtio/virtio-bus.h" > #include "hw/virtio/virtio-access.h" > +#include "migration/misc.h" > > #define BALLOON_PAGE_SIZE (1 << VIRTIO_BALLOON_PFN_SHIFT) > > @@ -308,6 +309,100 @@ out: > } > } > > +static void *virtio_balloon_poll_free_page_hints(void *opaque) > +{ > +VirtQueueElement *elem; > +VirtIOBalloon *dev = opaque; > +VirtQueue *vq = dev->free_page_vq; > +uint32_t id; > +size_t size; > + > +/* > + * Poll the vq till the status changed to STOP. This happens when > + * the guest finishes reporting hints or the migration thread actively > + * stops the reporting. > + */ > +while (dev->free_page_report_status != FREE_PAGE_REPORT_S_STOP) { > +elem = virtqueue_pop(vq, sizeof(VirtQueueElement)); > +if (!elem) { > +continue; > +} > + > +if (elem->out_num) { > +size = iov_to_buf(elem->out_sg, elem->out_num, 0, &id, > sizeof(id)); > +virtqueue_push(vq, elem, size); > +g_free(elem); > +if (unlikely(size != sizeof(id))) { > +warn_report("%s: received an incorrect cmd id", __func__); > +break;
[virtio-dev] Re: [PATCH 2/4] rocker: drop local duplex definitions
On Thu, Mar 01, 2018 at 10:46:34PM -0500, Jason Baron wrote: > Make use of duplex definitions from net/eth.h. > > Signed-off-by: Jason Baron > Cc: "Michael S. Tsirkin" > Cc: Jason Wang > Cc: Jiri Pirko > Cc: virtio-dev@lists.oasis-open.org > --- > hw/net/rocker/rocker_fp.c | 6 +- > 1 file changed, 1 insertion(+), 5 deletions(-) > > diff --git a/hw/net/rocker/rocker_fp.c b/hw/net/rocker/rocker_fp.c > index 4b3c984..13a14a0 100644 > --- a/hw/net/rocker/rocker_fp.c > +++ b/hw/net/rocker/rocker_fp.c > @@ -16,17 +16,13 @@ > > #include "qemu/osdep.h" > #include "net/clients.h" > +#include "net/eth.h" > > #include "rocker.h" > #include "rocker_hw.h" > #include "rocker_fp.h" > #include "rocker_world.h" > > -enum duplex { > -DUPLEX_HALF = 0, > -DUPLEX_FULL > -}; > - This is hardware emulation, so I actually suspect it should be renamed ROCKER_FP_HALF/ROCKER_FP_FULL. > struct fp_port { > Rocker *r; > World *world; > -- > 2.7.4 - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH 1/4] eth: add speed and duplex definitions
On Thu, Mar 01, 2018 at 10:46:33PM -0500, Jason Baron wrote: > Pull in definitions for SPEED_UNKNOWN, DUPLEX_UNKNOWN, DUPLEX_HALF, > and DUPLEX_FULL. > > Signed-off-by: Jason Baron > Cc: "Michael S. Tsirkin" > Cc: Jason Wang > Cc: virtio-dev@lists.oasis-open.org > --- > include/net/eth.h | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/include/net/eth.h b/include/net/eth.h > index 09054a5..9843678 100644 > --- a/include/net/eth.h > +++ b/include/net/eth.h > @@ -417,4 +417,11 @@ bool > eth_parse_ipv6_hdr(const struct iovec *pkt, int pkt_frags, > size_t ip6hdr_off, eth_ip6_hdr_info *info); > > +/* ethtool defines - from linux/ethtool.h */ > +#define SPEED_UNKNOWN -1 > + > +#define DUPLEX_HALF 0x00 > +#define DUPLEX_FULL 0x01 > +#define DUPLEX_UNKNOWN 0xff > + > #endif While that's not a lot, I think we should import linux/ethtool.h into include/standard-headers/linux/ using scripts/update-linux-headers.sh > -- > 2.7.4 - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On Fri, Mar 2, 2018 at 8:37 AM, Samudrala, Sridhar wrote: > > > On 3/2/2018 8:20 AM, Jiri Pirko wrote: > > Fri, Mar 02, 2018 at 04:26:25PM CET, alexander.du...@gmail.com wrote: > > On Fri, Mar 2, 2018 at 12:36 AM, Jiri Pirko wrote: > > Thu, Mar 01, 2018 at 09:08:43PM CET, sridhar.samudr...@intel.com wrote: > > This patch enables virtio_net to switch over to a VF datapath when a VF > netdev is present with the same MAC address. It allows live migration > of a VM with a direct attached VF without the need to setup a bond/team > between a VF and virtio net device in the guest. > > The hypervisor needs to enable only one datapath at any time so that > packets don't get looped back to the VM over the other datapath. When a VF > is plugged, the virtio datapath link state can be marked as down. The > hypervisor needs to unplug the VF device from the guest on the source host > and reset the MAC filter of the VF to initiate failover of datapath to > virtio before starting the migration. After the migration is completed, > the destination hypervisor sets the MAC filter on the VF and plugs it back > to the guest to switch over to VF datapath. > > When BACKUP feature is enabled, an additional netdev(bypass netdev) is > created that acts as a master device and tracks the state of the 2 lower > netdevs. The original virtio_net netdev is marked as 'backup' netdev and a > passthru device with the same MAC is registered as 'active' netdev. > > This patch is based on the discussion initiated by Jesse on this thread. > https://marc.info/?l=linux-virtualization&m=151189725224231&w=2 > > Signed-off-by: Sridhar Samudrala > Signed-off-by: Alexander Duyck > Reviewed-by: Jesse Brandeburg > --- > drivers/net/virtio_net.c | 683 > ++- > > As I wrote to the discussion of the other version of this patchset, > I strongly believe you need to have a common part in net/core/ that will > be shared by both virtio_net and netvsc. There there should be explicit > limits for this in-driver bonding solution, like max 1 vf slave, etc. > > Yeah, this code essentially calls out the "shareable" code with a > comment at the start and end of the section what defines the > virtio_bypass functionality. It would just be a matter of mostly > cutting and pasting to put it into a separate driver module. > > Please put it there and unite the use of it with netvsc. > > Based on Stephen's responses, it is not clear if netvsc would be OK to move > to > the 3 netdev solution at this time. When another paravirtual driver would > like > to use this solution, we can do refactoring at that time. > > The design limits things to a 1:1 relationship since we just have the > child and backup pointers, but I don't think I am seeing exception > handling to prevent us from overwriting the child pointers so there > may be a leak there. > > > We only allow one active netdev registered as a child. There is a check to > make sure that if an active netdev is registered, another one with same MAC > is not allowed. I don't think there is a leak. > > vbi = netdev_priv(dev); > backup = (child_netdev->dev.parent == dev->dev.parent); > if (backup ? rtnl_dereference(vbi->backup_netdev) : > rtnl_dereference(vbi->active_netdev)) { > netdev_info(dev, > "%s attempting to join bypass dev when %s > already present\n", > child_netdev->name, backup ? "backup" : > "active"); > return NOTIFY_DONE; > } > > /* Avoid non pci devices as active netdev */ > if (!backup && (!child_netdev->dev.parent || > !dev_is_pci(child_netdev->dev.parent))) > return NOTIFY_DONE; > > Thanks > Sridhar I overlooked that part. Thanks. - Alex - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH 4/4] virtio-net: add linkspeed and duplex settings to virtio-net
On 03/02/2018 02:14 AM, Jason Wang wrote: > > > On 2018年03月02日 11:46, Jason Baron wrote: >> Although linkspeed and duplex can be set in a linux guest via 'ethtool >> -s', >> this requires custom ethtool commands for virtio-net by default. >> >> Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows >> the hypervisor to export a linkspeed and duplex setting. The user can >> subsequently overwrite it later if desired via: 'ethtool -s'. >> >> Linkspeed and duplex settings can be set as: >> '-device virtio-net,speed=1,duplex=full' > > I was thinking whether or not it's better to decide the duplex by the > type of backends. > > E.g userspace and vhost-kernel implement a in fact half duplex. But dpdk > implement a full duplex. Interesting - could this be derived only from the backend 'type'. IE: NET_CLIENT_DRIVER_TAP, NET_CLIENT_DRIVER_VHOST_USER... I was also thinking this could be specified as 'duplex=backend', in addition to the proposed 'duplex=full' or 'duplex=half'? Thanks, -Jason > > Thanks > >> >> where speed is [0...INT_MAX], and duplex is ["half"|"full"]. >> >> Signed-off-by: Jason Baron >> Cc: "Michael S. Tsirkin" >> Cc: Jason Wang >> Cc:virtio-dev@lists.oasis-open.org >> --- > - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On 3/2/2018 8:20 AM, Jiri Pirko wrote: Fri, Mar 02, 2018 at 04:26:25PM CET, alexander.du...@gmail.com wrote: On Fri, Mar 2, 2018 at 12:36 AM, Jiri Pirko wrote: Thu, Mar 01, 2018 at 09:08:43PM CET, sridhar.samudr...@intel.com wrote: This patch enables virtio_net to switch over to a VF datapath when a VF netdev is present with the same MAC address. It allows live migration of a VM with a direct attached VF without the need to setup a bond/team between a VF and virtio net device in the guest. The hypervisor needs to enable only one datapath at any time so that packets don't get looped back to the VM over the other datapath. When a VF is plugged, the virtio datapath link state can be marked as down. The hypervisor needs to unplug the VF device from the guest on the source host and reset the MAC filter of the VF to initiate failover of datapath to virtio before starting the migration. After the migration is completed, the destination hypervisor sets the MAC filter on the VF and plugs it back to the guest to switch over to VF datapath. When BACKUP feature is enabled, an additional netdev(bypass netdev) is created that acts as a master device and tracks the state of the 2 lower netdevs. The original virtio_net netdev is marked as 'backup' netdev and a passthru device with the same MAC is registered as 'active' netdev. This patch is based on the discussion initiated by Jesse on this thread. https://marc.info/?l=linux-virtualization&m=151189725224231&w=2 Signed-off-by: Sridhar Samudrala Signed-off-by: Alexander Duyck Reviewed-by: Jesse Brandeburg --- drivers/net/virtio_net.c | 683 ++- As I wrote to the discussion of the other version of this patchset, I strongly believe you need to have a common part in net/core/ that will be shared by both virtio_net and netvsc. There there should be explicit limits for this in-driver bonding solution, like max 1 vf slave, etc. Yeah, this code essentially calls out the "shareable" code with a comment at the start and end of the section what defines the virtio_bypass functionality. It would just be a matter of mostly cutting and pasting to put it into a separate driver module. Please put it there and unite the use of it with netvsc. Based on Stephen's responses, it is not clear if netvsc would be OK to move to the 3 netdev solution at this time. When another paravirtual driver would like to use this solution, we can do refactoring at that time. The design limits things to a 1:1 relationship since we just have the child and backup pointers, but I don't think I am seeing exception handling to prevent us from overwriting the child pointers so there may be a leak there. We only allow one active netdev registered as a child. There is a check to make sure that if an active netdev is registered, another one with same MAC is not allowed. I don't think there is a leak. vbi = netdev_priv(dev); backup = (child_netdev->dev.parent == dev->dev.parent); if (backup ? rtnl_dereference(vbi->backup_netdev) : rtnl_dereference(vbi->active_netdev)) { netdev_info(dev, "%s attempting to join bypass dev when %s already present\n", child_netdev->name, backup ? "backup" : "active"); return NOTIFY_DONE; } /* Avoid non pci devices as active netdev */ if (!backup && (!child_netdev->dev.parent || !dev_is_pci(child_netdev->dev.parent))) return NOTIFY_DONE; Thanks Sridhar
[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On Fri, Mar 2, 2018 at 12:36 AM, Jiri Pirko wrote: > Thu, Mar 01, 2018 at 09:08:43PM CET, sridhar.samudr...@intel.com wrote: >>This patch enables virtio_net to switch over to a VF datapath when a VF >>netdev is present with the same MAC address. It allows live migration >>of a VM with a direct attached VF without the need to setup a bond/team >>between a VF and virtio net device in the guest. >> >>The hypervisor needs to enable only one datapath at any time so that >>packets don't get looped back to the VM over the other datapath. When a VF >>is plugged, the virtio datapath link state can be marked as down. The >>hypervisor needs to unplug the VF device from the guest on the source host >>and reset the MAC filter of the VF to initiate failover of datapath to >>virtio before starting the migration. After the migration is completed, >>the destination hypervisor sets the MAC filter on the VF and plugs it back >>to the guest to switch over to VF datapath. >> >>When BACKUP feature is enabled, an additional netdev(bypass netdev) is >>created that acts as a master device and tracks the state of the 2 lower >>netdevs. The original virtio_net netdev is marked as 'backup' netdev and a >>passthru device with the same MAC is registered as 'active' netdev. >> >>This patch is based on the discussion initiated by Jesse on this thread. >>https://marc.info/?l=linux-virtualization&m=151189725224231&w=2 >> >>Signed-off-by: Sridhar Samudrala >>Signed-off-by: Alexander Duyck >>Reviewed-by: Jesse Brandeburg >>--- >> drivers/net/virtio_net.c | 683 >> ++- > > As I wrote to the discussion of the other version of this patchset, > I strongly believe you need to have a common part in net/core/ that will > be shared by both virtio_net and netvsc. There there should be explicit > limits for this in-driver bonding solution, like max 1 vf slave, etc. Yeah, this code essentially calls out the "shareable" code with a comment at the start and end of the section what defines the virtio_bypass functionality. It would just be a matter of mostly cutting and pasting to put it into a separate driver module. The design limits things to a 1:1 relationship since we just have the child and backup pointers, but I don't think I am seeing exception handling to prevent us from overwriting the child pointers so there may be a leak there. Thanks. - Alex - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH v2 1/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
On 02/07/2018 09:04 AM, Michael S. Tsirkin wrote: On Tue, Feb 06, 2018 at 07:08:17PM +0800, Wei Wang wrote: The new feature enables the virtio-balloon device to receive the hint of guest free pages from the free page vq, and clears the corresponding bits of the free page from the dirty bitmap, so that those free pages are not transferred by the migration thread. Signed-off-by: Wei Wang Signed-off-by: Liang Li CC: Michael S. Tsirkin CC: Juan Quintela --- balloon.c | 39 +-- hw/virtio/virtio-balloon.c | 145 +--- include/hw/virtio/virtio-balloon.h | 11 +- include/migration/misc.h| 3 + include/standard-headers/linux/virtio_balloon.h | 7 ++ include/sysemu/balloon.h| 12 +- migration/ram.c | 10 ++ 7 files changed, 198 insertions(+), 29 deletions(-) diff --git a/balloon.c b/balloon.c index 1d720ff..0f0b30c 100644 --- a/balloon.c +++ b/balloon.c @@ -36,6 +36,8 @@ static QEMUBalloonEvent *balloon_event_fn; static QEMUBalloonStatus *balloon_stat_fn; +static QEMUBalloonFreePageSupport *balloon_free_page_support_fn; +static QEMUBalloonFreePagePoll *balloon_free_page_poll_fn; static void *balloon_opaque; static bool balloon_inhibited; @@ -64,19 +66,34 @@ static bool have_balloon(Error **errp) return true; } -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func, - QEMUBalloonStatus *stat_func, void *opaque) +bool balloon_free_page_support(void) { -if (balloon_event_fn || balloon_stat_fn || balloon_opaque) { -/* We're already registered one balloon handler. How many can - * a guest really have? - */ -return -1; +return balloon_free_page_support_fn && + balloon_free_page_support_fn(balloon_opaque); +} + +void balloon_free_page_poll(void) +{ +balloon_free_page_poll_fn(balloon_opaque); +} + +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn, + QEMUBalloonStatus *stat_fn, + QEMUBalloonFreePageSupport *free_page_support_fn, + QEMUBalloonFreePagePoll *free_page_poll_fn, + void *opaque) +{ +if (balloon_event_fn || balloon_stat_fn || balloon_free_page_support_fn || +balloon_free_page_poll_fn || balloon_opaque) { +/* We already registered one balloon handler. */ +return; } -balloon_event_fn = event_func; -balloon_stat_fn = stat_func; + +balloon_event_fn = event_fn; +balloon_stat_fn = stat_fn; +balloon_free_page_support_fn = free_page_support_fn; +balloon_free_page_poll_fn = free_page_poll_fn; balloon_opaque = opaque; -return 0; } void qemu_remove_balloon_handler(void *opaque) @@ -86,6 +103,8 @@ void qemu_remove_balloon_handler(void *opaque) } balloon_event_fn = NULL; balloon_stat_fn = NULL; +balloon_free_page_support_fn = NULL; +balloon_free_page_poll_fn = NULL; balloon_opaque = NULL; } diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index 14e08d2..b424d4e 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -23,6 +23,7 @@ #include "hw/virtio/virtio-balloon.h" #include "sysemu/kvm.h" #include "exec/address-spaces.h" +#include "exec/ram_addr.h" #include "qapi/visitor.h" #include "qapi-event.h" #include "trace.h" @@ -30,6 +31,7 @@ #include "hw/virtio/virtio-bus.h" #include "hw/virtio/virtio-access.h" +#include "migration/misc.h" #define BALLOON_PAGE_SIZE (1 << VIRTIO_BALLOON_PFN_SHIFT) @@ -305,6 +307,87 @@ out: } } +static void virtio_balloon_poll_free_page_hints(VirtIOBalloon *dev) +{ +VirtQueueElement *elem; +VirtQueue *vq = dev->free_page_vq; +VirtIODevice *vdev = VIRTIO_DEVICE(dev); +bool page_poisoning = virtio_vdev_has_feature(vdev, + VIRTIO_BALLOON_F_PAGE_POISON); +uint32_t id; + +/* Poll the vq till a stop cmd id is received */ +while (dev->free_page_report_status != FREE_PAGE_REPORT_S_STOP) { +elem = virtqueue_pop(vq, sizeof(VirtQueueElement)); +if (!elem) { +continue; +} + +if (elem->out_num) { +iov_to_buf(elem->out_sg, elem->out_num, 0, &id, sizeof(uint32_t)); +virtqueue_push(vq, elem, sizeof(id)); +g_free(elem); +if (id == dev->free_page_report_cmd_id) { +dev->free_page_report_status = FREE_PAGE_REPORT_S_START; +} else { +dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP; So if migration on source times out, then you migrate to another destination, this will cancel the in-progress reporting due to while loop above. Probably not what was intended. +break;
[virtio-dev] [PATCH v3 2/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
The new feature enables the virtio-balloon device to receive hints of guest free pages from the free page vq. Callers call the free_page_start API to start the reporting, which creates a thread to poll for free page hints. The free_page_stop API stops the reporting and makes the thread exit. Signed-off-by: Wei Wang Signed-off-by: Liang Li CC: Michael S. Tsirkin CC: Dr. David Alan Gilbert CC: Juan Quintela --- balloon.c | 49 +-- hw/virtio/virtio-balloon.c | 172 +--- include/hw/virtio/virtio-balloon.h | 14 +- include/standard-headers/linux/virtio_balloon.h | 7 + include/sysemu/balloon.h| 15 ++- 5 files changed, 228 insertions(+), 29 deletions(-) diff --git a/balloon.c b/balloon.c index d8dd6fe..b0b0749 100644 --- a/balloon.c +++ b/balloon.c @@ -36,6 +36,9 @@ static QEMUBalloonEvent *balloon_event_fn; static QEMUBalloonStatus *balloon_stat_fn; +static QEMUBalloonFreePageSupport *balloon_free_page_support_fn; +static QEMUBalloonFreePageStart *balloon_free_page_start_fn; +static QEMUBalloonFreePageStop *balloon_free_page_stop_fn; static void *balloon_opaque; static bool balloon_inhibited; @@ -64,19 +67,42 @@ static bool have_balloon(Error **errp) return true; } -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func, - QEMUBalloonStatus *stat_func, void *opaque) +bool balloon_free_page_support(void) { -if (balloon_event_fn || balloon_stat_fn || balloon_opaque) { -/* We're already registered one balloon handler. How many can - * a guest really have? - */ -return -1; +return balloon_free_page_support_fn && + balloon_free_page_support_fn(balloon_opaque); +} + +void balloon_free_page_start(void) +{ +balloon_free_page_start_fn(balloon_opaque); +} + +void balloon_free_page_stop(void) +{ +balloon_free_page_stop_fn(balloon_opaque); +} + +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn, + QEMUBalloonStatus *stat_fn, + QEMUBalloonFreePageSupport *free_page_support_fn, + QEMUBalloonFreePageStart *free_page_start_fn, + QEMUBalloonFreePageStop *free_page_stop_fn, + void *opaque) +{ +if (balloon_event_fn || balloon_stat_fn || balloon_free_page_support_fn || +balloon_free_page_start_fn || balloon_free_page_stop_fn || +balloon_opaque) { +/* We already registered one balloon handler. */ +return; } -balloon_event_fn = event_func; -balloon_stat_fn = stat_func; + +balloon_event_fn = event_fn; +balloon_stat_fn = stat_fn; +balloon_free_page_support_fn = free_page_support_fn; +balloon_free_page_start_fn = free_page_start_fn; +balloon_free_page_stop_fn = free_page_stop_fn; balloon_opaque = opaque; -return 0; } void qemu_remove_balloon_handler(void *opaque) @@ -86,6 +112,9 @@ void qemu_remove_balloon_handler(void *opaque) } balloon_event_fn = NULL; balloon_stat_fn = NULL; +balloon_free_page_support_fn = NULL; +balloon_free_page_start_fn = NULL; +balloon_free_page_stop_fn = NULL; balloon_opaque = NULL; } diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index 4822449..4607879 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -31,6 +31,7 @@ #include "hw/virtio/virtio-bus.h" #include "hw/virtio/virtio-access.h" +#include "migration/misc.h" #define BALLOON_PAGE_SIZE (1 << VIRTIO_BALLOON_PFN_SHIFT) @@ -308,6 +309,100 @@ out: } } +static void *virtio_balloon_poll_free_page_hints(void *opaque) +{ +VirtQueueElement *elem; +VirtIOBalloon *dev = opaque; +VirtQueue *vq = dev->free_page_vq; +uint32_t id; +size_t size; + +/* + * Poll the vq till the status changed to STOP. This happens when + * the guest finishes reporting hints or the migration thread actively + * stops the reporting. + */ +while (dev->free_page_report_status != FREE_PAGE_REPORT_S_STOP) { +elem = virtqueue_pop(vq, sizeof(VirtQueueElement)); +if (!elem) { +continue; +} + +if (elem->out_num) { +size = iov_to_buf(elem->out_sg, elem->out_num, 0, &id, sizeof(id)); +virtqueue_push(vq, elem, size); +g_free(elem); +if (unlikely(size != sizeof(id))) { +warn_report("%s: received an incorrect cmd id", __func__); +break; +} +if (id == dev->free_page_report_cmd_id) { +dev->free_page_report_status = FREE_PAGE_REPORT_S_START; +} else if (dev->free_page_report_status == + FREE_PAGE_REPORT_S_START) { +/* + * Stop the optimization only when it has started. This
[virtio-dev] [PATCH v3 1/3] migration: API to clear bits of guest free pages from the dirty bitmap
This patch adds an API to clear bits corresponding to guest free pages from the dirty bitmap. Spilt the free page block if it crosses the QEMU RAMBlock boundary. Signed-off-by: Wei Wang CC: Dr. David Alan Gilbert CC: Juan Quintela CC: Michael S. Tsirkin --- include/migration/misc.h | 2 ++ migration/ram.c | 20 2 files changed, 22 insertions(+) diff --git a/include/migration/misc.h b/include/migration/misc.h index 77fd4f5..fae1acf 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -14,11 +14,13 @@ #ifndef MIGRATION_MISC_H #define MIGRATION_MISC_H +#include "exec/cpu-common.h" #include "qemu/notify.h" /* migration/ram.c */ void ram_mig_init(void); +void qemu_guest_free_page_hint(void *addr, size_t len); /* migration/block.c */ diff --git a/migration/ram.c b/migration/ram.c index 5e33e5c..769a0f6 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2189,6 +2189,26 @@ static int ram_init_all(RAMState **rsp) return 0; } +void qemu_guest_free_page_hint(void *addr, size_t len) +{ +RAMBlock *block; +ram_addr_t offset; +size_t used_len, start, npages; + +for (used_len = len; len > 0; len -= used_len) { +block = qemu_ram_block_from_host(addr, false, &offset); +if (unlikely(offset + len > block->used_length)) { +used_len = block->used_length - offset; +addr += used_len; +} + +start = offset >> TARGET_PAGE_BITS; +npages = used_len >> TARGET_PAGE_BITS; +bitmap_clear(block->bmap, start, npages); +ram_state->migration_dirty_pages -= npages; +} +} + /* * Each of ram_save_setup, ram_save_iterate and ram_save_complete has * long-running RCU critical section. When rcu-reclaims in the code -- 1.8.3.1 - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] [PATCH v3 0/3] virtio-balloon: free page hint reporting support
This is the deivce part implementation to add a new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device receives the guest free page hints from the driver and clears the corresponding bits in the dirty bitmap, so that those free pages are not transferred by the migration thread to the destination. - Test Environment Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz Guest: 8G RAM, 4 vCPU Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second - Test Results - Idle Guest Live Migration Time (results are averaged over 10 runs): - Optimization v.s. Legacy = 259ms vs 1769ms --> ~86% reduction - Guest with Linux Compilation Workload (make bzImage -j4): - Live Migration Time (average) Optimization v.s. Legacy = 1290ms v.s. 2634ms --> ~51% reduction - Linux Compilation Time Optimization v.s. Legacy = 5min v.s. 5min3s --> no obvious difference - Source Code - QEMU: https://github.com/wei-w-wang/qemu-free-page-lm.git - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git ChangeLog: v2->v3: 1) virtio-balloon - virtio_balloon_free_page_start: poll the hints using a new thread; - use cmd id between [0x8000, UINT_MAX]; - virtio_balloon_poll_free_page_hints: - stop the optimization only when it has started; - don't skip free pages when !poison_val; - add poison_val to vmsd to migrate; - virtio_balloon_get_features: add the F_PAGE_POISON feature when host has F_FREE_PAGE_HINT; - remove the timer patch which is not needed now. 2) migration - new api, qemu_guest_free_page_hint; - rs->free_page_support set only in the precopy case; - use the new balloon APIs. v1->v2: 1) virtio-balloon - use subsections to save free_page_report_cmd_id; - poll the free page vq after sending a cmd id to the driver; - change the free page vq size to VIRTQUEUE_MAX_SIZE; - virtio_balloon_poll_free_page_hints: handle the corner case that the free page block reported from the driver may cross the RAMBlock boundary. 2) migration/ram.c - use balloon_free_page_poll to start the optimization Wei Wang (3): migration: API to clear bits of guest free pages from the dirty bitmap virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT migration: use the free page hint feature from balloon balloon.c | 49 +-- hw/virtio/virtio-balloon.c | 172 +--- include/hw/virtio/virtio-balloon.h | 14 +- include/migration/misc.h| 2 + include/standard-headers/linux/virtio_balloon.h | 7 + include/sysemu/balloon.h| 15 ++- migration/ram.c | 39 +- 7 files changed, 268 insertions(+), 30 deletions(-) -- 1.8.3.1 - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] [PATCH v3 3/3] migration: use the free page hint feature from balloon
Start the free page optimization when the bulk stage starts. In case the guest is slow in reporting, actively stops it when the bulk stage ends. The optimization avoids sending guest free pages during the bulk stage. Currently, the optimization is added to precopy only. Signed-off-by: Wei Wang CC: Dr. David Alan Gilbert CC: Juan Quintela CC: Michael S. Tsirkin --- migration/ram.c | 19 ++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/migration/ram.c b/migration/ram.c index 769a0f6..f6af7e6 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -51,6 +51,7 @@ #include "qemu/rcu_queue.h" #include "migration/colo.h" #include "migration/block.h" +#include "sysemu/balloon.h" /***/ /* ram save/restore */ @@ -208,6 +209,8 @@ struct RAMState { uint32_t last_version; /* We are in the first round */ bool ram_bulk_stage; +/* The free pages optimization feature is supported */ +bool free_page_support; /* How many times we have dirty too many pages */ int dirty_rate_high_cnt; /* these variables are used for bitmap sync */ @@ -775,7 +778,7 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb, unsigned long *bitmap = rb->bmap; unsigned long next; -if (rs->ram_bulk_stage && start > 0) { +if (rs->ram_bulk_stage && start > 0 && !rs->free_page_support) { next = start + 1; } else { next = find_next_bit(bitmap, size, start); @@ -1225,6 +1228,10 @@ static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again) /* Flag that we've looped */ pss->complete_round = true; rs->ram_bulk_stage = false; +if (rs->free_page_support) { +balloon_free_page_stop(); +rs->free_page_support = false; +} if (migrate_use_xbzrle()) { /* If xbzrle is on, stop using the data compression at this * point. In theory, xbzrle can do better than compression. @@ -1656,6 +1663,8 @@ static void ram_state_reset(RAMState *rs) rs->last_page = 0; rs->last_version = ram_list.version; rs->ram_bulk_stage = true; +rs->free_page_support = balloon_free_page_support() & +!migration_in_postcopy(); } #define MAX_WAIT 50 /* ms, half buffered_file limit */ @@ -2235,6 +2244,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque) return -1; } } + +if ((*rsp)->free_page_support) { +balloon_free_page_start(); +} + (*rsp)->f = f; rcu_read_lock(); @@ -2329,6 +2343,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) ret = qemu_file_get_error(f); if (ret < 0) { +if (rs->ram_bulk_stage && rs->free_page_support) { +balloon_free_page_stop(); +} return ret; } -- 1.8.3.1 - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org