Re: [PATCH V2 19/19] vdpa: introduce virtio pci driver
On 12/4/20 7:20 AM, Stefano Garzarella wrote: > On Fri, Dec 04, 2020 at 12:03:53PM +0800, Jason Wang wrote: >> This patch introduce a vDPA driver for virtio-pci device. It bridges >> the virtio-pci control command to the vDPA bus. This will be used for >> features prototyping and testing. >> >> Note that get/restore virtqueue state is not supported which needs >> extension on the virtio specification. >> >> Signed-off-by: Jason Wang >> --- >> drivers/vdpa/Kconfig | 6 + >> drivers/vdpa/Makefile | 1 + >> drivers/vdpa/virtio_pci/Makefile | 2 + >> drivers/vdpa/virtio_pci/vp_vdpa.c | 455 ++ >> 4 files changed, 464 insertions(+) >> create mode 100644 drivers/vdpa/virtio_pci/Makefile >> create mode 100644 drivers/vdpa/virtio_pci/vp_vdpa.c >> >> diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig >> index 358f6048dd3c..18cad14f533e 100644 >> --- a/drivers/vdpa/Kconfig >> +++ b/drivers/vdpa/Kconfig >> @@ -48,4 +48,10 @@ config MLX5_VDPA_NET >> be executed by the hardware. It also supports a variety of stateless >> offloads depending on the actual device used and firmware version. >> >> +config VP_VDPA >> + tristate "Virtio PCI bridge vDPA driver" >> + depends on PCI_MSI && VIRTIO_PCI_MODERN >> + help >> + This kernel module that bridges virtio PCI device to vDPA bus. > ^ > Without 'that' maybe sound better, but I'm not a native speaker :-) Yes, drop 'that', please. >> + >> endif # VDPA -- ~Randy ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V2 16/19] virtio-pci: introduce modern device module
Hi Jason-- On 12/3/20 8:03 PM, Jason Wang wrote: > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig > index 7b41130d3f35..d1a6bd2a975f 100644 > --- a/drivers/virtio/Kconfig > +++ b/drivers/virtio/Kconfig > @@ -12,6 +12,14 @@ config ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS > This option is selected if the architecture may need to enforce > VIRTIO_F_ACCESS_PLATFORM > > +config VIRTIO_PCI_MODERN > + tristate "Modern Virtio PCI Device" > + depends on PCI > + help > + Modern PCI device implementation. This module implement the implements > + basic probe and control for devices which is based on modern are > + PCI device with possible vendor specific extensions. devices > + cheers. -- ~Randy ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH AUTOSEL 5.9 22/33] vhost scsi: add lun parser helper
On 04/12/20 16:49, Sasha Levin wrote: On Fri, Dec 04, 2020 at 09:27:28AM +0100, Paolo Bonzini wrote: On 01/12/20 00:59, Sasha Levin wrote: It's quite easy to NAK a patch too, just reply saying "no" and it'll be dropped (just like this patch was dropped right after your first reply) so the burden on maintainers is minimal. The maintainers are _already_ marking patches with "Cc: stable". That They're not, though. Some forget, some subsystems don't mark anything, some don't mark it as it's not stable material when it lands in their tree but then it turns out to be one if it sits there for too long. That means some subsystems will be worse as far as stable release support goes. That's not a problem: - some subsystems have people paid to do backports to LTS releases when patches don't apply; others don't, if the patch doesn't apply the bug is simply not fixed in LTS releases - some subsystems are worse than others even in "normal" releases :) (plus backports) is where the burden on maintainers should start and end. I don't see the need to second guess them. This is similar to describing our CI infrastructure as "second guessing": why are we second guessing authors and maintainers who are obviously doing the right thing by testing their patches and reporting issues to them? No, it's not the same. CI helps finding bugs before you have to waste time spending bisecting regressions across thousands of commits. The lack of stable tags _can_ certainly be a problem, but it solves itself sooner or later when people upgrade their kernel. Are you saying that you have always gotten stable tags right? never missed a stable tag where one should go? Of course I did, just like I have introduced bugs. But at least I try to do my best both at adding stable tags and at not introducing bugs. Paolo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH AUTOSEL 5.9 22/33] vhost scsi: add lun parser helper
On Fri, 2020-12-04 at 10:49 -0500, Sasha Levin wrote: > On Fri, Dec 04, 2020 at 09:27:28AM +0100, Paolo Bonzini wrote: > > On 01/12/20 00:59, Sasha Levin wrote: > > > > > > It's quite easy to NAK a patch too, just reply saying "no" and it'll be > > > dropped (just like this patch was dropped right after your first reply) > > > so the burden on maintainers is minimal. > > > > The maintainers are _already_ marking patches with "Cc: stable". That > > They're not, though. Some forget, some subsystems don't mark anything, > some don't mark it as it's not stable material when it lands in their > tree but then it turns out to be one if it sits there for too long. > > > (plus backports) is where the burden on maintainers should start and > > end. I don't see the need to second guess them. > > This is similar to describing our CI infrastructure as "second > guessing": why are we second guessing authors and maintainers who are > obviously doing the right thing by testing their patches and reporting > issues to them? > > Are you saying that you have always gotten stable tags right? never > missed a stable tag where one should go? I think this simply adds to the burden of being a maintainer without all that much value. I think the primary value here would be getting people to upgrade to current versions rather than backporting to nominally stable and relatively actively changed old versions. This is very much related to this thread about trivial patches and maintainer burdening: https://lore.kernel.org/lkml/1c7d7fde126bc0acf825766de64bf2f9b888f216.ca...@hansenpartnership.com/ ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 0/8] vhost: allow userspace to control vq cpu affinity
Hi Mike, On Fri, Dec 04, 2020 at 01:56:25AM -0600, Mike Christie wrote: These patches were made over mst's vhost branch. The following patches, made over mst's vhost branch, allow userspace to set each vq's cpu affinity. Currently, with cgroups the worker thread inherits the affinity settings, but we are at the mercy of the CPU scheduler for where the vq's IO will be executed on. This can result in the scheduler sometimes hammering a couple queues on the host instead of spreading it out like how the guest's app might have intended if it was mq aware. This version of the patches is not what you guys were talking about initially like with the interface that was similar to nbd's old (3.x kernel days) NBD_DO_IT ioctl where userspace calls down to the kernel and we run from that context. These patches instead just allow userspace to tell the kernel which CPU a vq should run on. We then use the kernel's workqueue code to handle the thread management. I agree that reusing kernel's workqueue code would be a good strategy. One concern is how easy it is to implement an adaptive polling strategy using workqueues. From what I've seen, adding some polling of both backend and virtqueue helps to eliminate interrupts and reduce latency. Anyway, I'll take a closer look at your patches next week. :-) Thanks, Stefano I wanted to post this version first, because it is flexible in that userspace can set things up so devs/vqs share threads/CPUs and we don't have to worry about replicating a bunch of features that the workqueue code already has like dynamic thread creation, blocked work detection, idle thread detection and thread reaping, and it also has an interface to control how many threads can be created and which CPUs work can run on if we want to further restrict that from userspace. Note that these patches have been lightly tested. I more wanted to get comments on the overall approach, because I know it's not really what you were thinking about. But while I worked on being able to share threads with multiple devices, I kept coming back to the existing workqueue code and thinking I'll just copy and paste that. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH AUTOSEL 5.9 22/33] vhost scsi: add lun parser helper
On Fri, Dec 04, 2020 at 09:27:28AM +0100, Paolo Bonzini wrote: On 01/12/20 00:59, Sasha Levin wrote: It's quite easy to NAK a patch too, just reply saying "no" and it'll be dropped (just like this patch was dropped right after your first reply) so the burden on maintainers is minimal. The maintainers are _already_ marking patches with "Cc: stable". That They're not, though. Some forget, some subsystems don't mark anything, some don't mark it as it's not stable material when it lands in their tree but then it turns out to be one if it sits there for too long. (plus backports) is where the burden on maintainers should start and end. I don't see the need to second guess them. This is similar to describing our CI infrastructure as "second guessing": why are we second guessing authors and maintainers who are obviously doing the right thing by testing their patches and reporting issues to them? Are you saying that you have always gotten stable tags right? never missed a stable tag where one should go? -- Thanks, Sasha ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V2 19/19] vdpa: introduce virtio pci driver
On Fri, Dec 04, 2020 at 12:03:53PM +0800, Jason Wang wrote: This patch introduce a vDPA driver for virtio-pci device. It bridges the virtio-pci control command to the vDPA bus. This will be used for features prototyping and testing. Note that get/restore virtqueue state is not supported which needs extension on the virtio specification. Signed-off-by: Jason Wang --- drivers/vdpa/Kconfig | 6 + drivers/vdpa/Makefile | 1 + drivers/vdpa/virtio_pci/Makefile | 2 + drivers/vdpa/virtio_pci/vp_vdpa.c | 455 ++ 4 files changed, 464 insertions(+) create mode 100644 drivers/vdpa/virtio_pci/Makefile create mode 100644 drivers/vdpa/virtio_pci/vp_vdpa.c diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig index 358f6048dd3c..18cad14f533e 100644 --- a/drivers/vdpa/Kconfig +++ b/drivers/vdpa/Kconfig @@ -48,4 +48,10 @@ config MLX5_VDPA_NET be executed by the hardware. It also supports a variety of stateless offloads depending on the actual device used and firmware version. +config VP_VDPA + tristate "Virtio PCI bridge vDPA driver" + depends on PCI_MSI && VIRTIO_PCI_MODERN + help + This kernel module that bridges virtio PCI device to vDPA bus. ^ Without 'that' maybe sound better, but I'm not a native speaker :-) + endif # VDPA diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile index d160e9b63a66..67fe7f3d6943 100644 --- a/drivers/vdpa/Makefile +++ b/drivers/vdpa/Makefile @@ -3,3 +3,4 @@ obj-$(CONFIG_VDPA) += vdpa.o obj-$(CONFIG_VDPA_SIM) += vdpa_sim/ obj-$(CONFIG_IFCVF)+= ifcvf/ obj-$(CONFIG_MLX5_VDPA) += mlx5/ +obj-$(CONFIG_VP_VDPA)+= virtio_pci/ diff --git a/drivers/vdpa/virtio_pci/Makefile b/drivers/vdpa/virtio_pci/Makefile new file mode 100644 index ..231088d3af7d --- /dev/null +++ b/drivers/vdpa/virtio_pci/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_VP_VDPA) += vp_vdpa.o diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c b/drivers/vdpa/virtio_pci/vp_vdpa.c new file mode 100644 index ..fa0321e77fb3 --- /dev/null +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c @@ -0,0 +1,455 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * vDPA bridge driver for modern virtio-pci device + * + * Copyright (c) 2020, Red Hat Inc. All rights reserved. + * Author: Jason Wang + * + * Based on virtio_pci_modern.c. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define VP_VDPA_QUEUE_MAX 256 +#define VP_VDPA_DRIVER_NAME "vp_vdpa" + +struct vp_vring { + void __iomem *notify; + char msix_name[256]; Can we use a macro for the msix_name size, since we use 256 in multiple places? + struct vdpa_callback cb; + int irq; +}; + +struct vp_vdpa { + struct vdpa_device vdpa; + struct virtio_pci_modern_device mdev; + struct vp_vring *vring; + struct vdpa_callback cb; ^ It is not relevant, but 'config_cb' is maybe clearer to read. The rest looks good. Thanks, Stefano + char msix_name[256]; + int config_irq; + int queues; + int vectors; +}; + +static struct vp_vdpa *vdpa_to_vp(struct vdpa_device *vdpa) +{ + return container_of(vdpa, struct vp_vdpa, vdpa); +} + +static struct virtio_pci_modern_device *vdpa_to_mdev(struct vdpa_device *vdpa) +{ + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); + + return _vdpa->mdev; +} + +static u64 vp_vdpa_get_features(struct vdpa_device *vdpa) +{ + struct virtio_pci_modern_device *mdev = vdpa_to_mdev(vdpa); + + return vp_modern_get_features(mdev); +} + +static int vp_vdpa_set_features(struct vdpa_device *vdpa, u64 features) +{ + struct virtio_pci_modern_device *mdev = vdpa_to_mdev(vdpa); + + vp_modern_set_features(mdev, features); + + return 0; +} + +static u8 vp_vdpa_get_status(struct vdpa_device *vdpa) +{ + struct virtio_pci_modern_device *mdev = vdpa_to_mdev(vdpa); + + return vp_modern_get_status(mdev); +} + +static void vp_vdpa_free_irq(struct vp_vdpa *vp_vdpa) +{ + struct virtio_pci_modern_device *mdev = _vdpa->mdev; + struct pci_dev *pdev = mdev->pci_dev; + int i; + + for (i = 0; i < vp_vdpa->queues; i++) { + if (vp_vdpa->vring[i].irq != VIRTIO_MSI_NO_VECTOR) { + vp_modern_queue_vector(mdev, i, VIRTIO_MSI_NO_VECTOR); + devm_free_irq(>dev, vp_vdpa->vring[i].irq, + _vdpa->vring[i]); + vp_vdpa->vring[i].irq = VIRTIO_MSI_NO_VECTOR; + } + } + + if (vp_vdpa->config_irq != VIRTIO_MSI_NO_VECTOR) { + vp_modern_config_vector(mdev, VIRTIO_MSI_NO_VECTOR); + devm_free_irq(>dev, vp_vdpa->config_irq, vp_vdpa); + vp_vdpa->config_irq = VIRTIO_MSI_NO_VECTOR; + } + + if (vp_vdpa->vectors) {
Re: [PATCH V2 17/19] vdpa: set the virtqueue num during register
On Fri, Dec 04, 2020 at 12:03:51PM +0800, Jason Wang wrote: This patch delay the queue number setting to vDPA device registering. This allows us to probe the virtqueue numbers between device allocation and registering. Signed-off-by: Jason Wang --- drivers/vdpa/ifcvf/ifcvf_main.c | 5 ++--- drivers/vdpa/mlx5/net/mlx5_vnet.c | 5 ++--- drivers/vdpa/vdpa.c | 8 drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 ++-- include/linux/vdpa.h | 7 +++ 5 files changed, 13 insertions(+), 16 deletions(-) Reviewed-by: Stefano Garzarella diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c index 8b4028556cb6..d65f3221d8ed 100644 --- a/drivers/vdpa/ifcvf/ifcvf_main.c +++ b/drivers/vdpa/ifcvf/ifcvf_main.c @@ -438,8 +438,7 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id) } adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa, - dev, _vdpa_ops, - IFCVF_MAX_QUEUE_PAIRS * 2); + dev, _vdpa_ops); if (adapter == NULL) { IFCVF_ERR(pdev, "Failed to allocate vDPA structure"); return -ENOMEM; @@ -463,7 +462,7 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id) for (i = 0; i < IFCVF_MAX_QUEUE_PAIRS * 2; i++) vf->vring[i].irq = -EINVAL; - ret = vdpa_register_device(>vdpa); + ret = vdpa_register_device(>vdpa, IFCVF_MAX_QUEUE_PAIRS * 2); if (ret) { IFCVF_ERR(pdev, "Failed to register ifcvf to vdpa bus"); goto err; diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 1fa6fcac8299..3c3eb2a02f76 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1940,8 +1940,7 @@ void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev) max_vqs = MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues); max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS); - ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, mdev->device, _vdpa_ops, -2 * mlx5_vdpa_max_qps(max_vqs)); + ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, mdev->device, _vdpa_ops); if (IS_ERR(ndev)) return ndev; @@ -1968,7 +1967,7 @@ void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev) if (err) goto err_res; - err = vdpa_register_device(>vdev); + err = vdpa_register_device(>vdev, 2 * mlx5_vdpa_max_qps(max_vqs)); if (err) goto err_reg; diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index a69ffc991e13..ba89238f9898 100644 --- a/drivers/vdpa/vdpa.c +++ b/drivers/vdpa/vdpa.c @@ -61,7 +61,6 @@ static void vdpa_release_dev(struct device *d) * initialized but before registered. * @parent: the parent device * @config: the bus operations that is supported by this device - * @nvqs: number of virtqueues supported by this device * @size: size of the parent structure that contains private data * * Driver should use vdpa_alloc_device() wrapper macro instead of @@ -72,7 +71,6 @@ static void vdpa_release_dev(struct device *d) */ struct vdpa_device *__vdpa_alloc_device(struct device *parent, const struct vdpa_config_ops *config, - int nvqs, size_t size) { struct vdpa_device *vdev; @@ -99,7 +97,6 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent, vdev->index = err; vdev->config = config; vdev->features_valid = false; - vdev->nvqs = nvqs; err = dev_set_name(>dev, "vdpa%u", vdev->index); if (err) @@ -122,11 +119,14 @@ EXPORT_SYMBOL_GPL(__vdpa_alloc_device); * vdpa_register_device - register a vDPA device * Callers must have a succeed call of vdpa_alloc_device() before. * @vdev: the vdpa device to be registered to vDPA bus + * @nvqs: number of virtqueues supported by this device * * Returns an error when fail to add to vDPA bus */ -int vdpa_register_device(struct vdpa_device *vdev) +int vdpa_register_device(struct vdpa_device *vdev, int nvqs) { + vdev->nvqs = nvqs; + return device_add(>dev); } EXPORT_SYMBOL_GPL(vdpa_register_device); diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c index 6a90fdb9cbfc..b129cb4dd013 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c @@ -357,7 +357,7 @@ static struct vdpasim *vdpasim_create(void) else ops = _net_config_ops; - vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops, VDPASIM_VQ_NUM); + vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops); if (!vdpasim) goto err_alloc; @@ -393,7 +393,7 @@ static struct vdpasim
Re: [PATCH AUTOSEL 5.9 22/33] vhost scsi: add lun parser helper
On 01/12/20 00:59, Sasha Levin wrote: It's quite easy to NAK a patch too, just reply saying "no" and it'll be dropped (just like this patch was dropped right after your first reply) so the burden on maintainers is minimal. The maintainers are _already_ marking patches with "Cc: stable". That (plus backports) is where the burden on maintainers should start and end. I don't see the need to second guess them. Paolo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 5/8] vhost: allow userspace to bind vqs to CPUs
On 2020/12/4 下午3:56, Mike Christie wrote: +static long vhost_vring_set_cpu(struct vhost_dev *d, struct vhost_virtqueue *vq, + void __user *argp) +{ + struct vhost_vring_state s; + int ret = 0; + + if (vq->private_data) + return -EBUSY; + + if (copy_from_user(, argp, sizeof s)) + return -EFAULT; + + if (s.num == -1) { + vq->cpu = s.num; + return 0; + } + + if (s.num >= nr_cpu_ids) + return -EINVAL; + + if (!d->ops || !d->ops->get_workqueue) + return -EINVAL; + + if (!d->wq) + d->wq = d->ops->get_workqueue(); + if (!d->wq) + return -EINVAL; + + vq->cpu = s.num; + return ret; +} So one question here. Who is in charge of doing this set_cpu? Note that sched_setaffinity(2) requires CAP_SYS_NICE to work, so I wonder whether or not it's legal for unprivileged Qemu to do this. Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization