RE: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
From: Wei Liu Sent: Wednesday, January 20, 2021 4:01 AM
>
> Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> Hypervisor when Linux runs as the root partition. Implement an IRQ
> domain to handle mapping and unmapping of IO-APIC interrupts.
>
> Signed-off-by: Wei Liu
> ---
>  arch/x86/hyperv/irqdomain.c     |  54 ++
>  arch/x86/include/asm/mshyperv.h |   4 +
>  drivers/iommu/hyperv-iommu.c    | 179 +++-
>  3 files changed, 233 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> index 19637cd60231..8e2b4e478b70 100644
> --- a/arch/x86/hyperv/irqdomain.c
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -330,3 +330,57 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
>  }
>
>  #endif /* CONFIG_PCI_MSI */
> +
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry)
> +{
> +	union hv_device_id device_id;
> +
> +	device_id.as_uint64 = 0;
> +	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> +	device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> +	return hv_unmap_interrupt(device_id.as_uint64, entry) & HV_HYPERCALL_RESULT_MASK;

The masking is already done in hv_unmap_interrupt.

> +}
> +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
> +
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> +		struct hv_interrupt_entry *entry)
> +{
> +	unsigned long flags;
> +	struct hv_input_map_device_interrupt *input;
> +	struct hv_output_map_device_interrupt *output;
> +	union hv_device_id device_id;
> +	struct hv_device_interrupt_descriptor *intr_desc;
> +	u16 status;
> +
> +	device_id.as_uint64 = 0;
> +	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> +	device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> +	local_irq_save(flags);
> +	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> +	memset(input, 0, sizeof(*input));
> +	intr_desc = &input->interrupt_descriptor;
> +	input->partition_id = hv_current_partition_id;
> +	input->device_id = device_id.as_uint64;
> +	intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> +	intr_desc->target.vector = vector;
> +	intr_desc->vector_count = 1;
> +
> +	if (level)
> +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> +	else
> +		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> +
> +	__set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> +
> +	status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, output) &
> +			 HV_HYPERCALL_RESULT_MASK;
> +	local_irq_restore(flags);
> +
> +	*entry = output->interrupt_entry;
> +
> +	return status;

As a cross-check, I was comparing this code against hv_map_msi_interrupt().
They are mostly parallel, though some of the assignments are done in a
different order. It's a nit, but making them as parallel as possible would
be nice. :-)

Same 64 vCPU comment applies here as well.

> +}
> +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ccc849e25d5e..345d7c6f8c37 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
>
>  struct irq_domain *hv_create_pci_msi_domain(void);
>
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> +		struct hv_interrupt_entry *entry);
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> +
>  #else /* CONFIG_HYPERV */
>  static inline void hyperv_init(void) {}
>  static inline void hyperv_setup_mmu_ops(void) {}
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index b7db6024e65c..6d35e4c303c6 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -116,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = {
>  	.free = hyperv_irq_remapping_free,
>  };
>
> +static const struct irq_domain_ops hyperv_root_ir_domain_ops;
>  static int __init hyperv_prepare_irq_remapping(void)
>  {
>  	struct fwnode_handle *fn;
>  	int i;
> +	const char *name;
> +	const struct irq_domain_ops *ops;
>
>  	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
>  	    x86_init.hyper.msi_ext_dest_id() ||
> -	    !x2apic_supported() || hv_root_partition)
> +	    !x2apic_supported())

Any reason that the check for hv_root_partition was added in patch #4 of
this series, and then removed here? Could patch #4 just be dropped?

>  		return -ENODEV;
>
> -	fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> +	if (hv_root_partition) {
> +		name = "HYPERV-ROOT-IR";
> +		ops =
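The "64 vCPU comment" referred to in the review is not quoted in this message. One plausible reading is that `__set_bit(vcpu, ...)` on a single 64-bit `vp_mask` silently assumes `vcpu < 64`. Below is a minimal userspace sketch of a guarded version under that assumption; `demo_set_vp_bit` and `DEMO_VP_MASK_BITS` are hypothetical names used only for illustration and are not part of the Hyper-V code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical: width of the one-word VP mask used in this sketch. */
#define DEMO_VP_MASK_BITS 64

/*
 * Set the bit for @vcpu in a one-word mask. Indexes that do not fit
 * in the word are rejected instead of writing outside the mask.
 * Returns 0 on success, -EINVAL if @vcpu is out of range.
 */
static int demo_set_vp_bit(uint64_t *vp_mask, int vcpu)
{
	if (vcpu < 0 || vcpu >= DEMO_VP_MASK_BITS)
		return -EINVAL;

	*vp_mask |= UINT64_C(1) << vcpu;
	return 0;
}
```

Whether the real fix should reject such vCPUs, or switch to a different target format, is a decision for the patch author; the sketch only shows the range check.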
RE: [PATCH v5 15/16] x86/hyperv: implement an MSI domain for root partition
From: Wei Liu Sent: Wednesday, January 20, 2021 4:01 AM
>
> When Linux runs as the root partition on Microsoft Hypervisor, its
> interrupts are remapped. Linux will need to explicitly map and unmap
> interrupts for hardware.
>
> Implement an MSI domain to issue the correct hypercalls. And initialize
> this irqdomain as the default MSI irq domain.
>
> Signed-off-by: Sunil Muthuswamy
> Co-Developed-by: Sunil Muthuswamy
> Signed-off-by: Wei Liu
> ---
> v4: Fix compilation issue when CONFIG_PCI_MSI is not set.
> v3: build irqdomain.o for 32bit as well.

I'm not clear on the intent for 32-bit builds. Given that hv_proc.c is built
only for 64-bit, I'm assuming running Linux in the root partition is only
functional for 64-bit builds. So is the goal simply that 32-bit builds will
compile correctly? Seems like maybe there should be a CONFIG option for
running Linux in the root partition, and that option would force 64-bit.

> v2: This patch is simplified due to upstream changes.
> ---
>  arch/x86/hyperv/Makefile        |   2 +-
>  arch/x86/hyperv/hv_init.c       |   9 +
>  arch/x86/hyperv/irqdomain.c     | 332
>  arch/x86/include/asm/mshyperv.h |   2 +
>  4 files changed, 344 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/hyperv/irqdomain.c
>
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 565358020921..48e2c51464e8 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-y			:= hv_init.o mmu.o nested.o
> +obj-y			:= hv_init.o mmu.o nested.o irqdomain.o
>  obj-$(CONFIG_X86_64)	+= hv_apic.o hv_proc.o
>
>  ifdef CONFIG_X86_64
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index ad8e77859b32..1cb2f7d1850a 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -484,6 +484,15 @@ void __init hyperv_init(void)
>
>  	BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull);
>
> +#ifdef CONFIG_PCI_MSI
> +	/*
> +	 * If we're running as root, we want to create our own PCI MSI domain.
> +	 * We can't set this in hv_pci_init because that would be too late.
> +	 */
> +	if (hv_root_partition)
> +		x86_init.irqs.create_pci_msi_domain = hv_create_pci_msi_domain;
> +#endif
> +
>  	return;
>
>  remove_cpuhp_state:
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> new file mode 100644
> index 000000000000..19637cd60231
> --- /dev/null
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -0,0 +1,332 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Irqdomain for Linux to run as the root partition on Microsoft Hypervisor.
> +//
> +// Authors:
> +//   Sunil Muthuswamy
> +//   Wei Liu

I think the // comment style should only be used for the SPDX line.

> +
> +#include
> +#include
> +#include
> +
> +static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *old_entry)
> +{
> +	unsigned long flags;
> +	struct hv_input_unmap_device_interrupt *input;
> +	struct hv_interrupt_entry *intr_entry;
> +	u16 status;
> +
> +	local_irq_save(flags);
> +	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> +	memset(input, 0, sizeof(*input));
> +	intr_entry = &input->interrupt_entry;
> +	input->partition_id = hv_current_partition_id;
> +	input->device_id = id;
> +	*intr_entry = *old_entry;
> +
> +	status = hv_do_rep_hypercall(HVCALL_UNMAP_DEVICE_INTERRUPT, 0, 0, input, NULL) &
> +			 HV_HYPERCALL_RESULT_MASK;
> +	local_irq_restore(flags);
> +
> +	return status;
> +}
> +
> +#ifdef CONFIG_PCI_MSI
> +struct rid_data {
> +	struct pci_dev *bridge;
> +	u32 rid;
> +};
> +
> +static int get_rid_cb(struct pci_dev *pdev, u16 alias, void *data)
> +{
> +	struct rid_data *rd = data;
> +	u8 bus = PCI_BUS_NUM(rd->rid);
> +
> +	if (pdev->bus->number != bus || PCI_BUS_NUM(alias) != bus) {
> +		rd->bridge = pdev;
> +		rd->rid = alias;
> +	}
> +
> +	return 0;
> +}
> +
> +static union hv_device_id hv_build_pci_dev_id(struct pci_dev *dev)
> +{
> +	union hv_device_id dev_id;
> +	struct rid_data data = {
> +		.bridge = NULL,
> +		.rid = PCI_DEVID(dev->bus->number, dev->devfn)
> +	};
> +
> +	pci_for_each_dma_alias(dev, get_rid_cb, &data);
> +
> +	dev_id.as_uint64 = 0;
> +	dev_id.device_type = HV_DEVICE_TYPE_PCI;
> +	dev_id.pci.segment = pci_domain_nr(dev->bus);
> +
> +	dev_id.pci.bdf.bus = PCI_BUS_NUM(data.rid);
> +	dev_id.pci.bdf.device = PCI_SLOT(data.rid);
> +	dev_id.pci.bdf.function = PCI_FUNC(data.rid);
> +	dev_id.pci.source_shadow = HV_SOURCE_SHADOW_NONE;
> +
> +	if (data.bridge) {
> +		int pos;
> +
> +		/*
> +		 * Microsoft Hypervisor requires a bus range when the bridge is
> +		 * running in PCI-X mode.
> +
Re: [RFC v3 06/11] vhost-vdpa: Add an opaque pointer for vhost IOTLB
On 2021/1/20 3:52 PM, Yongji Xie wrote:
> On Wed, Jan 20, 2021 at 2:24 PM Jason Wang wrote:
>> On 2021/1/19 12:59 PM, Xie Yongji wrote:
>>> Add an opaque pointer for vhost IOTLB to store the
>>> corresponding vma->vm_file and offset on the DMA mapping.
>>
>> Let's split the patch into two.
>>
>> 1) opaque pointer
>> 2) vma stuffs
>
> OK.
>
>>> It will be used in VDUSE case later.
>>>
>>> Suggested-by: Jason Wang
>>> Signed-off-by: Xie Yongji
>>> ---
>>>  drivers/vdpa/vdpa_sim/vdpa_sim.c | 11 ---
>>>  drivers/vhost/iotlb.c            |  5 ++-
>>>  drivers/vhost/vdpa.c             | 66 +++-
>>>  drivers/vhost/vhost.c            |  4 +--
>>>  include/linux/vdpa.h             |  3 +-
>>>  include/linux/vhost_iotlb.h      |  8 -
>>>  6 files changed, 79 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>>> index 03c796873a6b..1ffcef67954f 100644
>>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>>> @@ -279,7 +279,7 @@ static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page,
>>>  	 */
>>>  	spin_lock(&vdpasim->iommu_lock);
>>>  	ret = vhost_iotlb_add_range(iommu, pa, pa + size - 1,
>>> -				    pa, dir_to_perm(dir));
>>> +				    pa, dir_to_perm(dir), NULL);
>>
>> Maybe it's better to introduce
>>
>> vhost_iotlb_add_range_ctx()
>>
>> which can accept the opaque (context). And let vhost_iotlb_add_range()
>> just call that.
>
> If so, we need to export both vhost_iotlb_add_range() and
> vhost_iotlb_add_range_ctx(), which will be used in the VDUSE driver.
> Is it a bit redundant?

Probably not, we do something similar in virtio core:

void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
			    void **ctx)
{
	struct vring_virtqueue *vq = to_vvq(_vq);

	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
				 virtqueue_get_buf_ctx_split(_vq, len, ctx);
}
EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);

void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
{
	return virtqueue_get_buf_ctx(_vq, len, NULL);
}
EXPORT_SYMBOL_GPL(virtqueue_get_buf);

>>>  	spin_unlock(&vdpasim->iommu_lock);
>>>  	if (ret)
>>>  		return DMA_MAPPING_ERROR;
>>> @@ -317,7 +317,7 @@ static void *vdpasim_alloc_coherent(struct device *dev, size_t size,
>>>
>>>  		ret = vhost_iotlb_add_range(iommu, (u64)pa,
>>>  					    (u64)pa + size - 1,
>>> -					    pa, VHOST_MAP_RW);
>>> +					    pa, VHOST_MAP_RW, NULL);
>>>  		if (ret) {
>>>  			*dma_addr = DMA_MAPPING_ERROR;
>>>  			kfree(addr);
>>> @@ -625,7 +625,8 @@ static int vdpasim_set_map(struct vdpa_device *vdpa,
>>>  	for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
>>>  	     map = vhost_iotlb_itree_next(map, start, last)) {
>>>  		ret = vhost_iotlb_add_range(vdpasim->iommu, map->start,
>>> -					    map->last, map->addr, map->perm);
>>> +					    map->last, map->addr,
>>> +					    map->perm, NULL);
>>>  		if (ret)
>>>  			goto err;
>>>  	}
>>> @@ -639,14 +640,14 @@ static int vdpasim_set_map(struct vdpa_device *vdpa,
>>>  }
>>>
>>>  static int vdpasim_dma_map(struct vdpa_device *vdpa, u64 iova, u64 size,
>>> -			   u64 pa, u32 perm)
>>> +			   u64 pa, u32 perm, void *opaque)
>>>  {
>>>  	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>>>  	int ret;
>>>
>>>  	spin_lock(&vdpasim->iommu_lock);
>>>  	ret = vhost_iotlb_add_range(vdpasim->iommu, iova, iova + size - 1, pa,
>>> -				    perm);
>>> +				    perm, NULL);
>>>  	spin_unlock(&vdpasim->iommu_lock);
>>>
>>>  	return ret;
>>> diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
>>> index 0fd3f87e913c..3bd5bd06cdbc 100644
>>> --- a/drivers/vhost/iotlb.c
>>> +++ b/drivers/vhost/iotlb.c
>>> @@ -42,13 +42,15 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_map_free);
>>>   * @last: last of IOVA range
>>>   * @addr: the address that is mapped to @start
>>>   * @perm: access permission of this range
>>> + * @opaque: the opaque pointer for the IOTLB mapping
>>>   *
>>>   * Returns an error last is smaller than start or memory allocation
>>>   * fails
>>>   */
>>>  int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
>>>  			  u64 start, u64 last,
>>> -			  u64 addr, unsigned int perm)
>>> +			  u64 addr, unsigned int perm,
>>> +			  void *opaque)
>>>  {
>>>  	struct vhost_iotlb_map *map;
>>>
>>> @@ -71,6 +73,7 @@ int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
>>>  	map->last = last;
>>>  	map->addr = addr;
>>>  	map->perm = perm;
>>> +	map->opaque = opaque;
>>>
>>>  	iotlb->nmaps++;
>>>  	vhost_iotlb_itree_insert(map, &iotlb->root);
>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>>> index 36b6950ba37f..e83e5be7cec8 100644
>>> --- a/drivers/vhost/vdpa.c
>>> +++ b/drivers/vhost/vdpa.c
>>> @@ -488,6
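The suggestion in this thread (keep vhost_iotlb_add_range() with its current signature and have it call a new vhost_iotlb_add_range_ctx() that takes the opaque pointer) can be sketched in isolation. This is only an illustration of the wrapper pattern under simplified assumptions: `struct demo_map`, `demo_add_range_ctx()` and `demo_add_range()` are hypothetical stand-ins, not the vhost IOTLB code, and the interval tree and allocation are left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a single IOTLB mapping. */
struct demo_map {
	uint64_t start;
	uint64_t last;
	uint64_t addr;
	unsigned int perm;
	void *opaque;	/* caller-owned context, may be NULL */
};

/* Full variant: records the caller's context together with the range. */
static int demo_add_range_ctx(struct demo_map *map, uint64_t start,
			      uint64_t last, uint64_t addr,
			      unsigned int perm, void *opaque)
{
	if (last < start)
		return -EINVAL;

	map->start = start;
	map->last = last;
	map->addr = addr;
	map->perm = perm;
	map->opaque = opaque;
	return 0;
}

/* Thin wrapper: callers that have no context keep the old signature. */
static int demo_add_range(struct demo_map *map, uint64_t start,
			  uint64_t last, uint64_t addr, unsigned int perm)
{
	return demo_add_range_ctx(map, start, last, addr, perm, NULL);
}
```

With this shape, existing call sites such as the simulator would not need a trailing NULL argument, and only users that carry a context would call the `_ctx` variant.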
Re: [RFC v3 05/11] vdpa: shared virtual addressing support
On 2021/1/20 3:10 PM, Yongji Xie wrote:
> On Wed, Jan 20, 2021 at 1:55 PM Jason Wang wrote:
>> On 2021/1/19 12:59 PM, Xie Yongji wrote:
>>> This patch introduces SVA (Shared Virtual Addressing)
>>> support for vDPA devices. During vDPA device allocation,
>>> the vDPA device driver needs to indicate whether SVA is
>>> supported by the device. Then the vhost-vdpa bus driver
>>> will not pin user pages and will transfer the userspace
>>> virtual address instead of the physical address during
>>> DMA mapping.
>>>
>>> Suggested-by: Jason Wang
>>> Signed-off-by: Xie Yongji
>>> ---
>>>  drivers/vdpa/ifcvf/ifcvf_main.c   |  2 +-
>>>  drivers/vdpa/mlx5/net/mlx5_vnet.c |  2 +-
>>>  drivers/vdpa/vdpa.c               |  5 -
>>>  drivers/vdpa/vdpa_sim/vdpa_sim.c  |  3 ++-
>>>  drivers/vhost/vdpa.c              | 35 +++
>>>  include/linux/vdpa.h              | 10 +++---
>>>  6 files changed, 38 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
>>> index 23474af7da40..95c4601f82f5 100644
>>> --- a/drivers/vdpa/ifcvf/ifcvf_main.c
>>> +++ b/drivers/vdpa/ifcvf/ifcvf_main.c
>>> @@ -439,7 +439,7 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>>
>>>  	adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa,
>>>  				    dev, &ifc_vdpa_ops,
>>> -				    IFCVF_MAX_QUEUE_PAIRS * 2, NULL);
>>> +				    IFCVF_MAX_QUEUE_PAIRS * 2, NULL, false);
>>>  	if (adapter == NULL) {
>>>  		IFCVF_ERR(pdev, "Failed to allocate vDPA structure");
>>>  		return -ENOMEM;
>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> index 77595c81488d..05988d6907f2 100644
>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> @@ -1959,7 +1959,7 @@ static int mlx5v_probe(struct auxiliary_device *adev,
>>>  	max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS);
>>>
>>>  	ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, mdev->device, &mlx5_vdpa_ops,
>>> -				 2 * mlx5_vdpa_max_qps(max_vqs), NULL);
>>> +				 2 * mlx5_vdpa_max_qps(max_vqs), NULL, false);
>>>  	if (IS_ERR(ndev))
>>>  		return PTR_ERR(ndev);
>>>
>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>>> index 32bd48baffab..50cab930b2e5 100644
>>> --- a/drivers/vdpa/vdpa.c
>>> +++ b/drivers/vdpa/vdpa.c
>>> @@ -72,6 +72,7 @@ static void vdpa_release_dev(struct device *d)
>>>   * @nvqs: number of virtqueues supported by this device
>>>   * @size: size of the parent structure that contains private data
>>>   * @name: name of the vdpa device; optional.
>>> + * @sva: indicate whether SVA (Shared Virtual Addressing) is supported
>>>   *
>>>   * Driver should use vdpa_alloc_device() wrapper macro instead of
>>>   * using this directly.
>>> @@ -81,7 +82,8 @@ static void vdpa_release_dev(struct device *d)
>>>   */
>>>  struct vdpa_device *__vdpa_alloc_device(struct device *parent,
>>>  					const struct vdpa_config_ops *config,
>>> -					int nvqs, size_t size, const char *name)
>>> +					int nvqs, size_t size, const char *name,
>>> +					bool sva)
>>>  {
>>>  	struct vdpa_device *vdev;
>>>  	int err = -EINVAL;
>>> @@ -108,6 +110,7 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent,
>>>  	vdev->config = config;
>>>  	vdev->features_valid = false;
>>>  	vdev->nvqs = nvqs;
>>> +	vdev->sva = sva;
>>>
>>>  	if (name)
>>>  		err = dev_set_name(&vdev->dev, "%s", name);
>>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>>> index 85776e4e6749..03c796873a6b 100644
>>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>>> @@ -367,7 +367,8 @@ static struct vdpasim *vdpasim_create(const char *name)
>>>  	else
>>>  		ops = &vdpasim_net_config_ops;
>>>
>>> -	vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops, VDPASIM_VQ_NUM, name);
>>> +	vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops,
>>> +				    VDPASIM_VQ_NUM, name, false);
>>>  	if (!vdpasim)
>>>  		goto err_alloc;
>>>
>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>>> index 4a241d380c40..36b6950ba37f 100644
>>> --- a/drivers/vhost/vdpa.c
>>> +++ b/drivers/vhost/vdpa.c
>>> @@ -486,21 +486,25 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>>>  static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, u64 start, u64 last)
>>>  {
>>>  	struct vhost_dev *dev = &v->vdev;
>>> +	struct vdpa_device *vdpa = v->vdpa;
>>>  	struct vhost_iotlb *iotlb = dev->iotlb;
>>>  	struct vhost_iotlb_map *map;
>>>  	struct page *page;
>>>  	unsigned long pfn, pinned;
>>>
>>>  	while ((map = vhost_iotlb_itree_first(iotlb, start, last)) != NULL) {
>>> -		pinned = map->size >> PAGE_SHIFT;
>>> -		for (pfn = map->addr >> PAGE_SHIFT;
>>> -		     pinned > 0; pfn++, pinned--) {
>>> -			page = pfn_to_page(pfn);
>>> -			if (map->perm & VHOST_ACCESS_WO)
Re: [RFC v3 01/11] eventfd: track eventfd_signal() recursion depth separately in different cases
On 2021/1/20 2:52 PM, Yongji Xie wrote:
> On Wed, Jan 20, 2021 at 12:24 PM Jason Wang wrote:
>> On 2021/1/19 12:59 PM, Xie Yongji wrote:
>>> Now we have a global percpu counter to limit the recursion depth
>>> of eventfd_signal(). This can avoid deadlock or stack overflow.
>>> But in the stack overflow case, it should be OK to increase the
>>> recursion depth if needed. So we add a percpu counter in eventfd_ctx
>>> to limit the recursion depth for the deadlock case. Then it could be
>>> fine to increase the global percpu counter later.
>>
>> I wonder whether or not it's worth introducing a percpu counter for
>> each eventfd.
>>
>> How about simply checking if eventfd_signal_count() is greater than 2?
>
> It can't avoid deadlock in this way.

I may be missing something, but the count is there to avoid recursive
eventfd calls. So for VDUSE what we suffer from is, e.g., the interrupt
injection path:

userspace write IRQFD -> vq->cb() -> another IRQFD.

It looks like increasing EVENTFD_WAKEUP_DEPTH should be sufficient?

Thanks

> So we need a percpu counter for each eventfd to limit the recursion
> depth for deadlock cases. And using a global percpu counter to avoid
> stack overflow.
>
> Thanks,
> Yongji

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
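The two limits discussed in this thread can be illustrated with a small userspace model: a per-object depth that refuses re-entry into the same eventfd (the deadlock case) and a global depth that bounds nesting across different eventfds (the stack overflow case). This is a sketch under loud assumptions, not the kernel implementation: `struct demo_eventfd`, `demo_signal()` and `DEMO_GLOBAL_MAX_DEPTH` are hypothetical, a thread-local variable stands in for the percpu counter, and the per-object counter is a plain field rather than percpu data.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global nesting limit for this sketch. */
#define DEMO_GLOBAL_MAX_DEPTH 2

struct demo_eventfd {
	int depth;			/* per-object recursion depth */
	int signals;			/* how often this object fired */
	struct demo_eventfd *next;	/* signalled from our wakeup path */
};

/* Stand-in for the global percpu counter (one per thread here). */
static _Thread_local int demo_global_depth;

/* Returns true if @ctx was signalled, false if the call was refused. */
static bool demo_signal(struct demo_eventfd *ctx)
{
	/* Re-entering the same object would retake its lock: refuse. */
	if (ctx->depth > 0)
		return false;
	/* Different objects may nest, but only up to a fixed depth. */
	if (demo_global_depth >= DEMO_GLOBAL_MAX_DEPTH)
		return false;

	ctx->depth++;
	demo_global_depth++;
	ctx->signals++;
	if (ctx->next != NULL)
		(void)demo_signal(ctx->next);	/* the nested wakeup */
	demo_global_depth--;
	ctx->depth--;
	return true;
}
```

In this model, raising `DEMO_GLOBAL_MAX_DEPTH` allows longer chains such as IRQFD -> vq->cb() -> another IRQFD, while the per-object check still blocks a signal that loops back onto the same object.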
Re: [RFC v3 03/11] vdpa: Remove the restriction that only supports virtio-net devices
On 2021/1/20 7:08 PM, Stefano Garzarella wrote:
> On Wed, Jan 20, 2021 at 11:46:38AM +0800, Jason Wang wrote:
>> On 2021/1/19 12:59 PM, Xie Yongji wrote:
>>> With VDUSE, we should be able to support all kinds of virtio devices.
>>>
>>> Signed-off-by: Xie Yongji
>>> ---
>>>  drivers/vhost/vdpa.c | 29 +++--
>>>  1 file changed, 3 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>>> index 29ed4173f04e..448be7875b6d 100644
>>> --- a/drivers/vhost/vdpa.c
>>> +++ b/drivers/vhost/vdpa.c
>>> @@ -22,6 +22,7 @@
>>>  #include
>>>  #include
>>>  #include
>>> +#include
>>>
>>>  #include "vhost.h"
>>>
>>> @@ -185,26 +186,6 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
>>>  	return 0;
>>>  }
>>>
>>> -static int vhost_vdpa_config_validate(struct vhost_vdpa *v,
>>> -				      struct vhost_vdpa_config *c)
>>> -{
>>> -	long size = 0;
>>> -
>>> -	switch (v->virtio_id) {
>>> -	case VIRTIO_ID_NET:
>>> -		size = sizeof(struct virtio_net_config);
>>> -		break;
>>> -	}
>>> -
>>> -	if (c->len == 0)
>>> -		return -EINVAL;
>>> -
>>> -	if (c->len > size - c->off)
>>> -		return -E2BIG;
>>> -
>>> -	return 0;
>>> -}
>>
>> I think we should use a separate patch for this.
>
> For the vdpa-blk simulator I had the same issues and I'm adding a
> .get_config_size() callback to vdpa devices.
>
> Do you think that makes sense, or is it better to remove this check in
> vhost/vdpa, delegating the boundary checks to the
> get_config/set_config callbacks?

A question here. How much value could we gain from get_config_size(),
considering we can let the vDPA parent validate the length in its
get_config()?

Thanks

> Thanks,
> Stefano
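Whichever side ends up doing the check (vhost-vdpa via a config-size callback, or the parent inside get_config/set_config), the window validation itself has the same shape as the removed vhost_vdpa_config_validate(). A minimal sketch, assuming the config space size is simply passed in as a parameter; `demo_config_validate()` is a hypothetical helper and not an existing vDPA function.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Validate an (offset, length) access window against a config space
 * of @size bytes. In this sketch @size is just an argument; where it
 * comes from (a callback, a per-device constant) is left open.
 */
static int demo_config_validate(uint32_t off, uint32_t len, uint32_t size)
{
	if (len == 0)
		return -EINVAL;

	/* Check the offset first so that size - off cannot wrap around. */
	if (off > size || len > size - off)
		return -E2BIG;

	return 0;
}
```

Checking `off > size` before the subtraction matters once all three values are unsigned; the removed code used a signed `long size`, which made the comparison behave differently for out-of-range offsets.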
[PATCH] Fix "ordering" comment typos
From: Bjorn Helgaas

Fix comment typos in "ordering".

Signed-off-by: Bjorn Helgaas
---
 arch/s390/include/asm/facility.h             | 2 +-
 drivers/gpu/drm/qxl/qxl_drv.c                | 2 +-
 drivers/net/wireless/intel/iwlwifi/fw/file.h | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

Unless somebody objects, I'll just merge these typo fixes via the PCI tree.

diff --git a/arch/s390/include/asm/facility.h b/arch/s390/include/asm/facility.h
index 68c476b20b57..91b5d714d28f 100644
--- a/arch/s390/include/asm/facility.h
+++ b/arch/s390/include/asm/facility.h
@@ -44,7 +44,7 @@ static inline int __test_facility(unsigned long nr, void *facilities)
 }
 
 /*
- * The test_facility function uses the bit odering where the MSB is bit 0.
+ * The test_facility function uses the bit ordering where the MSB is bit 0.
  * That makes it easier to query facility bits with the bit number as
  * documented in the Principles of Operation.
  */
diff --git a/drivers/gpu/drm/qxl/qxl_drv.c b/drivers/gpu/drm/qxl/qxl_drv.c
index 6e7f16f4cec7..dab190a547cc 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.c
+++ b/drivers/gpu/drm/qxl/qxl_drv.c
@@ -141,7 +141,7 @@ static void qxl_drm_release(struct drm_device *dev)
 
 	/*
 	 * TODO: qxl_device_fini() call should be in qxl_pci_remove(),
-	 * reodering qxl_modeset_fini() + qxl_device_fini() calls is
+	 * reordering qxl_modeset_fini() + qxl_device_fini() calls is
 	 * non-trivial though.
 	 */
 	qxl_modeset_fini(qdev);
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/file.h b/drivers/net/wireless/intel/iwlwifi/fw/file.h
index 597bc88479ba..04fbfe5cbeb0 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/file.h
+++ b/drivers/net/wireless/intel/iwlwifi/fw/file.h
@@ -866,7 +866,7 @@ struct iwl_fw_dbg_trigger_time_event {
  * tx_bar: tid bitmap to configure on what tid the trigger should occur
  *	when a BAR is send (for an Rx BlocAck session).
  * frame_timeout: tid bitmap to configure on what tid the trigger should occur
- *	when a frame times out in the reodering buffer.
+ *	when a frame times out in the reordering buffer.
  */
 struct iwl_fw_dbg_trigger_ba {
 	__le16 rx_ba_start;
-- 
2.25.1
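The s390 comment touched by this patch describes a bit numbering in which bit 0 is the most significant bit. A small standalone sketch of that numbering follows; `demo_test_bit_msb0()` is a hypothetical helper written for illustration and is not the kernel's __test_facility().

```c
#include <stdint.h>

/*
 * Test bit @nr in the byte array @bits using MSB-0 numbering:
 * bit 0 is the most significant bit of bits[0], bit 7 the least
 * significant bit of bits[0], bit 8 the MSB of bits[1], and so on.
 */
static int demo_test_bit_msb0(const uint8_t *bits, unsigned long nr)
{
	return (bits[nr >> 3] & (0x80u >> (nr & 7))) != 0;
}
```

With this convention a bit number taken from documentation that counts from the left can be used directly as the index.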
Re: [PATCH v2 0/3] VMCI: Queue pair bug fixes
On Wed, Jan 20, 2021 at 08:32:04AM -0800, Jorgen Hansen wrote:
> This series contains three bug fixes for the queue pair
> implementation in the VMCI driver.
>
> v1 -> v2:
> - format patches as a series
> - use min_t instead of min to ensure size_t comparison
>   (issue pointed out by kernel test robot)
>
> Jorgen Hansen (3):
>   VMCI: Stop log spew when qp allocation isn't possible
>   VMCI: Use set_page_dirty_lock() when unregistering guest memory
>   VMCI: Enforce queuepair max size for IOCTL_VMCI_QUEUEPAIR_ALLOC
>
>  drivers/misc/vmw_vmci/vmci_queue_pair.c | 16 ++--
>  include/linux/vmw_vmci_defs.h           |  4 ++--
>  2 files changed, 12 insertions(+), 8 deletions(-)
>
> --
> 2.6.2
>

Please in the future properly thread your emails so that tools like 'b4'
can pick them all up at once.

thanks,

greg k-h
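The changelog item about min_t concerns comparing two values of different integer types in one explicitly chosen type. A rough userspace approximation of the idea is sketched below; `DEMO_MIN_T` and `demo_clamp_size()` are hypothetical, and unlike the kernel macro this simplified version evaluates its arguments more than once.

```c
#include <stddef.h>

/*
 * Simplified stand-in for min_t(): cast both operands to @type
 * before comparing. Arguments are evaluated more than once here,
 * so only pass expressions without side effects.
 */
#define DEMO_MIN_T(type, a, b) \
	((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Clamp a requested size to @limit, doing the comparison in size_t. */
static size_t demo_clamp_size(unsigned long long requested, size_t limit)
{
	return DEMO_MIN_T(size_t, requested, limit);
}
```

One caveat of any cast-based minimum is that the cast can truncate a value that does not fit in the chosen type, so the type should be wide enough for both operands.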
[PATCH v4 1/5] drm/qxl: use drmm_mode_config_init
Signed-off-by: Gerd Hoffmann
Reviewed-by: Daniel Vetter
Acked-by: Thomas Zimmermann
---
 drivers/gpu/drm/qxl/qxl_display.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c
index 012bce0cdb65..38d6b596094d 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -1195,7 +1195,9 @@ int qxl_modeset_init(struct qxl_device *qdev)
 	int i;
 	int ret;
 
-	drm_mode_config_init(&qdev->ddev);
+	ret = drmm_mode_config_init(&qdev->ddev);
+	if (ret)
+		return ret;
 
 	ret = qxl_create_monitors_object(qdev);
 	if (ret)
@@ -1228,5 +1230,4 @@ int qxl_modeset_init(struct qxl_device *qdev)
 void qxl_modeset_fini(struct qxl_device *qdev)
 {
 	qxl_destroy_monitors_object(qdev);
-	drm_mode_config_cleanup(&qdev->ddev);
 }
-- 
2.29.2
[PATCH v4 2/5] drm/qxl: unpin release objects
Balances the qxl_create_bo(..., pinned=true, ...);
call in qxl_release_bo_alloc().

Signed-off-by: Gerd Hoffmann
---
 drivers/gpu/drm/qxl/qxl_release.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index c52412724c26..28013fd1f8ea 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -347,6 +347,7 @@ int qxl_alloc_release_reserved(struct qxl_device *qdev, unsigned long size,
 
 	mutex_lock(&qdev->release_mutex);
 	if (qdev->current_release_bo_offset[cur_idx] + 1 >= releases_per_bo[cur_idx]) {
+		qxl_bo_unpin(qdev->current_release_bo[cur_idx]);
 		qxl_bo_unref(&qdev->current_release_bo[cur_idx]);
 		qdev->current_release_bo_offset[cur_idx] = 0;
 		qdev->current_release_bo[cur_idx] = NULL;
-- 
2.29.2
[PATCH v4 5/5] drm/qxl: properly free qxl releases
Signed-off-by: Gerd Hoffmann
---
 drivers/gpu/drm/qxl/qxl_drv.h     |  1 +
 drivers/gpu/drm/qxl/qxl_kms.c     | 22 --
 drivers/gpu/drm/qxl/qxl_release.c |  2 ++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
index 01354b43c413..1c57b587b6a7 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.h
+++ b/drivers/gpu/drm/qxl/qxl_drv.h
@@ -214,6 +214,7 @@ struct qxl_device {
 	spinlock_t	release_lock;
 	struct idr	release_idr;
 	uint32_t	release_seqno;
+	atomic_t	release_count;
 	spinlock_t release_idr_lock;
 	struct mutex	async_io_mutex;
 	unsigned int last_sent_io_cmd;
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c
index 4a60a52ab62e..f177f72bfc12 100644
--- a/drivers/gpu/drm/qxl/qxl_kms.c
+++ b/drivers/gpu/drm/qxl/qxl_kms.c
@@ -25,6 +25,7 @@
 #include
 #include
+#include
 #include
 #include
@@ -286,8 +287,25 @@ int qxl_device_init(struct qxl_device *qdev,
 
 void qxl_device_fini(struct qxl_device *qdev)
 {
-	qxl_bo_unref(&qdev->current_release_bo[0]);
-	qxl_bo_unref(&qdev->current_release_bo[1]);
+	int cur_idx, try;
+
+	for (cur_idx = 0; cur_idx < 3; cur_idx++) {
+		if (!qdev->current_release_bo[cur_idx])
+			continue;
+		qxl_bo_unpin(qdev->current_release_bo[cur_idx]);
+		qxl_bo_unref(&qdev->current_release_bo[cur_idx]);
+		qdev->current_release_bo_offset[cur_idx] = 0;
+		qdev->current_release_bo[cur_idx] = NULL;
+	}
+
+	/*
+	 * Ask host to release resources (+fill release ring),
+	 * then wait for the release actually happening.
+	 */
+	qxl_io_notify_oom(qdev);
+	for (try = 0; try < 20 && atomic_read(&qdev->release_count) > 0; try++)
+		msleep(20);
+
 	qxl_gem_fini(qdev);
 	qxl_bo_fini(qdev);
 	flush_work(&qdev->gc_work);
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 28013fd1f8ea..43a5436853b7 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -196,6 +196,7 @@ qxl_release_free(struct qxl_device *qdev,
 		qxl_release_free_list(release);
 		kfree(release);
 	}
+	atomic_dec(&qdev->release_count);
 }
 
 static int qxl_release_bo_alloc(struct qxl_device *qdev,
@@ -344,6 +345,7 @@ int qxl_alloc_release_reserved(struct qxl_device *qdev, unsigned long size,
 		*rbo = NULL;
 		return idr_ret;
 	}
+	atomic_inc(&qdev->release_count);
 
 	mutex_lock(&qdev->release_mutex);
 	if (qdev->current_release_bo_offset[cur_idx] + 1 >= releases_per_bo[cur_idx]) {
-- 
2.29.2
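The shutdown path in this patch waits for an outstanding-release counter to drain, but only for a bounded number of tries. The same pattern can be sketched in portable C11; `demo_wait_for_drain()` and its `pause_fn` parameter are hypothetical and stand in for the driver's loop around atomic_read() and msleep(20).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Poll @count until it reaches zero or @max_tries polls have been
 * made. @pause_fn, if not NULL, is called between polls (a sleep in
 * real code). Returns true if the counter drained, false on timeout.
 */
static bool demo_wait_for_drain(atomic_int *count, int max_tries,
				void (*pause_fn)(void))
{
	int try;

	for (try = 0; try < max_tries && atomic_load(count) > 0; try++) {
		if (pause_fn != NULL)
			pause_fn();
	}

	return atomic_load(count) == 0;
}
```

Returning the drain status lets the caller decide what to do on timeout; the patch itself simply continues with teardown after the loop.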
[PATCH v4 3/5] drm/qxl: release shadow on shutdown
In case we have a shadow surface on shutdown release
it so it doesn't leak.

Signed-off-by: Gerd Hoffmann
---
 drivers/gpu/drm/qxl/qxl_display.c | 4
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c
index 38d6b596094d..60331e31861a 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -1229,5 +1229,9 @@ int qxl_modeset_init(struct qxl_device *qdev)
 
 void qxl_modeset_fini(struct qxl_device *qdev)
 {
+	if (qdev->dumb_shadow_bo) {
+		drm_gem_object_put(&qdev->dumb_shadow_bo->tbo.base);
+		qdev->dumb_shadow_bo = NULL;
+	}
 	qxl_destroy_monitors_object(qdev);
 }
-- 
2.29.2
[PATCH v4 4/5] drm/qxl: handle shadow in primary destroy
qxl_primary_atomic_disable must check whether the framebuffer bo has a
shadow surface and, if it does, check the shadow's primary status.

Signed-off-by: Gerd Hoffmann
---
 drivers/gpu/drm/qxl/qxl_display.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c
index 60331e31861a..f5ee8cd72b5b 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -562,6 +562,8 @@ static void qxl_primary_atomic_disable(struct drm_plane *plane,
 	if (old_state->fb) {
 		struct qxl_bo *bo = gem_to_qxl_bo(old_state->fb->obj[0]);
 
+		if (bo->shadow)
+			bo = bo->shadow;
 		if (bo->is_primary) {
 			qxl_io_destroy_primary(qdev);
 			bo->is_primary = false;
-- 
2.29.2
Re: [PATCH v1] mm/memory_hotplug: MEMHP_MERGE_RESOURCE -> MHP_MERGE_RESOURCE
On Tue, Jan 26, 2021 at 12:58:29PM +0100, David Hildenbrand wrote: > Let's make "MEMHP_MERGE_RESOURCE" consistent with "MHP_NONE", "mhp_t" and > "mhp_flags". As discussed recently [1], "mhp" is our internal > acronym for memory hotplug now. > > [1] > https://lore.kernel.org/linux-mm/c37de2d0-28a1-4f7d-f944-cfd7d81c3...@redhat.com/ > > Cc: Andrew Morton > Cc: "K. Y. Srinivasan" > Cc: Haiyang Zhang > Cc: Stephen Hemminger > Cc: Wei Liu > Cc: "Michael S. Tsirkin" > Cc: Jason Wang > Cc: Boris Ostrovsky > Cc: Juergen Gross > Cc: Stefano Stabellini > Cc: Pankaj Gupta > Cc: Michal Hocko > Cc: Oscar Salvador > Cc: Anshuman Khandual > Cc: Wei Yang > Cc: linux-hyp...@vger.kernel.org > Cc: virtualization@lists.linux-foundation.org > Cc: xen-de...@lists.xenproject.org > Signed-off-by: David Hildenbrand Acked-by: Michael S. Tsirkin > --- > drivers/hv/hv_balloon.c| 2 +- > drivers/virtio/virtio_mem.c| 2 +- > drivers/xen/balloon.c | 2 +- > include/linux/memory_hotplug.h | 2 +- > mm/memory_hotplug.c| 2 +- > 5 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c > index 8c471823a5af..2f776d78e3c1 100644 > --- a/drivers/hv/hv_balloon.c > +++ b/drivers/hv/hv_balloon.c > @@ -726,7 +726,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned > long size, > > nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn)); > ret = add_memory(nid, PFN_PHYS((start_pfn)), > - (HA_CHUNK << PAGE_SHIFT), MEMHP_MERGE_RESOURCE); > + (HA_CHUNK << PAGE_SHIFT), MHP_MERGE_RESOURCE); > > if (ret) { > pr_err("hot_add memory failed error is %d\n", ret); > diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c > index 85a272c9978e..148bea39b09a 100644 > --- a/drivers/virtio/virtio_mem.c > +++ b/drivers/virtio/virtio_mem.c > @@ -623,7 +623,7 @@ static int virtio_mem_add_memory(struct virtio_mem *vm, > uint64_t addr, > /* Memory might get onlined immediately. 
*/ > atomic64_add(size, &vm->offline_size); > rc = add_memory_driver_managed(vm->nid, addr, size, vm->resource_name, > -MEMHP_MERGE_RESOURCE); > +MHP_MERGE_RESOURCE); > if (rc) { > atomic64_sub(size, &vm->offline_size); > dev_warn(&vm->vdev->dev, "adding memory failed: %d\n", rc); > diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c > index b57b2067ecbf..671c71245a7b 100644 > --- a/drivers/xen/balloon.c > +++ b/drivers/xen/balloon.c > @@ -331,7 +331,7 @@ static enum bp_state reserve_additional_memory(void) > mutex_unlock(&balloon_mutex); > /* add_memory_resource() requires the device_hotplug lock */ > lock_device_hotplug(); > - rc = add_memory_resource(nid, resource, MEMHP_MERGE_RESOURCE); > + rc = add_memory_resource(nid, resource, MHP_MERGE_RESOURCE); > unlock_device_hotplug(); > mutex_lock(&balloon_mutex); > > diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h > index 3d99de0db2dd..4b834f5d032e 100644 > --- a/include/linux/memory_hotplug.h > +++ b/include/linux/memory_hotplug.h > @@ -53,7 +53,7 @@ typedef int __bitwise mhp_t; > * with this flag set, the resource pointer must no longer be used as it > * might be stale, or the resource might have changed. > */ > -#define MEMHP_MERGE_RESOURCE ((__force mhp_t)BIT(0)) > +#define MHP_MERGE_RESOURCE ((__force mhp_t)BIT(0)) > > /* > * Extended parameters for memory hotplug: > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 710e469fb3a1..ae497e3ff77c 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1153,7 +1153,7 @@ int __ref add_memory_resource(int nid, struct resource > *res, mhp_t mhp_flags) >* In case we're allowed to merge the resource, flag it and trigger >* merging now that adding succeeded.
>*/ > - if (mhp_flags & MEMHP_MERGE_RESOURCE) > + if (mhp_flags & MHP_MERGE_RESOURCE) > merge_system_ram_resource(res); > > /* online pages if requested */ > -- > 2.29.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH iproute2-next 2/2] vdpa: Add vdpa tool
> From: David Ahern > Sent: Tuesday, January 26, 2021 9:53 AM > > Looks fine. A few comments below around code re-use. > > On 1/22/21 4:26 AM, Parav Pandit wrote: > > diff --git a/vdpa/vdpa.c b/vdpa/vdpa.c new file mode 100644 index > > ..942524b7 > > --- /dev/null > > +++ b/vdpa/vdpa.c > > @@ -0,0 +1,828 @@ > > +// SPDX-License-Identifier: GPL-2.0+ > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include "mnl_utils.h" > > + > > +#include "version.h" > > +#include "json_print.h" > > +#include "utils.h" > > + > > +static int g_indent_level; > > + > > +#define INDENT_STR_STEP 2 > > +#define INDENT_STR_MAXLEN 32 > > +static char g_indent_str[INDENT_STR_MAXLEN + 1] = ""; > > The indent code has a lot of parallels with devlink -- including helpers below > around indent_inc and _dec. Please take a look at how to refactor and re- > use. > Ok. Devlink has some more convoluted code with next line etc. But I will see if I can consolidate without changing the devlink's flow/logic. 
> > +
> > +struct vdpa_socket {
> > +	struct mnl_socket *nl;
> > +	char *buf;
> > +	uint32_t family;
> > +	unsigned int seq;
> > +};
> > +
> > +static int vdpa_socket_sndrcv(struct vdpa_socket *nlg, const struct
> nlmsghdr *nlh,
> > +			      mnl_cb_t data_cb, void *data) {
> > +	int err;
> > +
> > +	err = mnl_socket_sendto(nlg->nl, nlh, nlh->nlmsg_len);
> > +	if (err < 0) {
> > +		perror("Failed to send data");
> > +		return -errno;
> > +	}
> > +
> > +	err = mnlu_socket_recv_run(nlg->nl, nlh->nlmsg_seq, nlg->buf,
> MNL_SOCKET_BUFFER_SIZE,
> > +				   data_cb, data);
> > +	if (err < 0) {
> > +		fprintf(stderr, "vdpa answers: %s\n", strerror(errno));
> > +		return -errno;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int get_family_id_attr_cb(const struct nlattr *attr, void
> > +*data) {
> > +	int type = mnl_attr_get_type(attr);
> > +	const struct nlattr **tb = data;
> > +
> > +	if (mnl_attr_type_valid(attr, CTRL_ATTR_MAX) < 0)
> > +		return MNL_CB_ERROR;
> > +
> > +	if (type == CTRL_ATTR_FAMILY_ID &&
> > +	    mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
> > +		return MNL_CB_ERROR;
> > +	tb[type] = attr;
> > +	return MNL_CB_OK;
> > +}
> > +
> > +static int get_family_id_cb(const struct nlmsghdr *nlh, void *data) {
> > +	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
> > +	struct nlattr *tb[CTRL_ATTR_MAX + 1] = {};
> > +	uint32_t *p_id = data;
> > +
> > +	mnl_attr_parse(nlh, sizeof(*genl), get_family_id_attr_cb, tb);
> > +	if (!tb[CTRL_ATTR_FAMILY_ID])
> > +		return MNL_CB_ERROR;
> > +	*p_id = mnl_attr_get_u16(tb[CTRL_ATTR_FAMILY_ID]);
> > +	return MNL_CB_OK;
> > +}
> > +
> > +static int family_get(struct vdpa_socket *nlg) {
> > +	struct genlmsghdr hdr = {};
> > +	struct nlmsghdr *nlh;
> > +	int err;
> > +
> > +	hdr.cmd = CTRL_CMD_GETFAMILY;
> > +	hdr.version = 0x1;
> > +
> > +	nlh = mnlu_msg_prepare(nlg->buf, GENL_ID_CTRL,
> > +			       NLM_F_REQUEST | NLM_F_ACK,
> > +			       &hdr, sizeof(hdr));
> > +
> > +	mnl_attr_put_strz(nlh, CTRL_ATTR_FAMILY_NAME,
> VDPA_GENL_NAME);
> > +
> > +	err = mnl_socket_sendto(nlg->nl, nlh, nlh->nlmsg_len);
> > +	if (err < 0)
> > +		return err;
> > +
> > +	err = mnlu_socket_recv_run(nlg->nl, nlh->nlmsg_seq, nlg->buf,
> > +				   MNL_SOCKET_BUFFER_SIZE,
> > +				   get_family_id_cb, &nlg->family);
> > +	return err;
> > +}
> > +
> > +static int vdpa_socket_open(struct vdpa_socket *nlg) {
> > +	int err;
> > +
> > +	nlg->buf = malloc(MNL_SOCKET_BUFFER_SIZE);
> > +	if (!nlg->buf)
> > +		goto err_buf_alloc;
> > +
> > +	nlg->nl = mnlu_socket_open(NETLINK_GENERIC);
> > +	if (!nlg->nl)
> > +		goto err_socket_open;
> > +
> > +	err = family_get(nlg);
> > +	if (err)
> > +		goto err_socket;
> > +
> > +	return 0;
> > +
> > +err_socket:
> > +	mnl_socket_close(nlg->nl);
> > +err_socket_open:
> > +	free(nlg->buf);
> > +err_buf_alloc:
> > +	return -1;
> > +}
>
> The above 4 functions duplicate a lot of devlink functionality. Please create a
> helper in lib/mnl_utils.c that can be used in both.
>
Will do.

> > +
> > +static unsigned int strslashcount(char *str) {
> > +	unsigned int count = 0;
> > +	char *pos = str;
> > +
> > +	while ((pos = strchr(pos, '/'))) {
> > +		count++;
> > +		pos++;
> > +	}
> > +	return count;
> > +}
>
> you could make that a generic function (e.g., str_char_count) by passing '/' as
> an input.
>
Yes.

> > +
> > +static int strslashrsplit(char *str, const char **before, const char
> > +**after) {
> > +	char *slash;
> > +
> > +	slash = strrchr(str, '/');
> > +	if (!slash)
> > +		return -EINVAL;
> > +	*slash = '\0';
> > +	*before = str;
> > +
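For reference, the generic helper suggested in the review could look roughly like the sketch below. Only the name str_char_count comes from the review comment; the signature and the guard for a NUL search character are assumptions and are not taken from the posted patch or from iproute2.

```c
#include <string.h>

/*
 * Count how many times the character c occurs in the NUL-terminated
 * string str. Hypothetical generalisation of strslashcount().
 */
unsigned int str_char_count(const char *str, int c)
{
	unsigned int count = 0;
	const char *pos = str;

	/* strchr(s, '\0') matches the terminator; treat that as "no hits". */
	if (c == '\0')
		return 0;

	while ((pos = strchr(pos, c))) {
		count++;
		pos++;	/* continue searching after this match */
	}
	return count;
}
```

With such a helper, strslashcount(str) would reduce to str_char_count(str, '/').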
Re: [PATCH v2 10/11] drm: Use state helper instead of the plane state pointer
On Thu, Jan 21, 2021 at 05:35:35PM +0100, Maxime Ripard wrote:
> Many drivers reference the plane->state pointer in order to get the
> current plane state in their atomic_update or atomic_disable hooks,
> which would be the new plane state in the global atomic state since
> _swap_state happened when those hooks are run.
>
> Use the drm_atomic_get_new_plane_state helper to get that state to make it
> more obvious.
>
> This was made using the coccinelle script below:
>
> @ plane_atomic_func @
> identifier helpers;
> identifier func;
> @@
>
> (
>  static const struct drm_plane_helper_funcs helpers = {
>  	...,
>  	.atomic_disable = func,
>  	...,
>  };
> |
>  static const struct drm_plane_helper_funcs helpers = {
>  	...,
>  	.atomic_update = func,
>  	...,
>  };
> )
>
> @ adds_new_state @
> identifier plane_atomic_func.func;
> identifier plane, state;
> identifier new_state;
> @@
>
>  func(struct drm_plane *plane, struct drm_atomic_state *state)
>  {
>  	...
> -	struct drm_plane_state *new_state = plane->state;
> +	struct drm_plane_state *new_state = drm_atomic_get_new_plane_state(state, plane);
>  	...
>  }
>
> @ include depends on adds_new_state @
> @@
>
>  #include
>
> @ no_include depends on !include && adds_new_state @
> @@
>
> + #include
>   #include
>
> Signed-off-by: Maxime Ripard

Looks great.

Reviewed-by: Ville Syrjälä

-- 
Ville Syrjälä
Intel
[PATCH v1] mm/memory_hotplug: MEMHP_MERGE_RESOURCE -> MHP_MERGE_RESOURCE
Let's make "MEMHP_MERGE_RESOURCE" consistent with "MHP_NONE", "mhp_t" and "mhp_flags". As discussed recently [1], "mhp" is our internal acronym for memory hotplug now. [1] https://lore.kernel.org/linux-mm/c37de2d0-28a1-4f7d-f944-cfd7d81c3...@redhat.com/ Cc: Andrew Morton Cc: "K. Y. Srinivasan" Cc: Haiyang Zhang Cc: Stephen Hemminger Cc: Wei Liu Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Stefano Stabellini Cc: Pankaj Gupta Cc: Michal Hocko Cc: Oscar Salvador Cc: Anshuman Khandual Cc: Wei Yang Cc: linux-hyp...@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: xen-de...@lists.xenproject.org Signed-off-by: David Hildenbrand --- drivers/hv/hv_balloon.c| 2 +- drivers/virtio/virtio_mem.c| 2 +- drivers/xen/balloon.c | 2 +- include/linux/memory_hotplug.h | 2 +- mm/memory_hotplug.c| 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c index 8c471823a5af..2f776d78e3c1 100644 --- a/drivers/hv/hv_balloon.c +++ b/drivers/hv/hv_balloon.c @@ -726,7 +726,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size, nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn)); ret = add_memory(nid, PFN_PHYS((start_pfn)), - (HA_CHUNK << PAGE_SHIFT), MEMHP_MERGE_RESOURCE); + (HA_CHUNK << PAGE_SHIFT), MHP_MERGE_RESOURCE); if (ret) { pr_err("hot_add memory failed error is %d\n", ret); diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index 85a272c9978e..148bea39b09a 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -623,7 +623,7 @@ static int virtio_mem_add_memory(struct virtio_mem *vm, uint64_t addr, /* Memory might get onlined immediately. 
*/ atomic64_add(size, &vm->offline_size); rc = add_memory_driver_managed(vm->nid, addr, size, vm->resource_name, - MEMHP_MERGE_RESOURCE); + MHP_MERGE_RESOURCE); if (rc) { atomic64_sub(size, &vm->offline_size); dev_warn(&vm->vdev->dev, "adding memory failed: %d\n", rc); diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index b57b2067ecbf..671c71245a7b 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -331,7 +331,7 @@ static enum bp_state reserve_additional_memory(void) mutex_unlock(&balloon_mutex); /* add_memory_resource() requires the device_hotplug lock */ lock_device_hotplug(); - rc = add_memory_resource(nid, resource, MEMHP_MERGE_RESOURCE); + rc = add_memory_resource(nid, resource, MHP_MERGE_RESOURCE); unlock_device_hotplug(); mutex_lock(&balloon_mutex); diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 3d99de0db2dd..4b834f5d032e 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -53,7 +53,7 @@ typedef int __bitwise mhp_t; * with this flag set, the resource pointer must no longer be used as it * might be stale, or the resource might have changed. */ -#define MEMHP_MERGE_RESOURCE ((__force mhp_t)BIT(0)) +#define MHP_MERGE_RESOURCE ((__force mhp_t)BIT(0)) /* * Extended parameters for memory hotplug: diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 710e469fb3a1..ae497e3ff77c 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1153,7 +1153,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags) * In case we're allowed to merge the resource, flag it and trigger * merging now that adding succeeded. */ - if (mhp_flags & MEMHP_MERGE_RESOURCE) + if (mhp_flags & MHP_MERGE_RESOURCE) merge_system_ram_resource(res); /* online pages if requested */ -- 2.29.2
Re: [RFC PATCH v3 00/13] virtio/vsock: introduce SOCK_SEQPACKET support
Hi Arseny,

thanks for this new series! I'm a bit busy, but I hope to review it tomorrow or on Thursday.

Stefano

On Mon, Jan 25, 2021 at 02:09:00PM +0300, Arseny Krasnov wrote:
This patchset implements support for SOCK_SEQPACKET in the virtio transport. SOCK_SEQPACKET guarantees that record boundaries are preserved, so to do that a new packet operation was added: it marks the start of a record (with the record length in the header), and such a packet doesn't carry any data. To send a record, a packet with the start marker is sent first, then all data is sent as usual 'RW' packets. On the receiver's side, the length of the record is known from the packet with the start-of-record marker. Since packets of one socket are not reordered on either the vsock or the vhost transport layer, such a marker allows the original record to be restored on the receiver's side. If the user's buffer is smaller than the record length, then all data beyond the buffer size is dropped. As with stream sockets, the maximum length of a datagram is not limited, because the same credit logic is used. The difference from a stream socket is that the user is not woken up until the whole record is received or an error occurs. The implementation also supports the 'MSG_EOR' and 'MSG_TRUNC' flags. Tests are also implemented.
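The receive-side behaviour described above can be illustrated with a small userspace sketch. This is not code from the series: the packet structure, the PKT_* names and seq_dequeue() are hypothetical stand-ins chosen for illustration, and the real logic lives in the virtio transport and af_vsock.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical packet operations; names are illustrative only. */
enum pkt_op { PKT_SEQ_BEGIN, PKT_RW };

struct pkt {
	enum pkt_op op;
	size_t len;		/* SEQ_BEGIN: record length; RW: payload length */
	const char *data;	/* NULL for SEQ_BEGIN, which carries no data */
};

/*
 * Reassemble one record from pkts[0..n). At most buf_len bytes are copied
 * into buf; anything beyond that is dropped. Returns the full record
 * length, or -1 if the sequence is malformed or the record is incomplete.
 */
long seq_dequeue(const struct pkt *pkts, size_t n, char *buf, size_t buf_len)
{
	size_t record_len, seen = 0, copied = 0, i;

	if (n == 0 || pkts[0].op != PKT_SEQ_BEGIN)
		return -1;
	record_len = pkts[0].len;

	for (i = 1; i < n && seen < record_len; i++) {
		size_t chunk = pkts[i].len;
		size_t room = buf_len - copied;

		if (pkts[i].op != PKT_RW || chunk > record_len - seen)
			return -1;	/* unexpected op or payload overrun */
		if (room > chunk)
			room = chunk;
		if (room)
			memcpy(buf + copied, pkts[i].data, room);
		copied += room;
		seen += chunk;
	}

	/* The caller is only "woken up" once the whole record has arrived. */
	return seen == record_len ? (long)record_len : -1;
}
```

Returning the full record length even when the buffer was too small mirrors what a receiver would need in order to report truncation.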
Arseny Krasnov (13):
  af_vsock: prepare for SOCK_SEQPACKET support
  af_vsock: prepare 'vsock_connectible_recvmsg()'
  af_vsock: implement SEQPACKET rx loop
  af_vsock: implement send logic for SOCK_SEQPACKET
  af_vsock: rest of SEQPACKET support
  af_vsock: update comments for stream sockets
  virtio/vsock: dequeue callback for SOCK_SEQPACKET
  virtio/vsock: fetch length for SEQPACKET record
  virtio/vsock: add SEQPACKET receive logic
  virtio/vsock: rest of SOCK_SEQPACKET support
  virtio/vsock: setup SEQPACKET ops for transport
  vhost/vsock: setup SEQPACKET ops for transport
  vsock_test: add SOCK_SEQPACKET tests

 drivers/vhost/vsock.c                   |   7 +-
 include/linux/virtio_vsock.h            |  12 +
 include/net/af_vsock.h                  |   6 +
 include/uapi/linux/virtio_vsock.h       |   9 +
 net/vmw_vsock/af_vsock.c                | 543 --
 net/vmw_vsock/virtio_transport.c        |   4 +
 net/vmw_vsock/virtio_transport_common.c | 295 ++--
 tools/testing/vsock/util.c              |  32 +-
 tools/testing/vsock/util.h              |   3 +
 tools/testing/vsock/vsock_test.c        | 126 +
 10 files changed, 862 insertions(+), 175 deletions(-)

TODO:
- Support for record integrity control. As the transport could drop some
  packets, something like a "record-id" and a record end marker needs to be
  implemented. The idea is that the SEQ_BEGIN packet carries both the record
  length and the record id, while the end marker (let it be SEQ_END) carries
  only the record id. To be sure that no packet was lost, the receiver checks
  the length of the data between SEQ_BEGIN and SEQ_END (it must be the same
  as the value in SEQ_BEGIN) and the record ids of SEQ_BEGIN and SEQ_END
  (this means that both markers were not dropped). I think that the easiest
  way to implement the record id for SEQ_BEGIN is to reuse another field of
  the packet header (SEQ_BEGIN already uses 'flags' as the record length).
  For SEQ_END, the record id could be stored in 'flags'.
  Another way to implement it is to move the metadata of both SEQ_END and
  SEQ_BEGIN to the payload. But this approach has a problem: if we move
  something to the payload, that payload is accounted by the credit logic,
  which fragments the payload, while a payload with the record length and id
  couldn't be fragmented. One way to overcome this is to ignore the credit
  update for SEQ_BEGIN/SEQ_END packets. Another solution is to update the
  'stream_has_space()' function: the current implementation returns non-zero
  when at least 1 byte is allowed to be used, but the updated version would
  have an extra argument, which is the needed length. For an 'RW' packet this
  argument is 1, for SEQ_BEGIN it is sizeof(record len + record id) and for
  SEQ_END it is sizeof(record id).

- What to do when the server doesn't support SOCK_SEQPACKET. In the current
  implementation RST is replied in the same way as when the listening port
  is not found. I think that the current RST is enough, because the case when
  the server doesn't support SEQ_PACKET is the same as when the listener is
  missing (e.g. no listener in both cases).

v2 -> v3:
- patches reorganized: split for prepare and implementation patches
- local variables are declared in "Reverse Christmas tree" manner
- virtio_transport_common.c: valid leXX_to_cpu() for vsock header
  fields access
- af_vsock.c: 'vsock_connectible_*sockopt()' added as shared code
  between stream and seqpacket sockets.
- af_vsock.c: loops in '__vsock_*_recvmsg()' refactored.
- af_vsock.c: 'vsock_wait_data()' refactored.

v1 -> v2:
- patches reordered: af_vsock.c related changes now before virtio vsock
- patches reorganized: more small patches, where +/- are not mixed
- tests for SOCK_SEQPACKET added
- all commit messages updated
- af_vsock.c: 'vsock_pre_recv_check()' inlined to
  'vsock_connectible_recvmsg()'
- af_vsock.c: 'vsock_assign_transport()' returns ENODEV if transport
  was not found
-
Re: [PATCH v2] virtio-blk: support per-device queue depth
On Fri, Jan 22, 2021 at 05:21:46PM +0800, Joseph Qi wrote:
> The module parameter 'virtblk_queue_depth' was first introduced for
> testing/benchmarking purposes, as described in commit fc4324b4597c
> ("virtio-blk: base queue-depth on virtqueue ringsize or module param").
> Currently, 'virtblk_queue_depth' is used as a saved value for the
> first probed device.
> Since different virtio-blk devices have different capabilities, we need
> to support a per-device queue depth instead of a per-module one. So use
> the vq free elements by default if the module parameter
> 'virtblk_queue_depth' is not set.
>
> Signed-off-by: Joseph Qi
> Acked-by: Jason Wang
> ---
>  drivers/block/virtio_blk.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)

Reviewed-by: Stefan Hajnoczi
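The selection rule in the commit message can be sketched as a plain function. This is only an illustration of the described behaviour: pick_queue_depth() and its parameter names are hypothetical and do not appear in the patch, and the real driver may apply further adjustments that are not shown in the quoted text.

```c
/*
 * Choose the queue depth for one device.
 * module_depth: value of the module parameter, 0 meaning "not set".
 * vq_num_free:  number of free elements in this device's virtqueue.
 */
unsigned int pick_queue_depth(unsigned int module_depth,
			      unsigned int vq_num_free)
{
	/* An explicit module parameter overrides the per-device value. */
	if (module_depth)
		return module_depth;

	/* Otherwise derive the depth from this device's own virtqueue. */
	return vq_num_free;
}
```

Because the module parameter is only read and never written back, each probed device ends up with a depth based on its own virtqueue when the parameter is unset.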
[RFC v2 3/3] vhost: Add Vdmabuf backend
This backend acts as the counterpart to the Vdmabuf Virtio frontend. When it receives a new export event from the frontend, it raises an event to alert the Qemu UI/userspace. Qemu then "imports" this buffer using the Unique ID. As part of the import step, a new dmabuf is created on the Host using the page information obtained from the Guest. The fd associated with this dmabuf is made available to Qemu UI/userspace which then creates a texture from it for the purpose of displaying it. Signed-off-by: Dongwon Kim Signed-off-by: Vivek Kasireddy --- drivers/vhost/Kconfig |9 + drivers/vhost/Makefile |3 + drivers/vhost/vdmabuf.c| 1407 include/uapi/linux/vhost.h |3 + 4 files changed, 1422 insertions(+) create mode 100644 drivers/vhost/vdmabuf.c diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 587fbae06182..9a99cc2611ca 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -89,4 +89,13 @@ config VHOST_CROSS_ENDIAN_LEGACY If unsure, say "N". +config VHOST_VDMABUF + bool "Vhost backend for the Vdmabuf driver" + depends on KVM && EVENTFD + select VHOST + default n + help + This driver works in pair with the Virtio Vdmabuf frontend. It can + be used to create a dmabuf using the pages shared by the Guest. 
+ endif diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile index f3e1897cce85..5c2cea4a7eaf 100644 --- a/drivers/vhost/Makefile +++ b/drivers/vhost/Makefile @@ -17,3 +17,6 @@ obj-$(CONFIG_VHOST) += vhost.o obj-$(CONFIG_VHOST_IOTLB) += vhost_iotlb.o vhost_iotlb-y := iotlb.o + +obj-$(CONFIG_VHOST_VDMABUF) += vhost_vdmabuf.o +vhost_vdmabuf-y := vdmabuf.o diff --git a/drivers/vhost/vdmabuf.c b/drivers/vhost/vdmabuf.c new file mode 100644 index ..2a8a1d852e93 --- /dev/null +++ b/drivers/vhost/vdmabuf.c @@ -0,0 +1,1407 @@ +// SPDX-License-Identifier: (MIT OR GPL-2.0) + +/* + * Copyright © 2021 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ *
+ * Authors:
+ *    Dongwon Kim
+ *    Mateusz Polrola
+ *    Vivek Kasireddy
+ *
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "vhost.h"
+
+#define REFS_PER_PAGE (PAGE_SIZE/sizeof(long))
+
+static struct virtio_vdmabuf_info *drv_info;
+
+struct kvm_instance {
+	struct kvm *kvm;
+	struct list_head link;
+};
+
+struct vhost_vdmabuf {
+	struct vhost_dev dev;
+	struct vhost_virtqueue vq;
+	struct vhost_work tx_work;
+	struct virtio_vdmabuf_event_queue *evq;
+	u64 vmid;
+
+	/* synchronization between transmissions */
+	struct mutex tx_mutex;
+	/* synchronization on tx and rx */
+	struct mutex vq_mutex;
+
+	struct virtio_vdmabuf_txmsg next;
+	struct list_head list;
+	struct kvm *kvm;
+};
+
+static inline void vhost_vdmabuf_add(struct vhost_vdmabuf *new)
+{
+	list_add_tail(&new->list, &drv_info->head_vdmabuf_list);
+}
+
+static inline struct vhost_vdmabuf *vhost_vdmabuf_find(u64 vmid)
+{
+	struct vhost_vdmabuf *found;
+
+	list_for_each_entry(found, &drv_info->head_vdmabuf_list, list)
+		if (found->vmid == vmid)
+			return found;
+
+	return NULL;
+}
+
+static inline bool vhost_vdmabuf_del(struct vhost_vdmabuf *vdmabuf)
+{
+	struct vhost_vdmabuf *iter, *temp;
+
+	list_for_each_entry_safe(iter, temp,
+				 &drv_info->head_vdmabuf_list,
+				 list)
+		if (iter == vdmabuf) {
+			list_del(&iter->list);
+			return true;
+		}
+
+	return false;
+}
+
+static inline void vhost_vdmabuf_del_all(void)
+{
+	struct
[RFC v2 2/3] virtio: Introduce Vdmabuf driver
This driver "transfers" a dmabuf created on the Guest to the Host. A common use-case for such a transfer includes sharing the scanout buffer created by a display server or a compositor running in the Guest with Qemu UI -- running on the Host. The "transfer" is accomplished by sharing the PFNs of all the pages associated with the dmabuf and having a new dmabuf created on the Host that is backed up by the pages mapped from the Guest. Signed-off-by: Dongwon Kim Signed-off-by: Vivek Kasireddy --- drivers/virtio/Kconfig | 8 + drivers/virtio/Makefile | 1 + drivers/virtio/virtio_vdmabuf.c | 986 include/linux/virtio_vdmabuf.h | 272 include/uapi/linux/virtio_ids.h | 1 + include/uapi/linux/virtio_vdmabuf.h | 99 +++ 6 files changed, 1367 insertions(+) create mode 100644 drivers/virtio/virtio_vdmabuf.c create mode 100644 include/linux/virtio_vdmabuf.h create mode 100644 include/uapi/linux/virtio_vdmabuf.h diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig index 7b41130d3f35..e563c12f711e 100644 --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -139,4 +139,12 @@ config VIRTIO_DMA_SHARED_BUFFER This option adds a flavor of dma buffers that are backed by virtio resources. +config VIRTIO_VDMABUF + bool "Enables Vdmabuf driver in guest os" + default n + depends on VIRTIO + help +This driver provides a way to share the dmabufs created in +the Guest with the Host. 
+ endif # VIRTIO_MENU diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile index 591e6f72aa54..b4bb0738009c 100644 --- a/drivers/virtio/Makefile +++ b/drivers/virtio/Makefile @@ -9,3 +9,4 @@ obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o obj-$(CONFIG_VIRTIO_MEM) += virtio_mem.o obj-$(CONFIG_VIRTIO_DMA_SHARED_BUFFER) += virtio_dma_buf.o +obj-$(CONFIG_VIRTIO_VDMABUF) += virtio_vdmabuf.o diff --git a/drivers/virtio/virtio_vdmabuf.c b/drivers/virtio/virtio_vdmabuf.c new file mode 100644 index ..0b40ea4fd6f1 --- /dev/null +++ b/drivers/virtio/virtio_vdmabuf.c @@ -0,0 +1,986 @@ +// SPDX-License-Identifier: (MIT OR GPL-2.0) + +/* + * Copyright © 2021 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ *
+ * Authors:
+ *    Dongwon Kim
+ *    Mateusz Polrola
+ *    Vivek Kasireddy
+ *
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define VIRTIO_VDMABUF_MAX_ID INT_MAX
+#define REFS_PER_PAGE (PAGE_SIZE/sizeof(long))
+#define NEW_BUF_ID_GEN(vmid, cnt) (((vmid & 0x) << 32) | \
+				    ((cnt) & 0x))
+
+/* one global drv object */
+static struct virtio_vdmabuf_info *drv_info;
+
+struct virtio_vdmabuf {
+	/* virtio device structure */
+	struct virtio_device *vdev;
+
+	/* virtual queue array */
+	struct virtqueue *vq;
+
+	/* ID of guest OS */
+	u64 vmid;
+
+	/* spin lock that needs to be acquired before accessing
+	 * virtual queue
+	 */
+	spinlock_t vq_lock;
+	struct mutex rx_lock;
+
+	/* workqueue */
+	struct workqueue_struct *wq;
+	struct work_struct rx_work;
+	struct virtio_vdmabuf_event_queue *evq;
+};
+
+static virtio_vdmabuf_buf_id_t get_buf_id(struct virtio_vdmabuf *vdmabuf)
+{
+	virtio_vdmabuf_buf_id_t buf_id = {0, {0, 0} };
+	static int count = 0;
+
+	count = count < VIRTIO_VDMABUF_MAX_ID ? count + 1 : 0;
+	buf_id.id = NEW_BUF_ID_GEN(vdmabuf->vmid, count);
+
+	/* random data embedded in the id for security */
+	get_random_bytes(&buf_id.rng_key[0], 8);
+
+	return buf_id;
+}
+
+/* sharing pages for original DMABUF with Host */
+static struct
[RFC v2 1/3] kvm: Add a notifier for create and destroy VM events
After registering with this notifier, other drivers that are dependent on KVM can get notified whenever a VM is created or destroyed. This also provides a way for sharing the KVM instance pointer with other drivers. Signed-off-by: Vivek Kasireddy --- include/linux/kvm_host.h | 5 + virt/kvm/kvm_main.c | 20 ++-- 2 files changed, 23 insertions(+), 2 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index f3b1013fb22c..fc1a688301a0 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -88,6 +88,9 @@ #define KVM_PFN_ERR_HWPOISON (KVM_PFN_ERR_MASK + 1) #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2) +#define KVM_EVENT_CREATE_VM 0 +#define KVM_EVENT_DESTROY_VM 1 + /* * error pfns indicate that the gfn is in slot but faild to * translate it to pfn on host. @@ -1494,5 +1497,7 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu) /* Max number of entries allowed for each kvm dirty ring */ #define KVM_DIRTY_RING_MAX_ENTRIES 65536 +int kvm_vm_register_notifier(struct notifier_block *nb); +int kvm_vm_unregister_notifier(struct notifier_block *nb); #endif diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 5f260488e999..8a0e8bb02a5f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -101,6 +101,8 @@ DEFINE_MUTEX(kvm_lock); static DEFINE_RAW_SPINLOCK(kvm_count_lock); LIST_HEAD(vm_list); +static struct blocking_notifier_head kvm_vm_notifier; + static cpumask_var_t cpus_hardware_enabled; static int kvm_usage_count; static atomic_t hardware_enable_failed; @@ -148,12 +150,20 @@ static void kvm_io_bus_destroy(struct kvm_io_bus *bus); __visible bool kvm_rebooting; EXPORT_SYMBOL_GPL(kvm_rebooting); -#define KVM_EVENT_CREATE_VM 0 -#define KVM_EVENT_DESTROY_VM 1 static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm); static unsigned long long kvm_createvm_count; static unsigned long long kvm_active_vms; +inline int kvm_vm_register_notifier(struct notifier_block *nb) +{ + return 
blocking_notifier_chain_register(&kvm_vm_notifier, nb);
+}
+
+inline int kvm_vm_unregister_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&kvm_vm_notifier, nb);
+}
+
 __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
 						   unsigned long start, unsigned long end)
 {
@@ -808,6 +818,8 @@ static struct kvm *kvm_create_vm(unsigned long type)
 
 	preempt_notifier_inc();
 
+	blocking_notifier_call_chain(&kvm_vm_notifier,
+				     KVM_EVENT_CREATE_VM, kvm);
 	return kvm;
 
 out_err:
@@ -886,6 +898,8 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	preempt_notifier_dec();
 	hardware_disable_all();
 	mmdrop(mm);
+	blocking_notifier_call_chain(&kvm_vm_notifier,
+				     KVM_EVENT_DESTROY_VM, kvm);
 }
 
 void kvm_get_kvm(struct kvm *kvm)
@@ -4968,6 +4982,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 	r = kvm_vfio_ops_init();
 	WARN_ON(r);
 
+	BLOCKING_INIT_NOTIFIER_HEAD(&kvm_vm_notifier);
+
 	return 0;
 
 out_unreg:
-- 
2.26.2
[RFC v2 0/3] Introduce Vdmabuf driver
The Virtual dmabuf or Virtio based dmabuf (Vdmabuf) driver can be used to "transfer" a page-backed dmabuf created in the Guest to the Host without making any copies. This is mostly accomplished by recreating the dmabuf on the Host using the PFNs and other meta-data shared by the guest. A use-case where this driver would be a good fit is a multi-GPU system (perhaps one discrete and one integrated) where one of the GPUs does not have access to the display/connectors/outputs. This could be an embedded system design decision or a restriction made at the firmware/BIOS level or perhaps the device is setup in UPT (Universal Passthrough) mode. When such a GPU is passthrough'd to a Guest OS, this driver can help in transferring the scanout buffer(s) (rendered using the native rendering stack) to the Host for the purpose of displaying them. The userspace component running in the Guest that transfers the dmabuf is referred to as the producer or exporter and its counterpart running in the Host is referred to as importer or consumer. For instance, a Wayland compositor would potentially be a producer and Qemu UI would be a consumer. It is the producer's responsibility to not reuse or destroy the shared buffer while it is still being used by the consumer. The consumer would send a release cmd indicating that it is done after which the shared buffer can be safely used again by the producer. One way the producer can prevent accidental re-use of the shared buffer is to lock the buffer when it exports it and unlock it after it gets a release cmd. As an example, the GBM API provides a simple way to lock and unlock a surface's buffers. For each dmabuf that is to be shared with the Host, a 128-bit unique ID is generated that identifies this buffer across the whole system. This ID is a combination of the Qemu process ID, a counter and a randomizer. 
We could potentially use UUID API but we currently use the above mentioned combination to identify the source of the buffer at any given time for bookkeeping.

v2:
- Added a notifier mechanism for getting the kvm pointer instead of sharing it via VFIO.
- Added start and stop routines in the Vhost backend.
- Augmented the cover letter and made some minor improvements.

Vivek Kasireddy (3):
  kvm: Add a notifier for create and destroy VM events
  virtio: Introduce Vdmabuf driver
  vhost: Add Vdmabuf backend

 drivers/vhost/Kconfig               |    9 +
 drivers/vhost/Makefile              |    3 +
 drivers/vhost/vdmabuf.c             | 1407 +++
 drivers/virtio/Kconfig              |    8 +
 drivers/virtio/Makefile             |    1 +
 drivers/virtio/virtio_vdmabuf.c     |  986 +++
 include/linux/kvm_host.h            |    5 +
 include/linux/virtio_vdmabuf.h      |  272 ++
 include/uapi/linux/vhost.h          |    3 +
 include/uapi/linux/virtio_ids.h     |    1 +
 include/uapi/linux/virtio_vdmabuf.h |   99 ++
 virt/kvm/kvm_main.c                 |   20 +-
 12 files changed, 2812 insertions(+), 2 deletions(-)
 create mode 100644 drivers/vhost/vdmabuf.c
 create mode 100644 drivers/virtio/virtio_vdmabuf.c
 create mode 100644 include/linux/virtio_vdmabuf.h
 create mode 100644 include/uapi/linux/virtio_vdmabuf.h

-- 
2.26.2
Re: [PATCH v2 8/9] ALSA: virtio: introduce PCM channel map support
On Sun, 24 Jan 2021, Anton Yakovlev wrote:

Enumerate all available PCM channel maps and create ALSA controls.

Signed-off-by: Anton Yakovlev
---
 sound/virtio/Makefile       |   1 +
 sound/virtio/virtio_card.c  |  15 +++
 sound/virtio/virtio_card.h  |   8 ++
 sound/virtio/virtio_chmap.c | 237
 sound/virtio/virtio_pcm.h   |   4 +
 5 files changed, 265 insertions(+)
 create mode 100644 sound/virtio/virtio_chmap.c

[snip]

diff --git a/sound/virtio/virtio_chmap.c b/sound/virtio/virtio_chmap.c
new file mode 100644
index ..8a2ddc4dcffb
--- /dev/null
+++ b/sound/virtio/virtio_chmap.c
@@ -0,0 +1,237 @@

[snip]

+/**
+ * virtsnd_chmap_parse_cfg() - Parse the channel map configuration.
+ * @snd: VirtIO sound device.
+ *
+ * This function is called during initial device initialization.
+ *
+ * Context: Any context that permits to sleep.
+ * Return: 0 on success, -errno on failure.
+ */
+int virtsnd_chmap_parse_cfg(struct virtio_snd *snd)
+{
+	struct virtio_device *vdev = snd->vdev;
+	unsigned int i;
+	int rc;
+
+	virtio_cread(vdev, struct virtio_snd_config, chmaps, &snd->nchmaps);
+	if (!snd->nchmaps)
+		return 0;
+
+	snd->chmaps = devm_kcalloc(&vdev->dev, snd->nchmaps,
+				   sizeof(*snd->chmaps), GFP_KERNEL);
+	if (!snd->chmaps)
+		return -ENOMEM;
+
+	rc = virtsnd_ctl_query_info(snd, VIRTIO_SND_R_CHMAP_INFO, 0,
+				    snd->nchmaps, sizeof(*snd->chmaps),
+				    snd->chmaps);
+	if (rc)
+		return rc;
+
+	/* Count the number of channel maps per each PCM device/stream. */
+	for (i = 0; i < snd->nchmaps; ++i) {
+		struct virtio_snd_chmap_info *info = &snd->chmaps[i];
+		unsigned int nid = le32_to_cpu(info->hdr.hda_fn_nid);
+		struct virtio_pcm *pcm;
+		struct virtio_pcm_stream *stream;
+
+		pcm = virtsnd_pcm_find_or_create(snd, nid);
+		if (IS_ERR(pcm))
+			return PTR_ERR(pcm);
+
+		switch (info->direction) {
+		case VIRTIO_SND_D_OUTPUT: {
+			stream = &pcm->streams[SNDRV_PCM_STREAM_PLAYBACK];
+			break;
+		}
+		case VIRTIO_SND_D_INPUT: {
+			stream = &pcm->streams[SNDRV_PCM_STREAM_CAPTURE];
+			break;
+		}
+		default: {
+			dev_err(&vdev->dev,
+				"chmap #%u: unknown direction (%u)\n", i,
+				info->direction);
+			return -EINVAL;
+		}
+		}
+
+		stream->nchmaps++;
+	}
+
+	return 0;
+}
+
+/**
+ * virtsnd_chmap_add_ctls() - Create an ALSA control for channel maps.
+ * @pcm: ALSA PCM device.
+ * @direction: PCM stream direction (SNDRV_PCM_STREAM_XXX).
+ * @stream: VirtIO PCM stream.
+ *
+ * Context: Any context.
+ * Return: 0 on success, -errno on failure.
+ */
+static int virtsnd_chmap_add_ctls(struct snd_pcm *pcm, int direction,
+				  struct virtio_pcm_stream *stream)
+{
+	unsigned int i;
+	int max_channels = 0;
+
+	for (i = 0; i < stream->nchmaps; i++)
+		if (max_channels < stream->chmaps[i].channels)
+			max_channels = stream->chmaps[i].channels;
+
+	return snd_pcm_add_chmap_ctls(pcm, direction, stream->chmaps,
+				      max_channels, 0, NULL);
+}
+
+/**
+ * virtsnd_chmap_build_devs() - Build ALSA controls for channel maps.
+ * @snd: VirtIO sound device.
+ *
+ * Context: Any context.
+ * Return: 0 on success, -errno on failure.
+ */
+int virtsnd_chmap_build_devs(struct virtio_snd *snd)
+{
+	struct virtio_device *vdev = snd->vdev;
+	struct virtio_pcm *pcm;
+	struct virtio_pcm_stream *stream;
+	unsigned int i;
+	int rc;
+
+	/* Allocate channel map elements per each PCM device/stream. */
+	list_for_each_entry(pcm, &snd->pcm_list, list) {
+		for (i = 0; i < ARRAY_SIZE(pcm->streams); ++i) {
+			stream = &pcm->streams[i];
+
+			if (!stream->nchmaps)
+				continue;
+
+			stream->chmaps = devm_kcalloc(&vdev->dev,
+						      stream->nchmaps + 1,
+						      sizeof(*stream->chmaps),
+						      GFP_KERNEL);
+			if (!stream->chmaps)
+				return -ENOMEM;
+
+			stream->nchmaps = 0;
+		}
+	}
+
+	/* Initialize channel maps per each PCM device/stream. */
+	for (i = 0; i < snd->nchmaps; ++i) {
+		struct virtio_snd_chmap_info *info = &snd->chmaps[i];
+
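The quoted patch follows a count, allocate, then fill pattern: the per-stream counter is incremented in a first pass, an array of count + 1 elements is allocated, the counter is reset to zero, and it is then reused as the fill index. A minimal userspace sketch of that pattern is below. All names are hypothetical, there are only two streams indexed 0 and 1, and the extra slot is assumed to serve as a zeroed terminating entry.

```c
#include <stdlib.h>

/* Hypothetical record: which stream (0 or 1) a channel map belongs to. */
struct example_info {
	int stream;
	int channels;
};

struct example_stream {
	unsigned int nmaps;         /* number of maps counted / stored so far */
	struct example_info *maps;  /* array with one spare zeroed slot */
};

/*
 * Pass 1 counts entries per stream, pass 2 allocates, pass 3 fills.
 * On failure the caller is expected to free whatever was allocated.
 */
static int example_distribute(const struct example_info *all, unsigned int n,
			      struct example_stream streams[2])
{
	unsigned int i;

	for (i = 0; i < n; i++)
		streams[all[i].stream].nmaps++;

	for (i = 0; i < 2; i++) {
		if (!streams[i].nmaps)
			continue;

		/* +1 leaves a zeroed entry at the end of the array. */
		streams[i].maps = calloc(streams[i].nmaps + 1,
					 sizeof(*streams[i].maps));
		if (!streams[i].maps)
			return -1;

		streams[i].nmaps = 0; /* reused as the fill index below */
	}

	for (i = 0; i < n; i++) {
		struct example_stream *s = &streams[all[i].stream];

		s->maps[s->nmaps++] = all[i];
	}

	return 0;
}
```

After the third pass each counter is back at the value computed in the first pass, so no separate index variable is needed.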
Re: [PATCH v3] vhost_vdpa: fix the problem in vhost_vdpa_set_config_call
On 2021/1/26 3:16 PM, Cindy Lu wrote:

In vhost_vdpa_set_config_call, cb.private should be the vhost_vdpa, because cb.private is eventually used in vhost_vdpa_config_cb as a vhost_vdpa. Fix this issue.

Cc: sta...@vger.kernel.org
Fixes: 776f395004d82 ("vhost_vdpa: Support config interrupt in vdpa")
Signed-off-by: Cindy Lu

Acked-by: Jason Wang

---
 drivers/vhost/vdpa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index ef688c8c0e0e..3fbb9c1f49da 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -319,7 +319,7 @@ static long vhost_vdpa_set_config_call(struct vhost_vdpa *v, u32 __user *argp)
 	struct eventfd_ctx *ctx;

 	cb.callback = vhost_vdpa_config_cb;
-	cb.private = v->vdpa;
+	cb.private = v;
 	if (copy_from_user(&fd, argp, sizeof(fd)))
 		return -EFAULT;
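The class of bug fixed here can be sketched in plain C: a callback receives an untyped pointer and casts it back, so the code that registers the callback has to store a pointer of exactly the type the handler expects. The types and function names below are hypothetical stand-ins, not the vhost or vdpa definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the inner and outer device objects. */
struct example_inner {
	int id;
};

struct example_outer {
	struct example_inner *inner;
	int config_events;
};

struct example_callback {
	void (*callback)(void *private);
	void *private;
};

/* The handler assumes private points at the outer object. */
static void example_config_cb(void *private)
{
	struct example_outer *o = private;

	o->config_events++;
}

/* Registration must store the outer object, matching the handler's cast. */
static void example_set_config_call(struct example_outer *o,
				    struct example_callback *cb)
{
	cb->callback = example_config_cb;
	cb->private = o; /* storing o->inner here would corrupt memory later */
}
```

Because the pointer is a void *, the compiler cannot flag the mismatch, which is why this kind of error tends to surface only at run time.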
Re: [RFC v3 08/11] vduse: Introduce VDUSE - vDPA Device in Userspace
On 2021/1/19 1:07 PM, Xie Yongji wrote:

This VDUSE driver enables implementing vDPA devices in userspace. Both control path and data path of vDPA devices will be able to be handled in userspace.

In the control path, the VDUSE driver will make use of message mechanism to forward the config operation from vdpa bus driver to userspace. Userspace can use read()/write() to receive/reply those control messages.

In the data path, VDUSE_IOTLB_GET_FD ioctl will be used to get the file descriptors referring to vDPA device's iova regions. Then userspace can use mmap() to access those iova regions. Besides, the eventfd mechanism is used to trigger interrupt callbacks and receive virtqueue kicks in userspace.

Signed-off-by: Xie Yongji
---
 Documentation/driver-api/vduse.rst                 |   85 ++
 Documentation/userspace-api/ioctl/ioctl-number.rst |    1 +
 drivers/vdpa/Kconfig                               |    7 +
 drivers/vdpa/Makefile                              |    1 +
 drivers/vdpa/vdpa_user/Makefile                    |    5 +
 drivers/vdpa/vdpa_user/eventfd.c                   |  221
 drivers/vdpa/vdpa_user/eventfd.h                   |   48 +
 drivers/vdpa/vdpa_user/iova_domain.c               |  426 +++
 drivers/vdpa/vdpa_user/iova_domain.h               |   68 ++
 drivers/vdpa/vdpa_user/vduse.h                     |   62 +
 drivers/vdpa/vdpa_user/vduse_dev.c                 | 1217
 include/uapi/linux/vdpa.h                          |    1 +
 include/uapi/linux/vduse.h                         |  125 ++
 13 files changed, 2267 insertions(+)
 create mode 100644 Documentation/driver-api/vduse.rst
 create mode 100644 drivers/vdpa/vdpa_user/Makefile
 create mode 100644 drivers/vdpa/vdpa_user/eventfd.c
 create mode 100644 drivers/vdpa/vdpa_user/eventfd.h
 create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
 create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h
 create mode 100644 drivers/vdpa/vdpa_user/vduse.h
 create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
 create mode 100644 include/uapi/linux/vduse.h

Btw, it would be easier for the reviewers if you could split this into three parts:

1) iova domain
2) vduse device
3) doc
Thanks
Re: [RFC v3 11/11] vduse: Introduce a workqueue for irq injection
On 2021/1/19 1:07 PM, Xie Yongji wrote:

This patch introduces a dedicated workqueue for irq injection so that we are able to do some performance tuning for it.

Signed-off-by: Xie Yongji

If we want to split it like this, it might be better to:

1) implement a simple irq injection on the ioctl context in patch 8
2) add the dedicated workqueue injection in this patch

My understanding is that this way:

1) the function looks more isolated for readers
2) the difference between sysctl vs workqueue should be more obvious than system wq vs dedicated wq
3) there is a chance to describe why a workqueue is needed in the commit log of this patch

Thanks

---
 drivers/vdpa/vdpa_user/eventfd.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/vdpa_user/eventfd.c b/drivers/vdpa/vdpa_user/eventfd.c
index dbffddb08908..caf7d8d68ac0 100644
--- a/drivers/vdpa/vdpa_user/eventfd.c
+++ b/drivers/vdpa/vdpa_user/eventfd.c
@@ -18,6 +18,7 @@
 #include "eventfd.h"

 static struct workqueue_struct *vduse_irqfd_cleanup_wq;
+static struct workqueue_struct *vduse_irq_wq;

 static void vduse_virqfd_shutdown(struct work_struct *work)
 {
@@ -57,7 +58,7 @@ static int vduse_virqfd_wakeup(wait_queue_entry_t *wait, unsigned int mode,
 	__poll_t flags = key_to_poll(key);

 	if (flags & EPOLLIN)
-		schedule_work(>inject);
+		queue_work(vduse_irq_wq, >inject);

 	if (flags & EPOLLHUP) {
 		spin_lock(>irq_lock);
@@ -165,11 +166,18 @@ int vduse_virqfd_init(void)
 	if (!vduse_irqfd_cleanup_wq)
 		return -ENOMEM;

+	vduse_irq_wq = alloc_workqueue("vduse-irq", WQ_SYSFS | WQ_UNBOUND, 0);
+	if (!vduse_irq_wq) {
+		destroy_workqueue(vduse_irqfd_cleanup_wq);
+		return -ENOMEM;
+	}
+
 	return 0;
 }

 void vduse_virqfd_exit(void)
 {
+	destroy_workqueue(vduse_irq_wq);
 	destroy_workqueue(vduse_irqfd_cleanup_wq);
 }
Re: [RFC v3 10/11] vduse: grab the module's references until there is no vduse device
On 2021/1/19 1:07 PM, Xie Yongji wrote:

The module should not be unloaded if any vduse device exists. So increase the module's reference count when creating vduse device. And the reference count is kept until the device is destroyed.

Signed-off-by: Xie Yongji

Looks like a bug fix. If yes, let's squash this into patch 8.

Thanks

---
 drivers/vdpa/vdpa_user/vduse_dev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 4d21203da5b6..003aeb281bce 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -978,6 +978,7 @@ static int vduse_destroy_dev(u32 id)
 	kfree(dev->vqs);
 	vduse_domain_destroy(dev->domain);
 	vduse_dev_destroy(dev);
+	module_put(THIS_MODULE);

 	return 0;
 }
@@ -1022,6 +1023,7 @@ static int vduse_create_dev(struct vduse_dev_config *config)
 	dev->connected = true;
 	list_add(>list, _devs);
+	__module_get(THIS_MODULE);

 	return fd;
 err_fd:
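The pairing in this patch (take a module reference on device creation, drop it on destruction) can be modelled in userspace as a plain counter. The sketch below is only a model with hypothetical names; it does not use the kernel's module API and ignores locking.

```c
/* Hypothetical model of a module whose unload is gated by a refcount. */
struct example_module {
	int refcount;
};

struct example_dev {
	struct example_module *owner;
};

/* Creating a device pins the module, as __module_get() does in the patch. */
static void example_create_dev(struct example_module *m,
			       struct example_dev *d)
{
	d->owner = m;
	m->refcount++;
}

/* Destroying the device drops the reference, mirroring module_put(). */
static void example_destroy_dev(struct example_dev *d)
{
	d->owner->refcount--;
	d->owner = 0;
}

/* Unloading is only allowed once every device has been destroyed. */
static int example_can_unload(const struct example_module *m)
{
	return m->refcount == 0;
}
```

Each create must be matched by exactly one destroy; an unbalanced pair would either block unloading forever or allow it while a device still exists.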
Re: [RFC v3 08/11] vduse: Introduce VDUSE - vDPA Device in Userspace
On 2021/1/19 1:07 PM, Xie Yongji wrote:

This VDUSE driver enables implementing vDPA devices in userspace. Both control path and data path of vDPA devices will be able to be handled in userspace.

In the control path, the VDUSE driver will make use of message mechanism to forward the config operation from vdpa bus driver to userspace. Userspace can use read()/write() to receive/reply those control messages.

In the data path, VDUSE_IOTLB_GET_FD ioctl will be used to get the file descriptors referring to vDPA device's iova regions. Then userspace can use mmap() to access those iova regions. Besides, the eventfd mechanism is used to trigger interrupt callbacks and receive virtqueue kicks in userspace.

Signed-off-by: Xie Yongji
---
 Documentation/driver-api/vduse.rst                 |   85 ++
 Documentation/userspace-api/ioctl/ioctl-number.rst |    1 +
 drivers/vdpa/Kconfig                               |    7 +
 drivers/vdpa/Makefile                              |    1 +
 drivers/vdpa/vdpa_user/Makefile                    |    5 +
 drivers/vdpa/vdpa_user/eventfd.c                   |  221
 drivers/vdpa/vdpa_user/eventfd.h                   |   48 +
 drivers/vdpa/vdpa_user/iova_domain.c               |  426 +++
 drivers/vdpa/vdpa_user/iova_domain.h               |   68 ++
 drivers/vdpa/vdpa_user/vduse.h                     |   62 +
 drivers/vdpa/vdpa_user/vduse_dev.c                 | 1217
 include/uapi/linux/vdpa.h                          |    1 +
 include/uapi/linux/vduse.h                         |  125 ++
 13 files changed, 2267 insertions(+)
 create mode 100644 Documentation/driver-api/vduse.rst
 create mode 100644 drivers/vdpa/vdpa_user/Makefile
 create mode 100644 drivers/vdpa/vdpa_user/eventfd.c
 create mode 100644 drivers/vdpa/vdpa_user/eventfd.h
 create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
 create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h
 create mode 100644 drivers/vdpa/vdpa_user/vduse.h
 create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
 create mode 100644 include/uapi/linux/vduse.h

diff --git a/Documentation/driver-api/vduse.rst b/Documentation/driver-api/vduse.rst
new file mode 100644
index ..9418a7f6646b
--- /dev/null
+++ b/Documentation/driver-api/vduse.rst
@@ -0,0 +1,85 @@
+==================================
+VDUSE - "vDPA Device in Userspace"
+==================================
+
+vDPA (virtio data path acceleration) device is a device that uses a
+datapath which complies with the virtio specifications with vendor
+specific control path. vDPA devices can be both physically located on
+the hardware or emulated by software. VDUSE is a framework that makes it
+possible to implement software-emulated vDPA devices in userspace.
+
+How VDUSE works
+---------------
+Each userspace vDPA device is created by the VDUSE_CREATE_DEV ioctl on
+the VDUSE character device (/dev/vduse). Then a file descriptor pointing
+to the new resources will be returned, which can be used to implement the
+userspace vDPA device's control path and data path.
+
+To implement control path, the read/write operations to the file descriptor
+will be used to receive/reply the control messages from/to VDUSE driver.

It's better to document the protocol here, e.g. the identifier handling.

+Those control messages are mostly based on the vdpa_config_ops which defines
+a unified interface to control different types of vDPA device.
+
+The following types of messages are provided by the VDUSE framework now:
+
+- VDUSE_SET_VQ_ADDR: Set the addresses of the different aspects of virtqueue.

"Set the vring address of a virtqueue" might be better here.

+
+- VDUSE_SET_VQ_NUM: Set the size of virtqueue
+
+- VDUSE_SET_VQ_READY: Set ready status of virtqueue
+
+- VDUSE_GET_VQ_READY: Get ready status of virtqueue
+
+- VDUSE_SET_VQ_STATE: Set the state (last_avail_idx) for virtqueue
+
+- VDUSE_GET_VQ_STATE: Get the state (last_avail_idx) for virtqueue

It's better not to mention layout-specific details here (last_avail_idx), considering that we should support packed virtqueues in the future.

+
+- VDUSE_SET_FEATURES: Set virtio features supported by the driver
+
+- VDUSE_GET_FEATURES: Get virtio features supported by the device
+
+- VDUSE_SET_STATUS: Set the device status
+
+- VDUSE_GET_STATUS: Get the device status
+
+- VDUSE_SET_CONFIG: Write to device specific configuration space
+
+- VDUSE_GET_CONFIG: Read from device specific configuration space
+
+- VDUSE_UPDATE_IOTLB: Notify userspace to update the memory mapping in device IOTLB
+
+Please see include/linux/vdpa.h for details.
+
+In the data path, vDPA device's iova regions will be mapped into userspace with
+the help of VDUSE_IOTLB_GET_FD ioctl on the userspace vDPA device fd:
+
+- VDUSE_IOTLB_GET_FD: get the file descriptor to iova region. Userspace can
+  access this iova region by passing the fd to mmap(2).
+
+Besides, the eventfd mechanism is used to trigger interrupt callbacks and
+receive