RE: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

2021-01-26 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> Hypervisor when Linux runs as the root partition. Implement an IRQ
> domain to handle mapping and unmapping of IO-APIC interrupts.
> 
> Signed-off-by: Wei Liu 
> ---
>  arch/x86/hyperv/irqdomain.c |  54 ++
>  arch/x86/include/asm/mshyperv.h |   4 +
>  drivers/iommu/hyperv-iommu.c| 179 +++-
>  3 files changed, 233 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> index 19637cd60231..8e2b4e478b70 100644
> --- a/arch/x86/hyperv/irqdomain.c
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -330,3 +330,57 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
>  }
> 
>  #endif /* CONFIG_PCI_MSI */
> +
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry)
> +{
> + union hv_device_id device_id;
> +
> + device_id.as_uint64 = 0;
> + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> + return hv_unmap_interrupt(device_id.as_uint64, entry) & HV_HYPERCALL_RESULT_MASK;

The masking is already done in hv_unmap_interrupt.
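The reviewer's point can be illustrated with a standalone toy model (names mirror the patch, but this is a sketch, not kernel code): if the callee already masks its result to the hypercall status bits, masking again at the call site is a no-op, so the caller can return the value directly.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: in Linux's hyperv-tlfs.h the hypercall status lives in
 * the low 16 bits of the result. */
#define HV_HYPERCALL_RESULT_MASK 0xffffULL

/* Stand-in for hv_unmap_interrupt(): it masks before returning. */
static uint64_t toy_unmap_interrupt(uint64_t raw_result)
{
	return raw_result & HV_HYPERCALL_RESULT_MASK;
}
```

Since the result is already confined to the mask, a second `& HV_HYPERCALL_RESULT_MASK` at the call site changes nothing.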

> +}
> +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
> +
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> + struct hv_interrupt_entry *entry)
> +{
> + unsigned long flags;
> + struct hv_input_map_device_interrupt *input;
> + struct hv_output_map_device_interrupt *output;
> + union hv_device_id device_id;
> + struct hv_device_interrupt_descriptor *intr_desc;
> + u16 status;
> +
> + device_id.as_uint64 = 0;
> + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> + local_irq_save(flags);
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> + memset(input, 0, sizeof(*input));
> + intr_desc = &input->interrupt_descriptor;
> + input->partition_id = hv_current_partition_id;
> + input->device_id = device_id.as_uint64;
> + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> + intr_desc->target.vector = vector;
> + intr_desc->vector_count = 1;
> +
> + if (level)
> + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> + else
> + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> +
> + __set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> +
> + status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, output) &
> +  HV_HYPERCALL_RESULT_MASK;
> + local_irq_restore(flags);
> +
> + *entry = output->interrupt_entry;
> +
> + return status;

As a cross-check, I was comparing this code against hv_map_msi_interrupt().
They are mostly parallel, though some of the assignments are done in a
different order. It's a nit, but making them as parallel as possible would
be nice. :-)

Same 64 vCPU comment applies here as well.
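The "64 vCPU" concern refers to `__set_bit()` being applied to `target.vp_mask` through a cast that treats it as a single 64-bit word. A toy model (illustrative guard only, not the actual fix) shows why a vCPU index of 64 or more is a problem:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: vp_mask here is one u64, as the cast in the patch assumes.
 * Setting a bit for vcpu >= 64 would write outside that word, hence the
 * review comment; this guard just makes the limit explicit. */
static int toy_set_vp_mask(uint64_t *vp_mask, int vcpu)
{
	if (vcpu < 0 || vcpu >= 64)
		return -1;	/* would overflow a single-u64 mask */
	*vp_mask |= 1ULL << vcpu;
	return 0;
}
```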


> +}
> +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ccc849e25d5e..345d7c6f8c37 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
> 
>  struct irq_domain *hv_create_pci_msi_domain(void);
> 
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> + struct hv_interrupt_entry *entry);
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> +
>  #else /* CONFIG_HYPERV */
>  static inline void hyperv_init(void) {}
>  static inline void hyperv_setup_mmu_ops(void) {}
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index b7db6024e65c..6d35e4c303c6 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -116,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = {
>   .free = hyperv_irq_remapping_free,
>  };
> 
> +static const struct irq_domain_ops hyperv_root_ir_domain_ops;
>  static int __init hyperv_prepare_irq_remapping(void)
>  {
>   struct fwnode_handle *fn;
>   int i;
> + const char *name;
> + const struct irq_domain_ops *ops;
> 
>   if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
>   x86_init.hyper.msi_ext_dest_id() ||
> - !x2apic_supported() || hv_root_partition)
> + !x2apic_supported())

Any reason that the check for hv_root_partition was added
in patch #4  of this series, and then removed here?  Could
patch #4 just be dropped?

>   return -ENODEV;
> 
> - fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> + if (hv_root_partition) {
> + name = "HYPERV-ROOT-IR";
> + ops = &hyperv_root_ir_domain_ops;

RE: [PATCH v5 15/16] x86/hyperv: implement an MSI domain for root partition

2021-01-26 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> When Linux runs as the root partition on Microsoft Hypervisor, its
> interrupts are remapped.  Linux will need to explicitly map and unmap
> interrupts for hardware.
> 
> Implement an MSI domain to issue the correct hypercalls. And initialize
> this irqdomain as the default MSI irq domain.
> 
> Signed-off-by: Sunil Muthuswamy 
> Co-developed-by: Sunil Muthuswamy 
> Signed-off-by: Wei Liu 
> ---
> v4: Fix compilation issue when CONFIG_PCI_MSI is not set.
> v3: build irqdomain.o for 32bit as well.

I'm not clear on the intent for 32-bit builds.  Given that hv_proc.c is built
only for 64-bit, I'm assuming running Linux in the root partition
is only functional for 64-bit builds.  So is the goal simply that 32-bit
builds will compile correctly?  Seems like maybe there should be
a CONFIG option for running Linux in the root partition, and that
option would force 64-bit.

> v2: This patch is simplified due to upstream changes.
> ---
>  arch/x86/hyperv/Makefile|   2 +-
>  arch/x86/hyperv/hv_init.c   |   9 +
>  arch/x86/hyperv/irqdomain.c | 332 
>  arch/x86/include/asm/mshyperv.h |   2 +
>  4 files changed, 344 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/hyperv/irqdomain.c
> 
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 565358020921..48e2c51464e8 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-y:= hv_init.o mmu.o nested.o
> +obj-y:= hv_init.o mmu.o nested.o irqdomain.o
>  obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o
> 
>  ifdef CONFIG_X86_64
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index ad8e77859b32..1cb2f7d1850a 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -484,6 +484,15 @@ void __init hyperv_init(void)
> 
>   BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull);
> 
> +#ifdef CONFIG_PCI_MSI
> + /*
> +  * If we're running as root, we want to create our own PCI MSI domain.
> +  * We can't set this in hv_pci_init because that would be too late.
> +  */
> + if (hv_root_partition)
> + x86_init.irqs.create_pci_msi_domain = hv_create_pci_msi_domain;
> +#endif
> +
>   return;
> 
>  remove_cpuhp_state:
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> new file mode 100644
> index ..19637cd60231
> --- /dev/null
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -0,0 +1,332 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Irqdomain for Linux to run as the root partition on Microsoft Hypervisor.
> +//
> +// Authors:
> +//   Sunil Muthuswamy 
> +//   Wei Liu 

I think the // comment style should only be used for the SPDX line.

> +
> +#include 
> +#include 
> +#include 
> +
> +static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *old_entry)
> +{
> + unsigned long flags;
> + struct hv_input_unmap_device_interrupt *input;
> + struct hv_interrupt_entry *intr_entry;
> + u16 status;
> +
> + local_irq_save(flags);
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> + memset(input, 0, sizeof(*input));
> + intr_entry = &input->interrupt_entry;
> + input->partition_id = hv_current_partition_id;
> + input->device_id = id;
> + *intr_entry = *old_entry;
> +
> + status = hv_do_rep_hypercall(HVCALL_UNMAP_DEVICE_INTERRUPT, 0, 0, input, NULL) &
> +  HV_HYPERCALL_RESULT_MASK;
> + local_irq_restore(flags);
> +
> + return status;
> +}
> +
> +#ifdef CONFIG_PCI_MSI
> +struct rid_data {
> + struct pci_dev *bridge;
> + u32 rid;
> +};
> +
> +static int get_rid_cb(struct pci_dev *pdev, u16 alias, void *data)
> +{
> + struct rid_data *rd = data;
> + u8 bus = PCI_BUS_NUM(rd->rid);
> +
> + if (pdev->bus->number != bus || PCI_BUS_NUM(alias) != bus) {
> + rd->bridge = pdev;
> + rd->rid = alias;
> + }
> +
> + return 0;
> +}
> +
> +static union hv_device_id hv_build_pci_dev_id(struct pci_dev *dev)
> +{
> + union hv_device_id dev_id;
> + struct rid_data data = {
> + .bridge = NULL,
> + .rid = PCI_DEVID(dev->bus->number, dev->devfn)
> + };
> +
> + pci_for_each_dma_alias(dev, get_rid_cb, &data);
> +
> + dev_id.as_uint64 = 0;
> + dev_id.device_type = HV_DEVICE_TYPE_PCI;
> + dev_id.pci.segment = pci_domain_nr(dev->bus);
> +
> + dev_id.pci.bdf.bus = PCI_BUS_NUM(data.rid);
> + dev_id.pci.bdf.device = PCI_SLOT(data.rid);
> + dev_id.pci.bdf.function = PCI_FUNC(data.rid);
> + dev_id.pci.source_shadow = HV_SOURCE_SHADOW_NONE;
> +
> + if (data.bridge) {
> + int pos;
> +
> + /*
> +  * Microsoft Hypervisor requires a bus range when the bridge is
> +  * running in PCI-X mode.
> +  

Re: [RFC v3 06/11] vhost-vdpa: Add an opaque pointer for vhost IOTLB

2021-01-26 Thread Jason Wang


On 2021/1/20 3:52 PM, Yongji Xie wrote:

On Wed, Jan 20, 2021 at 2:24 PM Jason Wang  wrote:


On 2021/1/19 12:59 PM, Xie Yongji wrote:

Add an opaque pointer for vhost IOTLB to store the
corresponding vma->vm_file and offset on the DMA mapping.


Let's split the patch into two.

1) opaque pointer
2) vma stuffs


OK.


It will be used in VDUSE case later.

Suggested-by: Jason Wang 
Signed-off-by: Xie Yongji 
---
   drivers/vdpa/vdpa_sim/vdpa_sim.c | 11 ---
   drivers/vhost/iotlb.c|  5 ++-
   drivers/vhost/vdpa.c | 66 +++-
   drivers/vhost/vhost.c|  4 +--
   include/linux/vdpa.h |  3 +-
   include/linux/vhost_iotlb.h  |  8 -
   6 files changed, 79 insertions(+), 18 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 03c796873a6b..1ffcef67954f 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -279,7 +279,7 @@ static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page,
*/
   spin_lock(&vdpasim->iommu_lock);
   ret = vhost_iotlb_add_range(iommu, pa, pa + size - 1,
- pa, dir_to_perm(dir));
+ pa, dir_to_perm(dir), NULL);


Maybe its better to introduce

vhost_iotlb_add_range_ctx() which can accepts the opaque (context). And
let vhost_iotlb_add_range() just call that.


If so, we need export both vhost_iotlb_add_range() and
vhost_iotlb_add_range_ctx() which will be used in VDUSE driver. Is it
a bit redundant?



Probably not, we do something similar in virtio core:

void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
                void **ctx)
{
    struct vring_virtqueue *vq = to_vvq(_vq);

    return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
                 virtqueue_get_buf_ctx_split(_vq, len, ctx);
}
EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);

void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
{
    return virtqueue_get_buf_ctx(_vq, len, NULL);
}
EXPORT_SYMBOL_GPL(virtqueue_get_buf);
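Applied to the IOTLB helper, the same pattern would look roughly like this (toy types standing in for struct vhost_iotlb; a sketch of the suggestion, not the final patch):

```c
#include <assert.h>
#include <stddef.h>

/* Toy single-entry map standing in for the real interval tree. */
struct toy_map {
	unsigned long start, last, addr;
	void *opaque;
};

/* The _ctx variant carries the opaque pointer... */
static int toy_add_range_ctx(struct toy_map *m, unsigned long start,
			     unsigned long last, unsigned long addr,
			     void *opaque)
{
	if (last < start)
		return -1;
	m->start = start;
	m->last = last;
	m->addr = addr;
	m->opaque = opaque;
	return 0;
}

/* ...and the existing entry point becomes a thin wrapper passing NULL,
 * so current callers don't need an opaque argument at all. */
static int toy_add_range(struct toy_map *m, unsigned long start,
			 unsigned long last, unsigned long addr)
{
	return toy_add_range_ctx(m, start, last, addr, NULL);
}
```

Both symbols get exported, but the wrapper is one line, which is why the redundancy is considered acceptable.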





   spin_unlock(&vdpasim->iommu_lock);
   if (ret)
   return DMA_MAPPING_ERROR;
@@ -317,7 +317,7 @@ static void *vdpasim_alloc_coherent(struct device *dev, size_t size,

   ret = vhost_iotlb_add_range(iommu, (u64)pa,
   (u64)pa + size - 1,
- pa, VHOST_MAP_RW);
+ pa, VHOST_MAP_RW, NULL);
   if (ret) {
   *dma_addr = DMA_MAPPING_ERROR;
   kfree(addr);
@@ -625,7 +625,8 @@ static int vdpasim_set_map(struct vdpa_device *vdpa,
   for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
map = vhost_iotlb_itree_next(map, start, last)) {
   ret = vhost_iotlb_add_range(vdpasim->iommu, map->start,
- map->last, map->addr, map->perm);
+ map->last, map->addr,
+ map->perm, NULL);
   if (ret)
   goto err;
   }
@@ -639,14 +640,14 @@ static int vdpasim_set_map(struct vdpa_device *vdpa,
   }

   static int vdpasim_dma_map(struct vdpa_device *vdpa, u64 iova, u64 size,
-u64 pa, u32 perm)
+u64 pa, u32 perm, void *opaque)
   {
   struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
   int ret;

   spin_lock(&vdpasim->iommu_lock);
   ret = vhost_iotlb_add_range(vdpasim->iommu, iova, iova + size - 1, pa,
- perm);
+ perm, NULL);
   spin_unlock(&vdpasim->iommu_lock);

   return ret;
diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
index 0fd3f87e913c..3bd5bd06cdbc 100644
--- a/drivers/vhost/iotlb.c
+++ b/drivers/vhost/iotlb.c
@@ -42,13 +42,15 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_map_free);
* @last: last of IOVA range
* @addr: the address that is mapped to @start
* @perm: access permission of this range
+ * @opaque: the opaque pointer for the IOTLB mapping
*
* Returns an error if @last is smaller than @start or if memory allocation
* fails
*/
   int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
 u64 start, u64 last,
-   u64 addr, unsigned int perm)
+   u64 addr, unsigned int perm,
+   void *opaque)
   {
   struct vhost_iotlb_map *map;

@@ -71,6 +73,7 @@ int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
   map->last = last;
   map->addr = addr;
   map->perm = perm;
+ map->opaque = opaque;

   iotlb->nmaps++;
   vhost_iotlb_itree_insert(map, &iotlb->root);
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 36b6950ba37f..e83e5be7cec8 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -488,6 

Re: [RFC v3 05/11] vdpa: shared virtual addressing support

2021-01-26 Thread Jason Wang


On 2021/1/20 3:10 PM, Yongji Xie wrote:

On Wed, Jan 20, 2021 at 1:55 PM Jason Wang  wrote:


On 2021/1/19 12:59 PM, Xie Yongji wrote:

This patches introduces SVA (Shared Virtual Addressing)
support for vDPA device. During vDPA device allocation,
vDPA device driver needs to indicate whether SVA is
supported by the device. Then vhost-vdpa bus driver
will not pin user page and transfer userspace virtual
address instead of physical address during DMA mapping.

Suggested-by: Jason Wang 
Signed-off-by: Xie Yongji 
---
   drivers/vdpa/ifcvf/ifcvf_main.c   |  2 +-
   drivers/vdpa/mlx5/net/mlx5_vnet.c |  2 +-
   drivers/vdpa/vdpa.c   |  5 -
   drivers/vdpa/vdpa_sim/vdpa_sim.c  |  3 ++-
   drivers/vhost/vdpa.c  | 35 +++
   include/linux/vdpa.h  | 10 +++---
   6 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index 23474af7da40..95c4601f82f5 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -439,7 +439,7 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)

   adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa,
   dev, &ifcvf_vdpa_ops,
- IFCVF_MAX_QUEUE_PAIRS * 2, NULL);
+ IFCVF_MAX_QUEUE_PAIRS * 2, NULL, false);
   if (adapter == NULL) {
   IFCVF_ERR(pdev, "Failed to allocate vDPA structure");
   return -ENOMEM;
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 77595c81488d..05988d6907f2 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1959,7 +1959,7 @@ static int mlx5v_probe(struct auxiliary_device *adev,
   max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS);

   ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, mdev->device, &mlx5_vdpa_ops,
-  2 * mlx5_vdpa_max_qps(max_vqs), NULL);
+  2 * mlx5_vdpa_max_qps(max_vqs), NULL, false);
   if (IS_ERR(ndev))
   return PTR_ERR(ndev);

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 32bd48baffab..50cab930b2e5 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -72,6 +72,7 @@ static void vdpa_release_dev(struct device *d)
* @nvqs: number of virtqueues supported by this device
* @size: size of the parent structure that contains private data
* @name: name of the vdpa device; optional.
+ * @sva: indicate whether SVA (Shared Virtual Addressing) is supported
*
* Driver should use vdpa_alloc_device() wrapper macro instead of
* using this directly.
@@ -81,7 +82,8 @@ static void vdpa_release_dev(struct device *d)
*/
   struct vdpa_device *__vdpa_alloc_device(struct device *parent,
   const struct vdpa_config_ops *config,
- int nvqs, size_t size, const char *name)
+ int nvqs, size_t size, const char *name,
+ bool sva)
   {
   struct vdpa_device *vdev;
   int err = -EINVAL;
@@ -108,6 +110,7 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent,
   vdev->config = config;
   vdev->features_valid = false;
   vdev->nvqs = nvqs;
+ vdev->sva = sva;

   if (name)
   err = dev_set_name(&vdev->dev, "%s", name);
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 85776e4e6749..03c796873a6b 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -367,7 +367,8 @@ static struct vdpasim *vdpasim_create(const char *name)
   else
   ops = &vdpasim_net_config_ops;

- vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops, VDPASIM_VQ_NUM, name);
+ vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops,
+ VDPASIM_VQ_NUM, name, false);
   if (!vdpasim)
   goto err_alloc;

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 4a241d380c40..36b6950ba37f 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -486,21 +486,25 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
   static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, u64 start, u64 last)
   {
   struct vhost_dev *dev = &v->vdev;
+ struct vdpa_device *vdpa = v->vdpa;
   struct vhost_iotlb *iotlb = dev->iotlb;
   struct vhost_iotlb_map *map;
   struct page *page;
   unsigned long pfn, pinned;

   while ((map = vhost_iotlb_itree_first(iotlb, start, last)) != NULL) {
- pinned = map->size >> PAGE_SHIFT;
- for (pfn = map->addr >> PAGE_SHIFT;
-  pinned > 0; pfn++, pinned--) {
- page = pfn_to_page(pfn);
- if (map->perm & VHOST_ACCESS_WO)

Re: [RFC v3 01/11] eventfd: track eventfd_signal() recursion depth separately in different cases

2021-01-26 Thread Jason Wang


On 2021/1/20 2:52 PM, Yongji Xie wrote:

On Wed, Jan 20, 2021 at 12:24 PM Jason Wang  wrote:


On 2021/1/19 12:59 PM, Xie Yongji wrote:

Now we have a global percpu counter to limit the recursion depth
of eventfd_signal(). This can avoid deadlock or stack overflow.
But in stack overflow case, it should be OK to increase the
recursion depth if needed. So we add a percpu counter in eventfd_ctx
to limit the recursion depth for deadlock case. Then it could be
fine to increase the global percpu counter later.


I wonder whether or not it's worth to introduce percpu for each eventfd.

How about simply check if eventfd_signal_count() is greater than 2?


It can't avoid deadlock in this way.



I may miss something, but the count is to avoid recursive eventfd calls.
So for VDUSE, what we suffer from is e.g. the interrupt injection path:


userspace write IRQFD -> vq->cb() -> another IRQFD.

It looks like increasing EVENTFD_WAKEUP_DEPTH should be sufficient?

Thanks



So we need a percpu counter for each eventfd to limit the recursion depth
for deadlock cases, and use a global percpu counter to avoid stack
overflow.

Thanks,
Yongji
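A toy model of the scheme under discussion (illustrative names only; the real code uses percpu counters and must be IRQ-safe): a per-context depth counter bounds how far a signal delivered from within a callback can recurse, so the IRQFD -> vq->cb() -> IRQFD path terminates instead of deadlocking or overflowing the stack.

```c
#include <assert.h>

#define TOY_MAX_DEPTH 2

struct toy_ctx {
	int depth;	/* models the per-eventfd recursion counter */
	int delivered;
};

static int toy_signal(struct toy_ctx *ctx);

/* A wakeup callback that re-signals the same context, like the
 * interrupt injection path mentioned above. */
static void toy_callback(struct toy_ctx *ctx)
{
	toy_signal(ctx);
}

static int toy_signal(struct toy_ctx *ctx)
{
	if (ctx->depth >= TOY_MAX_DEPTH)
		return 0;	/* bounded: refuse instead of recursing forever */
	ctx->depth++;
	ctx->delivered++;
	toy_callback(ctx);
	ctx->depth--;
	return 1;
}
```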



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC v3 03/11] vdpa: Remove the restriction that only supports virtio-net devices

2021-01-26 Thread Jason Wang


On 2021/1/20 7:08 PM, Stefano Garzarella wrote:

On Wed, Jan 20, 2021 at 11:46:38AM +0800, Jason Wang wrote:


On 2021/1/19 12:59 PM, Xie Yongji wrote:

With VDUSE, we should be able to support all kinds of virtio devices.

Signed-off-by: Xie Yongji 
---
 drivers/vhost/vdpa.c | 29 +++--
 1 file changed, 3 insertions(+), 26 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 29ed4173f04e..448be7875b6d 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "vhost.h"
@@ -185,26 +186,6 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)

 return 0;
 }
-static int vhost_vdpa_config_validate(struct vhost_vdpa *v,
-  struct vhost_vdpa_config *c)
-{
-    long size = 0;
-
-    switch (v->virtio_id) {
-    case VIRTIO_ID_NET:
-    size = sizeof(struct virtio_net_config);
-    break;
-    }
-
-    if (c->len == 0)
-    return -EINVAL;
-
-    if (c->len > size - c->off)
-    return -E2BIG;
-
-    return 0;
-}



I think we should use a separate patch for this.


For the vdpa-blk simulator I had the same issues and I'm adding a 
.get_config_size() callback to vdpa devices.


Do you think make sense or is better to remove this check in 
vhost/vdpa, delegating the boundaries checks to get_config/set_config 
callbacks.



A question here: how much value could we gain from get_config_size(),
considering we can let the vDPA parent validate the length in its
get_config()?
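Stefano's .get_config_size() idea, sketched with toy types (illustrative only; the real callback would live in vdpa_config_ops): the core validates (off, len) against a size reported by the device, instead of hard-coding per-device-type sizes as the removed vhost_vdpa_config_validate() did.

```c
#include <assert.h>
#include <stddef.h>

struct toy_config_ops {
	size_t (*get_config_size)(void);
};

/* e.g. a net device reporting its config-space size (value made up) */
static size_t toy_net_config_size(void)
{
	return 24;
}

static const struct toy_config_ops toy_net_ops = {
	.get_config_size = toy_net_config_size,
};

/* Generic validation: no switch on device type needed. */
static int toy_config_validate(const struct toy_config_ops *ops,
			       size_t off, size_t len)
{
	size_t size = ops->get_config_size();

	if (len == 0 || off > size || len > size - off)
		return -1;
	return 0;
}
```

The alternative raised above is to drop the check entirely and let each parent's get_config()/set_config() enforce its own bounds.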


Thanks




Thanks,
Stefano




[PATCH] Fix "ordering" comment typos

2021-01-26 Thread Bjorn Helgaas
From: Bjorn Helgaas 

Fix comment typos in "ordering".

Signed-off-by: Bjorn Helgaas 
---
 arch/s390/include/asm/facility.h | 2 +-
 drivers/gpu/drm/qxl/qxl_drv.c| 2 +-
 drivers/net/wireless/intel/iwlwifi/fw/file.h | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)


Unless somebody objects, I'll just merge these typo fixes via the PCI tree.


diff --git a/arch/s390/include/asm/facility.h b/arch/s390/include/asm/facility.h
index 68c476b20b57..91b5d714d28f 100644
--- a/arch/s390/include/asm/facility.h
+++ b/arch/s390/include/asm/facility.h
@@ -44,7 +44,7 @@ static inline int __test_facility(unsigned long nr, void *facilities)
 }
 
 /*
- * The test_facility function uses the bit odering where the MSB is bit 0.
+ * The test_facility function uses the bit ordering where the MSB is bit 0.
  * That makes it easier to query facility bits with the bit number as
  * documented in the Principles of Operation.
  */
diff --git a/drivers/gpu/drm/qxl/qxl_drv.c b/drivers/gpu/drm/qxl/qxl_drv.c
index 6e7f16f4cec7..dab190a547cc 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.c
+++ b/drivers/gpu/drm/qxl/qxl_drv.c
@@ -141,7 +141,7 @@ static void qxl_drm_release(struct drm_device *dev)
 
/*
 * TODO: qxl_device_fini() call should be in qxl_pci_remove(),
-* reodering qxl_modeset_fini() + qxl_device_fini() calls is
+* reordering qxl_modeset_fini() + qxl_device_fini() calls is
 * non-trivial though.
 */
qxl_modeset_fini(qdev);
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/file.h b/drivers/net/wireless/intel/iwlwifi/fw/file.h
index 597bc88479ba..04fbfe5cbeb0 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/file.h
+++ b/drivers/net/wireless/intel/iwlwifi/fw/file.h
@@ -866,7 +866,7 @@ struct iwl_fw_dbg_trigger_time_event {
  * tx_bar: tid bitmap to configure on what tid the trigger should occur
  * when a BAR is send (for an Rx BlocAck session).
  * frame_timeout: tid bitmap to configure on what tid the trigger should occur
- * when a frame times out in the reodering buffer.
+ * when a frame times out in the reordering buffer.
  */
 struct iwl_fw_dbg_trigger_ba {
__le16 rx_ba_start;
-- 
2.25.1



Re: [PATCH v2 0/3] VMCI: Queue pair bug fixes

2021-01-26 Thread Greg KH
On Wed, Jan 20, 2021 at 08:32:04AM -0800, Jorgen Hansen wrote:
> This series contains three bug fixes for the queue pair
> implementation in the VMCI driver.
> 
> v1 -> v2:
>   - format patches as a series
>   - use min_t instead of min to ensure size_t comparison
> (issue pointed out by kernel test robot)
> 
> Jorgen Hansen (3):
>   VMCI: Stop log spew when qp allocation isn't possible
>   VMCI: Use set_page_dirty_lock() when unregistering guest memory
>   VMCI: Enforce queuepair max size for IOCTL_VMCI_QUEUEPAIR_ALLOC
> 
>  drivers/misc/vmw_vmci/vmci_queue_pair.c | 16 ++--
>  include/linux/vmw_vmci_defs.h   |  4 ++--
>  2 files changed, 12 insertions(+), 8 deletions(-)
> 
> -- 
> 2.6.2
> 

Please in the future properly thread your emails so that tools like 'b4'
can pick them all up at once.

thanks,

greg k-h


[PATCH v4 1/5] drm/qxl: use drmm_mode_config_init

2021-01-26 Thread Gerd Hoffmann
Signed-off-by: Gerd Hoffmann 
Reviewed-by: Daniel Vetter 
Acked-by: Thomas Zimmermann 
---
 drivers/gpu/drm/qxl/qxl_display.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c
index 012bce0cdb65..38d6b596094d 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -1195,7 +1195,9 @@ int qxl_modeset_init(struct qxl_device *qdev)
int i;
int ret;
 
-   drm_mode_config_init(&qdev->ddev);
+   ret = drmm_mode_config_init(&qdev->ddev);
+   if (ret)
+   return ret;
 
ret = qxl_create_monitors_object(qdev);
if (ret)
@@ -1228,5 +1230,4 @@ int qxl_modeset_init(struct qxl_device *qdev)
 void qxl_modeset_fini(struct qxl_device *qdev)
 {
qxl_destroy_monitors_object(qdev);
-   drm_mode_config_cleanup(&qdev->ddev);
 }
-- 
2.29.2



[PATCH v4 2/5] drm/qxl: unpin release objects

2021-01-26 Thread Gerd Hoffmann
Balances the qxl_create_bo(..., pinned=true, ...) call in
qxl_release_bo_alloc().

Signed-off-by: Gerd Hoffmann 
---
 drivers/gpu/drm/qxl/qxl_release.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index c52412724c26..28013fd1f8ea 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -347,6 +347,7 @@ int qxl_alloc_release_reserved(struct qxl_device *qdev, unsigned long size,
 
	mutex_lock(&qdev->release_mutex);
	if (qdev->current_release_bo_offset[cur_idx] + 1 >= releases_per_bo[cur_idx]) {
+   qxl_bo_unpin(qdev->current_release_bo[cur_idx]);
	qxl_bo_unref(&qdev->current_release_bo[cur_idx]);
qdev->current_release_bo_offset[cur_idx] = 0;
qdev->current_release_bo[cur_idx] = NULL;
-- 
2.29.2



[PATCH v4 5/5] drm/qxl: properly free qxl releases

2021-01-26 Thread Gerd Hoffmann
Signed-off-by: Gerd Hoffmann 
---
 drivers/gpu/drm/qxl/qxl_drv.h |  1 +
 drivers/gpu/drm/qxl/qxl_kms.c | 22 --
 drivers/gpu/drm/qxl/qxl_release.c |  2 ++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
index 01354b43c413..1c57b587b6a7 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.h
+++ b/drivers/gpu/drm/qxl/qxl_drv.h
@@ -214,6 +214,7 @@ struct qxl_device {
spinlock_t  release_lock;
struct idr  release_idr;
uint32_trelease_seqno;
+   atomic_trelease_count;
spinlock_t release_idr_lock;
struct mutexasync_io_mutex;
unsigned int last_sent_io_cmd;
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c
index 4a60a52ab62e..f177f72bfc12 100644
--- a/drivers/gpu/drm/qxl/qxl_kms.c
+++ b/drivers/gpu/drm/qxl/qxl_kms.c
@@ -25,6 +25,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -286,8 +287,25 @@ int qxl_device_init(struct qxl_device *qdev,
 
 void qxl_device_fini(struct qxl_device *qdev)
 {
-   qxl_bo_unref(&qdev->current_release_bo[0]);
-   qxl_bo_unref(&qdev->current_release_bo[1]);
+   int cur_idx, try;
+
+   for (cur_idx = 0; cur_idx < 3; cur_idx++) {
+   if (!qdev->current_release_bo[cur_idx])
+   continue;
+   qxl_bo_unpin(qdev->current_release_bo[cur_idx]);
+   qxl_bo_unref(&qdev->current_release_bo[cur_idx]);
+   qdev->current_release_bo_offset[cur_idx] = 0;
+   qdev->current_release_bo[cur_idx] = NULL;
+   }
+
+   /*
+* Ask host to release resources (+fill release ring),
+* then wait for the release actually happening.
+*/
+   qxl_io_notify_oom(qdev);
+   for (try = 0; try < 20 && atomic_read(&qdev->release_count) > 0; try++)
+   msleep(20);
+
qxl_gem_fini(qdev);
qxl_bo_fini(qdev);
	flush_work(&qdev->gc_work);
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 28013fd1f8ea..43a5436853b7 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -196,6 +196,7 @@ qxl_release_free(struct qxl_device *qdev,
qxl_release_free_list(release);
kfree(release);
}
+   atomic_dec(&qdev->release_count);
 }
 
 static int qxl_release_bo_alloc(struct qxl_device *qdev,
@@ -344,6 +345,7 @@ int qxl_alloc_release_reserved(struct qxl_device *qdev, unsigned long size,
*rbo = NULL;
return idr_ret;
}
+   atomic_inc(&qdev->release_count);
 
	mutex_lock(&qdev->release_mutex);
	if (qdev->current_release_bo_offset[cur_idx] + 1 >= releases_per_bo[cur_idx]) {
-- 
2.29.2



[PATCH v4 3/5] drm/qxl: release shadow on shutdown

2021-01-26 Thread Gerd Hoffmann
In case we have a shadow surface on shutdown, release it so it doesn't
leak.

Signed-off-by: Gerd Hoffmann 
---
 drivers/gpu/drm/qxl/qxl_display.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c
index 38d6b596094d..60331e31861a 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -1229,5 +1229,9 @@ int qxl_modeset_init(struct qxl_device *qdev)
 
 void qxl_modeset_fini(struct qxl_device *qdev)
 {
+   if (qdev->dumb_shadow_bo) {
+   drm_gem_object_put(&qdev->dumb_shadow_bo->tbo.base);
+   qdev->dumb_shadow_bo = NULL;
+   }
qxl_destroy_monitors_object(qdev);
 }
-- 
2.29.2



[PATCH v4 4/5] drm/qxl: handle shadow in primary destroy

2021-01-26 Thread Gerd Hoffmann
qxl_primary_atomic_disable must check whether the framebuffer bo has a
shadow surface and, if it does, check the shadow's primary status.

Signed-off-by: Gerd Hoffmann 
---
 drivers/gpu/drm/qxl/qxl_display.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c
index 60331e31861a..f5ee8cd72b5b 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -562,6 +562,8 @@ static void qxl_primary_atomic_disable(struct drm_plane *plane,
if (old_state->fb) {
struct qxl_bo *bo = gem_to_qxl_bo(old_state->fb->obj[0]);
 
+   if (bo->shadow)
+   bo = bo->shadow;
if (bo->is_primary) {
qxl_io_destroy_primary(qdev);
bo->is_primary = false;
-- 
2.29.2



Re: [PATCH v1] mm/memory_hotplug: MEMHP_MERGE_RESOURCE -> MHP_MERGE_RESOURCE

2021-01-26 Thread Michael S. Tsirkin
On Tue, Jan 26, 2021 at 12:58:29PM +0100, David Hildenbrand wrote:
> Let's make "MEMHP_MERGE_RESOURCE" consistent with "MHP_NONE", "mhp_t" and
> "mhp_flags". As discussed recently [1], "mhp" is our internal
> acronym for memory hotplug now.
> 
> [1] 
> https://lore.kernel.org/linux-mm/c37de2d0-28a1-4f7d-f944-cfd7d81c3...@redhat.com/
> 
> Cc: Andrew Morton 
> Cc: "K. Y. Srinivasan" 
> Cc: Haiyang Zhang 
> Cc: Stephen Hemminger 
> Cc: Wei Liu 
> Cc: "Michael S. Tsirkin" 
> Cc: Jason Wang 
> Cc: Boris Ostrovsky 
> Cc: Juergen Gross 
> Cc: Stefano Stabellini 
> Cc: Pankaj Gupta 
> Cc: Michal Hocko 
> Cc: Oscar Salvador 
> Cc: Anshuman Khandual 
> Cc: Wei Yang 
> Cc: linux-hyp...@vger.kernel.org
> Cc: virtualization@lists.linux-foundation.org
> Cc: xen-de...@lists.xenproject.org
> Signed-off-by: David Hildenbrand 

Acked-by: Michael S. Tsirkin 

> ---
>  drivers/hv/hv_balloon.c| 2 +-
>  drivers/virtio/virtio_mem.c| 2 +-
>  drivers/xen/balloon.c  | 2 +-
>  include/linux/memory_hotplug.h | 2 +-
>  mm/memory_hotplug.c| 2 +-
>  5 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index 8c471823a5af..2f776d78e3c1 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -726,7 +726,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned 
> long size,
>  
>   nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
>   ret = add_memory(nid, PFN_PHYS((start_pfn)),
> - (HA_CHUNK << PAGE_SHIFT), MEMHP_MERGE_RESOURCE);
> + (HA_CHUNK << PAGE_SHIFT), MHP_MERGE_RESOURCE);
>  
>   if (ret) {
>   pr_err("hot_add memory failed error is %d\n", ret);
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index 85a272c9978e..148bea39b09a 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -623,7 +623,7 @@ static int virtio_mem_add_memory(struct virtio_mem *vm, 
> uint64_t addr,
>   /* Memory might get onlined immediately. */
>   atomic64_add(size, &vm->offline_size);
>   rc = add_memory_driver_managed(vm->nid, addr, size, vm->resource_name,
> -MEMHP_MERGE_RESOURCE);
> +MHP_MERGE_RESOURCE);
>   if (rc) {
>   atomic64_sub(size, &vm->offline_size);
>   dev_warn(&vm->vdev->dev, "adding memory failed: %d\n", rc);
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index b57b2067ecbf..671c71245a7b 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -331,7 +331,7 @@ static enum bp_state reserve_additional_memory(void)
>   mutex_unlock(&balloon_mutex);
>   /* add_memory_resource() requires the device_hotplug lock */
>   lock_device_hotplug();
> - rc = add_memory_resource(nid, resource, MEMHP_MERGE_RESOURCE);
> + rc = add_memory_resource(nid, resource, MHP_MERGE_RESOURCE);
>   unlock_device_hotplug();
>   mutex_lock(&balloon_mutex);
>  
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 3d99de0db2dd..4b834f5d032e 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -53,7 +53,7 @@ typedef int __bitwise mhp_t;
>   * with this flag set, the resource pointer must no longer be used as it
>   * might be stale, or the resource might have changed.
>   */
> -#define MEMHP_MERGE_RESOURCE ((__force mhp_t)BIT(0))
> +#define MHP_MERGE_RESOURCE   ((__force mhp_t)BIT(0))
>  
>  /*
>   * Extended parameters for memory hotplug:
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 710e469fb3a1..ae497e3ff77c 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1153,7 +1153,7 @@ int __ref add_memory_resource(int nid, struct resource 
> *res, mhp_t mhp_flags)
>* In case we're allowed to merge the resource, flag it and trigger
>* merging now that adding succeeded.
>*/
> - if (mhp_flags & MEMHP_MERGE_RESOURCE)
> + if (mhp_flags & MHP_MERGE_RESOURCE)
>   merge_system_ram_resource(res);
>  
>   /* online pages if requested */
> -- 
> 2.29.2



RE: [PATCH iproute2-next 2/2] vdpa: Add vdpa tool

2021-01-26 Thread Parav Pandit



> From: David Ahern 
> Sent: Tuesday, January 26, 2021 9:53 AM
> 
> Looks fine. A few comments below around code re-use.
> 
> On 1/22/21 4:26 AM, Parav Pandit wrote:
> > diff --git a/vdpa/vdpa.c b/vdpa/vdpa.c new file mode 100644 index
> > ..942524b7
> > --- /dev/null
> > +++ b/vdpa/vdpa.c
> > @@ -0,0 +1,828 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include "mnl_utils.h"
> > +
> > +#include "version.h"
> > +#include "json_print.h"
> > +#include "utils.h"
> > +
> > +static int g_indent_level;
> > +
> > +#define INDENT_STR_STEP 2
> > +#define INDENT_STR_MAXLEN 32
> > +static char g_indent_str[INDENT_STR_MAXLEN + 1] = "";
> 
> The indent code has a lot of parallels with devlink -- including helpers below
> around indent_inc and _dec. Please take a look at how to refactor and re-
> use.
> 
Ok. Devlink has some more convoluted code with next line etc.
But I will see if I can consolidate without changing the devlink's flow/logic.

> > +
> > +struct vdpa_socket {
> > +   struct mnl_socket *nl;
> > +   char *buf;
> > +   uint32_t family;
> > +   unsigned int seq;
> > +};
> > +
> > +static int vdpa_socket_sndrcv(struct vdpa_socket *nlg, const struct
> nlmsghdr *nlh,
> > + mnl_cb_t data_cb, void *data) {
> > +   int err;
> > +
> > +   err = mnl_socket_sendto(nlg->nl, nlh, nlh->nlmsg_len);
> > +   if (err < 0) {
> > +   perror("Failed to send data");
> > +   return -errno;
> > +   }
> > +
> > +   err = mnlu_socket_recv_run(nlg->nl, nlh->nlmsg_seq, nlg->buf,
> MNL_SOCKET_BUFFER_SIZE,
> > +  data_cb, data);
> > +   if (err < 0) {
> > +   fprintf(stderr, "vdpa answers: %s\n", strerror(errno));
> > +   return -errno;
> > +   }
> > +   return 0;
> > +}
> > +
> > +static int get_family_id_attr_cb(const struct nlattr *attr, void
> > +*data) {
> > +   int type = mnl_attr_get_type(attr);
> > +   const struct nlattr **tb = data;
> > +
> > +   if (mnl_attr_type_valid(attr, CTRL_ATTR_MAX) < 0)
> > +   return MNL_CB_ERROR;
> > +
> > +   if (type == CTRL_ATTR_FAMILY_ID &&
> > +   mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
> > +   return MNL_CB_ERROR;
> > +   tb[type] = attr;
> > +   return MNL_CB_OK;
> > +}
> > +
> > +static int get_family_id_cb(const struct nlmsghdr *nlh, void *data) {
> > +   struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
> > +   struct nlattr *tb[CTRL_ATTR_MAX + 1] = {};
> > +   uint32_t *p_id = data;
> > +
> > +   mnl_attr_parse(nlh, sizeof(*genl), get_family_id_attr_cb, tb);
> > +   if (!tb[CTRL_ATTR_FAMILY_ID])
> > +   return MNL_CB_ERROR;
> > +   *p_id = mnl_attr_get_u16(tb[CTRL_ATTR_FAMILY_ID]);
> > +   return MNL_CB_OK;
> > +}
> > +
> > +static int family_get(struct vdpa_socket *nlg) {
> > +   struct genlmsghdr hdr = {};
> > +   struct nlmsghdr *nlh;
> > +   int err;
> > +
> > +   hdr.cmd = CTRL_CMD_GETFAMILY;
> > +   hdr.version = 0x1;
> > +
> > +   nlh = mnlu_msg_prepare(nlg->buf, GENL_ID_CTRL,
> > +  NLM_F_REQUEST | NLM_F_ACK,
> > +  &hdr, sizeof(hdr));
> > +
> > +   mnl_attr_put_strz(nlh, CTRL_ATTR_FAMILY_NAME,
> VDPA_GENL_NAME);
> > +
> > +   err = mnl_socket_sendto(nlg->nl, nlh, nlh->nlmsg_len);
> > +   if (err < 0)
> > +   return err;
> > +
> > +   err = mnlu_socket_recv_run(nlg->nl, nlh->nlmsg_seq, nlg->buf,
> > +  MNL_SOCKET_BUFFER_SIZE,
> > +  get_family_id_cb, &nlg->family);
> > +   return err;
> > +}
> > +
> > +static int vdpa_socket_open(struct vdpa_socket *nlg) {
> > +   int err;
> > +
> > +   nlg->buf = malloc(MNL_SOCKET_BUFFER_SIZE);
> > +   if (!nlg->buf)
> > +   goto err_buf_alloc;
> > +
> > +   nlg->nl = mnlu_socket_open(NETLINK_GENERIC);
> > +   if (!nlg->nl)
> > +   goto err_socket_open;
> > +
> > +   err = family_get(nlg);
> > +   if (err)
> > +   goto err_socket;
> > +
> > +   return 0;
> > +
> > +err_socket:
> > +   mnl_socket_close(nlg->nl);
> > +err_socket_open:
> > +   free(nlg->buf);
> > +err_buf_alloc:
> > +   return -1;
> > +}
> 
> The above 4 functions duplicate a lot of devlink functionality. Please create 
> a
> helper in lib/mnl_utils.c that can be used in both.
> 
Will do.

> > +
> > +static unsigned int strslashcount(char *str) {
> > +   unsigned int count = 0;
> > +   char *pos = str;
> > +
> > +   while ((pos = strchr(pos, '/'))) {
> > +   count++;
> > +   pos++;
> > +   }
> > +   return count;
> > +}
> 
> you could make that a generic function (e.g., str_char_count) by passing '/' 
> as
> an input.
> 
Yes.
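A generic version along those lines might look like this (the name str_char_count is illustrative, not from the series):

```c
#include <string.h>

/* Count occurrences of character 'c' in a NUL-terminated string.
 * strslashcount(str) becomes str_char_count(str, '/'). */
unsigned int str_char_count(const char *str, char c)
{
	unsigned int count = 0;
	const char *pos = str;

	while ((pos = strchr(pos, c))) {
		count++;
		pos++;
	}
	return count;
}
```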

> > +
> > +static int strslashrsplit(char *str, const char **before, const char
> > +**after) {
> > +   char *slash;
> > +
> > +   slash = strrchr(str, '/');
> > +   if (!slash)
> > +   return -EINVAL;
> > +   *slash = '\0';
> > +   *before = str;
> > +   

Re: [PATCH v2 10/11] drm: Use state helper instead of the plane state pointer

2021-01-26 Thread Ville Syrjälä
On Thu, Jan 21, 2021 at 05:35:35PM +0100, Maxime Ripard wrote:
> Many drivers reference the plane->state pointer in order to get the
> current plane state in their atomic_update or atomic_disable hooks,
> which would be the new plane state in the global atomic state since
> _swap_state happened when those hooks are run.
> 
> Use the drm_atomic_get_new_plane_state helper to get that state to make it
> more obvious.
> 
> This was made using the coccinelle script below:
> 
> @ plane_atomic_func @
> identifier helpers;
> identifier func;
> @@
> 
> (
>  static const struct drm_plane_helper_funcs helpers = {
>   ...,
>   .atomic_disable = func,
>   ...,
>  };
> |
>  static const struct drm_plane_helper_funcs helpers = {
>   ...,
>   .atomic_update = func,
>   ...,
>  };
> )
> 
> @ adds_new_state @
> identifier plane_atomic_func.func;
> identifier plane, state;
> identifier new_state;
> @@
> 
>  func(struct drm_plane *plane, struct drm_atomic_state *state)
>  {
>   ...
> - struct drm_plane_state *new_state = plane->state;
> + struct drm_plane_state *new_state = 
> drm_atomic_get_new_plane_state(state, plane);
>   ...
>  }
> 
> @ include depends on adds_new_state @
> @@
> 
>  #include 
> 
> @ no_include depends on !include && adds_new_state @
> @@
> 
> + #include 
>   #include 
> 
> Signed-off-by: Maxime Ripard 

Looks great.

Reviewed-by: Ville Syrjälä 

-- 
Ville Syrjälä
Intel


[PATCH v1] mm/memory_hotplug: MEMHP_MERGE_RESOURCE -> MHP_MERGE_RESOURCE

2021-01-26 Thread David Hildenbrand
Let's make "MEMHP_MERGE_RESOURCE" consistent with "MHP_NONE", "mhp_t" and
"mhp_flags". As discussed recently [1], "mhp" is our internal
acronym for memory hotplug now.

[1] 
https://lore.kernel.org/linux-mm/c37de2d0-28a1-4f7d-f944-cfd7d81c3...@redhat.com/

Cc: Andrew Morton 
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
Cc: Wei Liu 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: Pankaj Gupta 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Anshuman Khandual 
Cc: Wei Yang 
Cc: linux-hyp...@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-de...@lists.xenproject.org
Signed-off-by: David Hildenbrand 
---
 drivers/hv/hv_balloon.c| 2 +-
 drivers/virtio/virtio_mem.c| 2 +-
 drivers/xen/balloon.c  | 2 +-
 include/linux/memory_hotplug.h | 2 +-
 mm/memory_hotplug.c| 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 8c471823a5af..2f776d78e3c1 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -726,7 +726,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned 
long size,
 
nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
ret = add_memory(nid, PFN_PHYS((start_pfn)),
-   (HA_CHUNK << PAGE_SHIFT), MEMHP_MERGE_RESOURCE);
+   (HA_CHUNK << PAGE_SHIFT), MHP_MERGE_RESOURCE);
 
if (ret) {
pr_err("hot_add memory failed error is %d\n", ret);
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 85a272c9978e..148bea39b09a 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -623,7 +623,7 @@ static int virtio_mem_add_memory(struct virtio_mem *vm, 
uint64_t addr,
/* Memory might get onlined immediately. */
	atomic64_add(size, &vm->offline_size);
rc = add_memory_driver_managed(vm->nid, addr, size, vm->resource_name,
-  MEMHP_MERGE_RESOURCE);
+  MHP_MERGE_RESOURCE);
if (rc) {
	atomic64_sub(size, &vm->offline_size);
	dev_warn(&vm->vdev->dev, "adding memory failed: %d\n", rc);
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index b57b2067ecbf..671c71245a7b 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -331,7 +331,7 @@ static enum bp_state reserve_additional_memory(void)
	mutex_unlock(&balloon_mutex);
/* add_memory_resource() requires the device_hotplug lock */
lock_device_hotplug();
-   rc = add_memory_resource(nid, resource, MEMHP_MERGE_RESOURCE);
+   rc = add_memory_resource(nid, resource, MHP_MERGE_RESOURCE);
unlock_device_hotplug();
	mutex_lock(&balloon_mutex);
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 3d99de0db2dd..4b834f5d032e 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -53,7 +53,7 @@ typedef int __bitwise mhp_t;
  * with this flag set, the resource pointer must no longer be used as it
  * might be stale, or the resource might have changed.
  */
-#define MEMHP_MERGE_RESOURCE   ((__force mhp_t)BIT(0))
+#define MHP_MERGE_RESOURCE ((__force mhp_t)BIT(0))
 
 /*
  * Extended parameters for memory hotplug:
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 710e469fb3a1..ae497e3ff77c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1153,7 +1153,7 @@ int __ref add_memory_resource(int nid, struct resource 
*res, mhp_t mhp_flags)
 * In case we're allowed to merge the resource, flag it and trigger
 * merging now that adding succeeded.
 */
-   if (mhp_flags & MEMHP_MERGE_RESOURCE)
+   if (mhp_flags & MHP_MERGE_RESOURCE)
merge_system_ram_resource(res);
 
/* online pages if requested */
-- 
2.29.2



Re: [RFC PATCH v3 00/13] virtio/vsock: introduce SOCK_SEQPACKET support

2021-01-26 Thread Stefano Garzarella

Hi Arseny,
thanks for this new series!
I'm a bit busy but I hope to review it tomorrow or on Thursday.

Stefano

On Mon, Jan 25, 2021 at 02:09:00PM +0300, Arseny Krasnov wrote:

This patchset implements SOCK_SEQPACKET support for the virtio
transport.
Since SOCK_SEQPACKET guarantees that record boundaries are
preserved, a new packet operation was added to achieve this: it marks
the start of a record (carrying the record length in its header) and
carries no data. To send a record, the packet with the start marker is
sent first, then all data is sent as usual 'RW' packets. On the
receiver's side, the length of the record is known from the packet with
the start-of-record marker. Since packets of one socket are reordered
neither on the vsock nor on the vhost transport layer, this marker
allows the original record to be restored on the receiver's side. If
the user's buffer is smaller than the record length, all data beyond
the buffer size is dropped.
The maximum length of a datagram is not limited as it is in stream
sockets, because the same credit logic is used. The difference from
stream sockets is that the user is not woken up until the whole record
is received or an error occurs. The implementation also supports the
'MSG_EOR' and 'MSG_TRUNC' flags.
Tests are also implemented.
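The truncation behaviour described above (a record longer than the user's buffer loses its tail) can be sketched in plain C; the helper below is an illustrative userspace model, not code from the patchset:

```c
#include <stddef.h>
#include <string.h>

/* Copy one reassembled SEQPACKET record into the user buffer.
 * 'record_len' is the length announced by the start-of-record packet;
 * at most 'buf_len' bytes are delivered and the excess is dropped,
 * matching MSG_TRUNC semantics. Returns the number of bytes copied. */
size_t seqpacket_copy_record(const char *record, size_t record_len,
			     char *buf, size_t buf_len)
{
	size_t n = record_len < buf_len ? record_len : buf_len;

	memcpy(buf, record, n);
	return n;
}
```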

Arseny Krasnov (13):
 af_vsock: prepare for SOCK_SEQPACKET support
 af_vsock: prepare 'vsock_connectible_recvmsg()'
 af_vsock: implement SEQPACKET rx loop
 af_vsock: implement send logic for SOCK_SEQPACKET
 af_vsock: rest of SEQPACKET support
 af_vsock: update comments for stream sockets
 virtio/vsock: dequeue callback for SOCK_SEQPACKET
 virtio/vsock: fetch length for SEQPACKET record
 virtio/vsock: add SEQPACKET receive logic
 virtio/vsock: rest of SOCK_SEQPACKET support
 virtio/vsock: setup SEQPACKET ops for transport
 vhost/vsock: setup SEQPACKET ops for transport
 vsock_test: add SOCK_SEQPACKET tests

drivers/vhost/vsock.c   |   7 +-
include/linux/virtio_vsock.h|  12 +
include/net/af_vsock.h  |   6 +
include/uapi/linux/virtio_vsock.h   |   9 +
net/vmw_vsock/af_vsock.c| 543 --
net/vmw_vsock/virtio_transport.c|   4 +
net/vmw_vsock/virtio_transport_common.c | 295 ++--
tools/testing/vsock/util.c  |  32 +-
tools/testing/vsock/util.h  |   3 +
tools/testing/vsock/vsock_test.c| 126 +
10 files changed, 862 insertions(+), 175 deletions(-)

TODO:
- Support for record integrity control. As the transport could drop
  some packets, something like a "record id" and a record end marker
  need to be implemented. The idea is that the SEQ_BEGIN packet carries
  both the record length and the record id, while the end marker (call
  it SEQ_END) carries only the record id. To be sure that no packet was
  lost, the receiver checks the length of the data between SEQ_BEGIN
  and SEQ_END (it must match the value in SEQ_BEGIN) and the record ids
  of SEQ_BEGIN and SEQ_END (this confirms that neither marker was
  dropped). I think the easiest way to implement the record id for
  SEQ_BEGIN is to reuse another field of the packet header (SEQ_BEGIN
  already uses 'flags' for the record length). For SEQ_END, the record
  id could be stored in 'flags'.
	Another way to implement it is to move the metadata of both
  SEQ_BEGIN and SEQ_END to the payload. But this approach has a
  problem: anything moved to the payload is accounted by the credit
  logic, which may fragment the payload, while a payload carrying the
  record length and id must not be fragmented. One way to overcome this
  is to ignore credit updates for SEQ_BEGIN/SEQ_END packets. Another
  solution is to update the 'stream_has_space()' function: the current
  implementation returns non-zero when at least 1 byte may be used, but
  the updated version would take an extra argument, the needed length.
  For an 'RW' packet this argument is 1, for SEQ_BEGIN it is
  sizeof(record len + record id), and for SEQ_END it is
  sizeof(record id).

- What to do when the server doesn't support SOCK_SEQPACKET. In the
  current implementation RST is replied, the same as when a listening
  port is not found. I think the current RST is enough, because the
  case where the server doesn't support SOCK_SEQPACKET is the same as
  a missing listener (e.g. no listener in both cases).
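The integrity check from the first TODO item can be modelled as a single predicate (field names here are assumptions for illustration): a record is intact only when both markers carry the same record id and the data seen between them matches the length announced in SEQ_BEGIN:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receiver-side check: SEQ_BEGIN announced (begin_id,
 * begin_len); SEQ_END carried end_id; bytes_seen is the amount of 'RW'
 * data received between the two markers. Any lost packet breaks one of
 * the two equalities. */
bool seqpacket_record_ok(uint32_t begin_id, uint64_t begin_len,
			 uint32_t end_id, uint64_t bytes_seen)
{
	return begin_id == end_id && begin_len == bytes_seen;
}
```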

v2 -> v3:
- patches reorganized: split for prepare and implementation patches
- local variables are declared in "Reverse Christmas tree" manner
- virtio_transport_common.c: valid leXX_to_cpu() for vsock header
  fields access
- af_vsock.c: 'vsock_connectible_*sockopt()' added as shared code
  between stream and seqpacket sockets.
- af_vsock.c: loops in '__vsock_*_recvmsg()' refactored.
- af_vsock.c: 'vsock_wait_data()' refactored.

v1 -> v2:
- patches reordered: af_vsock.c related changes now before virtio vsock
- patches reorganized: more small patches, where +/- are not mixed
- tests for SOCK_SEQPACKET added
- all commit messages updated
- af_vsock.c: 'vsock_pre_recv_check()' inlined to
  'vsock_connectible_recvmsg()'
- af_vsock.c: 'vsock_assign_transport()' returns ENODEV if transport
  was not found
- 

Re: [PATCH v2] virtio-blk: support per-device queue depth

2021-01-26 Thread Stefan Hajnoczi
On Fri, Jan 22, 2021 at 05:21:46PM +0800, Joseph Qi wrote:
> The module parameter 'virtblk_queue_depth' was first introduced for
> testing/benchmarking purposes, described in commit fc4324b4597c
> ("virtio-blk: base queue-depth on virtqueue ringsize or module param").
> And currently 'virtblk_queue_depth' is used as a saved value for the
> first probed device.
> Since we have different virtio-blk devices with different
> capabilities, we should support a per-device queue depth instead of a
> per-module one. So, by default, use the number of free virtqueue
> elements if the module parameter 'virtblk_queue_depth' is not set.
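The selection rule described above reduces to a small helper (an illustrative sketch with a made-up name; the patch presumably open-codes this in the probe path):

```c
/* Illustrative helper for the patch's fallback rule: honor the module
 * parameter when the admin set it (non-zero), otherwise size the queue
 * from the number of free virtqueue elements of this device. */
unsigned int virtblk_pick_queue_depth(unsigned int module_param,
				      unsigned int vq_num_free)
{
	return module_param ? module_param : vq_num_free;
}
```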
> 
> Signed-off-by: Joseph Qi 
> Acked-by: Jason Wang 
> ---
>  drivers/block/virtio_blk.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)

Reviewed-by: Stefan Hajnoczi 



[RFC v2 3/3] vhost: Add Vdmabuf backend

2021-01-26 Thread Vivek Kasireddy
This backend acts as the counterpart to the Vdmabuf Virtio frontend.
When it receives a new export event from the frontend, it raises an
event to alert the Qemu UI/userspace. Qemu then "imports" this buffer
using the Unique ID.

As part of the import step, a new dmabuf is created on the Host using
the page information obtained from the Guest. The fd associated with
this dmabuf is made available to Qemu UI/userspace which then creates
a texture from it for the purpose of displaying it.

Signed-off-by: Dongwon Kim 
Signed-off-by: Vivek Kasireddy 
---
 drivers/vhost/Kconfig  |9 +
 drivers/vhost/Makefile |3 +
 drivers/vhost/vdmabuf.c| 1407 
 include/uapi/linux/vhost.h |3 +
 4 files changed, 1422 insertions(+)
 create mode 100644 drivers/vhost/vdmabuf.c

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 587fbae06182..9a99cc2611ca 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -89,4 +89,13 @@ config VHOST_CROSS_ENDIAN_LEGACY
 
  If unsure, say "N".
 
+config VHOST_VDMABUF
+   bool "Vhost backend for the Vdmabuf driver"
+   depends on KVM && EVENTFD
+   select VHOST
+   default n
+   help
+ This driver works in pair with the Virtio Vdmabuf frontend. It can
+ be used to create a dmabuf using the pages shared by the Guest.
+
 endif
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index f3e1897cce85..5c2cea4a7eaf 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -17,3 +17,6 @@ obj-$(CONFIG_VHOST)   += vhost.o
 
 obj-$(CONFIG_VHOST_IOTLB) += vhost_iotlb.o
 vhost_iotlb-y := iotlb.o
+
+obj-$(CONFIG_VHOST_VDMABUF) += vhost_vdmabuf.o
+vhost_vdmabuf-y := vdmabuf.o
diff --git a/drivers/vhost/vdmabuf.c b/drivers/vhost/vdmabuf.c
new file mode 100644
index ..2a8a1d852e93
--- /dev/null
+++ b/drivers/vhost/vdmabuf.c
@@ -0,0 +1,1407 @@
+// SPDX-License-Identifier: (MIT OR GPL-2.0)
+
+/*
+ * Copyright © 2021 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Dongwon Kim 
+ *Mateusz Polrola 
+ *Vivek Kasireddy 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vhost.h"
+
+#define REFS_PER_PAGE (PAGE_SIZE/sizeof(long))
+
+static struct virtio_vdmabuf_info *drv_info;
+
+struct kvm_instance {
+   struct kvm *kvm;
+   struct list_head link;
+};
+
+struct vhost_vdmabuf {
+   struct vhost_dev dev;
+   struct vhost_virtqueue vq;
+   struct vhost_work tx_work;
+   struct virtio_vdmabuf_event_queue *evq;
+   u64 vmid;
+
+   /* synchronization between transmissions */
+   struct mutex tx_mutex;
+   /* synchronization on tx and rx */
+   struct mutex vq_mutex;
+
+   struct virtio_vdmabuf_txmsg next;
+   struct list_head list;
+   struct kvm *kvm;
+};
+
+static inline void vhost_vdmabuf_add(struct vhost_vdmabuf *new)
+{
+   list_add_tail(&new->list, &drv_info->head_vdmabuf_list);
+}
+
+static inline struct vhost_vdmabuf *vhost_vdmabuf_find(u64 vmid)
+{
+   struct vhost_vdmabuf *found;
+
+   list_for_each_entry(found, &drv_info->head_vdmabuf_list, list)
+   if (found->vmid == vmid)
+   return found;
+
+   return NULL;
+}
+
+static inline bool vhost_vdmabuf_del(struct vhost_vdmabuf *vdmabuf)
+{
+   struct vhost_vdmabuf *iter, *temp;
+
+   list_for_each_entry_safe(iter, temp,
+&drv_info->head_vdmabuf_list,
+list)
+   if (iter == vdmabuf) {
+   list_del(&iter->list);
+   return true;
+   }
+
+   return false;
+}
+
+static inline void vhost_vdmabuf_del_all(void)
+{
+   struct 

[RFC v2 2/3] virtio: Introduce Vdmabuf driver

2021-01-26 Thread Vivek Kasireddy
This driver "transfers" a dmabuf created on the Guest to the Host.
A common use-case for such a transfer includes sharing the scanout
buffer created by a display server or a compositor running in the
Guest with Qemu UI -- running on the Host.

The "transfer" is accomplished by sharing the PFNs of all the pages
associated with the dmabuf and having a new dmabuf created on the
Host that is backed by the pages mapped from the Guest.
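The bookkeeping cost of that PFN sharing is easy to estimate. Using the REFS_PER_PAGE constant defined later in the driver (PAGE_SIZE / sizeof(long)), the number of extra pages needed to carry the reference list for an n-page dmabuf is (a userspace sketch, with PAGE_SIZE assumed to be 4096):

```c
#define PAGE_SIZE 4096UL
#define REFS_PER_PAGE (PAGE_SIZE / sizeof(long))

/* Pages required to hold n page references: ceil(n / REFS_PER_PAGE). */
unsigned long nr_ref_pages(unsigned long n)
{
	return (n + REFS_PER_PAGE - 1) / REFS_PER_PAGE;
}
```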

Signed-off-by: Dongwon Kim 
Signed-off-by: Vivek Kasireddy 
---
 drivers/virtio/Kconfig  |   8 +
 drivers/virtio/Makefile |   1 +
 drivers/virtio/virtio_vdmabuf.c | 986 
 include/linux/virtio_vdmabuf.h  | 272 
 include/uapi/linux/virtio_ids.h |   1 +
 include/uapi/linux/virtio_vdmabuf.h |  99 +++
 6 files changed, 1367 insertions(+)
 create mode 100644 drivers/virtio/virtio_vdmabuf.c
 create mode 100644 include/linux/virtio_vdmabuf.h
 create mode 100644 include/uapi/linux/virtio_vdmabuf.h

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 7b41130d3f35..e563c12f711e 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -139,4 +139,12 @@ config VIRTIO_DMA_SHARED_BUFFER
 This option adds a flavor of dma buffers that are backed by
 virtio resources.
 
+config VIRTIO_VDMABUF
+   bool "Enables Vdmabuf driver in guest os"
+   default n
+   depends on VIRTIO
+   help
+This driver provides a way to share the dmabufs created in
+the Guest with the Host.
+
 endif # VIRTIO_MENU
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 591e6f72aa54..b4bb0738009c 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
 obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
 obj-$(CONFIG_VIRTIO_MEM) += virtio_mem.o
 obj-$(CONFIG_VIRTIO_DMA_SHARED_BUFFER) += virtio_dma_buf.o
+obj-$(CONFIG_VIRTIO_VDMABUF) += virtio_vdmabuf.o
diff --git a/drivers/virtio/virtio_vdmabuf.c b/drivers/virtio/virtio_vdmabuf.c
new file mode 100644
index ..0b40ea4fd6f1
--- /dev/null
+++ b/drivers/virtio/virtio_vdmabuf.c
@@ -0,0 +1,986 @@
+// SPDX-License-Identifier: (MIT OR GPL-2.0)
+
+/*
+ * Copyright © 2021 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Dongwon Kim 
+ *Mateusz Polrola 
+ *Vivek Kasireddy 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define VIRTIO_VDMABUF_MAX_ID INT_MAX
+#define REFS_PER_PAGE (PAGE_SIZE/sizeof(long))
+#define NEW_BUF_ID_GEN(vmid, cnt) (((vmid & 0xFFFFFFFF) << 32) | \
+   ((cnt) & 0xFFFFFFFF))
+
+/* one global drv object */
+static struct virtio_vdmabuf_info *drv_info;
+
+struct virtio_vdmabuf {
+   /* virtio device structure */
+   struct virtio_device *vdev;
+
+   /* virtual queue array */
+   struct virtqueue *vq;
+
+   /* ID of guest OS */
+   u64 vmid;
+
+   /* spin lock that needs to be acquired before accessing
+* virtual queue
+*/
+   spinlock_t vq_lock;
+   struct mutex rx_lock;
+
+   /* workqueue */
+   struct workqueue_struct *wq;
+   struct work_struct rx_work;
+   struct virtio_vdmabuf_event_queue *evq;
+};
+
+static virtio_vdmabuf_buf_id_t get_buf_id(struct virtio_vdmabuf *vdmabuf)
+{
+   virtio_vdmabuf_buf_id_t buf_id = {0, {0, 0} };
+   static int count = 0;
+
+   count = count < VIRTIO_VDMABUF_MAX_ID ? count + 1 : 0;
+   buf_id.id = NEW_BUF_ID_GEN(vdmabuf->vmid, count);
+
+   /* random data embedded in the id for security */
+   get_random_bytes(&buf_id.rng_key[0], 8);
+
+   return buf_id;
+}
+
+/* sharing pages for original DMABUF with Host */
+static struct 

[RFC v2 1/3] kvm: Add a notifier for create and destroy VM events

2021-01-26 Thread Vivek Kasireddy
After registering with this notifier, other drivers that are dependent
on KVM can get notified whenever a VM is created or destroyed. This
also provides a way for sharing the KVM instance pointer with other
drivers.
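The call pattern a dependent driver would follow is the classic notifier-chain one; the userspace model below illustrates it with stand-in names (the real consumer would call kvm_vm_register_notifier() and receive the struct kvm pointer as the data argument):

```c
#include <stddef.h>

#define KVM_EVENT_CREATE_VM 0
#define KVM_EVENT_DESTROY_VM 1

/* Minimal model of a notifier chain: callbacks registered up front,
 * then invoked in order with an event type and a data pointer. */
typedef void (*notifier_fn)(unsigned int event, void *data);

#define MAX_NOTIFIERS 8
static notifier_fn chain[MAX_NOTIFIERS];
static int chain_len;

int notifier_register(notifier_fn fn)
{
	if (chain_len >= MAX_NOTIFIERS)
		return -1;
	chain[chain_len++] = fn;
	return 0;
}

void notifier_call_chain(unsigned int event, void *data)
{
	int i;

	for (i = 0; i < chain_len; i++)
		chain[i](event, data);
}

/* Example consumer: track live VMs as create/destroy events arrive. */
int vms_alive;

void track_vms(unsigned int event, void *data)
{
	(void)data; /* would be the struct kvm pointer in the real API */
	if (event == KVM_EVENT_CREATE_VM)
		vms_alive++;
	else if (event == KVM_EVENT_DESTROY_VM)
		vms_alive--;
}
```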

Signed-off-by: Vivek Kasireddy 
---
 include/linux/kvm_host.h |  5 +
 virt/kvm/kvm_main.c  | 20 ++--
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f3b1013fb22c..fc1a688301a0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -88,6 +88,9 @@
 #define KVM_PFN_ERR_HWPOISON   (KVM_PFN_ERR_MASK + 1)
 #define KVM_PFN_ERR_RO_FAULT   (KVM_PFN_ERR_MASK + 2)
 
+#define KVM_EVENT_CREATE_VM 0
+#define KVM_EVENT_DESTROY_VM 1
+
 /*
  * error pfns indicate that the gfn is in slot but faild to
  * translate it to pfn on host.
@@ -1494,5 +1497,7 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu 
*vcpu)
 
 /* Max number of entries allowed for each kvm dirty ring */
 #define  KVM_DIRTY_RING_MAX_ENTRIES  65536
+int kvm_vm_register_notifier(struct notifier_block *nb);
+int kvm_vm_unregister_notifier(struct notifier_block *nb);
 
 #endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5f260488e999..8a0e8bb02a5f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -101,6 +101,8 @@ DEFINE_MUTEX(kvm_lock);
 static DEFINE_RAW_SPINLOCK(kvm_count_lock);
 LIST_HEAD(vm_list);
 
+static struct blocking_notifier_head kvm_vm_notifier;
+
 static cpumask_var_t cpus_hardware_enabled;
 static int kvm_usage_count;
 static atomic_t hardware_enable_failed;
@@ -148,12 +150,20 @@ static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 __visible bool kvm_rebooting;
 EXPORT_SYMBOL_GPL(kvm_rebooting);
 
-#define KVM_EVENT_CREATE_VM 0
-#define KVM_EVENT_DESTROY_VM 1
 static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm);
 static unsigned long long kvm_createvm_count;
 static unsigned long long kvm_active_vms;
 
+inline int kvm_vm_register_notifier(struct notifier_block *nb)
+{
+   return blocking_notifier_chain_register(&kvm_vm_notifier, nb);
+}
+
+inline int kvm_vm_unregister_notifier(struct notifier_block *nb)
+{
+   return blocking_notifier_chain_unregister(&kvm_vm_notifier, nb);
+}
+
 __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
   unsigned long start, 
unsigned long end)
 {
@@ -808,6 +818,8 @@ static struct kvm *kvm_create_vm(unsigned long type)
 
preempt_notifier_inc();
 
+   blocking_notifier_call_chain(&kvm_vm_notifier,
+KVM_EVENT_CREATE_VM, kvm);
return kvm;
 
 out_err:
@@ -886,6 +898,8 @@ static void kvm_destroy_vm(struct kvm *kvm)
preempt_notifier_dec();
hardware_disable_all();
mmdrop(mm);
+   blocking_notifier_call_chain(&kvm_vm_notifier,
+                                KVM_EVENT_DESTROY_VM, kvm);
 }
 
 void kvm_get_kvm(struct kvm *kvm)
@@ -4968,6 +4982,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
r = kvm_vfio_ops_init();
WARN_ON(r);
 
+   BLOCKING_INIT_NOTIFIER_HEAD(&kvm_vm_notifier);
+
return 0;
 
 out_unreg:
-- 
2.26.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[RFC v2 0/3] Introduce Vdmabuf driver

2021-01-26 Thread Vivek Kasireddy
The Virtual dmabuf or Virtio based dmabuf (Vdmabuf) driver can be used
to "transfer" a page-backed dmabuf created in the Guest to the Host
without making any copies. This is mostly accomplished by recreating the
dmabuf on the Host using the PFNs and other meta-data shared by the guest. 
A use-case where this driver would be a good fit is a multi-GPU system 
(perhaps one discrete and one integrated) where one of the GPUs does not 
have access to the display/connectors/outputs. This could be an embedded 
system design decision, a restriction made at the firmware/BIOS level,
or perhaps the device is set up in UPT (Universal Passthrough) mode. When 
such a GPU is passthrough'd to a Guest OS, this driver can help in 
transferring the scanout buffer(s) (rendered using the native rendering 
stack) to the Host for the purpose of displaying them.

The userspace component running in the Guest that transfers the dmabuf
is referred to as the producer or exporter and its counterpart running
in the Host is referred to as importer or consumer. For instance, a
Wayland compositor would potentially be a producer and Qemu UI would
be a consumer. It is the producer's responsibility to not reuse or
destroy the shared buffer while it is still being used by the consumer.
The consumer would send a release cmd indicating that it is done, after
which the shared buffer can be safely used again by the producer. One
way the producer can prevent accidental re-use of the shared buffer is
to lock the buffer when it exports it and unlock it after it gets a 
release cmd. As an example, the GBM API provides a simple way to lock 
and unlock a surface's buffers.

For each dmabuf that is to be shared with the Host, a 128-bit unique
ID is generated that identifies this buffer across the whole system.
This ID is a combination of the Qemu process ID, a counter, and a
randomizer. We could potentially use the UUID API, but the combination
above lets us identify the source of the buffer at any given time for
bookkeeping.

v2:
- Added a notifier mechanism for getting the kvm pointer instead of
  sharing it via VFIO.
- Added start and stop routines in the Vhost backend.
- Augmented the cover letter and made some minor improvements.

Vivek Kasireddy (3):
  kvm: Add a notifier for create and destroy VM events
  virtio: Introduce Vdmabuf driver
  vhost: Add Vdmabuf backend

 drivers/vhost/Kconfig   |9 +
 drivers/vhost/Makefile  |3 +
 drivers/vhost/vdmabuf.c | 1407 +++
 drivers/virtio/Kconfig  |8 +
 drivers/virtio/Makefile |1 +
 drivers/virtio/virtio_vdmabuf.c |  986 +++
 include/linux/kvm_host.h|5 +
 include/linux/virtio_vdmabuf.h  |  272 ++
 include/uapi/linux/vhost.h  |3 +
 include/uapi/linux/virtio_ids.h |1 +
 include/uapi/linux/virtio_vdmabuf.h |   99 ++
 virt/kvm/kvm_main.c |   20 +-
 12 files changed, 2812 insertions(+), 2 deletions(-)
 create mode 100644 drivers/vhost/vdmabuf.c
 create mode 100644 drivers/virtio/virtio_vdmabuf.c
 create mode 100644 include/linux/virtio_vdmabuf.h
 create mode 100644 include/uapi/linux/virtio_vdmabuf.h

-- 
2.26.2



Re: [PATCH v2 8/9] ALSA: virtio: introduce PCM channel map support

2021-01-26 Thread Guennadi Liakhovetski



On Sun, 24 Jan 2021, Anton Yakovlev wrote:


Enumerate all available PCM channel maps and create ALSA controls.

Signed-off-by: Anton Yakovlev 
---
sound/virtio/Makefile   |   1 +
sound/virtio/virtio_card.c  |  15 +++
sound/virtio/virtio_card.h  |   8 ++
sound/virtio/virtio_chmap.c | 237 
sound/virtio/virtio_pcm.h   |   4 +
5 files changed, 265 insertions(+)
create mode 100644 sound/virtio/virtio_chmap.c


[snip]


diff --git a/sound/virtio/virtio_chmap.c b/sound/virtio/virtio_chmap.c
new file mode 100644
index ..8a2ddc4dcffb
--- /dev/null
+++ b/sound/virtio/virtio_chmap.c
@@ -0,0 +1,237 @@


[snip]


+/**
+ * virtsnd_chmap_parse_cfg() - Parse the channel map configuration.
+ * @snd: VirtIO sound device.
+ *
+ * This function is called during initial device initialization.
+ *
+ * Context: Any context that permits to sleep.
+ * Return: 0 on success, -errno on failure.
+ */
+int virtsnd_chmap_parse_cfg(struct virtio_snd *snd)
+{
+   struct virtio_device *vdev = snd->vdev;
+   unsigned int i;
+   int rc;
+
+   virtio_cread(vdev, struct virtio_snd_config, chmaps, &snd->nchmaps);
+   if (!snd->nchmaps)
+   return 0;
+
+   snd->chmaps = devm_kcalloc(&vdev->dev, snd->nchmaps,
+  sizeof(*snd->chmaps), GFP_KERNEL);
+   if (!snd->chmaps)
+   return -ENOMEM;
+
+   rc = virtsnd_ctl_query_info(snd, VIRTIO_SND_R_CHMAP_INFO, 0,
+   snd->nchmaps, sizeof(*snd->chmaps),
+   snd->chmaps);
+   if (rc)
+   return rc;
+
+   /* Count the number of channel maps per each PCM device/stream. */
+   for (i = 0; i < snd->nchmaps; ++i) {
+   struct virtio_snd_chmap_info *info = &snd->chmaps[i];
+   unsigned int nid = le32_to_cpu(info->hdr.hda_fn_nid);
+   struct virtio_pcm *pcm;
+   struct virtio_pcm_stream *stream;
+
+   pcm = virtsnd_pcm_find_or_create(snd, nid);
+   if (IS_ERR(pcm))
+   return PTR_ERR(pcm);
+
+   switch (info->direction) {
+   case VIRTIO_SND_D_OUTPUT: {
+   stream = &pcm->streams[SNDRV_PCM_STREAM_PLAYBACK];
+   break;
+   }
+   case VIRTIO_SND_D_INPUT: {
+   stream = &pcm->streams[SNDRV_PCM_STREAM_CAPTURE];
+   break;
+   }
+   default: {
+   dev_err(&vdev->dev,
+   "chmap #%u: unknown direction (%u)\n", i,
+   info->direction);
+   return -EINVAL;
+   }
+   }
+
+   stream->nchmaps++;
+   }
+
+   return 0;
+}
+
+/**
+ * virtsnd_chmap_add_ctls() - Create an ALSA control for channel maps.
+ * @pcm: ALSA PCM device.
+ * @direction: PCM stream direction (SNDRV_PCM_STREAM_XXX).
+ * @stream: VirtIO PCM stream.
+ *
+ * Context: Any context.
+ * Return: 0 on success, -errno on failure.
+ */
+static int virtsnd_chmap_add_ctls(struct snd_pcm *pcm, int direction,
+ struct virtio_pcm_stream *stream)
+{
+   unsigned int i;
+   int max_channels = 0;
+
+   for (i = 0; i < stream->nchmaps; i++)
+   if (max_channels < stream->chmaps[i].channels)
+   max_channels = stream->chmaps[i].channels;
+
+   return snd_pcm_add_chmap_ctls(pcm, direction, stream->chmaps,
+ max_channels, 0, NULL);
+}
+
+/**
+ * virtsnd_chmap_build_devs() - Build ALSA controls for channel maps.
+ * @snd: VirtIO sound device.
+ *
+ * Context: Any context.
+ * Return: 0 on success, -errno on failure.
+ */
+int virtsnd_chmap_build_devs(struct virtio_snd *snd)
+{
+   struct virtio_device *vdev = snd->vdev;
+   struct virtio_pcm *pcm;
+   struct virtio_pcm_stream *stream;
+   unsigned int i;
+   int rc;
+
+   /* Allocate channel map elements per each PCM device/stream. */
+   list_for_each_entry(pcm, &snd->pcm_list, list) {
+   for (i = 0; i < ARRAY_SIZE(pcm->streams); ++i) {
+   stream = &pcm->streams[i];
+
+   if (!stream->nchmaps)
+   continue;
+
+   stream->chmaps = devm_kcalloc(&vdev->dev,
+ stream->nchmaps + 1,
+ sizeof(*stream->chmaps),
+ GFP_KERNEL);
+   if (!stream->chmaps)
+   return -ENOMEM;
+
+   stream->nchmaps = 0;
+   }
+   }
+
+   /* Initialize channel maps per each PCM device/stream. */
+   for (i = 0; i < snd->nchmaps; ++i) {
+   struct virtio_snd_chmap_info *info = &snd->chmaps[i];
+   

Re: [PATCH v3] vhost_vdpa: fix the problem in vhost_vdpa_set_config_call

2021-01-26 Thread Jason Wang


On 2021/1/26 下午3:16, Cindy Lu wrote:

In vhost_vdpa_set_config_call, the cb.private should be vhost_vdpa.
this cb.private will finally use in vhost_vdpa_config_cb as
vhost_vdpa. Fix this issue.

Cc: sta...@vger.kernel.org
Fixes: 776f395004d82 ("vhost_vdpa: Support config interrupt in vdpa")
Signed-off-by: Cindy Lu 



Acked-by: Jason Wang 



---
  drivers/vhost/vdpa.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index ef688c8c0e0e..3fbb9c1f49da 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -319,7 +319,7 @@ static long vhost_vdpa_set_config_call(struct vhost_vdpa *v, u32 __user *argp)
struct eventfd_ctx *ctx;
  
  	cb.callback = vhost_vdpa_config_cb;

-   cb.private = v->vdpa;
+   cb.private = v;
	if (copy_from_user(&fd, argp, sizeof(fd)))
return  -EFAULT;
  



Re: [RFC v3 08/11] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-01-26 Thread Jason Wang


On 2021/1/19 下午1:07, Xie Yongji wrote:

This VDUSE driver enables implementing vDPA devices in userspace.
Both the control path and the data path of vDPA devices can be
handled in userspace.

In the control path, the VDUSE driver will make use of a message
mechanism to forward the config operations from the vdpa bus driver
to userspace. Userspace can use read()/write() to receive/reply to
those control messages.

In the data path, VDUSE_IOTLB_GET_FD ioctl will be used to get
the file descriptors referring to vDPA device's iova regions. Then
userspace can use mmap() to access those iova regions. Besides,
the eventfd mechanism is used to trigger interrupt callbacks and
receive virtqueue kicks in userspace.

Signed-off-by: Xie Yongji
---
  Documentation/driver-api/vduse.rst |   85 ++
  Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
  drivers/vdpa/Kconfig   |7 +
  drivers/vdpa/Makefile  |1 +
  drivers/vdpa/vdpa_user/Makefile|5 +
  drivers/vdpa/vdpa_user/eventfd.c   |  221 
  drivers/vdpa/vdpa_user/eventfd.h   |   48 +
  drivers/vdpa/vdpa_user/iova_domain.c   |  426 +++
  drivers/vdpa/vdpa_user/iova_domain.h   |   68 ++
  drivers/vdpa/vdpa_user/vduse.h |   62 +
  drivers/vdpa/vdpa_user/vduse_dev.c | 1217 
  include/uapi/linux/vdpa.h  |1 +
  include/uapi/linux/vduse.h |  125 ++
  13 files changed, 2267 insertions(+)
  create mode 100644 Documentation/driver-api/vduse.rst
  create mode 100644 drivers/vdpa/vdpa_user/Makefile
  create mode 100644 drivers/vdpa/vdpa_user/eventfd.c
  create mode 100644 drivers/vdpa/vdpa_user/eventfd.h
  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h
  create mode 100644 drivers/vdpa/vdpa_user/vduse.h
  create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
  create mode 100644 include/uapi/linux/vduse.h



Btw, if you could split this into three parts:

1) iova domain
2) vduse device
3) doc

It would be easier for the reviewers.

Thanks


Re: [RFC v3 11/11] vduse: Introduce a workqueue for irq injection

2021-01-26 Thread Jason Wang


On 2021/1/19 下午1:07, Xie Yongji wrote:

This patch introduces a dedicated workqueue for irq injection
so that we are able to do some performance tuning for it.

Signed-off-by: Xie Yongji 



If we want the split like this.

It might be better to:

1) implement a simple irq injection on the ioctl context in patch 8
2) add the dedicated workqueue injection in this patch

Since my understanding is that:

1) the function looks more isolated for readers
2) the difference between the ioctl context and a workqueue is more
obvious than system wq vs. dedicated wq
3) it gives a chance to describe why a workqueue is needed in the
commit log of this patch


Thanks



---
  drivers/vdpa/vdpa_user/eventfd.c | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/vdpa_user/eventfd.c b/drivers/vdpa/vdpa_user/eventfd.c
index dbffddb08908..caf7d8d68ac0 100644
--- a/drivers/vdpa/vdpa_user/eventfd.c
+++ b/drivers/vdpa/vdpa_user/eventfd.c
@@ -18,6 +18,7 @@
  #include "eventfd.h"
  
  static struct workqueue_struct *vduse_irqfd_cleanup_wq;

+static struct workqueue_struct *vduse_irq_wq;
  
  static void vduse_virqfd_shutdown(struct work_struct *work)

  {
@@ -57,7 +58,7 @@ static int vduse_virqfd_wakeup(wait_queue_entry_t *wait, unsigned int mode,
__poll_t flags = key_to_poll(key);
  
  	if (flags & EPOLLIN)

-   schedule_work(&virqfd->inject);
+   queue_work(vduse_irq_wq, &virqfd->inject);
  
  	if (flags & EPOLLHUP) {

		spin_lock(&virqfd->irq_lock);
@@ -165,11 +166,18 @@ int vduse_virqfd_init(void)
if (!vduse_irqfd_cleanup_wq)
return -ENOMEM;
  
+	vduse_irq_wq = alloc_workqueue("vduse-irq", WQ_SYSFS | WQ_UNBOUND, 0);

+   if (!vduse_irq_wq) {
+   destroy_workqueue(vduse_irqfd_cleanup_wq);
+   return -ENOMEM;
+   }
+
return 0;
  }
  
  void vduse_virqfd_exit(void)

  {
+   destroy_workqueue(vduse_irq_wq);
destroy_workqueue(vduse_irqfd_cleanup_wq);
  }
  



Re: [RFC v3 10/11] vduse: grab the module's references until there is no vduse device

2021-01-26 Thread Jason Wang


On 2021/1/19 下午1:07, Xie Yongji wrote:

The module should not be unloaded while any vduse device exists.
So increase the module's reference count when creating a vduse
device, and keep that reference until the device is destroyed.

Signed-off-by: Xie Yongji 



Looks like a bug fix. If yes, let's squash this into patch 8.

Thanks



---
  drivers/vdpa/vdpa_user/vduse_dev.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 4d21203da5b6..003aeb281bce 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -978,6 +978,7 @@ static int vduse_destroy_dev(u32 id)
kfree(dev->vqs);
vduse_domain_destroy(dev->domain);
vduse_dev_destroy(dev);
+   module_put(THIS_MODULE);
  
  	return 0;

  }
@@ -1022,6 +1023,7 @@ static int vduse_create_dev(struct vduse_dev_config *config)
  
  	dev->connected = true;

	list_add(&dev->list, &vduse_devs);
+   __module_get(THIS_MODULE);
  
  	return fd;

  err_fd:



Re: [RFC v3 08/11] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-01-26 Thread Jason Wang


On 2021/1/19 下午1:07, Xie Yongji wrote:

This VDUSE driver enables implementing vDPA devices in userspace.
Both the control path and the data path of vDPA devices can be
handled in userspace.

In the control path, the VDUSE driver will make use of a message
mechanism to forward the config operations from the vdpa bus driver
to userspace. Userspace can use read()/write() to receive/reply to
those control messages.

In the data path, VDUSE_IOTLB_GET_FD ioctl will be used to get
the file descriptors referring to vDPA device's iova regions. Then
userspace can use mmap() to access those iova regions. Besides,
the eventfd mechanism is used to trigger interrupt callbacks and
receive virtqueue kicks in userspace.

Signed-off-by: Xie Yongji 
---
  Documentation/driver-api/vduse.rst |   85 ++
  Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
  drivers/vdpa/Kconfig   |7 +
  drivers/vdpa/Makefile  |1 +
  drivers/vdpa/vdpa_user/Makefile|5 +
  drivers/vdpa/vdpa_user/eventfd.c   |  221 
  drivers/vdpa/vdpa_user/eventfd.h   |   48 +
  drivers/vdpa/vdpa_user/iova_domain.c   |  426 +++
  drivers/vdpa/vdpa_user/iova_domain.h   |   68 ++
  drivers/vdpa/vdpa_user/vduse.h |   62 +
  drivers/vdpa/vdpa_user/vduse_dev.c | 1217 
  include/uapi/linux/vdpa.h  |1 +
  include/uapi/linux/vduse.h |  125 ++
  13 files changed, 2267 insertions(+)
  create mode 100644 Documentation/driver-api/vduse.rst
  create mode 100644 drivers/vdpa/vdpa_user/Makefile
  create mode 100644 drivers/vdpa/vdpa_user/eventfd.c
  create mode 100644 drivers/vdpa/vdpa_user/eventfd.h
  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h
  create mode 100644 drivers/vdpa/vdpa_user/vduse.h
  create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
  create mode 100644 include/uapi/linux/vduse.h

diff --git a/Documentation/driver-api/vduse.rst b/Documentation/driver-api/vduse.rst
new file mode 100644
index ..9418a7f6646b
--- /dev/null
+++ b/Documentation/driver-api/vduse.rst
@@ -0,0 +1,85 @@
+==
+VDUSE - "vDPA Device in Userspace"
+==
+
+A vDPA (virtio data path acceleration) device is a device that uses a
+datapath which complies with the virtio specification, together with a
+vendor-specific control path. vDPA devices can either be physically
+located on the hardware or emulated by software. VDUSE is a framework
+that makes it possible to implement software-emulated vDPA devices in
+userspace.
+
+How VDUSE works
+---------------
+Each userspace vDPA device is created by the VDUSE_CREATE_DEV ioctl on
+the VDUSE character device (/dev/vduse). Then a file descriptor pointing
+to the new resources will be returned, which can be used to implement the
+userspace vDPA device's control path and data path.
+
+To implement the control path, read/write operations on the file descriptor
+are used to receive/reply to the control messages from/to the VDUSE driver.



It's better to document the protocol here, e.g. the identifier handling.



+Those control messages are mostly based on the vdpa_config_ops which defines
+a unified interface to control different types of vDPA device.
+
+The following types of messages are provided by the VDUSE framework now:
+
+- VDUSE_SET_VQ_ADDR: Set the addresses of the different aspects of virtqueue.



"Set the vring address of a virtqueue" might be better here.



+
+- VDUSE_SET_VQ_NUM: Set the size of virtqueue
+
+- VDUSE_SET_VQ_READY: Set ready status of virtqueue
+
+- VDUSE_GET_VQ_READY: Get ready status of virtqueue
+
+- VDUSE_SET_VQ_STATE: Set the state (last_avail_idx) for virtqueue
+
+- VDUSE_GET_VQ_STATE: Get the state (last_avail_idx) for virtqueue



It's better not to mention layout-specific details here (last_avail_idx),
since we should support packed virtqueue in the future.




+
+- VDUSE_SET_FEATURES: Set virtio features supported by the driver
+
+- VDUSE_GET_FEATURES: Get virtio features supported by the device
+
+- VDUSE_SET_STATUS: Set the device status
+
+- VDUSE_GET_STATUS: Get the device status
+
+- VDUSE_SET_CONFIG: Write to device specific configuration space
+
+- VDUSE_GET_CONFIG: Read from device specific configuration space
+
+- VDUSE_UPDATE_IOTLB: Notify userspace to update the memory mapping in device IOTLB
+
+Please see include/linux/vdpa.h for details.
+
+In the data path, vDPA device's iova regions will be mapped into userspace with
+the help of VDUSE_IOTLB_GET_FD ioctl on the userspace vDPA device fd:
+
+- VDUSE_IOTLB_GET_FD: get the file descriptor to iova region. Userspace can
+  access this iova region by passing the fd to mmap(2).
+
+Besides, the eventfd mechanism is used to trigger interrupt callbacks and
+receive