Re: [PATCH 0/2] virtio: Support encrypted memory on powerpc secure guests

2019-10-11 Thread Ram Pai
Hmm.. git-send-email forgot to CC  Michael Tsirkin, and Thiago; the
original author, who is on vacation.

Adding them now.
RP

On Fri, Oct 11, 2019 at 06:25:17PM -0700, Ram Pai wrote:
>  **We would like the patches to be merged through the virtio tree.  Please
>  review, and ack merging the DMA mapping change through that tree. Thanks!**
> 
>  The memory of powerpc secure guests can't be accessed by the hypervisor /
>  virtio device except for a few memory regions designated as 'shared'.
> 
>  At the moment, Linux uses bounce-buffering to communicate with the
>  hypervisor, with a bounce buffer marked as shared. This is how the DMA API
>  is implemented on this platform.
> 
>  In particular, the most convenient way to use virtio on this platform is by
>  making virtio use the DMA API: in fact, this is exactly what happens if the
>  virtio device exposes the flag VIRTIO_F_ACCESS_PLATFORM.  However, bugs in 
> the
>  hypervisor on the powerpc platform do not allow setting this flag, with some
>  hypervisors already in the field that don't set this flag. At the moment they
>  are forced to use emulated devices when guest is in secure mode; virtio is
>  only useful when guest is not secure.
> 
>  Normally, both device and driver must support VIRTIO_F_ACCESS_PLATFORM:
>  if one of them doesn't, the other mustn't assume it for communication
>  to work.
> 
>  However, a guest-side work-around is possible to enable virtio
>  for these hypervisors with guest in secure mode: it so happens that on
>  powerpc secure platform the DMA address is actually a physical address -
>  that of the bounce buffer. For these platforms we can make the virtio
>  driver go through the DMA API even though the device itself ignores
>  the DMA API.
> 
>  These patches implement this work around for virtio: we detect that
>  - secure guest mode is enabled - so we know that since we don't share
>most memory and Hypervisor has not enabled VIRTIO_F_ACCESS_PLATFORM,
>regular virtio code won't work.
>  - DMA API is giving us addresses that are actually also physical
>addresses.
>  - Hypervisor has not enabled VIRTIO_F_ACCESS_PLATFORM.
> 
>  and if all conditions are true, we force all data through the bounce
>  buffer.
> 
>  To put it another way, from hypervisor's point of view DMA API is
>  not required: hypervisor would be happy to get access to all of guest
>  memory. That's why it does not set VIRTIO_F_ACCESS_PLATFORM. However,
>  guest decides that it does not trust the hypervisor and wants to force
>  a bounce buffer for its own reasons.
> 
> 
> Thiago Jung Bauermann (2):
>   dma-mapping: Add dma_addr_is_phys_addr()
>   virtio_ring: Use DMA API if memory is encrypted
> 
>  arch/powerpc/include/asm/dma-mapping.h | 21 +
>  arch/powerpc/platforms/pseries/Kconfig |  1 +
>  drivers/virtio/virtio.c| 18 ++
>  drivers/virtio/virtio_ring.c   |  8 
>  include/linux/dma-mapping.h| 20 
>  include/linux/virtio_config.h  | 14 ++
>  kernel/dma/Kconfig |  3 +++
>  7 files changed, 85 insertions(+)
> 
> -- 
> 1.8.3.1

-- 
Ram Pai

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 2/2] virtio_ring: Use DMA API if memory is encrypted

2019-10-11 Thread Ram Pai
From: Thiago Jung Bauermann 

Normally, virtio enables DMA API with VIRTIO_F_IOMMU_PLATFORM, which must
be set by both device and guest driver. However, as a hack, when DMA API
returns physical addresses, guest driver can use the DMA API; even though
device does not set VIRTIO_F_IOMMU_PLATFORM and just uses physical
addresses.

Doing this works-around POWER secure guests for which only the bounce
buffer is accessible to the device, but which don't set
VIRTIO_F_IOMMU_PLATFORM due to a set of hypervisor and architectural bugs.
To guard against platform changes, breaking any of these assumptions down
the road, we check at probe time and fail if that's not the case.

cc: Benjamin Herrenschmidt 
cc: David Gibson 
cc: Michael Ellerman 
cc: Paul Mackerras 
cc: Michael Roth 
cc: Alexey Kardashevskiy 
cc: Jason Wang 
cc: Christoph Hellwig 
Suggested-by: Michael S. Tsirkin 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
---
 drivers/virtio/virtio.c   | 18 ++
 drivers/virtio/virtio_ring.c  |  8 
 include/linux/virtio_config.h | 14 ++
 3 files changed, 40 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index a977e32..77a3baf 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Unique numbering for virtio devices. */
@@ -245,6 +246,23 @@ static int virtio_dev_probe(struct device *_d)
if (err)
goto err;
 
+   /*
+* If memory is encrypted, but VIRTIO_F_IOMMU_PLATFORM is not set, then
+* the device is broken: DMA API is required for these platforms, but
+* the only way using the DMA API is going to work at all is if the
+* device is ready for it. So we need a flag on the virtio device,
+* exposed by the hypervisor (or hardware for hw virtio devices) that
+* says: hey, I'm real, don't take a shortcut.
+*
+* There's one exception where guest can make things work, and that is
+* when DMA API is guaranteed to always return physical addresses.
+*/
+   if (mem_encrypt_active() && !virtio_can_use_dma_api(dev)) {
+   dev_err(_d, "virtio: device unable to access encrypted 
memory\n");
+   err = -EINVAL;
+   goto err;
+   }
+
err = drv->probe(dev);
if (err)
goto err;
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c8be1c4..9c56b61 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -255,6 +255,14 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
if (xen_domain())
return true;
 
+   /*
+* Also, if guest memory is encrypted the host can't access it
+* directly. We need to either use an IOMMU or do bounce buffering.
+* Both are done via the DMA API.
+*/
+   if (mem_encrypt_active() && virtio_can_use_dma_api(vdev))
+   return true;
+
return false;
 }
 
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index bb4cc49..57bc25c 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -174,6 +175,19 @@ static inline bool virtio_has_iommu_quirk(const struct 
virtio_device *vdev)
return !virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
 }
 
+/**
+ * virtio_can_use_dma_api - determine whether the DMA API can be used
+ * @vdev: the device
+ *
+ * The DMA API can be used either when the device doesn't have the IOMMU quirk,
+ * or when the DMA API is guaranteed to always return physical addresses.
+ */
+static inline bool virtio_can_use_dma_api(const struct virtio_device *vdev)
+{
+   return !virtio_has_iommu_quirk(vdev) ||
+  dma_addr_is_phys_addr(vdev->dev.parent);
+}
+
 static inline
 struct virtqueue *virtio_find_single_vq(struct virtio_device *vdev,
vq_callback_t *c, const char *n)
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 1/2] dma-mapping: Add dma_addr_is_phys_addr()

2019-10-11 Thread Ram Pai
From: Thiago Jung Bauermann 

In order to safely use the DMA API, virtio needs to know whether DMA
addresses are in fact physical addresses and for that purpose,
dma_addr_is_phys_addr() is introduced.

cc: Benjamin Herrenschmidt 
cc: David Gibson 
cc: Michael Ellerman 
cc: Paul Mackerras 
cc: Michael Roth 
cc: Alexey Kardashevskiy 
cc: Paul Burton 
cc: Robin Murphy 
cc: Bartlomiej Zolnierkiewicz 
cc: Marek Szyprowski 
cc: Christoph Hellwig 
Suggested-by: Michael S. Tsirkin 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/dma-mapping.h | 21 +
 arch/powerpc/platforms/pseries/Kconfig |  1 +
 include/linux/dma-mapping.h| 20 
 kernel/dma/Kconfig |  3 +++
 4 files changed, 45 insertions(+)

diff --git a/arch/powerpc/include/asm/dma-mapping.h 
b/arch/powerpc/include/asm/dma-mapping.h
index 565d6f7..f92c0a4b 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -5,6 +5,8 @@
 #ifndef _ASM_DMA_MAPPING_H
 #define _ASM_DMA_MAPPING_H
 
+#include 
+
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
/* We don't handle the NULL dev case for ISA for now. We could
@@ -15,4 +17,23 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
return NULL;
 }
 
+#ifdef CONFIG_ARCH_HAS_DMA_ADDR_IS_PHYS_ADDR
+/**
+ * dma_addr_is_phys_addr - check whether a device DMA address is a physical
+ * address
+ * @dev:   device to check
+ *
+ * Returns %true if any DMA address for this device happens to also be a valid
+ * physical address (not necessarily of the same page).
+ */
+static inline bool dma_addr_is_phys_addr(struct device *dev)
+{
+   /*
+* Secure guests always use the SWIOTLB, therefore DMA addresses are
+* actually the physical address of the bounce buffer.
+*/
+   return is_secure_guest();
+}
+#endif
+
 #endif /* _ASM_DMA_MAPPING_H */
diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index 9e35cdd..0108150 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -152,6 +152,7 @@ config PPC_SVM
select SWIOTLB
select ARCH_HAS_MEM_ENCRYPT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+   select ARCH_HAS_DMA_ADDR_IS_PHYS_ADDR
help
 There are certain POWER platforms which support secure guests using
 the Protected Execution Facility, with the help of an Ultravisor
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f7d1eea..6df5664 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -693,6 +693,26 @@ static inline bool dma_addressing_limited(struct device 
*dev)
dma_get_required_mask(dev);
 }
 
+#ifndef CONFIG_ARCH_HAS_DMA_ADDR_IS_PHYS_ADDR
+/**
+ * dma_addr_is_phys_addr - check whether a device DMA address is a physical
+ * address
+ * @dev:   device to check
+ *
+ * Returns %true if any DMA address for this device happens to also be a valid
+ * physical address (not necessarily of the same page).
+ */
+static inline bool dma_addr_is_phys_addr(struct device *dev)
+{
+   /*
+* Except in very specific setups, DMA addresses exist in a different
+* address space from CPU physical addresses and cannot be directly used
+* to reference system memory.
+*/
+   return false;
+}
+#endif
+
 #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent);
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 9decbba..6209b46 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -51,6 +51,9 @@ config ARCH_HAS_DMA_MMAP_PGPROT
 config ARCH_HAS_FORCE_DMA_UNENCRYPTED
bool
 
+config ARCH_HAS_DMA_ADDR_IS_PHYS_ADDR
+   bool
+
 config DMA_NONCOHERENT_CACHE_SYNC
bool
 
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 0/2] virtio: Support encrypted memory on powerpc secure guests

2019-10-11 Thread Ram Pai
 **We would like the patches to be merged through the virtio tree.  Please
 review, and ack merging the DMA mapping change through that tree. Thanks!**

 The memory of powerpc secure guests can't be accessed by the hypervisor /
 virtio device except for a few memory regions designated as 'shared'.
 
 At the moment, Linux uses bounce-buffering to communicate with the
 hypervisor, with a bounce buffer marked as shared. This is how the DMA API
 is implemented on this platform.
 
 In particular, the most convenient way to use virtio on this platform is by
 making virtio use the DMA API: in fact, this is exactly what happens if the
 virtio device exposes the flag VIRTIO_F_ACCESS_PLATFORM.  However, bugs in the
 hypervisor on the powerpc platform do not allow setting this flag, with some
 hypervisors already in the field that don't set this flag. At the moment they
 are forced to use emulated devices when guest is in secure mode; virtio is
 only useful when guest is not secure.
 
 Normally, both device and driver must support VIRTIO_F_ACCESS_PLATFORM:
 if one of them doesn't, the other mustn't assume it for communication
 to work.
 
 However, a guest-side work-around is possible to enable virtio
 for these hypervisors with guest in secure mode: it so happens that on
 powerpc secure platform the DMA address is actually a physical address -
 that of the bounce buffer. For these platforms we can make the virtio
 driver go through the DMA API even though the device itself ignores
 the DMA API.
 
 These patches implement this work around for virtio: we detect that
 - secure guest mode is enabled - so we know that since we don't share
   most memory and Hypervisor has not enabled VIRTIO_F_ACCESS_PLATFORM,
   regular virtio code won't work.
 - DMA API is giving us addresses that are actually also physical
   addresses.
 - Hypervisor has not enabled VIRTIO_F_ACCESS_PLATFORM.
 
 and if all conditions are true, we force all data through the bounce
 buffer.
 
 To put it another way, from hypervisor's point of view DMA API is
 not required: hypervisor would be happy to get access to all of guest
 memory. That's why it does not set VIRTIO_F_ACCESS_PLATFORM. However,
 guest decides that it does not trust the hypervisor and wants to force
 a bounce buffer for its own reasons.


Thiago Jung Bauermann (2):
  dma-mapping: Add dma_addr_is_phys_addr()
  virtio_ring: Use DMA API if memory is encrypted

 arch/powerpc/include/asm/dma-mapping.h | 21 +
 arch/powerpc/platforms/pseries/Kconfig |  1 +
 drivers/virtio/virtio.c| 18 ++
 drivers/virtio/virtio_ring.c   |  8 
 include/linux/dma-mapping.h| 20 
 include/linux/virtio_config.h  | 14 ++
 kernel/dma/Kconfig |  3 +++
 7 files changed, 85 insertions(+)

-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[vhost:vhost 6/6] drivers/vhost/vhost.c:2672:9: error: 'desc' undeclared; did you mean 'rdtsc'?

2019-10-11 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost
head:   2b91e7f7cb5a6f12a2f43b200cb2d65a218237ed
commit: 2b91e7f7cb5a6f12a2f43b200cb2d65a218237ed [6/6] vhost: batching fetches
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
reproduce:
git checkout 2b91e7f7cb5a6f12a2f43b200cb2d65a218237ed
# save the attached .config to linux build tree
make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   drivers/vhost/vhost.c: In function 'vhost_get_vq_desc_batch':
>> drivers/vhost/vhost.c:2672:9: error: 'desc' undeclared (first use in this 
>> function); did you mean 'rdtsc'?
  if (!(desc->flags & VRING_DESC_F_NEXT))
^~~~
rdtsc
   drivers/vhost/vhost.c:2672:9: note: each undeclared identifier is reported 
only once for each function it appears in

vim +2672 drivers/vhost/vhost.c

  2569  
  2570  /* This looks in the virtqueue and for the first available buffer, and 
converts
  2571   * it to an iovec for convenient access.  Since descriptors consist of 
some
  2572   * number of output then some number of input descriptors, it's 
actually two
  2573   * iovecs, but we pack them into one and note how many of each there 
were.
  2574   *
  2575   * This function returns the descriptor number found, or vq->num (which 
is
  2576   * never a valid descriptor number) if none was found.  A negative code 
is
  2577   * returned on error. */
  2578  int vhost_get_vq_desc_batch(struct vhost_virtqueue *vq,
  2579struct iovec iov[], unsigned int iov_size,
  2580unsigned int *out_num, unsigned int *in_num,
  2581struct vhost_log *log, unsigned int *log_num)
  2582  {
  2583  int ret = fetch_descs(vq);
  2584  struct vhost_desc *last;
  2585  u16 id;
  2586  int i;
  2587  
  2588  if (ret)
  2589  return ret;
  2590  
  2591  /* Note: indirect descriptors are not batched */
  2592  /* TODO: batch up to a limit */
  2593  last = peek_split_desc(vq);
  2594  id = last->id;
  2595  
  2596  if (last->flags & VRING_DESC_F_INDIRECT) {
  2597  int r;
  2598  
  2599  pop_split_desc(vq);
  2600  r = fetch_indirect_descs(vq, last, id);
  2601  if (unlikely(r < 0)) {
  2602  if (r != -EAGAIN)
  2603  vq_err(vq, "Failure detected "
  2604 "in indirect descriptor 
at idx %d\n", id);
  2605  return ret;
  2606  }
  2607  }
  2608  
  2609  /* Now convert to IOV */
  2610  /* When we start there are none of either input nor output. */
  2611  *out_num = *in_num = 0;
  2612  if (unlikely(log))
  2613  *log_num = 0;
  2614  
  2615  for (i = vq->first_desc; i < vq->ndescs; ++i) {
  2616  unsigned iov_count = *in_num + *out_num;
  2617  struct vhost_desc *desc = >descs[i];
  2618  int access;
  2619  
  2620  if (desc->flags & ~VHOST_DESC_FLAGS) {
  2621  vq_err(vq, "Unexpected flags: 0x%x at 
descriptor id 0x%x\n",
  2622 desc->flags, desc->id);
  2623  ret = -EINVAL;
  2624  goto err;
  2625  }
  2626  if (desc->flags & VRING_DESC_F_WRITE)
  2627  access = VHOST_ACCESS_WO;
  2628  else
  2629  access = VHOST_ACCESS_RO;
  2630  ret = translate_desc(vq, desc->addr,
  2631   desc->len, iov + iov_count,
  2632   iov_size - iov_count, access);
  2633  if (unlikely(ret < 0)) {
  2634  if (ret != -EAGAIN)
  2635  vq_err(vq, "Translation failure %d 
descriptor idx %d\n",
  2636  ret, i);
  2637  goto err;
  2638  }
  2639  if (access == VHOST_ACCESS_WO) {
  2640  /* If this is an input descriptor,
  2641   * increment that count. */
  2642  *in_num += ret;
  2643  if (unlikely(log && ret)) {
  2644  log[*log_num].addr = desc->addr;
  2645  log[*log_num].len = desc->len;
  2646  ++*log_num;
  2647  }
  2648  } else {
  

[vhost:vhost 6/6] drivers/vhost/vhost.c:2672:9: error: 'desc' undeclared

2019-10-11 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost
head:   2b91e7f7cb5a6f12a2f43b200cb2d65a218237ed
commit: 2b91e7f7cb5a6f12a2f43b200cb2d65a218237ed [6/6] vhost: batching fetches
config: mips-allmodconfig (attached as .config)
compiler: mips-linux-gcc (GCC) 7.4.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 2b91e7f7cb5a6f12a2f43b200cb2d65a218237ed
# save the attached .config to linux build tree
GCC_VERSION=7.4.0 make.cross ARCH=mips 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   drivers/vhost/vhost.c: In function 'vhost_get_vq_desc_batch':
>> drivers/vhost/vhost.c:2672:9: error: 'desc' undeclared (first use in this 
>> function)
  if (!(desc->flags & VRING_DESC_F_NEXT))
^~~~
   drivers/vhost/vhost.c:2672:9: note: each undeclared identifier is reported 
only once for each function it appears in

vim +/desc +2672 drivers/vhost/vhost.c

  2569  
  2570  /* This looks in the virtqueue and for the first available buffer, and 
converts
  2571   * it to an iovec for convenient access.  Since descriptors consist of 
some
  2572   * number of output then some number of input descriptors, it's 
actually two
  2573   * iovecs, but we pack them into one and note how many of each there 
were.
  2574   *
  2575   * This function returns the descriptor number found, or vq->num (which 
is
  2576   * never a valid descriptor number) if none was found.  A negative code 
is
  2577   * returned on error. */
  2578  int vhost_get_vq_desc_batch(struct vhost_virtqueue *vq,
  2579struct iovec iov[], unsigned int iov_size,
  2580unsigned int *out_num, unsigned int *in_num,
  2581struct vhost_log *log, unsigned int *log_num)
  2582  {
  2583  int ret = fetch_descs(vq);
  2584  struct vhost_desc *last;
  2585  u16 id;
  2586  int i;
  2587  
  2588  if (ret)
  2589  return ret;
  2590  
  2591  /* Note: indirect descriptors are not batched */
  2592  /* TODO: batch up to a limit */
  2593  last = peek_split_desc(vq);
  2594  id = last->id;
  2595  
  2596  if (last->flags & VRING_DESC_F_INDIRECT) {
  2597  int r;
  2598  
  2599  pop_split_desc(vq);
  2600  r = fetch_indirect_descs(vq, last, id);
  2601  if (unlikely(r < 0)) {
  2602  if (r != -EAGAIN)
  2603  vq_err(vq, "Failure detected "
  2604 "in indirect descriptor 
at idx %d\n", id);
  2605  return ret;
  2606  }
  2607  }
  2608  
  2609  /* Now convert to IOV */
  2610  /* When we start there are none of either input nor output. */
  2611  *out_num = *in_num = 0;
  2612  if (unlikely(log))
  2613  *log_num = 0;
  2614  
  2615  for (i = vq->first_desc; i < vq->ndescs; ++i) {
  2616  unsigned iov_count = *in_num + *out_num;
  2617  struct vhost_desc *desc = >descs[i];
  2618  int access;
  2619  
  2620  if (desc->flags & ~VHOST_DESC_FLAGS) {
  2621  vq_err(vq, "Unexpected flags: 0x%x at 
descriptor id 0x%x\n",
  2622 desc->flags, desc->id);
  2623  ret = -EINVAL;
  2624  goto err;
  2625  }
  2626  if (desc->flags & VRING_DESC_F_WRITE)
  2627  access = VHOST_ACCESS_WO;
  2628  else
  2629  access = VHOST_ACCESS_RO;
  2630  ret = translate_desc(vq, desc->addr,
  2631   desc->len, iov + iov_count,
  2632   iov_size - iov_count, access);
  2633  if (unlikely(ret < 0)) {
  2634  if (ret != -EAGAIN)
  2635  vq_err(vq, "Translation failure %d 
descriptor idx %d\n",
  2636  ret, i);
  2637  goto err;
  2638  }
  2639  if (access == VHOST_ACCESS_WO) {
  2640  /* If this is an input descriptor,
  2641   * increment that count. */
  2642  *in_num += ret;
  2643  if (unlikely(log && ret)) {
  2644  log[*log_num].addr = desc->addr;
  2645  log[*log_num].len = 

Re: [PATCH net 1/2] vsock: add half-closed socket details in the implementation notes

2019-10-11 Thread Stefano Garzarella
On Fri, Oct 11, 2019 at 10:22:30AM -0400, Michael S. Tsirkin wrote:
> On Fri, Oct 11, 2019 at 03:07:57PM +0200, Stefano Garzarella wrote:
> > vmci_transport never allowed half-closed socket on the host side.
> > Since we want to have the same behaviour across all transports, we
> > add a section in the "Implementation notes".
> > 
> > Cc: Jorgen Hansen 
> > Cc: Adit Ranadive 
> > Signed-off-by: Stefano Garzarella 
> > ---
> >  net/vmw_vsock/af_vsock.c | 4 
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index 2ab43b2bba31..27df57c2024b 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -83,6 +83,10 @@
> >   *   TCP_ESTABLISHED - connected
> >   *   TCP_CLOSING - disconnecting
> >   *   TCP_LISTEN - listening
> > + *
> > + * - Half-closed socket is supported only on the guest side. recv() on the 
> > host
> > + * side should return EOF when the guest closes a connection, also if some
> > + * data is still in the receive queue.
> >   */
> >  
> >  #include 
> 
> That's a great way to lose data in a way that's hard to debug.
> 
> VMCI sockets connect to a hypervisor so there's tight control
> of what the hypervisor can do.
> 
> But vhost vsocks connect to a fully fledged Linux, so
> you can't assume this is safe. And application authors do not read
> kernel source.

Thanks for explaining.
Discard this patch, I'll try to add a getsockopt() to allow the tests
(and applications) to understand if half-closed socket is supported or not.

Stefano
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net 0/2] vsock: don't allow half-closed socket in the host transports

2019-10-11 Thread Stefano Garzarella
On Fri, Oct 11, 2019 at 10:19:13AM -0400, Michael S. Tsirkin wrote:
> On Fri, Oct 11, 2019 at 03:07:56PM +0200, Stefano Garzarella wrote:
> > We are implementing a test suite for the VSOCK sockets and we discovered
> > that vmci_transport never allowed half-closed socket on the host side.
> > 
> > As Jorgen explained [1] this is due to the implementation of VMCI.
> > 
> > Since we want to have the same behaviour across all transports, this
> > series adds a section in the "Implementation notes" to exaplain this
> > behaviour, and changes the vhost_transport to behave the same way.
> > 
> > [1] https://patchwork.ozlabs.org/cover/847998/#1831400
> 
> Half closed sockets are very useful, and lots of
> applications use tricks to swap a vsock for a tcp socket,
> which might as a result break.

Got it!

> 
> If VMCI really cares it can implement an ioctl to
> allow applications to detect that half closed sockets aren't supported.
> 
> It does not look like VMCI wants to bother (users do not read
> kernel implementation notes) so it does not really care.
> So why do we want to cripple other transports intentionally?

The main reason is that we are developing the test suite and we noticed
the miss match. Since we want to make sure that applications behave in
the same way on different transports, we thought we would solve it that
way.

But what you are saying (also in the reply of the patches) is actually
quite right. Not being publicized, applications do not expect this behavior,
so please discard this series.

My problem during the tests, was trying to figure out if half-closed
sockets were supported or not, so as you say adding an IOCTL or maybe
better a getsockopt() could solve the problem.

What do you think?

Thanks,
Stefano
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net 2/2] vhost/vsock: don't allow half-closed socket in the host

2019-10-11 Thread Stefano Garzarella
On Fri, Oct 11, 2019 at 10:26:34AM -0400, Michael S. Tsirkin wrote:
> On Fri, Oct 11, 2019 at 03:07:58PM +0200, Stefano Garzarella wrote:
> > vmci_transport never allowed half-closed socket on the host side.
> > In order to provide the same behaviour, we changed the
> > vhost_transport_stream_has_data() to return 0 (no data available)
> > if the peer (guest) closed the connection.
> > 
> > Signed-off-by: Stefano Garzarella 
> 
> I don't think we should copy bugs like this.
> Applications don't actually depend on this VMCI limitation, in fact
> it looks like a working application can get broken by this.
> 
> So this looks like a userspace visible ABI change
> which we can't really do.
> 
> If it turns out some application cares, it can always
> fully close the connection. Or add an ioctl so the application
> can find out whether half close works.
> 

I got your point.
Discard this patch.

Thanks,
Stefano
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net 2/2] vhost/vsock: don't allow half-closed socket in the host

2019-10-11 Thread Michael S. Tsirkin
On Fri, Oct 11, 2019 at 03:07:58PM +0200, Stefano Garzarella wrote:
> vmci_transport never allowed half-closed socket on the host side.
> In order to provide the same behaviour, we changed the
> vhost_transport_stream_has_data() to return 0 (no data available)
> if the peer (guest) closed the connection.
> 
> Signed-off-by: Stefano Garzarella 

I don't think we should copy bugs like this.
Applications don't actually depend on this VMCI limitation, in fact
it looks like a working application can get broken by this.

So this looks like a userspace visible ABI change
which we can't really do.

If it turns out some application cares, it can always
fully close the connection. Or add an ioctl so the application
can find out whether half close works.

> ---
>  drivers/vhost/vsock.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 9f57736fe15e..754120aa4478 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -58,6 +58,21 @@ static u32 vhost_transport_get_local_cid(void)
>   return VHOST_VSOCK_DEFAULT_HOST_CID;
>  }
>  
> +static s64 vhost_transport_stream_has_data(struct vsock_sock *vsk)
> +{
> + /* vmci_transport doesn't allow half-closed socket on the host side.
> +  * recv() on the host side returns EOF when the guest closes a
> +  * connection, also if some data is still in the receive queue.
> +  *
> +  * In order to provide the same behaviour, we always return 0
> +  * (no data available) if the peer (guest) closed the connection.
> +  */
> + if (vsk->peer_shutdown == SHUTDOWN_MASK)
> + return 0;
> +
> + return virtio_transport_stream_has_data(vsk);
> +}
> +
>  /* Callers that dereference the return value must hold vhost_vsock_mutex or 
> the
>   * RCU read lock.
>   */
> @@ -804,7 +819,7 @@ static struct virtio_transport vhost_transport = {
>  
>   .stream_enqueue   = virtio_transport_stream_enqueue,
>   .stream_dequeue   = virtio_transport_stream_dequeue,
> - .stream_has_data  = virtio_transport_stream_has_data,
> + .stream_has_data  = vhost_transport_stream_has_data,
>   .stream_has_space = virtio_transport_stream_has_space,
>   .stream_rcvhiwat  = virtio_transport_stream_rcvhiwat,
>   .stream_is_active = virtio_transport_stream_is_active,
> -- 
> 2.21.0
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net 1/2] vsock: add half-closed socket details in the implementation notes

2019-10-11 Thread Michael S. Tsirkin
On Fri, Oct 11, 2019 at 03:07:57PM +0200, Stefano Garzarella wrote:
> vmci_transport never allowed half-closed socket on the host side.
> Since we want to have the same behaviour across all transports, we
> add a section in the "Implementation notes".
> 
> Cc: Jorgen Hansen 
> Cc: Adit Ranadive 
> Signed-off-by: Stefano Garzarella 
> ---
>  net/vmw_vsock/af_vsock.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index 2ab43b2bba31..27df57c2024b 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -83,6 +83,10 @@
>   *   TCP_ESTABLISHED - connected
>   *   TCP_CLOSING - disconnecting
>   *   TCP_LISTEN - listening
> + *
> + * - Half-closed socket is supported only on the guest side. recv() on the 
> host
> + * side should return EOF when the guest closes a connection, also if some
> + * data is still in the receive queue.
>   */
>  
>  #include 

That's a great way to lose data in a way that's hard to debug.

VMCI sockets connect to a hypervisor so there's tight control
of what the hypervisor can do.

But vhost vsocks connect to a fully fledged Linux, so
you can't assume this is safe. And application authors do not read
kernel source.

> -- 
> 2.21.0
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net 0/2] vsock: don't allow half-closed socket in the host transports

2019-10-11 Thread Michael S. Tsirkin
On Fri, Oct 11, 2019 at 03:07:56PM +0200, Stefano Garzarella wrote:
> We are implementing a test suite for the VSOCK sockets and we discovered
> that vmci_transport never allowed half-closed socket on the host side.
> 
> As Jorgen explained [1] this is due to the implementation of VMCI.
> 
> Since we want to have the same behaviour across all transports, this
> series adds a section in the "Implementation notes" to exaplain this
> behaviour, and changes the vhost_transport to behave the same way.
> 
> [1] https://patchwork.ozlabs.org/cover/847998/#1831400

Half closed sockets are very useful, and lots of
applications use tricks to swap a vsock for a tcp socket,
which might as a result break.

If VMCI really cares it can implement an ioctl to
allow applications to detect that half closed sockets aren't supported.

It does not look like VMCI wants to bother (users do not read
kernel implementation notes) so it does not really care.
So why do we want to cripple other transports intentionally?



> Stefano Garzarella (2):
>   vsock: add half-closed socket details in the implementation notes
>   vhost/vsock: don't allow half-closed socket in the host
> 
>  drivers/vhost/vsock.c| 17 -
>  net/vmw_vsock/af_vsock.c |  4 
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> -- 
> 2.21.0
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

2019-10-11 Thread Michael S. Tsirkin
On Fri, Oct 11, 2019 at 03:40:48PM +0200, Stefano Garzarella wrote:
> On Sun, Sep 1, 2019 at 8:56 AM Michael S. Tsirkin  wrote:
> > On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
> > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > and pushed to the guest using the vring, are directly queued in
> > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > with a fixed size (4 KB).
> > > > >
> > > > > The maximum amount of memory used by each socket should be
> > > > > controlled by the credit mechanism.
> > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > to avoid starvation of other sockets.
> > > > >
> > > > > This patch mitigates this issue copying the payload of small
> > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > order to avoid wasting memory.
> > > > >
> > > > > Reviewed-by: Stefan Hajnoczi 
> > > > > Signed-off-by: Stefano Garzarella 
> > > >
> > > > This is good enough for net-next, but for net I think we
> > > > should figure out how to address the issue completely.
> > > > Can we make the accounting precise? What happens to
> > > > performance if we do?
> > > >
> > >
> > > Since I'm back from holidays, I'm restarting this thread to figure out
> > > how to address the issue completely.
> > >
> > > I did a better analysis of the credit mechanism that we implemented in
> > > virtio-vsock to get a clearer view and I'd share it with you:
> > >
> > > This issue affect only the "host->guest" path. In this case, when the
> > > host wants to send a packet to the guest, it uses a "free" buffer
> > > allocated by the guest (4KB).
> > > The "free" buffers available for the host are shared between all
> > > sockets, instead, the credit mechanism is per-socket, I think to
> > > avoid the starvation of others sockets.
> > > The guests re-fill the "free" queue when the available buffers are
> > > less than half.
> > >
> > > Each peer have these variables in the per-socket state:
> > >/* local vars */
> > >buf_alloc/* max bytes usable by this socket
> > >[exposed to the other peer] */
> > >fwd_cnt  /* increased when RX packet is consumed by the
> > >user space [exposed to the other peer] */
> > >tx_cnt /* increased when TX packet is sent to the 
> > > other peer */
> > >
> > >/* remote vars  */
> > >peer_buf_alloc   /* peer's buf_alloc */
> > >peer_fwd_cnt /* peer's fwd_cnt */
> > >
> > > When a peer sends a packet, it increases the 'tx_cnt'; when the
> > > receiver consumes the packet (copy it to the user-space buffer), it
> > > increases the 'fwd_cnt'.
> > > Note: increments are made considering the payload length and not the
> > > buffer length.
> > >
> > > The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
> > > all packet headers or with an explicit CREDIT_UPDATE packet.
> > >
> > > The local 'buf_alloc' value can be modified by the user space using
> > > setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.
> > >
> > > Before to send a packet, the peer checks the space available:
> > >   credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
> > > and it will send up to credit_available bytes to the other peer.
> > >
> > > Possible solutions considering Michael's advice:
> > > 1. Use the buffer length instead of the payload length when we increment
> > >the counters:
> > >   - This approach will account precisely the memory used per socket.
> > >   - This requires changes in both guest and host.
> > >   - It is not compatible with old drivers, so a feature should be 
> > > negotiated.
> > > 2. Decrease the advertised 'buf_alloc' taking count of bytes queued in
> > >the socket queue but not used. (e.g. 256 byte used on 4K available in
> > >the buffer)
> > >   - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
> > >   - This should be compatible also with old drivers.
> > >
> > > Maybe the second is less invasive, but will it be too tricky?
> > > Any other advice or suggestions?
> > >
> > > Thanks in advance,
> > > Stefano
> >
> > OK let me try to clarify.  The idea is this:
> >
> > Let's say we queue a buffer of 4K, and we copy if len < 128 bytes.  This
> > means that in the worst case (128 byte packets), each byte of credit in
> > the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need
> > to also account for the 

[PATCH RFC v1 2/2] vhost: batching fetches

2019-10-11 Thread Michael S. Tsirkin
With this patch applied, new and old code perform identically.

Lots of extra optimizations are now possible, e.g.
we can fetch multiple heads with copy_from/to_user now.
We can get rid of maintaining the log array.  Etc etc.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/test.c  |  2 +-
 drivers/vhost/vhost.c | 50 ---
 drivers/vhost/vhost.h |  4 +++-
 3 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 39a018a7af2d..e3a8e9db22cd 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -128,7 +128,7 @@ static int vhost_test_open(struct inode *inode, struct file 
*f)
dev = >dev;
vqs[VHOST_TEST_VQ] = >vqs[VHOST_TEST_VQ];
n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
-   vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV,
+   vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV + 64,
   VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT);
 
f->private_data = n;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 36661d6cb51f..aa383e847865 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -302,6 +302,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 {
vq->num = 1;
vq->ndescs = 0;
+   vq->first_desc = 0;
vq->desc = NULL;
vq->avail = NULL;
vq->used = NULL;
@@ -390,6 +391,7 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
for (i = 0; i < dev->nvqs; ++i) {
vq = dev->vqs[i];
vq->max_descs = dev->iov_limit;
+   vq->batch_descs = dev->iov_limit - UIO_MAXIOV;
vq->descs = kmalloc_array(vq->max_descs,
  sizeof(*vq->descs),
  GFP_KERNEL);
@@ -2366,6 +2368,8 @@ static void pop_split_desc(struct vhost_virtqueue *vq)
--vq->ndescs;
 }
 
+#define VHOST_DESC_FLAGS (VRING_DESC_F_INDIRECT | VRING_DESC_F_WRITE | \
+ VRING_DESC_F_NEXT)
 static int push_split_desc(struct vhost_virtqueue *vq, struct vring_desc 
*desc, u16 id)
 {
struct vhost_desc *h;
@@ -2375,7 +2379,7 @@ static int push_split_desc(struct vhost_virtqueue *vq, 
struct vring_desc *desc,
h = >descs[vq->ndescs++];
h->addr = vhost64_to_cpu(vq, desc->addr);
h->len = vhost32_to_cpu(vq, desc->len);
-   h->flags = vhost16_to_cpu(vq, desc->flags);
+   h->flags = vhost16_to_cpu(vq, desc->flags) & VHOST_DESC_FLAGS;
h->id = id;
 
return 0;
@@ -2450,7 +2454,7 @@ static int fetch_indirect_descs(struct vhost_virtqueue 
*vq,
return 0;
 }
 
-static int fetch_descs(struct vhost_virtqueue *vq)
+static int fetch_buf(struct vhost_virtqueue *vq)
 {
struct vring_desc desc;
unsigned int i, head, found = 0;
@@ -2462,7 +2466,11 @@ static int fetch_descs(struct vhost_virtqueue *vq)
/* Check it isn't doing very strange things with descriptor numbers. */
last_avail_idx = vq->last_avail_idx;
 
-   if (vq->avail_idx == vq->last_avail_idx) {
+   if (unlikely(vq->avail_idx == vq->last_avail_idx)) {
+   /* If we already have work to do, don't bother re-checking. */
+   if (likely(vq->ndescs))
+   return vq->num;
+
if (unlikely(vhost_get_avail_idx(vq, _idx))) {
vq_err(vq, "Failed to access avail idx at %p\n",
>avail->idx);
@@ -2541,6 +2549,24 @@ static int fetch_descs(struct vhost_virtqueue *vq)
return 0;
 }
 
+static int fetch_descs(struct vhost_virtqueue *vq)
+{
+   int ret = 0;
+
+   if (unlikely(vq->first_desc >= vq->ndescs)) {
+   vq->first_desc = 0;
+   vq->ndescs = 0;
+   }
+
+   if (vq->ndescs)
+   return 0;
+
+   while (!ret && vq->ndescs <= vq->batch_descs)
+   ret = fetch_buf(vq);
+
+   return vq->ndescs ? 0 : ret;
+}
+
 /* This looks in the virtqueue and for the first available buffer, and converts
  * it to an iovec for convenient access.  Since descriptors consist of some
  * number of output then some number of input descriptors, it's actually two
@@ -2562,6 +2588,8 @@ int vhost_get_vq_desc_batch(struct vhost_virtqueue *vq,
if (ret)
return ret;
 
+   /* Note: indirect descriptors are not batched */
+   /* TODO: batch up to a limit */
last = peek_split_desc(vq);
id = last->id;
 
@@ -2584,12 +2612,12 @@ int vhost_get_vq_desc_batch(struct vhost_virtqueue *vq,
if (unlikely(log))
*log_num = 0;
 
-   for (i = 0; i < vq->ndescs; ++i) {
+   for (i = vq->first_desc; i < vq->ndescs; ++i) {
unsigned iov_count = *in_num + *out_num;
struct vhost_desc *desc = >descs[i];
int access;
 
-   if (desc->flags & 

[PATCH RFC v1 0/2] vhost: ring format independence

2019-10-11 Thread Michael S. Tsirkin
So the idea is as follows: we convert descriptors to an
independent format first, and process that converting to
iov later.

The point is that we have a tight loop that fetches
descriptors, which is good for cache utilization.
This will also allow all kind of batching tricks -
e.g. it seems possible to keep SMAP disabled while
we are fetching multiple descriptors.

And perhaps more importantly, this is a very good fit for the packed
ring layout, where we get and put descriptors in order.

This patchset seems to already perform exactly the same as the original
code already based on a microbenchmark.  More testing would be very much
appreciated.

Biggest TODO before this first step is ready to go in is to
batch indirect descriptors as well.

Integrating into vhost-net is basically
s/vhost_get_vq_desc/vhost_get_vq_desc_batch/ -
or add a module parameter like I did in the test module.



Michael S. Tsirkin (2):
  vhost: option to fetch descriptors through an independent struct
  vhost: batching fetches

 drivers/vhost/test.c  |  19 ++-
 drivers/vhost/vhost.c | 333 +-
 drivers/vhost/vhost.h |  20 ++-
 3 files changed, 365 insertions(+), 7 deletions(-)

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v1 1/2] vhost: option to fetch descriptors through an independent struct

2019-10-11 Thread Michael S. Tsirkin
The idea is to support multiple ring formats by converting
to a format-independent array of descriptors.

This costs extra cycles, but we gain in ability
to fetch a batch of descriptors in one go, which
is good for code cache locality.

To simplify benchmarking, I kept the old code
around so one can switch back and forth by
writing into a module parameter.
This will go away in the final submission.

This patch causes a minor performance degradation,
it's been kept as simple as possible for ease of review.
Next patch gets us back the performance by adding batching.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/test.c  |  17 ++-
 drivers/vhost/vhost.c | 299 +-
 drivers/vhost/vhost.h |  16 +++
 3 files changed, 327 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 056308008288..39a018a7af2d 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -18,6 +18,9 @@
 #include "test.h"
 #include "vhost.h"
 
+static int newcode = 0;
+module_param(newcode, int, 0644);
+
 /* Max number of bytes transferred before requeueing the job.
  * Using this limit prevents one virtqueue from starving others. */
 #define VHOST_TEST_WEIGHT 0x8
@@ -58,10 +61,16 @@ static void handle_vq(struct vhost_test *n)
vhost_disable_notify(>dev, vq);
 
for (;;) {
-   head = vhost_get_vq_desc(vq, vq->iov,
-ARRAY_SIZE(vq->iov),
-, ,
-NULL, NULL);
+   if (newcode)
+   head = vhost_get_vq_desc_batch(vq, vq->iov,
+  ARRAY_SIZE(vq->iov),
+  , ,
+  NULL, NULL);
+   else
+   head = vhost_get_vq_desc(vq, vq->iov,
+ARRAY_SIZE(vq->iov),
+, ,
+NULL, NULL);
/* On error, stop handling until the next kick. */
if (unlikely(head < 0))
break;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 36ca2cf419bf..36661d6cb51f 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -301,6 +301,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
   struct vhost_virtqueue *vq)
 {
vq->num = 1;
+   vq->ndescs = 0;
vq->desc = NULL;
vq->avail = NULL;
vq->used = NULL;
@@ -369,6 +370,9 @@ static int vhost_worker(void *data)
 
 static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq)
 {
+   kfree(vq->descs);
+   vq->descs = NULL;
+   vq->max_descs = 0;
kfree(vq->indirect);
vq->indirect = NULL;
kfree(vq->log);
@@ -385,6 +389,10 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
 
for (i = 0; i < dev->nvqs; ++i) {
vq = dev->vqs[i];
+   vq->max_descs = dev->iov_limit;
+   vq->descs = kmalloc_array(vq->max_descs,
+ sizeof(*vq->descs),
+ GFP_KERNEL);
vq->indirect = kmalloc_array(UIO_MAXIOV,
 sizeof(*vq->indirect),
 GFP_KERNEL);
@@ -392,7 +400,7 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
GFP_KERNEL);
vq->heads = kmalloc_array(dev->iov_limit, sizeof(*vq->heads),
  GFP_KERNEL);
-   if (!vq->indirect || !vq->log || !vq->heads)
+   if (!vq->indirect || !vq->log || !vq->heads || !vq->descs)
goto err_nomem;
}
return 0;
@@ -2346,6 +2354,295 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
 
+static struct vhost_desc *peek_split_desc(struct vhost_virtqueue *vq)
+{
+   BUG_ON(!vq->ndescs);
+   return >descs[vq->ndescs - 1];
+}
+
+static void pop_split_desc(struct vhost_virtqueue *vq)
+{
+   BUG_ON(!vq->ndescs);
+   --vq->ndescs;
+}
+
+static int push_split_desc(struct vhost_virtqueue *vq, struct vring_desc 
*desc, u16 id)
+{
+   struct vhost_desc *h;
+
+   if (unlikely(vq->ndescs >= vq->max_descs))
+   return -EINVAL;
+   h = >descs[vq->ndescs++];
+   h->addr = vhost64_to_cpu(vq, desc->addr);
+   h->len = vhost32_to_cpu(vq, desc->len);
+   h->flags = vhost16_to_cpu(vq, desc->flags);
+   h->id = id;
+
+   return 0;
+}
+
+static int fetch_indirect_descs(struct vhost_virtqueue *vq,
+   struct vhost_desc *indirect,
+   u16 head)
+{
+   

Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

2019-10-11 Thread Stefano Garzarella
On Sun, Sep 1, 2019 at 8:56 AM Michael S. Tsirkin  wrote:
> On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > and pushed to the guest using the vring, are directly queued in
> > > > a per-socket list. These buffers are preallocated by the guest
> > > > with a fixed size (4 KB).
> > > >
> > > > The maximum amount of memory used by each socket should be
> > > > controlled by the credit mechanism.
> > > > The default credit available per-socket is 256 KB, but if we use
> > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > to avoid starvation of other sockets.
> > > >
> > > > This patch mitigates this issue copying the payload of small
> > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > order to avoid wasting memory.
> > > >
> > > > Reviewed-by: Stefan Hajnoczi 
> > > > Signed-off-by: Stefano Garzarella 
> > >
> > > This is good enough for net-next, but for net I think we
> > > should figure out how to address the issue completely.
> > > Can we make the accounting precise? What happens to
> > > performance if we do?
> > >
> >
> > Since I'm back from holidays, I'm restarting this thread to figure out
> > how to address the issue completely.
> >
> > I did a better analysis of the credit mechanism that we implemented in
> > virtio-vsock to get a clearer view and I'd share it with you:
> >
> > This issue affect only the "host->guest" path. In this case, when the
> > host wants to send a packet to the guest, it uses a "free" buffer
> > allocated by the guest (4KB).
> > The "free" buffers available for the host are shared between all
> > sockets, instead, the credit mechanism is per-socket, I think to
> > avoid the starvation of others sockets.
> > The guests re-fill the "free" queue when the available buffers are
> > less than half.
> >
> > Each peer have these variables in the per-socket state:
> >/* local vars */
> >buf_alloc/* max bytes usable by this socket
> >[exposed to the other peer] */
> >fwd_cnt  /* increased when RX packet is consumed by the
> >user space [exposed to the other peer] */
> >tx_cnt /* increased when TX packet is sent to the 
> > other peer */
> >
> >/* remote vars  */
> >peer_buf_alloc   /* peer's buf_alloc */
> >peer_fwd_cnt /* peer's fwd_cnt */
> >
> > When a peer sends a packet, it increases the 'tx_cnt'; when the
> > receiver consumes the packet (copy it to the user-space buffer), it
> > increases the 'fwd_cnt'.
> > Note: increments are made considering the payload length and not the
> > buffer length.
> >
> > The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
> > all packet headers or with an explicit CREDIT_UPDATE packet.
> >
> > The local 'buf_alloc' value can be modified by the user space using
> > setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.
> >
> > Before to send a packet, the peer checks the space available:
> >   credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
> > and it will send up to credit_available bytes to the other peer.
> >
> > Possible solutions considering Michael's advice:
> > 1. Use the buffer length instead of the payload length when we increment
> >the counters:
> >   - This approach will account precisely the memory used per socket.
> >   - This requires changes in both guest and host.
> >   - It is not compatible with old drivers, so a feature should be 
> > negotiated.
> > 2. Decrease the advertised 'buf_alloc' taking count of bytes queued in
> >the socket queue but not used. (e.g. 256 byte used on 4K available in
> >the buffer)
> >   - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
> >   - This should be compatible also with old drivers.
> >
> > Maybe the second is less invasive, but will it be too tricky?
> > Any other advice or suggestions?
> >
> > Thanks in advance,
> > Stefano
>
> OK let me try to clarify.  The idea is this:
>
> Let's say we queue a buffer of 4K, and we copy if len < 128 bytes.  This
> means that in the worst case (128 byte packets), each byte of credit in
> the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need
> to also account for the virtio_vsock_pkt since I think it's kept around
> until userspace consumes it.
>
> Thus given X buf alloc allowed in the socket, we should publish X/16
> credits to the other side. This will ensure the other side does not send
> more than X/16 bytes for a given socket and 

[PATCH net 1/2] vsock: add half-closed socket details in the implementation notes

2019-10-11 Thread Stefano Garzarella
vmci_transport never allowed half-closed socket on the host side.
Since we want to have the same behaviour across all transports, we
add a section in the "Implementation notes".

Cc: Jorgen Hansen 
Cc: Adit Ranadive 
Signed-off-by: Stefano Garzarella 
---
 net/vmw_vsock/af_vsock.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 2ab43b2bba31..27df57c2024b 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -83,6 +83,10 @@
  *   TCP_ESTABLISHED - connected
  *   TCP_CLOSING - disconnecting
  *   TCP_LISTEN - listening
+ *
+ * - Half-closed socket is supported only on the guest side. recv() on the host
+ * side should return EOF when the guest closes a connection, also if some
+ * data is still in the receive queue.
  */
 
 #include 
-- 
2.21.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH net 0/2] vsock: don't allow half-closed socket in the host transports

2019-10-11 Thread Stefano Garzarella
We are implementing a test suite for the VSOCK sockets and we discovered
that vmci_transport never allowed half-closed socket on the host side.

As Jorgen explained [1] this is due to the implementation of VMCI.

Since we want to have the same behaviour across all transports, this
series adds a section in the "Implementation notes" to exaplain this
behaviour, and changes the vhost_transport to behave the same way.

[1] https://patchwork.ozlabs.org/cover/847998/#1831400

Stefano Garzarella (2):
  vsock: add half-closed socket details in the implementation notes
  vhost/vsock: don't allow half-closed socket in the host

 drivers/vhost/vsock.c| 17 -
 net/vmw_vsock/af_vsock.c |  4 
 2 files changed, 20 insertions(+), 1 deletion(-)

-- 
2.21.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH net 2/2] vhost/vsock: don't allow half-closed socket in the host

2019-10-11 Thread Stefano Garzarella
vmci_transport never allowed half-closed socket on the host side.
In order to provide the same behaviour, we changed the
vhost_transport_stream_has_data() to return 0 (no data available)
if the peer (guest) closed the connection.

Signed-off-by: Stefano Garzarella 
---
 drivers/vhost/vsock.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 9f57736fe15e..754120aa4478 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -58,6 +58,21 @@ static u32 vhost_transport_get_local_cid(void)
return VHOST_VSOCK_DEFAULT_HOST_CID;
 }
 
+static s64 vhost_transport_stream_has_data(struct vsock_sock *vsk)
+{
+   /* vmci_transport doesn't allow half-closed socket on the host side.
+* recv() on the host side returns EOF when the guest closes a
+* connection, also if some data is still in the receive queue.
+*
+* In order to provide the same behaviour, we always return 0
+* (no data available) if the peer (guest) closed the connection.
+*/
+   if (vsk->peer_shutdown == SHUTDOWN_MASK)
+   return 0;
+
+   return virtio_transport_stream_has_data(vsk);
+}
+
 /* Callers that dereference the return value must hold vhost_vsock_mutex or the
  * RCU read lock.
  */
@@ -804,7 +819,7 @@ static struct virtio_transport vhost_transport = {
 
.stream_enqueue   = virtio_transport_stream_enqueue,
.stream_dequeue   = virtio_transport_stream_dequeue,
-   .stream_has_data  = virtio_transport_stream_has_data,
+   .stream_has_data  = vhost_transport_stream_has_data,
.stream_has_space = virtio_transport_stream_has_space,
.stream_rcvhiwat  = virtio_transport_stream_rcvhiwat,
.stream_is_active = virtio_transport_stream_is_active,
-- 
2.21.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 07/13] vsock: handle buffer_size sockopts in the core

2019-10-11 Thread Stefano Garzarella
On Fri, Oct 11, 2019 at 09:27:14AM +0100, Stefan Hajnoczi wrote:
> On Thu, Oct 10, 2019 at 11:32:54AM +0200, Stefano Garzarella wrote:
> > On Wed, Oct 09, 2019 at 01:30:26PM +0100, Stefan Hajnoczi wrote:
> > > On Fri, Sep 27, 2019 at 01:26:57PM +0200, Stefano Garzarella wrote:
> > > Another issue is that this patch drops the VIRTIO_VSOCK_MAX_BUF_SIZE
> > > limit that used to be enforced by virtio_transport_set_buffer_size().
> > > Now the limit is only applied at socket init time.  If the buffer size
> > > is changed later then VIRTIO_VSOCK_MAX_BUF_SIZE can be exceeded.  If
> > > that doesn't matter, why even bother with VIRTIO_VSOCK_MAX_BUF_SIZE
> > > here?
> > > 
> > 
> > The .notify_buffer_size() should avoid this issue, since it allows the
> > transport to limit the buffer size requested after the initialization.
> > 
> > But again the min set by the user can not be respected and in the
> > previous implementation we forced it to VIRTIO_VSOCK_MAX_BUF_SIZE.
> > 
> > Now we don't limit the min, but we guarantee only that vsk->buffer_size
> > is lower than VIRTIO_VSOCK_MAX_BUF_SIZE.
> > 
> > Can that be an acceptable compromise?
> 
> I think so.
> 
> Setting buffer sizes was never tested or used much by userspace
> applications that I'm aware of.  We should probably include tests for
> changing buffer sizes in the test suite.

Good idea! We should add a test to check if min/max are respected,
playing a bit with these sockopt.

I'll do it in the test series!

Thanks,
Stefano
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 07/13] vsock: handle buffer_size sockopts in the core

2019-10-11 Thread Stefan Hajnoczi
On Thu, Oct 10, 2019 at 11:32:54AM +0200, Stefano Garzarella wrote:
> On Wed, Oct 09, 2019 at 01:30:26PM +0100, Stefan Hajnoczi wrote:
> > On Fri, Sep 27, 2019 at 01:26:57PM +0200, Stefano Garzarella wrote:
> > Another issue is that this patch drops the VIRTIO_VSOCK_MAX_BUF_SIZE
> > limit that used to be enforced by virtio_transport_set_buffer_size().
> > Now the limit is only applied at socket init time.  If the buffer size
> > is changed later then VIRTIO_VSOCK_MAX_BUF_SIZE can be exceeded.  If
> > that doesn't matter, why even bother with VIRTIO_VSOCK_MAX_BUF_SIZE
> > here?
> > 
> 
> The .notify_buffer_size() should avoid this issue, since it allows the
> transport to limit the buffer size requested after the initialization.
> 
> But again the min set by the user can not be respected and in the
> previous implementation we forced it to VIRTIO_VSOCK_MAX_BUF_SIZE.
> 
> Now we don't limit the min, but we guarantee only that vsk->buffer_size
> is lower than VIRTIO_VSOCK_MAX_BUF_SIZE.
> 
> Can that be an acceptable compromise?

I think so.

Setting buffer sizes was never tested or used much by userspace
applications that I'm aware of.  We should probably include tests for
changing buffer sizes in the test suite.

Stefan


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH V3 6/7] virtio: introduce a mdev based transport

2019-10-11 Thread Jason Wang
This patch introduces a new mdev transport for virtio. This is used to
use kernel virtio driver to drive the mediated device that is capable
of populating virtqueue directly.

A new virtio-mdev driver will be registered to the mdev bus, when a
new virtio-mdev device is probed, it will register the device with
mdev based config ops. This means it is a software transport between
mdev driver and mdev device. The transport was implemented through
device specific opswhich is a part of mdev_parent_ops now.

Signed-off-by: Jason Wang 
---
 drivers/virtio/Kconfig   |   7 +
 drivers/virtio/Makefile  |   1 +
 drivers/virtio/virtio_mdev.c | 416 +++
 3 files changed, 424 insertions(+)
 create mode 100644 drivers/virtio/virtio_mdev.c

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 078615cf2afc..8d18722ab572 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -43,6 +43,13 @@ config VIRTIO_PCI_LEGACY
 
  If unsure, say Y.
 
+config VIRTIO_MDEV_DEVICE
+   tristate "VIRTIO driver for Mediated devices"
+   depends on VFIO_MDEV && VIRTIO
+   default n
+   help
+ VIRTIO based driver for Mediated devices.
+
 config VIRTIO_PMEM
tristate "Support for virtio pmem driver"
depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 3a2b5c5dcf46..ebc7fa15ae82 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
 obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
+obj-$(CONFIG_VIRTIO_MDEV_DEVICE) += virtio_mdev.o
diff --git a/drivers/virtio/virtio_mdev.c b/drivers/virtio/virtio_mdev.c
new file mode 100644
index ..8516f3f0f658
--- /dev/null
+++ b/drivers/virtio/virtio_mdev.c
@@ -0,0 +1,416 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VIRTIO based driver for Mediated device
+ *
+ * Copyright (c) 2019, Red Hat. All rights reserved.
+ * Author: Jason Wang 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Red Hat Corporation"
+#define DRIVER_DESC "VIRTIO based driver for Mediated device"
+
+#define to_virtio_mdev_device(dev) \
+   container_of(dev, struct virtio_mdev_device, vdev)
+
+struct virtio_mdev_device {
+   struct virtio_device vdev;
+   struct mdev_device *mdev;
+   unsigned long version;
+
+   struct virtqueue **vqs;
+   /* The lock to protect virtqueue list */
+   spinlock_t lock;
+   struct list_head virtqueues;
+};
+
+struct virtio_mdev_vq_info {
+   /* the actual virtqueue */
+   struct virtqueue *vq;
+
+   /* the list node for the virtqueues list */
+   struct list_head node;
+};
+
+static struct mdev_device *vm_get_mdev(struct virtio_device *vdev)
+{
+   struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+   struct mdev_device *mdev = vm_dev->mdev;
+
+   return mdev;
+}
+
+static void virtio_mdev_get(struct virtio_device *vdev, unsigned offset,
+   void *buf, unsigned len)
+{
+   struct mdev_device *mdev = vm_get_mdev(vdev);
+   const struct virtio_mdev_device_ops *ops = mdev_get_dev_ops(mdev);
+
+   ops->get_config(mdev, offset, buf, len);
+}
+
+static void virtio_mdev_set(struct virtio_device *vdev, unsigned offset,
+   const void *buf, unsigned len)
+{
+   struct mdev_device *mdev = vm_get_mdev(vdev);
+   const struct virtio_mdev_device_ops *ops = mdev_get_dev_ops(mdev);
+
+   ops->set_config(mdev, offset, buf, len);
+}
+
+static u32 virtio_mdev_generation(struct virtio_device *vdev)
+{
+   struct mdev_device *mdev = vm_get_mdev(vdev);
+   const struct virtio_mdev_device_ops *ops = mdev_get_dev_ops(mdev);
+
+   return ops->get_generation(mdev);
+}
+
+static u8 virtio_mdev_get_status(struct virtio_device *vdev)
+{
+   struct mdev_device *mdev = vm_get_mdev(vdev);
+   const struct virtio_mdev_device_ops *ops = mdev_get_dev_ops(mdev);
+
+   return ops->get_status(mdev);
+}
+
+static void virtio_mdev_set_status(struct virtio_device *vdev, u8 status)
+{
+   struct mdev_device *mdev = vm_get_mdev(vdev);
+   const struct virtio_mdev_device_ops *ops = mdev_get_dev_ops(mdev);
+
+   return ops->set_status(mdev, status);
+}
+
+static void virtio_mdev_reset(struct virtio_device *vdev)
+{
+   struct mdev_device *mdev = vm_get_mdev(vdev);
+   const struct virtio_mdev_device_ops *ops = mdev_get_dev_ops(mdev);
+
+   return ops->set_status(mdev, 0);
+}
+
+static bool virtio_mdev_notify(struct virtqueue *vq)
+{
+   struct mdev_device *mdev = vm_get_mdev(vq->vdev);
+   const struct virtio_mdev_device_ops *ops = mdev_get_dev_ops(mdev);

[PATCH V3 7/7] docs: sample driver to demonstrate how to implement virtio-mdev framework

2019-10-11 Thread Jason Wang
This sample driver creates mdev device that simulate virtio net device
over virtio mdev transport. The device is implemented through vringh
and workqueue. A device specific dma ops is to make sure HVA is used
directly as the IOVA. This should be sufficient for kernel virtio
driver to work.

Only 'virtio' type is supported right now. I plan to add 'vhost' type
on top which requires some virtual IOMMU implemented in this sample
driver.

Signed-off-by: Jason Wang 
---
 MAINTAINERS|   1 +
 samples/Kconfig|   7 +
 samples/vfio-mdev/Makefile |   1 +
 samples/vfio-mdev/mvnet.c  | 691 +
 4 files changed, 700 insertions(+)
 create mode 100644 samples/vfio-mdev/mvnet.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 3d196a023b5e..cb51351cd5c9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17254,6 +17254,7 @@ F:  include/linux/virtio*.h
 F: include/uapi/linux/virtio_*.h
 F: drivers/crypto/virtio/
 F: mm/balloon_compaction.c
+F: samples/vfio-mdev/mvnet.c
 
 VIRTIO BLOCK AND SCSI DRIVERS
 M: "Michael S. Tsirkin" 
diff --git a/samples/Kconfig b/samples/Kconfig
index c8dacb4dda80..a1a1ca2c00b7 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -131,6 +131,13 @@ config SAMPLE_VFIO_MDEV_MDPY
  mediated device.  It is a simple framebuffer and supports
  the region display interface (VFIO_GFX_PLANE_TYPE_REGION).
 
+config SAMPLE_VIRTIO_MDEV_NET
+tristate "Build virtio mdev net example mediated device sample code -- 
loadable modules only"
+   depends on VIRTIO_MDEV_DEVICE && VHOST_RING && m
+   help
+ Build a networking sample device for use as a virtio
+ mediated device.
+
 config SAMPLE_VFIO_MDEV_MDPY_FB
tristate "Build VFIO mdpy example guest fbdev driver -- loadable module 
only"
depends on FB && m
diff --git a/samples/vfio-mdev/Makefile b/samples/vfio-mdev/Makefile
index 10d179c4fdeb..f34af90ed0a0 100644
--- a/samples/vfio-mdev/Makefile
+++ b/samples/vfio-mdev/Makefile
@@ -3,3 +3,4 @@ obj-$(CONFIG_SAMPLE_VFIO_MDEV_MTTY) += mtty.o
 obj-$(CONFIG_SAMPLE_VFIO_MDEV_MDPY) += mdpy.o
 obj-$(CONFIG_SAMPLE_VFIO_MDEV_MDPY_FB) += mdpy-fb.o
 obj-$(CONFIG_SAMPLE_VFIO_MDEV_MBOCHS) += mbochs.o
+obj-$(CONFIG_SAMPLE_VIRTIO_MDEV_NET) += mvnet.o
diff --git a/samples/vfio-mdev/mvnet.c b/samples/vfio-mdev/mvnet.c
new file mode 100644
index ..b218e7075611
--- /dev/null
+++ b/samples/vfio-mdev/mvnet.c
@@ -0,0 +1,691 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Mediated virtual virtio-net device driver.
+ *
+ * Copyright (c) 2019, Red Hat Inc. All rights reserved.
+ * Author: Jason Wang 
+ *
+ * Sample driver that creates mdev device that simulates ethernet loopback
+ * device.
+ *
+ * Usage:
+ *
+ * # modprobe virtio_mdev
+ * # modprobe mvnet
+ * # cd /sys/devices/virtual/mvnet/mvnet/mdev_supported_types/mvnet-virtio
+ * # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > ./create
+ * # cd devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001
+ * # ls -d virtio0
+ * virtio0
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define VERSION_STRING  "0.1"
+#define DRIVER_AUTHOR   "Red Hat Corporation"
+
+#define MVNET_CLASS_NAME "mvnet"
+#define MVNET_NAME   "mvnet"
+
+/*
+ * Global Structures
+ */
+
+static struct mvnet_dev {
+   struct class*vd_class;
+   struct idr  vd_idr;
+   struct device   dev;
+} mvnet_dev;
+
+struct mvnet_virtqueue {
+   struct vringh vring;
+   struct vringh_kiov iov;
+   unsigned short head;
+   bool ready;
+   u64 desc_addr;
+   u64 device_addr;
+   u64 driver_addr;
+   u32 num;
+   void *private;
+   irqreturn_t (*cb)(void *data);
+};
+
+#define MVNET_QUEUE_ALIGN PAGE_SIZE
+#define MVNET_QUEUE_MAX 256
+#define MVNET_DEVICE_ID 0x1
+#define MVNET_VENDOR_ID 0
+
+u64 mvnet_features = (1ULL << VIRTIO_F_ANY_LAYOUT) |
+(1ULL << VIRTIO_F_VERSION_1) |
+(1ULL << VIRTIO_F_IOMMU_PLATFORM);
+
+/* State of each mdev device */
+struct mvnet_state {
+   struct mvnet_virtqueue vqs[2];
+   struct work_struct work;
+   spinlock_t lock;
+   struct mdev_device *mdev;
+   struct virtio_net_config config;
+   void *buffer;
+   u32 status;
+   u32 generation;
+   u64 features;
+   struct list_head next;
+};
+
+static struct mutex mdev_list_lock;
+static struct list_head mdev_devices_list;
+
+static void mvnet_queue_ready(struct mvnet_state *mvnet, unsigned int idx)
+{
+   struct mvnet_virtqueue *vq = >vqs[idx];
+   int ret;
+
+   ret = vringh_init_kern(>vring, mvnet_features, MVNET_QUEUE_MAX,
+  false, (struct vring_desc *)vq->desc_addr,
+  (struct vring_avail *)vq->driver_addr,
+  

[PATCH V3 3/7] modpost: add support for mdev class id

2019-10-11 Thread Jason Wang
Add support to parse mdev class id table.

Signed-off-by: Jason Wang 
---
 drivers/vfio/mdev/vfio_mdev.c |  2 ++
 scripts/mod/devicetable-offsets.c |  3 +++
 scripts/mod/file2alias.c  | 10 ++
 3 files changed, 15 insertions(+)

diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
index fd2a4d9a3f26..891cf83a2d9a 100644
--- a/drivers/vfio/mdev/vfio_mdev.c
+++ b/drivers/vfio/mdev/vfio_mdev.c
@@ -125,6 +125,8 @@ static struct mdev_class_id id_table[] = {
{ 0 },
 };
 
+MODULE_DEVICE_TABLE(mdev, id_table);
+
 static struct mdev_driver vfio_mdev_driver = {
.name   = "vfio_mdev",
.probe  = vfio_mdev_probe,
diff --git a/scripts/mod/devicetable-offsets.c 
b/scripts/mod/devicetable-offsets.c
index 054405b90ba4..6cbb1062488a 100644
--- a/scripts/mod/devicetable-offsets.c
+++ b/scripts/mod/devicetable-offsets.c
@@ -231,5 +231,8 @@ int main(void)
DEVID(wmi_device_id);
DEVID_FIELD(wmi_device_id, guid_string);
 
+   DEVID(mdev_class_id);
+   DEVID_FIELD(mdev_class_id, id);
+
return 0;
 }
diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
index c91eba751804..d365dfe7c718 100644
--- a/scripts/mod/file2alias.c
+++ b/scripts/mod/file2alias.c
@@ -1335,6 +1335,15 @@ static int do_wmi_entry(const char *filename, void 
*symval, char *alias)
return 1;
 }
 
+/* looks like: "mdev:cN" */
+static int do_mdev_entry(const char *filename, void *symval, char *alias)
+{
+   DEF_FIELD(symval, mdev_class_id, id);
+
+   sprintf(alias, "mdev:c%02X", id);
+   return 1;
+}
+
 /* Does namelen bytes of name exactly match the symbol? */
 static bool sym_is(const char *name, unsigned namelen, const char *symbol)
 {
@@ -1407,6 +1416,7 @@ static const struct devtable devtable[] = {
{"typec", SIZE_typec_device_id, do_typec_entry},
{"tee", SIZE_tee_client_device_id, do_tee_entry},
{"wmi", SIZE_wmi_device_id, do_wmi_entry},
+   {"mdev", SIZE_mdev_class_id, do_mdev_entry},
 };
 
 /* Create MODULE_ALIAS() statements.
-- 
2.19.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V3 4/7] mdev: introduce device specific ops

2019-10-11 Thread Jason Wang
Currently, except for the create and remove, the rest of
mdev_parent_ops is designed for vfio-mdev driver only and may not help
for kernel mdev driver. With the help of class id, this patch
introduces device specific callbacks inside mdev_device
structure. This allows different set of callback to be used by
vfio-mdev and virtio-mdev.

Signed-off-by: Jason Wang 
---
 .../driver-api/vfio-mediated-device.rst   | 22 +---
 MAINTAINERS   |  1 +
 drivers/gpu/drm/i915/gvt/kvmgt.c  | 18 ---
 drivers/s390/cio/vfio_ccw_ops.c   | 18 ---
 drivers/s390/crypto/vfio_ap_ops.c | 14 +++--
 drivers/vfio/mdev/mdev_core.c |  9 +++-
 drivers/vfio/mdev/mdev_private.h  |  1 +
 drivers/vfio/mdev/vfio_mdev.c | 37 ++---
 include/linux/mdev.h  | 42 +++
 include/linux/vfio_mdev.h | 52 +++
 samples/vfio-mdev/mbochs.c| 20 ---
 samples/vfio-mdev/mdpy.c  | 21 +---
 samples/vfio-mdev/mtty.c  | 18 ---
 13 files changed, 177 insertions(+), 96 deletions(-)
 create mode 100644 include/linux/vfio_mdev.h

diff --git a/Documentation/driver-api/vfio-mediated-device.rst 
b/Documentation/driver-api/vfio-mediated-device.rst
index 2035e48da7b2..553574ebba73 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -152,11 +152,20 @@ callbacks per mdev parent device, per mdev type, or any 
other categorization.
 Vendor drivers are expected to be fully asynchronous in this respect or
 provide their own internal resource protection.)
 
-The callbacks in the mdev_parent_ops structure are as follows:
+In order to support multiple types of device/driver, device needs to
+provide both class_id and device_ops through:
 
-* open: open callback of mediated device
-* close: close callback of mediated device
-* ioctl: ioctl callback of mediated device
+void mdev_set_class(struct mdev_device *mdev, u16 id, const void *ops);
+
+The class_id is used to be paired with ids in id_table in mdev_driver
+structure for probing the correct driver. The device_ops is device
+specific callbacks which can be get through mdev_get_dev_ops()
+function by mdev bus driver. For vfio-mdev device, its device specific
+ops are as follows:
+
+* open: open callback of vfio mediated device
+* close: close callback of vfio mediated device
+* ioctl: ioctl callback of vfio mediated device
 * read : read emulation callback
 * write: write emulation callback
 * mmap: mmap emulation callback
@@ -167,9 +176,10 @@ register itself with the mdev core driver::
extern int  mdev_register_device(struct device *dev,
 const struct mdev_parent_ops *ops);
 
-It is also required to specify the class_id through::
+It is also required to specify the class_id and device specific ops through::
 
-   extern int mdev_set_class(struct device *dev, u16 id);
+   extern int mdev_set_class(struct device *dev, u16 id,
+ const void *ops);
 
 However, the mdev_parent_ops structure is not required in the function call
 that a driver should use to unregister itself with the mdev core driver::
diff --git a/MAINTAINERS b/MAINTAINERS
index 8824f61cd2c0..3d196a023b5e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17127,6 +17127,7 @@ S:  Maintained
 F: Documentation/driver-api/vfio-mediated-device.rst
 F: drivers/vfio/mdev/
 F: include/linux/mdev.h
+F: include/linux/vfio_mdev.h
 F: samples/vfio-mdev/
 
 VFIO PLATFORM DRIVER
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 17e9d4634c84..7e2b720074fd 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -643,6 +644,8 @@ static void kvmgt_put_vfio_device(void *vgpu)
vfio_device_put(((struct intel_vgpu *)vgpu)->vdev.vfio_device);
 }
 
+static const struct vfio_mdev_device_ops intel_vfio_vgpu_dev_ops;
+
 static int intel_vgpu_create(struct kobject *kobj, struct mdev_device *mdev)
 {
struct intel_vgpu *vgpu = NULL;
@@ -678,7 +681,7 @@ static int intel_vgpu_create(struct kobject *kobj, struct 
mdev_device *mdev)
 dev_name(mdev_dev(mdev)));
ret = 0;
 
-   mdev_set_class(mdev, MDEV_ID_VFIO);
+   mdev_set_class(mdev, MDEV_ID_VFIO, _vfio_vgpu_dev_ops);
 out:
return ret;
 }
@@ -1599,20 +1602,21 @@ static const struct attribute_group 
*intel_vgpu_groups[] = {
NULL,
 };
 
-static struct mdev_parent_ops intel_vgpu_ops = {
-   .mdev_attr_groups   = intel_vgpu_groups,
-   .create = intel_vgpu_create,
-   .remove = intel_vgpu_remove,
-
+static const struct vfio_mdev_device_ops 

[PATCH V3 5/7] mdev: introduce virtio device and its device ops

2019-10-11 Thread Jason Wang
This patch implements basic support for mdev driver that supports
virtio transport for kernel virtio driver.

Signed-off-by: Jason Wang 
---
 include/linux/mdev.h|   1 +
 include/linux/virtio_mdev.h | 148 
 2 files changed, 149 insertions(+)
 create mode 100644 include/linux/virtio_mdev.h

diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index f491308674e5..298581e7ad45 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -124,6 +124,7 @@ struct mdev_device *mdev_from_dev(struct device *dev);
 
 enum {
MDEV_ID_VFIO = 1,
+   MDEV_ID_VIRTIO = 2,
/* New entries must be added here */
 };
 
diff --git a/include/linux/virtio_mdev.h b/include/linux/virtio_mdev.h
new file mode 100644
index ..b39c62534acc
--- /dev/null
+++ b/include/linux/virtio_mdev.h
@@ -0,0 +1,148 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Virtio mediated device driver
+ *
+ * Copyright 2019, Red Hat Corp.
+ * Author: Jason Wang 
+ */
+#ifndef _LINUX_VIRTIO_MDEV_H
+#define _LINUX_VIRTIO_MDEV_H
+
+#include 
+#include 
+#include 
+
+#define VIRTIO_MDEV_DEVICE_API_STRING  "virtio-mdev"
+#define VIRTIO_MDEV_F_VERSION_1 0x1
+
+struct virtio_mdev_callback {
+   irqreturn_t (*callback)(void *data);
+   void *private;
+};
+
+/**
+ * struct vfio_mdev_device_ops - Structure to be registered for each
+ * mdev device to register the device to virtio-mdev module.
+ *
+ * @set_vq_address:Set the address of virtqueue
+ * @mdev: mediated device
+ * @idx: virtqueue index
+ * @desc_area: address of desc area
+ * @driver_area: address of driver area
+ * @device_area: address of device area
+ * Returns integer: success (0) or error (< 0)
+ * @set_vq_num:Set the size of virtqueue
+ * @mdev: mediated device
+ * @idx: virtqueue index
+ * @num: the size of virtqueue
+ * @kick_vq:   Kick the virtqueue
+ * @mdev: mediated device
+ * @idx: virtqueue index
+ * @set_vq_cb: Set the interrut calback function for
+ * a virtqueue
+ * @mdev: mediated device
+ * @idx: virtqueue index
+ * @cb: virtio-mdev interrupt callback structure
+ * @set_vq_ready:  Set ready status for a virtqueue
+ * @mdev: mediated device
+ * @idx: virtqueue index
+ * @ready: ready (true) not ready(false)
+ * @get_vq_ready:  Get ready status for a virtqueue
+ * @mdev: mediated device
+ * @idx: virtqueue index
+ * Returns boolean: ready (true) or not (false)
+ * @set_vq_state:  Set the state for a virtqueue
+ * @mdev: mediated device
+ * @idx: virtqueue index
+ * @state: virtqueue state (last_avail_idx)
+ * Returns integer: success (0) or error (< 0)
+ * @get_vq_state:  Get the state for a virtqueue
+ * @mdev: mediated device
+ * @idx: virtqueue index
+ * Returns virtqueue state (last_avail_idx)
+ * @get_vq_align:  Get the virtqueue align requirement
+ * for the device
+ * @mdev: mediated device
+ * Returns virtqueue algin requirement
+ * @get_features:  Get virtio features supported by the device
+ * @mdev: mediated device
+ * Returns the virtio features support by the
+ * device
+ * @get_features:  Set virtio features supported by the driver
+ * @mdev: mediated device
+ * @features: feature support by the driver
+ * Returns integer: success (0) or error (< 0)
+ * @set_config_cb: Set the config interrupt callback
+ * @mdev: mediated device
+ * @cb: virtio-mdev interrupt callback structure
+ * @get_vq_num_max:Get the max size of virtqueue
+ * @mdev: mediated device
+ * Returns u16: max size of virtqueue
+ * @get_device_id: Get virtio device id
+ * @mdev: mediated device
+ * Returns u32: virtio device id
+ * @get_vendor_id:   

[PATCH V3 2/7] mdev: bus uevent support

2019-10-11 Thread Jason Wang
This patch adds bus uevent support for mdev bus in order to allow
cooperation with userspace.

Signed-off-by: Jason Wang 
---
 drivers/vfio/mdev/mdev_driver.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/vfio/mdev/mdev_driver.c b/drivers/vfio/mdev/mdev_driver.c
index b7c40ce86ee3..319d886ffaf7 100644
--- a/drivers/vfio/mdev/mdev_driver.c
+++ b/drivers/vfio/mdev/mdev_driver.c
@@ -82,9 +82,17 @@ static int mdev_match(struct device *dev, struct 
device_driver *drv)
return 0;
 }
 
+static int mdev_uevent(struct device *dev, struct kobj_uevent_env *env)
+{
+   struct mdev_device *mdev = to_mdev_device(dev);
+
+   return add_uevent_var(env, "MODALIAS=mdev:c%02X", mdev->class_id);
+}
+
 struct bus_type mdev_bus_type = {
.name   = "mdev",
.match  = mdev_match,
+   .uevent = mdev_uevent,
.probe  = mdev_probe,
.remove = mdev_remove,
 };
-- 
2.19.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V3 1/7] mdev: class id support

2019-10-11 Thread Jason Wang
Mdev bus only supports vfio driver right now, so it doesn't implement
match method. But in the future, we may add drivers other than vfio,
the first driver could be virtio-mdev. This means we need to add
device class id support in bus match method to pair the mdev device
and mdev driver correctly.

So this patch adds id_table to mdev_driver and class_id for mdev
device with the match method for mdev bus.

Signed-off-by: Jason Wang 
---
 Documentation/driver-api/vfio-mediated-device.rst |  7 ++-
 drivers/gpu/drm/i915/gvt/kvmgt.c  |  1 +
 drivers/s390/cio/vfio_ccw_ops.c   |  1 +
 drivers/s390/crypto/vfio_ap_ops.c |  1 +
 drivers/vfio/mdev/mdev_core.c | 11 +++
 drivers/vfio/mdev/mdev_driver.c   | 14 ++
 drivers/vfio/mdev/mdev_private.h  |  1 +
 drivers/vfio/mdev/vfio_mdev.c |  6 ++
 include/linux/mdev.h  |  8 
 include/linux/mod_devicetable.h   |  8 
 samples/vfio-mdev/mbochs.c|  1 +
 samples/vfio-mdev/mdpy.c  |  1 +
 samples/vfio-mdev/mtty.c  |  1 +
 13 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/Documentation/driver-api/vfio-mediated-device.rst 
b/Documentation/driver-api/vfio-mediated-device.rst
index 25eb7d5b834b..2035e48da7b2 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -102,12 +102,14 @@ structure to represent a mediated device's driver::
   * @probe: called when new device created
   * @remove: called when device removed
   * @driver: device driver structure
+  * @id_table: the ids serviced by this driver
   */
  struct mdev_driver {
 const char *name;
 int  (*probe)  (struct device *dev);
 void (*remove) (struct device *dev);
 struct device_driverdriver;
+const struct mdev_class_id *id_table;
  };
 
 A mediated bus driver for mdev should use this structure in the function calls
@@ -165,12 +167,15 @@ register itself with the mdev core driver::
extern int  mdev_register_device(struct device *dev,
 const struct mdev_parent_ops *ops);
 
+It is also required to specify the class_id through::
+
+   extern int mdev_set_class(struct device *dev, u16 id);
+
 However, the mdev_parent_ops structure is not required in the function call
 that a driver should use to unregister itself with the mdev core driver::
 
extern void mdev_unregister_device(struct device *dev);
 
-
 Mediated Device Management Interface Through sysfs
 ==
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 343d79c1cb7e..17e9d4634c84 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -678,6 +678,7 @@ static int intel_vgpu_create(struct kobject *kobj, struct 
mdev_device *mdev)
 dev_name(mdev_dev(mdev)));
ret = 0;
 
+   mdev_set_class(mdev, MDEV_ID_VFIO);
 out:
return ret;
 }
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index f0d71ab77c50..b5d223882c6c 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -129,6 +129,7 @@ static int vfio_ccw_mdev_create(struct kobject *kobj, 
struct mdev_device *mdev)
   private->sch->schid.ssid,
   private->sch->schid.sch_no);
 
+   mdev_set_class(mdev, MDEV_ID_VFIO);
return 0;
 }
 
diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 5c0f53c6dde7..47df1c593c35 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -343,6 +343,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct 
mdev_device *mdev)
list_add(_mdev->node, _dev->mdev_list);
mutex_unlock(_dev->lock);
 
+   mdev_set_class(mdev, MDEV_ID_VFIO);
return 0;
 }
 
diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b558d4cfd082..724e9b9841d8 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -45,6 +45,12 @@ void mdev_set_drvdata(struct mdev_device *mdev, void *data)
 }
 EXPORT_SYMBOL(mdev_set_drvdata);
 
+void mdev_set_class(struct mdev_device *mdev, u16 id)
+{
+   mdev->class_id = id;
+}
+EXPORT_SYMBOL(mdev_set_class);
+
 struct device *mdev_dev(struct mdev_device *mdev)
 {
return >dev;
@@ -135,6 +141,7 @@ static int mdev_device_remove_cb(struct device *dev, void 
*data)
  * mdev_register_device : Register a device
  * @dev: device structure representing parent device.
  * @ops: Parent device operation structure to be registered.
+ * @id: class id.
  *
  * Add device to list 

[PATCH V3 0/7] mdev based hardware virtio offloading support

2019-10-11 Thread Jason Wang
Hi all:

There are hardware that can do virtio datapath offloading while having
its own control path. This path tries to implement a mdev based
unified API to support using kernel virtio driver to drive those
devices. This is done by introducing a new mdev transport for virtio
(virtio_mdev) and register itself as a new kind of mdev driver. Then
it provides a unified way for kernel virtio driver to talk with mdev
device implementation.

Though the series only contains kernel driver support, the goal is to
make the transport generic enough to support userspace drivers. This
means vhost-mdev[1] could be built on top as well by resuing the
transport.

A sample driver is also implemented which simulate a virito-net
loopback ethernet device on top of vringh + workqueue. This could be
used as a reference implementation for real hardware driver.

Consider mdev framework only support VFIO device and driver right now,
this series also extend it to support other types. This is done
through introducing class id to the device and pairing it with
id_talbe claimed by the driver. On top, this seris also decouple
device specific parents ops out of the common ones.

Pktgen test was done with virito-net + mvnet loop back device.

Please review.

[1] https://lkml.org/lkml/2019/9/26/15

Changes from V2:

- fail when class_id is not specified
- drop the vringh patch
- match the doc to the code
- tweak the commit log
- move device_ops from parent to mdev device
- remove the unused MDEV_ID_VHOST

Changes from V1:

- move virtio_mdev.c to drivers/virtio
- store class_id in mdev_device instead of mdev_parent
- store device_ops in mdev_device instead of mdev_parent
- reorder the patch, vringh fix comes first
- really silent compiling warnings
- really switch to use u16 for class_id
- uevent and modpost support for mdev class_id
- vraious tweaks per comments from Parav

Changes from RFC-V2:

- silent compile warnings on some specific configuration
- use u16 instead u8 for class id
- reseve MDEV_ID_VHOST for future vhost-mdev work
- introduce "virtio" type for mvnet and make "vhost" type for future
  work
- add entries in MAINTAINER
- tweak and typos fixes in commit log

Changes from RFC-V1:

- rename device id to class id
- add docs for class id and device specific ops (device_ops)
- split device_ops into seperate headers
- drop the mdev_set_dma_ops()
- use device_ops to implement the transport API, then it's not a part
  of UAPI any more
- use GFP_ATOMIC in mvnet sample device and other tweaks
- set_vring_base/get_vring_base support for mvnet device

Jason Wang (7):
  mdev: class id support
  mdev: bus uevent support
  modpost: add support for mdev class id
  mdev: introduce device specific ops
  mdev: introduce virtio device and its device ops
  virtio: introduce a mdev based transport
  docs: sample driver to demonstrate how to implement virtio-mdev
framework

 .../driver-api/vfio-mediated-device.rst   |  25 +-
 MAINTAINERS   |   2 +
 drivers/gpu/drm/i915/gvt/kvmgt.c  |  17 +-
 drivers/s390/cio/vfio_ccw_ops.c   |  17 +-
 drivers/s390/crypto/vfio_ap_ops.c |  13 +-
 drivers/vfio/mdev/mdev_core.c |  18 +
 drivers/vfio/mdev/mdev_driver.c   |  22 +
 drivers/vfio/mdev/mdev_private.h  |   2 +
 drivers/vfio/mdev/vfio_mdev.c |  45 +-
 drivers/virtio/Kconfig|   7 +
 drivers/virtio/Makefile   |   1 +
 drivers/virtio/virtio_mdev.c  | 416 +++
 include/linux/mdev.h  |  49 +-
 include/linux/mod_devicetable.h   |   8 +
 include/linux/vfio_mdev.h |  52 ++
 include/linux/virtio_mdev.h   | 148 
 samples/Kconfig   |   7 +
 samples/vfio-mdev/Makefile|   1 +
 samples/vfio-mdev/mbochs.c|  19 +-
 samples/vfio-mdev/mdpy.c  |  20 +-
 samples/vfio-mdev/mtty.c  |  17 +-
 samples/vfio-mdev/mvnet.c | 691 ++
 scripts/mod/devicetable-offsets.c |   3 +
 scripts/mod/file2alias.c  |  10 +
 24 files changed, 1523 insertions(+), 87 deletions(-)
 create mode 100644 drivers/virtio/virtio_mdev.c
 create mode 100644 include/linux/vfio_mdev.h
 create mode 100644 include/linux/virtio_mdev.h
 create mode 100644 samples/vfio-mdev/mvnet.c

-- 
2.19.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization