Re: [PATCH 6/8] vhost_task: Allow vhost layer to use copy_process

2022-02-03 Thread kernel test robot
Hi Mike,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on mst-vhost/linux-next]
[also build test WARNING on tip/x86/core linus/master v5.17-rc2]
[cannot apply to davem-sparc/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Mike-Christie/Use-copy_process-in-vhost-layer/20220203-050454
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: x86_64-randconfig-s021 
(https://download.01.org/0day-ci/archive/20220203/202202032136.uq6pxzyt-...@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.4-dirty
# 
https://github.com/0day-ci/linux/commit/2c7380ae8136c224f4c7074027303b97b0a0f84c
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Mike-Christie/Use-copy_process-in-vhost-layer/20220203-050454
git checkout 2c7380ae8136c224f4c7074027303b97b0a0f84c
# save the config file to linux build tree
mkdir build_dir
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir 
ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


sparse warnings: (new ones prefixed by >>)
>> kernel/vhost_task.c:85:24: sparse: sparse: incorrect type in argument 1 
>> (different base types) @@ expected unsigned long [usertype] size @@ 
>> got restricted gfp_t @@
   kernel/vhost_task.c:85:24: sparse: expected unsigned long [usertype] size
   kernel/vhost_task.c:85:24: sparse: got restricted gfp_t
>> kernel/vhost_task.c:85:36: sparse: sparse: incorrect type in argument 2 
>> (different base types) @@ expected restricted gfp_t [usertype] flags @@  
>>got unsigned long @@
   kernel/vhost_task.c:85:36: sparse: expected restricted gfp_t [usertype] 
flags
   kernel/vhost_task.c:85:36: sparse: got unsigned long

vim +85 kernel/vhost_task.c

62  
63  /**
64   * vhost_task_create - create a copy of a process to be used by the 
kernel
65   * @fn: thread stack
66   * @arg: data to be passed to fn
67   * @node: numa node to allocate task from
68   *
69   * This returns a specialized task for use by the vhost layer or NULL on
70   * failure. The returned task is inactive, and the caller must fire it 
up
71   * through vhost_task_start().
72   */
73  struct vhost_task *vhost_task_create(int (*fn)(void *), void *arg, int 
node)
74  {
75  struct kernel_clone_args args = {
76  .flags  = CLONE_FS | CLONE_UNTRACED | CLONE_VM,
77  .exit_signal= 0,
78  .stack  = (unsigned long)vhost_task_fn,
79  .worker_flags   = USER_WORKER | USER_WORKER_NO_FILES |
80USER_WORKER_SIG_IGN,
81  };
82  struct vhost_task *vtsk;
83  struct task_struct *tsk;
84  
  > 85  vtsk = kzalloc(GFP_KERNEL, sizeof(*vtsk));
86  if (!vtsk)
87  return ERR_PTR(-ENOMEM);
88  
89  init_completion(&vtsk->exited);
90  vtsk->data = arg;
91  vtsk->fn = fn;
92  args.stack_size =  (unsigned long)vtsk;
93  
94  tsk = copy_process(NULL, 0, node, &args);
95  if (IS_ERR(tsk)) {
96  kfree(vtsk);
97  return NULL;
98  }
99  
   100  vtsk->task = tsk;
   101  
   102  return vtsk;
   103  }
   104  EXPORT_SYMBOL_GPL(vhost_task_create);
   105  
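
Note: both warnings point at the same problem: the kzalloc() arguments on
line 85 are swapped (kzalloc() takes the size first and the GFP flags
second), which is why sparse sees a gfp_t where it expects a size and a
size where it expects a gfp_t. A minimal sketch of the likely fix, assuming
no other changes to vhost_task_create():

        vtsk = kzalloc(sizeof(*vtsk), GFP_KERNEL);
        if (!vtsk)
                return ERR_PTR(-ENOMEM);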

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH 6/8] vhost_task: Allow vhost layer to use copy_process

2022-02-03 Thread kernel test robot
Hi Mike,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on mst-vhost/linux-next]
[also build test WARNING on tip/x86/core linus/master v5.17-rc2 next-20220203]
[cannot apply to davem-sparc/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Mike-Christie/Use-copy_process-in-vhost-layer/20220203-050454
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: sparc-randconfig-s032-20220130 
(https://download.01.org/0day-ci/archive/20220203/202202032131.gnmg7b6h-...@intel.com/config)
compiler: sparc-linux-gcc (GCC) 11.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.4-dirty
# 
https://github.com/0day-ci/linux/commit/2c7380ae8136c224f4c7074027303b97b0a0f84c
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Mike-Christie/Use-copy_process-in-vhost-layer/20220203-050454
git checkout 2c7380ae8136c224f4c7074027303b97b0a0f84c
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross C=1 
CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=sparc 
SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


sparse warnings: (new ones prefixed by >>)
>> kernel/vhost_task.c:85:24: sparse: sparse: incorrect type in argument 1 
>> (different base types) @@ expected unsigned int [usertype] size @@ 
>> got restricted gfp_t @@
   kernel/vhost_task.c:85:24: sparse: expected unsigned int [usertype] size
   kernel/vhost_task.c:85:24: sparse: got restricted gfp_t
>> kernel/vhost_task.c:85:36: sparse: sparse: incorrect type in argument 2 
>> (different base types) @@ expected restricted gfp_t [usertype] flags @@  
>>got unsigned int @@
   kernel/vhost_task.c:85:36: sparse: expected restricted gfp_t [usertype] 
flags
   kernel/vhost_task.c:85:36: sparse: got unsigned int

vim +85 kernel/vhost_task.c

62  
63  /**
64   * vhost_task_create - create a copy of a process to be used by the 
kernel
65   * @fn: thread stack
66   * @arg: data to be passed to fn
67   * @node: numa node to allocate task from
68   *
69   * This returns a specialized task for use by the vhost layer or NULL on
70   * failure. The returned task is inactive, and the caller must fire it 
up
71   * through vhost_task_start().
72   */
73  struct vhost_task *vhost_task_create(int (*fn)(void *), void *arg, int 
node)
74  {
75  struct kernel_clone_args args = {
76  .flags  = CLONE_FS | CLONE_UNTRACED | CLONE_VM,
77  .exit_signal= 0,
78  .stack  = (unsigned long)vhost_task_fn,
79  .worker_flags   = USER_WORKER | USER_WORKER_NO_FILES |
80USER_WORKER_SIG_IGN,
81  };
82  struct vhost_task *vtsk;
83  struct task_struct *tsk;
84  
  > 85  vtsk = kzalloc(GFP_KERNEL, sizeof(*vtsk));
86  if (!vtsk)
87  return ERR_PTR(-ENOMEM);
88  
89  init_completion(&vtsk->exited);
90  vtsk->data = arg;
91  vtsk->fn = fn;
92  args.stack_size =  (unsigned long)vtsk;
93  
94  tsk = copy_process(NULL, 0, node, &args);
95  if (IS_ERR(tsk)) {
96  kfree(vtsk);
97  return NULL;
98  }
99  
   100  vtsk->task = tsk;
   101  
   102  return vtsk;
   103  }
   104  EXPORT_SYMBOL_GPL(vhost_task_create);
   105  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


[PATCH v2 7/8] VMCI: dma dg: add support for DMA datagrams sends

2022-02-03 Thread Jorgen Hansen
Use a DMA-based send operation from the transmit buffer instead of the
iowrite8_rep-based datagram send when DMA datagrams are supported.

The outgoing datagram is sent as inline data in the VMCI transmit
buffer. Once the header has been configured, the send is initiated
by writing the lower 32 bit of the buffer base address to the
VMCI_DATA_OUT_LOW_ADDR register. Only then will the device process
the header and the datagram itself. Following that, the driver busy
waits (it isn't possible to sleep on the send path) for the header
busy flag to change - indicating that the send is complete.

Reviewed-by: Vishnu Dasa 
Signed-off-by: Jorgen Hansen 
---
 drivers/misc/vmw_vmci/vmci_guest.c | 45 --
 include/linux/vmw_vmci_defs.h  | 34 ++
 2 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/vmw_vmci/vmci_guest.c 
b/drivers/misc/vmw_vmci/vmci_guest.c
index c207ca2ca42e..ae2fd9c791d0 100644
--- a/drivers/misc/vmw_vmci/vmci_guest.c
+++ b/drivers/misc/vmw_vmci/vmci_guest.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -114,6 +115,47 @@ static void vmci_write_reg(struct vmci_guest_device *dev, 
u32 val, u32 reg)
iowrite32(val, dev->iobase + reg);
 }
 
+static int vmci_write_data(struct vmci_guest_device *dev,
+  struct vmci_datagram *dg)
+{
+   int result;
+
+   if (dev->mmio_base != NULL) {
+   struct vmci_data_in_out_header *buffer_header = dev->tx_buffer;
+   u8 *dg_out_buffer = (u8 *)(buffer_header + 1);
+
+   if (VMCI_DG_SIZE(dg) > VMCI_MAX_DG_SIZE)
+   return VMCI_ERROR_INVALID_ARGS;
+
+   /*
+* Initialize send buffer with outgoing datagram
+* and set up header for inline data. Device will
+* not access buffer asynchronously - only after
+* the write to VMCI_DATA_OUT_LOW_ADDR.
+*/
+   memcpy(dg_out_buffer, dg, VMCI_DG_SIZE(dg));
+   buffer_header->opcode = 0;
+   buffer_header->size = VMCI_DG_SIZE(dg);
+   buffer_header->busy = 1;
+
+   vmci_write_reg(dev, lower_32_bits(dev->tx_buffer_base),
+  VMCI_DATA_OUT_LOW_ADDR);
+
+   /* Caller holds a spinlock, so cannot block. */
+   spin_until_cond(buffer_header->busy == 0);
+
+   result = vmci_read_reg(vmci_dev_g, VMCI_RESULT_LOW_ADDR);
+   if (result == VMCI_SUCCESS)
+   result = (int)buffer_header->result;
+   } else {
+   iowrite8_rep(dev->iobase + VMCI_DATA_OUT_ADDR,
+dg, VMCI_DG_SIZE(dg));
+   result = vmci_read_reg(vmci_dev_g, VMCI_RESULT_LOW_ADDR);
+   }
+
+   return result;
+}
+
 /*
  * VM to hypervisor call mechanism. We use the standard VMware naming
  * convention since shared code is calling this function as well.
@@ -139,8 +181,7 @@ int vmci_send_datagram(struct vmci_datagram *dg)
spin_lock_irqsave(&vmci_dev_spinlock, flags);
 
if (vmci_dev_g) {
-   iowrite8_rep(vmci_dev_g->iobase + VMCI_DATA_OUT_ADDR,
-dg, VMCI_DG_SIZE(dg));
+   vmci_write_data(vmci_dev_g, dg);
result = vmci_read_reg(vmci_dev_g, VMCI_RESULT_LOW_ADDR);
} else {
result = VMCI_ERROR_UNAVAILABLE;
diff --git a/include/linux/vmw_vmci_defs.h b/include/linux/vmw_vmci_defs.h
index 8bc37d8244a8..6fb663b36f72 100644
--- a/include/linux/vmw_vmci_defs.h
+++ b/include/linux/vmw_vmci_defs.h
@@ -110,6 +110,40 @@ enum {
 #define VMCI_MMIO_ACCESS_OFFSET((size_t)(128 * 1024))
 #define VMCI_MMIO_ACCESS_SIZE  ((size_t)(64 * 1024))
 
+/*
+ * For VMCI devices supporting the VMCI_CAPS_DMA_DATAGRAM capability, the
+ * sending and receiving of datagrams can be performed using DMA to/from
+ * a driver allocated buffer.
+ * Sending and receiving will be handled as follows:
+ * - when sending datagrams, the driver initializes the buffer where the
+ *   data part will refer to the outgoing VMCI datagram, sets the busy flag
+ *   to 1 and writes the address of the buffer to VMCI_DATA_OUT_HIGH_ADDR
+ *   and VMCI_DATA_OUT_LOW_ADDR. Writing to VMCI_DATA_OUT_LOW_ADDR triggers
+ *   the device processing of the buffer. When the device has processed the
+ *   buffer, it will write the result value to the buffer and then clear the
+ *   busy flag.
+ * - when receiving datagrams, the driver initializes the buffer where the
+ *   data part will describe the receive buffer, clears the busy flag and
+ *   writes the address of the buffer to VMCI_DATA_IN_HIGH_ADDR and
+ *   VMCI_DATA_IN_LOW_ADDR. Writing to VMCI_DATA_IN_LOW_ADDR triggers the
+ *   device processing of the buffer. The device will copy as many available
+ *   datagrams into the buffer as possible, and then sets the 

[PATCH v2 4/8] VMCI: dma dg: set OS page size

2022-02-03 Thread Jorgen Hansen
Tell the device the page size used by the OS.

Reviewed-by: Vishnu Dasa 
Signed-off-by: Jorgen Hansen 
---
 drivers/misc/vmw_vmci/vmci_guest.c | 9 +
 include/linux/vmw_vmci_defs.h  | 1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/misc/vmw_vmci/vmci_guest.c 
b/drivers/misc/vmw_vmci/vmci_guest.c
index 5a99d8e27873..808680dc0820 100644
--- a/drivers/misc/vmw_vmci/vmci_guest.c
+++ b/drivers/misc/vmw_vmci/vmci_guest.c
@@ -581,6 +581,15 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,
/* Let the host know which capabilities we intend to use. */
vmci_write_reg(vmci_dev, caps_in_use, VMCI_CAPS_ADDR);
 
+   if (caps_in_use & VMCI_CAPS_DMA_DATAGRAM) {
+   uint32_t page_shift;
+
+   /* Let the device know the size for pages passed down. */
+   vmci_write_reg(vmci_dev, PAGE_SHIFT, VMCI_GUEST_PAGE_SHIFT);
+   page_shift = vmci_read_reg(vmci_dev, VMCI_GUEST_PAGE_SHIFT);
dev_info(&pdev->dev, "Using page shift %d\n", page_shift);
+   }
+
/* Set up global device so that we can start sending datagrams */
spin_lock_irq(&vmci_dev_spinlock);
vmci_dev_g = vmci_dev;
diff --git a/include/linux/vmw_vmci_defs.h b/include/linux/vmw_vmci_defs.h
index 1ce2cffdc3ae..4167779469fd 100644
--- a/include/linux/vmw_vmci_defs.h
+++ b/include/linux/vmw_vmci_defs.h
@@ -21,6 +21,7 @@
 #define VMCI_CAPS_ADDR  0x18
 #define VMCI_RESULT_LOW_ADDR0x1c
 #define VMCI_RESULT_HIGH_ADDR   0x20
+#define VMCI_GUEST_PAGE_SHIFT   0x34
 
 /* Max number of devices. */
 #define VMCI_MAX_DEVICES 1
-- 
2.25.1



[PATCH v2 8/8] VMCI: dma dg: add support for DMA datagrams receive

2022-02-03 Thread Jorgen Hansen
Use the DMA based receive operation instead of the ioread8_rep
based datagram receive when DMA datagrams are supported.

In the receive operation, configure the header to point to the
page aligned VMCI_MAX_DG_SIZE part of the receive buffer
using s/g configuration for the header. This ensures that the
existing dispatch routine can be used with little modification.
Initiate the receive by writing the lower 32 bit of the buffer
to the VMCI_DATA_IN_LOW_ADDR register, and wait for the busy
flag to be changed by the device using a wait queue.

The existing dispatch routine for received datagrams is reused
for the DMA datagrams with a few modifications:
- the receive buffer is always the maximum size for DMA datagrams
  (IO ports would try with a shorter buffer first to reduce
  overhead of the ioread8_rep operation).
- for DMA datagrams, datagrams are provided contiguously in the
  buffer, as opposed to IO port datagrams, where they can start
  on any page boundary

Reviewed-by: Vishnu Dasa 
Signed-off-by: Jorgen Hansen 
---
 drivers/misc/vmw_vmci/vmci_guest.c | 103 ++---
 1 file changed, 79 insertions(+), 24 deletions(-)

diff --git a/drivers/misc/vmw_vmci/vmci_guest.c 
b/drivers/misc/vmw_vmci/vmci_guest.c
index ae2fd9c791d0..67fac1b8f262 100644
--- a/drivers/misc/vmw_vmci/vmci_guest.c
+++ b/drivers/misc/vmw_vmci/vmci_guest.c
@@ -58,6 +58,7 @@ struct vmci_guest_device {
 
struct tasklet_struct datagram_tasklet;
struct tasklet_struct bm_tasklet;
+   struct wait_queue_head inout_wq;
 
void *data_buffer;
dma_addr_t data_buffer_base;
@@ -115,6 +116,36 @@ static void vmci_write_reg(struct vmci_guest_device *dev, 
u32 val, u32 reg)
iowrite32(val, dev->iobase + reg);
 }
 
+static void vmci_read_data(struct vmci_guest_device *vmci_dev,
+  void *dest, size_t size)
+{
+   if (vmci_dev->mmio_base == NULL)
+   ioread8_rep(vmci_dev->iobase + VMCI_DATA_IN_ADDR,
+   dest, size);
+   else {
+   /*
+* For DMA datagrams, the data_buffer will contain the header 
on the
+* first page, followed by the incoming datagram(s) on the 
following
+* pages. The header uses an S/G element immediately following 
the
+* header on the first page to point to the data area.
+*/
+   struct vmci_data_in_out_header *buffer_header = 
vmci_dev->data_buffer;
+   struct vmci_sg_elem *sg_array = (struct vmci_sg_elem 
*)(buffer_header + 1);
+   size_t buffer_offset = dest - vmci_dev->data_buffer;
+
+   buffer_header->opcode = 1;
+   buffer_header->size = 1;
+   buffer_header->busy = 0;
+   sg_array[0].addr = vmci_dev->data_buffer_base + buffer_offset;
+   sg_array[0].size = size;
+
+   vmci_write_reg(vmci_dev, 
lower_32_bits(vmci_dev->data_buffer_base),
+  VMCI_DATA_IN_LOW_ADDR);
+
+   wait_event(vmci_dev->inout_wq, buffer_header->busy == 1);
+   }
+}
+
 static int vmci_write_data(struct vmci_guest_device *dev,
   struct vmci_datagram *dg)
 {
@@ -261,15 +292,17 @@ static int vmci_check_host_caps(struct pci_dev *pdev)
 }
 
 /*
- * Reads datagrams from the data in port and dispatches them. We
- * always start reading datagrams into only the first page of the
- * datagram buffer. If the datagrams don't fit into one page, we
- * use the maximum datagram buffer size for the remainder of the
- * invocation. This is a simple heuristic for not penalizing
- * small datagrams.
+ * Reads datagrams from the device and dispatches them. For IO port
+ * based access to the device, we always start reading datagrams into
+ * only the first page of the datagram buffer. If the datagrams don't
+ * fit into one page, we use the maximum datagram buffer size for the
+ * remainder of the invocation. This is a simple heuristic for not
+ * penalizing small datagrams. For DMA-based datagrams, we always
+ * use the maximum datagram buffer size, since there is no performance
+ * penalty for doing so.
  *
  * This function assumes that it has exclusive access to the data
- * in port for the duration of the call.
+ * in register(s) for the duration of the call.
  */
 static void vmci_dispatch_dgs(unsigned long data)
 {
@@ -277,23 +310,41 @@ static void vmci_dispatch_dgs(unsigned long data)
u8 *dg_in_buffer = vmci_dev->data_buffer;
struct vmci_datagram *dg;
size_t dg_in_buffer_size = VMCI_MAX_DG_SIZE;
-   size_t current_dg_in_buffer_size = PAGE_SIZE;
+   size_t current_dg_in_buffer_size;
size_t remaining_bytes;
+   bool is_io_port = vmci_dev->mmio_base == NULL;
 
BUILD_BUG_ON(VMCI_MAX_DG_SIZE < PAGE_SIZE);
 
-   ioread8_rep(vmci_dev->iobase + VMCI_DATA_IN_ADDR,
-   vmci_dev->data_buffer, 

[PATCH v2 3/8] VMCI: dma dg: detect DMA datagram capability

2022-02-03 Thread Jorgen Hansen
Detect the VMCI DMA datagram capability, and if present, ack it
to the device.

Reviewed-by: Vishnu Dasa 
Signed-off-by: Jorgen Hansen 
---
 drivers/misc/vmw_vmci/vmci_guest.c | 11 +++
 include/linux/vmw_vmci_defs.h  |  1 +
 2 files changed, 12 insertions(+)

diff --git a/drivers/misc/vmw_vmci/vmci_guest.c 
b/drivers/misc/vmw_vmci/vmci_guest.c
index 38ee7ed32ab9..5a99d8e27873 100644
--- a/drivers/misc/vmw_vmci/vmci_guest.c
+++ b/drivers/misc/vmw_vmci/vmci_guest.c
@@ -565,6 +565,17 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,
}
}
 
+   if (mmio_base != NULL) {
+   if (capabilities & VMCI_CAPS_DMA_DATAGRAM) {
+   caps_in_use |= VMCI_CAPS_DMA_DATAGRAM;
+   } else {
+   dev_err(&pdev->dev,
+   "Missing capability: VMCI_CAPS_DMA_DATAGRAM\n");
+   error = -ENXIO;
+   goto err_free_data_buffer;
+   }
+   }
+
dev_info(>dev, "Using capabilities 0x%x\n", caps_in_use);
 
/* Let the host know which capabilities we intend to use. */
diff --git a/include/linux/vmw_vmci_defs.h b/include/linux/vmw_vmci_defs.h
index 8fc00e2685cf..1ce2cffdc3ae 100644
--- a/include/linux/vmw_vmci_defs.h
+++ b/include/linux/vmw_vmci_defs.h
@@ -39,6 +39,7 @@
 #define VMCI_CAPS_DATAGRAM  BIT(2)
 #define VMCI_CAPS_NOTIFICATIONS BIT(3)
 #define VMCI_CAPS_PPN64 BIT(4)
+#define VMCI_CAPS_DMA_DATAGRAM  BIT(5)
 
 /* Interrupt Cause register bits. */
 #define VMCI_ICR_DATAGRAM  BIT(0)
-- 
2.25.1



[PATCH v2 6/8] VMCI: dma dg: allocate send and receive buffers for DMA datagrams

2022-02-03 Thread Jorgen Hansen
If DMA datagrams are used, allocate send and receive buffers
in coherent DMA memory.

This is done in preparation for the send and receive datagram
operations, where the buffers are used for the exchange of data
between driver and device.

Reviewed-by: Vishnu Dasa 
Signed-off-by: Jorgen Hansen 
---
 drivers/misc/vmw_vmci/vmci_guest.c | 66 ++
 include/linux/vmw_vmci_defs.h  |  4 ++
 2 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/drivers/misc/vmw_vmci/vmci_guest.c 
b/drivers/misc/vmw_vmci/vmci_guest.c
index 0920fbc6b64f..c207ca2ca42e 100644
--- a/drivers/misc/vmw_vmci/vmci_guest.c
+++ b/drivers/misc/vmw_vmci/vmci_guest.c
@@ -31,6 +31,12 @@
 
 #define VMCI_UTIL_NUM_RESOURCES 1
 
+/*
+ * Datagram buffers for DMA send/receive must accommodate at least
+ * a maximum sized datagram and the header.
+ */
+#define VMCI_DMA_DG_BUFFER_SIZE (VMCI_MAX_DG_SIZE + PAGE_SIZE)
+
 static bool vmci_disable_msi;
 module_param_named(disable_msi, vmci_disable_msi, bool, 0);
 MODULE_PARM_DESC(disable_msi, "Disable MSI use in driver - (default=0)");
@@ -53,6 +59,9 @@ struct vmci_guest_device {
struct tasklet_struct bm_tasklet;
 
void *data_buffer;
+   dma_addr_t data_buffer_base;
+   void *tx_buffer;
+   dma_addr_t tx_buffer_base;
void *notification_bitmap;
dma_addr_t notification_base;
 };
@@ -451,6 +460,24 @@ static irqreturn_t vmci_interrupt_dma_datagram(int irq, 
void *_dev)
return IRQ_HANDLED;
 }
 
+static void vmci_free_dg_buffers(struct vmci_guest_device *vmci_dev)
+{
+   if (vmci_dev->mmio_base != NULL) {
+   if (vmci_dev->tx_buffer != NULL)
+   dma_free_coherent(vmci_dev->dev,
+ VMCI_DMA_DG_BUFFER_SIZE,
+ vmci_dev->tx_buffer,
+ vmci_dev->tx_buffer_base);
+   if (vmci_dev->data_buffer != NULL)
+   dma_free_coherent(vmci_dev->dev,
+ VMCI_DMA_DG_BUFFER_SIZE,
+ vmci_dev->data_buffer,
+ vmci_dev->data_buffer_base);
+   } else {
+   vfree(vmci_dev->data_buffer);
+   }
+}
+
 /*
  * Most of the initialization at module load time is done here.
  */
@@ -520,11 +547,27 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,
tasklet_init(&vmci_dev->bm_tasklet,
 vmci_process_bitmap, (unsigned long)vmci_dev);
 
-   vmci_dev->data_buffer = vmalloc(VMCI_MAX_DG_SIZE);
+   if (mmio_base != NULL) {
+   vmci_dev->tx_buffer = dma_alloc_coherent(&pdev->dev,
+VMCI_DMA_DG_BUFFER_SIZE,
+&vmci_dev->tx_buffer_base,
+GFP_KERNEL);
+   if (!vmci_dev->tx_buffer) {
+   dev_err(&pdev->dev,
+   "Can't allocate memory for datagram tx buffer\n");
+   return -ENOMEM;
+   }
+
+   vmci_dev->data_buffer = dma_alloc_coherent(&pdev->dev,
+  VMCI_DMA_DG_BUFFER_SIZE,
+  &vmci_dev->data_buffer_base,
+  GFP_KERNEL);
+   } else {
+   vmci_dev->data_buffer = vmalloc(VMCI_MAX_DG_SIZE);
+   }
if (!vmci_dev->data_buffer) {
dev_err(&pdev->dev,
"Can't allocate memory for datagram buffer\n");
-   return -ENOMEM;
+   error = -ENOMEM;
+   goto err_free_data_buffers;
}
 
pci_set_master(pdev);   /* To enable queue_pair functionality. */
@@ -542,7 +585,7 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,
if (!(capabilities & VMCI_CAPS_DATAGRAM)) {
dev_err(>dev, "Device does not support datagrams\n");
error = -ENXIO;
-   goto err_free_data_buffer;
+   goto err_free_data_buffers;
}
caps_in_use = VMCI_CAPS_DATAGRAM;
 
@@ -586,7 +629,7 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,
dev_err(&pdev->dev,
"Missing capability: VMCI_CAPS_DMA_DATAGRAM\n");
error = -ENXIO;
-   goto err_free_data_buffer;
+   goto err_free_data_buffers;
}
}
 
@@ -602,6 +645,12 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,
vmci_write_reg(vmci_dev, PAGE_SHIFT, VMCI_GUEST_PAGE_SHIFT);
page_shift = vmci_read_reg(vmci_dev, VMCI_GUEST_PAGE_SHIFT);
dev_info(>dev, "Using page shift %d\n", page_shift);
+
+   /* Configure the high order parts of the data in/out buffers. */
+   vmci_write_reg(vmci_dev, 

[PATCH v2 5/8] VMCI: dma dg: register dummy IRQ handlers for DMA datagrams

2022-02-03 Thread Jorgen Hansen
Register dummy interrupt handlers for DMA datagrams in preparation for
DMA datagram receive operations.

Reviewed-by: Vishnu Dasa 
Signed-off-by: Jorgen Hansen 
---
 drivers/misc/vmw_vmci/vmci_guest.c | 42 +++---
 include/linux/vmw_vmci_defs.h  | 14 --
 2 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/drivers/misc/vmw_vmci/vmci_guest.c 
b/drivers/misc/vmw_vmci/vmci_guest.c
index 808680dc0820..0920fbc6b64f 100644
--- a/drivers/misc/vmw_vmci/vmci_guest.c
+++ b/drivers/misc/vmw_vmci/vmci_guest.c
@@ -414,6 +414,9 @@ static irqreturn_t vmci_interrupt(int irq, void *_dev)
icr &= ~VMCI_ICR_NOTIFICATION;
}
 
+   if (icr & VMCI_ICR_DMA_DATAGRAM)
+   icr &= ~VMCI_ICR_DMA_DATAGRAM;
+
if (icr != 0)
dev_warn(dev->dev,
 "Ignoring unknown interrupt cause (%d)\n",
@@ -438,6 +441,16 @@ static irqreturn_t vmci_interrupt_bm(int irq, void *_dev)
return IRQ_HANDLED;
 }
 
+/*
+ * Interrupt handler for MSI-X interrupt vector VMCI_INTR_DMA_DATAGRAM,
+ * which is for the completion of a DMA datagram send or receive operation.
+ * Will only get called if we are using MSI-X with exclusive vectors.
+ */
+static irqreturn_t vmci_interrupt_dma_datagram(int irq, void *_dev)
+{
+   return IRQ_HANDLED;
+}
+
 /*
  * Most of the initialization at module load time is done here.
  */
@@ -447,6 +460,7 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,
struct vmci_guest_device *vmci_dev;
void __iomem *iobase = NULL;
void __iomem *mmio_base = NULL;
+   unsigned int num_irq_vectors;
unsigned int capabilities;
unsigned int caps_in_use;
unsigned long cmd;
@@ -635,8 +649,12 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,
 * Enable interrupts.  Try MSI-X first, then MSI, and then fallback on
 * legacy interrupts.
 */
-   error = pci_alloc_irq_vectors(pdev, VMCI_MAX_INTRS, VMCI_MAX_INTRS,
-   PCI_IRQ_MSIX);
+   if (vmci_dev->mmio_base != NULL)
+   num_irq_vectors = VMCI_MAX_INTRS;
+   else
+   num_irq_vectors = VMCI_MAX_INTRS_NOTIFICATION;
+   error = pci_alloc_irq_vectors(pdev, num_irq_vectors, num_irq_vectors,
+ PCI_IRQ_MSIX);
if (error < 0) {
error = pci_alloc_irq_vectors(pdev, 1, 1,
PCI_IRQ_MSIX | PCI_IRQ_MSI | PCI_IRQ_LEGACY);
@@ -674,6 +692,17 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,
pci_irq_vector(pdev, 1), error);
goto err_free_irq;
}
+   if (caps_in_use & VMCI_CAPS_DMA_DATAGRAM) {
+   error = request_irq(pci_irq_vector(pdev, 2),
+   vmci_interrupt_dma_datagram,
+   0, KBUILD_MODNAME, vmci_dev);
+   if (error) {
+   dev_err(&pdev->dev,
+   "Failed to allocate irq %u: %d\n",
+   pci_irq_vector(pdev, 2), error);
+   goto err_free_bm_irq;
+   }
+   }
}
 
dev_dbg(>dev, "Registered device\n");
@@ -684,6 +713,8 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,
cmd = VMCI_IMR_DATAGRAM;
if (caps_in_use & VMCI_CAPS_NOTIFICATIONS)
cmd |= VMCI_IMR_NOTIFICATION;
+   if (caps_in_use & VMCI_CAPS_DMA_DATAGRAM)
+   cmd |= VMCI_IMR_DMA_DATAGRAM;
vmci_write_reg(vmci_dev, cmd, VMCI_IMR_ADDR);
 
/* Enable interrupts. */
@@ -694,6 +725,8 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,
vmci_call_vsock_callback(false);
return 0;
 
+err_free_bm_irq:
+   free_irq(pci_irq_vector(pdev, 1), vmci_dev);
 err_free_irq:
free_irq(pci_irq_vector(pdev, 0), vmci_dev);
tasklet_kill(&vmci_dev->datagram_tasklet);
@@ -759,8 +792,11 @@ static void vmci_guest_remove_device(struct pci_dev *pdev)
 * MSI-X, we might have multiple vectors, each with their own
 * IRQ, which we must free too.
 */
-   if (vmci_dev->exclusive_vectors)
+   if (vmci_dev->exclusive_vectors) {
free_irq(pci_irq_vector(pdev, 1), vmci_dev);
+   if (vmci_dev->mmio_base != NULL)
+   free_irq(pci_irq_vector(pdev, 2), vmci_dev);
+   }
free_irq(pci_irq_vector(pdev, 0), vmci_dev);
pci_free_irq_vectors(pdev);
 
diff --git a/include/linux/vmw_vmci_defs.h b/include/linux/vmw_vmci_defs.h
index 4167779469fd..2b70c024dacb 100644
--- a/include/linux/vmw_vmci_defs.h
+++ b/include/linux/vmw_vmci_defs.h
@@ -45,13 +45,22 @@
 /* Interrupt Cause register bits. */
 #define 

[PATCH v2 1/8] VMCI: dma dg: whitespace formatting change for vmci register defines

2022-02-03 Thread Jorgen Hansen
Update formatting of existing register defines in preparation for
adding additional register definitions for the VMCI device.

Reviewed-by: Vishnu Dasa 
Signed-off-by: Jorgen Hansen 
---
 include/linux/vmw_vmci_defs.h | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/vmw_vmci_defs.h b/include/linux/vmw_vmci_defs.h
index e36cb114c188..9911ecfc18ba 100644
--- a/include/linux/vmw_vmci_defs.h
+++ b/include/linux/vmw_vmci_defs.h
@@ -12,15 +12,15 @@
 #include 
 
 /* Register offsets. */
-#define VMCI_STATUS_ADDR  0x00
-#define VMCI_CONTROL_ADDR 0x04
-#define VMCI_ICR_ADDR0x08
-#define VMCI_IMR_ADDR 0x0c
-#define VMCI_DATA_OUT_ADDR0x10
-#define VMCI_DATA_IN_ADDR 0x14
-#define VMCI_CAPS_ADDR0x18
-#define VMCI_RESULT_LOW_ADDR  0x1c
-#define VMCI_RESULT_HIGH_ADDR 0x20
+#define VMCI_STATUS_ADDR0x00
+#define VMCI_CONTROL_ADDR   0x04
+#define VMCI_ICR_ADDR   0x08
+#define VMCI_IMR_ADDR   0x0c
+#define VMCI_DATA_OUT_ADDR  0x10
+#define VMCI_DATA_IN_ADDR   0x14
+#define VMCI_CAPS_ADDR  0x18
+#define VMCI_RESULT_LOW_ADDR0x1c
+#define VMCI_RESULT_HIGH_ADDR   0x20
 
 /* Max number of devices. */
 #define VMCI_MAX_DEVICES 1
-- 
2.25.1



[PATCH v2 2/8] VMCI: dma dg: add MMIO access to registers

2022-02-03 Thread Jorgen Hansen
Detect the support for MMIO access through examination of the length
of the region requested in BAR1. If it is 256KB, the VMCI device
supports MMIO access to registers.

If MMIO access is supported, map the area of the region used for
MMIO access (64KB size at offset 128KB).

Add wrapper functions for 32-bit register accesses through either
MMIO or IO ports, based on device configuration.

Sending and receiving datagrams through iowrite8_rep/ioread8_rep is
left unchanged for now, and will be addressed in a later change.

Reviewed-by: Vishnu Dasa 
Signed-off-by: Jorgen Hansen 
---
 drivers/misc/vmw_vmci/vmci_guest.c | 68 ++
 include/linux/vmw_vmci_defs.h  | 12 ++
 2 files changed, 62 insertions(+), 18 deletions(-)

diff --git a/drivers/misc/vmw_vmci/vmci_guest.c 
b/drivers/misc/vmw_vmci/vmci_guest.c
index 1018dc77269d..38ee7ed32ab9 100644
--- a/drivers/misc/vmw_vmci/vmci_guest.c
+++ b/drivers/misc/vmw_vmci/vmci_guest.c
@@ -45,6 +45,7 @@ static u32 vm_context_id = VMCI_INVALID_ID;
 struct vmci_guest_device {
struct device *dev; /* PCI device we are attached to */
void __iomem *iobase;
+   void __iomem *mmio_base;
 
bool exclusive_vectors;
 
@@ -89,6 +90,21 @@ u32 vmci_get_vm_context_id(void)
return vm_context_id;
 }
 
+static unsigned int vmci_read_reg(struct vmci_guest_device *dev, u32 reg)
+{
+   if (dev->mmio_base != NULL)
+   return readl(dev->mmio_base + reg);
+   return ioread32(dev->iobase + reg);
+}
+
+static void vmci_write_reg(struct vmci_guest_device *dev, u32 val, u32 reg)
+{
+   if (dev->mmio_base != NULL)
+   writel(val, dev->mmio_base + reg);
+   else
+   iowrite32(val, dev->iobase + reg);
+}
+
 /*
  * VM to hypervisor call mechanism. We use the standard VMware naming
  * convention since shared code is calling this function as well.
@@ -116,7 +132,7 @@ int vmci_send_datagram(struct vmci_datagram *dg)
if (vmci_dev_g) {
iowrite8_rep(vmci_dev_g->iobase + VMCI_DATA_OUT_ADDR,
 dg, VMCI_DG_SIZE(dg));
-   result = ioread32(vmci_dev_g->iobase + VMCI_RESULT_LOW_ADDR);
+   result = vmci_read_reg(vmci_dev_g, VMCI_RESULT_LOW_ADDR);
} else {
result = VMCI_ERROR_UNAVAILABLE;
}
@@ -384,7 +400,7 @@ static irqreturn_t vmci_interrupt(int irq, void *_dev)
unsigned int icr;
 
/* Acknowledge interrupt and determine what needs doing. */
-   icr = ioread32(dev->iobase + VMCI_ICR_ADDR);
+   icr = vmci_read_reg(dev, VMCI_ICR_ADDR);
if (icr == 0 || icr == ~0)
return IRQ_NONE;
 
@@ -429,7 +445,8 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,
   const struct pci_device_id *id)
 {
struct vmci_guest_device *vmci_dev;
-   void __iomem *iobase;
+   void __iomem *iobase = NULL;
+   void __iomem *mmio_base = NULL;
unsigned int capabilities;
unsigned int caps_in_use;
unsigned long cmd;
@@ -445,16 +462,32 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,
return error;
}
 
-   error = pcim_iomap_regions(pdev, 1 << 0, KBUILD_MODNAME);
-   if (error) {
-   dev_err(&pdev->dev, "Failed to reserve/map IO regions\n");
-   return error;
+   /*
+* The VMCI device with mmio access to registers requests 256KB
+* for BAR1. If present, driver will use new VMCI device
+* functionality for register access and datagram send/recv.
+*/
+
+   if (pci_resource_len(pdev, 1) == VMCI_WITH_MMIO_ACCESS_BAR_SIZE) {
+   dev_info(&pdev->dev, "MMIO register access is available\n");
+   mmio_base = pci_iomap_range(pdev, 1, VMCI_MMIO_ACCESS_OFFSET,
+   VMCI_MMIO_ACCESS_SIZE);
+   /* If the map fails, we fall back to IOIO access. */
+   if (!mmio_base)
+   dev_warn(&pdev->dev, "Failed to map MMIO register access\n");
}
 
-   iobase = pcim_iomap_table(pdev)[0];
+   if (!mmio_base) {
+   error = pcim_iomap_regions(pdev, BIT(0), KBUILD_MODNAME);
+   if (error) {
+   dev_err(&pdev->dev, "Failed to reserve/map IO regions\n");
+   return error;
+   }
+   iobase = pcim_iomap_table(pdev)[0];
+   }
 
-   dev_info(&pdev->dev, "Found VMCI PCI device at %#lx, irq %u\n",
-(unsigned long)iobase, pdev->irq);
+   dev_info(&pdev->dev, "Found VMCI PCI device at %#lx, %#lx, irq %u\n",
+(unsigned long)iobase, (unsigned long)mmio_base, pdev->irq);
 
vmci_dev = devm_kzalloc(&pdev->dev, sizeof(*vmci_dev), GFP_KERNEL);
if (!vmci_dev) {
@@ -466,6 +499,7 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,

[PATCH v2 0/8] VMCI: dma dg: Add support for DMA datagrams

2022-02-03 Thread Jorgen Hansen
A new version of the VMCI device introduces two major changes:
- support MMIO access to device registers
- support send/receive of datagrams using DMA transfers instead of
  ioread8_rep/iowrite8_rep operations
This patch series updates the VMCI driver to support these new
features while maintaining backwards compatibility.

The DMA based datagram operations use a send and a receive buffer
allocated at module load time. The buffer contains a header
describing the layout of the buffer followed by either an SG list or
inline data. The header also contains a flag indicating whether the
buffer is currently owned by the driver or the device. Both for send
and receive, the driver will initialize the buffer, transfer ownership
to the device by writing the buffer address to a register, and then
wait for the ownership to be transferred back. The device will
generate an interrupt when this happens.
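
As a rough sketch of that ownership handshake (simplified from the send and
receive paths in patches 7/8 and 8/8; header, buf_base and dev here stand in
for the real driver fields):

        /* Send: fill the buffer, hand ownership to the device, busy-wait. */
        header->busy = 1;
        vmci_write_reg(dev, lower_32_bits(buf_base), VMCI_DATA_OUT_LOW_ADDR);
        spin_until_cond(header->busy == 0);     /* device clears busy when done */

        /* Receive: post the buffer and sleep until the device fills it. */
        header->busy = 0;
        vmci_write_reg(dev, lower_32_bits(buf_base), VMCI_DATA_IN_LOW_ADDR);
        wait_event(dev->inout_wq, header->busy == 1);   /* device sets busy and raises an interrupt */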

v2 (fixes issues flagged by kernel test robot ):
- changed type of mmio_base to void __iomem *
- made vmci_read_reg, vmci_write_reg and vmci_write_data static functions

Jorgen Hansen (8):
  VMCI: dma dg: whitespace formatting change for vmci register defines
  VMCI: dma dg: add MMIO access to registers
  VMCI: dma dg: detect DMA datagram capability
  VMCI: dma dg: set OS page size
  VMCI: dma dg: register dummy IRQ handlers for DMA datagrams
  VMCI: dma dg: allocate send and receive buffers for DMA datagrams
  VMCI: dma dg: add support for DMA datagrams sends
  VMCI: dma dg: add support for DMA datagrams receive

 drivers/misc/vmw_vmci/vmci_guest.c | 340 -
 include/linux/vmw_vmci_defs.h  |  84 ++-
 2 files changed, 361 insertions(+), 63 deletions(-)

-- 
2.25.1



Re: [PATCH 2/5] virtio_blk: simplify refcounting

2022-02-03 Thread Christoph Hellwig
On Thu, Feb 03, 2022 at 09:15:53AM +, Stefan Hajnoczi wrote:
>   /* Make sure no work handler is accessing the device. */
>   flush_work(&vblk->config_work);
>   
>   del_gendisk(vblk->disk);
>   blk_cleanup_disk(vblk->disk);
> ^--- is virtblk_free_disk() called here?
>   blk_mq_free_tag_set(&vblk->tag_set);
>^--- use after free

Yeah.  We need to split up blk_cleanup_disk again for this into
separate calls to blk_cleanup_queue and put_disk..
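
A rough sketch of the remove-path ordering this implies, assuming
blk_cleanup_disk() gets split back into blk_cleanup_queue() plus put_disk()
(not the actual patch, just the ordering suggested above):

        del_gendisk(vblk->disk);
        blk_cleanup_queue(vblk->disk->queue);   /* tear down the queue while vblk is still valid */
        blk_mq_free_tag_set(&vblk->tag_set);    /* safe: nothing references the tag set anymore */

        /* ... reset the device, delete the vqs, free vblk->vqs ... */

        put_disk(vblk->disk);                   /* may drop the last reference and free vblk via
                                                 * virtblk_free_disk(), so it has to come last */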


Re: [PATCH v2] iommu/iova: Separate out rcache init

2022-02-03 Thread Michael S. Tsirkin
On Thu, Feb 03, 2022 at 05:59:20PM +0800, John Garry wrote:
> Currently the rcache structures are allocated for all IOVA domains, even if
> they do not use "fast" alloc+free interface. This is wasteful of memory.
> 
> In addition, fails in init_iova_rcaches() are not handled safely, which is
> less than ideal.
> 
> Make "fast" users call a separate rcache init explicitly, which includes
> error checking.
> 
> Signed-off-by: John Garry 

virtio things:

Acked-by: Michael S. Tsirkin 

> ---
> Differences to v1:
> - Drop stubs for iova_domain_init_rcaches() and iova_domain_free_rcaches()
> - Use put_iova_domain() in vdpa code
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index d85d54f2b549..b22034975301 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -525,6 +525,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
> *domain, dma_addr_t base,
>   struct iommu_dma_cookie *cookie = domain->iova_cookie;
>   unsigned long order, base_pfn;
>   struct iova_domain *iovad;
> + int ret;
>  
>   if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
>   return -EINVAL;
> @@ -559,6 +560,9 @@ static int iommu_dma_init_domain(struct iommu_domain 
> *domain, dma_addr_t base,
>   }
>  
>   init_iova_domain(iovad, 1UL << order, base_pfn);
> + ret = iova_domain_init_rcaches(iovad);
> + if (ret)
> + return ret;
>  
>   /* If the FQ fails we can simply fall back to strict mode */
>   if (domain->type == IOMMU_DOMAIN_DMA_FQ && iommu_dma_init_fq(domain))
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index b28c9435b898..7e9c3a97c040 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -15,13 +15,14 @@
>  /* The anchor node sits above the top of the usable address space */
>  #define IOVA_ANCHOR  ~0UL
>  
> +#define IOVA_RANGE_CACHE_MAX_SIZE 6  /* log of max cached IOVA range size 
> (in pages) */
> +
>  static bool iova_rcache_insert(struct iova_domain *iovad,
>  unsigned long pfn,
>  unsigned long size);
>  static unsigned long iova_rcache_get(struct iova_domain *iovad,
>unsigned long size,
>unsigned long limit_pfn);
> -static void init_iova_rcaches(struct iova_domain *iovad);
>  static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain 
> *iovad);
>  static void free_iova_rcaches(struct iova_domain *iovad);
>  
> @@ -64,8 +65,6 @@ init_iova_domain(struct iova_domain *iovad, unsigned long 
> granule,
>   iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
>   rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
>   rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
> - cpuhp_state_add_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD, 
> &iovad->cpuhp_dead);
> - init_iova_rcaches(iovad);
>  }
>  EXPORT_SYMBOL_GPL(init_iova_domain);
>  
> @@ -488,6 +487,13 @@ free_iova_fast(struct iova_domain *iovad, unsigned long 
> pfn, unsigned long size)
>  }
>  EXPORT_SYMBOL_GPL(free_iova_fast);
>  
> +static void iova_domain_free_rcaches(struct iova_domain *iovad)
> +{
> + cpuhp_state_remove_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD,
> + &iovad->cpuhp_dead);
> + free_iova_rcaches(iovad);
> +}
> +
>  /**
>   * put_iova_domain - destroys the iova domain
>   * @iovad: - iova domain in question.
> @@ -497,9 +503,9 @@ void put_iova_domain(struct iova_domain *iovad)
>  {
>   struct iova *iova, *tmp;
>  
> - cpuhp_state_remove_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD,
> - &iovad->cpuhp_dead);
> - free_iova_rcaches(iovad);
> + if (iovad->rcaches)
> + iova_domain_free_rcaches(iovad);
> +
>   rbtree_postorder_for_each_entry_safe(iova, tmp, &iovad->rbroot, node)
>   free_iova_mem(iova);
>  }
> @@ -608,6 +614,7 @@ EXPORT_SYMBOL_GPL(reserve_iova);
>   */
>  
>  #define IOVA_MAG_SIZE 128
> +#define MAX_GLOBAL_MAGS 32   /* magazines per bin */
>  
>  struct iova_magazine {
>   unsigned long size;
> @@ -620,6 +627,13 @@ struct iova_cpu_rcache {
>   struct iova_magazine *prev;
>  };
>  
> +struct iova_rcache {
> + spinlock_t lock;
> + unsigned long depot_size;
> + struct iova_magazine *depot[MAX_GLOBAL_MAGS];
> + struct iova_cpu_rcache __percpu *cpu_rcaches;
> +};
> +
>  static struct iova_magazine *iova_magazine_alloc(gfp_t flags)
>  {
>   return kzalloc(sizeof(struct iova_magazine), flags);
> @@ -693,28 +707,54 @@ static void iova_magazine_push(struct iova_magazine 
> *mag, unsigned long pfn)
>   mag->pfns[mag->size++] = pfn;
>  }
>  
> -static void init_iova_rcaches(struct iova_domain *iovad)
> +int iova_domain_init_rcaches(struct iova_domain *iovad)
>  {
> - struct iova_cpu_rcache *cpu_rcache;
> - struct iova_rcache *rcache;
>   unsigned int cpu;
> - int i;
> + int i, ret;
> +
> + iovad->rcaches = 

Re: [PATCH v2 3/9] [RFC] virtio_ring: Embed a wrap counter in opaque poll index value

2022-02-03 Thread Michael S. Tsirkin
On Thu, Feb 03, 2022 at 10:51:19AM +, Cristian Marussi wrote:
> On Tue, Feb 01, 2022 at 01:27:38PM -0500, Michael S. Tsirkin wrote:
> > Looks correct, thanks. Some minor comments below:
> > 
> 
> Hi Michael,
> 
> thanks for the feedback.
> 
> > On Tue, Feb 01, 2022 at 05:15:55PM +, Cristian Marussi wrote:
> > > Exported API virtqueue_poll() can be used to support polling mode 
> > > operation
> > > on top of virtio layer if needed; currently the parameter last_used_idx is
> > > the opaque value that needs to be passed to the virtqueue_poll() function
> > > to check if there are new pending used buffers in the queue: such opaque
> > > value would have been previously obtained by a call to the API function
> > > virtqueue_enable_cb_prepare().
> > > 
> > > Since such opaque value is indeed containing simply a snapshot in time of
> > > the internal
> > 
> > to add: 16 bit
> > 
> > > last_used_index (roughly), it is possible that,
> > 
> > to add here: 
> > 
> > if another thread calls virtqueue_add_*()
> > at the same time (which existing drivers don't do,
> > but does not seem to be documented as prohibited anywhere), and
> > 
> > > if exactly
> > > 2**16 buffers are marked as used between two successive calls to
> > > virtqueue_poll(), the caller is fooled into thinking that nothing is
> > > pending (ABA problem).
> > > Keep a full fledged internal wraps counter
> > 
> > s/full fledged/a 16 bit/
> > 
> > since I don't see why is a 16 bit counter full but not e.g. a 32 bit one
> > 
> .. :D I wanted to stress the fact that, this being a 16-bit counter, it has a
> higher rollover than the 1-bit wrap_counter already used... but indeed
> all are just counters in the end, it's just the wraparound that changes...
> 
> I'll fix.
> 
> > > per virtqueue and embed it into
> > > the upper 16bits of the returned opaque value, so that the above scenario
> > > can be detected transparently by virtqueue_poll(): this way each single
> > > possible last_used_idx value is really belonging to a different wrap.
> > 
> > Just to add here: the ABA problem can in theory still happen but
> > now that's after 2^32 requests, which seems sufficient in practice.
> > 
> 
> Sure, I'll fix the commit message as above advised.
> 
> > > Cc: "Michael S. Tsirkin" 
> > > Cc: Igor Skalkin 
> > > Cc: Peter Hilber 
> > > Cc: virtualization@lists.linux-foundation.org
> > > Signed-off-by: Cristian Marussi 
> > > ---
> > > Still no perf data on this, I was wondering what exactly to measure in
> > > terms of perf metrics to evaluate the impact of the rolling vq->wraps
> > > counter.
> > > ---
> > >  drivers/virtio/virtio_ring.c | 51 +---
> > >  1 file changed, 47 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 00f64f2f8b72..613ec0503509 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -12,6 +12,8 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > > +#include 
> > >  #include 
> > >  
> > >  static bool force_used_validation = false;
> > > @@ -69,6 +71,17 @@ module_param(force_used_validation, bool, 0444);
> > >  #define LAST_ADD_TIME_INVALID(vq)
> > >  #endif
> > >  
> > > +#define VRING_IDX_MASK   GENMASK(15, 0)
> > > +#define VRING_GET_IDX(opaque)\
> > > + ((u16)FIELD_GET(VRING_IDX_MASK, (opaque)))
> > > +
> > > +#define VRING_WRAPS_MASK GENMASK(31, 16)
> > > +#define VRING_GET_WRAPS(opaque)  \
> > > + ((u16)FIELD_GET(VRING_WRAPS_MASK, (opaque)))
> > > +
> > > +#define VRING_BUILD_OPAQUE(idx, wraps)   \
> > > + (FIELD_PREP(VRING_WRAPS_MASK, (wraps)) | ((idx) & VRING_IDX_MASK))
> > > +
> > 
> > Maybe prefix with VRING_POLL_  since that is the only user.
> > 
> 
> I'll do.
> 
> > 
> > >  struct vring_desc_state_split {
> > >   void *data; /* Data for callback. */
> > >   struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> > > @@ -117,6 +130,8 @@ struct vring_virtqueue {
> > >   /* Last used index we've seen. */
> > >   u16 last_used_idx;
> > >  
> > > + u16 wraps;
> > > +
> > >   /* Hint for event idx: already triggered no need to disable. */
> > >   bool event_triggered;
> > >  
> > > @@ -806,6 +821,8 @@ static void *virtqueue_get_buf_ctx_split(struct 
> > > virtqueue *_vq,
> > >   ret = vq->split.desc_state[i].data;
> > >   detach_buf_split(vq, i, ctx);
> > >   vq->last_used_idx++;
> > > + if (unlikely(!vq->last_used_idx))
> > > + vq->wraps++;
> > >   /* If we expect an interrupt for the next entry, tell host
> > >* by writing event index and flush out the write before
> > >* the read in the next get_buf call. */
> > 
> > So most drivers don't call virtqueue_poll.
> > Concerned about the overhead here: another option is
> > with a flag that will have to be set whenever a driver
> > wants to 

Re: [PATCH 2/5] virtio_blk: simplify refcounting

2022-02-03 Thread Stefan Hajnoczi
On Wed, Feb 02, 2022 at 04:56:56PM +0100, Christoph Hellwig wrote:
> @@ -985,8 +947,6 @@ static void virtblk_remove(struct virtio_device *vdev)
>   kfree(vblk->vqs);
>  
>   mutex_unlock(&vblk->vdev_mutex);
> -
> - virtblk_put(vblk);
>  }

Thank you, this is a nice cleanup! One question:

File systems are unmounted and block devices are not open. PCI hot
unplug calls virtblk_remove(). It looks vblk is used after being freed
by virtblk_free_disk() halfway through virtblk_remove()?

  static void virtblk_remove(struct virtio_device *vdev)
  {
  struct virtio_blk *vblk = vdev->priv;
  
  /* Make sure no work handler is accessing the device. */
flush_work(&vblk->config_work);
  
  del_gendisk(vblk->disk);
  blk_cleanup_disk(vblk->disk);
  ^--- is virtblk_free_disk() called here?
blk_mq_free_tag_set(&vblk->tag_set);
 ^--- use after free
  
mutex_lock(&vblk->vdev_mutex);
  
  /* Stop all the virtqueues. */
  virtio_reset_device(vdev);
  
  /* Virtqueues are stopped, nothing can use vblk->vdev anymore. */
  vblk->vdev = NULL;
  
  vdev->config->del_vqs(vdev);
  kfree(vblk->vqs);
  
mutex_unlock(&vblk->vdev_mutex);
  }

Stefan

