Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

2020-09-17 Thread Jason Wang


On 2020/9/16 7:47 PM, Kishon Vijay Abraham I wrote:

Hi Jason,

On 16/09/20 8:40 am, Jason Wang wrote:

On 2020/9/15 11:47 PM, Kishon Vijay Abraham I wrote:

Hi Jason,

On 15/09/20 1:48 pm, Jason Wang wrote:

Hi Kishon:

On 2020/9/14 3:23 PM, Kishon Vijay Abraham I wrote:

Then you need something that is functional equivalent to virtio PCI
which is actually the concept of vDPA (e.g vDPA provides
alternatives if
the queue_sel is hard in the EP implementation).

Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
the VHOST driver to configure VHOST device).

struct vdpa_config_ops {
  /* Virtqueue ops */
  int (*set_vq_address)(struct vdpa_device *vdev,
    u16 idx, u64 desc_area, u64 driver_area,
    u64 device_area);
  void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
  void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
  void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
    struct vdpa_callback *cb);
  void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool
ready);
  bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
  int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
  const struct vdpa_vq_state *state);
  int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
  struct vdpa_vq_state *state);
  struct vdpa_notification_area
  (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
  /* vq irq is not expected to be changed once DRIVER_OK is set */
  int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);

  /* Device ops */
  u32 (*get_vq_align)(struct vdpa_device *vdev);
  u64 (*get_features)(struct vdpa_device *vdev);
  int (*set_features)(struct vdpa_device *vdev, u64 features);
  void (*set_config_cb)(struct vdpa_device *vdev,
    struct vdpa_callback *cb);
  u16 (*get_vq_num_max)(struct vdpa_device *vdev);
  u32 (*get_device_id)(struct vdpa_device *vdev);
  u32 (*get_vendor_id)(struct vdpa_device *vdev);
  u8 (*get_status)(struct vdpa_device *vdev);
  void (*set_status)(struct vdpa_device *vdev, u8 status);
  void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
     void *buf, unsigned int len);
  void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
     const void *buf, unsigned int len);
  u32 (*get_generation)(struct vdpa_device *vdev);

  /* DMA ops */
  int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb
*iotlb);
  int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
     u64 pa, u32 perm);
  int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);

  /* Free device resources */
  void (*free)(struct vdpa_device *vdev);
};

+struct vhost_config_ops {
+    int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
+  unsigned int num_bufs, struct vhost_virtqueue *vqs[],
+  vhost_vq_callback_t *callbacks[],
+  const char * const names[]);
+    void (*del_vqs)(struct vhost_dev *vdev);
+    int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src,
int len);
+    int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int
len);
+    int (*set_features)(struct vhost_dev *vdev, u64 device_features);
+    int (*set_status)(struct vhost_dev *vdev, u8 status);
+    u8 (*get_status)(struct vhost_dev *vdev);
+};
+
struct virtio_config_ops
I think there's some overlap here and some of the ops try to do the
same thing.

I think it differs in (*set_vq_address)() and (*create_vqs)().
[create_vqs() introduced in struct vhost_config_ops provides
complementary functionality to (*find_vqs)() in struct
virtio_config_ops. It seemingly encapsulates the functionality of
(*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(), ...].

Back to the difference between (*set_vq_address)() and (*create_vqs)(),
set_vq_address() directly provides the virtqueue address to the vdpa
device but create_vqs() only provides the parameters of the virtqueue
(like the number of virtqueues, number of buffers) but does not
directly
provide the address. IMO the backend client drivers (like net or vhost)
shouldn't/cannot by itself know how to access the vring created on
virtio front-end. The vdpa device/vhost device should have logic for
that. That will help the client drivers to work with different types of
vdpa device/vhost device and can access the vring created by virtio
irrespective of whether the vring can be accessed via mmio or kernel
space or user space.
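
As an illustration of this point, a client driver under the proposed ops
might look roughly like the sketch below. This is not taken from the
series: the "ops" member name on struct vhost_dev and the
vhost_vq_callback_t signature are assumptions made by analogy with
virtio's find_vqs()/vq_callback_t.

static void example_rx_done(struct vhost_virtqueue *vq) { /* ... */ }
static void example_tx_done(struct vhost_virtqueue *vq) { /* ... */ }

static vhost_vq_callback_t *example_cbs[] = { example_rx_done, example_tx_done };
static const char * const example_names[] = { "rx", "tx" };

static int example_client_setup(struct vhost_dev *vdev)
{
	struct vhost_virtqueue *vqs[2];

	/* Only queue and buffer counts are passed in; how the vrings are
	 * actually reached (MMIO, kernel or userspace mapping) stays
	 * inside the vhost/vDPA device behind these ops. */
	return vdev->ops->create_vqs(vdev, 2, 256, vqs, example_cbs,
				     example_names);
}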

I think vdpa always works with client drivers in userspace and
providing
userspace address for vring.

Sorry for being unclear. What I meant is not replacing vDPA with the
vhost(bus) you proposed but the possibility of replacing virtio-pci-epf
with vDPA in:

Okay, so the virtio back-end still use vhost and front end

Re: [vhost next 0/2] mlx5 vdpa fix netdev status

2020-09-17 Thread Jason Wang


On 2020/9/17 8:13 PM, Eli Cohen wrote:

Hi Michael,

the following two patches aim to fix a failure to set the vdpa driver
status bit VIRTIO_NET_S_LINK_UP thus causing failure to bring the link
up. I break it to two patches:

1. Introduce proper mlx5 API to set 16 bit status fields per virtio
requirements.
2. Fix the failure to set the bit

Eli Cohen (2):
   vdpa/mlx5: Make use of a specific 16 bit endianness API
   vdpa/mlx5: Fix failure to bring link up

  drivers/vdpa/mlx5/net/mlx5_vnet.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)



Acked-by: Jason Wang 



Re: [PATCH v2 -next] vdpa: mlx5: change Kconfig depends to fix build errors

2020-09-17 Thread Jason Wang


On 2020/9/18 3:45 AM, Randy Dunlap wrote:

From: Randy Dunlap 

drivers/vdpa/mlx5/ uses vhost_iotlb*() interfaces, so add a dependency
on VHOST to eliminate build errors.

ld: drivers/vdpa/mlx5/core/mr.o: in function `add_direct_chain':
mr.c:(.text+0x106): undefined reference to `vhost_iotlb_itree_first'
ld: mr.c:(.text+0x1cf): undefined reference to `vhost_iotlb_itree_next'
ld: mr.c:(.text+0x30d): undefined reference to `vhost_iotlb_itree_first'
ld: mr.c:(.text+0x3e8): undefined reference to `vhost_iotlb_itree_next'
ld: drivers/vdpa/mlx5/core/mr.o: in function `_mlx5_vdpa_create_mr':
mr.c:(.text+0x908): undefined reference to `vhost_iotlb_itree_first'
ld: mr.c:(.text+0x9e6): undefined reference to `vhost_iotlb_itree_next'
ld: drivers/vdpa/mlx5/core/mr.o: in function `mlx5_vdpa_handle_set_map':
mr.c:(.text+0xf1d): undefined reference to `vhost_iotlb_itree_first'

Signed-off-by: Randy Dunlap 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: virtualization@lists.linux-foundation.org
Cc: Saeed Mahameed 
Cc: Leon Romanovsky 
Cc: net...@vger.kernel.org
---
v2: change from select to depends (Saeed)

  drivers/vdpa/Kconfig |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-next-20200917.orig/drivers/vdpa/Kconfig
+++ linux-next-20200917/drivers/vdpa/Kconfig
@@ -31,7 +31,7 @@ config IFCVF
  
  config MLX5_VDPA

bool "MLX5 VDPA support library for ConnectX devices"
-   depends on MLX5_CORE
+   depends on VHOST && MLX5_CORE



It looks to me that depending on VHOST is too heavyweight.

I guess what it really needs is VHOST_IOTLB. So we can use select 
VHOST_IOTLB here.


Thanks



default n
help
  Support library for Mellanox VDPA drivers. Provides code that is




Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

2020-09-15 Thread Jason Wang


On 2020/9/15 11:47 PM, Kishon Vijay Abraham I wrote:

Hi Jason,

On 15/09/20 1:48 pm, Jason Wang wrote:

Hi Kishon:

On 2020/9/14 3:23 PM, Kishon Vijay Abraham I wrote:

Then you need something that is functional equivalent to virtio PCI
which is actually the concept of vDPA (e.g vDPA provides alternatives if
the queue_sel is hard in the EP implementation).

Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
the VHOST driver to configure VHOST device).

struct vdpa_config_ops {
 /* Virtqueue ops */
 int (*set_vq_address)(struct vdpa_device *vdev,
   u16 idx, u64 desc_area, u64 driver_area,
   u64 device_area);
 void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
 void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
 void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
   struct vdpa_callback *cb);
 void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
 bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
 int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
     const struct vdpa_vq_state *state);
 int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
     struct vdpa_vq_state *state);
 struct vdpa_notification_area
 (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
 /* vq irq is not expected to be changed once DRIVER_OK is set */
 int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);

 /* Device ops */
 u32 (*get_vq_align)(struct vdpa_device *vdev);
 u64 (*get_features)(struct vdpa_device *vdev);
 int (*set_features)(struct vdpa_device *vdev, u64 features);
 void (*set_config_cb)(struct vdpa_device *vdev,
   struct vdpa_callback *cb);
 u16 (*get_vq_num_max)(struct vdpa_device *vdev);
 u32 (*get_device_id)(struct vdpa_device *vdev);
 u32 (*get_vendor_id)(struct vdpa_device *vdev);
 u8 (*get_status)(struct vdpa_device *vdev);
 void (*set_status)(struct vdpa_device *vdev, u8 status);
 void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
    void *buf, unsigned int len);
 void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
    const void *buf, unsigned int len);
 u32 (*get_generation)(struct vdpa_device *vdev);

 /* DMA ops */
 int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
 int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
    u64 pa, u32 perm);
 int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);

 /* Free device resources */
 void (*free)(struct vdpa_device *vdev);
};

+struct vhost_config_ops {
+    int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
+  unsigned int num_bufs, struct vhost_virtqueue *vqs[],
+  vhost_vq_callback_t *callbacks[],
+  const char * const names[]);
+    void (*del_vqs)(struct vhost_dev *vdev);
+    int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src,
int len);
+    int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int
len);
+    int (*set_features)(struct vhost_dev *vdev, u64 device_features);
+    int (*set_status)(struct vhost_dev *vdev, u8 status);
+    u8 (*get_status)(struct vhost_dev *vdev);
+};
+
struct virtio_config_ops
I think there's some overlap here and some of the ops try to do the
same thing.

I think it differs in (*set_vq_address)() and (*create_vqs)().
[create_vqs() introduced in struct vhost_config_ops provides
complementary functionality to (*find_vqs)() in struct
virtio_config_ops. It seemingly encapsulates the functionality of
(*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(), ...].

Back to the difference between (*set_vq_address)() and (*create_vqs)(),
set_vq_address() directly provides the virtqueue address to the vdpa
device but create_vqs() only provides the parameters of the virtqueue
(like the number of virtqueues, number of buffers) but does not directly
provide the address. IMO the backend client drivers (like net or vhost)
shouldn't/cannot by itself know how to access the vring created on
virtio front-end. The vdpa device/vhost device should have logic for
that. That will help the client drivers to work with different types of
vdpa device/vhost device and can access the vring created by virtio
irrespective of whether the vring can be accessed via mmio or kernel
space or user space.

I think vdpa always works with client drivers in userspace and providing
userspace address for vring.


Sorry for being unclear. What I meant is not replacing vDPA with the
vhost(bus) you proposed but the possibility of replacing virtio-pci-epf
with vDPA in:

Okay, so the virtio back-end still use vhost and front end should use
vDPA. I see. So the host side PCI driver for EPF should populate
vdpa_config_ops and invoke vdpa_register_device().



Yes.



My

Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

2020-09-15 Thread Jason Wang

Hi Kishon:

On 2020/9/14 3:23 PM, Kishon Vijay Abraham I wrote:

Then you need something that is functional equivalent to virtio PCI
which is actually the concept of vDPA (e.g vDPA provides alternatives if
the queue_sel is hard in the EP implementation).

Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
the VHOST driver to configure VHOST device).

struct vdpa_config_ops {
/* Virtqueue ops */
int (*set_vq_address)(struct vdpa_device *vdev,
  u16 idx, u64 desc_area, u64 driver_area,
  u64 device_area);
void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
  struct vdpa_callback *cb);
void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
const struct vdpa_vq_state *state);
int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
struct vdpa_vq_state *state);
struct vdpa_notification_area
(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
/* vq irq is not expected to be changed once DRIVER_OK is set */
int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);

/* Device ops */
u32 (*get_vq_align)(struct vdpa_device *vdev);
u64 (*get_features)(struct vdpa_device *vdev);
int (*set_features)(struct vdpa_device *vdev, u64 features);
void (*set_config_cb)(struct vdpa_device *vdev,
  struct vdpa_callback *cb);
u16 (*get_vq_num_max)(struct vdpa_device *vdev);
u32 (*get_device_id)(struct vdpa_device *vdev);
u32 (*get_vendor_id)(struct vdpa_device *vdev);
u8 (*get_status)(struct vdpa_device *vdev);
void (*set_status)(struct vdpa_device *vdev, u8 status);
void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
   void *buf, unsigned int len);
void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
   const void *buf, unsigned int len);
u32 (*get_generation)(struct vdpa_device *vdev);

/* DMA ops */
int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
   u64 pa, u32 perm);
int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);

/* Free device resources */
void (*free)(struct vdpa_device *vdev);
};

+struct vhost_config_ops {
+   int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
+ unsigned int num_bufs, struct vhost_virtqueue *vqs[],
+ vhost_vq_callback_t *callbacks[],
+ const char * const names[]);
+   void (*del_vqs)(struct vhost_dev *vdev);
+   int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src, int len);
+   int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int len);
+   int (*set_features)(struct vhost_dev *vdev, u64 device_features);
+   int (*set_status)(struct vhost_dev *vdev, u8 status);
+   u8 (*get_status)(struct vhost_dev *vdev);
+};
+
struct virtio_config_ops
I think there's some overlap here and some of the ops try to do the
same thing.

I think it differs in (*set_vq_address)() and (*create_vqs)().
[create_vqs() introduced in struct vhost_config_ops provides
complementary functionality to (*find_vqs)() in struct
virtio_config_ops. It seemingly encapsulates the functionality of
(*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(), ...].

Back to the difference between (*set_vq_address)() and (*create_vqs)(),
set_vq_address() directly provides the virtqueue address to the vdpa
device but create_vqs() only provides the parameters of the virtqueue
(like the number of virtqueues, number of buffers) but does not directly
provide the address. IMO the backend client drivers (like net or vhost)
shouldn't/cannot by itself know how to access the vring created on
virtio front-end. The vdpa device/vhost device should have logic for
that. That will help the client drivers to work with different types of
vdpa device/vhost device and can access the vring created by virtio
irrespective of whether the vring can be accessed via mmio or kernel
space or user space.

I think vdpa always works with client drivers in userspace and providing
userspace address for vring.



Sorry for being unclear. What I meant is not replacing vDPA with the 
vhost(bus) you proposed but the possibility of replacing virtio-pci-epf 
with vDPA in:


My question is basically for the part of 

Re: [PATCH] vhost: reduce stack usage in log_used

2020-09-15 Thread Jason Wang


On 2020/9/15 2:08 AM, Li Wang wrote:

Fix the warning: [-Werror=-Wframe-larger-than=]

drivers/vhost/vhost.c: In function log_used:
drivers/vhost/vhost.c:1906:1:
warning: the frame size of 1040 bytes is larger than 1024 bytes

Signed-off-by: Li Wang 
---
  drivers/vhost/vhost.c | 2 +-
  drivers/vhost/vhost.h | 1 +
  2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index b45519c..31837a5
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1884,7 +1884,7 @@ static int log_write_hva(struct vhost_virtqueue *vq, u64 
hva, u64 len)
  
  static int log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)

  {
-   struct iovec iov[64];
+   struct iovec *iov = vq->log_iov;
int i, ret;
  
  	if (!vq->iotlb)

diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 9032d3c..5fe4b47
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -123,6 +123,7 @@ struct vhost_virtqueue {
/* Log write descriptors */
void __user *log_base;
struct vhost_log *log;
+   struct iovec log_iov[64];
  
  	/* Ring endianness. Defaults to legacy native endianness.

 * Set to true when starting a modern virtio device. */



Acked-by: Jason Wang 



Re: [PATCH] vhost_vdpa: Fix duplicate included kernel.h

2020-09-15 Thread Jason Wang


On 2020/9/15 8:51 AM, Tian Tao wrote:

linux/kernel.h is included more than once, Remove the one that isn't
necessary.

Signed-off-by: Tian Tao 
---
  drivers/vhost/vdpa.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 3fab94f..95e2b83 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -22,7 +22,6 @@
  #include 
  #include 
  #include 
-#include 
  
  #include "vhost.h"
  



Acked-by: Jason Wang 



Re: [PATCH v2] i2c: virtio: add a virtio i2c frontend driver

2020-09-13 Thread Jason Wang


On 2020/9/11 11:48 AM, Jie Deng wrote:

Add an I2C bus driver for virtio para-virtualization.

The controller can be emulated by the backend driver in
any device model software by following the virtio protocol.

This driver communicates with the backend driver through a
virtio I2C message structure which includes following parts:

- Header: i2c_msg addr, flags, len.
- Data buffer: the pointer to the I2C msg data.
- Status: the processing result from the backend.

People may implement different backend drivers to emulate
different controllers according to their needs. A backend
example can be found in the device model of the open source
project ACRN. For more information, please refer to
https://projectacrn.org.

The virtio device ID 34 is used for this I2C adapter since IDs
before 34 have been reserved by other virtio devices.

Co-developed-by: Conghui Chen 
Signed-off-by: Conghui Chen 
Signed-off-by: Jie Deng 
Reviewed-by: Shuo Liu 
Reviewed-by: Andy Shevchenko 
---
The device ID request:
https://github.com/oasis-tcs/virtio-spec/issues/85

Changes in v2:
- Addressed comments received from Michael, Andy and Jason.

  drivers/i2c/busses/Kconfig  |  11 ++
  drivers/i2c/busses/Makefile |   3 +
  drivers/i2c/busses/i2c-virtio.c | 271 
  include/uapi/linux/virtio_ids.h |   1 +
  4 files changed, 286 insertions(+)
  create mode 100644 drivers/i2c/busses/i2c-virtio.c

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 293e7a0..70c8e30 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -21,6 +21,17 @@ config I2C_ALI1535
  This driver can also be built as a module.  If so, the module
  will be called i2c-ali1535.
  
+config I2C_VIRTIO

+   tristate "Virtio I2C Adapter"
+   depends on VIRTIO
+   help
+ If you say yes to this option, support will be included for the virtio
+ i2c adapter driver. The hardware can be emulated by any device model
+ software according to the virtio protocol.
+
+ This driver can also be built as a module. If so, the module
+ will be called i2c-virtio.
+
  config I2C_ALI1563
tristate "ALI 1563"
depends on PCI
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 19aff0e..821acfa 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -6,6 +6,9 @@
  # ACPI drivers
  obj-$(CONFIG_I2C_SCMI)+= i2c-scmi.o
  
+# VIRTIO I2C host controller driver

+obj-$(CONFIG_I2C_VIRTIO)   += i2c-virtio.o
+
  # PC SMBus host controller drivers
  obj-$(CONFIG_I2C_ALI1535) += i2c-ali1535.o
  obj-$(CONFIG_I2C_ALI1563) += i2c-ali1563.o
diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
new file mode 100644
index 000..aff1a9a
--- /dev/null
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -0,0 +1,271 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtio I2C Bus Driver
+ *
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define VIRTIO_I2C_MSG_OK  0
+#define VIRTIO_I2C_MSG_ERR 1
+
+/**
+ * struct virtio_i2c_hdr - the virtio I2C message header structure
+ * @addr: i2c_msg addr, the slave address
+ * @flags: i2c_msg flags
+ * @len: i2c_msg len
+ */
+struct virtio_i2c_hdr {
+   __le16 addr;
+   __le16 flags;
+   __le16 len;
+};



As said in v1, this should belong to uapi.
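
For illustration, moving the wire format to uAPI could look roughly like
this (the file name and header guard are assumptions; the fields and the
status values are the ones from the patch):

/* include/uapi/linux/virtio_i2c.h (hypothetical) */
#ifndef _UAPI_LINUX_VIRTIO_I2C_H
#define _UAPI_LINUX_VIRTIO_I2C_H

#include <linux/types.h>

/* Status returned by the backend for each request. */
#define VIRTIO_I2C_MSG_OK	0
#define VIRTIO_I2C_MSG_ERR	1

/* The wire header: addr, flags and len taken from struct i2c_msg. */
struct virtio_i2c_hdr {
	__le16 addr;
	__le16 flags;
	__le16 len;
};

#endif /* _UAPI_LINUX_VIRTIO_I2C_H */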



+
+/**
+ * struct virtio_i2c_msg - the virtio I2C message structure
+ * @hdr: the virtio I2C message header
+ * @buf: virtio I2C message data buffer
+ * @status: the processing result from the backend
+ */
+struct virtio_i2c_msg {
+   struct virtio_i2c_hdr hdr;
+   u8 *buf;
+   u8 status;
+};



I'm not quite sure this is the best layout.

E.g. virtio-scsi keeps the in buffer separate from the out one:

struct virtio_scsi_req_cmd {
    ...
    u8 dataout[];
    ...
    u8 datain[];
};

And I would like to have a look at the spec patch.

Thanks



+
+/**
+ * struct virtio_i2c - virtio I2C data
+ * @vdev: virtio device for this controller
+ * @completion: completion of virtio I2C message
+ * @vmsg: the virtio I2C message for communication
+ * @adap: I2C adapter for this controller
+ * @i2c_lock: lock for virtqueue processing
+ * @vq: the virtio virtqueue for communication
+ */
+struct virtio_i2c {
+   struct virtio_device *vdev;
+   struct completion completion;
+   struct virtio_i2c_msg vmsg;
+   struct i2c_adapter adap;
+   struct mutex i2c_lock;
+   struct virtqueue *vq;
+};
+
+static void virtio_i2c_msg_done(struct virtqueue *vq)
+{
+   struct virtio_i2c *vi = vq->vdev->priv;
+
+   complete(&vi->completion);
+}
+
+static int virtio_i2c_add_msg(struct virtqueue *vq,
+ struct virtio_i2c_msg *vmsg,
+ 

Re: [PATCH] vhost: reduce stack usage in log_used

2020-09-13 Thread Jason Wang



- Original Message -
> Fix the warning: [-Werror=-Wframe-larger-than=]
> 
> drivers/vhost/vhost.c: In function log_used:
> drivers/vhost/vhost.c:1906:1:
> warning: the frame size of 1040 bytes is larger than 1024 bytes
> 
> Signed-off-by: Li Wang 
> ---
>  drivers/vhost/vhost.c | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index b45519c..41769de 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1884,25 +1884,31 @@ static int log_write_hva(struct vhost_virtqueue *vq,
> u64 hva, u64 len)
>  
>  static int log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
>  {
> - struct iovec iov[64];
> + struct iovec *iov;
>   int i, ret;
>  
>   if (!vq->iotlb)
>   return log_write(vq->log_base, vq->log_addr + used_offset, len);
>  
> + iov = kcalloc(64, sizeof(*iov), GFP_KERNEL);
> + if (!iov)
> + return -ENOMEM;

Let's preallocate it in e.g vhost_net_open().

We don't want to fail the log due to -ENOMEM.

Thanks

> +
>   ret = translate_desc(vq, (uintptr_t)vq->used + used_offset,
>len, iov, 64, VHOST_ACCESS_WO);
>   if (ret < 0)
> - return ret;
> + goto out;
>  
>   for (i = 0; i < ret; i++) {
>   ret = log_write_hva(vq, (uintptr_t)iov[i].iov_base,
>   iov[i].iov_len);
>   if (ret)
> - return ret;
> + goto out;
>   }
>  
> - return 0;
> +out:
> + kfree(iov);
> + return ret;
>  }
>  
>  int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
> --
> 2.7.4
> 
> 



Re: [PATCH] vhost-vdpa: fix memory leak in error path

2020-09-09 Thread Jason Wang


On 2020/9/9 11:41 PM, Li Qiang wrote:

Free the 'page_list' when the 'npages' is zero.

Signed-off-by: Li Qiang 
---
  drivers/vhost/vdpa.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 3fab94f88894..6a9fcaf1831d 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -609,8 +609,10 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
gup_flags |= FOLL_WRITE;
  
  	npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT;

-   if (!npages)
-   return -EINVAL;
+   if (!npages) {
+   ret = -EINVAL;
+   goto free_page;
+   }
  
  	mmap_read_lock(dev->mm);
  
@@ -666,6 +668,8 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,

	atomic64_sub(npages, &dev->mm->pinned_vm);
}
mmap_read_unlock(dev->mm);
+
+free_page:
free_page((unsigned long)page_list);
return ret;
  }



Cc: sta...@vger.kernel.org

Acked-by: Jason Wang 




Re: [PATCH] vhost_vdpa: remove unnecessary spin_lock in vhost_vring_call

2020-09-09 Thread Jason Wang


On 2020/9/9 2:52 PM, Zhu Lingshan wrote:

This commit removed unnecessary spin_locks in vhost_vring_call
and related operations. Because we manipulate irq offloading
contents in vhost_vdpa ioctl code path which is already
protected by dev mutex and vq mutex.

Signed-off-by: Zhu Lingshan 



Acked-by: Jason Wang 



---
  drivers/vhost/vdpa.c  | 8 +---
  drivers/vhost/vhost.c | 3 ---
  drivers/vhost/vhost.h | 1 -
  3 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 3fab94f88894..bc679d0b7b87 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -97,26 +97,20 @@ static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, 
u16 qid)
return;
  
  	irq = ops->get_vq_irq(vdpa, qid);

-	spin_lock(&vq->call_ctx.ctx_lock);
	irq_bypass_unregister_producer(&vq->call_ctx.producer);
-	if (!vq->call_ctx.ctx || irq < 0) {
-		spin_unlock(&vq->call_ctx.ctx_lock);
+	if (!vq->call_ctx.ctx || irq < 0)
		return;
-	}

	vq->call_ctx.producer.token = vq->call_ctx.ctx;
	vq->call_ctx.producer.irq = irq;
	ret = irq_bypass_register_producer(&vq->call_ctx.producer);
-	spin_unlock(&vq->call_ctx.ctx_lock);
  }
  
  static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, u16 qid)

  {
	struct vhost_virtqueue *vq = &v->vqs[qid];

-	spin_lock(&vq->call_ctx.ctx_lock);
	irq_bypass_unregister_producer(&vq->call_ctx.producer);
-	spin_unlock(&vq->call_ctx.ctx_lock);
  }
  
  static void vhost_vdpa_reset(struct vhost_vdpa *v)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index b45519ca66a7..99f27ce982da 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -302,7 +302,6 @@ static void vhost_vring_call_reset(struct vhost_vring_call 
*call_ctx)
  {
call_ctx->ctx = NULL;
	memset(&call_ctx->producer, 0x0, sizeof(struct irq_bypass_producer));
-   spin_lock_init(_ctx->ctx_lock);
  }
  
  static void vhost_vq_reset(struct vhost_dev *dev,

@@ -1637,9 +1636,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int 
ioctl, void __user *arg
break;
}
  
-		spin_lock(&vq->call_ctx.ctx_lock);
		swap(ctx, vq->call_ctx.ctx);
-		spin_unlock(&vq->call_ctx.ctx_lock);
		break;
	case VHOST_SET_VRING_ERR:
		if (copy_from_user(&f, argp, sizeof f)) {
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 9032d3c2a9f4..486dcf371e06 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -64,7 +64,6 @@ enum vhost_uaddr_type {
  struct vhost_vring_call {
struct eventfd_ctx *ctx;
struct irq_bypass_producer producer;
-   spinlock_t ctx_lock;
  };
  
  /* The virtqueue structure describes a queue attached to a device. */



Re: [PATCH] i2c: virtio: add a virtio i2c frontend driver

2020-09-09 Thread Jason Wang


On 2020/9/3 1:34 PM, Jie Deng wrote:

--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -6,6 +6,9 @@
  # ACPI drivers
  obj-$(CONFIG_I2C_SCMI)+= i2c-scmi.o
  
+# VIRTIO I2C host controller driver

+obj-$(CONFIG_I2C_VIRTIO)   += i2c-virtio.o
+
  # PC SMBus host controller drivers
  obj-$(CONFIG_I2C_ALI1535) += i2c-ali1535.o
  obj-$(CONFIG_I2C_ALI1563) += i2c-ali1563.o
diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
new file mode 100644
index 000..47f9fd1
--- /dev/null
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -0,0 +1,276 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtio I2C Bus Driver
+ *
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define VIRTIO_I2C_MSG_OK  0
+#define VIRTIO_I2C_MSG_ERR 1
+
+/**
+ * struct virtio_i2c_hdr - the virtio I2C message header structure
+ * @addr: i2c_msg addr, the slave address
+ * @flags: i2c_msg flags
+ * @len: i2c_msg len
+ */
+struct virtio_i2c_hdr {
+   __virtio16 addr;
+   __virtio16 flags;
+   __virtio16 len;
+} __packed;



Btw, this part should belong to uAPI, and you need to define the status 
in uAPI.


Thanks


Re: [PATCH] i2c: virtio: add a virtio i2c frontend driver

2020-09-09 Thread Jason Wang


On 2020/9/8 9:40 AM, Jie Deng wrote:



On 2020/9/7 13:40, Jason Wang wrote:









+struct virtio_i2c_msg {
+    struct virtio_i2c_hdr hdr;
+    char *buf;
+    u8 status;



Any reason for separating status out of virtio_i2c_hdr?

The status is not from i2c_msg. 



You meant i2c_hdr? You embed status in virtio_i2c_msg anyway.



The "i2c_msg" structure defined in i2c.h.


So I put it out of virtio_i2c_hdr.



Something like status or response is pretty common in virtio request 
(e.g net or scsi), if no special reason, it's better to keep it in 
the hdr.



Mainly based on IN or OUT.

The addr, flags and len are from "i2c_msg". They are put in one
structure as an OUT scatterlist.

The buf can be an OUT or an IN scatterlist depending on write or read.
The status is a result from the backend, which is defined as an IN
scatterlist.



Ok. I get this.

Thanks



Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

2020-09-09 Thread Jason Wang


On 2020/9/9 12:37 AM, Cornelia Huck wrote:

Then you need something that is functional equivalent to virtio PCI
which is actually the concept of vDPA (e.g vDPA provides alternatives if
the queue_sel is hard in the EP implementation).

It seems I really need to read up on vDPA more... do you have a pointer
for diving into this alternatives aspect?



See vpda_config_ops in include/linux/vdpa.h

Especially this part:

    int (*set_vq_address)(struct vdpa_device *vdev,
              u16 idx, u64 desc_area, u64 driver_area,
              u64 device_area);

This means that for devices (e.g. an endpoint device) where the
virtio-pci layout is hard to implement, any other register layout or
vendor-specific way can be used to configure the virtqueue.
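
A sketch of the idea, not real driver code: everything prefixed "epf_"
below is made up, and only the callback prototype comes from
include/linux/vdpa.h.

struct epf_vdpa {
	struct vdpa_device vdpa;
	void __iomem *regs;
};

/* Hypothetical vendor register offsets for the three vring addresses. */
#define EPF_VQ_DESC(i)	(0x00 + (i) * 0x18)
#define EPF_VQ_AVAIL(i)	(0x08 + (i) * 0x18)
#define EPF_VQ_USED(i)	(0x10 + (i) * 0x18)

static int epf_vdpa_set_vq_address(struct vdpa_device *vdev, u16 idx,
				   u64 desc_area, u64 driver_area,
				   u64 device_area)
{
	struct epf_vdpa *ev = container_of(vdev, struct epf_vdpa, vdpa);

	/* No virtio-pci common config here: the ring addresses go into
	 * whatever registers the endpoint firmware defines. */
	writeq(desc_area, ev->regs + EPF_VQ_DESC(idx));
	writeq(driver_area, ev->regs + EPF_VQ_AVAIL(idx));
	writeq(device_area, ev->regs + EPF_VQ_USED(idx));

	return 0;
}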






"Virtio Over NTB" should anyways be a new transport.

Does that make any sense?

yeah, in the approach I used the initial features are hard-coded in
vhost-rpmsg (inherent to the rpmsg) but when we have to use adapter
layer (vhost only for accessing virtio ring and use virtio drivers on
both front end and backend), based on the functionality (e.g, rpmsg),
the vhost should be configured with features (to be presented to the
virtio) and that's why additional layer or APIs will be required.

A question here, if we go with vhost bus approach, does it mean the
virtio device can only be implemented in EP's userspace?

Can we maybe implement an alternative bus as well that would allow us
to support different virtio device implementations (in addition to the
vhost bus + userspace combination)?



That should be fine, but I'm not quite sure that implementing the device
in kernel (a kthread) is a good approach.


Thanks







Re: [PATCH] vhost: new vhost_vdpa SET/GET_BACKEND_FEATURES handlers

2020-09-08 Thread Jason Wang



- Original Message -
> This commit introduced vhost_vdpa_set/get_backend_features() to
> resolve these issues:
> (1)In vhost_vdpa ioctl SET_BACKEND_FEATURES path, current code
> would try to acquire vhost dev mutex twice
> (first shown in vhost_vdpa_unlocked_ioctl), which can lead
> to a deadlock issue.
> (2)SET_BACKEND_FEATURES was blindly added to vring ioctl instead
> of vdpa device ioctl
> 
> To resolve these issues, this commit (1)removed mutex operations
> in vhost_set_backend_features. (2)Handle ioctl
> SET/GET_BACKEND_FEATURES in vdpa ioctl. (3)introduce a new
> function vhost_net_set_backend_features() for vhost_net,
> which is a wrap of vhost_set_backend_features() with
> necessary mutex lockings.
> 
> Signed-off-by: Zhu Lingshan 

So this patch touches not only vhost-vDPA.

Though the function looks correct, I'd prefer you do cleanup on top of
my patches which has been tested by Eli.

Note that, vhost generic handlers tend to hold mutex by itself.

So we probably need more thought on this.

Thanks

> ---
>  drivers/vhost/net.c   |  9 -
>  drivers/vhost/vdpa.c  | 47 ++-
>  drivers/vhost/vhost.c |  2 --
>  3 files changed, 41 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 531a00d703cd..e01da77538c8 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -1679,6 +1679,13 @@ static long vhost_net_set_owner(struct vhost_net *n)
>   return r;
>  }
>  
> +static void vhost_net_set_backend_features(struct vhost_dev *dev, u64
> features)
> +{
> +	mutex_lock(&dev->mutex);
> +	vhost_set_backend_features(dev, features);
> +	mutex_unlock(&dev->mutex);
> +}
> +
>  static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>   unsigned long arg)
>  {
> @@ -1715,7 +1722,7 @@ static long vhost_net_ioctl(struct file *f, unsigned
> int ioctl,
>   return -EFAULT;
>   if (features & ~VHOST_NET_BACKEND_FEATURES)
>   return -EOPNOTSUPP;
> -	vhost_set_backend_features(&n->dev, features);
> +	vhost_net_set_backend_features(&n->dev, features);
>   return 0;
>   case VHOST_RESET_OWNER:
>   return vhost_net_reset_owner(n);
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 3fab94f88894..ade33c566a81 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -344,6 +344,33 @@ static long vhost_vdpa_set_config_call(struct vhost_vdpa
> *v, u32 __user *argp)
>   return 0;
>  }
>  
> +
> +static long vhost_vdpa_get_backend_features(void __user *argp)
> +{
> + u64 features = VHOST_VDPA_BACKEND_FEATURES;
> + u64 __user *featurep = argp;
> + long r;
> +
> +	r = copy_to_user(featurep, &features, sizeof(features));
> +
> + return r;
> +}
> +static long vhost_vdpa_set_backend_features(struct vhost_vdpa *v, void
> __user *argp)
> +{
> + u64 __user *featurep = argp;
> + u64 features;
> +
> +	if (copy_from_user(&features, featurep, sizeof(features)))
> + return -EFAULT;
> +
> + if (features & ~VHOST_VDPA_BACKEND_FEATURES)
> + return -EOPNOTSUPP;
> +
> +	vhost_set_backend_features(&v->vdev, features);
> +
> + return 0;
> +}
> +
>  static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
>  void __user *argp)
>  {
> @@ -353,8 +380,6 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v,
> unsigned int cmd,
>   struct vdpa_callback cb;
>   struct vhost_virtqueue *vq;
>   struct vhost_vring_state s;
> - u64 __user *featurep = argp;
> - u64 features;
>   u32 idx;
>   long r;
>  
> @@ -381,18 +406,6 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v,
> unsigned int cmd,
>  
>   vq->last_avail_idx = vq_state.avail_index;
>   break;
> - case VHOST_GET_BACKEND_FEATURES:
> - features = VHOST_VDPA_BACKEND_FEATURES;
> -	if (copy_to_user(featurep, &features, sizeof(features)))
> - return -EFAULT;
> - return 0;
> - case VHOST_SET_BACKEND_FEATURES:
> -	if (copy_from_user(&features, featurep, sizeof(features)))
> - return -EFAULT;
> - if (features & ~VHOST_VDPA_BACKEND_FEATURES)
> - return -EOPNOTSUPP;
> -	vhost_set_backend_features(&v->vdev, features);
> - return 0;
>   }
>  
> 	r = vhost_vring_ioctl(&v->vdev, cmd, argp);
> @@ -476,6 +489,12 @@ static long vhost_vdpa_unlocked_ioctl(struct file
> *filep,
>   case VHOST_VDPA_SET_CONFIG_CALL:
>   r = vhost_vdpa_set_config_call(v, argp);
>   break;
> + case VHOST_SET_BACKEND_FEATURES:
> + r = vhost_vdpa_set_backend_features(v, argp);
> + break;
> + case VHOST_GET_BACKEND_FEATURES:
> + r = vhost_vdpa_get_backend_features(argp);
> + break;
>   

Re: [PATCH v2] vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK

2020-09-08 Thread Jason Wang



- Original Message -
> set_map() is used by mlx5 vdpa to create a memory region based on the
> address map passed by the iotlb argument. If we get successive calls, we
> will destroy the current memory region and build another one based on
> the new address mapping. We also need to setup the hardware resources
> since they depend on the memory region.
> 
> If these calls happen before DRIVER_OK, It means that driver VQs may
> also not been setup and we may not create them yet. In this case we want
> to avoid setting up the other resources and defer this till we get
> DRIVER OK.
> 
> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> Signed-off-by: Eli Cohen 
> ---
> V1->V2: Improve changelog description
> 
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 9df69d5efe8c..c89cd48a0aab 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1645,6 +1645,9 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_net
> *ndev, struct vhost_iotlb *
>   if (err)
>   goto err_mr;
>  
> + if (!(ndev->mvdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
> + return 0;
> +

Is there any reason that we still need to do vq suspending and saving before?

Thanks

>   restore_channels_info(ndev);
>   err = setup_driver(ndev);
>   if (err)
> --
> 2.26.0
> 
> 



Re: [PATCH] vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK

2020-09-08 Thread Jason Wang



- Original Message -
> On Mon, Sep 07, 2020 at 06:53:23AM -0400, Jason Wang wrote:
> > 
> > 
> > - Original Message -
> > > If the memory map changes before the driver status is
> > > VIRTIO_CONFIG_S_DRIVER_OK, don't attempt to create resources because it
> > > may fail. For example, if the VQ is not ready there is no point in
> > > creating resources.
> > > 
> > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5
> > > devices")
> > > Signed-off-by: Eli Cohen 
> > > ---
> > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > index 9df69d5efe8c..c89cd48a0aab 100644
> > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > @@ -1645,6 +1645,9 @@ static int mlx5_vdpa_change_map(struct
> > > mlx5_vdpa_net
> > > *ndev, struct vhost_iotlb *
> > >   if (err)
> > >   goto err_mr;
> > >  
> > > + if (!(ndev->mvdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
> > > + return 0;
> > > +
> > 
> > I'm not sure I get this.
> > 
> > It looks to me if set_map() is called before DRIVER_OK, we won't build
> > any mapping?
> > 
> What would prevent that? Is it some qemu logic you're relying upon?

Ok, I think the map is still there, we just avoid creating some
resources.

> With current qemu 5.1 with lack of batching support, I get plenty calls
> to set_map which result in calls to mlx5_vdpa_change_map().
> If that happens before VIRTIO_CONFIG_S_DRIVER_OK then Imay fail (in case
> I was not called to set VQs ready).

Right, this could be solved by adding the batched IOTLB updating.

Thanks

> 
> > 
> > >   restore_channels_info(ndev);
> > >   err = setup_driver(ndev);
> > >   if (err)
> > > --
> > > 2.26.0
> > > 
> > > 
> > 
> 
> 



Re: [PATCH] vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK

2020-09-07 Thread Jason Wang



- Original Message -
> If the memory map changes before the driver status is
> VIRTIO_CONFIG_S_DRIVER_OK, don't attempt to create resources because it
> may fail. For example, if the VQ is not ready there is no point in
> creating resources.
> 
> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> Signed-off-by: Eli Cohen 
> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 9df69d5efe8c..c89cd48a0aab 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1645,6 +1645,9 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_net
> *ndev, struct vhost_iotlb *
>   if (err)
>   goto err_mr;
>  
> + if (!(ndev->mvdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
> + return 0;
> +

I'm not sure I get this.

It looks to me if set_map() is called before DRIVER_OK, we won't build
any mapping?

Thanks

>   restore_channels_info(ndev);
>   err = setup_driver(ndev);
>   if (err)
> --
> 2.26.0
> 
> 



[PATCH] vhost-vdpa: fix backend feature ioctls

2020-09-07 Thread Jason Wang
Commit 653055b9acd4 ("vhost-vdpa: support get/set backend features")
introduces two malfunctioning backend features ioctls:

1) the ioctls was blindly added to vring ioctl instead of vdpa device
   ioctl
2) vhost_set_backend_features() was called when dev mutex has already
   been held, which will lead to a deadlock

This patch fixes the above issues.

Cc: Eli Cohen 
Reported-by: Zhu Lingshan 
Fixes: 653055b9acd4 ("vhost-vdpa: support get/set backend features")
Signed-off-by: Jason Wang 
---
 drivers/vhost/vdpa.c | 30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 3fab94f88894..796fe979f997 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -353,8 +353,6 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, 
unsigned int cmd,
struct vdpa_callback cb;
struct vhost_virtqueue *vq;
struct vhost_vring_state s;
-   u64 __user *featurep = argp;
-   u64 features;
u32 idx;
long r;
 
@@ -381,18 +379,6 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, 
unsigned int cmd,
 
vq->last_avail_idx = vq_state.avail_index;
break;
-   case VHOST_GET_BACKEND_FEATURES:
-   features = VHOST_VDPA_BACKEND_FEATURES;
-		if (copy_to_user(featurep, &features, sizeof(features)))
-   return -EFAULT;
-   return 0;
-   case VHOST_SET_BACKEND_FEATURES:
-		if (copy_from_user(&features, featurep, sizeof(features)))
-   return -EFAULT;
-   if (features & ~VHOST_VDPA_BACKEND_FEATURES)
-   return -EOPNOTSUPP;
-		vhost_set_backend_features(&v->vdev, features);
-   return 0;
}
 
	r = vhost_vring_ioctl(&v->vdev, cmd, argp);
@@ -440,8 +426,20 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
struct vhost_vdpa *v = filep->private_data;
	struct vhost_dev *d = &v->vdev;
void __user *argp = (void __user *)arg;
+   u64 __user *featurep = argp;
+   u64 features;
long r;
 
+   if (cmd == VHOST_SET_BACKEND_FEATURES) {
+		r = copy_from_user(&features, featurep, sizeof(features));
+   if (r)
+   return r;
+   if (features & ~VHOST_VDPA_BACKEND_FEATURES)
+   return -EOPNOTSUPP;
+		vhost_set_backend_features(&v->vdev, features);
+   return 0;
+   }
+
	mutex_lock(&d->mutex);
 
switch (cmd) {
@@ -476,6 +474,10 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
case VHOST_VDPA_SET_CONFIG_CALL:
r = vhost_vdpa_set_config_call(v, argp);
break;
+   case VHOST_GET_BACKEND_FEATURES:
+   features = VHOST_VDPA_BACKEND_FEATURES;
+		r = copy_to_user(featurep, &features, sizeof(features));
+   break;
default:
		r = vhost_dev_ioctl(&v->vdev, cmd, argp);
if (r == -ENOIOCTLCMD)
-- 
2.20.1



Re: [PATCH] i2c: virtio: add a virtio i2c frontend driver

2020-09-06 Thread Jason Wang


On 2020/9/4 9:21 PM, Jie Deng wrote:


On 2020/9/4 12:06, Jason Wang wrote:



diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 293e7a0..70c8e30 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -21,6 +21,17 @@ config I2C_ALI1535
    This driver can also be built as a module.  If so, the module
    will be called i2c-ali1535.
  +config I2C_VIRTIO
+    tristate "Virtio I2C Adapter"
+    depends on VIRTIO



I guess it should depend on some I2C module here.


The dependency of I2C is included in the Kconfig in its parent directory.
So there is nothing special to add here.



Ok.









+struct virtio_i2c_msg {
+    struct virtio_i2c_hdr hdr;
+    char *buf;
+    u8 status;



Any reason for separating status out of virtio_i2c_hdr?

The status is not from i2c_msg. 



You meant i2c_hdr? You embed status in virtio_i2c_msg anyway.



So I put it out of virtio_i2c_hdr.



Something like status or response is pretty common in virtio request 
(e.g net or scsi), if no special reason, it's better to keep it in the hdr.








+};
+
+/**
+ * struct virtio_i2c - virtio I2C data
+ * @vdev: virtio device for this controller
+ * @completion: completion of virtio I2C message
+ * @adap: I2C adapter for this controller
+ * @i2c_lock: lock for virtqueue processing
+ * @vq: the virtio virtqueue for communication
+ */
+struct virtio_i2c {
+    struct virtio_device *vdev;
+    struct completion completion;
+    struct i2c_adapter adap;
+    struct mutex i2c_lock;
+    struct virtqueue *vq;
+};
+
+static void virtio_i2c_msg_done(struct virtqueue *vq)
+{
+    struct virtio_i2c *vi = vq->vdev->priv;
+
+    complete(&vi->completion);
+}
+
+static int virtio_i2c_add_msg(struct virtqueue *vq,
+  struct virtio_i2c_msg *vmsg,
+  struct i2c_msg *msg)
+{
+    struct scatterlist *sgs[3], hdr, bout, bin, status;
+    int outcnt = 0, incnt = 0;
+
+    if (!msg->len)
+    return -EINVAL;
+
+    vmsg->hdr.addr = msg->addr;
+    vmsg->hdr.flags = msg->flags;
+    vmsg->hdr.len = msg->len;



Missing endian conversion?


You are right. Need conversion here.
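
For reference, the conversion could look roughly like this with the
__le16 header from v2 (with __virtio16 fields it would be
cpu_to_virtio16() taking the virtio_device pointer instead; this snippet
is a sketch, not part of the patch):

	vmsg->hdr.addr = cpu_to_le16(msg->addr);
	vmsg->hdr.flags = cpu_to_le16(msg->flags);
	vmsg->hdr.len = cpu_to_le16(msg->len);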



+
+    vmsg->buf = kzalloc(vmsg->hdr.len, GFP_KERNEL);
+    if (!vmsg->buf)
+    return -ENOMEM;
+
+    sg_init_one(&hdr, &vmsg->hdr, sizeof(struct virtio_i2c_hdr));
+    sgs[outcnt++] = &hdr;
+    if (vmsg->hdr.flags & I2C_M_RD) {
+        sg_init_one(&bin, vmsg->buf, msg->len);
+        sgs[outcnt + incnt++] = &bin;
+    } else {
+        memcpy(vmsg->buf, msg->buf, msg->len);
+        sg_init_one(&bout, vmsg->buf, msg->len);
+        sgs[outcnt++] = &bout;
+    }
+    sg_init_one(&status, &vmsg->status, sizeof(vmsg->status));
+    sgs[outcnt + incnt++] = &status;
+
+    return virtqueue_add_sgs(vq, sgs, outcnt, incnt, vmsg, 
GFP_KERNEL);

+}
+
+static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg 
*msgs, int num)

+{
+    struct virtio_i2c *vi = i2c_get_adapdata(adap);
+    struct virtio_i2c_msg *vmsg_o, *vmsg_i;
+    struct virtqueue *vq = vi->vq;
+    unsigned long time_left;
+    int len, i, ret = 0;
+
+    vmsg_o = kzalloc(sizeof(*vmsg_o), GFP_KERNEL);
+    if (!vmsg_o)
+    return -ENOMEM;



It looks to me we can avoid the allocation by embedding 
virtio_i2c_msg into struct virtio_i2c;



Yeah... That's better. Thanks.





+
+    mutex_lock(&vi->i2c_lock);
+    vmsg_o->buf = NULL;
+    for (i = 0; i < num; i++) {
+        ret = virtio_i2c_add_msg(vq, vmsg_o, &msgs[i]);
+        if (ret) {
+            dev_err(&adap->dev, "failed to add msg[%d] to virtqueue.\n", i);
+            goto err_unlock_free;
+        }
+
+        virtqueue_kick(vq);
+
+        time_left = wait_for_completion_timeout(&vi->completion, adap->timeout);
+        if (!time_left) {
+            dev_err(&adap->dev, "msg[%d]: addr=0x%x timeout.\n", i, msgs[i].addr);
+            ret = i;
+            goto err_unlock_free;
+        }
+
+        vmsg_i = (struct virtio_i2c_msg *)virtqueue_get_buf(vq, &len);
+        if (vmsg_i) {
+            /* vmsg_i should point to the same address with vmsg_o */
+            if (vmsg_i != vmsg_o) {
+                dev_err(&adap->dev, "msg[%d]: addr=0x%x virtqueue error.\n",
+                        i, vmsg_i->hdr.addr);
+                ret = i;
+                goto err_unlock_free;
+            }
+    }



Does this imply in order completion of i2c device?  (E.g what happens 
if multiple virtio i2c requests are submitted)


Btw, this always uses a single descriptor at a time, which makes me
wonder whether a virtqueue (virtio) is really needed. It looks to me we
could utilize the virtqueue by submitting the requests in a batch.


I'm afraid not all physical devices support batch. 



Yes, but I think I meant the virtio device, not the physical one. It's
impossible to forbid batching if you have a queue anyway ...
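
To make the suggestion concrete, a rough sketch of a batched transfer
loop (illustrative only: it assumes one virtio_i2c_msg per i2c_msg
instead of the single vmsg_o in the patch, and it omits locking,
timeouts and error unwinding):

static int virtio_i2c_xfer_batched(struct virtio_i2c *vi,
				   struct virtio_i2c_msg *vmsgs,
				   struct i2c_msg *msgs, int num)
{
	struct virtqueue *vq = vi->vq;
	struct virtio_i2c_msg *vmsg;
	unsigned int len;
	int i, done;

	/* Queue every message first, then notify the device once. */
	for (i = 0; i < num; i++)
		if (virtio_i2c_add_msg(vq, &vmsgs[i], &msgs[i]))
			break;
	virtqueue_kick(vq);

	/* Reap completions; each used buffer carries its own status. */
	for (done = 0; done < i; ) {
		wait_for_completion(&vi->completion);
		while ((vmsg = virtqueue_get_buf(vq, &len)) != NULL)
			done++;
	}

	return i;
}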




I'd like to keep the current design and consider
your suggestion as a possible optimization in t

Re: [PATCH] i2c: virtio: add a virtio i2c frontend driver

2020-09-03 Thread Jason Wang


On 2020/9/3 1:34 PM, Jie Deng wrote:

Add an I2C bus driver for virtio para-virtualization.

The controller can be emulated by the backend driver in
any device model software by following the virtio protocol.

This driver communicates with the backend driver through a
virtio I2C message structure which includes following parts:

- Header: i2c_msg addr, flags, len.
- Data buffer: the pointer to the i2c msg data.
- Status: the processing result from the backend.

People may implement different backend drivers to emulate
different controllers according to their needs. A backend
example can be found in the device model of the open source
project ACRN. For more information, please refer to
https://projectacrn.org.

The virtio device ID 34 is used for this I2C adapter since IDs
before 34 have been reserved by other virtio devices.

Co-developed-by: Conghui Chen 
Signed-off-by: Conghui Chen 
Signed-off-by: Jie Deng 
Reviewed-by: Shuo Liu 
Reviewed-by: Andy Shevchenko 
---
  drivers/i2c/busses/Kconfig  |  11 ++
  drivers/i2c/busses/Makefile |   3 +
  drivers/i2c/busses/i2c-virtio.c | 276 
  include/uapi/linux/virtio_ids.h |   1 +
  4 files changed, 291 insertions(+)
  create mode 100644 drivers/i2c/busses/i2c-virtio.c

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 293e7a0..70c8e30 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -21,6 +21,17 @@ config I2C_ALI1535
  This driver can also be built as a module.  If so, the module
  will be called i2c-ali1535.
  
+config I2C_VIRTIO

+   tristate "Virtio I2C Adapter"
+   depends on VIRTIO



I guess it should depend on some I2C module here.



+   help
+ If you say yes to this option, support will be included for the virtio
+ i2c adapter driver. The hardware can be emulated by any device model
+ software according to the virtio protocol.
+
+ This driver can also be built as a module. If so, the module
+ will be called i2c-virtio.
+
  config I2C_ALI1563
tristate "ALI 1563"
depends on PCI
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 19aff0e..821acfa 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -6,6 +6,9 @@
  # ACPI drivers
  obj-$(CONFIG_I2C_SCMI)+= i2c-scmi.o
  
+# VIRTIO I2C host controller driver

+obj-$(CONFIG_I2C_VIRTIO)   += i2c-virtio.o
+
  # PC SMBus host controller drivers
  obj-$(CONFIG_I2C_ALI1535) += i2c-ali1535.o
  obj-$(CONFIG_I2C_ALI1563) += i2c-ali1563.o
diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
new file mode 100644
index 000..47f9fd1
--- /dev/null
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -0,0 +1,276 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtio I2C Bus Driver
+ *
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define VIRTIO_I2C_MSG_OK  0
+#define VIRTIO_I2C_MSG_ERR 1
+
+/**
+ * struct virtio_i2c_hdr - the virtio I2C message header structure
+ * @addr: i2c_msg addr, the slave address
+ * @flags: i2c_msg flags
+ * @len: i2c_msg len
+ */
+struct virtio_i2c_hdr {
+   __virtio16 addr;
+   __virtio16 flags;
+   __virtio16 len;
+} __packed;
+
+/**
+ * struct virtio_i2c_msg - the virtio I2C message structure
+ * @hdr: the virtio I2C message header
+ * @buf: virtio I2C message data buffer
+ * @status: the processing result from the backend
+ */
+struct virtio_i2c_msg {
+   struct virtio_i2c_hdr hdr;
+   char *buf;
+   u8 status;



Any reason for separating status out of virtio_i2c_hdr?



+};
+
+/**
+ * struct virtio_i2c - virtio I2C data
+ * @vdev: virtio device for this controller
+ * @completion: completion of virtio I2C message
+ * @adap: I2C adapter for this controller
+ * @i2c_lock: lock for virtqueue processing
+ * @vq: the virtio virtqueue for communication
+ */
+struct virtio_i2c {
+   struct virtio_device *vdev;
+   struct completion completion;
+   struct i2c_adapter adap;
+   struct mutex i2c_lock;
+   struct virtqueue *vq;
+};
+
+static void virtio_i2c_msg_done(struct virtqueue *vq)
+{
+   struct virtio_i2c *vi = vq->vdev->priv;
+
+   complete(&vi->completion);
+}
+
+static int virtio_i2c_add_msg(struct virtqueue *vq,
+ struct virtio_i2c_msg *vmsg,
+ struct i2c_msg *msg)
+{
+   struct scatterlist *sgs[3], hdr, bout, bin, status;
+   int outcnt = 0, incnt = 0;
+
+   if (!msg->len)
+   return -EINVAL;
+
+   vmsg->hdr.addr = msg->addr;
+   vmsg->hdr.flags = msg->flags;
+   vmsg->hdr.len = msg->len;



Missing endian conversion?



+
+   vmsg->buf = kzalloc(vmsg->hdr.len, GFP_KERNEL);
+   if 

Re: [PATCH] i2c: virtio: add a virtio i2c frontend driver

2020-09-03 Thread Jason Wang


On 2020/9/3 3:19 PM, Jie Deng wrote:


On 2020/9/3 14:12, Jason Wang wrote:


On 2020/9/3 1:34 PM, Jie Deng wrote:

Add an I2C bus driver for virtio para-virtualization.

The controller can be emulated by the backend driver in
any device model software by following the virtio protocol.

This driver communicates with the backend driver through a
virtio I2C message structure which includes following parts:

- Header: i2c_msg addr, flags, len.
- Data buffer: the pointer to the i2c msg data.
- Status: the processing result from the backend.

People may implement different backend drivers to emulate
different controllers according to their needs. A backend
example can be found in the device model of the open source
project ACRN. For more information, please refer to
https://projectacrn.org.



May I know the reason why you don't use i2c or virtio directly?


We don't want to add virtio drivers for every I2C device in the guest.
This bus driver is designed to provide a way to flexibly expose the 
physical
I2C slave devices to the guest without adding or changing the drivers 
of the

I2C slave devices in the guest OS.



Ok, if I understand this correctly, this is a virtio transport of i2c 
messages (similar to virtio-scsi).










The virtio device ID 34 is used for this I2C adapter since IDs
before 34 have been reserved by other virtio devices.



Is there a link to the spec patch?

Thanks


I haven't submitted the patch to reserve the ID in spec yet.
I write the ID here because I want to see your opinions first.



It would be helpful to send a spec draft for early review.

Thanks




Thanks





Re: [PATCH] i2c: virtio: add a virtio i2c frontend driver

2020-09-03 Thread Jason Wang


On 2020/9/3 下午1:34, Jie Deng wrote:

Add an I2C bus driver for virtio para-virtualization.

The controller can be emulated by the backend driver in
any device model software by following the virtio protocol.

This driver communicates with the backend driver through a
virtio I2C message structure which includes following parts:

- Header: i2c_msg addr, flags, len.
- Data buffer: the pointer to the i2c msg data.
- Status: the processing result from the backend.

People may implement different backend drivers to emulate
different controllers according to their needs. A backend
example can be found in the device model of the open source
project ACRN. For more information, please refer to
https://projectacrn.org.



May I know the reason why you don't use i2c or virtio directly?




The virtio device ID 34 is used for this I2C adapter since IDs
before 34 have been reserved by other virtio devices.



Is there a link to the spec patch?

Thanks




Co-developed-by: Conghui Chen 
Signed-off-by: Conghui Chen 
Signed-off-by: Jie Deng 
Reviewed-by: Shuo Liu 
Reviewed-by: Andy Shevchenko 
---
  drivers/i2c/busses/Kconfig  |  11 ++
  drivers/i2c/busses/Makefile |   3 +
  drivers/i2c/busses/i2c-virtio.c | 276 
  include/uapi/linux/virtio_ids.h |   1 +
  4 files changed, 291 insertions(+)
  create mode 100644 drivers/i2c/busses/i2c-virtio.c

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 293e7a0..70c8e30 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -21,6 +21,17 @@ config I2C_ALI1535
  This driver can also be built as a module.  If so, the module
  will be called i2c-ali1535.
  
+config I2C_VIRTIO

+   tristate "Virtio I2C Adapter"
+   depends on VIRTIO
+   help
+ If you say yes to this option, support will be included for the virtio
+ i2c adapter driver. The hardware can be emulated by any device model
+ software according to the virtio protocol.
+
+ This driver can also be built as a module. If so, the module
+ will be called i2c-virtio.
+
  config I2C_ALI1563
tristate "ALI 1563"
depends on PCI
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 19aff0e..821acfa 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -6,6 +6,9 @@
  # ACPI drivers
  obj-$(CONFIG_I2C_SCMI)+= i2c-scmi.o
  
+# VIRTIO I2C host controller driver

+obj-$(CONFIG_I2C_VIRTIO)   += i2c-virtio.o
+
  # PC SMBus host controller drivers
  obj-$(CONFIG_I2C_ALI1535) += i2c-ali1535.o
  obj-$(CONFIG_I2C_ALI1563) += i2c-ali1563.o
diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
new file mode 100644
index 000..47f9fd1
--- /dev/null
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -0,0 +1,276 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtio I2C Bus Driver
+ *
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define VIRTIO_I2C_MSG_OK  0
+#define VIRTIO_I2C_MSG_ERR 1
+
+/**
+ * struct virtio_i2c_hdr - the virtio I2C message header structure
+ * @addr: i2c_msg addr, the slave address
+ * @flags: i2c_msg flags
+ * @len: i2c_msg len
+ */
+struct virtio_i2c_hdr {
+   __virtio16 addr;
+   __virtio16 flags;
+   __virtio16 len;
+} __packed;
+
+/**
+ * struct virtio_i2c_msg - the virtio I2C message structure
+ * @hdr: the virtio I2C message header
+ * @buf: virtio I2C message data buffer
+ * @status: the processing result from the backend
+ */
+struct virtio_i2c_msg {
+   struct virtio_i2c_hdr hdr;
+   char *buf;
+   u8 status;
+};
+
+/**
+ * struct virtio_i2c - virtio I2C data
+ * @vdev: virtio device for this controller
+ * @completion: completion of virtio I2C message
+ * @adap: I2C adapter for this controller
+ * @i2c_lock: lock for virtqueue processing
+ * @vq: the virtio virtqueue for communication
+ */
+struct virtio_i2c {
+   struct virtio_device *vdev;
+   struct completion completion;
+   struct i2c_adapter adap;
+   struct mutex i2c_lock;
+   struct virtqueue *vq;
+};
+
+static void virtio_i2c_msg_done(struct virtqueue *vq)
+{
+   struct virtio_i2c *vi = vq->vdev->priv;
+
+   complete(&vi->completion);
+}
+
+static int virtio_i2c_add_msg(struct virtqueue *vq,
+ struct virtio_i2c_msg *vmsg,
+ struct i2c_msg *msg)
+{
+   struct scatterlist *sgs[3], hdr, bout, bin, status;
+   int outcnt = 0, incnt = 0;
+
+   if (!msg->len)
+   return -EINVAL;
+
+   vmsg->hdr.addr = msg->addr;
+   vmsg->hdr.flags = msg->flags;
+   vmsg->hdr.len = msg->len;
+
+   vmsg->buf = kzalloc(vmsg->hdr.len, GFP_KERNEL);
+   if (!vmsg->buf)
+   return 
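The quoted driver is truncated at this point in the archive. To make the header / data buffer / status layout from the commit message concrete, here is a rough sketch (names follow the quoted code rather than the final upstream driver) of how one i2c_msg could be mapped onto the virtqueue:

/* Sketch: queue one request as "header out, data out or in, status in". */
static int virtio_i2c_queue_msg(struct virtqueue *vq,
                                struct virtio_i2c_msg *vmsg,
                                struct i2c_msg *msg)
{
        struct scatterlist hdr, data, status, *sgs[3];
        int outcnt = 0, incnt = 0;

        sg_init_one(&hdr, &vmsg->hdr, sizeof(vmsg->hdr));
        sgs[outcnt++] = &hdr;

        sg_init_one(&data, vmsg->buf, msg->len);
        if (msg->flags & I2C_M_RD)
                sgs[outcnt + incnt++] = &data;  /* device fills the buffer */
        else
                sgs[outcnt++] = &data;          /* driver supplies the data */

        sg_init_one(&status, &vmsg->status, sizeof(vmsg->status));
        sgs[outcnt + incnt++] = &status;        /* device reports the result */

        return virtqueue_add_sgs(vq, sgs, outcnt, incnt, vmsg, GFP_KERNEL);
}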

Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

2020-09-01 Thread Jason Wang


On 2020/9/1 下午1:24, Kishon Vijay Abraham I wrote:

Hi,

On 28/08/20 4:04 pm, Cornelia Huck wrote:

On Thu, 9 Jul 2020 14:26:53 +0800
Jason Wang  wrote:

[Let me note right at the beginning that I first noted this while
listening to Kishon's talk at LPC on Wednesday. I might be very
confused about the background here, so let me apologize beforehand for
any confusion I might spread.]


On 2020/7/8 下午9:13, Kishon Vijay Abraham I wrote:

Hi Jason,

On 7/8/2020 4:52 PM, Jason Wang wrote:

On 2020/7/7 下午10:45, Kishon Vijay Abraham I wrote:

Hi Jason,

On 7/7/2020 3:17 PM, Jason Wang wrote:

On 2020/7/6 下午5:32, Kishon Vijay Abraham I wrote:

Hi Jason,

On 7/3/2020 12:46 PM, Jason Wang wrote:

On 2020/7/2 下午9:35, Kishon Vijay Abraham I wrote:

Hi Jason,

On 7/2/2020 3:40 PM, Jason Wang wrote:

On 2020/7/2 下午5:51, Michael S. Tsirkin wrote:
On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay 
Abraham I wrote:

This series enhances Linux Vhost support to enable SoC-to-SoC
communication over MMIO. This series enables rpmsg 
communication between

two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2

1) Modify vhost to use standard Linux driver model
2) Add support in vring to access virtqueue over MMIO
3) Add vhost client driver for rpmsg
4) Add PCIe RC driver (uses virtio) and PCIe EP driver 
(uses vhost) for
     rpmsg communication between two SoCs connected to 
each other
5) Add NTB Virtio driver and NTB Vhost driver for rpmsg 
communication

     between two SoCs connected via NTB
6) Add configfs to configure the components

UseCase1:

       VHOST RPMSG                      VIRTIO RPMSG
            +                                +
            |                                |
            |                                |
            |                                |
   +--------v-------+              +---------v--------+
   |     Linux      |              |      Linux       |
   |    Endpoint    |<------------>|   Root Complex   |
   |                |              |                  |
   |      SOC1      |              |       SOC2       |
   +----------------+              +------------------+

UseCase 2:

   [Diagram: VHOST RPMSG runs on HOST1 and VIRTIO RPMSG on HOST2. Each host
   is connected to its own endpoint controller (EP CONTROLLER1 and
   EP CONTROLLER2) on a single SoC with multiple EP instances, configured
   using the NTB function, and the two EP controllers are linked to each
   other inside that SoC.]



First of all, to clarify the terminology:
Is "vhost rpmsg" acting as what the virtio standard calls the 'device',
and "virtio rpmsg" as the 'driver'? Or is the "vhost" part mostly just


Right, vhost_rpmsg is 'device' and virtio_rpmsg is 'driver'.

virtqueues + the existing vhost interfaces?


It's implemented to provide the full 'device' functionality.




Software Layering:

The high-level SW layering should look something like below. This series
adds support only for RPMSG VHOST, however something similar should be
done for net and scsi. With that any vhost device (PCI, NTB, Platform
device, user) can use any of the vhost client drivers.


   [Diagram: the vhost client drivers (RPMSG VHOST, NET VHOST, SCSI VHOST,
   and so on) sit on top of a common VHOST CORE, which in turn sits on top
   of the transport drivers: PCI EPF VHOST, NTB VHOST, PLATFORM DEVICE
   VHOST, and so on.]


So, the upper half is basically various functionality types, e.g. a net
device. What is the lower half, a hardware interface? Would it be
equivalent to e.g. a normal PCI device?


Right, the upper half should provide the functionality.
The bottom layer could be a HW interface (like PCIe device or NTB 
device) or it could be a SW interface (for

Re: [PATCH net-next] vhost: fix typo in error message

2020-08-31 Thread Jason Wang


On 2020/9/1 上午10:39, Yunsheng Lin wrote:

"enable" should be "disable" when the function name is
vhost_disable_notify(), which does the disabling work.

Signed-off-by: Yunsheng Lin 
---
  drivers/vhost/vhost.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5857d4e..b45519c 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2537,7 +2537,7 @@ void vhost_disable_notify(struct vhost_dev *dev, struct 
vhost_virtqueue *vq)
if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
r = vhost_update_used_flags(vq);
if (r)
-   vq_err(vq, "Failed to enable notification at %p: %d\n",
+   vq_err(vq, "Failed to disable notification at %p: %d\n",
   &vq->used->flags, r);
}
  }



Acked-by: Jason Wang 




Re: [PATCH V2 2/3] vhost: vdpa: report iova range

2020-08-31 Thread Jason Wang


On 2020/8/23 下午2:40, Eli Cohen wrote:

+static void vhost_vdpa_set_iova_range(struct vhost_vdpa *v)
+{
+   struct vdpa_iova_range *range = &v->range;
+   struct iommu_domain_geometry geo;
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
+
+   if (ops->get_iova_range) {
+   *range = ops->get_iova_range(vdpa);
+   } else if (v->domain &&
+  !iommu_domain_get_attr(v->domain,
+  DOMAIN_ATTR_GEOMETRY, &geo) &&
+  geo.force_aperture) {
+   range->first = geo.aperture_start;
+   range->last = geo.aperture_end;
+   } else {
+   range->first = 0;
+   range->last = ULLONG_MAX;
+   }

Shouldn't we require drivers that publish VIRTIO_F_ACCESS_PLATFORM to
implement get_iova_range?



Probably not, since ACCESS_PLATFORM does not exclude the device that 
depends on the chipset IOMMU to work. So in that case, we should query 
IOMMU driver instead of vDPA device driver.


Thanks





+}



[PATCH V2 3/3] vdpa_sim: implement get_iova_range()

2020-08-21 Thread Jason Wang
Signed-off-by: Jason Wang 
---
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 62d640327145..89854e17c3c2 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -574,6 +574,16 @@ static u32 vdpasim_get_generation(struct vdpa_device *vdpa)
return vdpasim->generation;
 }
 
+struct vdpa_iova_range vdpasim_get_iova_range(struct vdpa_device *vdpa)
+{
+   struct vdpa_iova_range range = {
+   .first = 0ULL,
+   .last = ULLONG_MAX,
+   };
+
+   return range;
+}
+
 static int vdpasim_set_map(struct vdpa_device *vdpa,
   struct vhost_iotlb *iotlb)
 {
@@ -657,6 +667,7 @@ static const struct vdpa_config_ops vdpasim_net_config_ops 
= {
.get_config = vdpasim_get_config,
.set_config = vdpasim_set_config,
.get_generation = vdpasim_get_generation,
+   .get_iova_range = vdpasim_get_iova_range,
.dma_map= vdpasim_dma_map,
.dma_unmap  = vdpasim_dma_unmap,
.free   = vdpasim_free,
@@ -683,6 +694,7 @@ static const struct vdpa_config_ops 
vdpasim_net_batch_config_ops = {
.get_config = vdpasim_get_config,
.set_config = vdpasim_set_config,
.get_generation = vdpasim_get_generation,
+   .get_iova_range = vdpasim_get_iova_range,
.set_map= vdpasim_set_map,
.free   = vdpasim_free,
 };
-- 
2.18.1



[PATCH V2 2/3] vhost: vdpa: report iova range

2020-08-21 Thread Jason Wang
This patch introduces a new ioctl for the vhost-vdpa device that can
report the iova range supported by the device.

For a device that implements the get_iova_range() method, we fetch the
range from the vDPA device. If the device doesn't implement
get_iova_range() but depends on the platform IOMMU, we query it via
DOMAIN_ATTR_GEOMETRY; otherwise [0, ULLONG_MAX] is assumed.

For safety, this patch also rejects map requests that are not in the
valid range.

Signed-off-by: Jason Wang 
---
 drivers/vhost/vdpa.c | 41 
 include/uapi/linux/vhost.h   |  4 
 include/uapi/linux/vhost_types.h |  9 +++
 3 files changed, 54 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 3fab94f88894..1adb4adb0345 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -48,6 +48,7 @@ struct vhost_vdpa {
int minor;
struct eventfd_ctx *config_ctx;
int in_batch;
+   struct vdpa_iova_range range;
 };
 
 static DEFINE_IDA(vhost_vdpa_ida);
@@ -344,6 +345,16 @@ static long vhost_vdpa_set_config_call(struct vhost_vdpa 
*v, u32 __user *argp)
return 0;
 }
 
+static long vhost_vdpa_get_iova_range(struct vhost_vdpa *v, u32 __user *argp)
+{
+   struct vhost_vdpa_iova_range range = {
+   .first = v->range.first,
+   .last = v->range.last,
+   };
+
+   return copy_to_user(argp, &range, sizeof(range));
+}
+
 static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
   void __user *argp)
 {
@@ -476,6 +487,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
case VHOST_VDPA_SET_CONFIG_CALL:
r = vhost_vdpa_set_config_call(v, argp);
break;
+   case VHOST_VDPA_GET_IOVA_RANGE:
+   r = vhost_vdpa_get_iova_range(v, argp);
+   break;
default:
	r = vhost_dev_ioctl(&v->vdev, cmd, argp);
if (r == -ENOIOCTLCMD)
@@ -597,6 +611,10 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
u64 iova = msg->iova;
int ret = 0;
 
+   if (msg->iova < v->range.first ||
+   msg->iova + msg->size - 1 > v->range.last)
+   return -EINVAL;
+
if (vhost_iotlb_itree_first(iotlb, msg->iova,
msg->iova + msg->size - 1))
return -EEXIST;
@@ -762,6 +780,27 @@ static void vhost_vdpa_free_domain(struct vhost_vdpa *v)
v->domain = NULL;
 }
 
+static void vhost_vdpa_set_iova_range(struct vhost_vdpa *v)
+{
+   struct vdpa_iova_range *range = &v->range;
+   struct iommu_domain_geometry geo;
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
+
+   if (ops->get_iova_range) {
+   *range = ops->get_iova_range(vdpa);
+   } else if (v->domain &&
+  !iommu_domain_get_attr(v->domain,
+  DOMAIN_ATTR_GEOMETRY, &geo) &&
+  geo.force_aperture) {
+   range->first = geo.aperture_start;
+   range->last = geo.aperture_end;
+   } else {
+   range->first = 0;
+   range->last = ULLONG_MAX;
+   }
+}
+
 static int vhost_vdpa_open(struct inode *inode, struct file *filep)
 {
struct vhost_vdpa *v;
@@ -802,6 +841,8 @@ static int vhost_vdpa_open(struct inode *inode, struct file 
*filep)
if (r)
goto err_init_iotlb;
 
+   vhost_vdpa_set_iova_range(v);
+
filep->private_data = v;
 
return 0;
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index 75232185324a..c998860d7bbc 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -146,4 +146,8 @@
 
 /* Set event fd for config interrupt*/
 #define VHOST_VDPA_SET_CONFIG_CALL _IOW(VHOST_VIRTIO, 0x77, int)
+
+/* Get the valid iova range */
+#define VHOST_VDPA_GET_IOVA_RANGE  _IOR(VHOST_VIRTIO, 0x78, \
+struct vhost_vdpa_iova_range)
 #endif
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index 9a269a88a6ff..f7f6a3a28977 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -138,6 +138,15 @@ struct vhost_vdpa_config {
__u8 buf[0];
 };
 
+/* vhost vdpa IOVA range
+ * @first: First address that can be mapped by vhost-vDPA
+ * @last: Last address that can be mapped by vhost-vDPA
+ */
+struct vhost_vdpa_iova_range {
+   __u64 first;
+   __u64 last;
+};
+
 /* Feature bits */
 /* Log all write descriptors. Can be changed while device is active. */
 #define VHOST_F_LOG_ALL 26
-- 
2.18.1



[PATCH V2 0/3] vDPA: API for reporting IOVA range

2020-08-21 Thread Jason Wang
Hi All:

This series introduces an API for reporting the IOVA range. This is a
must for userspace to work correctly:

- for the process that uses vhost-vDPA directly to properly allocate
  IOVA
- for VM(qemu), when vIOMMU is not enabled, fail early if GPA is out
  of range
- for VM(qemu), when vIOMMU is enabled, determine a valid guest
  address width

Please review.
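To illustrate the first point, a process driving vhost-vDPA directly could query the window before building its IOTLB, roughly like this (a sketch against the uapi added in this series, not code from the patches):

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>        /* assumes the headers from this series */

/* Sketch: query the usable IOVA window before programming the IOTLB. */
static int query_iova_range(int vhost_vdpa_fd)
{
        struct vhost_vdpa_iova_range range;

        if (ioctl(vhost_vdpa_fd, VHOST_VDPA_GET_IOVA_RANGE, &range) < 0)
                return -1;

        printf("usable IOVA: [0x%llx, 0x%llx]\n",
               (unsigned long long)range.first,
               (unsigned long long)range.last);

        /* Any later VHOST_IOTLB_UPDATE must stay inside this window. */
        return 0;
}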

Changes from V1:

- do not mandate get_iova_range() for devices with their own DMA
  translation logic; assume a [0, ULLONG_MAX] range for them
- mandate the IOVA range only for IOMMUs that force the aperture
- forbid map requests that are outside the IOVA range in vhost-vDPA

Thanks

Jason Wang (3):
  vdpa: introduce config op to get valid iova range
  vhost: vdpa: report iova range
  vdpa_sim: implement get_iova_range()

 drivers/vdpa/vdpa_sim/vdpa_sim.c | 12 ++
 drivers/vhost/vdpa.c | 41 
 include/linux/vdpa.h | 15 
 include/uapi/linux/vhost.h   |  4 
 include/uapi/linux/vhost_types.h |  9 +++
 5 files changed, 81 insertions(+)

-- 
2.18.1



[PATCH V2 1/3] vdpa: introduce config op to get valid iova range

2020-08-21 Thread Jason Wang
This patch introduces a config op to get the valid iova range from the
vDPA device.

Signed-off-by: Jason Wang 
---
 include/linux/vdpa.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index eae0bfd87d91..30bc7a7223bb 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -52,6 +52,16 @@ struct vdpa_device {
int nvqs;
 };
 
+/**
+ * vDPA IOVA range - the IOVA range supported by the device
+ * @first: start of the IOVA range
+ * @last: end of the IOVA range
+ */
+struct vdpa_iova_range {
+   u64 first;
+   u64 last;
+};
+
 /**
  * vDPA_config_ops - operations for configuring a vDPA device.
  * Note: vDPA device drivers are required to implement all of the
@@ -151,6 +161,10 @@ struct vdpa_device {
  * @get_generation:Get device config generation (optional)
  * @vdev: vdpa device
  * Returns u32: device generation
+ * @get_iova_range:Get supported iova range (optional)
+ * @vdev: vdpa device
+ * Returns the iova range supported by
+ * the device.
  * @set_map:   Set device memory mapping (optional)
  * Needed for device that using device
  * specific DMA translation (on-chip IOMMU)
@@ -216,6 +230,7 @@ struct vdpa_config_ops {
void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
   const void *buf, unsigned int len);
u32 (*get_generation)(struct vdpa_device *vdev);
+   struct vdpa_iova_range (*get_iova_range)(struct vdpa_device *vdev);
 
/* DMA ops */
int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
-- 
2.18.1



Re: [PATCH -next] vdpa: Remove duplicate include

2020-08-18 Thread Jason Wang


On 2020/8/18 下午7:49, YueHaibing wrote:

Remove duplicate include file

Signed-off-by: YueHaibing 
---
  drivers/vhost/vdpa.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 3fab94f88894..95e2b8307a2a 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -22,7 +22,6 @@
  #include 
  #include 
  #include 
-#include 
  
  #include "vhost.h"
  



Acked-by: Jason Wang 



Re: [PATCH -next] vdpa/mlx5: Remove duplicate include

2020-08-18 Thread Jason Wang


On 2020/8/18 下午7:46, YueHaibing wrote:

Remove duplicate include file

Signed-off-by: YueHaibing 
---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 9df69d5efe8c..12fb83dc1de9 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -8,7 +8,6 @@
  #include 
  #include 
  #include 
-#include 
  #include "mlx5_vnet.h"
  #include "mlx5_vdpa_ifc.h"
  #include "mlx5_vdpa.h"



Acked-by: Jason Wang 



Re: VDPA Debug/Statistics

2020-08-11 Thread Jason Wang


On 2020/8/11 下午7:58, Eli Cohen wrote:

On Tue, Aug 11, 2020 at 11:26:20AM +, Eli Cohen wrote:

Hi All

Currently, the only statistics we get for a VDPA instance comes from the 
virtio_net device instance. Since VDPA involves hardware acceleration, there 
can be quite a lot of information that can be fetched from the underlying 
device. Currently there is no generic method to fetch this information.

One way of doing this can be to create, at the host, a net device for
each VDPA instance, and use it to get this information or do some
configuration. Ethtool can be used in such a case



The problems are:

- vDPA is not net specific
- vDPA should be transparent to host networking stack




I would like to hear what you think about this or maybe you have some other 
ideas to address this topic.

Thanks,
Eli

Something I'm not sure I understand is how are vdpa instances created on 
mellanox cards? There's a devlink command for that, is that right?
Can that be extended for stats?

Currently any VF will be probed as a VDPA device. We're adding devlink support 
but I am not sure if devlink is suitable for displaying statistics. We will 
discuss internally but I wanted to know what you guys think.



I agree with Michael: if it's possible, integrating stats with devlink 
should be best. Having another interface that is just for stats does not 
look good.


Thanks




--
MST




Re: vdpa: handling of VIRTIO_F_ACCESS_PLATFORM/VIRTIO_F_ORDER_PLATFORM

2020-08-11 Thread Jason Wang


On 2020/8/11 下午5:52, Michael S. Tsirkin wrote:

Hi!
I'd like to raise the question of whether we can drop the requirement
of VIRTIO_F_ACCESS_PLATFORM from vdpa?
As far as I can see, it is merely required for virtio vdpa -
so should we not enforce it there?



If we don't enforce it, virtio will use PA which breaks the setup when 
IOMMU is enabled. As discussed in the past, mandating the DMA API for virtio 
can just solve this issue.





The point is support for legacy guests - which mostly just works
on x86.



Legacy guest should work even if we mandate ACCESS_PLATFORM.

This is because we don't simply pass through guest features (qemu will 
always set ACCESS_PLATFORM to vhost-vdpa).





Also, what is the plan for VIRTIO_F_ORDER_PLATFORM?



I think we should mandate ORDER_PLATFORM, (even for guest).

Thanks






Re: [PATCH 1/4] vdpa: introduce config op to get valid iova range

2020-08-11 Thread Jason Wang


On 2020/8/11 下午4:29, Michael S. Tsirkin wrote:

On Tue, Aug 11, 2020 at 10:53:09AM +0800, Jason Wang wrote:

On 2020/8/10 下午8:05, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 03:43:54PM +0300, Eli Cohen wrote:

On Thu, Aug 06, 2020 at 08:29:22AM -0400, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 03:03:55PM +0300, Eli Cohen wrote:

On Wed, Aug 05, 2020 at 08:51:56AM -0400, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 11:29:44AM +0800, Jason Wang wrote:

This patch introduce a config op to get valid iova range from the vDPA
device.

Signed-off-by: Jason Wang
---
   include/linux/vdpa.h | 14 ++
   1 file changed, 14 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..b7633ed2500c 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -41,6 +41,16 @@ struct vdpa_device {
unsigned int index;
   };
+/**
+ * vDPA IOVA range - the IOVA range support by the device
+ * @start: start of the IOVA range
+ * @end: end of the IOVA range
+ */
+struct vdpa_iova_range {
+   u64 start;
+   u64 end;
+};
+

This is ambiguous. Is end in the range or just behind it?
How about first/last?

It is customary in the kernel to use start-end where end corresponds to
the byte following the last in the range. See struct vm_area_struct
vm_start and vm_end fields

Exactly my point:

include/linux/mm_types.h:   unsigned long vm_end;   /* The first 
byte after our end address

in this case Jason wants it to be the last byte, not one behind.



Maybe start, size? Not ambiguous, and you don't need to do annoying
calculations like size = last - start + 1

Size has a bunch of issues: can overlap, can not cover the entire 64 bit
range. The requisite checks are arguably easier to get wrong than
getting the size if you need it.

Yes, so do you still prefer first/last or just begin/end which is consistent
with iommu_domain_geometry?

Thanks

I prefer first/last I think, these are unambiguous.
E.g.

 dma_addr_t aperture_start; /* First address that can be mapped*/
 dma_addr_t aperture_end;   /* Last address that can be mapped */

instead of addressing ambiguity with a comment, let's just name the field well.



Ok, will do.

Thanks










[PATCH 1/2] MAINTAINERS: add a dedicated entry for vDPA

2020-08-10 Thread Jason Wang
vDPA is an independent subsystem, so use a dedicated entry for that.

Signed-off-by: Jason Wang 
---
 MAINTAINERS | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4e2698cc7e23..314398f0e276 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18173,13 +18173,19 @@ F:Documentation/devicetree/bindings/virtio/
 F: drivers/block/virtio_blk.c
 F: drivers/crypto/virtio/
 F: drivers/net/virtio_net.c
-F: drivers/vdpa/
 F: drivers/virtio/
-F: include/linux/vdpa.h
 F: include/linux/virtio*.h
 F: include/uapi/linux/virtio_*.h
 F: tools/virtio/
 
+VDPA CORE AND DRIVERS
+M: "Michael S. Tsirkin" 
+M: Jason Wang 
+L: virtualization@lists.linux-foundation.org
+S: Maintained
+F: drivers/vdpa/
+F: include/linux/vdpa.h
+
 VIRTIO BALLOON
 M: "Michael S. Tsirkin" 
 M: David Hildenbrand 
-- 
2.20.1



[PATCH 2/2] vDPA: add Eli Cohen as mellanox vDPA driver supporter

2020-08-10 Thread Jason Wang
Cc: Eli Cohen 
Signed-off-by: Jason Wang 
---
 MAINTAINERS | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 314398f0e276..ed1851413fcc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18186,6 +18186,12 @@ S: Maintained
 F: drivers/vdpa/
 F: include/linux/vdpa.h
 
+MELLANOX VDPA DRIVERS
+M: Eli Cohen 
+L: virtualization@lists.linux-foundation.org
+S: Supported
+F: drivers/vdpa/mlx5/
+
 VIRTIO BALLOON
 M: "Michael S. Tsirkin" 
 M: David Hildenbrand 
-- 
2.20.1



Re: [PATCH] vdpa_sim: init iommu lock

2020-08-10 Thread Jason Wang


On 2020/8/10 下午8:48, Michael S. Tsirkin wrote:

The patch adding the iommu lock did not initialize it.
The struct is zero-initialized so this is mostly a problem
when using lockdep.

Reported-by: kernel test robot 
Cc: Max Gurtovoy 
Fixes: 0ea9ee430e74 ("vdpasim: protect concurrent access to iommu iotlb")
Signed-off-by: Michael S. Tsirkin 



Acked-by: Jason Wang 



---
  drivers/vdpa/vdpa_sim/vdpa_sim.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index df3224b138ee..604d9d25ca47 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -358,6 +358,7 @@ static struct vdpasim *vdpasim_create(void)
  
  	INIT_WORK(&vdpasim->work, vdpasim_work);

	spin_lock_init(&vdpasim->lock);
+   spin_lock_init(&vdpasim->iommu_lock);
  
  	dev = &vdpasim->vdpa.dev;

dev->coherent_dma_mask = DMA_BIT_MASK(64);



Re: [PATCH V5 1/6] vhost: introduce vhost_vring_call

2020-08-10 Thread Jason Wang


On 2020/8/10 下午9:37, Michael S. Tsirkin wrote:

On Wed, Aug 05, 2020 at 10:16:16AM +0800, Jason Wang wrote:

On 2020/8/4 下午5:21, Michael S. Tsirkin wrote:

    +struct vhost_vring_call {
+    struct eventfd_ctx *ctx;
+    struct irq_bypass_producer producer;
+    spinlock_t ctx_lock;

It's not clear to me why we need ctx_lock here.

Thanks

Hi Jason,

we use this lock to protect the eventfd_ctx and irq from race conditions,

We don't support irq notification from vDPA device driver in this version,
do we still have race condition?

Thanks

Jason I'm not sure what you are trying to say here.


I meant we change the API from V4 so driver won't notify us if irq is
changed.

Then it looks to me there's no need for the ctx_lock, everyhing could be
synchronized with vq mutex.

Thanks

Jason do you want to post a cleanup patch simplifying code along these
lines?



Ling Shan promised to post a patch to fix this.

Thanks




Thanks,







Re: [PATCH 1/4] vdpa: introduce config op to get valid iova range

2020-08-10 Thread Jason Wang


On 2020/8/10 下午8:05, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 03:43:54PM +0300, Eli Cohen wrote:

On Thu, Aug 06, 2020 at 08:29:22AM -0400, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 03:03:55PM +0300, Eli Cohen wrote:

On Wed, Aug 05, 2020 at 08:51:56AM -0400, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 11:29:44AM +0800, Jason Wang wrote:

This patch introduce a config op to get valid iova range from the vDPA
device.

Signed-off-by: Jason Wang 
---
  include/linux/vdpa.h | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..b7633ed2500c 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -41,6 +41,16 @@ struct vdpa_device {
unsigned int index;
  };
  
+/**

+ * vDPA IOVA range - the IOVA range support by the device
+ * @start: start of the IOVA range
+ * @end: end of the IOVA range
+ */
+struct vdpa_iova_range {
+   u64 start;
+   u64 end;
+};
+


This is ambiguous. Is end in the range or just behind it?
How about first/last?

It is customary in the kernel to use start-end where end corresponds to
the byte following the last in the range. See struct vm_area_struct
vm_start and vm_end fields

Exactly my point:

include/linux/mm_types.h:   unsigned long vm_end;   /* The first 
byte after our end address

in this case Jason wants it to be the last byte, not one behind.



Maybe start, size? Not ambiguous, and you don't need to do annoying
calculations like size = last - start + 1

Size has a bunch of issues: can overlap, can not cover the entire 64 bit
range. The requisite checks are arguably easier to get wrong than
getting the size if you need it.



Yes, so do you still prefer first/last or just begin/end which is 
consistent with iommu_domain_geometry?


Thanks







Re: [PATCH] vdpa/mlx5: Fix erroneous null pointer checks

2020-08-06 Thread Jason Wang


On 2020/8/7 上午11:37, Jason Wang wrote:


On 2020/8/7 上午3:18, Alex Dewar wrote:

In alloc_inout() in net/mlx5_vnet.c, there are a few places where memory
is allocated to *in and *out, but only the values of in and out are
null-checked (i.e. there is a missing dereference). Fix this.

Addresses-Coverity: ("CID 1496603: (REVERSE_INULL)")
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
devices")

Signed-off-by: Alex Dewar 



Acked-by: Jason Wang 



Colin posted something similar: [PATCH][next] vdpa/mlx5: fix memory 
allocation failure checks


And I think his fix is better since it prevents raw pointers from being freed.

Thanks


Re: [PATCH][next] vdpa/mlx5: fix memory allocation failure checks

2020-08-06 Thread Jason Wang


On 2020/8/7 上午12:08, Colin King wrote:

From: Colin Ian King 

The memory allocation failure checking for in and out is currently
checking if the pointers are valid rather than the contents of what
they point to. Hence the null check on failed memory allocations is
incorrect.  Fix this by adding the missing indirection in the check.
Also for the default case, just set the *in and *out to null as
these don't have anything allocated to kfree. Finally remove the
redundant *in and *out check as these have been already done on each
allocation in the case statement.

Addresses-Coverity: ("Null pointer dereference")
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Colin Ian King 



Acked-by: Jason Wang 



---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 13 ++---
  1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 3ec44a4f0e45..55bc58e1dae9 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -867,7 +867,7 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
*outlen = MLX5_ST_SZ_BYTES(qp_2rst_out);
*in = kzalloc(*inlen, GFP_KERNEL);
*out = kzalloc(*outlen, GFP_KERNEL);
-   if (!in || !out)
+   if (!*in || !*out)
goto outerr;
  
  		MLX5_SET(qp_2rst_in, *in, opcode, cmd);

@@ -879,7 +879,7 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
*outlen = MLX5_ST_SZ_BYTES(rst2init_qp_out);
*in = kzalloc(*inlen, GFP_KERNEL);
*out = kzalloc(MLX5_ST_SZ_BYTES(rst2init_qp_out), GFP_KERNEL);
-   if (!in || !out)
+   if (!*in || !*out)
goto outerr;
  
  		MLX5_SET(rst2init_qp_in, *in, opcode, cmd);

@@ -896,7 +896,7 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
*outlen = MLX5_ST_SZ_BYTES(init2rtr_qp_out);
*in = kzalloc(*inlen, GFP_KERNEL);
*out = kzalloc(MLX5_ST_SZ_BYTES(init2rtr_qp_out), GFP_KERNEL);
-   if (!in || !out)
+   if (!*in || !*out)
goto outerr;
  
  		MLX5_SET(init2rtr_qp_in, *in, opcode, cmd);

@@ -914,7 +914,7 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
*outlen = MLX5_ST_SZ_BYTES(rtr2rts_qp_out);
*in = kzalloc(*inlen, GFP_KERNEL);
*out = kzalloc(MLX5_ST_SZ_BYTES(rtr2rts_qp_out), GFP_KERNEL);
-   if (!in || !out)
+   if (!*in || !*out)
goto outerr;
  
  		MLX5_SET(rtr2rts_qp_in, *in, opcode, cmd);

@@ -927,16 +927,15 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
MLX5_SET(qpc, qpc, rnr_retry, 7);
break;
default:
-   goto outerr;
+   goto outerr_nullify;
}
-   if (!*in || !*out)
-   goto outerr;
  
  	return;
  
  outerr:

kfree(*in);
kfree(*out);
+outerr_nullify:
*in = NULL;
*out = NULL;
  }



Re: [PATCH] vdpa/mlx5: Fix uninitialised variable in core/mr.c

2020-08-06 Thread Jason Wang


On 2020/8/7 上午2:56, Alex Dewar wrote:

If the kernel is unable to allocate memory for the variable dmr then
err will be returned without being set. Set err to -ENOMEM in this
case.

Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code")
Addresses-Coverity: ("Uninitialized variables")
Signed-off-by: Alex Dewar 



Acked-by: Jason Wang 



---
  drivers/vdpa/mlx5/core/mr.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index f5dec0274133..ef1c550f8266 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -319,8 +319,10 @@ static int add_direct_chain(struct mlx5_vdpa_dev *mvdev, 
u64 start, u64 size, u8
while (size) {
sz = (u32)min_t(u64, MAX_KLM_SIZE, size);
dmr = kzalloc(sizeof(*dmr), GFP_KERNEL);
-   if (!dmr)
+   if (!dmr) {
+   err = -ENOMEM;
goto err_alloc;
+   }

dmr->start = st;
dmr->end = st + sz;
--
2.28.0




Re: [PATCH 1/2] vdpa: ifcvf: return err when fail to request config irq

2020-08-06 Thread Jason Wang


On 2020/7/23 下午5:12, Jason Wang wrote:

We ignore the err of requesting config interrupt, fix this.

Fixes: e7991f376a4d ("ifcvf: implement config interrupt in IFCVF")
Cc: Zhu Lingshan 
Signed-off-by: Jason Wang 
---
  drivers/vdpa/ifcvf/ifcvf_main.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index f5a60c14b979..ae7110955a44 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -76,6 +76,10 @@ static int ifcvf_request_irq(struct ifcvf_adapter *adapter)
	ret = devm_request_irq(&pdev->dev, irq,
   ifcvf_config_changed, 0,
   vf->config_msix_name, vf);
+   if (ret) {
+   IFCVF_ERR(pdev, "Failed to request config irq\n");
+   return ret;
+   }
  
  	for (i = 0; i < IFCVF_MAX_QUEUE_PAIRS * 2; i++) {

snprintf(vf->vring[i].msix_name, 256, "ifcvf[%s]-%d\n",



Hi Michael:

Any comments on this series?

Thanks



Re: [PATCH] vdpa/mlx5: Fix erroneous null pointer checks

2020-08-06 Thread Jason Wang


On 2020/8/7 上午3:18, Alex Dewar wrote:

In alloc_inout() in net/mlx5_vnet.c, there are a few places where memory
is allocated to *in and *out, but only the values of in and out are
null-checked (i.e. there is a missing dereference). Fix this.

Addresses-Coverity: ("CID 1496603: (REVERSE_INULL)")
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Alex Dewar 



Acked-by: Jason Wang 



---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 3ec44a4f0e45..bcb6600c2839 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -867,7 +867,7 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
*outlen = MLX5_ST_SZ_BYTES(qp_2rst_out);
*in = kzalloc(*inlen, GFP_KERNEL);
*out = kzalloc(*outlen, GFP_KERNEL);
-   if (!in || !out)
+   if (!*in || !*out)
goto outerr;
  
  		MLX5_SET(qp_2rst_in, *in, opcode, cmd);

@@ -879,7 +879,7 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
*outlen = MLX5_ST_SZ_BYTES(rst2init_qp_out);
*in = kzalloc(*inlen, GFP_KERNEL);
*out = kzalloc(MLX5_ST_SZ_BYTES(rst2init_qp_out), GFP_KERNEL);
-   if (!in || !out)
+   if (!*in || !*out)
goto outerr;
  
  		MLX5_SET(rst2init_qp_in, *in, opcode, cmd);

@@ -896,7 +896,7 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
*outlen = MLX5_ST_SZ_BYTES(init2rtr_qp_out);
*in = kzalloc(*inlen, GFP_KERNEL);
*out = kzalloc(MLX5_ST_SZ_BYTES(init2rtr_qp_out), GFP_KERNEL);
-   if (!in || !out)
+   if (!*in || !*out)
goto outerr;
  
  		MLX5_SET(init2rtr_qp_in, *in, opcode, cmd);

@@ -914,7 +914,7 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
*outlen = MLX5_ST_SZ_BYTES(rtr2rts_qp_out);
*in = kzalloc(*inlen, GFP_KERNEL);
*out = kzalloc(MLX5_ST_SZ_BYTES(rtr2rts_qp_out), GFP_KERNEL);
-   if (!in || !out)
+   if (!*in || !*out)
goto outerr;
  
  		MLX5_SET(rtr2rts_qp_in, *in, opcode, cmd);



Re: [PATCH v2 03/24] virtio: allow __virtioXX, __leXX in config space

2020-08-06 Thread Jason Wang


On 2020/8/6 下午1:58, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 11:37:38AM +0800, Jason Wang wrote:

On 2020/8/5 下午7:45, Michael S. Tsirkin wrote:

#define virtio_cread(vdev, structname, member, ptr) \
do {\
might_sleep();  \
/* Must match the member's type, and be integer */  \
-   if (!typecheck(typeof((((structname*)0)->member)), *(ptr))) \
+   if (!__virtio_typecheck(structname, member, *(ptr)))\
(*ptr) = 1; \

A silly question,  compare to using set()/get() directly, what's the value
of the accessors macro here?

Thanks

get/set don't convert to the native endian, I guess that's why
drivers use cread/cwrite. It is also nice that there's type
safety, checking the correct integer width is used.


Yes, but this is simply because a macro is used here, how about just doing
things similar like virtio_cread_bytes():

static inline void virtio_cread(struct virtio_device *vdev,
                   unsigned int offset,
                   void *buf, size_t len)


And do the endian conversion inside?

Thanks


Then you lose type safety. It's very easy to have an le32 field
and try to read it into a u16 by mistake.

These macros are all about preventing bugs: and the whole patchset
is about several bugs sparse found - that is what prompted me to make
type checks more strict.



Yes, but we need to do the macro with compiler extensions. I wonder
whether the kernel already has something for this, since this kind of
request is pretty common?


Thanks
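As a side note on the question above: the typecheck() helper that the quoted virtio_cread() macro builds on already lives in <linux/typecheck.h>. A minimal sketch of the idea (illustrative only, mirroring that header):

/* Sketch: comparing the addresses of two differently-typed dummies makes
 * the compiler warn at build time when the types differ. */
#define typecheck_sketch(type, x)              \
({                                             \
        type __dummy;                          \
        typeof(x) __dummy2;                    \
        (void)(&__dummy == &__dummy2);         \
        1;                                     \
})

So reading, say, a 32-bit config field into a u16 trips a compiler warning instead of silently truncating.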








Re: [PATCH 1/4] vdpa: introduce config op to get valid iova range

2020-08-06 Thread Jason Wang


On 2020/8/6 下午8:29, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 03:03:55PM +0300, Eli Cohen wrote:

On Wed, Aug 05, 2020 at 08:51:56AM -0400, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 11:29:44AM +0800, Jason Wang wrote:

This patch introduce a config op to get valid iova range from the vDPA
device.

Signed-off-by: Jason Wang
---
  include/linux/vdpa.h | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..b7633ed2500c 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -41,6 +41,16 @@ struct vdpa_device {
unsigned int index;
  };
  
+/**

+ * vDPA IOVA range - the IOVA range support by the device
+ * @start: start of the IOVA range
+ * @end: end of the IOVA range
+ */
+struct vdpa_iova_range {
+   u64 start;
+   u64 end;
+};
+

This is ambiguous. Is end in the range or just behind it?
How about first/last?

It is customary in the kernel to use start-end where end corresponds to
the byte following the last in the range. See struct vm_area_struct
vm_start and vm_end fields

Exactly my point:

include/linux/mm_types.h:   unsigned long vm_end;   /* The first 
byte after our end address

in this case Jason wants it to be the last byte, not one behind.



Ok, I somehow recall the reason :)

See:

struct iommu_domain_geometry {
    dma_addr_t aperture_start; /* First address that can be mapped    */
    dma_addr_t aperture_end;   /* Last address that can be mapped */
    bool force_aperture;   /* DMA only allowed in mappable range? */
};


So what I proposed here is to be consistent with it.

Thanks








Re: [PATCH 1/4] vdpa: introduce config op to get valid iova range

2020-08-06 Thread Jason Wang


On 2020/8/6 下午8:10, Eli Cohen wrote:

On Wed, Jun 17, 2020 at 06:29:44AM +0300, Jason Wang wrote:

This patch introduce a config op to get valid iova range from the vDPA
device.

Signed-off-by: Jason Wang
---
  include/linux/vdpa.h | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..b7633ed2500c 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -41,6 +41,16 @@ struct vdpa_device {
unsigned int index;
  };
  
+/**

+ * vDPA IOVA range - the IOVA range support by the device
+ * @start: start of the IOVA range
+ * @end: end of the IOVA range
+ */
+struct vdpa_iova_range {
+   u64 start;
+   u64 end;
+};
+

What do you do with this information? Suppose some device tells you it
supports some limited range, say, from 0x4000 to 0x8000. What
does qemu do with this information?



For qemu, it will fail the vDPA device creation when:

1) vIOMMU is not enabled and the GPA is out of this range
2) vIOMMU is enabled but it can't report such a range to the guest

For other userspace applications, they will know they can only use this
range as their IOVA.


Thanks


Re: [PATCH v2 19/24] vdpa: make sure set_features in invoked for legacy

2020-08-06 Thread Jason Wang


On 2020/8/6 下午6:00, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 03:27:38PM +0800, Jason Wang wrote:

On 2020/8/6 下午1:53, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 11:23:05AM +0800, Jason Wang wrote:

On 2020/8/5 下午7:40, Michael S. Tsirkin wrote:

On Wed, Aug 05, 2020 at 02:14:07PM +0800, Jason Wang wrote:

On 2020/8/4 上午5:00, Michael S. Tsirkin wrote:

Some legacy guests just assume features are 0 after reset.
We detect that config space is accessed before features are
set and set features to 0 automatically.
Note: some legacy guests might not even access config space, if this is
reported in the field we might need to catch a kick to handle these.

I wonder whether it's easier to just support modern device?

Thanks

Well hardware vendors are I think interested in supporting legacy
guests. Limiting vdpa to modern only would make it uncompetitive.

My understanding is that, IOMMU_PLATFORM is mandatory for hardware vDPA to
work.

Hmm I don't really see why. Assume host maps guest memory properly,
VM does not have an IOMMU, legacy guest can just work.


Yes, guest may not set IOMMU_PLATFORM.



Care explaining what's wrong with this picture?


The problem is virtio_vdpa, without IOMMU_PLATFORM it uses PA which can not
work if IOMMU is enabled.

Thanks

So that's a virtio_vdpa limitation.



Probably not, I think this goes back to the long debate of whether to 
use DMA API unconditionally. If we did that, we can support legacy 
virtio driver.


The vDPA device needs to provide a DMA device to the virtio core and 
perform the DMA API with that device, which should work for all of the cases.


But a big question is, does upstream care about out-of-tree virtio drivers?

Thanks



In the same way, if a device
does not have an on-device iommu *and* is not behind an iommu,
then vdpa can't bind to it.

But this virtio_vdpa specific hack does not belong in a generic vdpa code.




So it can only work for modern device ...

Thanks







Re: [PATCH v2 19/24] vdpa: make sure set_features in invoked for legacy

2020-08-06 Thread Jason Wang


On 2020/8/6 下午1:53, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 11:23:05AM +0800, Jason Wang wrote:

On 2020/8/5 下午7:40, Michael S. Tsirkin wrote:

On Wed, Aug 05, 2020 at 02:14:07PM +0800, Jason Wang wrote:

On 2020/8/4 上午5:00, Michael S. Tsirkin wrote:

Some legacy guests just assume features are 0 after reset.
We detect that config space is accessed before features are
set and set features to 0 automatically.
Note: some legacy guests might not even access config space, if this is
reported in the field we might need to catch a kick to handle these.

I wonder whether it's easier to just support modern device?

Thanks

Well hardware vendors are I think interested in supporting legacy
guests. Limiting vdpa to modern only would make it uncompetitive.


My understanding is that, IOMMU_PLATFORM is mandatory for hardware vDPA to
work.

Hmm I don't really see why. Assume host maps guest memory properly,
VM does not have an IOMMU, legacy guest can just work.



Yes, guest may not set IOMMU_PLATFORM.




Care explaining what's wrong with this picture?



The problem is virtio_vdpa, without IOMMU_PLATFORM it uses PA which can 
not work if IOMMU is enabled.


Thanks






So it can only work for modern device ...

Thanks








Re: [PATCH v2 03/24] virtio: allow __virtioXX, __leXX in config space

2020-08-05 Thread Jason Wang


On 2020/8/5 下午7:45, Michael S. Tsirkin wrote:

   #define virtio_cread(vdev, structname, member, ptr)  \
do {\
might_sleep();  \
/* Must match the member's type, and be integer */  \
-   if (!typecheck(typeof((((structname*)0)->member)), *(ptr))) \
+   if (!__virtio_typecheck(structname, member, *(ptr)))\
(*ptr) = 1; \

A silly question,  compare to using set()/get() directly, what's the value
of the accessors macro here?

Thanks

get/set don't convert to the native endian, I guess that's why
drivers use cread/cwrite. It is also nice that there's type
safety, checking the correct integer width is used.



Yes, but this is simply because a macro is used here; how about just 
doing something similar to virtio_cread_bytes():


static inline void virtio_cread(struct virtio_device *vdev,
                  unsigned int offset,
                  void *buf, size_t len)


And do the endian conversion inside?

Thanks
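A minimal sketch of that suggestion for a 16-bit field (assumed helper name; the in-tree virtio_cread16() already works essentially this way, at the cost of the type check discussed above):

/* Sketch: read a 16-bit config field by offset and convert inside, so the
 * caller never sees __virtio16. */
static inline u16 virtio_cread16_sketch(struct virtio_device *vdev,
                                        unsigned int offset)
{
        __virtio16 raw;

        virtio_cread_bytes(vdev, offset, &raw, sizeof(raw));
        return virtio16_to_cpu(vdev, raw);
}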







Re: [PATCH 4/4] vhost: vdpa: report iova range

2020-08-05 Thread Jason Wang


On 2020/8/5 下午8:58, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 11:29:47AM +0800, Jason Wang wrote:

This patch introduces a new ioctl for vhost-vdpa device that can
report the iova range by the device. For device that depends on
platform IOMMU, we fetch the iova range via DOMAIN_ATTR_GEOMETRY. For
devices that has its own DMA translation unit, we fetch it directly
from vDPA bus operation.

Signed-off-by: Jason Wang 
---
  drivers/vhost/vdpa.c | 27 +++
  include/uapi/linux/vhost.h   |  4 
  include/uapi/linux/vhost_types.h |  5 +
  3 files changed, 36 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 77a0c9fb6cc3..ad23e66cbf57 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -332,6 +332,30 @@ static long vhost_vdpa_set_config_call(struct vhost_vdpa 
*v, u32 __user *argp)
  
  	return 0;

  }
+
+static long vhost_vdpa_get_iova_range(struct vhost_vdpa *v, u32 __user *argp)
+{
+   struct iommu_domain_geometry geo;
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
+   struct vhost_vdpa_iova_range range;
+   struct vdpa_iova_range vdpa_range;
+
+   if (!ops->set_map && !ops->dma_map) {

Why not just check if (ops->get_iova_range) directly?



Because set_map || dma_ops is a hint that the device has its own DMA 
translation logic.


Device without get_iova_range does not necessarily meant it use IOMMU 
driver.


Thanks








+   iommu_domain_get_attr(v->domain,
+ DOMAIN_ATTR_GEOMETRY, &geo);
+   range.start = geo.aperture_start;
+   range.end = geo.aperture_end;
+   } else {
+   vdpa_range = ops->get_iova_range(vdpa);
+   range.start = vdpa_range.start;
+   range.end = vdpa_range.end;
+   }
+
+   return copy_to_user(argp, &range, sizeof(range));
+
+}
+
  static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
   void __user *argp)
  {
@@ -442,6 +466,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
case VHOST_VDPA_SET_CONFIG_CALL:
r = vhost_vdpa_set_config_call(v, argp);
break;
+   case VHOST_VDPA_GET_IOVA_RANGE:
+   r = vhost_vdpa_get_iova_range(v, argp);
+   break;
default:
	r = vhost_dev_ioctl(&v->vdev, cmd, argp);
if (r == -ENOIOCTLCMD)
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index 0c2349612e77..850956980e27 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -144,4 +144,8 @@
  
  /* Set event fd for config interrupt*/

  #define VHOST_VDPA_SET_CONFIG_CALL_IOW(VHOST_VIRTIO, 0x77, int)
+
+/* Get the valid iova range */
+#define VHOST_VDPA_GET_IOVA_RANGE  _IOW(VHOST_VIRTIO, 0x78, \
+struct vhost_vdpa_iova_range)
  #endif
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index 669457ce5c48..4025b5a36177 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -127,6 +127,11 @@ struct vhost_vdpa_config {
__u8 buf[0];
  };
  
+struct vhost_vdpa_iova_range {

+   __u64 start;
+   __u64 end;
+};
+


Pls document fields. And I think first/last is a better API ...


  /* Feature bits */
  /* Log all write descriptors. Can be changed while device is active. */
  #define VHOST_F_LOG_ALL 26
--
2.20.1



Re: [PATCH 3/4] vdpa: get_iova_range() is mandatory for device specific DMA translation

2020-08-05 Thread Jason Wang


On 2020/8/5 下午8:55, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 11:29:46AM +0800, Jason Wang wrote:

In order to let userspace work correctly, get_iova_range() is a must
for the device that has its own DMA translation logic.

I guess you mean for a device.

However in absence of ths op, I don't see what is wrong with just
assuming device can access any address.



It's just for safe, if you want, we can assume any address without this op.





Signed-off-by: Jason Wang 
---
  drivers/vdpa/vdpa.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index de211ef3738c..ab7af978ef70 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -82,6 +82,10 @@ struct vdpa_device *__vdpa_alloc_device(struct device 
*parent,
if (!!config->dma_map != !!config->dma_unmap)
goto err;
  
+	if ((config->dma_map || config->set_map) &&

+   !config->get_iova_range)
+   goto err;
+
err = -ENOMEM;
vdev = kzalloc(size, GFP_KERNEL);
if (!vdev)

What about devices using an IOMMU for translation?
IOMMUs generally have a limited IOVA range too, right?



See patch 4 which query the IOMMU geometry in this case:

+        iommu_domain_get_attr(v->domain,
+                  DOMAIN_ATTR_GEOMETRY, &geo);
+        range.start = geo.aperture_start;
+        range.end = geo.aperture_end;

Thanks







--
2.20.1



Re: [PATCH 1/4] vdpa: introduce config op to get valid iova range

2020-08-05 Thread Jason Wang


On 2020/8/5 下午8:51, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 11:29:44AM +0800, Jason Wang wrote:

This patch introduce a config op to get valid iova range from the vDPA
device.

Signed-off-by: Jason Wang
---
  include/linux/vdpa.h | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..b7633ed2500c 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -41,6 +41,16 @@ struct vdpa_device {
unsigned int index;
  };
  
+/**

+ * vDPA IOVA range - the IOVA range support by the device
+ * @start: start of the IOVA range
+ * @end: end of the IOVA range
+ */
+struct vdpa_iova_range {
+   u64 start;
+   u64 end;
+};
+

This is ambiguous. Is end in the range or just behind it?



In the range.



How about first/last?



Sure.

Thanks
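A minimal sketch of the renamed in-kernel structure and the corresponding config op
(assuming an inclusive last and a by-value return; the kernel-doc wording is illustrative):

/**
 * struct vdpa_iova_range - the IOVA range supported by the device
 * @first: start of the IOVA range (inclusive)
 * @last: end of the IOVA range (inclusive)
 */
struct vdpa_iova_range {
        u64 first;
        u64 last;
};

/* in struct vdpa_config_ops */
        struct vdpa_iova_range (*get_iova_range)(struct vdpa_device *vdev);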









Re: [PATCH v2 19/24] vdpa: make sure set_features is invoked for legacy

2020-08-05 Thread Jason Wang


On 2020/8/5 下午7:40, Michael S. Tsirkin wrote:

On Wed, Aug 05, 2020 at 02:14:07PM +0800, Jason Wang wrote:

On 2020/8/4 上午5:00, Michael S. Tsirkin wrote:

Some legacy guests just assume features are 0 after reset.
We detect that config space is accessed before features are
set and set features to 0 automatically.
Note: some legacy guests might not even access config space, if this is
reported in the field we might need to catch a kick to handle these.

I wonder whether it's easier to just support modern device?

Thanks

Well hardware vendors are I think interested in supporting legacy
guests. Limiting vdpa to modern only would make it uncompetitive.



My understanding is that IOMMU_PLATFORM is mandatory for hardware vDPA 
to work, so it can only work for modern devices ...


Thanks









Re: [PATCH v2 22/24] vdpa_sim: fix endian-ness of config space

2020-08-05 Thread Jason Wang


On 2020/8/5 下午8:06, Michael S. Tsirkin wrote:

On Wed, Aug 05, 2020 at 02:21:07PM +0800, Jason Wang wrote:

On 2020/8/4 上午5:00, Michael S. Tsirkin wrote:

VDPA sim accesses config space as native endian - this is
wrong since it's a modern device and actually uses LE.

It only supports modern guests so we could punt and
just force LE, but let's use the full virtio APIs since people
tend to copy/paste code, and this is not data path anyway.

Signed-off-by: Michael S. Tsirkin
---
   drivers/vdpa/vdpa_sim/vdpa_sim.c | 31 ++-
   1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index a9bc5e0fb353..fa05e065ff69 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -24,6 +24,7 @@
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 
   #include 
@@ -72,6 +73,23 @@ struct vdpasim {
u64 features;
   };
+/* TODO: cross-endian support */
+static inline bool vdpasim_is_little_endian(struct vdpasim *vdpasim)
+{
+   return virtio_legacy_is_little_endian() ||
+   (vdpasim->features & (1ULL << VIRTIO_F_VERSION_1));
+}
+
+static inline u16 vdpasim16_to_cpu(struct vdpasim *vdpasim, __virtio16 val)
+{
+   return __virtio16_to_cpu(vdpasim_is_little_endian(vdpasim), val);
+}
+
+static inline __virtio16 cpu_to_vdpasim16(struct vdpasim *vdpasim, u16 val)
+{
+   return __cpu_to_virtio16(vdpasim_is_little_endian(vdpasim), val);
+}
+
   static struct vdpasim *vdpasim_dev;
   static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
@@ -306,7 +324,6 @@ static const struct vdpa_config_ops vdpasim_net_config_ops;
   static struct vdpasim *vdpasim_create(void)
   {
-   struct virtio_net_config *config;
struct vdpasim *vdpasim;
struct device *dev;
int ret = -ENOMEM;
@@ -331,10 +348,7 @@ static struct vdpasim *vdpasim_create(void)
if (!vdpasim->buffer)
goto err_iommu;
-	config = &vdpasim->config;
-   config->mtu = 1500;
-   config->status = VIRTIO_NET_S_LINK_UP;
-   eth_random_addr(config->mac);
+   eth_random_addr(vdpasim->config.mac);
	vringh_set_iotlb(&vdpasim->vqs[0].vring, vdpasim->iommu);
	vringh_set_iotlb(&vdpasim->vqs[1].vring, vdpasim->iommu);
@@ -448,6 +462,7 @@ static u64 vdpasim_get_features(struct vdpa_device *vdpa)
   static int vdpasim_set_features(struct vdpa_device *vdpa, u64 features)
   {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+   struct virtio_net_config *config = &vdpasim->config;
/* DMA mapping must be done by driver */
if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
@@ -455,6 +470,12 @@ static int vdpasim_set_features(struct vdpa_device *vdpa, 
u64 features)
vdpasim->features = features & vdpasim_features;
+   /* We only know whether guest is using the legacy interface here, so
+* that's the earliest we can set config fields.
+*/

We check whether or not ACCESS_PLATFORM is set earlier, which is probably a
hint that only modern devices are supported. So I wonder whether just forcing LE
and failing if VERSION_1 is not set would be better?

Thanks

So how about I add a comment along the lines of

/*
  * vdpasim ATM requires VIRTIO_F_ACCESS_PLATFORM, so we don't need to
  * support legacy guests. Keep transitional device code around for
  * the benefit of people who might copy-and-paste this into transitional
  * device code.
  */



That's fine.

Thanks
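For comparison, a minimal sketch of the modern-only alternative floated above: reject
drivers that do not negotiate VIRTIO_F_VERSION_1 and then treat the config space as plain
little-endian. It assumes vdpasim_features advertises VIRTIO_F_VERSION_1; sparse endian
annotations (__force casts) are elided:

static int vdpasim_set_features(struct vdpa_device *vdpa, u64 features)
{
        struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
        struct virtio_net_config *config = &vdpasim->config;

        /* DMA mapping must be done by driver */
        if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
                return -EINVAL;

        /* modern-only device: no legacy/transitional support */
        if (!(features & (1ULL << VIRTIO_F_VERSION_1)))
                return -EINVAL;

        vdpasim->features = features & vdpasim_features;

        /* config space of a modern device is always little-endian */
        config->mtu = cpu_to_le16(1500);
        config->status = cpu_to_le16(VIRTIO_NET_S_LINK_UP);

        return 0;
}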








Re: [PATCH V4 linux-next 00/12] VDPA support for Mellanox ConnectX devices

2020-08-05 Thread Jason Wang


On 2020/8/5 下午1:01, Eli Cohen wrote:

On Tue, Aug 04, 2020 at 05:29:09PM -0400, Michael S. Tsirkin wrote:

On Tue, Aug 04, 2020 at 07:20:36PM +0300, Eli Cohen wrote:

Hi Michael,
please note that this series depends on mlx5 core device driver patches
in mlx5-next branch in
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git.

Thanks! OK so what's the plan for merging this?
Do patches at least build well enough that I can push them
upstream? Or do they have to go on top of the mellanox tree?


The patches are built on your linux-next branch which I updated
yesterday.

I am based on this commit:
776b7b25f10b (origin/linux-next) vhost: add an RPMsg API

On top of that I merged
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

and after that I have Jason's patches (five patches), than one patch
from Max and then my patches (seven).

It builds fine on x86_64.
I fixed some conflicts on Jason's patches.

I also tested it to verify it's working.

BTW, for some reason I did not get all the patches into my mailbox and I
suspect they were not all sent. Did you get all the series 0-13?



I can see patch 0 to patch 12 but not patch 13 (I guess 12 is all).

Thanks




Please let me know, and if needed I'll resend.


git pull git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
mlx5-next

They also depend on Jason Wang's patches: https://lkml.org/lkml/2020/7/1/301

The ones you included, right?


Right.
  

Jason, I had to resolve some conflicts, so I would appreciate it if you can verify
that it is ok.

The following series of patches provide VDPA support for Mellanox
devices. The supported devices are ConnectX6 DX and newer.

Currently, only a network driver is implemented; future patches will
introduce a block device driver. iperf performance on a single queue is
around 12 Gbps.  Future patches will introduce multi queue support.

The files are organized in such a way that code that can be used by
different VDPA implementations will be placed in a common area that resides in
drivers/vdpa/mlx5/core.

Only virtual functions are currently supported. Also, certain firmware
capabilities must be set to enable the driver. Physical functions (PFs)
are skipped by the driver.

To make use of the VDPA net driver, one must load mlx5_vdpa. In such
case, VFs will be operated by the VDPA driver. Although one can see a
regular instance of a network driver on the VF, the VDPA driver takes
precedence over the NIC driver, steering-wise.

Currently, the device/interface infrastructure in mlx5_core is used to
probe drivers. Future patches will introduce virtbus as a means to
register devices and drivers and VDPA will be adapted to it.

The mlx5 mode of operation required to support VDPA is switchdev mode.
One can use a Linux or OVS bridge to take care of layer 2 switching.

In order to provide virtio networking to a guest, an updated version of
qemu is required. This series has been tested with the following qemu
version:

url: https://github.com/jasowang/qemu.git
branch: vdpa
Commit ID: 6f4e59b807db


V2->V3
Fix makefile to use include path relative to the root of the kernel

V3->V4
Rebase Jason's patches on linux-next branch
Fix krobot error on mips arch
Make use of the free callback to destroy resources on unload
Use VIRTIO_F_ACCESS_PLATFORM instead of legacy VIRTIO_F_IOMMU_PLATFORM
Add empty implementations for get_vq_notification() and get_vq_irq()


Eli Cohen (6):
   net/vdpa: Use struct for set/get vq state
   vdpa: Modify get_vq_state() to return error code
   vdpa/mlx5: Add hardware descriptive header file
   vdpa/mlx5: Add support library for mlx5 VDPA implementation
   vdpa/mlx5: Add shared memory registration code
   vdpa/mlx5: Add VDPA driver for supported mlx5 devices

Jason Wang (5):
   vhost-vdpa: refine ioctl pre-processing
   vhost: generialize backend features setting/getting
   vhost-vdpa: support get/set backend features
   vhost-vdpa: support IOTLB batching hints
   vdpasim: support batch updating

Max Gurtovoy (1):
   vdpa: remove hard coded virtq num

  drivers/vdpa/Kconfig   |   19 +
  drivers/vdpa/Makefile  |1 +
  drivers/vdpa/ifcvf/ifcvf_base.c|4 +-
  drivers/vdpa/ifcvf/ifcvf_base.h|4 +-
  drivers/vdpa/ifcvf/ifcvf_main.c|   13 +-
  drivers/vdpa/mlx5/Makefile |4 +
  drivers/vdpa/mlx5/core/mlx5_vdpa.h |   91 ++
  drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h |  168 ++
  drivers/vdpa/mlx5/core/mr.c|  484 ++
  drivers/vdpa/mlx5/core/resources.c |  284 
  drivers/vdpa/mlx5/net/main.c   |   76 +
  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 1965 
  drivers/vdpa/mlx5/net/mlx5_vnet.h  |   24 +
  drivers/vdpa/vdpa.c|3 +
  drivers/vdpa/vdpa_sim/vdpa_sim.c   |   53 +-
  drivers/vhost/net.c|   18 +-
  drivers/vhost/vdpa.c   |   76 +-
  drivers/vhost/vhost.c  |   15 +
  driv

Re: [PATCH V4 linux-next 00/12] VDPA support for Mellanox ConnectX devices

2020-08-05 Thread Jason Wang


On 2020/8/5 上午12:20, Eli Cohen wrote:

Hi Michael,
please note that this series depends on mlx5 core device driver patches
in mlx5-next branch in
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git.

git pull git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
mlx5-next

They also depend on Jason Wang's patches: https://lkml.org/lkml/2020/7/1/301

Jason, I had to resolve some conflicts, so I would appreciate it if you can verify
that it is ok.



Looks good to me.

Thanks


Re: [PATCH V4 linux-next 08/12] vdpa: Modify get_vq_state() to return error code

2020-08-05 Thread Jason Wang


On 2020/8/5 上午12:20, Eli Cohen wrote:

Modify get_vq_state() so it returns an error code. In case of hardware
acceleration, the available index may be retrieved from the device, an
operation that can possibly fail.

Reviewed-by: Parav Pandit 
Signed-off-by: Eli Cohen 



Acked-by: Jason Wang 



---
  drivers/vdpa/ifcvf/ifcvf_main.c  | 5 +++--
  drivers/vdpa/vdpa_sim/vdpa_sim.c | 5 +++--
  drivers/vhost/vdpa.c | 5 -
  include/linux/vdpa.h | 4 ++--
  4 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index dc311e972b9e..076d7ac5e723 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -237,12 +237,13 @@ static u16 ifcvf_vdpa_get_vq_num_max(struct vdpa_device 
*vdpa_dev)
return IFCVF_QUEUE_MAX;
  }
  
-static void ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,

-   struct vdpa_vq_state *state)
+static int ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
+  struct vdpa_vq_state *state)
  {
struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
  
  	state->avail_index = ifcvf_get_vq_state(vf, qid);

+   return 0;
  }
  
  static int ifcvf_vdpa_set_vq_state(struct vdpa_device *vdpa_dev, u16 qid,

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index f1c351d5959b..c68741363643 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -464,14 +464,15 @@ static int vdpasim_set_vq_state(struct vdpa_device *vdpa, 
u16 idx,
return 0;
  }
  
-static void vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,

-struct vdpa_vq_state *state)
+static int vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,
+   struct vdpa_vq_state *state)
  {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
	struct vringh *vrh = &vq->vring;
  
  	state->avail_index = vrh->last_avail_idx;

+   return 0;
  }
  
  static u32 vdpasim_get_vq_align(struct vdpa_device *vdpa)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index c2e1e2d55084..a0b7c91948e1 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -392,7 +392,10 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, 
unsigned int cmd,
ops->set_vq_ready(vdpa, idx, s.num);
return 0;
case VHOST_GET_VRING_BASE:
-   ops->get_vq_state(v->vdpa, idx, &vq_state);
+   r = ops->get_vq_state(v->vdpa, idx, &vq_state);
+   if (r)
+   return r;
+
vq->last_avail_idx = vq_state.avail_index;
break;
case VHOST_GET_BACKEND_FEATURES:
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 8f412620071d..fd6e560d70f9 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -192,8 +192,8 @@ struct vdpa_config_ops {
bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
const struct vdpa_vq_state *state);
-   void (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
-struct vdpa_vq_state *state);
+   int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
+   struct vdpa_vq_state *state);
struct vdpa_notification_area
(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
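As a usage illustration, a hypothetical hardware backend where reading the available index
can fail; myhw, vdpa_to_myhw() and myhw_query_avail_idx() are invented names standing in
for a real device/firmware query:

static int myhw_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
                                  struct vdpa_vq_state *state)
{
        struct myhw *hw = vdpa_to_myhw(vdpa_dev);       /* hypothetical */
        u16 avail_idx;
        int err;

        /* may fail, e.g. if the device/firmware query times out */
        err = myhw_query_avail_idx(hw, qid, &avail_idx);        /* hypothetical */
        if (err)
                return err;     /* propagated up through VHOST_GET_VRING_BASE */

        state->avail_index = avail_idx;
        return 0;
}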



Re: [PATCH V4 linux-next 12/12] vdpa/mlx5: Add VDPA driver for supported mlx5 devices

2020-08-05 Thread Jason Wang


On 2020/8/5 上午12:20, Eli Cohen wrote:

Add a front end VDPA driver that registers in the VDPA bus and provides
networking to a guest. The VDPA driver creates the necessary resources
on the VF it is driving such that data path will be offloaded.

Notifications are being communicated through the driver.

Currently, only VFs are supported. In subsequent patches we will have
devlink support to control which VF is used for VDPA and which function
is used for regular networking.

Reviewed-by: Parav Pandit
Signed-off-by: Eli Cohen
---



Acked-by: Jason Wang 



Re: [PATCH V5 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-08-05 Thread Jason Wang


On 2020/7/31 下午2:55, Zhu Lingshan wrote:

+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, u16 qid)
+{
+   struct vhost_virtqueue *vq = &v->vqs[qid];
+   const struct vdpa_config_ops *ops = v->vdpa->config;
+   struct vdpa_device *vdpa = v->vdpa;
+   int ret, irq;
+
+   spin_lock(&vq->call_ctx.ctx_lock);
+   irq = ops->get_vq_irq(vdpa, qid);



Btw, this assumes that get_vq_irq is mandatory. This looks wrong since 
there's no guarantee that the vDPA device driver can see an irq. And this 
breaks the vdpa simulator.


Let's add a check and make it optional, documenting this assumption in 
vdpa.h.


Thanks



+   if (!vq->call_ctx.ctx || irq < 0) {
+   spin_unlock(&vq->call_ctx.ctx_lock);
+   return;
+   }
+



Re: [PATCH v2 03/24] virtio: allow __virtioXX, __leXX in config space

2020-08-05 Thread Jason Wang


On 2020/8/4 上午4:58, Michael S. Tsirkin wrote:

Currently all config space fields are of the type __uXX.
This confuses people and some drivers (notably vdpa)
access them using CPU endian-ness - which only
works well for legacy or LE platforms.

Update virtio_cread/virtio_cwrite macros to allow __virtioXX
and __leXX field types. Follow-up patches will convert
config space to use these types.

Signed-off-by: Michael S. Tsirkin 
---
  include/linux/virtio_config.h | 50 +--
  1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 3b4eae5ac5e3..64da491936f7 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -6,6 +6,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  
  struct irq_affinity;

@@ -287,12 +288,57 @@ static inline __virtio64 cpu_to_virtio64(struct 
virtio_device *vdev, u64 val)
return __cpu_to_virtio64(virtio_is_little_endian(vdev), val);
  }
  
+/*

+ * Only the checker differentiates between __virtioXX and __uXX types. But we
+ * try to share as much code as we can with the regular GCC build.
+ */
+#if !defined(CONFIG_CC_IS_GCC) && !defined(__CHECKER__)
+
+/* Not a checker - we can keep things simple */
+#define __virtio_native_typeof(x) typeof(x)
+
+#else
+
+/*
+ * We build this out of a couple of helper macros in a vain attempt to
+ * help you keep your lunch down while reading it.
+ */
+#define __virtio_pick_value(x, type, then, otherwise)  \
+   __builtin_choose_expr(__same_type(x, type), then, otherwise)
+
+#define __virtio_pick_type(x, type, then, otherwise)   \
+   __virtio_pick_value(x, type, (then)0, otherwise)
+
+#define __virtio_pick_endian(x, x16, x32, x64, otherwise)  
\
+   __virtio_pick_type(x, x16, __u16,   
\
+   __virtio_pick_type(x, x32, __u32,   
\
+   __virtio_pick_type(x, x64, __u64,   
\
+   otherwise)))
+
+#define __virtio_native_typeof(x) typeof(  
\
+   __virtio_pick_type(x, __u8, __u8,   
\
+   __virtio_pick_endian(x, __virtio16, __virtio32, __virtio64, 
\
+   __virtio_pick_endian(x, __le16, __le32, __le64, 
\
+   __virtio_pick_endian(x, __u16, __u32, __u64,
\
+   /* No other type allowed */ 
\
+   (void)0)
+
+#endif
+
+#define __virtio_native_type(structname, member) \
+   __virtio_native_typeof(((structname*)0)->member)
+
+#define __virtio_typecheck(structname, member, val) \
+   /* Must match the member's type, and be integer */ \
+   typecheck(__virtio_native_type(structname, member), (val))
+
+
  /* Config space accessors. */
  #define virtio_cread(vdev, structname, member, ptr)   \
do {\
might_sleep();  \
/* Must match the member's type, and be integer */  \
-   if (!typecheck(typeof((((structname*)0)->member)), *(ptr))) \
+   if (!__virtio_typecheck(structname, member, *(ptr)))\
(*ptr) = 1; \



A silly question: compared to using set()/get() directly, what's the 
value of the accessor macros here?


Thanks



\
switch (sizeof(*ptr)) { \
@@ -322,7 +368,7 @@ static inline __virtio64 cpu_to_virtio64(struct 
virtio_device *vdev, u64 val)
do {\
might_sleep();  \
/* Must match the member's type, and be integer */  \
-   if (!typecheck(typeof((((structname*)0)->member)), *(ptr))) \
+   if (!__virtio_typecheck(structname, member, *(ptr)))\
BUG_ON((*ptr) == 1);\
\
switch (sizeof(*ptr)) { \
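For context, a minimal usage sketch of the accessor under discussion, assuming a virtio-net
device: the local variable has to be the native integer type matching the (now __virtio16)
config member, otherwise the typecheck above complains at build time:

        u16 status;

        /* reads the little-endian field and converts it to native endian */
        virtio_cread(vdev, struct virtio_net_config, status, &status);

        /* a __le16 or __virtio16 local here would trip the typecheck */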



Re: [PATCH v2 22/24] vdpa_sim: fix endian-ness of config space

2020-08-05 Thread Jason Wang


On 2020/8/4 上午5:00, Michael S. Tsirkin wrote:

VDPA sim accesses config space as native endian - this is
wrong since it's a modern device and actually uses LE.

It only supports modern guests so we could punt and
just force LE, but let's use the full virtio APIs since people
tend to copy/paste code, and this is not data path anyway.

Signed-off-by: Michael S. Tsirkin 
---
  drivers/vdpa/vdpa_sim/vdpa_sim.c | 31 ++-
  1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index a9bc5e0fb353..fa05e065ff69 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -24,6 +24,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -72,6 +73,23 @@ struct vdpasim {
u64 features;
  };
  
+/* TODO: cross-endian support */

+static inline bool vdpasim_is_little_endian(struct vdpasim *vdpasim)
+{
+   return virtio_legacy_is_little_endian() ||
+   (vdpasim->features & (1ULL << VIRTIO_F_VERSION_1));
+}
+
+static inline u16 vdpasim16_to_cpu(struct vdpasim *vdpasim, __virtio16 val)
+{
+   return __virtio16_to_cpu(vdpasim_is_little_endian(vdpasim), val);
+}
+
+static inline __virtio16 cpu_to_vdpasim16(struct vdpasim *vdpasim, u16 val)
+{
+   return __cpu_to_virtio16(vdpasim_is_little_endian(vdpasim), val);
+}
+
  static struct vdpasim *vdpasim_dev;
  
  static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)

@@ -306,7 +324,6 @@ static const struct vdpa_config_ops vdpasim_net_config_ops;
  
  static struct vdpasim *vdpasim_create(void)

  {
-   struct virtio_net_config *config;
struct vdpasim *vdpasim;
struct device *dev;
int ret = -ENOMEM;
@@ -331,10 +348,7 @@ static struct vdpasim *vdpasim_create(void)
if (!vdpasim->buffer)
goto err_iommu;
  
-	config = &vdpasim->config;

-   config->mtu = 1500;
-   config->status = VIRTIO_NET_S_LINK_UP;
-   eth_random_addr(config->mac);
+   eth_random_addr(vdpasim->config.mac);
  
	vringh_set_iotlb(&vdpasim->vqs[0].vring, vdpasim->iommu);

vringh_set_iotlb(&vdpasim->vqs[1].vring, vdpasim->iommu);
@@ -448,6 +462,7 @@ static u64 vdpasim_get_features(struct vdpa_device *vdpa)
  static int vdpasim_set_features(struct vdpa_device *vdpa, u64 features)
  {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+   struct virtio_net_config *config = &vdpasim->config;
  
  	/* DMA mapping must be done by driver */

if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
@@ -455,6 +470,12 @@ static int vdpasim_set_features(struct vdpa_device *vdpa, 
u64 features)
  
  	vdpasim->features = features & vdpasim_features;
  
+	/* We only know whether guest is using the legacy interface here, so

+* that's the earliest we can set config fields.
+*/



We check whether or not ACCESS_PLATFORM is set earlier, which is probably 
a hint that only modern devices are supported. So I wonder whether just forcing LE 
and failing if VERSION_1 is not set would be better?


Thanks



+
+   config->mtu = cpu_to_vdpasim16(vdpasim, 1500);
+   config->status = cpu_to_vdpasim16(vdpasim, VIRTIO_NET_S_LINK_UP);
return 0;
  }
  



Re: [PATCH v2 19/24] vdpa: make sure set_features is invoked for legacy

2020-08-05 Thread Jason Wang


On 2020/8/4 上午5:00, Michael S. Tsirkin wrote:

Some legacy guests just assume features are 0 after reset.
We detect that config space is accessed before features are
set and set features to 0 automatically.
Note: some legacy guests might not even access config space, if this is
reported in the field we might need to catch a kick to handle these.



I wonder whether it's easier to just support modern device?

Thanks




Signed-off-by: Michael S. Tsirkin 
---
  drivers/vdpa/vdpa.c  |  1 +
  include/linux/vdpa.h | 34 ++
  2 files changed, 35 insertions(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index de211ef3738c..7105265e4793 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -96,6 +96,7 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent,
vdev->dev.release = vdpa_release_dev;
vdev->index = err;
vdev->config = config;
+   vdev->features_valid = false;
  
  	err = dev_set_name(>dev, "vdpa%u", vdev->index);

if (err)
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..29b8296f1414 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -33,12 +33,14 @@ struct vdpa_notification_area {
   * @dma_dev: the actual device that is performing DMA
   * @config: the configuration ops for this device.
   * @index: device index
+ * @features_valid: were features initialized? for legacy guests
   */
  struct vdpa_device {
struct device dev;
struct device *dma_dev;
const struct vdpa_config_ops *config;
unsigned int index;
+   bool features_valid;
  };
  
  /**

@@ -266,4 +268,36 @@ static inline struct device *vdpa_get_dma_dev(struct 
vdpa_device *vdev)
  {
return vdev->dma_dev;
  }
+
+static inline void vdpa_reset(struct vdpa_device *vdev)
+{
+const struct vdpa_config_ops *ops = vdev->config;
+
+   vdev->features_valid = false;
+ops->set_status(vdev, 0);
+}
+
+static inline int vdpa_set_features(struct vdpa_device *vdev, u64 features)
+{
+const struct vdpa_config_ops *ops = vdev->config;
+
+   vdev->features_valid = true;
+return ops->set_features(vdev, features);
+}
+
+
+static inline void vdpa_get_config(struct vdpa_device *vdev, unsigned offset,
+  void *buf, unsigned int len)
+{
+const struct vdpa_config_ops *ops = vdev->config;
+
+   /*
+* Config accesses aren't supposed to trigger before features are set.
+* If it does happen we assume a legacy guest.
+*/
+   if (!vdev->features_valid)
+   vdpa_set_features(vdev, 0);
+   ops->get_config(vdev, offset, buf, len);
+}
+
  #endif /* _LINUX_VDPA_H */



Re: [PATCH V5 1/6] vhost: introduce vhost_vring_call

2020-08-04 Thread Jason Wang


On 2020/8/5 下午1:49, Zhu, Lingshan wrote:



On 8/5/2020 10:16 AM, Jason Wang wrote:


On 2020/8/4 下午5:21, Michael S. Tsirkin wrote:

   +struct vhost_vring_call {
+    struct eventfd_ctx *ctx;
+    struct irq_bypass_producer producer;
+    spinlock_t ctx_lock;

It's not clear to me why we need ctx_lock here.

Thanks

Hi Jason,

we use this lock to protect the eventfd_ctx and irq from race 
conditions,
We don't support irq notification from vDPA device driver in this 
version,

do we still have race condition?

Thanks

Jason I'm not sure what you are trying to say here.



I meant we change the API from V4 so driver won't notify us if irq is 
changed.


Then it looks to me there's no need for the ctx_lock; everything could 
be synchronized with the vq mutex.


Thanks

From V4 to V5 there are only some minor improvements and bug fixes; get_vq_irq() 
stays almost untouched. A mutex can work for this, however I see the vq mutex is 
used in many scenarios.
We only use this lock to protect the producer information; could this give us 
less coupling and more defensive code, hence fewer bugs?



I think not; the vq mutex is used to protect all vq-related data structures, and 
introducing another one will increase the complexity.


Thanks




Thanks










Re: [PATCH V5 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-08-04 Thread Jason Wang


On 2020/8/5 下午1:45, Zhu, Lingshan wrote:



On 8/5/2020 10:36 AM, Jason Wang wrote:


On 2020/8/4 下午5:31, Zhu, Lingshan wrote:



On 8/4/2020 4:51 PM, Jason Wang wrote:


On 2020/7/31 下午2:55, Zhu Lingshan wrote:

This patch introduce a set of functions for setup/unsetup
and update irq offloading respectively by register/unregister
and re-register the irq_bypass_producer.

With these functions, this commit can setup/unsetup
irq offloading through setting DRIVER_OK/!DRIVER_OK, and
update irq offloading through SET_VRING_CALL.

Signed-off-by: Zhu Lingshan 
Suggested-by: Jason Wang 
---
  drivers/vhost/Kconfig |  1 +
  drivers/vhost/vdpa.c  | 79 
++-

  2 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index d3688c6afb87..587fbae06182 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -65,6 +65,7 @@ config VHOST_VDPA
  tristate "Vhost driver for vDPA-based backend"
  depends on EVENTFD
  select VHOST
+    select IRQ_BYPASS_MANAGER
  depends on VDPA
  help
    This kernel module can be loaded in host kernel to accelerate
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index df3cf386b0cd..278ea2f00172 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -115,6 +115,55 @@ static irqreturn_t vhost_vdpa_config_cb(void 
*private)

  return IRQ_HANDLED;
  }
  +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, u16 qid)
+{
+    struct vhost_virtqueue *vq = >vqs[qid];
+    const struct vdpa_config_ops *ops = v->vdpa->config;
+    struct vdpa_device *vdpa = v->vdpa;
+    int ret, irq;
+
+    spin_lock(>call_ctx.ctx_lock);
+    irq = ops->get_vq_irq(vdpa, qid);
+    if (!vq->call_ctx.ctx || irq < 0) {
+ spin_unlock(>call_ctx.ctx_lock);
+    return;
+    }
+
+    vq->call_ctx.producer.token = vq->call_ctx.ctx;
+    vq->call_ctx.producer.irq = irq;
+    ret = irq_bypass_register_producer(>call_ctx.producer);
+    spin_unlock(>call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, u16 qid)
+{
+    struct vhost_virtqueue *vq = >vqs[qid];
+
+    spin_lock(>call_ctx.ctx_lock);
+ irq_bypass_unregister_producer(>call_ctx.producer);



Any reason for not checking vq->call_ctx.producer.irq as below here?
we only need ctx as a token to unregister the vq from the irq bypass 
manager; if vq->call_ctx.producer.irq is 0, it means it is an unused or 
disabled vq,



This is not how the code is written, though. See above: you only check whether 
the irq is negative; irq 0 seems acceptable.

Yes, IRQ 0 is valid, so we check whether it is < 0.


+    spin_lock(>call_ctx.ctx_lock);
+    irq = ops->get_vq_irq(vdpa, qid);
+    if (!vq->call_ctx.ctx || irq < 0) {
+    spin_unlock(>call_ctx.ctx_lock);
+    return;
+    }
+
+    vq->call_ctx.producer.token = vq->call_ctx.ctx;
+    vq->call_ctx.producer.irq = irq;
+    ret = irq_bypass_register_producer(>call_ctx.producer);
+    spin_unlock(>call_ctx.ctx_lock);



no harm if we
perform an unregister on it.




+ spin_unlock(>call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_update_vq_irq(struct vhost_virtqueue *vq)
+{
+    spin_lock(>call_ctx.ctx_lock);
+    /*
+ * if it has a non-zero irq, means there is a
+ * previsouly registered irq_bypass_producer,
+ * we should update it when ctx (its token)
+ * changes.
+ */
+    if (!vq->call_ctx.producer.irq) {
+ spin_unlock(>call_ctx.ctx_lock);
+    return;
+    }
+
+ irq_bypass_unregister_producer(>call_ctx.producer);
+    vq->call_ctx.producer.token = vq->call_ctx.ctx;
+ irq_bypass_register_producer(>call_ctx.producer);
+    spin_unlock(>call_ctx.ctx_lock);
+}



I think setup_irq() and update_irq() could be unified with the 
following logic:


irq_bypass_unregister_producer(&vq->call_ctx.producer);
irq = ops->get_vq_irq(vdpa, qid);
    if (!vq->call_ctx.ctx || irq < 0) {
        spin_unlock(&vq->call_ctx.ctx_lock);
        return;
    }

vq->call_ctx.producer.token = vq->call_ctx.ctx;
vq->call_ctx.producer.irq = irq;
ret = irq_bypass_register_producer(&vq->call_ctx.producer);
Yes, this code piece can do both register and update. Though it's 
rare to call update_irq(),
setup_irq() is very likely to be called for every vq, so this may 
cause several rounds of useless irq_bypass_unregister_producer().



I'm not sure I get this but do you have a case for this?

I mean if we use this routine to set up irq offloading, it is very likely to do 
an unregister of the producer for every vq first, for nothing.



Does it really harm? See vfio_msi_set_vector_signal()






is it worth it just to simplify the code?



Less code(bug).

I can do this if we are chasing perfection; however, I believe the number of bugs 
correlates more with the complexity of the logic than with the number of code lines. If 
we only merge lines, th

Re: [PATCH V5 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-08-04 Thread Jason Wang


On 2020/8/4 下午5:31, Zhu, Lingshan wrote:



On 8/4/2020 4:51 PM, Jason Wang wrote:


On 2020/7/31 下午2:55, Zhu Lingshan wrote:

This patch introduce a set of functions for setup/unsetup
and update irq offloading respectively by register/unregister
and re-register the irq_bypass_producer.

With these functions, this commit can setup/unsetup
irq offloading through setting DRIVER_OK/!DRIVER_OK, and
update irq offloading through SET_VRING_CALL.

Signed-off-by: Zhu Lingshan 
Suggested-by: Jason Wang 
---
  drivers/vhost/Kconfig |  1 +
  drivers/vhost/vdpa.c  | 79 
++-

  2 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index d3688c6afb87..587fbae06182 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -65,6 +65,7 @@ config VHOST_VDPA
  tristate "Vhost driver for vDPA-based backend"
  depends on EVENTFD
  select VHOST
+    select IRQ_BYPASS_MANAGER
  depends on VDPA
  help
    This kernel module can be loaded in host kernel to accelerate
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index df3cf386b0cd..278ea2f00172 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -115,6 +115,55 @@ static irqreturn_t vhost_vdpa_config_cb(void 
*private)

  return IRQ_HANDLED;
  }
  +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, u16 qid)
+{
+    struct vhost_virtqueue *vq = >vqs[qid];
+    const struct vdpa_config_ops *ops = v->vdpa->config;
+    struct vdpa_device *vdpa = v->vdpa;
+    int ret, irq;
+
+    spin_lock(>call_ctx.ctx_lock);
+    irq = ops->get_vq_irq(vdpa, qid);
+    if (!vq->call_ctx.ctx || irq < 0) {
+    spin_unlock(>call_ctx.ctx_lock);
+    return;
+    }
+
+    vq->call_ctx.producer.token = vq->call_ctx.ctx;
+    vq->call_ctx.producer.irq = irq;
+    ret = irq_bypass_register_producer(>call_ctx.producer);
+    spin_unlock(>call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, u16 qid)
+{
+    struct vhost_virtqueue *vq = >vqs[qid];
+
+    spin_lock(>call_ctx.ctx_lock);
+ irq_bypass_unregister_producer(>call_ctx.producer);



Any reason for not checking vq->call_ctx.producer.irq as below here?

we only need ctx as a token to unregister the vq from the irq bypass manager; if 
vq->call_ctx.producer.irq is 0, it means it is an unused or disabled vq,



This is not how the code is written, though. See above: you only check whether the irq 
is negative; irq 0 seems acceptable.


+    spin_lock(>call_ctx.ctx_lock);
+    irq = ops->get_vq_irq(vdpa, qid);
+    if (!vq->call_ctx.ctx || irq < 0) {
+    spin_unlock(>call_ctx.ctx_lock);
+    return;
+    }
+
+    vq->call_ctx.producer.token = vq->call_ctx.ctx;
+    vq->call_ctx.producer.irq = irq;
+    ret = irq_bypass_register_producer(>call_ctx.producer);
+    spin_unlock(>call_ctx.ctx_lock);



no harm if we
perform an unregister on it.




+ spin_unlock(>call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_update_vq_irq(struct vhost_virtqueue *vq)
+{
+    spin_lock(>call_ctx.ctx_lock);
+    /*
+ * if it has a non-zero irq, means there is a
+ * previsouly registered irq_bypass_producer,
+ * we should update it when ctx (its token)
+ * changes.
+ */
+    if (!vq->call_ctx.producer.irq) {
+    spin_unlock(>call_ctx.ctx_lock);
+    return;
+    }
+
+ irq_bypass_unregister_producer(>call_ctx.producer);
+    vq->call_ctx.producer.token = vq->call_ctx.ctx;
+ irq_bypass_register_producer(>call_ctx.producer);
+    spin_unlock(>call_ctx.ctx_lock);
+}



I think setup_irq() and update_irq() could be unified with the 
following logic:


irq_bypass_unregister_producer(&vq->call_ctx.producer);
irq = ops->get_vq_irq(vdpa, qid);
    if (!vq->call_ctx.ctx || irq < 0) {
        spin_unlock(&vq->call_ctx.ctx_lock);
        return;
    }

vq->call_ctx.producer.token = vq->call_ctx.ctx;
vq->call_ctx.producer.irq = irq;
ret = irq_bypass_register_producer(&vq->call_ctx.producer);

Yes, this code piece can do both register and update. Though it's rare to call 
update_irq(),
setup_irq() is very likely to be called for every vq, so this may cause several 
rounds of useless irq_bypass_unregister_producer().



I'm not sure I get this but do you have a case for this?



is it worth it just to simplify the code?



Less code(bug).





+
  static void vhost_vdpa_reset(struct vhost_vdpa *v)
  {
  struct vdpa_device *vdpa = v->vdpa;
@@ -155,11 +204,15 @@ static long vhost_vdpa_set_status(struct 
vhost_vdpa *v, u8 __user *statusp)

  {
  struct vdpa_device *vdpa = v->vdpa;
  const struct vdpa_config_ops *ops = vdpa->config;
-    u8 status;
+    u8 status, status_old;
+    int nvqs = v->nvqs;
+    u16 i;
    if (copy_from_user(, statusp, sizeof(status)))
  return -EFAULT;
  +    status_old = ops->get_

Re: [PATCH V5 1/6] vhost: introduce vhost_vring_call

2020-08-04 Thread Jason Wang


On 2020/8/4 下午5:21, Zhu, Lingshan wrote:

Hi Jason,

we use this lock to protect the eventfd_ctx and irq from race 
conditions,



We don't support irq notification from vDPA device driver in this 
version, do we still have race condition?

as we discussed before:
(1) if the vendor changes the IRQ after DRIVER_OK; though they should not do this, 
they can.
(2) if user space changes the ctx.

Thanks



Yes, but then everything happens in context of syscall (ioctl), so vq 
mutex is sufficient I guess?


Thanks


Re: [PATCH V5 1/6] vhost: introduce vhost_vring_call

2020-08-04 Thread Jason Wang


On 2020/8/4 下午5:21, Michael S. Tsirkin wrote:

   +struct vhost_vring_call {
+    struct eventfd_ctx *ctx;
+    struct irq_bypass_producer producer;
+    spinlock_t ctx_lock;

It's not clear to me why we need ctx_lock here.

Thanks

Hi Jason,

we use this lock to protect the eventfd_ctx and irq from race conditions,

We don't support irq notification from vDPA device driver in this version,
do we still have race condition?

Thanks

Jason I'm not sure what you are trying to say here.



I meant we change the API from V4 so driver won't notify us if irq is 
changed.


Then it looks to me there's no need for the ctx_lock; everything could be 
synchronized with the vq mutex.


Thanks
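A minimal sketch of what the spinlock-free variant could look like, assuming the setup
path is only ever reached from ioctl context where vq->mutex can be taken; names follow
the V5 patch, and the irq_bypass_register_producer() return value is ignored for brevity:

static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, u16 qid)
{
        struct vhost_virtqueue *vq = &v->vqs[qid];
        const struct vdpa_config_ops *ops = v->vdpa->config;
        int irq;

        if (!ops->get_vq_irq)
                return;

        irq = ops->get_vq_irq(v->vdpa, qid);

        mutex_lock(&vq->mutex);
        if (vq->call_ctx.ctx && irq >= 0) {
                vq->call_ctx.producer.token = vq->call_ctx.ctx;
                vq->call_ctx.producer.irq = irq;
                irq_bypass_register_producer(&vq->call_ctx.producer);
        }
        mutex_unlock(&vq->mutex);
}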







Re: [PATCH V5 1/6] vhost: introduce vhost_vring_call

2020-08-04 Thread Jason Wang


On 2020/8/4 下午4:42, Zhu, Lingshan wrote:



On 8/4/2020 4:38 PM, Jason Wang wrote:


On 2020/7/31 下午2:55, Zhu Lingshan wrote:

This commit introduces struct vhost_vring_call which replaced
raw struct eventfd_ctx *call_ctx in struct vhost_virtqueue.
Besides eventfd_ctx, it contains a spin lock and an
irq_bypass_producer in its structure.

Signed-off-by: Zhu Lingshan 
Suggested-by: Jason Wang 
---
  drivers/vhost/vdpa.c  |  4 ++--
  drivers/vhost/vhost.c | 22 --
  drivers/vhost/vhost.h |  9 -
  3 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index a54b60d6623f..df3cf386b0cd 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -96,7 +96,7 @@ static void handle_vq_kick(struct vhost_work *work)
  static irqreturn_t vhost_vdpa_virtqueue_cb(void *private)
  {
  struct vhost_virtqueue *vq = private;
-    struct eventfd_ctx *call_ctx = vq->call_ctx;
+    struct eventfd_ctx *call_ctx = vq->call_ctx.ctx;
    if (call_ctx)
  eventfd_signal(call_ctx, 1);
@@ -382,7 +382,7 @@ static long vhost_vdpa_vring_ioctl(struct 
vhost_vdpa *v, unsigned int cmd,

  break;
    case VHOST_SET_VRING_CALL:
-    if (vq->call_ctx) {
+    if (vq->call_ctx.ctx) {
  cb.callback = vhost_vdpa_virtqueue_cb;
  cb.private = vq;
  } else {
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d7b8df3edffc..9f1a845a9302 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -298,6 +298,13 @@ static void vhost_vq_meta_reset(struct 
vhost_dev *d)

  __vhost_vq_meta_reset(d->vqs[i]);
  }
  +static void vhost_vring_call_reset(struct vhost_vring_call 
*call_ctx)

+{
+    call_ctx->ctx = NULL;
+    memset(_ctx->producer, 0x0, sizeof(struct 
irq_bypass_producer));

+    spin_lock_init(_ctx->ctx_lock);
+}
+
  static void vhost_vq_reset(struct vhost_dev *dev,
 struct vhost_virtqueue *vq)
  {
@@ -319,13 +326,13 @@ static void vhost_vq_reset(struct vhost_dev *dev,
  vq->log_base = NULL;
  vq->error_ctx = NULL;
  vq->kick = NULL;
-    vq->call_ctx = NULL;
  vq->log_ctx = NULL;
  vhost_reset_is_le(vq);
  vhost_disable_cross_endian(vq);
  vq->busyloop_timeout = 0;
  vq->umem = NULL;
  vq->iotlb = NULL;
+    vhost_vring_call_reset(>call_ctx);
  __vhost_vq_meta_reset(vq);
  }
  @@ -685,8 +692,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
eventfd_ctx_put(dev->vqs[i]->error_ctx);
  if (dev->vqs[i]->kick)
  fput(dev->vqs[i]->kick);
-    if (dev->vqs[i]->call_ctx)
- eventfd_ctx_put(dev->vqs[i]->call_ctx);
+    if (dev->vqs[i]->call_ctx.ctx)
+ eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
  vhost_vq_reset(dev, dev->vqs[i]);
  }
  vhost_dev_free_iovecs(dev);
@@ -1629,7 +1636,10 @@ long vhost_vring_ioctl(struct vhost_dev *d, 
unsigned int ioctl, void __user *arg

  r = PTR_ERR(ctx);
  break;
  }
-    swap(ctx, vq->call_ctx);
+
+    spin_lock(>call_ctx.ctx_lock);
+    swap(ctx, vq->call_ctx.ctx);
+    spin_unlock(>call_ctx.ctx_lock);
  break;
  case VHOST_SET_VRING_ERR:
  if (copy_from_user(, argp, sizeof f)) {
@@ -2440,8 +2450,8 @@ static bool vhost_notify(struct vhost_dev 
*dev, struct vhost_virtqueue *vq)

  void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
  {
  /* Signal the Guest tell them we used something up. */
-    if (vq->call_ctx && vhost_notify(dev, vq))
-    eventfd_signal(vq->call_ctx, 1);
+    if (vq->call_ctx.ctx && vhost_notify(dev, vq))
+    eventfd_signal(vq->call_ctx.ctx, 1);
  }
  EXPORT_SYMBOL_GPL(vhost_signal);
  diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index c8e96a095d3b..38eb1aa3b68d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -13,6 +13,7 @@
  #include 
  #include 
  #include 
+#include 
    struct vhost_work;
  typedef void (*vhost_work_fn_t)(struct vhost_work *work);
@@ -60,6 +61,12 @@ enum vhost_uaddr_type {
  VHOST_NUM_ADDRS = 3,
  };
  +struct vhost_vring_call {
+    struct eventfd_ctx *ctx;
+    struct irq_bypass_producer producer;
+    spinlock_t ctx_lock;



It's not clear to me why we need ctx_lock here.

Thanks

Hi Jason,

we use this lock to protect the eventfd_ctx and irq from race conditions,



We don't support irq notification from vDPA device driver in this 
version, do we still have race condition?


Thanks



  are you suggesting a better name?

Thanks




+};
+
  /* The virtqueue structure describes a queue attached to a device. */
  struct vhost_virtqueue {
  struct vhost_dev *dev;
@@ -72,7 +79,7 @@ struct vhost_virtqueue {
  vring_used_t __user *used;
  const struct vhost_iotlb_map *meta_iotlb[VHOST_NUM_ADDRS];
  struct file *kick;

Re: [PATCH V5 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-08-04 Thread Jason Wang


On 2020/7/31 下午2:55, Zhu Lingshan wrote:

This patch introduce a set of functions for setup/unsetup
and update irq offloading respectively by register/unregister
and re-register the irq_bypass_producer.

With these functions, this commit can setup/unsetup
irq offloading through setting DRIVER_OK/!DRIVER_OK, and
update irq offloading through SET_VRING_CALL.

Signed-off-by: Zhu Lingshan 
Suggested-by: Jason Wang 
---
  drivers/vhost/Kconfig |  1 +
  drivers/vhost/vdpa.c  | 79 ++-
  2 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index d3688c6afb87..587fbae06182 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -65,6 +65,7 @@ config VHOST_VDPA
tristate "Vhost driver for vDPA-based backend"
depends on EVENTFD
select VHOST
+   select IRQ_BYPASS_MANAGER
depends on VDPA
help
  This kernel module can be loaded in host kernel to accelerate
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index df3cf386b0cd..278ea2f00172 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -115,6 +115,55 @@ static irqreturn_t vhost_vdpa_config_cb(void *private)
return IRQ_HANDLED;
  }
  
+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, u16 qid)

+{
+   struct vhost_virtqueue *vq = >vqs[qid];
+   const struct vdpa_config_ops *ops = v->vdpa->config;
+   struct vdpa_device *vdpa = v->vdpa;
+   int ret, irq;
+
+   spin_lock(>call_ctx.ctx_lock);
+   irq = ops->get_vq_irq(vdpa, qid);
+   if (!vq->call_ctx.ctx || irq < 0) {
+   spin_unlock(>call_ctx.ctx_lock);
+   return;
+   }
+
+   vq->call_ctx.producer.token = vq->call_ctx.ctx;
+   vq->call_ctx.producer.irq = irq;
+   ret = irq_bypass_register_producer(>call_ctx.producer);
+   spin_unlock(>call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, u16 qid)
+{
+   struct vhost_virtqueue *vq = >vqs[qid];
+
+   spin_lock(>call_ctx.ctx_lock);
+   irq_bypass_unregister_producer(>call_ctx.producer);



Any reason for not checking vq->call_ctx.producer.irq as below here?



+   spin_unlock(>call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_update_vq_irq(struct vhost_virtqueue *vq)
+{
+   spin_lock(>call_ctx.ctx_lock);
+   /*
+* if it has a non-zero irq, means there is a
+* previsouly registered irq_bypass_producer,
+* we should update it when ctx (its token)
+* changes.
+*/
+   if (!vq->call_ctx.producer.irq) {
+   spin_unlock(>call_ctx.ctx_lock);
+   return;
+   }
+
+   irq_bypass_unregister_producer(>call_ctx.producer);
+   vq->call_ctx.producer.token = vq->call_ctx.ctx;
+   irq_bypass_register_producer(>call_ctx.producer);
+   spin_unlock(>call_ctx.ctx_lock);
+}



I think setup_irq() and update_irq() could be unified with the following 
logic:


irq_bypass_unregister_producer(&vq->call_ctx.producer);
irq = ops->get_vq_irq(vdpa, qid);
    if (!vq->call_ctx.ctx || irq < 0) {
        spin_unlock(&vq->call_ctx.ctx_lock);
        return;
    }

vq->call_ctx.producer.token = vq->call_ctx.ctx;
vq->call_ctx.producer.irq = irq;
ret = irq_bypass_register_producer(&vq->call_ctx.producer);
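Spelled out, a sketch of that unified helper with the locking folded in; unregistering a
producer that was never registered is harmless (cf. the vfio_msi_set_vector_signal()
precedent mentioned elsewhere in this thread), and error handling is elided:

static void vhost_vdpa_update_vq_irq(struct vhost_vdpa *v, u16 qid)
{
        struct vhost_virtqueue *vq = &v->vqs[qid];
        const struct vdpa_config_ops *ops = v->vdpa->config;
        int irq;

        spin_lock(&vq->call_ctx.ctx_lock);

        /* always drop the old producer first; a no-op if none is registered */
        irq_bypass_unregister_producer(&vq->call_ctx.producer);

        irq = ops->get_vq_irq(v->vdpa, qid);
        if (vq->call_ctx.ctx && irq >= 0) {
                vq->call_ctx.producer.token = vq->call_ctx.ctx;
                vq->call_ctx.producer.irq = irq;
                irq_bypass_register_producer(&vq->call_ctx.producer);
        }

        spin_unlock(&vq->call_ctx.ctx_lock);
}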


+
  static void vhost_vdpa_reset(struct vhost_vdpa *v)
  {
struct vdpa_device *vdpa = v->vdpa;
@@ -155,11 +204,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, 
u8 __user *statusp)
  {
struct vdpa_device *vdpa = v->vdpa;
const struct vdpa_config_ops *ops = vdpa->config;
-   u8 status;
+   u8 status, status_old;
+   int nvqs = v->nvqs;
+   u16 i;
  
  	if (copy_from_user(, statusp, sizeof(status)))

return -EFAULT;
  
+	status_old = ops->get_status(vdpa);

+
/*
 * Userspace shouldn't remove status bits unless reset the
 * status to 0.
@@ -169,6 +222,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 
__user *statusp)
  
  	ops->set_status(vdpa, status);
  
+	/* vq irq is not expected to be changed once DRIVER_OK is set */



Let's move this comment to the get_vq_irq bus operation.



+   if ((status & VIRTIO_CONFIG_S_DRIVER_OK) && !(status_old & 
VIRTIO_CONFIG_S_DRIVER_OK))
+   for (i = 0; i < nvqs; i++)
+   vhost_vdpa_setup_vq_irq(v, i);
+
+   if ((status_old & VIRTIO_CONFIG_S_DRIVER_OK) && !(status & 
VIRTIO_CONFIG_S_DRIVER_OK))
+   for (i = 0; i < nvqs; i++)
+   vhost_vdpa_unsetup_vq_irq(v, i);
+
return 0;
  }
  
@@ -332,6 +394,7 @@ static long vhost_vdpa_set_config_call(struct vhost_vdpa *v, u32 __user *argp)
  
  	return 0;

  }
+
  static long vhost_vd

Re: [PATCH V5 1/6] vhost: introduce vhost_vring_call

2020-08-04 Thread Jason Wang


On 2020/7/31 下午2:55, Zhu Lingshan wrote:

This commit introduces struct vhost_vring_call which replaced
raw struct eventfd_ctx *call_ctx in struct vhost_virtqueue.
Besides eventfd_ctx, it contains a spin lock and an
irq_bypass_producer in its structure.

Signed-off-by: Zhu Lingshan 
Suggested-by: Jason Wang 
---
  drivers/vhost/vdpa.c  |  4 ++--
  drivers/vhost/vhost.c | 22 --
  drivers/vhost/vhost.h |  9 -
  3 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index a54b60d6623f..df3cf386b0cd 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -96,7 +96,7 @@ static void handle_vq_kick(struct vhost_work *work)
  static irqreturn_t vhost_vdpa_virtqueue_cb(void *private)
  {
struct vhost_virtqueue *vq = private;
-   struct eventfd_ctx *call_ctx = vq->call_ctx;
+   struct eventfd_ctx *call_ctx = vq->call_ctx.ctx;
  
  	if (call_ctx)

eventfd_signal(call_ctx, 1);
@@ -382,7 +382,7 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, 
unsigned int cmd,
break;
  
  	case VHOST_SET_VRING_CALL:

-   if (vq->call_ctx) {
+   if (vq->call_ctx.ctx) {
cb.callback = vhost_vdpa_virtqueue_cb;
cb.private = vq;
} else {
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d7b8df3edffc..9f1a845a9302 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -298,6 +298,13 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
__vhost_vq_meta_reset(d->vqs[i]);
  }
  
+static void vhost_vring_call_reset(struct vhost_vring_call *call_ctx)

+{
+   call_ctx->ctx = NULL;
+   memset(&call_ctx->producer, 0x0, sizeof(struct irq_bypass_producer));
+   spin_lock_init(&call_ctx->ctx_lock);
+}
+
  static void vhost_vq_reset(struct vhost_dev *dev,
   struct vhost_virtqueue *vq)
  {
@@ -319,13 +326,13 @@ static void vhost_vq_reset(struct vhost_dev *dev,
vq->log_base = NULL;
vq->error_ctx = NULL;
vq->kick = NULL;
-   vq->call_ctx = NULL;
vq->log_ctx = NULL;
vhost_reset_is_le(vq);
vhost_disable_cross_endian(vq);
vq->busyloop_timeout = 0;
vq->umem = NULL;
vq->iotlb = NULL;
+   vhost_vring_call_reset(&vq->call_ctx);
__vhost_vq_meta_reset(vq);
  }
  
@@ -685,8 +692,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)

eventfd_ctx_put(dev->vqs[i]->error_ctx);
if (dev->vqs[i]->kick)
fput(dev->vqs[i]->kick);
-   if (dev->vqs[i]->call_ctx)
-   eventfd_ctx_put(dev->vqs[i]->call_ctx);
+   if (dev->vqs[i]->call_ctx.ctx)
+   eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
vhost_vq_reset(dev, dev->vqs[i]);
}
vhost_dev_free_iovecs(dev);
@@ -1629,7 +1636,10 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int 
ioctl, void __user *arg
r = PTR_ERR(ctx);
break;
}
-   swap(ctx, vq->call_ctx);
+
+   spin_lock(&vq->call_ctx.ctx_lock);
+   swap(ctx, vq->call_ctx.ctx);
+   spin_unlock(&vq->call_ctx.ctx_lock);
break;
case VHOST_SET_VRING_ERR:
if (copy_from_user(, argp, sizeof f)) {
@@ -2440,8 +2450,8 @@ static bool vhost_notify(struct vhost_dev *dev, struct 
vhost_virtqueue *vq)
  void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
  {
/* Signal the Guest tell them we used something up. */
-   if (vq->call_ctx && vhost_notify(dev, vq))
-   eventfd_signal(vq->call_ctx, 1);
+   if (vq->call_ctx.ctx && vhost_notify(dev, vq))
+   eventfd_signal(vq->call_ctx.ctx, 1);
  }
  EXPORT_SYMBOL_GPL(vhost_signal);
  
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h

index c8e96a095d3b..38eb1aa3b68d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -13,6 +13,7 @@
  #include 
  #include 
  #include 
+#include 
  
  struct vhost_work;

  typedef void (*vhost_work_fn_t)(struct vhost_work *work);
@@ -60,6 +61,12 @@ enum vhost_uaddr_type {
VHOST_NUM_ADDRS = 3,
  };
  
+struct vhost_vring_call {

+   struct eventfd_ctx *ctx;
+   struct irq_bypass_producer producer;
+   spinlock_t ctx_lock;



It's not clear to me why we need ctx_lock here.

Thanks



+};
+
  /* The virtqueue structure describes a queue attached to a device. */
  struct vhost_virtqueue {
struct vhost_dev *dev;
@@ -72,7 +79,7 @@ struct vhost_virtqueue {
vring_used_t __user *used;
const struct vhost_iotlb_map *meta_iotlb[VHOST_NUM_ADDRS];
struct file *kick;
-   struct eve

Re: [PATCH -next v3] virtio_ring: Avoid loop when vq is broken in virtqueue_poll

2020-08-03 Thread Jason Wang


On 2020/8/2 下午3:44, Mao Wenan wrote:

The loop may exist if vq->broken is true:
virtqueue_get_buf_ctx_packed or virtqueue_get_buf_ctx_split
will return NULL, so virtnet_poll will reschedule napi to
receive packets, which drives cpu usage (si) to 100%.

call trace as below:
virtnet_poll
virtnet_receive
virtqueue_get_buf_ctx
virtqueue_get_buf_ctx_packed
virtqueue_get_buf_ctx_split
virtqueue_napi_complete
virtqueue_poll   //return true
virtqueue_napi_schedule //it will reschedule napi

To fix this, return false in virtqueue_poll if the vq is broken.

Signed-off-by: Mao Wenan 
Acked-by: Michael S. Tsirkin 
---
  v2->v3: change subject, original is : "virtio_net: Avoid loop in virtnet_poll"
  v1->v2: fix it in virtqueue_poll suggested by Michael S. Tsirkin 

  drivers/virtio/virtio_ring.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 58b96ba..4f7c73e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1960,6 +1960,9 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned 
last_used_idx)
  {
struct vring_virtqueue *vq = to_vvq(_vq);
  
+	if (unlikely(vq->broken))

+   return false;
+
virtio_mb(vq->weak_barriers);
return vq->packed_ring ? virtqueue_poll_packed(_vq, last_used_idx) :
 virtqueue_poll_split(_vq, last_used_idx);



Acked-by: Jason Wang 
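For context, the caller in drivers/net/virtio_net.c looks roughly like the sketch below
(details elided); a stale "true" from virtqueue_poll() on a broken vq is what kept
re-arming NAPI:

static void virtqueue_napi_complete(struct napi_struct *napi,
                                    struct virtqueue *vq, int processed)
{
        int opaque;

        opaque = virtqueue_enable_cb_prepare(vq);
        if (napi_complete_done(napi, processed)) {
                /* before this fix, a broken vq made this return true forever */
                if (unlikely(virtqueue_poll(vq, opaque)))
                        virtqueue_napi_schedule(napi, vq);
        } else {
                virtqueue_disable_cb(vq);
        }
}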



Re: [PATCH] virtio_pci_modern: Fix the comment of virtio_pci_find_capability()

2020-08-03 Thread Jason Wang


On 2020/8/3 下午7:52, Yi Wang wrote:

From: Liao Pingfang 

Fix the comment of virtio_pci_find_capability() by adding missing comment
for the last parameter: bars.

Fixes: 59a5b0f7bf74 ("virtio-pci: alloc only resources actually used.")
Signed-off-by: Liao Pingfang 
Signed-off-by: Yi Wang 



Acked-by: Jason Wang 



---
  drivers/virtio/virtio_pci_modern.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/virtio/virtio_pci_modern.c 
b/drivers/virtio/virtio_pci_modern.c
index db93cedd262f..9bdc6f68221f 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -481,6 +481,7 @@ static const struct virtio_config_ops virtio_pci_config_ops 
= {
   * @dev: the pci device
   * @cfg_type: the VIRTIO_PCI_CAP_* value we seek
   * @ioresource_types: IORESOURCE_MEM and/or IORESOURCE_IO.
+ * @bars: the bitmask of BARs
   *
   * Returns offset of the capability, or 0.
   */



[PATCH] vdpasim: protect concurrent access to iommu iotlb

2020-07-31 Thread Jason Wang
From: Max Gurtovoy 

The IOMMU IOTLB can be accessed by different cores performing IO through
multiple virtqueues. Add a spinlock to synchronize IOTLB accesses.

The race could be easily reproduced when using more than one pktgen thread
to inject traffic into the vdpa simulator.

Fixes: 2c53d0f64c06f("vdpasim: vDPA device simulator")
Cc: sta...@vger.kernel.org
Signed-off-by: Max Gurtovoy 
Signed-off-by: Jason Wang 
---
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index a9bc5e0fb353..5b5725d951ce 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -70,6 +70,8 @@ struct vdpasim {
u32 status;
u32 generation;
u64 features;
+   /* spinlock to synchronize iommu table */
+   spinlock_t iommu_lock;
 };
 
 static struct vdpasim *vdpasim_dev;
@@ -118,7 +120,9 @@ static void vdpasim_reset(struct vdpasim *vdpasim)
for (i = 0; i < VDPASIM_VQ_NUM; i++)
vdpasim_vq_reset(>vqs[i]);
 
+   spin_lock(&vdpasim->iommu_lock);
	vhost_iotlb_reset(vdpasim->iommu);
+   spin_unlock(&vdpasim->iommu_lock);
 
vdpasim->features = 0;
vdpasim->status = 0;
@@ -236,8 +240,10 @@ static dma_addr_t vdpasim_map_page(struct device *dev, 
struct page *page,
/* For simplicity, use identical mapping to avoid e.g iova
 * allocator.
 */
+   spin_lock(&vdpasim->iommu_lock);
	ret = vhost_iotlb_add_range(iommu, pa, pa + size - 1,
			pa, dir_to_perm(dir));
+   spin_unlock(&vdpasim->iommu_lock);
if (ret)
return DMA_MAPPING_ERROR;
 
@@ -251,8 +257,10 @@ static void vdpasim_unmap_page(struct device *dev, 
dma_addr_t dma_addr,
struct vdpasim *vdpasim = dev_to_sim(dev);
struct vhost_iotlb *iommu = vdpasim->iommu;
 
+   spin_lock(&vdpasim->iommu_lock);
	vhost_iotlb_del_range(iommu, (u64)dma_addr,
		  (u64)dma_addr + size - 1);
+   spin_unlock(&vdpasim->iommu_lock);
 }
 
 static void *vdpasim_alloc_coherent(struct device *dev, size_t size,
@@ -264,9 +272,10 @@ static void *vdpasim_alloc_coherent(struct device *dev, 
size_t size,
void *addr = kmalloc(size, flag);
int ret;
 
-   if (!addr)
+   spin_lock(&vdpasim->iommu_lock);
+   if (!addr) {
*dma_addr = DMA_MAPPING_ERROR;
-   else {
+   } else {
u64 pa = virt_to_phys(addr);
 
ret = vhost_iotlb_add_range(iommu, (u64)pa,
@@ -279,6 +288,7 @@ static void *vdpasim_alloc_coherent(struct device *dev, 
size_t size,
} else
*dma_addr = (dma_addr_t)pa;
}
+   spin_unlock(&vdpasim->iommu_lock);
 
return addr;
 }
@@ -290,8 +300,11 @@ static void vdpasim_free_coherent(struct device *dev, 
size_t size,
struct vdpasim *vdpasim = dev_to_sim(dev);
struct vhost_iotlb *iommu = vdpasim->iommu;
 
+   spin_lock(&vdpasim->iommu_lock);
	vhost_iotlb_del_range(iommu, (u64)dma_addr,
			      (u64)dma_addr + size - 1);
+   spin_unlock(&vdpasim->iommu_lock);
+
kfree(phys_to_virt((uintptr_t)dma_addr));
 }
 
@@ -532,6 +545,7 @@ static int vdpasim_set_map(struct vdpa_device *vdpa,
u64 start = 0ULL, last = 0ULL - 1;
int ret;
 
+   spin_lock(&vdpasim->iommu_lock);
vhost_iotlb_reset(vdpasim->iommu);
 
for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
@@ -541,10 +555,12 @@ static int vdpasim_set_map(struct vdpa_device *vdpa,
if (ret)
goto err;
}
+   spin_unlock(&vdpasim->iommu_lock);
return 0;
 
 err:
vhost_iotlb_reset(vdpasim->iommu);
+   spin_unlock(&vdpasim->iommu_lock);
return ret;
 }
 
@@ -552,16 +568,23 @@ static int vdpasim_dma_map(struct vdpa_device *vdpa, u64 
iova, u64 size,
   u64 pa, u32 perm)
 {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+   int ret;
 
-   return vhost_iotlb_add_range(vdpasim->iommu, iova,
-iova + size - 1, pa, perm);
+   spin_lock(&vdpasim->iommu_lock);
+   ret = vhost_iotlb_add_range(vdpasim->iommu, iova, iova + size - 1, pa,
+				perm);
+   spin_unlock(&vdpasim->iommu_lock);
+
+   return ret;
 }
 
 static int vdpasim_dma_unmap(struct vdpa_device *vdpa, u64 iova, u64 size)
 {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 
+   spin_lock(&vdpasim->iommu_lock);
	vhost_iotlb_del_range(vdpasim->iommu, iova, iova + size - 1);
+   spin_unlock(&vdpasim->iommu_lock);
 
return 0;
 }
-- 
2.20.1



Re: [PATCH V4 0/6] IRQ offloading for vDPA

2020-07-30 Thread Jason Wang


On 2020/7/28 下午12:23, Zhu Lingshan wrote:

This series intends to implement IRQ offloading for
vhost_vdpa.

With the help of irq forwarding facilities such as posted
interrupts on x86, irq bypass can deliver
interrupts to the vCPU directly.

vDPA devices have dedicated hardware backends, like VFIO
passed-through devices. So it is possible to set up
irq offloading (irq bypass) for vDPA devices and gain
performance improvements.

In my testing, with this feature, we can save 0.1ms
in a ping between two VFs on average.
Changes from V3:
(1) removed the vDPA irq allocate/free helpers in the vDPA core.
(2) added a new function get_vq_irq() in struct vdpa_config_ops;
an upper layer driver can use this function to: A. query the
irq number of a vq, B. detect whether a vq is enabled.
(3) implemented get_vq_irq() in the ifcvf driver.
(4) in vhost_vdpa, set_status() sets up irq offloading when
setting DRIVER_OK, and tears it down on !DRIVER_OK.
(5) minor improvements.



Ok, I think you can go ahead to post a V5. It's not bad to start with 
get_vq_irq() and we can do any changes afterwards if it can work for 
some cases.


Thanks


Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-07-30 Thread Jason Wang


On 2020/7/29 下午10:15, Eli Cohen wrote:

OK, we have a mode of operation that does not require driver
intervention to manipulate the event queues so I think we're ok with
this design.



Good to know this.

Thanks



Re: [PATCH] virtio-blk: fix discard buffer overrun

2020-07-30 Thread Jason Wang


On 2020/7/30 下午4:30, Jeffle Xu wrote:

Before commit eded341c085b ("block: don't decrement nr_phys_segments for
physically contigous segments") was applied, the generic block layer did not
guarantee that @req->nr_phys_segments equals the number of bios in the
request. When limits.@max_discard_segments == 1 and the IO scheduler is
anything other than "none" (e.g. mq-deadline, kyber or bfq),
the request merge routine may be called when enqueuing a DISCARD bio.
When merging two requests, @req->nr_phys_segments is decremented by one if the
two adjacent bios of the requests can be merged into one physical segment,
leaving @req->nr_phys_segments one less than the number of bios in the
DISCARD request.

In this case, we risk overrunning the virtio_blk_discard_write_zeroes
buffer. Though this issue has been fixed by commit eded341c085b
("block: don't decrement nr_phys_segments for physically contigous segments"),
it can recur if future refactoring of the generic block layer breaks the
guarantee again.

Commit 8cb6af7b3a6d ("nvme: Fix discard buffer overrun") fixed the similar
issue in the nvme driver; this patch is inspired by that commit.

Signed-off-by: Jeffle Xu 
Reviewed-by: Joseph Qi 
---
  drivers/block/virtio_blk.c | 23 ++-
  1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 980df853ee49..248c8f46b51c 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -130,12 +130,19 @@ static int virtblk_setup_discard_write_zeroes(struct 
request *req, bool unmap)
u64 sector = bio->bi_iter.bi_sector;
u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT;
  
-		range[n].flags = cpu_to_le32(flags);

-   range[n].num_sectors = cpu_to_le32(num_sectors);
-   range[n].sector = cpu_to_le64(sector);
+   if (n < segments) {
+   range[n].flags = cpu_to_le32(flags);
+   range[n].num_sectors = cpu_to_le32(num_sectors);
+   range[n].sector = cpu_to_le64(sector);
+   }



Not familiar with block but if we start to duplicate checks like this, 
it's a hint to move it in the core.




n++;
}
  
+	if (WARN_ON_ONCE(n != segments)) {

+   kfree(range);
+   return -EIO;
+   }
+
req->special_vec.bv_page = virt_to_page(range);
req->special_vec.bv_offset = offset_in_page(range);
req->special_vec.bv_len = sizeof(*range) * segments;
@@ -246,8 +253,14 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx 
*hctx,
  
  	if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES) {

err = virtblk_setup_discard_write_zeroes(req, unmap);
-   if (err)
-   return BLK_STS_RESOURCE;
+   if (err) {
+   switch (err) {
+   case -ENOMEM:
+   return BLK_STS_RESOURCE;
+   default:
+   return BLK_STS_IOERR;
+   }
+   }



This doesn't look elegant; why not a simple if (err == -ENOMEM) ... else if 
(err) ...


Thanks



}
  
  	num = blk_rq_map_sg(hctx->queue, req, vbr->sg);



Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-07-29 Thread Jason Wang


On 2020/7/29 下午5:55, Eli Cohen wrote:

On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote:

On 2020/7/28 下午5:04, Eli Cohen wrote:

On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote:

+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
+{
+   struct vhost_virtqueue *vq = &v->vqs[qid];
+   const struct vdpa_config_ops *ops = v->vdpa->config;
+   struct vdpa_device *vdpa = v->vdpa;
+   int ret, irq;
+
+   spin_lock(&vq->call_ctx.ctx_lock);
+   irq = ops->get_vq_irq(vdpa, qid);
+   if (!vq->call_ctx.ctx || irq == -EINVAL) {
+   spin_unlock(&vq->call_ctx.ctx_lock);
+   return;
+   }
+

If I understand correctly, this will cause these IRQs to be forwarded
directly to the VCPU, e.g. will be handled by the guest/qemu.


Yes, if it can be bypassed, the interrupt will be delivered to the vCPU directly.


So, usually the network driver knows how to handle interrups for its
devices. I assume the virtio_net driver at the guest has some default
processing but what if the underlying hardware device (such as the case
of vdpa) needs to take some actions?



Virtio splits the bus (transport) operations out of the device operations. So 
does the driver.


The virtio-net driver depends on a transport driver to talk to the real 
device. Usually PCI is used as the transport for the device. In this 
case the virtio-pci driver is in charge of irq 
allocation/free/configuration, and it needs to cooperate with the platform 
specific irqchip (virtualized by KVM) to finish work like irq 
acknowledgement.  E.g. on x86, irq offloading can only work when there is 
hardware support for a virtual irqchip (APICv); then everything can be 
done without vmexits.


So there is no vendor-specific part, since the device and the transport are both standard.
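
To make the split concrete, it looks roughly like this (a deliberately
simplified sketch with made-up names, not the in-tree virtio code):

	/* Illustrative only: the transport (PCI/MSI-X, MMIO, vDPA, ...)
	 * owns the vector and irq plumbing ...
	 */
	struct my_transport_ops {
		int (*request_vq_irq)(struct my_device *mdev, u16 idx,
				      irq_handler_t handler, void *data);
		void (*free_vq_irq)(struct my_device *mdev, u16 idx);
	};

	/* ... while the device driver (e.g. virtio-net) only supplies the
	 * callback and never touches MSI-X tables or the irqchip itself.
	 */
	static irqreturn_t my_vq_interrupt(int irq, void *data)
	{
		return my_process_vq(data) ? IRQ_HANDLED : IRQ_NONE;
	}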



  Is there an option to bounce the
interrupt back to the vendor specific driver in the host so it can take
these actions?



Currently not, but even if we can do this, I'm afraid we will lose the 
performance advantage of irq bypassing.






Does this mean that the host will not handle this interrupt? How does it
work in case on level triggered interrupts?


There's no guarantee that the KVM arch code can make the irq
bypass work for any type of irq. So in such cases the irq will still need
to be handled by the host first. This means we should keep the host
interrupt handler as a slowpath (fallback).


In the case of ConnectX, I need to execute some code to acknowledge the
interrupt.


This makes it hard for irq bypassing to work. Is it because
the irq is shared, or what kind of ack do you need to do?

I have an EQ which is a queue for events coming from the hardware. This
EQ can be created so it reports only completion events, but I still need to
execute code that roughly tells the device that I saw these event
records and then arms it again so it can report more interrupts (e.g. if
more packets are received or sent). This is device specific code.
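
Schematically, the handler ends up looking something like this (a rough,
vendor-neutral sketch, not the actual mlx5 code):

	static irqreturn_t my_eq_interrupt(int irq, void *data)
	{
		struct my_eq *eq = data;
		struct my_eqe *eqe;

		/* consume the event records the device wrote into the EQ */
		while ((eqe = my_eq_get_next_eqe(eq)))
			my_handle_completion_event(eq, eqe);

		/* device specific part: publish the new consumer index and
		 * re-arm the EQ so the device may raise further interrupts
		 */
		my_eq_update_ci_and_arm(eq);

		return IRQ_HANDLED;
	}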



Any chance that the hardware can use MSI (which is not the case here)?

Thanks



Thanks



Can you explain how this should be done?




Re: [PATCH] virtio_balloon: fix up endian-ness for free cmd id

2020-07-29 Thread Jason Wang


On 2020/7/28 上午12:03, Michael S. Tsirkin wrote:

free cmd id is read using virtio endian, spec says all fields
in balloon are LE. Fix it up.

Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin 
---
  drivers/virtio/virtio_balloon.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 774deb65a9bb..798ec304fe3e 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -578,10 +578,14 @@ static int init_vqs(struct virtio_balloon *vb)
  static u32 virtio_balloon_cmd_id_received(struct virtio_balloon *vb)
  {
	if (test_and_clear_bit(VIRTIO_BALLOON_CONFIG_READ_CMD_ID,
-			       &vb->config_read_bitmap))
+			       &vb->config_read_bitmap)) {
virtio_cread(vb->vdev, struct virtio_balloon_config,
 free_page_hint_cmd_id,
			     &vb->cmd_id_received_cache);
+		/* Legacy balloon config space is LE, unlike all other devices. */
+		if (!virtio_has_feature(vb->vdev, VIRTIO_F_VERSION_1))
+			vb->cmd_id_received_cache = le32_to_cpu((__force __le32)vb->cmd_id_received_cache);
+   }
  
  	return vb->cmd_id_received_cache;

  }



Acked-by: Jason Wang 



Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-07-29 Thread Jason Wang


On 2020/7/28 下午5:04, Eli Cohen wrote:

On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote:
  
+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)

+{
+   struct vhost_virtqueue *vq = &v->vqs[qid];
+   const struct vdpa_config_ops *ops = v->vdpa->config;
+   struct vdpa_device *vdpa = v->vdpa;
+   int ret, irq;
+
+   spin_lock(&vq->call_ctx.ctx_lock);
+   irq = ops->get_vq_irq(vdpa, qid);
+   if (!vq->call_ctx.ctx || irq == -EINVAL) {
+   spin_unlock(&vq->call_ctx.ctx_lock);
+   return;
+   }
+

If I understand correctly, this will cause these IRQs to be forwarded
directly to the VCPU, e.g. will be handled by the guest/qemu.



Yes, if it can be bypassed, the interrupt will be delivered to the vCPU directly.



Does this mean that the host will not handle this interrupt? How does it
work in case on level triggered interrupts?



There's no guarantee that the KVM arch code can make the irq bypass 
work for any type of irq. So in such cases the irq will still need to be 
handled by the host first. This means we should keep the host interrupt 
handler as a slowpath (fallback).





In the case of ConnectX, I need to execute some code to acknowledge the
interrupt.



This makes it hard for irq bypassing to work. Is it because the 
irq is shared, or what kind of ack do you need to do?


Thanks




Can you explain how this should be done?




Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-07-28 Thread Jason Wang


On 2020/7/28 下午5:18, Zhu, Lingshan wrote:


   * status to 0.
@@ -167,6 +220,15 @@ static long vhost_vdpa_set_status(struct 
vhost_vdpa *v, u8 __user *statusp)

  if (status != 0 && (ops->get_status(vdpa) & ~status) != 0)
  return -EINVAL;
  +    /* vq irq is not expected to be changed once DRIVER_OK is set */



So this basically limits the usage of get_vq_irq() to the context of 
vhost_vdpa_set_status() and other vDPA bus drivers' set_status(). If 
this is true, there's even no need to introduce any new config ops; 
we could just let set_status() return the irqs used for the device. Or 
if we want this to be more generic, we need vDPA's own irq manager 
(which should be similar to the irq bypass manager). That is:

I think there is no need for a driver to free / re-request its irqs after 
DRIVER_OK, though it can do so. If a driver changes the irq of a vq after 
DRIVER_OK, the vq is still operational but will lose irq offloading, which is 
reasonable.
If we want set_status() to return irqs, we need to record the irqs somewhere in 
vdpa_device,



Why, we can simply pass an array to the driver I think?

void (*set_status)(struct vdpa_device *vdev, u8 status, int *irqs);
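
Then vhost-vdpa could consume it along these lines (just a sketch; it
assumes a negative entry means "no irq for this vq"):

	int irqs[VHOST_VDPA_VQ_MAX];
	int i;

	ops->set_status(vdpa, status, irqs);

	if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
		for (i = 0; i < nvqs; i++) {
			struct vhost_virtqueue *vq = &v->vqs[i];

			if (irqs[i] < 0 || !vq->call_ctx.ctx)
				continue;
			vq->call_ctx.producer.token = vq->call_ctx.ctx;
			vq->call_ctx.producer.irq = irqs[i];
			irq_bypass_register_producer(&vq->call_ctx.producer);
		}
	}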



as we discussed in a previous thread, this may need initialization and cleanup 
work, so a new op
with proper comments (don't say the irq cannot change, but highlight that if the irq 
changes, irq offloading will not work until the next DRIVER_OK) could be more 
elegant.
However, if we really need to change the irq after DRIVER_OK, I think maybe we still 
need the vDPA vq irq allocate / free helpers; then the helpers cannot be used in 
probe() as we discussed before, which is a step back to the V3 series.



Still, it's not about whether the driver may change the irq after DRIVER_OK but 
the implication of that assumption. If one bus op must be called inside another 
op, it's better to just implement them as one op.





- bus driver can register itself as consumer
- vDPA device driver can register itself as producer
- matching via queue index

IMHO, isn't it too heavy for this feature,



Do you mean LOCs? We can:

1) refactor irq bypass manager
2) invent it our own (a much simplified version compared to bypass manager)
3) enforcing them via vDPA bus

Each of the above should not be a lot of coding. I think method 3 is 
partially done in your previous series, but in an implicit manner:


- a bus driver that has alloc_irq/free_irq implemented could be implicitly 
treated as a consumer registration

- every vDPA device driver could be treated as a producer
- vdpa_devm_alloc_irq() could be treated as a producer registration
- alloc_irq/free_irq is the add_producer/del_producer

We probably just lack some synchronization with driver probe/remove.
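
For 2), the "much simplified version" would not need a lot; something along
these lines could work (purely a sketch, all names are invented here):

	/* Per-vdpa_device registry keyed by virtqueue index: the device
	 * driver publishes the irq it allocated (producer side), the bus
	 * driver publishes the token/eventfd it wants wired up (consumer
	 * side), and whichever side registers last performs the match.
	 */
	struct vdpa_vq_irq_entry {
		int irq;	/* -1 until the producer registers */
		void *token;	/* NULL until the consumer registers */
	};

	struct vdpa_irq_registry {
		struct mutex lock;
		struct vdpa_vq_irq_entry *vqs;	/* nvqs entries */
	};

	static void vdpa_irq_try_match(struct vdpa_irq_registry *r, u16 idx)
	{
		struct vdpa_vq_irq_entry *e = &r->vqs[idx];

		if (e->irq >= 0 && e->token)
			vdpa_irq_bind(r, idx);	/* e.g. register a bypass producer */
	}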



and how can they match if two individual adapters both have vq idx = 1.



The matching is per vDPA device.

Thanks



Thanks!

- deal with registering/unregistering of consumer/producer

So there's no need to care when or where the vDPA device driver 
allocates the irq, and we don't need to care in which context the vDPA 
bus driver can use the irq. Either side may get notified when the 
other side is gone (though we probably only care about the producer 
going away).


Any thought on this?

Thanks





Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-07-28 Thread Jason Wang


On 2020/7/28 下午12:24, Zhu Lingshan wrote:

This patch introduces a set of functions to set up/tear down
and update irq offloading, respectively by registering/unregistering
and re-registering the irq_bypass_producer.

With these functions, this commit can set up/tear down
irq offloading through setting DRIVER_OK/!DRIVER_OK, and
update irq offloading through SET_VRING_CALL.

Signed-off-by: Zhu Lingshan 
Suggested-by: Jason Wang 
---
  drivers/vhost/Kconfig |  1 +
  drivers/vhost/vdpa.c  | 79 ++-
  2 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index d3688c6afb87..587fbae06182 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -65,6 +65,7 @@ config VHOST_VDPA
tristate "Vhost driver for vDPA-based backend"
depends on EVENTFD
select VHOST
+   select IRQ_BYPASS_MANAGER
depends on VDPA
help
  This kernel module can be loaded in host kernel to accelerate
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index df3cf386b0cd..1dccced321f8 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -115,6 +115,55 @@ static irqreturn_t vhost_vdpa_config_cb(void *private)
return IRQ_HANDLED;
  }
  
+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)

+{
+   struct vhost_virtqueue *vq = &v->vqs[qid];
+   const struct vdpa_config_ops *ops = v->vdpa->config;
+   struct vdpa_device *vdpa = v->vdpa;
+   int ret, irq;
+
+   spin_lock(&vq->call_ctx.ctx_lock);
+   irq = ops->get_vq_irq(vdpa, qid);
+   if (!vq->call_ctx.ctx || irq == -EINVAL) {



It's better to check returned irq as "irq < 0" to be more robust. 
Forcing a specific errno value is not good.




+   spin_unlock(&vq->call_ctx.ctx_lock);
+   return;
+   }
+
+   vq->call_ctx.producer.token = vq->call_ctx.ctx;
+   vq->call_ctx.producer.irq = irq;
+   ret = irq_bypass_register_producer(&vq->call_ctx.producer);
+   spin_unlock(&vq->call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, int qid)
+{
+   struct vhost_virtqueue *vq = &v->vqs[qid];
+
+   spin_lock(&vq->call_ctx.ctx_lock);
+   irq_bypass_unregister_producer(&vq->call_ctx.producer);
+   spin_unlock(&vq->call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_update_vq_irq(struct vhost_virtqueue *vq)
+{
+   spin_lock(&vq->call_ctx.ctx_lock);
+   /*
+    * If it has a non-zero irq, it means there is a
+    * previously registered irq_bypass_producer;
+    * we should update it when ctx (its token)
+    * changes.
+    */
+   if (!vq->call_ctx.producer.irq) {
+   spin_unlock(&vq->call_ctx.ctx_lock);
+   return;
+   }
+
+   irq_bypass_unregister_producer(&vq->call_ctx.producer);
+   vq->call_ctx.producer.token = vq->call_ctx.ctx;
+   irq_bypass_register_producer(&vq->call_ctx.producer);
+   spin_unlock(&vq->call_ctx.ctx_lock);
+}
+
  static void vhost_vdpa_reset(struct vhost_vdpa *v)
  {
struct vdpa_device *vdpa = v->vdpa;
@@ -155,11 +204,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, 
u8 __user *statusp)
  {
struct vdpa_device *vdpa = v->vdpa;
const struct vdpa_config_ops *ops = vdpa->config;
-   u8 status;
+   u8 status, status_old;
+   int i, nvqs;
  
	if (copy_from_user(&status, statusp, sizeof(status)))

return -EFAULT;
  
+	status_old = ops->get_status(vdpa);

+   nvqs = v->nvqs;
+
/*
 * Userspace shouldn't remove status bits unless reset the
 * status to 0.
@@ -167,6 +220,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 
__user *statusp)
if (status != 0 && (ops->get_status(vdpa) & ~status) != 0)
return -EINVAL;
  
+	/* vq irq is not expected to be changed once DRIVER_OK is set */



So this basically limits the usage of get_vq_irq() to the context of 
vhost_vdpa_set_status() and other vDPA bus drivers' set_status(). If 
this is true, there's even no need to introduce any new config ops; we 
could just let set_status() return the irqs used for the device. Or if we 
want this to be more generic, we need vDPA's own irq manager (which 
should be similar to the irq bypass manager). That is:


- bus driver can register itself as consumer
- vDPA device driver can register itself as producer
- matching via queue index
- deal with registering/unregistering of consumer/producer

So there's no need to care when or where the vDPA device driver 
allocates the irq, and we don't need to care in which context the vDPA 
bus driver can use the irq. Either side may get notified when the other 
side is gone (though we probably only care about the producer going away).


Any thought on this?

Thanks



+   if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&

Re: [PATCH V4 3/6] vDPA: add get_vq_irq() in vdpa_config_ops

2020-07-28 Thread Jason Wang


On 2020/7/28 下午12:24, Zhu Lingshan wrote:

This commit adds a new function get_vq_irq() in struct
vdpa_config_ops, which will return the irq number of a
virtqueue.

Signed-off-by: Zhu Lingshan 
Suggested-by: Jason Wang 
---
  include/linux/vdpa.h | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..cebc79173aaa 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -87,6 +87,11 @@ struct vdpa_device {
   *@vdev: vdpa device
   *@idx: virtqueue index
   *Returns the notification area
+ * @get_vq_irq:Get the irq number of a virtqueue
+ * @vdev: vdpa device
+ * @idx: virtqueue index
+ * Returns u32: irq number of a virtqueue,
+ * -EINVAL if no irq assigned.



I think we can not get -EINVAL since the function will return a u32.
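
E.g. the natural thing for a caller to write would be (sketch only):

	int irq = ops->get_vq_irq(vdev, idx);

	if (irq < 0)	/* no irq assigned */
		return;

so it is probably cleaner to declare the op as returning int and document
"a negative value if no irq is assigned" rather than -EINVAL specifically.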

Thanks



   * @get_vq_align: Get the virtqueue align requirement
   *for the device
   *@vdev: vdpa device
@@ -178,6 +183,7 @@ struct vdpa_config_ops {
u64 (*get_vq_state)(struct vdpa_device *vdev, u16 idx);
struct vdpa_notification_area
(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
+   u32 (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
  
  	/* Device ops */

u32 (*get_vq_align)(struct vdpa_device *vdev);



Re: [PATCH V3 vhost next 00/10] VDPA support for Mellanox ConnectX devices

2020-07-28 Thread Jason Wang


On 2020/7/28 下午2:40, Jason Wang wrote:


On 2020/7/28 下午2:32, Eli Cohen wrote:

On Tue, Jul 28, 2020 at 02:18:16PM +0800, Jason Wang wrote:

On 2020/7/28 下午2:05, Eli Cohen wrote:

Hi Michael,
please note that this series depends on mlx5 core device driver 
patches

in mlx5-next branch in
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git.

git pull 
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
mlx5-next


They also depend on Jason Wang's patches submitted a couple of weeks ago.

vdpa_sim: use the batching API
vhost-vdpa: support batch updating


Just noticed that a new version was posted [1] (you were cced). Talked
with Michael, and it's better for you to merge the new version in
this series.


OK, will send again. Just to make sure, I should put your series and my
series on Michal's vhost branch, the same tree I have been using so far?



Yes. I think so.

Thanks



Just noticed that Michael's vhost branch cannot compile due to this commit:

commit fee8fe6bd8ccacd27e963b71b4f943be3721779e
Author: Michael S. Tsirkin 
Date:   Mon Jul 27 10:51:55 2020 -0400

    vdpa: make sure set_features in invoked for legacy

Let's wait for Michael to clarify the correct branch to use then.

Thanks


Re: [PATCH V3 vhost next 00/10] VDPA support for Mellanox ConnectX devices

2020-07-28 Thread Jason Wang


On 2020/7/28 下午2:32, Eli Cohen wrote:

On Tue, Jul 28, 2020 at 02:18:16PM +0800, Jason Wang wrote:

On 2020/7/28 下午2:05, Eli Cohen wrote:

Hi Michael,
please note that this series depends on mlx5 core device driver patches
in mlx5-next branch in
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git.

git pull git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
mlx5-next

They also depend on Jason Wang's patches submitted a couple of weeks ago.

vdpa_sim: use the batching API
vhost-vdpa: support batch updating


Just noticed that a new version was posted [1] (you were cced). Talked
with Michael, and it's better for you to merge the new version in
this series.


OK, will send again. Just to make sure, I should put your series and my
series on Michal's vhost branch, the same tree I have been using so far?



Yes. I think so.

Thanks





Sorry for not spotting this before.

[1] https://lkml.org/lkml/2020/7/1/301

Thanks




The following series of patches provide VDPA support for Mellanox
devices. The supported devices are ConnectX6 DX and newer.

Currently, only a network driver is implemented; future patches will
introduce a block device driver. iperf performance on a single queue is
around 12 Gbps.  Future patches will introduce multi queue support.

The files are organized in such a way that code that can be used by
different VDPA implementations is placed in a common area residing in
drivers/vdpa/mlx5/core.

Only virtual functions are currently supported. Also, certain firmware
capabilities must be set to enable the driver. Physical functions (PFs)
are skipped by the driver.

To make use of the VDPA net driver, one must load mlx5_vdpa. In such
case, VFs will be operated by the VDPA driver. Although one can see a
regular instance of a network driver on the VF, the VDPA driver takes
precedence over the NIC driver, steering-wise.

Currently, the device/interface infrastructure in mlx5_core is used to
probe drivers. Future patches will introduce virtbus as a means to
register devices and drivers and VDPA will be adapted to it.

The mlx5 mode of operation required to support VDPA is switchdev mode.
One can use a Linux or OVS bridge to take care of layer 2 switching.

In order to provide virtio networking to a guest, an updated version of
qemu is required. This version has been tested with the following qemu
version:

url: https://github.com/jasowang/qemu.git
branch: vdpa
Commit ID: 6f4e59b807db


V2->V3
Fix makefile to use include path relative to the root of the kernel

Eli Cohen (7):
   net/vdpa: Use struct for set/get vq state
   vhost: Fix documentation
   vdpa: Modify get_vq_state() to return error code
   vdpa/mlx5: Add hardware descriptive header file
   vdpa/mlx5: Add support library for mlx5 VDPA implementation
   vdpa/mlx5: Add shared memory registration code
   vdpa/mlx5: Add VDPA driver for supported mlx5 devices

Jason Wang (2):
   vhost-vdpa: support batch updating
   vdpa_sim: use the batching API

Max Gurtovoy (1):
   vdpa: remove hard coded virtq num

  drivers/vdpa/Kconfig   |   18 +
  drivers/vdpa/Makefile  |1 +
  drivers/vdpa/ifcvf/ifcvf_base.c|4 +-
  drivers/vdpa/ifcvf/ifcvf_base.h|4 +-
  drivers/vdpa/ifcvf/ifcvf_main.c|   13 +-
  drivers/vdpa/mlx5/Makefile |4 +
  drivers/vdpa/mlx5/core/mlx5_vdpa.h |   91 ++
  drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h |  168 ++
  drivers/vdpa/mlx5/core/mr.c|  473 ++
  drivers/vdpa/mlx5/core/resources.c |  284 
  drivers/vdpa/mlx5/net/main.c   |   76 +
  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 1950 
  drivers/vdpa/mlx5/net/mlx5_vnet.h  |   24 +
  drivers/vdpa/vdpa.c|3 +
  drivers/vdpa/vdpa_sim/vdpa_sim.c   |   35 +-
  drivers/vhost/iotlb.c  |4 +-
  drivers/vhost/vdpa.c   |   46 +-
  include/linux/vdpa.h   |   24 +-
  include/uapi/linux/vhost_types.h   |2 +
  19 files changed, 3165 insertions(+), 59 deletions(-)
  create mode 100644 drivers/vdpa/mlx5/Makefile
  create mode 100644 drivers/vdpa/mlx5/core/mlx5_vdpa.h
  create mode 100644 drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h
  create mode 100644 drivers/vdpa/mlx5/core/mr.c
  create mode 100644 drivers/vdpa/mlx5/core/resources.c
  create mode 100644 drivers/vdpa/mlx5/net/main.c
  create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.c
  create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.h




Re: [PATCH V3 vhost next 00/10] VDPA support for Mellanox ConnectX devices

2020-07-28 Thread Jason Wang


On 2020/7/28 下午2:05, Eli Cohen wrote:

Hi Michael,
please note that this series depends on mlx5 core device driver patches
in mlx5-next branch in
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git.

git pull git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
mlx5-next

They also depend on Jason Wang's patches submitted a couple of weeks ago.

vdpa_sim: use the batching API
vhost-vdpa: support batch updating



Just noticed that a new version was posted [1] (you were cced). Talked with 
Michael, and it's better for you to merge the new version in this series.


Sorry for not spotting this before.

[1] https://lkml.org/lkml/2020/7/1/301

Thanks





The following series of patches provide VDPA support for Mellanox
devices. The supported devices are ConnectX6 DX and newer.

Currently, only a network driver is implemented; future patches will
introduce a block device driver. iperf performance on a single queue is
around 12 Gbps.  Future patches will introduce multi queue support.

The files are organized in such a way that code that can be used by
different VDPA implementations is placed in a common area residing in
drivers/vdpa/mlx5/core.

Only virtual functions are currently supported. Also, certain firmware
capabilities must be set to enable the driver. Physical functions (PFs)
are skipped by the driver.

To make use of the VDPA net driver, one must load mlx5_vdpa. In such
case, VFs will be operated by the VDPA driver. Although one can see a
regular instance of a network driver on the VF, the VDPA driver takes
precedence over the NIC driver, steering-wise.

Currently, the device/interface infrastructure in mlx5_core is used to
probe drivers. Future patches will introduce virtbus as a means to
register devices and drivers and VDPA will be adapted to it.

The mlx5 mode of operation required to support VDPA is switchdev mode.
One can use a Linux or OVS bridge to take care of layer 2 switching.

In order to provide virtio networking to a guest, an updated version of
qemu is required. This version has been tested with the following qemu
version:

url: https://github.com/jasowang/qemu.git
branch: vdpa
Commit ID: 6f4e59b807db


V2->V3
Fix makefile to use include path relative to the root of the kernel

Eli Cohen (7):
   net/vdpa: Use struct for set/get vq state
   vhost: Fix documentation
   vdpa: Modify get_vq_state() to return error code
   vdpa/mlx5: Add hardware descriptive header file
   vdpa/mlx5: Add support library for mlx5 VDPA implementation
   vdpa/mlx5: Add shared memory registration code
   vdpa/mlx5: Add VDPA driver for supported mlx5 devices

Jason Wang (2):
   vhost-vdpa: support batch updating
   vdpa_sim: use the batching API

Max Gurtovoy (1):
   vdpa: remove hard coded virtq num

  drivers/vdpa/Kconfig   |   18 +
  drivers/vdpa/Makefile  |1 +
  drivers/vdpa/ifcvf/ifcvf_base.c|4 +-
  drivers/vdpa/ifcvf/ifcvf_base.h|4 +-
  drivers/vdpa/ifcvf/ifcvf_main.c|   13 +-
  drivers/vdpa/mlx5/Makefile |4 +
  drivers/vdpa/mlx5/core/mlx5_vdpa.h |   91 ++
  drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h |  168 ++
  drivers/vdpa/mlx5/core/mr.c|  473 ++
  drivers/vdpa/mlx5/core/resources.c |  284 
  drivers/vdpa/mlx5/net/main.c   |   76 +
  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 1950 
  drivers/vdpa/mlx5/net/mlx5_vnet.h  |   24 +
  drivers/vdpa/vdpa.c|3 +
  drivers/vdpa/vdpa_sim/vdpa_sim.c   |   35 +-
  drivers/vhost/iotlb.c  |4 +-
  drivers/vhost/vdpa.c   |   46 +-
  include/linux/vdpa.h   |   24 +-
  include/uapi/linux/vhost_types.h   |2 +
  19 files changed, 3165 insertions(+), 59 deletions(-)
  create mode 100644 drivers/vdpa/mlx5/Makefile
  create mode 100644 drivers/vdpa/mlx5/core/mlx5_vdpa.h
  create mode 100644 drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h
  create mode 100644 drivers/vdpa/mlx5/core/mr.c
  create mode 100644 drivers/vdpa/mlx5/core/resources.c
  create mode 100644 drivers/vdpa/mlx5/net/main.c
  create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.c
  create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.h




[PATCH 1/2] vdpa: ifcvf: return err when fail to request config irq

2020-07-23 Thread Jason Wang
We ignore the error returned when requesting the config interrupt; fix this.

Fixes: e7991f376a4d ("ifcvf: implement config interrupt in IFCVF")
Cc: Zhu Lingshan 
Signed-off-by: Jason Wang 
---
 drivers/vdpa/ifcvf/ifcvf_main.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index f5a60c14b979..ae7110955a44 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -76,6 +76,10 @@ static int ifcvf_request_irq(struct ifcvf_adapter *adapter)
	ret = devm_request_irq(&pdev->dev, irq,
   ifcvf_config_changed, 0,
   vf->config_msix_name, vf);
+   if (ret) {
+   IFCVF_ERR(pdev, "Failed to request config irq\n");
+   return ret;
+   }
 
for (i = 0; i < IFCVF_MAX_QUEUE_PAIRS * 2; i++) {
snprintf(vf->vring[i].msix_name, 256, "ifcvf[%s]-%d\n",
-- 
2.20.1



[PATCH 2/2] vdpa: ifcvf: free config irq in ifcvf_free_irq()

2020-07-23 Thread Jason Wang
We don't free the config irq in ifcvf_free_irq(), which will trigger a
BUG() in the PCI core since we try to free vectors that still have an
action. Fix this by recording the config irq in the ifcvf_hw structure
and freeing it in ifcvf_free_irq().

Fixes: e7991f376a4d ("ifcvf: implement config interrupt in IFCVF")
Cc: Zhu Lingshan 
Signed-off-by: Jason Wang 
---
 drivers/vdpa/ifcvf/ifcvf_base.h | 2 +-
 drivers/vdpa/ifcvf/ifcvf_main.c | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_base.h b/drivers/vdpa/ifcvf/ifcvf_base.h
index f4554412e607..29efa75cdfce 100644
--- a/drivers/vdpa/ifcvf/ifcvf_base.h
+++ b/drivers/vdpa/ifcvf/ifcvf_base.h
@@ -84,7 +84,7 @@ struct ifcvf_hw {
void __iomem * const *base;
char config_msix_name[256];
struct vdpa_callback config_cb;
-
+   unsigned int config_irq;
 };
 
 struct ifcvf_adapter {
diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index ae7110955a44..7a6d899e541d 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -53,6 +53,7 @@ static void ifcvf_free_irq(struct ifcvf_adapter *adapter, int 
queues)
for (i = 0; i < queues; i++)
		devm_free_irq(&pdev->dev, vf->vring[i].irq, &vf->vring[i]);
 
+   devm_free_irq(&pdev->dev, vf->config_irq, vf);
ifcvf_free_irq_vectors(pdev);
 }
 
@@ -72,8 +73,8 @@ static int ifcvf_request_irq(struct ifcvf_adapter *adapter)
snprintf(vf->config_msix_name, 256, "ifcvf[%s]-config\n",
 pci_name(pdev));
vector = 0;
-   irq = pci_irq_vector(pdev, vector);
-   ret = devm_request_irq(&pdev->dev, irq,
+   vf->config_irq = pci_irq_vector(pdev, vector);
+   ret = devm_request_irq(&pdev->dev, vf->config_irq,
   ifcvf_config_changed, 0,
   vf->config_msix_name, vf);
if (ret) {
-- 
2.20.1



Re: [PATCH V3 3/6] vDPA: implement vq IRQ allocate/free helpers in vDPA core

2020-07-22 Thread Jason Wang


On 2020/7/22 下午6:08, Zhu Lingshan wrote:

+/*
+ * Request irq for a vq, and set up irq offloading if it's a vhost_vdpa vq.
+ * This function should only be called through setting virtio DRIVER_OK.
+ * If you want to request an irq during probe, you should use raw APIs
+ * like request_irq() or devm_request_irq().



This makes the API less flexible. The reason is we store the irq in 
vhost-vdpa, not vDPA.


I wonder whether the following looks better:

1) store irq in vdpa device
2) register producer when DRIVER_OK and unregister producer when 
!DRIVER_OK in vhost-vDPA

3) deal with the synchronization with SET_VRING_CALL
4) document that irq is not expected to be changed during DRIVER_OK

This can make sure the API works during driver probe, and we don't need 
the setup_irq and unsetup_irq methods in vdpa_driver.
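
A minimal sketch of 1) and 2) could look like this (the field and helper
names here are only illustrative, not an actual API):

	/* 1) the vDPA device driver records the per-vq irq it allocated,
	 *    e.g. in an illustrative vdpa_device field:  int *vq_irq;
	 *    with -1 meaning "no irq assigned".
	 */

	/* 2) vhost-vdpa only consumes it around DRIVER_OK */
	static void sketch_setup_vq_irq(struct vhost_vdpa *v, u16 qid)
	{
		struct vhost_virtqueue *vq = &v->vqs[qid];
		int irq = v->vdpa->vq_irq ? v->vdpa->vq_irq[qid] : -1;

		if (!vq->call_ctx.ctx || irq < 0)
			return;

		vq->call_ctx.producer.token = vq->call_ctx.ctx;
		vq->call_ctx.producer.irq = irq;
		irq_bypass_register_producer(&vq->call_ctx.producer);
	}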


Thanks



+ */
+int vdpa_devm_request_irq(struct device *dev, struct vdpa_device *vdev,
+ unsigned int irq, irq_handler_t handler,
+ unsigned long irqflags, const char *devname, void 
*dev_id,
+ int qid)
+{
+   int ret;
+
+   ret = devm_request_irq(dev, irq, handler, irqflags, devname, dev_id);
+   if (ret)
+   dev_err(dev, "Failed to request irq for vq %d\n", qid);
+   else
+   vdpa_setup_irq(vdev, qid, irq);
+
+   return ret;
+
+}
+EXPORT_SYMBOL_GPL(vdpa_devm_request_irq);



Re: [PATCH V2 vhost next 10/10] vdpa/mlx5: Add VDPA driver for supported mlx5 devices

2020-07-20 Thread Jason Wang


On 2020/7/20 下午3:14, Eli Cohen wrote:

Add a front end VDPA driver that registers in the VDPA bus and provides
networking to a guest. The VDPA driver creates the necessary resources
on the VF it is driving such that data path will be offloaded.

Notifications are being communicated through the driver.

Currently, only VFs are supported. In subsequent patches we will have
devlink support to control which VF is used for VDPA and which function
is used for regular networking.

Reviewed-by: Parav Pandit
Signed-off-by: Eli Cohen
---
Changes from V0:
1. Fix include path usage
2. Fix use after free in qp_create()
3. Consistently use mvq->initialized to check if a vq was initialized.
4. Remove unused local variable.
5. Defer modifying vq to ready to driver ok
6. Suspend hardware vq in set_vq_ready(0)
7. Remove reservation for control VQ since multi queue is not supported in 
this version
8. Avoid calling put_device() since this is not a pci device driver.



Looks good to me.

Acked-by: Jason Wang 



Re: [PATCH V2 vhost next 06/10] vdpa: Modify get_vq_state() to return error code

2020-07-20 Thread Jason Wang


On 2020/7/20 下午3:14, Eli Cohen wrote:

Modify get_vq_state() so it returns an error code. In case of hardware
acceleration, the available index may be retrieved from the device, an
operation that can possibly fail.

Reviewed-by: Parav Pandit 
Signed-off-by: Eli Cohen 



Acked-by: Jason Wang 



---
  drivers/vdpa/ifcvf/ifcvf_main.c  | 5 +++--
  drivers/vdpa/vdpa_sim/vdpa_sim.c | 5 +++--
  drivers/vhost/vdpa.c | 5 -
  include/linux/vdpa.h | 4 ++--
  4 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index 69032ee97824..d9b5f465ac81 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -235,12 +235,13 @@ static u16 ifcvf_vdpa_get_vq_num_max(struct vdpa_device 
*vdpa_dev)
return IFCVF_QUEUE_MAX;
  }
  
-static void ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,

-   struct vdpa_vq_state *state)
+static int ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
+  struct vdpa_vq_state *state)
  {
struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
  
  	state->avail_index = ifcvf_get_vq_state(vf, qid);

+   return 0;
  }
  
  static int ifcvf_vdpa_set_vq_state(struct vdpa_device *vdpa_dev, u16 qid,

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 599519039f8d..ddf6086d43c2 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -427,14 +427,15 @@ static int vdpasim_set_vq_state(struct vdpa_device *vdpa, 
u16 idx,
return 0;
  }
  
-static void vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,

-struct vdpa_vq_state *state)
+static int vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,
+   struct vdpa_vq_state *state)
  {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
	struct vringh *vrh = &vq->vring;
  
  	state->avail_index = vrh->last_avail_idx;

+   return 0;
  }
  
  static u32 vdpasim_get_vq_align(struct vdpa_device *vdpa)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index af98c11c9d26..fadad74f882e 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -360,7 +360,10 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, 
unsigned int cmd,
}
  
  	if (cmd == VHOST_GET_VRING_BASE) {

-   ops->get_vq_state(v->vdpa, idx, &vq_state);
+   r = ops->get_vq_state(v->vdpa, idx, &vq_state);
+   if (r)
+   return r;
+
vq->last_avail_idx = vq_state.avail_index;
}
  
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h

index 7b088bebffe8..000d71a9f988 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -185,8 +185,8 @@ struct vdpa_config_ops {
bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
const struct vdpa_vq_state *state);
-   void (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
-struct vdpa_vq_state *state);
+   int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
+   struct vdpa_vq_state *state);
struct vdpa_notification_area
(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
  



Re: [PATCH V2 vhost next 05/10] vhost: Fix documentation

2020-07-20 Thread Jason Wang


On 2020/7/20 下午3:14, Eli Cohen wrote:

Fix documentation to match actual function prototypes.

Reviewed-by: Parav Pandit 
Signed-off-by: Eli Cohen 



Acked-by: Jason Wang 



---
  drivers/vhost/iotlb.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
index 1f0ca6e44410..0d4213a54a88 100644
--- a/drivers/vhost/iotlb.c
+++ b/drivers/vhost/iotlb.c
@@ -149,7 +149,7 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_free);
   * vhost_iotlb_itree_first - return the first overlapped range
   * @iotlb: the IOTLB
   * @start: start of IOVA range
- * @end: end of IOVA range
+ * @last: last byte in IOVA range
   */
  struct vhost_iotlb_map *
  vhost_iotlb_itree_first(struct vhost_iotlb *iotlb, u64 start, u64 last)
@@ -162,7 +162,7 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_itree_first);
   * vhost_iotlb_itree_first - return the next overlapped range
   * @iotlb: the IOTLB
   * @start: start of IOVA range
- * @end: end of IOVA range
+ * @last: last byte IOVA range
   */
  struct vhost_iotlb_map *
  vhost_iotlb_itree_next(struct vhost_iotlb_map *map, u64 start, u64 last)



Re: [PATCH RFC v8 02/11] vhost: use batched get_vq_desc version

2020-07-20 Thread Jason Wang


On 2020/7/20 下午7:16, Eugenio Pérez wrote:

On Mon, Jul 20, 2020 at 11:27 AM Michael S. Tsirkin  wrote:

On Thu, Jul 16, 2020 at 07:16:27PM +0200, Eugenio Perez Martin wrote:

On Fri, Jul 10, 2020 at 7:58 AM Michael S. Tsirkin  wrote:

On Fri, Jul 10, 2020 at 07:39:26AM +0200, Eugenio Perez Martin wrote:

How about playing with the batch size? Make it a mod parameter instead
of the hard coded 64, and measure for all values 1 to 64 ...

Right, according to the test result, 64 seems to be too aggressive in
the case of TX.


Got it, thanks both!

In particular I wonder whether with batch size 1
we get same performance as without batching
(would indicate 64 is too aggressive)
or not (would indicate one of the code changes
affects performance in an unexpected way).

--
MST


Hi!

Varying batch_size as drivers/vhost/net.c:VHOST_NET_BATCH,

sorry this is not what I meant.

I mean something like this:


diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 0b509be8d7b1..b94680e5721d 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1279,6 +1279,10 @@ static void handle_rx_net(struct vhost_work *work)
 handle_rx(net);
  }

+static int batch_num = 0;
+module_param(batch_num, int, 0644);
+MODULE_PARM_DESC(batch_num, "Number of batched descriptors (offset from 64)");
+
  static int vhost_net_open(struct inode *inode, struct file *f)
  {
 struct vhost_net *n;
@@ -1333,7 +1337,7 @@ static int vhost_net_open(struct inode *inode, struct 
file *f)
		vhost_net_buf_init(&n->vqs[i].rxq);
 }
 vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
-  UIO_MAXIOV + VHOST_NET_BATCH,
+  UIO_MAXIOV + VHOST_NET_BATCH + batch_num,
VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, true,
NULL);


then you can try tweaking batching and playing with mod parameter without
recompiling.


VHOST_NET_BATCH affects lots of other things.


Ok, got it. Since they were aligned from the start, I thought it was a good 
idea to maintain them in-sync.


and testing
the pps as previous mail says. This means that we have either only
vhost_net batching (in base testing, like previously to apply this
patch) or both batching sizes the same.

I've checked that vhost process (and pktgen) goes 100% cpu also.

For tx: Batching decrements always the performance, in all cases. Not
sure why bufapi made things better the last time.

Batching makes improvements until 64 bufs, I see increments of pps but like 1%.

For rx: Batching always improves performance. It seems that if we
batch little, bufapi decreases performance, but beyond 64, bufapi is
much better. The bufapi version keeps improving until I set a batching
of 1024. So I guess it is super good to have a bunch of buffers to
receive.

Since with this test I cannot disable event_idx or things like that,
what would be the next step for testing?

Thanks!

--
Results:
# Buf size: 1,16,32,64,128,256,512

# Tx
# ===
# Base
2293304.308,3396057.769,3540860.615,3636056.077,3332950.846,3694276.154,3689820
# Batch
2286723.857,3307191.643,3400346.571,3452527.786,3460766.857,3431042.5,3440722.286
# Batch + Bufapi
2257970.769,3151268.385,3260150.538,3379383.846,3424028.846,3433384.308,3385635.231,3406554.538

# Rx
# ==
# pktgen results (pps)
1223275,1668868,1728794,1769261,1808574,1837252,1846436
1456924,1797901,1831234,1868746,1877508,1931598,1936402
1368923,1719716,1794373,1865170,1884803,1916021,1975160

# Testpmd pps results
1222698.143,1670604,1731040.6,1769218,1811206,1839308.75,1848478.75
1450140.5,1799985.75,1834089.75,1871290,1880005.5,1934147.25,1939034
1370621,1721858,1796287.75,1866618.5,1885466.5,1918670.75,1976173.5,1988760.75,1978316

pktgen was run again for rx with 1024 and 2048 buf size, giving
1988760.75 and 1978316 pps. Testpmd goes the same way.

Don't really understand what does this data mean.
Which number of descs is batched for each run?


Sorry, I should have explained better. I will expand here, but feel free to 
skip it since we are going to discard the
data anyway. Or to propose a better way to tell them.

It is a CSV with the values I've obtained, in pps, from pktgen and testpmd. This 
way it is easy to plot them.

Maybe is easier as tables, if mail readers/gmail does not misalign them.


# Tx
# ===

Base: with the previous code, not integrating any patch. testpmd is in txonly 
mode, and the tap interface XDP_DROPs everything.
We vary VHOST_NET_BATCH (1, 16, 32, ...). As Jason put it in a previous mail:

TX: testpmd(txonly) -> virtio-user -> vhost_net -> XDP_DROP on TAP


batch:        1          16          32          64         128         256         512
pps:    2293304.308 3396057.769 3540860.615 3636056.077 3332950.846 3694276.154 3689820

If we add the batching part of the series, but not the bufapi:

batch:        1          16          32          64         128         256         512
pps:    2286723.857 3307191.643 3400346.571 3452527.786 3460766.857 3431042.5   3440722.286

Re: [PATCH V2 3/6] vDPA: implement IRQ offloading helpers in vDPA core

2020-07-20 Thread Jason Wang


On 2020/7/21 上午10:02, Zhu, Lingshan wrote:



On 7/20/2020 5:40 PM, Jason Wang wrote:


On 2020/7/20 下午5:07, Zhu, Lingshan wrote:


+}
+
+static void vdpa_unsetup_irq(struct vdpa_device *vdev, int qid)
+{
+    struct vdpa_driver *drv = drv_to_vdpa(vdev->dev.driver);
+
+    if (drv->unsetup_vq_irq)
+    drv->unsetup_vq_irq(vdev, qid);



Do you need to check the existence of drv before calling 
unset_vq_irq()?

Yes, we should check this when we take the releasing path into account.


And how can this synchronize with driver releasing and binding?

Will add a vdpa_unsetup_irq() call in vhost_vdpa_release().
For binding, I think it is a new dev bound to the driver,
so it should go through the vdpa_setup_irq() routine. Or if it is
a device re-bound to vhost_vdpa, I think we have cleaned up the
irq_bypass_producer for it, as we would call vdpa_unsetup_irq()
in the release function.



I meant can the following things happen?

1) some vDPA device driver probes the hardware and calls 
vdpa_request_irq() in its PCI probe function.

2) the vDPA device is probed by vhost-vDPA

Then irq bypass can't work, since when vdpa_unsetup_irq() is 
called, there's no driver bound. Or is there a requirement that 
vdpa_request/free_irq() must be called somewhere (e.g. in the 
set_status bus operation)? If yes, we need to document those requirements.

vdpa_unsetup_irq() is only called when we want to unregister the producer,



Typo, I meant vdpa_setup_irq().

Thanks



  now we have two code paths using it: free_irq and release(). I agree we can 
document these requirements for the helpers; these functions can only be called 
through status changes (DRIVER_OK and !DRIVER_OK).

Thanks,
BR
Zhu Lingshan


Thanks




Re: [PATCH V2 3/6] vDPA: implement IRQ offloading helpers in vDPA core

2020-07-20 Thread Jason Wang


On 2020/7/20 下午5:07, Zhu, Lingshan wrote:


+}
+
+static void vdpa_unsetup_irq(struct vdpa_device *vdev, int qid)
+{
+    struct vdpa_driver *drv = drv_to_vdpa(vdev->dev.driver);
+
+    if (drv->unsetup_vq_irq)
+    drv->unsetup_vq_irq(vdev, qid);



Do you need to check the existence of drv before calling unset_vq_irq()?

Yes, we should check this when we take the releasing path into account.


And how can this synchronize with driver releasing and binding?

Will add a vdpa_unsetup_irq() call in vhost_vdpa_release().
For binding, I think it is a new dev bound to the driver,
so it should go through the vdpa_setup_irq() routine. Or if it is
a device re-bound to vhost_vdpa, I think we have cleaned up the
irq_bypass_producer for it, as we would call vdpa_unsetup_irq()
in the release function.



I meant can the following things happen?

1) some vDPA device driver probes the hardware and calls 
vdpa_request_irq() in its PCI probe function.

2) the vDPA device is probed by vhost-vDPA

Then irq bypass can't work, since when vdpa_unsetup_irq() is called, 
there's no driver bound. Or is there a requirement that 
vdpa_request/free_irq() must be called somewhere (e.g. in the set_status 
bus operation)? If yes, we need to document those requirements.


Thanks


Re: [PATCH RFC v8 02/11] vhost: use batched get_vq_desc version

2020-07-20 Thread Jason Wang


On 2020/7/17 上午1:16, Eugenio Perez Martin wrote:

On Fri, Jul 10, 2020 at 7:58 AM Michael S. Tsirkin  wrote:

On Fri, Jul 10, 2020 at 07:39:26AM +0200, Eugenio Perez Martin wrote:

How about playing with the batch size? Make it a mod parameter instead
of the hard coded 64, and measure for all values 1 to 64 ...


Right, according to the test result, 64 seems to be too aggressive in
the case of TX.


Got it, thanks both!

In particular I wonder whether with batch size 1
we get same performance as without batching
(would indicate 64 is too aggressive)
or not (would indicate one of the code changes
affects performance in an unexpected way).

--
MST


Hi!

Varying batch_size as drivers/vhost/net.c:VHOST_NET_BATCH,



Did you mean varying the value of VHOST_NET_BATCH itself or the number 
of batched descriptors?




and testing
the pps as previous mail says. This means that we have either only
vhost_net batching (in base testing, like previously to apply this
patch) or both batching sizes the same.

I've checked that vhost process (and pktgen) goes 100% cpu also.

For tx: Batching decrements always the performance, in all cases. Not
sure why bufapi made things better the last time.

Batching makes improvements until 64 bufs, I see increments of pps but like 1%.

For rx: Batching always improves performance. It seems that if we
batch little, bufapi decreases performance, but beyond 64, bufapi is
much better. The bufapi version keeps improving until I set a batching
of 1024. So I guess it is super good to have a bunch of buffers to
receive.

Since with this test I cannot disable event_idx or things like that,
what would be the next step for testing?

Thanks!

--
Results:
# Buf size: 1,16,32,64,128,256,512

# Tx
# ===
# Base
2293304.308,3396057.769,3540860.615,3636056.077,3332950.846,3694276.154,3689820



What's the meaning of buf size in the context of "base"?

And I wonder maybe perf diff can help.

Thanks



# Batch
2286723.857,3307191.643,3400346.571,3452527.786,3460766.857,3431042.5,3440722.286
# Batch + Bufapi
2257970.769,3151268.385,3260150.538,3379383.846,3424028.846,3433384.308,3385635.231,3406554.538

# Rx
# ==
# pktgen results (pps)
1223275,1668868,1728794,1769261,1808574,1837252,1846436
1456924,1797901,1831234,1868746,1877508,1931598,1936402
1368923,1719716,1794373,1865170,1884803,1916021,1975160

# Testpmd pps results
1222698.143,1670604,1731040.6,1769218,1811206,1839308.75,1848478.75
1450140.5,1799985.75,1834089.75,1871290,1880005.5,1934147.25,1939034
1370621,1721858,1796287.75,1866618.5,1885466.5,1918670.75,1976173.5,1988760.75,1978316

pktgen was run again for rx with 1024 and 2048 buf size, giving
1988760.75 and 1978316 pps. Testpmd goes the same way.




[PATCH] vhost: vdpa: remove per device feature whitelist

2020-07-20 Thread Jason Wang
We used to have a per device feature whitelist to filter out the
unsupported virtio features. But this seems unnecessary since:

- the main idea behind the feature whitelist is to block the control vq
  feature until we finalize the control virtqueue API. But the current
  vhost-vDPA uAPI is sufficient to support a control virtqueue. For a
  device that has a hardware control virtqueue, the vDPA device driver
  can just set up the hardware virtqueue and let userspace use the
  hardware virtqueue directly. For a device that doesn't have a control
  virtqueue, the vDPA device driver needs to use e.g. vringh to emulate
  a software control virtqueue.
- we don't do it in the virtio-vDPA driver

So remove this limitation.

Signed-off-by: Jason Wang 
---
 drivers/vhost/vdpa.c | 37 -
 1 file changed, 37 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 77a0c9fb6cc3..f7f6ddd681ce 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -26,35 +26,6 @@
 
 #include "vhost.h"
 
-enum {
-   VHOST_VDPA_FEATURES =
-   (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) |
-   (1ULL << VIRTIO_F_ANY_LAYOUT) |
-   (1ULL << VIRTIO_F_VERSION_1) |
-   (1ULL << VIRTIO_F_IOMMU_PLATFORM) |
-   (1ULL << VIRTIO_F_RING_PACKED) |
-   (1ULL << VIRTIO_F_ORDER_PLATFORM) |
-   (1ULL << VIRTIO_RING_F_INDIRECT_DESC) |
-   (1ULL << VIRTIO_RING_F_EVENT_IDX),
-
-   VHOST_VDPA_NET_FEATURES = VHOST_VDPA_FEATURES |
-   (1ULL << VIRTIO_NET_F_CSUM) |
-   (1ULL << VIRTIO_NET_F_GUEST_CSUM) |
-   (1ULL << VIRTIO_NET_F_MTU) |
-   (1ULL << VIRTIO_NET_F_MAC) |
-   (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
-   (1ULL << VIRTIO_NET_F_GUEST_TSO6) |
-   (1ULL << VIRTIO_NET_F_GUEST_ECN) |
-   (1ULL << VIRTIO_NET_F_GUEST_UFO) |
-   (1ULL << VIRTIO_NET_F_HOST_TSO4) |
-   (1ULL << VIRTIO_NET_F_HOST_TSO6) |
-   (1ULL << VIRTIO_NET_F_HOST_ECN) |
-   (1ULL << VIRTIO_NET_F_HOST_UFO) |
-   (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
-   (1ULL << VIRTIO_NET_F_STATUS) |
-   (1ULL << VIRTIO_NET_F_SPEED_DUPLEX),
-};
-
 /* Currently, only network backend w/o multiqueue is supported. */
 #define VHOST_VDPA_VQ_MAX  2
 
@@ -79,10 +50,6 @@ static DEFINE_IDA(vhost_vdpa_ida);
 
 static dev_t vhost_vdpa_major;
 
-static const u64 vhost_vdpa_features[] = {
-   [VIRTIO_ID_NET] = VHOST_VDPA_NET_FEATURES,
-};
-
 static void handle_vq_kick(struct vhost_work *work)
 {
struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
@@ -255,7 +222,6 @@ static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user *featurep)
 	u64 features;
 
 	features = ops->get_features(vdpa);
-	features &= vhost_vdpa_features[v->virtio_id];
 
 	if (copy_to_user(featurep, &features, sizeof(features)))
 		return -EFAULT;
@@ -279,9 +245,6 @@ static long vhost_vdpa_set_features(struct vhost_vdpa *v, u64 __user *featurep)
 	if (copy_from_user(&features, featurep, sizeof(features)))
 		return -EFAULT;
 
-	if (features & ~vhost_vdpa_features[v->virtio_id])
-		return -EINVAL;
-
 	if (ops->set_features(vdpa, features))
 		return -EINVAL;
 
-- 
2.20.1
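
For context, a minimal userspace sketch of how feature negotiation already
works through the vhost-vDPA character device (assuming a /dev/vhost-vdpa-0
node; error handling trimmed). With the whitelist removed, VHOST_GET_FEATURES
simply reports whatever the parent vDPA driver's get_features() returns:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>
#include <linux/virtio_config.h>

/* Sketch only: feature negotiation against a vhost-vDPA device node. */
int main(void)
{
	uint64_t features;
	int fd = open("/dev/vhost-vdpa-0", O_RDWR);

	if (fd < 0)
		return 1;

	/* Take ownership of the device before configuring it. */
	if (ioctl(fd, VHOST_SET_OWNER, NULL))
		goto err;

	/* The full feature set of the parent device is now visible; no
	 * per-device whitelist is applied by vhost-vDPA anymore. */
	if (ioctl(fd, VHOST_GET_FEATURES, &features))
		goto err;
	printf("device features: 0x%llx\n", (unsigned long long)features);

	/* Acknowledge only the bits this application understands. */
	features &= 1ULL << VIRTIO_F_VERSION_1;
	if (ioctl(fd, VHOST_SET_FEATURES, &features))
		goto err;

	close(fd);
	return 0;
err:
	close(fd);
	return 1;
}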



Re: [PATCH V2 2/6] kvm: detect assigned device via irqbypass manager

2020-07-19 Thread Jason Wang


On 2020/7/18 上午2:08, Alex Williamson wrote:

On Thu, 16 Jul 2020 19:23:45 +0800
Zhu Lingshan  wrote:


vDPA devices have dedicated backend hardware, just like passthrough
devices, so it is possible to set up irq offloading to a vCPU for vDPA
devices. This patch therefore manipulates the assigned device counters
via the irqbypass manager.

We increase/decrease the assigned device counter in kvm/x86. Both vDPA
and VFIO go through this code path.

This code path only affects x86 for now.

Signed-off-by: Zhu Lingshan 
Suggested-by: Jason Wang 
---
  arch/x86/kvm/x86.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 00c88c2..20c07d3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10624,11 +10624,17 @@ int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
 {
 	struct kvm_kernel_irqfd *irqfd =
 		container_of(cons, struct kvm_kernel_irqfd, consumer);
+	int ret;
 
 	irqfd->producer = prod;
+	kvm_arch_start_assignment(irqfd->kvm);
+	ret = kvm_x86_ops.update_pi_irte(irqfd->kvm,
+					 prod->irq, irqfd->gsi, 1);
+
+	if (ret)
+		kvm_arch_end_assignment(irqfd->kvm);
 
-	return kvm_x86_ops.update_pi_irte(irqfd->kvm,
-					  prod->irq, irqfd->gsi, 1);
+	return ret;
 }
 
 void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,


Why isn't there a matching end-assignment in the del_producer path?  It
seems this only goes one way: what happens when a device is
hot-unplugged from the VM or the device interrupt configuration changes?
This will still break vfio if it's not guaranteed to be symmetric.
Thanks,

Alex



Yes, we need to add logic in the del_producer path.

Thanks
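
For completeness, a sketch of what the symmetric change could look like.
This is not the posted patch, only an illustration of keeping the
assigned-device counter balanced; it assumes the existing
kvm_arch_end_assignment() helper and a simplified del_producer body:

/* Sketch only: tear down in del_producer what add_producer set up, so
 * the assigned-device counter stays balanced across hot-unplug or IRQ
 * reconfiguration. */
void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
				      struct irq_bypass_producer *prod)
{
	struct kvm_kernel_irqfd *irqfd =
		container_of(cons, struct kvm_kernel_irqfd, consumer);

	WARN_ON(irqfd->producer != prod);
	irqfd->producer = NULL;

	/* Switch the posted-interrupt IRTE back to remapped mode, then
	 * drop the assignment reference taken in add_producer. */
	kvm_x86_ops.update_pi_irte(irqfd->kvm, prod->irq, irqfd->gsi, 0);
	kvm_arch_end_assignment(irqfd->kvm);
}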



Re: [PATCH vhost next 06/10] vdpa: Add means to communicate vq status on get_vq_state

2020-07-19 Thread Jason Wang


On 2020/7/16 下午6:25, Eli Cohen wrote:

On Thu, Jul 16, 2020 at 05:35:18PM +0800, Jason Wang wrote:

On 2020/7/16 下午4:21, Eli Cohen wrote:

On Thu, Jul 16, 2020 at 04:11:00PM +0800, Jason Wang wrote:

On 2020/7/16 下午3:23, Eli Cohen wrote:

Currently, get_vq_state() is used only to pass the available index value
of a vq. Extend the struct to return the status of the VQ to the caller.
For now, define VQ_STATE_NOT_READY. In the future it will be extended to
include other information.

Modify the current vdpa driver to update this field.

Reviewed-by: Parav Pandit
Signed-off-by: Eli Cohen

What's the difference between this and get_vq_ready()?

Thanks


There is no difference. It is just a way to communicate a problem with
the state of the VQ back to the caller, which is not possible now.
I think an asynchronous mechanism would be preferable, but that is not
available currently.


I still don't see the reason, maybe you can give me an example?



My intention was to provide a mechanism to return meaningful information
on the state of the vq, for example when you fail to get the state of
the VQ.

Maybe I could just change the prototype of the function to return int,
and the driver could return an error if it has trouble reporting the vq
state.



That's fine.

Thanks
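
To illustrate the direction agreed above, a rough sketch of a driver-side
get_vq_state() that returns an error instead of a state it cannot read.
The my_vdpa_* helpers and the exact state layout are hypothetical; the
final prototype is settled in later revisions of the series:

/* Sketch only: with an int-returning get_vq_state(), a driver can
 * report failure instead of handing back a stale available index. */
static int my_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx,
				struct vdpa_vq_state *state)
{
	struct my_vdpa_dev *dev = vdpa_to_my_dev(vdev);

	if (!my_vdpa_vq_is_initialized(dev, idx))
		return -EAGAIN;	/* caller sees the error, not a bogus index */

	/* Assuming the state carries the available index, as described
	 * in the commit message above. */
	state->avail_index = my_vdpa_read_avail_idx(dev, idx);
	return 0;
}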


