[PATCH 5/5] vdpa: mlx5: support per virtqueue dma device

2023-01-10 Thread Jason Wang
This patch implements per virtqueue dma device for mlx5_vdpa. This is
needed for virtio_vdpa to work for the CVQ, which is backed by vringh
rather than DMA. We simply advertise the vDPA device itself as the DMA
device for the CVQ; the DMA API can then simply use PA, so the identity
mapping for the CVQ can still be used. Otherwise the identity (1:1)
mapping won't work when the platform IOMMU is enabled, since the IOVA
is allocated on demand and is not necessarily the PA.

This fixes the following crash when the mlx5 vDPA device is bound to
virtio-vdpa with the platform IOMMU enabled but not in passthrough mode:

BUG: unable to handle page fault for address: ff2fb3063deb1002
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD 1393001067 P4D 1393002067 PUD 0
Oops:  [#1] PREEMPT SMP NOPTI
CPU: 55 PID: 8923 Comm: kworker/u112:3 Kdump: loaded Not tainted 6.1.0+ #7
Hardware name: Dell Inc. PowerEdge R750/0PJ80M, BIOS 1.5.4 12/17/2021
Workqueue: mlx5_vdpa_wq mlx5_cvq_kick_handler [mlx5_vdpa]
RIP: 0010:vringh_getdesc_iotlb+0x93/0x1d0 [vringh]
Code: 14 25 40 ef 01 00 83 82 c0 0a 00 00 01 48 2b 05 93 5a 1b ea 8b 4c 24 14 
48 c1 f8 06 48 c1 e0 0c 48 03 05 90 5a 1b ea 48 01 c8 <0f> b7 00 83 aa c0 0a 00 
00 01 65 ff 0d bc e4 41 3f 0f 84 05 01 00
RSP: 0018:ff46821ba664fdf8 EFLAGS: 00010282
RAX: ff2fb3063deb1002 RBX: 0a20 RCX: 0002
RDX: ff2fb318d2f94380 RSI: 0002 RDI: 0001
RBP: ff2fb3065e832410 R08: ff46821ba664fe00 R09: 0001
R10:  R11: 000d R12: ff2fb3065e832488
R13: ff2fb3065e8324a8 R14: ff2fb3065e8324c8 R15: ff2fb3065e8324a8
FS:  () GS:ff2fb3257fac() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ff2fb3063deb1002 CR3: 001392010006 CR4: 00771ee0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
PKRU: 5554
Call Trace:

  mlx5_cvq_kick_handler+0x89/0x2b0 [mlx5_vdpa]
  process_one_work+0x1e2/0x3b0
  ? rescuer_thread+0x390/0x390
  worker_thread+0x50/0x3a0
  ? rescuer_thread+0x390/0x390
  kthread+0xd6/0x100
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  

Signed-off-by: Jason Wang 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 6632651b1e54..b0e74c25bf48 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2682,6 +2682,16 @@ static int mlx5_vdpa_set_map(struct vdpa_device *vdev, 
unsigned int asid,
return err;
 }
 
+struct device *mlx5_get_vq_dma_dev(struct vdpa_device *vdev, u16 idx)
+{
+   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+
+   if (is_ctrl_vq_idx(mvdev, idx))
+   return &vdev->dev;
+
+   return mvdev->vdev.dma_dev;
+}
+
 static void mlx5_vdpa_free(struct vdpa_device *vdev)
 {
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
@@ -2897,6 +2907,7 @@ static const struct vdpa_config_ops mlx5_vdpa_ops = {
.get_generation = mlx5_vdpa_get_generation,
.set_map = mlx5_vdpa_set_map,
.set_group_asid = mlx5_set_group_asid,
+   .get_vq_dma_dev = mlx5_get_vq_dma_dev,
.free = mlx5_vdpa_free,
.suspend = mlx5_vdpa_suspend,
 };
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 3/5] virtio-vdpa: support per vq dma device

2023-01-10 Thread Jason Wang
This patch adds support for a per vq dma device to virtio-vDPA. vDPA
parents are then allowed to use different DMA devices. This is useful
for parents that have software or emulated virtqueues.

Signed-off-by: Jason Wang 
---
 drivers/virtio/virtio_vdpa.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index 9670cc79371d..d7f5af62ddaa 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -135,6 +135,7 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned 
int index,
 {
struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+   struct device *dma_dev;
const struct vdpa_config_ops *ops = vdpa->config;
struct virtio_vdpa_vq_info *info;
struct vdpa_callback cb;
@@ -175,9 +176,15 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned 
int index,
 
/* Create the vring */
align = ops->get_vq_align(vdpa);
-   vq = vring_create_virtqueue(index, max_num, align, vdev,
-   true, may_reduce_num, ctx,
-   virtio_vdpa_notify, callback, name);
+
+   if (ops->get_vq_dma_dev)
+   dma_dev = ops->get_vq_dma_dev(vdpa, index);
+   else
+   dma_dev = vdpa_get_dma_dev(vdpa);
+   vq = vring_create_virtqueue_dma(index, max_num, align, vdev,
+   true, may_reduce_num, ctx,
+   virtio_vdpa_notify, callback,
+   name, dma_dev);
if (!vq) {
err = -ENOMEM;
goto error_new_virtqueue;
-- 
2.25.1



[PATCH 4/5] vdpa: set dma mask for vDPA device

2023-01-10 Thread Jason Wang
Set the DMA mask for the vDPA device in case there are virtqueues that
are not backed by DMA, so that the vDPA device can be advertised as the
DMA device used by the DMA API for software-emulated virtqueues.
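The mask set below is DMA_BIT_MASK(64), i.e. all address bits. A minimal userspace sketch of what that macro and a mask check mean (the macro mirrors the kernel's definition; `dma_capable()` and the test addresses here are illustrative, not the kernel API):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the kernel's DMA_BIT_MASK(): a mask with the low n bits set.
 * n == 64 is special-cased to avoid the undefined 64-bit shift. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Illustrative helper: a device can address `addr` directly iff it
 * fits under the device's DMA mask. */
static int dma_capable(uint64_t mask, uint64_t addr)
{
    return addr <= mask;
}
```

With a 64-bit mask every address is acceptable, which is what lets the vDPA device itself stand in as the DMA device for software-emulated virtqueues.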

Signed-off-by: Jason Wang 
---
 drivers/vdpa/vdpa.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 8ef7aa1365cc..6821b2850bbb 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -39,6 +39,11 @@ static int vdpa_dev_probe(struct device *d)
u32 max_num, min_num = 1;
int ret = 0;
 
+   d->dma_mask = &d->coherent_dma_mask;
+   ret = dma_set_mask_and_coherent(d, DMA_BIT_MASK(64));
+   if (ret)
+   return ret;
+
max_num = ops->get_vq_num_max(vdev);
if (ops->get_vq_num_min)
min_num = ops->get_vq_num_min(vdev);
-- 
2.25.1



[PATCH 2/5] vdpa: introduce get_vq_dma_device()

2023-01-10 Thread Jason Wang
This patch introduces a new method to query the dma device that is used
for a specific virtqueue.

Signed-off-by: Jason Wang 
---
 include/linux/vdpa.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 6d0f5e4e82c2..3ec13aee35f5 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -282,6 +282,11 @@ struct vdpa_map_file {
  * @iova: iova to be unmapped
  * @size: size of the area
  * Returns integer: success (0) or error (< 0)
+ * @get_vq_dma_dev:Get the dma device for a specific
+ * virtqueue (optional)
+ * @vdev: vdpa device
+ * @idx: virtqueue index
+ * Returns pointer to structure device or error (NULL)
  * @free:  Free resources that belongs to vDPA (optional)
  * @vdev: vdpa device
  */
@@ -341,6 +346,7 @@ struct vdpa_config_ops {
 u64 iova, u64 size);
int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
  unsigned int asid);
+   struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
 
/* Free device resources */
void (*free)(struct vdpa_device *vdev);
-- 
2.25.1



[PATCH 1/5] virtio_ring: per virtqueue dma device

2023-01-10 Thread Jason Wang
This patch introduces a per virtqueue dma device. This will be used
for virtio devices whose virtqueues are backed by different underlying
devices.

One example is vDPA, where the control virtqueue could be
implemented through software mediation.

Some of this work was actually done before, via helpers like
vring_dma_device(). The work left is:

- Let vring_dma_device() return the per virtqueue dma device instead
  of the vdev's parent.
- Allow passing a dma_device when creating the virtqueue through a new
  helper; the old vring creation helper will keep using the vdev's parent.
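The compatibility pattern described above (a new creation helper takes an explicit DMA device; the old helper becomes a thin wrapper that passes the parent) can be sketched in plain userspace C. All names here are illustrative stand-ins, not the kernel API:

```c
#include <assert.h>
#include <stddef.h>

struct device { const char *name; };
struct vdev   { struct device *parent; };
struct vring  { struct device *dma_dev; /* device used for DMA */ };

/* New-style helper: the caller picks the DMA device explicitly. */
static struct vring create_vring_dma(struct vdev *vdev, struct device *dma_dev)
{
    struct vring vr = { .dma_dev = dma_dev };
    (void)vdev; /* real code would also size and allocate the ring */
    return vr;
}

/* Old-style helper kept as a wrapper: defaults to the vdev's parent,
 * so existing callers see no behavior change. */
static struct vring create_vring(struct vdev *vdev)
{
    return create_vring_dma(vdev, vdev->parent);
}
```

This is the same shape as vring_create_virtqueue() delegating to the new vring_create_virtqueue_dma() in the diff below.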

Signed-off-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 133 ---
 include/linux/virtio_ring.h  |  16 +
 2 files changed, 109 insertions(+), 40 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 723c4e29e1d3..41144b5246a8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -202,6 +202,9 @@ struct vring_virtqueue {
/* DMA, allocation, and size information */
bool we_own_ring;
 
+   /* Device used for doing DMA */
+   struct device *dma_dev;
+
 #ifdef DEBUG
/* They're supposed to lock for us. */
unsigned int in_use;
@@ -219,7 +222,8 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
   bool context,
   bool (*notify)(struct virtqueue 
*),
   void (*callback)(struct 
virtqueue *),
-  const char *name);
+  const char *name,
+  struct device *dma_dev);
 static struct vring_desc_extra *vring_alloc_desc_extra(unsigned int num);
 static void vring_free(struct virtqueue *_vq);
 
@@ -297,10 +301,11 @@ size_t virtio_max_dma_size(struct virtio_device *vdev)
 EXPORT_SYMBOL_GPL(virtio_max_dma_size);
 
 static void *vring_alloc_queue(struct virtio_device *vdev, size_t size,
- dma_addr_t *dma_handle, gfp_t flag)
+  dma_addr_t *dma_handle, gfp_t flag,
+  struct device *dma_dev)
 {
if (vring_use_dma_api(vdev)) {
-   return dma_alloc_coherent(vdev->dev.parent, size,
+   return dma_alloc_coherent(dma_dev, size,
  dma_handle, flag);
} else {
void *queue = alloc_pages_exact(PAGE_ALIGN(size), flag);
@@ -330,10 +335,11 @@ static void *vring_alloc_queue(struct virtio_device 
*vdev, size_t size,
 }
 
 static void vring_free_queue(struct virtio_device *vdev, size_t size,
-void *queue, dma_addr_t dma_handle)
+void *queue, dma_addr_t dma_handle,
+struct device *dma_dev)
 {
if (vring_use_dma_api(vdev))
-   dma_free_coherent(vdev->dev.parent, size, queue, dma_handle);
+   dma_free_coherent(dma_dev, size, queue, dma_handle);
else
free_pages_exact(queue, PAGE_ALIGN(size));
 }
@@ -341,11 +347,11 @@ static void vring_free_queue(struct virtio_device *vdev, 
size_t size,
 /*
  * The DMA ops on various arches are rather gnarly right now, and
  * making all of the arch DMA ops work on the vring device itself
- * is a mess.  For now, we use the parent device for DMA ops.
+ * is a mess.
  */
 static inline struct device *vring_dma_dev(const struct vring_virtqueue *vq)
 {
-   return vq->vq.vdev->dev.parent;
+   return vq->dma_dev;
 }
 
 /* Map one sg entry. */
@@ -1032,11 +1038,12 @@ static int vring_alloc_state_extra_split(struct 
vring_virtqueue_split *vring_spl
 }
 
 static void vring_free_split(struct vring_virtqueue_split *vring_split,
-struct virtio_device *vdev)
+struct virtio_device *vdev, struct device *dma_dev)
 {
vring_free_queue(vdev, vring_split->queue_size_in_bytes,
 vring_split->vring.desc,
-vring_split->queue_dma_addr);
+vring_split->queue_dma_addr,
+dma_dev);
 
kfree(vring_split->desc_state);
kfree(vring_split->desc_extra);
@@ -1046,7 +1053,8 @@ static int vring_alloc_queue_split(struct 
vring_virtqueue_split *vring_split,
   struct virtio_device *vdev,
   u32 num,
   unsigned int vring_align,
-  bool may_reduce_num)
+  bool may_reduce_num,
+  struct device *dma_dev)
 {
void *queue = NULL;
dma_addr_t dma_addr;
@@ -1061,7 +1069,8 @@ static int vring_alloc_queue_split(struct 
vring_virtqueue_split *vring_split,
for (; 

[PATCH 0/5] virtio_ring: per virtqueue DMA device

2023-01-10 Thread Jason Wang
Hi All:

In some cases, the virtqueues could be backed by different devices. One
example is that, in the case of vDPA, some parents may emulate virtqueues
via vringh. In this case, it would be wrong to stick with the physical
DMA device for the software-emulated device, since there's no easy way
for vringh to know about the hardware IOMMU mappings.

So this series tries to introduce a per virtqueue DMA device; software
virtqueues can then utilize the transport-specific method to assign an
appropriate DMA device.

This fixes the crash of mlx5_vdpa + virtio_vdpa when the platform IOMMU
is enabled but not in passthrough mode. The reason for the crash is that
virtio_ring tries to map the control virtqueue through the platform
IOMMU, but vringh assumes a direct mapping (PA as IOVA). This is fixed
by advertising the vDPA device itself, which doesn't do DMA (it has no
DMA ops). The DMA API can then go with the direct mapping, and vringh
will be happy, since mlx5_vdpa assumes a direct/identity mapping by
default.
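The core idea of the series can be sketched in plain userspace C. The structs and names below are illustrative stand-ins, not the kernel API: the transport asks the parent, per virtqueue, which device to hand to the DMA API, and falls back to the parent DMA device when the op is absent.

```c
#include <assert.h>
#include <stddef.h>

struct device { const char *name; };

struct vdpa_dev {
    struct device self;     /* the vDPA device itself (does no real DMA) */
    struct device *parent;  /* the physical DMA-capable parent device */
    /* Optional op: which DMA device backs virtqueue `idx`? */
    struct device *(*get_vq_dma_dev)(struct vdpa_dev *d, unsigned int idx);
};

enum { CTRL_VQ_IDX = 2 };

/* Parent-side op: the software-emulated control vq advertises the vDPA
 * device itself, so the DMA API uses a direct (PA) mapping; data vqs
 * keep using the physical parent. */
static struct device *parent_get_vq_dma_dev(struct vdpa_dev *d, unsigned int idx)
{
    return idx == CTRL_VQ_IDX ? &d->self : d->parent;
}

/* Transport-side selection: use the per-vq op when provided,
 * otherwise fall back to the parent DMA device as before. */
static struct device *vq_dma_dev(struct vdpa_dev *d, unsigned int idx)
{
    if (d->get_vq_dma_dev)
        return d->get_vq_dma_dev(d, idx);
    return d->parent;
}
```

Patches 2, 3 and 5 implement the op, the transport-side fallback, and the mlx5 dispatch respectively.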

Please review.

Thanks

Jason Wang (5):
  virtio_ring: per virtqueue dma device
  vdpa: introduce get_vq_dma_device()
  virtio-vdpa: support per vq dma device
  vdpa: set dma mask for vDPA device
  vdpa: mlx5: support per virtqueue dma device

 drivers/vdpa/mlx5/net/mlx5_vnet.c |  11 +++
 drivers/vdpa/vdpa.c   |   5 ++
 drivers/virtio/virtio_ring.c  | 133 +-
 drivers/virtio/virtio_vdpa.c  |  13 ++-
 include/linux/vdpa.h  |   6 ++
 include/linux/virtio_ring.h   |  16 
 6 files changed, 141 insertions(+), 43 deletions(-)

-- 
2.25.1



Re: [RFC PATCH 2/9] vringh: remove vringh_iov and unite to vringh_kiov

2023-01-10 Thread Shunsuke Mie


On 2023/01/11 14:54, Jason Wang wrote:

On Wed, Jan 11, 2023 at 11:27 AM Shunsuke Mie  wrote:


On 2022/12/28 15:36, Jason Wang wrote:

On Tue, Dec 27, 2022 at 3:06 PM Shunsuke Mie  wrote:

2022年12月27日(火) 15:04 Jason Wang :

On Tue, Dec 27, 2022 at 10:25 AM Shunsuke Mie  wrote:

struct vringh_iov is defined to hold userland addresses. However, to use
the common function __vringh_iov(), the vringh_iov is ultimately converted
to vringh_kiov with a simple cast. Compile-time checks make sure it can be
cast correctly.

To simplify the code, this patch removes struct vringh_iov and unifies the
APIs around struct vringh_kiov.
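The cast being removed was only safe because of the layout checks the diff below deletes. That kind of compile-time layout compatibility check can be sketched in userspace (the `kvec` definition here is an illustrative stand-in mirroring the kernel's, not taken from a userspace header):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>   /* struct iovec */

/* Userspace stand-in mirroring the kernel's struct kvec. */
struct kvec { void *iov_base; size_t iov_len; };

/* Casting an array of one struct to the other is only valid if the
 * layouts match exactly; assert that at compile time, as the removed
 * BUILD_BUG_ON() lines did. */
_Static_assert(sizeof(struct iovec) == sizeof(struct kvec),
               "iovec/kvec size mismatch");
_Static_assert(offsetof(struct iovec, iov_base) == offsetof(struct kvec, iov_base),
               "iov_base offset mismatch");
_Static_assert(offsetof(struct iovec, iov_len) == offsetof(struct kvec, iov_len),
               "iov_len offset mismatch");
```

Unifying on one struct, as the patch does, removes the need for such checks entirely.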

Signed-off-by: Shunsuke Mie 

While at this, I wonder if we need to go further, that is, switch to
using an iov iterator instead of a vringh customized one.

I didn't see the iov iterator yet, thank you for informing me.
Is that iov_iter? https://lwn.net/Articles/625077/

Exactly.

I've investigated iov_iter, vhost and the related APIs. As a result, I
think it is not easy to switch to iov_iter, because the design of vhost
and vringh is different.

Yes, but just to make sure we are on the same page, the reason I
suggest iov_iter for vringh is that vringh itself has a customized
iter equivalent, e.g. it has iters for kernel, user, or even iotlb. At
least the kernel and userspace parts could be switched to iov_iter.
Note that it has nothing to do with vhost.

I agree. It can be switched to iov_iter, but I think we need fundamental
changes. There is duplicated code in vhost and vringh for accessing the
vring, and some helper functions...

Anyway, I'd like to focus on vringh in this patchset. Thank you for your
comments and suggestions!


Best


The iov_iter has vring desc info plus metadata about the transfer method;
vhost provides a generic transfer function for the iov_iter. In contrast,
vringh_iov just has vring desc info, and vringh provides transfer
functions for each method.

In the future, it would be better to use a common data structure and APIs
between vhost and vringh (or merge them completely), but that requires a
lot of changes, so I'd like to just organize the data structures in vringh
as a first step in this patch.

That's fine.

Thanks



Best


Thanks


Thanks


---
   drivers/vhost/vringh.c | 32 ++
   include/linux/vringh.h | 45 --
   2 files changed, 10 insertions(+), 67 deletions(-)

diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 828c29306565..aa3cd27d2384 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -691,8 +691,8 @@ EXPORT_SYMBOL(vringh_init_user);
* calling vringh_iov_cleanup() to release the memory, even on error!
*/
   int vringh_getdesc_user(struct vringh *vrh,
-   struct vringh_iov *riov,
-   struct vringh_iov *wiov,
+   struct vringh_kiov *riov,
+   struct vringh_kiov *wiov,
  bool (*getrange)(struct vringh *vrh,
   u64 addr, struct vringh_range *r),
  u16 *head)
@@ -708,26 +708,6 @@ int vringh_getdesc_user(struct vringh *vrh,
  if (err == vrh->vring.num)
  return 0;

-   /* We need the layouts to be the identical for this to work */
-   BUILD_BUG_ON(sizeof(struct vringh_kiov) != sizeof(struct vringh_iov));
-   BUILD_BUG_ON(offsetof(struct vringh_kiov, iov) !=
-offsetof(struct vringh_iov, iov));
-   BUILD_BUG_ON(offsetof(struct vringh_kiov, i) !=
-offsetof(struct vringh_iov, i));
-   BUILD_BUG_ON(offsetof(struct vringh_kiov, used) !=
-offsetof(struct vringh_iov, used));
-   BUILD_BUG_ON(offsetof(struct vringh_kiov, max_num) !=
-offsetof(struct vringh_iov, max_num));
-   BUILD_BUG_ON(sizeof(struct iovec) != sizeof(struct kvec));
-   BUILD_BUG_ON(offsetof(struct iovec, iov_base) !=
-offsetof(struct kvec, iov_base));
-   BUILD_BUG_ON(offsetof(struct iovec, iov_len) !=
-offsetof(struct kvec, iov_len));
-   BUILD_BUG_ON(sizeof(((struct iovec *)NULL)->iov_base)
-!= sizeof(((struct kvec *)NULL)->iov_base));
-   BUILD_BUG_ON(sizeof(((struct iovec *)NULL)->iov_len)
-!= sizeof(((struct kvec *)NULL)->iov_len));
-
  *head = err;
  err = __vringh_iov(vrh, *head, (struct vringh_kiov *)riov,
 (struct vringh_kiov *)wiov,
@@ -740,14 +720,14 @@ int vringh_getdesc_user(struct vringh *vrh,
   EXPORT_SYMBOL(vringh_getdesc_user);

   /**
- * vringh_iov_pull_user - copy bytes from vring_iov.
+ * vringh_iov_pull_user - copy bytes from vring_kiov.
* @riov: the riov as passed to vringh_getdesc_user() (updated as we consume)
* @dst: the place to copy.
* @len: the maximum length to copy.
*
* Returns the bytes copied <= len or a 

[PATCH] vhost-scsi: unbreak any layout for response

2023-01-10 Thread Jason Wang
Al Viro said:

"""
Since "vhost/scsi: fix reuse of &vq->iov[out] in response"
we have this:
cmd->tvc_resp_iov = vq->iov[vc.out];
cmd->tvc_in_iovs = vc.in;
combined with
iov_iter_init(&iov_iter, ITER_DEST, &cmd->tvc_resp_iov,
  cmd->tvc_in_iovs, sizeof(v_rsp));
in vhost_scsi_complete_cmd_work().  We used to have ->tvc_resp_iov
_pointing_ to vq->iov[vc.out]; back then iov_iter_init() asked to
set an iovec-backed iov_iter over the tail of vq->iov[], with
length being the amount of iovecs in the tail.

Now we have a copy of one element of that array.  Fortunately, the members
following it in the containing structure are two non-NULL kernel pointers,
so copy_to_iter() will not copy anything beyond the first iovec - kernel
pointer is not (on the majority of architectures) going to be accepted by
access_ok() in copyout() and it won't be skipped since the "length" (in
reality - another non-NULL kernel pointer) won't be zero.

So it's not going to give a guest-to-qemu escalation, but it's definitely
a bug.  Frankly, my preference would be to verify that the very first iovec
is long enough to hold rsp_size.  Due to the above, any users that try to
give us vq->iov[vc.out].iov_len < sizeof(struct virtio_scsi_cmd_resp)
would currently get a failure in vhost_scsi_complete_cmd_work()
anyway.
"""

However, the spec doesn't say anything about the legacy descriptor
layout for the response. So this patch tries not to assume that the
response resides in a single separate descriptor, which is what commit
79c14141a487 ("vhost/scsi: Convert completion path to use") tries to
achieve towards ANY_LAYOUT.

This is done by allocating and using a dedicated resp iov in the
command. To be safe, start with UIO_MAXIOV to be consistent with the
vhost core.

Tested with a hacked virtio-scsi driver that uses 1 descriptor per
byte in the response.
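The multi-descriptor handling above can be sketched in userspace: instead of saving only `vq->iov[vc.out]`, the fix keeps a copy of the whole `in` tail of the iovec array, so a response split across several descriptors retains its full capacity. Names and the MAX_IOV bound here are illustrative (the patch uses UIO_MAXIOV):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define MAX_IOV 8

/* Before the fix only iov[out] was copied, so a response spanning
 * several descriptors lost everything after the first iovec.
 * Copy the whole `in` tail instead, as the patch does. */
static int save_resp_iov(const struct iovec *vq_iov, int out, int in,
                         struct iovec *resp_iov)
{
    if (in > MAX_IOV)
        return -1;
    for (int i = 0; i < in; i++)
        resp_iov[i] = vq_iov[out + i];
    return 0;
}

/* Total bytes the saved response iovecs can hold. */
static size_t resp_capacity(const struct iovec *resp_iov, int in)
{
    size_t total = 0;
    for (int i = 0; i < in; i++)
        total += resp_iov[i].iov_len;
    return total;
}
```

With the hacked 1-byte-per-descriptor driver mentioned above, the saved capacity now equals the full response size rather than one byte.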

Reported-by: Al Viro 
Cc: Benjamin Coddington 
Cc: Nicholas Bellinger 
Fixes: a77ec83a5789 ("vhost/scsi: fix reuse of &vq->iov[out] in response")
Signed-off-by: Jason Wang 
---
 drivers/vhost/scsi.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index dca6346d75b3..7d6d70072603 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -80,7 +80,7 @@ struct vhost_scsi_cmd {
struct scatterlist *tvc_prot_sgl;
struct page **tvc_upages;
/* Pointer to response header iovec */
-   struct iovec tvc_resp_iov;
+   struct iovec *tvc_resp_iov;
/* Pointer to vhost_scsi for our device */
struct vhost_scsi *tvc_vhost;
/* Pointer to vhost_virtqueue for the cmd */
@@ -563,7 +563,7 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work 
*work)
memcpy(v_rsp.sense, cmd->tvc_sense_buf,
   se_cmd->scsi_sense_length);
 
-   iov_iter_init(&iov_iter, ITER_DEST, &cmd->tvc_resp_iov,
+   iov_iter_init(&iov_iter, ITER_DEST, cmd->tvc_resp_iov,
  cmd->tvc_in_iovs, sizeof(v_rsp));
ret = copy_to_iter(&v_rsp, sizeof(v_rsp), &iov_iter);
if (likely(ret == sizeof(v_rsp))) {
@@ -594,6 +594,7 @@ vhost_scsi_get_cmd(struct vhost_virtqueue *vq, struct 
vhost_scsi_tpg *tpg,
struct vhost_scsi_cmd *cmd;
struct vhost_scsi_nexus *tv_nexus;
struct scatterlist *sg, *prot_sg;
+   struct iovec *tvc_resp_iov;
struct page **pages;
int tag;
 
@@ -613,6 +614,7 @@ vhost_scsi_get_cmd(struct vhost_virtqueue *vq, struct 
vhost_scsi_tpg *tpg,
sg = cmd->tvc_sgl;
prot_sg = cmd->tvc_prot_sgl;
pages = cmd->tvc_upages;
+   tvc_resp_iov = cmd->tvc_resp_iov;
memset(cmd, 0, sizeof(*cmd));
cmd->tvc_sgl = sg;
cmd->tvc_prot_sgl = prot_sg;
@@ -625,6 +627,7 @@ vhost_scsi_get_cmd(struct vhost_virtqueue *vq, struct 
vhost_scsi_tpg *tpg,
cmd->tvc_data_direction = data_direction;
cmd->tvc_nexus = tv_nexus;
cmd->inflight = vhost_scsi_get_inflight(vq);
+   cmd->tvc_resp_iov = tvc_resp_iov;
 
memcpy(cmd->tvc_cdb, cdb, VHOST_SCSI_MAX_CDB_SIZE);
 
@@ -935,7 +938,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct 
vhost_virtqueue *vq)
struct iov_iter in_iter, prot_iter, data_iter;
u64 tag;
u32 exp_data_len, data_direction;
-   int ret, prot_bytes, c = 0;
+   int ret, prot_bytes, i, c = 0;
u16 lun;
u8 task_attr;
bool t10_pi = vhost_has_feature(vq, VIRTIO_SCSI_F_T10_PI);
@@ -1092,7 +1095,8 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct 
vhost_virtqueue *vq)
}
cmd->tvc_vhost = vs;
cmd->tvc_vq = vq;
-   cmd->tvc_resp_iov = vq->iov[vc.out];
+   for (i = 0; i < vc.in ; i++)
+   cmd->tvc_resp_iov[i] = vq->iov[vc.out + i];
cmd->tvc_in_iovs = vc.in;
 

Re: [RFC PATCH 2/9] vringh: remove vringh_iov and unite to vringh_kiov

2023-01-10 Thread Jason Wang
On Wed, Jan 11, 2023 at 11:27 AM Shunsuke Mie  wrote:
>
>
> On 2022/12/28 15:36, Jason Wang wrote:
> > On Tue, Dec 27, 2022 at 3:06 PM Shunsuke Mie  wrote:
> >> 2022年12月27日(火) 15:04 Jason Wang :
> >>> On Tue, Dec 27, 2022 at 10:25 AM Shunsuke Mie  wrote:
>  struct vringh_iov is defined to hold userland addresses. However, to use
>  common function, __vring_iov, finally the vringh_iov converts to the
>  vringh_kiov with simple cast. It includes compile time check code to make
>  sure it can be cast correctly.
> 
>  To simplify the code, this patch removes the struct vringh_iov and 
>  unifies
>  APIs to struct vringh_kiov.
> 
>  Signed-off-by: Shunsuke Mie 
> >>> While at this, I wonder if we need to go further, that is, switch to
> >>> using an iov iterator instead of a vringh customized one.
> >> I didn't see the iov iterator yet, thank you for informing me.
> >> Is that iov_iter? https://lwn.net/Articles/625077/
> > Exactly.
>
> I've investigated the iov_iter, vhost and related APIs. As a result, I
> think that it is not easy to switch to use the iov_iter. Because, the
> design of vhost and vringh is different.

Yes, but just to make sure we are on the same page, the reason I
suggest iov_iter for vringh is that vringh itself has a customized
iter equivalent, e.g. it has iters for kernel, user, or even iotlb. At
least the kernel and userspace parts could be switched to iov_iter.
Note that it has nothing to do with vhost.

>
> The iov_iter has vring desc info and meta data of transfer method. The
> vhost provides generic transfer function for the iov_iter. In constrast,
> vringh_iov just has vring desc info. The vringh provides transfer functions
> for each methods.
>
> In the future, it is better to use common data structure and APIs between
> vhost and vringh (or merge completely), but it requires a lot of
> changes, so I'd like to just
> organize data structure in vringh as a first step in this patch.

That's fine.

Thanks

>
>
> Best
>
> > Thanks
> >
> >>> Thanks
> >>>
>  ---
>    drivers/vhost/vringh.c | 32 ++
>    include/linux/vringh.h | 45 --
>    2 files changed, 10 insertions(+), 67 deletions(-)
> 
>  diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
>  index 828c29306565..aa3cd27d2384 100644
>  --- a/drivers/vhost/vringh.c
>  +++ b/drivers/vhost/vringh.c
>  @@ -691,8 +691,8 @@ EXPORT_SYMBOL(vringh_init_user);
> * calling vringh_iov_cleanup() to release the memory, even on error!
> */
>    int vringh_getdesc_user(struct vringh *vrh,
>  -   struct vringh_iov *riov,
>  -   struct vringh_iov *wiov,
>  +   struct vringh_kiov *riov,
>  +   struct vringh_kiov *wiov,
>   bool (*getrange)(struct vringh *vrh,
>    u64 addr, struct vringh_range 
>  *r),
>   u16 *head)
>  @@ -708,26 +708,6 @@ int vringh_getdesc_user(struct vringh *vrh,
>   if (err == vrh->vring.num)
>   return 0;
> 
>  -   /* We need the layouts to be the identical for this to work */
>  -   BUILD_BUG_ON(sizeof(struct vringh_kiov) != sizeof(struct 
>  vringh_iov));
>  -   BUILD_BUG_ON(offsetof(struct vringh_kiov, iov) !=
>  -offsetof(struct vringh_iov, iov));
>  -   BUILD_BUG_ON(offsetof(struct vringh_kiov, i) !=
>  -offsetof(struct vringh_iov, i));
>  -   BUILD_BUG_ON(offsetof(struct vringh_kiov, used) !=
>  -offsetof(struct vringh_iov, used));
>  -   BUILD_BUG_ON(offsetof(struct vringh_kiov, max_num) !=
>  -offsetof(struct vringh_iov, max_num));
>  -   BUILD_BUG_ON(sizeof(struct iovec) != sizeof(struct kvec));
>  -   BUILD_BUG_ON(offsetof(struct iovec, iov_base) !=
>  -offsetof(struct kvec, iov_base));
>  -   BUILD_BUG_ON(offsetof(struct iovec, iov_len) !=
>  -offsetof(struct kvec, iov_len));
>  -   BUILD_BUG_ON(sizeof(((struct iovec *)NULL)->iov_base)
>  -!= sizeof(((struct kvec *)NULL)->iov_base));
>  -   BUILD_BUG_ON(sizeof(((struct iovec *)NULL)->iov_len)
>  -!= sizeof(((struct kvec *)NULL)->iov_len));
>  -
>   *head = err;
>   err = __vringh_iov(vrh, *head, (struct vringh_kiov *)riov,
>  (struct vringh_kiov *)wiov,
>  @@ -740,14 +720,14 @@ int vringh_getdesc_user(struct vringh *vrh,
>    EXPORT_SYMBOL(vringh_getdesc_user);
> 
>    /**
>  - * vringh_iov_pull_user - copy bytes from vring_iov.
>  + * vringh_iov_pull_user - copy bytes from vring_kiov.
> * 

Re: [RFC PATCH 4/9] vringh: unify the APIs for all accessors

2023-01-10 Thread Shunsuke Mie


On 2022/12/28 16:20, Michael S. Tsirkin wrote:

On Wed, Dec 28, 2022 at 11:24:10AM +0900, Shunsuke Mie wrote:

2022年12月27日(火) 23:37 Michael S. Tsirkin :

On Tue, Dec 27, 2022 at 07:22:36PM +0900, Shunsuke Mie wrote:

2022年12月27日(火) 16:49 Shunsuke Mie :

2022年12月27日(火) 16:04 Michael S. Tsirkin :

On Tue, Dec 27, 2022 at 11:25:26AM +0900, Shunsuke Mie wrote:

Each vringh memory accessor (for user, kern and iotlb) has its own
interface that calls common code. But some code is duplicated, and that
hurts extensibility.

Introduce a struct vringh_ops and provide a common API for all accessors.
The vringh code can then be easily extended with new memory accessors,
and the caller code is simplified.

Signed-off-by: Shunsuke Mie 
---
  drivers/vhost/vringh.c | 667 +++--
  include/linux/vringh.h | 100 +++---
  2 files changed, 225 insertions(+), 542 deletions(-)

diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index aa3cd27d2384..ebfd3644a1a3 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -35,15 +35,12 @@ static __printf(1,2) __cold void vringh_bad(const char 
*fmt, ...)
  }

  /* Returns vring->num if empty, -ve on error. */
-static inline int __vringh_get_head(const struct vringh *vrh,
- int (*getu16)(const struct vringh *vrh,
-   u16 *val, const __virtio16 *p),
- u16 *last_avail_idx)
+static inline int __vringh_get_head(const struct vringh *vrh, u16 
*last_avail_idx)
  {
   u16 avail_idx, i, head;
   int err;

- err = getu16(vrh, &avail_idx, &vrh->vring.avail->idx);
+ err = vrh->ops.getu16(vrh, &avail_idx, &vrh->vring.avail->idx);
   if (err) {
   vringh_bad("Failed to access avail idx at %p",
   &vrh->vring.avail->idx);

I like that this patch removes more lines of code than it adds.

However one of the design points of vringh abstractions is that they were
carefully written to be very low overhead.
This is why we are passing function pointers to inline functions -
compiler can optimize that out.

I think that introducing ops indirect function calls here is going to break
these assumptions and hurt performance.
Unless compiler can somehow figure it out and optimize?
I don't see how it's possible with ops pointer in memory
but maybe I'm wrong.
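The two dispatch styles under discussion can be sketched in userspace C (illustrative stand-ins, not the vringh API): a callback passed as a parameter to an inline helper, which the compiler can typically resolve and inline when the callback is a compile-time constant, versus a callback loaded from an ops struct in memory, which generally forces a real indirect call. Both compute the same result; the question is purely about codegen.

```c
#include <assert.h>

/* Style 1 (current vringh): callback passed to an inline helper.
 * With a constant callback at the call site, compilers typically
 * inline the whole thing, leaving no indirect call. */
static inline int sum_with(int (*get)(const int *p), const int *vals, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += get(&vals[i]);
    return s;
}

static int get_direct(const int *p) { return *p; }

/* Style 2 (proposed vringh_ops): callback stored in an ops struct.
 * The load from memory generally keeps the call indirect unless the
 * compiler can prove the pointer is constant. */
struct ops { int (*get)(const int *p); };

static int sum_with_ops(const struct ops *ops, const int *vals, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += ops->get(&vals[i]);
    return s;
}
```

Whether the second style actually costs anything is exactly what the perf measurements further down try to establish.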

I think your concern is correct. I have to understand the compiler
optimization and redesign this approach if it is needed.

Was any effort taken to test effect of these patches on performance?

I just tested vringh_test and already saw a small performance reduction.
I have to investigate that, as you said.

I attempted to test with perf. I found that the performance of the patched
code is almost the same as the upstream one. However, I have to investigate
why this patch leads to this result, and the profiling should also be run
on more powerful machines.

environment:
$ grep 'model name' /proc/cpuinfo
model name  : Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz
model name  : Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz
model name  : Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz
model name  : Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz

results:
* for patched code
  Performance counter stats for 'nice -n -20 ./vringh_test_patched
--parallel --eventidx --fast-vringh --indirect --virtio-1' (20 runs):

   3,028.05 msec task-clock#0.995 CPUs
utilized( +-  0.12% )
 78,150  context-switches  #   25.691 K/sec
( +-  0.00% )
  5  cpu-migrations#1.644 /sec
( +-  3.33% )
190  page-faults   #   62.461 /sec
( +-  0.41% )
  6,919,025,222  cycles#2.275 GHz
( +-  0.13% )
  8,990,220,160  instructions  #1.29  insn per
cycle   ( +-  0.04% )
  1,788,326,786  branches  #  587.899 M/sec
( +-  0.05% )
  4,557,398  branch-misses #0.25% of all
branches  ( +-  0.43% )

3.04359 +- 0.00378 seconds time elapsed  ( +-  0.12% )

* for upstream code
  Performance counter stats for 'nice -n -20 ./vringh_test_base
--parallel --eventidx --fast-vringh --indirect --virtio-1' (10 runs):

   3,058.41 msec task-clock#0.999 CPUs
utilized( +-  0.14% )
 78,149  context-switches  #   25.545 K/sec
( +-  0.00% )
  5  cpu-migrations#1.634 /sec
( +-  2.67% )
194  page-faults   #   63.414 /sec
( +-  0.43% )
  6,988,713,963  cycles#2.284 GHz
( +-  0.14% )
  8,512,533,269  instructions  #1.22  insn per
cycle   ( +-  0.04% )

Re: [RFC PATCH 2/9] vringh: remove vringh_iov and unite to vringh_kiov

2023-01-10 Thread Shunsuke Mie


On 2022/12/28 15:36, Jason Wang wrote:

On Tue, Dec 27, 2022 at 3:06 PM Shunsuke Mie  wrote:

2022年12月27日(火) 15:04 Jason Wang :

On Tue, Dec 27, 2022 at 10:25 AM Shunsuke Mie  wrote:

struct vringh_iov is defined to hold userland addresses. However, to use
the common function __vringh_iov(), the vringh_iov is ultimately converted
to vringh_kiov with a simple cast. Compile-time checks make sure it can be
cast correctly.

To simplify the code, this patch removes struct vringh_iov and unifies the
APIs around struct vringh_kiov.
APIs to struct vringh_kiov.

Signed-off-by: Shunsuke Mie 

While at this, I wonder if we need to go further, that is, switch to
using an iov iterator instead of a vringh customized one.

I didn't see the iov iterator yet, thank you for informing me.
Is that iov_iter? https://lwn.net/Articles/625077/

Exactly.


I've investigated iov_iter, vhost and the related APIs. As a result, I
think it is not easy to switch to iov_iter, because the design of vhost
and vringh is different.

The iov_iter has vring desc info and meta data of transfer method. The
vhost provides generic transfer function for the iov_iter. In constrast,
vringh_iov just has vring desc info. The vringh provides transfer functions
for each methods.

In the future, it is better to use common data structure and APIs between
vhost and vringh (or merge completely), but it requires a lot of 
changes, so I'd like to just

organize data structure in vringh as a first step in this patch.
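The cast-compatibility contract that the old code enforced with
BUILD_BUG_ON checks can be sketched in user space like this
(simplified stand-ins for the kernel structs; field sets are
illustrative, not the exact kernel definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for struct kvec / struct iovec. */
struct kvec_s  { void *iov_base; size_t iov_len; };   /* kernel addresses */
struct iovec_s { void *iov_base; size_t iov_len; };   /* userland addresses */

/* Simplified stand-ins for the two vringh containers being unified. */
struct vringh_kiov_s { struct kvec_s  *iov; size_t consumed; unsigned i, used, max_num; };
struct vringh_iov_s  { struct iovec_s *iov; size_t consumed; unsigned i, used, max_num; };

/* The old code cast vringh_iov* to vringh_kiov*, which is only safe
 * while every field lines up; unifying on one struct removes that
 * fragility along with the compile-time checks. */
int layouts_match(void)
{
	return sizeof(struct vringh_kiov_s) == sizeof(struct vringh_iov_s) &&
	       offsetof(struct vringh_kiov_s, iov) == offsetof(struct vringh_iov_s, iov) &&
	       offsetof(struct vringh_kiov_s, used) == offsetof(struct vringh_iov_s, used);
}
```

With identical layouts the cast is well-defined, but it silently breaks
if either struct ever changes, which is the fragility the patch removes.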


Best


Thanks


Thanks


---
  drivers/vhost/vringh.c | 32 ++
  include/linux/vringh.h | 45 --
  2 files changed, 10 insertions(+), 67 deletions(-)

diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 828c29306565..aa3cd27d2384 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -691,8 +691,8 @@ EXPORT_SYMBOL(vringh_init_user);
   * calling vringh_iov_cleanup() to release the memory, even on error!
   */
  int vringh_getdesc_user(struct vringh *vrh,
-   struct vringh_iov *riov,
-   struct vringh_iov *wiov,
+   struct vringh_kiov *riov,
+   struct vringh_kiov *wiov,
 bool (*getrange)(struct vringh *vrh,
  u64 addr, struct vringh_range *r),
 u16 *head)
@@ -708,26 +708,6 @@ int vringh_getdesc_user(struct vringh *vrh,
 if (err == vrh->vring.num)
 return 0;

-   /* We need the layouts to be the identical for this to work */
-   BUILD_BUG_ON(sizeof(struct vringh_kiov) != sizeof(struct vringh_iov));
-   BUILD_BUG_ON(offsetof(struct vringh_kiov, iov) !=
-offsetof(struct vringh_iov, iov));
-   BUILD_BUG_ON(offsetof(struct vringh_kiov, i) !=
-offsetof(struct vringh_iov, i));
-   BUILD_BUG_ON(offsetof(struct vringh_kiov, used) !=
-offsetof(struct vringh_iov, used));
-   BUILD_BUG_ON(offsetof(struct vringh_kiov, max_num) !=
-offsetof(struct vringh_iov, max_num));
-   BUILD_BUG_ON(sizeof(struct iovec) != sizeof(struct kvec));
-   BUILD_BUG_ON(offsetof(struct iovec, iov_base) !=
-offsetof(struct kvec, iov_base));
-   BUILD_BUG_ON(offsetof(struct iovec, iov_len) !=
-offsetof(struct kvec, iov_len));
-   BUILD_BUG_ON(sizeof(((struct iovec *)NULL)->iov_base)
-!= sizeof(((struct kvec *)NULL)->iov_base));
-   BUILD_BUG_ON(sizeof(((struct iovec *)NULL)->iov_len)
-!= sizeof(((struct kvec *)NULL)->iov_len));
-
 *head = err;
 err = __vringh_iov(vrh, *head, (struct vringh_kiov *)riov,
(struct vringh_kiov *)wiov,
@@ -740,14 +720,14 @@ int vringh_getdesc_user(struct vringh *vrh,
  EXPORT_SYMBOL(vringh_getdesc_user);

  /**
- * vringh_iov_pull_user - copy bytes from vring_iov.
+ * vringh_iov_pull_user - copy bytes from vring_kiov.
   * @riov: the riov as passed to vringh_getdesc_user() (updated as we consume)
   * @dst: the place to copy.
   * @len: the maximum length to copy.
   *
   * Returns the bytes copied <= len or a negative errno.
   */
-ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len)
+ssize_t vringh_iov_pull_user(struct vringh_kiov *riov, void *dst, size_t len)
  {
 return vringh_iov_xfer(NULL, (struct vringh_kiov *)riov,
dst, len, xfer_from_user);
@@ -755,14 +735,14 @@ ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len)
  EXPORT_SYMBOL(vringh_iov_pull_user);

  /**
- * vringh_iov_push_user - copy bytes into vring_iov.
+ * vringh_iov_push_user - copy bytes into vring_kiov.
   * @wiov: the wiov as passed to vringh_getdesc_user() (updated as we consume)
   * @src: the place to copy from.
   * @len: the maximum length to copy.
   *
   * Returns the 

Re: [PATCH v9 0/3] virtio: vdpa: new SolidNET DPU driver

2023-01-10 Thread Alvaro Karsz
> Beautiful, nice threading, thanks! :)
Sure, thanks for noticing the indentation issue in the first place :)
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v9 0/3] virtio: vdpa: new SolidNET DPU driver

2023-01-10 Thread Bjorn Helgaas
On Tue, Jan 10, 2023 at 06:56:35PM +0200, Alvaro Karsz wrote:
> This series adds a vDPA driver for SolidNET DPU.
> ...

> v9:
>   - Indent PCI id in pci_ids.h with tabs - Patch 1.
>   - Wrap patch comment log to fill 75 columns - Patch 1 + 2.

Beautiful, nice threading, thanks! :)


[PATCH v9 2/3] PCI: Avoid FLR for SolidRun SNET DPU rev 1

2023-01-10 Thread Alvaro Karsz
This patch fixes an FLR bug on the SNET DPU rev 1 by setting the
PCI_DEV_FLAGS_NO_FLR_RESET flag.

As there is already a quirk to avoid FLR (quirk_no_flr), I added a new
quirk that checks the rev ID before calling quirk_no_flr.

Without this patch, a SNET DPU rev 1 may hang when FLR is applied.

Signed-off-by: Alvaro Karsz 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/quirks.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 285acc4aacc..809d03272c2 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5343,6 +5343,14 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_no_flr);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_no_flr);
 
+/* FLR may cause the SolidRun SNET DPU (rev 0x1) to hang */
+static void quirk_no_flr_snet(struct pci_dev *dev)
+{
+   if (dev->revision == 0x1)
+   quirk_no_flr(dev);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SOLIDRUN, 0x1000, quirk_no_flr_snet);
+
 static void quirk_no_ext_tags(struct pci_dev *pdev)
 {
struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
-- 
2.32.0



[PATCH v9 3/3] virtio: vdpa: new SolidNET DPU driver.

2023-01-10 Thread Alvaro Karsz
This commit includes:
 1) The driver to manage the controlplane over vDPA bus.
 2) A HW monitor device to read health values from the DPU.

Signed-off-by: Alvaro Karsz 
Acked-by: Jason Wang 
---
 MAINTAINERS|4 +
 drivers/vdpa/Kconfig   |   18 +
 drivers/vdpa/Makefile  |1 +
 drivers/vdpa/solidrun/Makefile |6 +
 drivers/vdpa/solidrun/snet_hwmon.c |  188 +
 drivers/vdpa/solidrun/snet_main.c  | 1110 
 drivers/vdpa/solidrun/snet_vdpa.h  |  194 +
 7 files changed, 1521 insertions(+)
 create mode 100644 drivers/vdpa/solidrun/Makefile
 create mode 100644 drivers/vdpa/solidrun/snet_hwmon.c
 create mode 100644 drivers/vdpa/solidrun/snet_main.c
 create mode 100644 drivers/vdpa/solidrun/snet_vdpa.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 7f86d02cb42..78e181e5cf5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22024,6 +22024,10 @@ IFCVF VIRTIO DATA PATH ACCELERATOR
 R: Zhu Lingshan 
 F: drivers/vdpa/ifcvf/
 
+SNET DPU VIRTIO DATA PATH ACCELERATOR
+R: Alvaro Karsz 
+F: drivers/vdpa/solidrun/
+
 VIRTIO BALLOON
 M: "Michael S. Tsirkin" 
 M: David Hildenbrand 
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index 50f45d03761..7396e92b485 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -86,4 +86,22 @@ config ALIBABA_ENI_VDPA
   VDPA driver for Alibaba ENI (Elastic Network Interface) which is
   built upon virtio 0.9.5 specification.
 
+ config SNET_VDPA
+   tristate "SolidRun's vDPA driver for SolidNET"
+   depends on PCI_MSI && PCI_IOV && (HWMON || HWMON=n)
+
+   # This driver MAY create a HWMON device.
+   # Depending on (HWMON || HWMON=n) ensures that:
+   # If HWMON=n the driver can be compiled either as a module or built-in.
+   # If HWMON=y the driver can be compiled either as a module or built-in.
+   # If HWMON=m the driver is forced to be compiled as a module.
+   # By doing so, IS_ENABLED can be used instead of IS_REACHABLE
+
+   help
+ vDPA driver for SolidNET DPU.
+ With this driver, the VirtIO dataplane can be
+ offloaded to a SolidNET DPU.
+ This driver includes a HW monitor device that
+ reads health values from the DPU.
+
 endif # VDPA
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index 15665563a7f..59396ff2a31 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_IFCVF)+= ifcvf/
 obj-$(CONFIG_MLX5_VDPA) += mlx5/
 obj-$(CONFIG_VP_VDPA)+= virtio_pci/
 obj-$(CONFIG_ALIBABA_ENI_VDPA) += alibaba/
+obj-$(CONFIG_SNET_VDPA) += solidrun/
diff --git a/drivers/vdpa/solidrun/Makefile b/drivers/vdpa/solidrun/Makefile
new file mode 100644
index 000..c0aa3415bf7
--- /dev/null
+++ b/drivers/vdpa/solidrun/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_SNET_VDPA) += snet_vdpa.o
+snet_vdpa-$(CONFIG_SNET_VDPA) += snet_main.o
+ifdef CONFIG_HWMON
+snet_vdpa-$(CONFIG_SNET_VDPA) += snet_hwmon.o
+endif
diff --git a/drivers/vdpa/solidrun/snet_hwmon.c b/drivers/vdpa/solidrun/snet_hwmon.c
new file mode 100644
index 000..e695e36ff75
--- /dev/null
+++ b/drivers/vdpa/solidrun/snet_hwmon.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * SolidRun DPU driver for control plane
+ *
+ * Copyright (C) 2022 SolidRun
+ *
+ * Author: Alvaro Karsz 
+ *
+ */
+#include 
+
+#include "snet_vdpa.h"
+
+/* Monitor offsets */
+#define SNET_MON_TMP0_IN_OFF  0x00
+#define SNET_MON_TMP0_MAX_OFF 0x08
+#define SNET_MON_TMP0_CRIT_OFF0x10
+#define SNET_MON_TMP1_IN_OFF  0x18
+#define SNET_MON_TMP1_CRIT_OFF0x20
+#define SNET_MON_CURR_IN_OFF  0x28
+#define SNET_MON_CURR_MAX_OFF 0x30
+#define SNET_MON_CURR_CRIT_OFF0x38
+#define SNET_MON_PWR_IN_OFF   0x40
+#define SNET_MON_VOLT_IN_OFF  0x48
+#define SNET_MON_VOLT_CRIT_OFF0x50
+#define SNET_MON_VOLT_LCRIT_OFF   0x58
+
+static void snet_hwmon_read_reg(struct psnet *psnet, u32 reg, long *out)
+{
+   *out = psnet_read64(psnet, psnet->cfg.hwmon_off + reg);
+}
+
+static umode_t snet_howmon_is_visible(const void *data,
+ enum hwmon_sensor_types type,
+ u32 attr, int channel)
+{
+   return 0444;
+}
+
+static int snet_howmon_read(struct device *dev, enum hwmon_sensor_types type,
+   u32 attr, int channel, long *val)
+{
+   struct psnet *psnet = dev_get_drvdata(dev);
+   int ret = 0;
+
+   switch (type) {
+   case hwmon_in:
+   switch (attr) {
+   case hwmon_in_lcrit:
+   snet_hwmon_read_reg(psnet, SNET_MON_VOLT_LCRIT_OFF, val);
+   break;
+   case hwmon_in_crit:
+   snet_hwmon_read_reg(psnet, SNET_MON_VOLT_CRIT_OFF, val);
+   break;
+   case hwmon_in_input:
+   

[PATCH v9 0/3] virtio: vdpa: new SolidNET DPU driver

2023-01-10 Thread Alvaro Karsz
This series adds a vDPA driver for SolidNET DPU.

Patch 1 adds SolidRun vendor ID to pci_ids.
Patch 2 adds a PCI quirk needed by the DPU.
Patch 3 has the driver source code.

Patch 1 is prerequisite for both patch 2 and patch 3.

v2:
- Semantics fixes in commit log - Patch 1.
- Move the vendor ID to the right place, sorted by vendor ID - Patch 1.
- Update patch subject to be more meaningful and similar to
  previous quirks - Patch 2.
- Update the commit log to describe better what the patch does -
  Patch 2.
- Auto detect the BAR used for communication - Patch 3.
- When waiting for the DPU to write a config, wait for 5secs
  before giving up on the device - Patch 3.
- Return EOPNOTSUPP error code in vDPA set_vq_state callback if
  the vq state is not the same as the initial one - Patch 3.
- Implement a vDPA reset callback - Patch 3.
- Wait for an ACK when sending a message to the DPU - Patch 3.
- Add endianness comments on 64bit read/write functions - Patch 3.
- Remove the get_iova_range and free vDPA callbacks - Patch 3.
- Usage of managed device functions to ioremap a region - Patch 3.
- Call pci_set_drvdata and pci_set_master before
  vdpa_register_device - Patch 3.
- Create DMA isolation between the vDPA devices by using the
  chip SR-IOV feature.
  Every vDPA device gets a PCIe VF with its own DMA device - Patch 3.

v3:
- Validate vDPA config length while receiving the DPU's config,
  not while trying to write the vDPA config to the DPU - Patch 3.
- Request IRQs when vDPA status is set to
  VIRTIO_CONFIG_S_DRIVER_OK - Patch 3.
- Remove snet_reset_dev() from the PCI remove function for a VF - Patch 3.

v4:
- Get SolidRun vendor ID from pci_ids - Patch 3.

v5:
- Remove "select HWMON" from Kconfig.
  Usage of "depends on HWMON || HWMON=n" instead and usage of
  IS_ENABLED(CONFIG_HWMON) when calling to snet hwmon functions.
  snet_hwmon.c is compiled only if CONFIG_HWMON is defined - Patch 3.
- Remove the  #include  from snet_hwmon.c - Patch 3.
- Remove the unnecessary (long) casting from snet_hwmon_read_reg -
  Patch 3.
- Remove the "_hwmon" ending from the hwmon name - Patch 3.
- Usage of IS_ERR macro to check if devm_hwmon_device_register_with_info
  failed - Patch 3.
- Replace schedule() with usleep_range() in the "hot loop" in
  psnet_detect_bar - Patch 3.
- Remove the logging of memory allocation failures - Patch 3.
- Add parenthesis to arguments in SNET_* macros - Patch 3.
- Prefer sizeof(*variable) instead of sizeof(struct x) when allocating
  memory - Patch 3.

v6:
- SNET_WRN -> SNET_WARN - Patch 3.

v7:
- Explain the dependency of SNET_VDPA on HWMON in Kconfig - Patch 3.
- Fix snprintf size argument in psnet_open_pf_bar and
  snet_open_vf_bar - Patch 3.
- Fix kernel warning in snet_vdpa_remove_vf.
  Call pci_free_irq_vectors after vdpa_unregister_device,
  otherwise we'll get "remove_proc_entry: removing non-empty
  directory 'irq/..', leaking at least '..'" warnings - Patch 3.
- Remove the psnet_create_hwmon function empty definition if
  HWMON is not enabled, otherwise, we'll get "warning: no
  previous prototype for 'psnet_create_hwmon'" when compiling
  with W=1.
  This was reported by kernel test robot  - Patch 3.

v8:
- Fix the series versioning.
  I updated the versions of every patch separately,
  which seems to be wrong.

v9:
- Indent PCI id in pci_ids.h with tabs - Patch 1.
- Wrap patch comment log to fill 75 columns - Patch 1 + 2.

Alvaro Karsz (3):
  PCI: Add SolidRun vendor ID
  PCI: Avoid FLR for SolidRun SNET DPU rev 1
  virtio: vdpa: new SolidNET DPU driver.

 MAINTAINERS|4 +
 drivers/pci/quirks.c   |8 +
 drivers/vdpa/Kconfig   |   18 +
 drivers/vdpa/Makefile  |1 +
 drivers/vdpa/solidrun/Makefile |6 +
 drivers/vdpa/solidrun/snet_hwmon.c |  188 +
 drivers/vdpa/solidrun/snet_main.c  | 1110 
 drivers/vdpa/solidrun/snet_vdpa.h  |  194 +
 include/linux/pci_ids.h|2 +
 9 files changed, 1531 insertions(+)
 create mode 100644 drivers/vdpa/solidrun/Makefile
 create mode 100644 drivers/vdpa/solidrun/snet_hwmon.c
 create mode 100644 drivers/vdpa/solidrun/snet_main.c
 create mode 100644 drivers/vdpa/solidrun/snet_vdpa.h

-- 
2.32.0



[PATCH v9 1/3] PCI: Add SolidRun vendor ID

2023-01-10 Thread Alvaro Karsz
Add SolidRun vendor ID to pci_ids.h

The vendor ID is used in two different source files: the SNET vDPA
driver and the PCI quirks.

Signed-off-by: Alvaro Karsz 
Acked-by: Bjorn Helgaas 
---
 include/linux/pci_ids.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index b362d90eb9b..6716e6371a1 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -3092,6 +3092,8 @@
 
 #define PCI_VENDOR_ID_3COM_2   0xa727
 
+#define PCI_VENDOR_ID_SOLIDRUN 0xd063
+
 #define PCI_VENDOR_ID_DIGIUM   0xd161
 #define PCI_DEVICE_ID_DIGIUM_HFC4S 0xb410
 
-- 
2.32.0



Re: [PATCH v8 1/3] PCI: Add SolidRun vendor ID

2023-01-10 Thread Alvaro Karsz
Ok, I'll send a new version


Re: [PATCH v8 1/3] PCI: Add SolidRun vendor ID

2023-01-10 Thread Bjorn Helgaas
On Tue, Jan 10, 2023 at 05:46:37PM +0200, Alvaro Karsz wrote:
> > This should be indented with tabs instead of spaces so it matches the
> > rest of the file.
> 
> Michael, sorry about all the versions..
> Do you want me to fix it in a new version?
> I could fix it with a patch directly to the pci maintainers after your
> linux-next is merged, if this is ok with everyone.

It's not worth merging this patch and then adding another patch on top
just to convert spaces to tabs.

> > It's helpful if you can send the patches in a series so the individual
> > patches are replies to the cover letter.  That makes tools and
> > archives work better:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/5.Posting.rst?id=v6.1#n342
> 
> Yes, I fixed it in the next version:
> https://lists.linuxfoundation.org/pipermail/virtualization/2023-January/064190.html

It doesn't look fixed to me.  The "lore" archive is better than
pipermail, and the cover letter doesn't show any replies:
https://lore.kernel.org/linux-pci/20230109080429.1155046-1-alvaro.ka...@solid-run.com/

If you look at https://lore.kernel.org/linux-pci/, you'll see the
typical thread structure with patches nested under the cover letter.
The patches have "In-Reply-To" headers that reference the cover
letter.

Bjorn


Re: [PATCH net-next v9] virtio/vsock: replace virtio_vsock_pkt with sk_buff

2023-01-10 Thread Stefano Garzarella

On Sat, Jan 07, 2023 at 12:29:37AM +, Bobby Eshleman wrote:

This commit changes virtio/vsock to use sk_buff instead of
virtio_vsock_pkt. Beyond better conforming to other net code, using
sk_buff allows vsock to use sk_buff-dependent features in the future
(such as sockmap) and improves throughput.

This patch introduces the following performance changes:

Tool/Config: uperf w/ 64 threads, SOCK_STREAM
Test Runs: 5, mean of results
Before: commit 95ec6bce2a0b ("Merge branch 'net-ipa-more-endpoints'")

Test: 64KB, g2h
Before: 21.63 Gb/s
After: 25.59 Gb/s (+18%)

Test: 16B, g2h
Before: 11.86 Mb/s
After: 17.41 Mb/s (+46%)

Test: 64KB, h2g
Before: 2.15 Gb/s
After: 3.6 Gb/s (+67%)

Test: 16B, h2g
Before: 14.38 Mb/s
After: 18.43 Mb/s (+28%)

Signed-off-by: Bobby Eshleman 
Reviewed-by: Stefano Garzarella 
Acked-by: Paolo Abeni 
---

Tested using vsock_test g2h and h2g.  I'm not sure if it is standard
practice here to carry Acks and Reviews forward to future versions, but
I'm doing that here to hopefully make life easier for maintainers.
Please let me know if it is not standard practice.

Changes in v9:
- check length in rx header
- guard allocator from small requests
- squashed fix for v8 bug reported by syzbot:
   syzbot+30b72abaa17c07fe3...@syzkaller.appspotmail.com

Changes in v8:
- vhost/vsock: remove unused enum
- vhost/vsock: use spin_lock_bh() instead of spin_lock()
- vsock/loopback: use __skb_dequeue instead of skb_dequeue

Changes in v7:
- use skb_queue_empty() instead of skb_queue_empty_lockless()

Changes in v6:
- use skb->cb instead of skb->_skb_refdst
- use lock-free __skb_queue_purge for rx_queue when rx_lock held

Changes in v5:
- last_skb instead of skb: last_hdr->len = cpu_to_le32(last_skb->len)

Changes in v4:
- vdso/bits.h -> linux/bits.h
- add virtio_vsock_alloc_skb() helper
- virtio/vsock: rename buf_len -> total_len
- update last_hdr->len
- fix build_skb() for vsockmon (tested)
- add queue helpers
- use spin_{unlock/lock}_bh() instead of spin_lock()/spin_unlock()
- note: I only ran a few g2h tests to check that this change
 had no perf impact. The above data is still from patch
 v3.

Changes in v3:
- fix seqpacket bug
- use zero in vhost_add_used(..., 0) device doesn't write to buffer
- use xmas tree style declarations
- vsock_hdr() -> virtio_vsock_hdr() and other include file style fixes
- no skb merging
- save space by not using vsock_metadata
- use _skb_refdst instead of skb buffer space for flags
- use skb_pull() to keep track of read bytes instead of using an
 extra variable 'off' in the skb buffer space
- remove unnecessary sk_allocation assignment
- do not zero hdr needlessly
- introduce virtio_transport_skb_len() because skb->len changes now
- use spin_lock() directly on queue lock instead of sk_buff_head helpers
 which use spin_lock_irqsave() (e.g., skb_dequeue)
- do not reduce buffer size to be page size divisible
- Note: the biggest performance change came from loosening the spinlock
 variation and not reducing the buffer size.

Changes in v2:
- Use alloc_skb() directly instead of sock_alloc_send_pskb() to minimize
 uAPI changes.
- Do not marshal errors to -ENOMEM for non-virtio implementations.
- No longer a part of the original series
- Some code cleanup and refactoring
- Include performance stats

drivers/vhost/vsock.c   | 215 +---
include/linux/virtio_vsock.h| 129 ++--
net/vmw_vsock/virtio_transport.c| 149 +++--
net/vmw_vsock/virtio_transport_common.c | 422 +---
net/vmw_vsock/vsock_loopback.c  |  51 +--
5 files changed, 500 insertions(+), 466 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index cd6f7776013a..1f6542c7070b 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -51,8 +51,7 @@ struct vhost_vsock {
struct hlist_node hash;

struct vhost_work send_pkt_work;
-   spinlock_t send_pkt_list_lock;
-   struct list_head send_pkt_list; /* host->guest pending packets */
+   struct sk_buff_head send_pkt_queue; /* host->guest pending packets */

atomic_t queued_replies;

@@ -108,40 +107,33 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
vhost_disable_notify(>dev, vq);

do {
-   struct virtio_vsock_pkt *pkt;
+   struct virtio_vsock_hdr *hdr;
+   size_t iov_len, payload_len;
struct iov_iter iov_iter;
+   u32 flags_to_restore = 0;
+   struct sk_buff *skb;
unsigned out, in;
size_t nbytes;
-   size_t iov_len, payload_len;
int head;
-   u32 flags_to_restore = 0;

-   spin_lock_bh(>send_pkt_list_lock);
-   if (list_empty(>send_pkt_list)) {
-   spin_unlock_bh(>send_pkt_list_lock);
+   spin_lock_bh(>send_pkt_queue.lock);
+   skb = __skb_dequeue(>send_pkt_queue);
+   

Re: [PATCH v8 1/3] PCI: Add SolidRun vendor ID

2023-01-10 Thread Alvaro Karsz
Thanks Bjorn.

> This should be indented with tabs instead of spaces so it matches the
> rest of the file.

Michael, sorry about all the versions..
Do you want me to fix it in a new version?
I could fix it with a patch directly to the pci maintainers after your
linux-next is merged, if this is ok with everyone.

> It's helpful if you can send the patches in a series so the individual
> patches are replies to the cover letter.  That makes tools and
> archives work better:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/5.Posting.rst?id=v6.1#n342

Yes, I fixed it in the next version:
https://lists.linuxfoundation.org/pipermail/virtualization/2023-January/064190.html

Alvaro


Re: [PATCH v8 1/3] PCI: Add SolidRun vendor ID

2023-01-10 Thread Bjorn Helgaas
On Mon, Jan 09, 2023 at 10:04:53AM +0200, Alvaro Karsz wrote:
> Add SolidRun vendor ID to pci_ids.h
> 
> The vendor ID is used in 2 different source files,
> the SNET vDPA driver and PCI quirks.

Nits: wrap to fill 75 columns.

> Signed-off-by: Alvaro Karsz 
> Acked-by: Bjorn Helgaas 
> ---
>  include/linux/pci_ids.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index b362d90eb9b..9a3102e61db 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -3092,6 +3092,8 @@
> 
>  #define PCI_VENDOR_ID_3COM_2 0xa727
> 
> +#define PCI_VENDOR_ID_SOLIDRUN  0xd063

This should be indented with tabs instead of spaces so it matches the
rest of the file.

It's helpful if you can send the patches in a series so the individual
patches are replies to the cover letter.  That makes tools and
archives work better:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/5.Posting.rst?id=v6.1#n342


Re: [PATCH net-next v9] virtio/vsock: replace virtio_vsock_pkt with sk_buff

2023-01-10 Thread Paolo Abeni
On Tue, 2023-01-10 at 09:36 +0100, Paolo Abeni wrote:
> On Sat, 2023-01-07 at 00:29 +, Bobby Eshleman wrote:
> > This commit changes virtio/vsock to use sk_buff instead of
> > virtio_vsock_pkt. Beyond better conforming to other net code, using
> > sk_buff allows vsock to use sk_buff-dependent features in the future
> > (such as sockmap) and improves throughput.
> > 
> > This patch introduces the following performance changes:
> > 
> > Tool/Config: uperf w/ 64 threads, SOCK_STREAM
> > Test Runs: 5, mean of results
> > Before: commit 95ec6bce2a0b ("Merge branch 'net-ipa-more-endpoints'")
> > 
> > Test: 64KB, g2h
> > Before: 21.63 Gb/s
> > After: 25.59 Gb/s (+18%)
> > 
> > Test: 16B, g2h
> > Before: 11.86 Mb/s
> > After: 17.41 Mb/s (+46%)
> > 
> > Test: 64KB, h2g
> > Before: 2.15 Gb/s
> > After: 3.6 Gb/s (+67%)
> > 
> > Test: 16B, h2g
> > Before: 14.38 Mb/s
> > After: 18.43 Mb/s (+28%)
> > 
> > Signed-off-by: Bobby Eshleman 
> > Reviewed-by: Stefano Garzarella 
> > Acked-by: Paolo Abeni 
> > ---
> > 
> > Tested using vsock_test g2h and h2g.  I'm not sure if it is standard
> > practice here to carry Acks and Reviews forward to future versions, but
> > I'm doing that here to hopefully make life easier for maintainers.
> > Please let me know if it is not standard practice.
> 
> As Jakub noted, there is no clear rule for tag passing across different
> patch revisions.
> 
> Here, given the complexity of the patch and the not trivial list of
> changes, I would have preferred you would have dropped my tag.
> 
> > Changes in v9:
> > - check length in rx header
> > - guard allocator from small requests
> > - squashed fix for v8 bug reported by syzbot:
> > syzbot+30b72abaa17c07fe3...@syzkaller.appspotmail.com
> 
> It's not clear to me what/where is the fix exactly, could you please
> clarify?

Reading the syzkaller report, it looks like iov_length() in
vhost_vsock_alloc_pkt() can not be trusted to carry a reasonable value.

As such, don't you additionally need to ensure/check that iov_length()
is greater or equal to sizeof(virtio_vsock_hdr) ?

Thanks.

Paolo
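[The check being suggested amounts to rejecting any descriptor chain
shorter than the packet header before trusting any header field. A
user-space sketch, with the struct layout following the virtio-vsock
spec and the helper name hypothetical:]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct virtio_vsock_hdr (field layout per the virtio spec). */
struct vsock_hdr_sketch {
	uint64_t src_cid, dst_cid;
	uint32_t src_port, dst_port, len;
	uint16_t type, op;
	uint32_t flags, buf_alloc, fwd_cnt;
} __attribute__((packed));

/* Hypothetical guard: the iov_length() of the descriptor chain must
 * cover at least a full header before any of its fields can be read. */
static int pkt_len_ok(size_t iov_len)
{
	return iov_len >= sizeof(struct vsock_hdr_sketch);
}
```

Anything shorter than a full header would be dropped before the
allocator or the header parser ever sees it.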



Re: [PATCH net-next v9] virtio/vsock: replace virtio_vsock_pkt with sk_buff

2023-01-10 Thread Michael S. Tsirkin
On Sat, Jan 07, 2023 at 12:29:37AM +, Bobby Eshleman wrote:
> This commit changes virtio/vsock to use sk_buff instead of
> virtio_vsock_pkt. Beyond better conforming to other net code, using
> sk_buff allows vsock to use sk_buff-dependent features in the future
> (such as sockmap) and improves throughput.
> 
> This patch introduces the following performance changes:
> 
> Tool/Config: uperf w/ 64 threads, SOCK_STREAM
> Test Runs: 5, mean of results
> Before: commit 95ec6bce2a0b ("Merge branch 'net-ipa-more-endpoints'")
> 
> Test: 64KB, g2h
> Before: 21.63 Gb/s
> After: 25.59 Gb/s (+18%)
> 
> Test: 16B, g2h
> Before: 11.86 Mb/s
> After: 17.41 Mb/s (+46%)
> 
> Test: 64KB, h2g
> Before: 2.15 Gb/s
> After: 3.6 Gb/s (+67%)
> 
> Test: 16B, h2g
> Before: 14.38 Mb/s
> After: 18.43 Mb/s (+28%)

I suspect you didn't redo the perf testing since #s match exactly. Not
that I see any major changes but you never know.

> Signed-off-by: Bobby Eshleman 
> Reviewed-by: Stefano Garzarella 
> Acked-by: Paolo Abeni 

Acked-by: Michael S. Tsirkin 

> ---
> 
> Tested using vsock_test g2h and h2g.  I'm not sure if it is standard
> practice here to carry Acks and Reviews forward to future versions, but
> I'm doing that here to hopefully make life easier for maintainers.
> Please let me know if it is not standard practice.
> 
> Changes in v9:
> - check length in rx header
> - guard allocator from small requests
> - squashed fix for v8 bug reported by syzbot:
> syzbot+30b72abaa17c07fe3...@syzkaller.appspotmail.com
> 
> Changes in v8:
> - vhost/vsock: remove unused enum
> - vhost/vsock: use spin_lock_bh() instead of spin_lock()
> - vsock/loopback: use __skb_dequeue instead of skb_dequeue
> 
> Changes in v7:
> - use skb_queue_empty() instead of skb_queue_empty_lockless()
> 
> Changes in v6:
> - use skb->cb instead of skb->_skb_refdst
> - use lock-free __skb_queue_purge for rx_queue when rx_lock held
> 
> Changes in v5:
> - last_skb instead of skb: last_hdr->len = cpu_to_le32(last_skb->len)
> 
> Changes in v4:
> - vdso/bits.h -> linux/bits.h
> - add virtio_vsock_alloc_skb() helper
> - virtio/vsock: rename buf_len -> total_len
> - update last_hdr->len
> - fix build_skb() for vsockmon (tested)
> - add queue helpers
> - use spin_{unlock/lock}_bh() instead of spin_lock()/spin_unlock()
> - note: I only ran a few g2h tests to check that this change
>   had no perf impact. The above data is still from patch
>   v3.
> 
> Changes in v3:
> - fix seqpacket bug
> - use zero in vhost_add_used(..., 0) device doesn't write to buffer
> - use xmas tree style declarations
> - vsock_hdr() -> virtio_vsock_hdr() and other include file style fixes
> - no skb merging
> - save space by not using vsock_metadata
> - use _skb_refdst instead of skb buffer space for flags
> - use skb_pull() to keep track of read bytes instead of using an
>   extra variable 'off' in the skb buffer space
> - remove unnecessary sk_allocation assignment
> - do not zero hdr needlessly
> - introduce virtio_transport_skb_len() because skb->len changes now
> - use spin_lock() directly on queue lock instead of sk_buff_head helpers
>   which use spin_lock_irqsave() (e.g., skb_dequeue)
> - do not reduce buffer size to be page size divisible
> - Note: the biggest performance change came from loosening the spinlock
>   variation and not reducing the buffer size.
> 
> Changes in v2:
> - Use alloc_skb() directly instead of sock_alloc_send_pskb() to minimize
>   uAPI changes.
> - Do not marshal errors to -ENOMEM for non-virtio implementations.
> - No longer a part of the original series
> - Some code cleanup and refactoring
> - Include performance stats
> 
>  drivers/vhost/vsock.c   | 215 +---
>  include/linux/virtio_vsock.h| 129 ++--
>  net/vmw_vsock/virtio_transport.c| 149 +++--
>  net/vmw_vsock/virtio_transport_common.c | 422 +---
>  net/vmw_vsock/vsock_loopback.c  |  51 +--
>  5 files changed, 500 insertions(+), 466 deletions(-)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index cd6f7776013a..1f6542c7070b 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -51,8 +51,7 @@ struct vhost_vsock {
>   struct hlist_node hash;
>  
>   struct vhost_work send_pkt_work;
> - spinlock_t send_pkt_list_lock;
> - struct list_head send_pkt_list; /* host->guest pending packets */
> + struct sk_buff_head send_pkt_queue; /* host->guest pending packets */
>  
>   atomic_t queued_replies;
>  
> @@ -108,40 +107,33 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>   vhost_disable_notify(>dev, vq);
>  
>   do {
> - struct virtio_vsock_pkt *pkt;
> + struct virtio_vsock_hdr *hdr;
> + size_t iov_len, payload_len;
>   struct iov_iter iov_iter;
> + u32 flags_to_restore = 0;
> + struct sk_buff *skb;
>   unsigned out, in;
>   size_t nbytes;
> - 

Re: [PATCH net-next v6 4/4] test/vsock: vsock_perf utility

2023-01-10 Thread Paolo Abeni
Hi,

sorry for the late feedback, a couple of notes below...

On Sun, 2023-01-08 at 20:43 +, Arseniy Krasnov wrote:
> This adds utility to check vsock rx/tx performance.
> 
> Usage as sender:
> ./vsock_perf --sender  --port  --bytes 
> Usage as receiver:
> ./vsock_perf --port  --rcvlowat 
> 
> Signed-off-by: Arseniy Krasnov 
> Reviewed-by: Stefano Garzarella 
> ---
>  tools/testing/vsock/Makefile |   3 +-
>  tools/testing/vsock/README   |  34 +++
>  tools/testing/vsock/vsock_perf.c | 441 +++
>  3 files changed, 477 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/vsock/vsock_perf.c
> 
> diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
> index f8293c6910c9..43a254f0e14d 100644
> --- a/tools/testing/vsock/Makefile
> +++ b/tools/testing/vsock/Makefile
> @@ -1,8 +1,9 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -all: test
> +all: test vsock_perf
>  test: vsock_test vsock_diag_test
>  vsock_test: vsock_test.o timeout.o control.o util.o
>  vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
> +vsock_perf: vsock_perf.o
>  
>  CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -D_GNU_SOURCE
>  .PHONY: all test clean
> diff --git a/tools/testing/vsock/README b/tools/testing/vsock/README
> index 4d5045e7d2c3..84ee217ba8ee 100644
> --- a/tools/testing/vsock/README
> +++ b/tools/testing/vsock/README
> @@ -35,3 +35,37 @@ Invoke test binaries in both directions as follows:
> --control-port=$GUEST_IP \
> --control-port=1234 \
> --peer-cid=3
> +
> +vsock_perf utility
> +------------------
> +'vsock_perf' is a simple tool to measure vsock performance. It works in
> +sender/receiver modes: the sender connects to the peer at the specified
> +port and starts transmitting data to the receiver. After the data
> +transfer is done, it prints several metrics (see below).
> +
> +Usage:
> +# run as sender
> +# connect to CID 2, port 1234, send 1G of data, tx buf size is 1M
> +./vsock_perf --sender 2 --port 1234 --bytes 1G --buf-size 1M
> +
> +Output:
> +tx performance: A Gbits/s
> +
> +Output explanation:
> +A is calculated as "number of bits to send" / "time in tx loop"
> +
> +# run as receiver
> +# listen on port 1234, rx buf size is 1M, socket buf size is 1G, SO_RCVLOWAT is 64K
> +./vsock_perf --port 1234 --buf-size 1M --vsk-size 1G --rcvlowat 64K
> +
> +Output:
> +rx performance: A Gbits/s
> +total in 'read()': B sec
> +POLLIN wakeups: C
> +average in 'read()': D ns
> +
> +Output explanation:
> +A is calculated as "number of received bits" / "time in rx loop".
> +B is the time spent in the 'read()' system call (excluding 'poll()').
> +C is the number of 'poll()' wake-ups with the POLLIN bit set.
> +D is B / C, i.e. the average time spent in a single 'read()'.
> diff --git a/tools/testing/vsock/vsock_perf.c b/tools/testing/vsock/vsock_perf.c
> new file mode 100644
> index 000000000000..ccd595462b40
> --- /dev/null
> +++ b/tools/testing/vsock/vsock_perf.c
> @@ -0,0 +1,441 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * vsock_perf - benchmark utility for vsock.
> + *
> + * Copyright (C) 2022 SberDevices.
> + *
> + * Author: Arseniy Krasnov 
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define DEFAULT_BUF_SIZE_BYTES   (128 * 1024)
> +#define DEFAULT_TO_SEND_BYTES(64 * 1024)
> +#define DEFAULT_VSOCK_BUF_BYTES (256 * 1024)
> +#define DEFAULT_RCVLOWAT_BYTES   1
> +#define DEFAULT_PORT 1234
> +
> +#define BYTES_PER_GB (1024 * 1024 * 1024ULL)
> +#define NSEC_PER_SEC (1000000000ULL)
> +
> +static unsigned int port = DEFAULT_PORT;
> +static unsigned long buf_size_bytes = DEFAULT_BUF_SIZE_BYTES;
> +static unsigned long vsock_buf_bytes = DEFAULT_VSOCK_BUF_BYTES;
> +
> +static inline time_t current_nsec(void)

Minor nit: you should avoid 'static inline' functions in C files;
plain 'static' suffices and lets the compiler do a better job.

> +{
> + struct timespec ts;
> +
> + if (clock_gettime(CLOCK_REALTIME, &ts)) {
> + perror("clock_gettime");
> + exit(EXIT_FAILURE);
> + }
> +
> + return (ts.tv_sec * NSEC_PER_SEC) + ts.tv_nsec;
> +}
> +
> +/* From lib/cmdline.c. */
> +static unsigned long memparse(const char *ptr)
> +{
> + char *endptr;
> +
> + unsigned long long ret = strtoull(ptr, &endptr, 0);
> +
> + switch (*endptr) {
> + case 'E':
> + case 'e':
> + ret <<= 10;
> + case 'P':
> + case 'p':
> + ret <<= 10;
> + case 'T':
> + case 't':
> + ret <<= 10;
> + case 'G':
> + case 'g':
> + ret <<= 10;
> + case 'M':
> + case 'm':
> + ret <<= 10;
> + case 'K':
> 

Re: [PATCH net-next v9] virtio/vsock: replace virtio_vsock_pkt with sk_buff

2023-01-10 Thread Paolo Abeni
On Sat, 2023-01-07 at 00:29 +, Bobby Eshleman wrote:
> This commit changes virtio/vsock to use sk_buff instead of
> virtio_vsock_pkt. Beyond better conforming to other net code, using
> sk_buff allows vsock to use sk_buff-dependent features in the future
> (such as sockmap) and improves throughput.
> 
> This patch introduces the following performance changes:
> 
> Tool/Config: uperf w/ 64 threads, SOCK_STREAM
> Test Runs: 5, mean of results
> Before: commit 95ec6bce2a0b ("Merge branch 'net-ipa-more-endpoints'")
> 
> Test: 64KB, g2h
> Before: 21.63 Gb/s
> After: 25.59 Gb/s (+18%)
> 
> Test: 16B, g2h
> Before: 11.86 Mb/s
> After: 17.41 Mb/s (+46%)
> 
> Test: 64KB, h2g
> Before: 2.15 Gb/s
> After: 3.6 Gb/s (+67%)
> 
> Test: 16B, h2g
> Before: 14.38 Mb/s
> After: 18.43 Mb/s (+28%)
> 
> Signed-off-by: Bobby Eshleman 
> Reviewed-by: Stefano Garzarella 
> Acked-by: Paolo Abeni 
> ---
> 
> Tested using vsock_test g2h and h2g.  I'm not sure if it is standard
> practice here to carry Acks and Reviews forward to future versions, but
> I'm doing that here to hopefully make life easier for maintainers.
> Please let me know if it is not standard practice.

As Jakub noted, there is no clear rule for tag passing across different
patch revisions.

Here, given the complexity of the patch and the non-trivial list of
changes, I would have preferred that you drop my tag.

> Changes in v9:
> - check length in rx header
> - guard allocator from small requests
> - squashed fix for v8 bug reported by syzbot:
> syzbot+30b72abaa17c07fe3...@syzkaller.appspotmail.com

It's not clear to me what/where the fix is exactly; could you please
clarify?

Thanks!

Paolo

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization