Re: [PATCH 2/2] virtio_ring: add unlikely annotation for free descs check

2022-03-28 Thread Jason Wang
On Mon, Mar 28, 2022 at 6:58 PM Xianting Tian
 wrote:
>
> The 'if (vq->vq.num_free < descs_used)' check is almost always false, so
> mark it unlikely() to keep the failure path out of the hot path.
>
> Signed-off-by: Xianting Tian 

Acked-by: Jason Wang 

> ---
>  drivers/virtio/virtio_ring.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index d597fc0874ec..ab6d5f0cb579 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -525,7 +525,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> descs_used = total_sg;
> }
>
> -   if (vq->vq.num_free < descs_used) {
> +   if (unlikely(vq->vq.num_free < descs_used)) {
> pr_debug("Can't add buf len %i - avail = %i\n",
>  descs_used, vq->vq.num_free);
> /* FIXME: for historical reasons, we force a notify here if
> --
> 2.17.1
>
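
For context, unlikely() is the kernel's branch-prediction hint from
include/linux/compiler.h, essentially:

	# define likely(x)	__builtin_expect(!!(x), 1)
	# define unlikely(x)	__builtin_expect(!!(x), 0)

It lets the compiler lay the rarely-taken "out of descriptors" branch out
of the hot instruction stream.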



Re: [PATCH 1/2] virtio_ring: remove unnecessary to_vvq call in vring hot path

2022-03-28 Thread Jason Wang
On Mon, Mar 28, 2022 at 6:58 PM Xianting Tian
 wrote:
>
> The caller passes '_vq' to virtqueue_use_indirect(), which then calls
> to_vvq() to recover 'vq'. Since every caller already has 'vq', pass it
> directly and avoid the unnecessary to_vvq() call in the hot path.
>
> Signed-off-by: Xianting Tian 

Acked-by: Jason Wang 

> ---
>  drivers/virtio/virtio_ring.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 962f1477b1fa..d597fc0874ec 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -205,11 +205,9 @@ struct vring_virtqueue {
>
>  #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
>
> -static inline bool virtqueue_use_indirect(struct virtqueue *_vq,
> +static inline bool virtqueue_use_indirect(struct vring_virtqueue *vq,
>   unsigned int total_sg)
>  {
> -   struct vring_virtqueue *vq = to_vvq(_vq);
> -
> /*
>  * If the host supports indirect descriptor tables, and we have multiple
>  * buffers, then go indirect. FIXME: tune this threshold
> @@ -507,7 +505,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>
> head = vq->free_head;
>
> -   if (virtqueue_use_indirect(_vq, total_sg))
> +   if (virtqueue_use_indirect(vq, total_sg))
> desc = alloc_indirect_split(_vq, total_sg, gfp);
> else {
> desc = NULL;
> @@ -1194,7 +1192,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>
> BUG_ON(total_sg == 0);
>
> -   if (virtqueue_use_indirect(_vq, total_sg)) {
> +   if (virtqueue_use_indirect(vq, total_sg)) {
> err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
> in_sgs, data, gfp);
> if (err != -ENOMEM) {
> --
> 2.17.1
>



[PATCH RESEND V2 1/3] vdpa: mlx5: prevent cvq work from hogging CPU

2022-03-28 Thread Jason Wang
A userspace-triggerable infinite loop can happen in
mlx5_cvq_kick_handler() if userspace keeps sending a large number of
cvq requests.

Fix this by introducing a quota and re-queueing the work if we run out
of budget (currently the implicit budget is one). While at it, use a
per-device work struct to avoid on-demand memory allocation for the
cvq.
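
The resulting handler shape is roughly the following (a condensed sketch
with illustrative names; process_one_cvq_request() and wq are
placeholders, not the actual driver symbols):

	static void cvq_kick_handler(struct work_struct *work)
	{
		int budget = 1;	/* the implicit budget: one request per run */

		while (budget--) {
			if (!process_one_cvq_request())
				return;	/* ring drained, nothing left to do */
		}
		/* Out of budget but more requests may be pending: re-queue
		 * ourselves instead of looping, so a misbehaving guest cannot
		 * hog the workqueue thread.
		 */
		queue_work(wq, work);
	}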

Fixes: 5262912ef3cfc ("vdpa/mlx5: Add support for control VQ and MAC setting")
Signed-off-by: Jason Wang 
---
Changes since V1:
- Using 1 as the budget
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index d0f91078600e..b2afd2b6fbca 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -163,6 +163,7 @@ struct mlx5_vdpa_net {
u32 cur_num_vqs;
struct notifier_block nb;
struct vdpa_callback config_cb;
+   struct mlx5_vdpa_wq_ent cvq_ent;
 };
 
 static void free_resources(struct mlx5_vdpa_net *ndev);
@@ -1616,10 +1617,10 @@ static void mlx5_cvq_kick_handler(struct work_struct *work)
ndev = to_mlx5_vdpa_ndev(mvdev);
cvq = &mvdev->cvq;
if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
-   goto out;
+   return;
 
if (!cvq->ready)
-   goto out;
+   return;
 
while (true) {
err = vringh_getdesc_iotlb(&cvq->vring, &cvq->riov, &cvq->wiov, &cvq->head,
@@ -1653,9 +1654,10 @@ static void mlx5_cvq_kick_handler(struct work_struct *work)
 
if (vringh_need_notify_iotlb(&cvq->vring))
vringh_notify(&cvq->vring);
+
+   queue_work(mvdev->wq, &wqent->work);
+   break;
}
-out:
-   kfree(wqent);
 }
 
 static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx)
@@ -1663,7 +1665,6 @@ static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx)
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
struct mlx5_vdpa_virtqueue *mvq;
-   struct mlx5_vdpa_wq_ent *wqent;
 
if (!is_index_valid(mvdev, idx))
return;
@@ -1672,13 +1673,7 @@ static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx)
if (!mvdev->cvq.ready)
return;
 
-   wqent = kzalloc(sizeof(*wqent), GFP_ATOMIC);
-   if (!wqent)
-   return;
-
-   wqent->mvdev = mvdev;
-   INIT_WORK(&wqent->work, mlx5_cvq_kick_handler);
-   queue_work(mvdev->wq, &wqent->work);
+   queue_work(mvdev->wq, &ndev->cvq_ent.work);
return;
}
 
@@ -2668,6 +2663,8 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
if (err)
goto err_mr;
 
+   ndev->cvq_ent.mvdev = mvdev;
+   INIT_WORK(&ndev->cvq_ent.work, mlx5_cvq_kick_handler);
mvdev->wq = create_singlethread_workqueue("mlx5_vdpa_wq");
if (!mvdev->wq) {
err = -ENOMEM;
-- 
2.18.1



[PATCH RESEND V2 3/3] vdpa/mlx5: Use consistent RQT size

2022-03-28 Thread Jason Wang
From: Eli Cohen 

The current code evaluates RQT size based on the configured number of
virtqueues. This can raise an issue in the following scenario:

Assume MQ was negotiated.
1. mlx5_vdpa_set_map() gets called.
2. handle_ctrl_mq() is called setting cur_num_vqs to some value, lower
   than the configured max VQs.
3. A second set_map gets called, but now a smaller number of VQs is used
   to evaluate the size of the RQT.
4. handle_ctrl_mq() is called with a value larger than what the RQT can
   hold. This will emit errors and the driver state is compromised.

To fix this, we use a new field in struct mlx5_vdpa_net to hold the
required number of entries in the RQT. This value is evaluated in
mlx5_vdpa_set_driver_features() where we have the negotiated features
all set up.

In addition to that, we take the device's maximum RQT entry capability
into account early, when the device is added, so we don't need to
consider it again when creating the RQT.

Last, we remove the use of mlx5_vdpa_max_qps(), which just returns
max_vqs / 2, making the code clearer.
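
The hunk that actually assigns rqt_size is truncated below; conceptually
it amounts to the following sketch (the max_vqp_from_config() helper is
illustrative, not the verbatim patch):

	/* in mlx5_vdpa_set_driver_features(), once features are negotiated */
	if (ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_MQ))
		ndev->rqt_size = max_vqp_from_config(mvdev);	/* max_virtqueue_pairs */
	else
		ndev->rqt_size = 1;	/* no MQ: a single queue pair */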

Fixes: 52893733f2c5 ("vdpa/mlx5: Add multiqueue support")
Signed-off-by: Eli Cohen 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 61 +++
 1 file changed, 21 insertions(+), 40 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 53b8c1a68f90..61bec1ed0bc9 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -161,6 +161,7 @@ struct mlx5_vdpa_net {
struct mlx5_flow_handle *rx_rule_mcast;
bool setup;
u32 cur_num_vqs;
+   u32 rqt_size;
struct notifier_block nb;
struct vdpa_callback config_cb;
struct mlx5_vdpa_wq_ent cvq_ent;
@@ -204,17 +205,12 @@ static __virtio16 cpu_to_mlx5vdpa16(struct mlx5_vdpa_dev *mvdev, u16 val)
return __cpu_to_virtio16(mlx5_vdpa_is_little_endian(mvdev), val);
 }
 
-static inline u32 mlx5_vdpa_max_qps(int max_vqs)
-{
-   return max_vqs / 2;
-}
-
 static u16 ctrl_vq_idx(struct mlx5_vdpa_dev *mvdev)
 {
if (!(mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_MQ)))
return 2;
 
-   return 2 * mlx5_vdpa_max_qps(mvdev->max_vqs);
+   return mvdev->max_vqs;
 }
 
 static bool is_ctrl_vq_idx(struct mlx5_vdpa_dev *mvdev, u16 idx)
@@ -1236,25 +1232,13 @@ static void teardown_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *
 static int create_rqt(struct mlx5_vdpa_net *ndev)
 {
__be32 *list;
-   int max_rqt;
void *rqtc;
int inlen;
void *in;
int i, j;
int err;
-   int num;
-
-   if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_MQ)))
-   num = 1;
-   else
-   num = ndev->cur_num_vqs / 2;
 
-   max_rqt = min_t(int, roundup_pow_of_two(num),
-   1 << MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size));
-   if (max_rqt < 1)
-   return -EOPNOTSUPP;
-
-   inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + max_rqt * MLX5_ST_SZ_BYTES(rq_num);
+   inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + ndev->rqt_size * MLX5_ST_SZ_BYTES(rq_num);
in = kzalloc(inlen, GFP_KERNEL);
if (!in)
return -ENOMEM;
@@ -1263,12 +1247,12 @@ static int create_rqt(struct mlx5_vdpa_net *ndev)
rqtc = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
 
MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q);
-   MLX5_SET(rqtc, rqtc, rqt_max_size, max_rqt);
+   MLX5_SET(rqtc, rqtc, rqt_max_size, ndev->rqt_size);
list = MLX5_ADDR_OF(rqtc, rqtc, rq_num[0]);
-   for (i = 0, j = 0; i < max_rqt; i++, j += 2)
-   list[i] = cpu_to_be32(ndev->vqs[j % (2 * num)].virtq_id);
+   for (i = 0, j = 0; i < ndev->rqt_size; i++, j += 2)
+   list[i] = cpu_to_be32(ndev->vqs[j % ndev->cur_num_vqs].virtq_id);
 
-   MLX5_SET(rqtc, rqtc, rqt_actual_size, max_rqt);
+   MLX5_SET(rqtc, rqtc, rqt_actual_size, ndev->rqt_size);
err = mlx5_vdpa_create_rqt(&ndev->mvdev, in, inlen, &ndev->res.rqtn);
kfree(in);
if (err)
@@ -1282,19 +1266,13 @@ static int create_rqt(struct mlx5_vdpa_net *ndev)
 static int modify_rqt(struct mlx5_vdpa_net *ndev, int num)
 {
__be32 *list;
-   int max_rqt;
void *rqtc;
int inlen;
void *in;
int i, j;
int err;
 
-   max_rqt = min_t(int, roundup_pow_of_two(ndev->cur_num_vqs / 2),
-   1 << MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size));
-   if (max_rqt < 1)
-   return -EOPNOTSUPP;
-
-   inlen = MLX5_ST_SZ_BYTES(modify_rqt_in) + max_rqt * MLX5_ST_SZ_BYTES(rq_num);
+   inlen = MLX5_ST_SZ_BYTES(modify_rqt_in) + ndev->rqt_size * MLX5_ST_SZ_BYTES(rq_num);
in = kzalloc(inlen, GFP_KERNEL);
if (!in)
return -ENOMEM;
@@ -1305,10 +1283,10 @@ static int modify_rqt(struct mlx5_vdpa_net *ndev, int num)

[PATCH RESEND V2 2/3] vdpa: mlx5: synchronize driver status with CVQ

2022-03-28 Thread Jason Wang
Currently, CVQ doesn't have any synchronization with the driver
status, so the CVQ emulation code can run in the middle of:

1) device reset
2) device status change
3) map update

This can lead to several unexpected issues, such as trying to execute
a CVQ command after the driver has been torn down.

Fix this by using reslock to synchronize the CVQ emulation code with
driver status changes:

- protect the whole device reset, status changing and set_map()
  updating with reslock
- protect the CVQ handler with the reslock and check
  VIRTIO_CONFIG_S_DRIVER_OK in the CVQ handler

This will guarantee that:

1) CVQ handler won't work if VIRTIO_CONFIG_S_DRIVER_OK is not set
2) CVQ handler will see a consistent state of the driver instead of
   the partial one when it is running in the middle of the
   teardown_driver() or setup_driver().

Cc: 5262912ef3cfc ("vdpa/mlx5: Add support for control VQ and MAC setting")
Signed-off-by: Jason Wang 
---
Changes since V1:
- document the lock requirement
- protect the whole .set_map()
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 51 ++-
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index b2afd2b6fbca..53b8c1a68f90 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1616,11 +1616,17 @@ static void mlx5_cvq_kick_handler(struct work_struct *work)
mvdev = wqent->mvdev;
ndev = to_mlx5_vdpa_ndev(mvdev);
cvq = &mvdev->cvq;
+
+   mutex_lock(&ndev->reslock);
+
+   if (!(mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK))
+   goto out;
+
if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
-   return;
+   goto out;
 
if (!cvq->ready)
-   return;
+   goto out;
 
while (true) {
err = vringh_getdesc_iotlb(&cvq->vring, &cvq->riov, &cvq->wiov, &cvq->head,
@@ -1658,6 +1664,9 @@ static void mlx5_cvq_kick_handler(struct work_struct *work)
queue_work(mvdev->wq, &wqent->work);
break;
}
+
+out:
+   mutex_unlock(&ndev->reslock);
 }
 
 static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx)
@@ -2132,7 +2141,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb
goto err_mr;
 
if (!(mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK))
-   return 0;
+   goto err_mr;
 
restore_channels_info(ndev);
err = setup_driver(mvdev);
@@ -2147,12 +2156,14 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb
return err;
 }
 
+/* reslock must be held for this function */
 static int setup_driver(struct mlx5_vdpa_dev *mvdev)
 {
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
int err;
 
-   mutex_lock(&ndev->reslock);
+   WARN_ON(!mutex_is_locked(&ndev->reslock));
+
if (ndev->setup) {
mlx5_vdpa_warn(mvdev, "setup driver called for already setup 
driver\n");
err = 0;
@@ -2182,7 +2193,6 @@ static int setup_driver(struct mlx5_vdpa_dev *mvdev)
goto err_fwd;
}
ndev->setup = true;
-   mutex_unlock(&ndev->reslock);
 
return 0;
 
@@ -2193,23 +2203,23 @@ static int setup_driver(struct mlx5_vdpa_dev *mvdev)
 err_rqt:
teardown_virtqueues(ndev);
 out:
-   mutex_unlock(&ndev->reslock);
return err;
 }
 
+/* reslock must be held for this function */
 static void teardown_driver(struct mlx5_vdpa_net *ndev)
 {
-   mutex_lock(&ndev->reslock);
+
+   WARN_ON(!mutex_is_locked(&ndev->reslock));
+
if (!ndev->setup)
-   goto out;
+   return;
 
remove_fwd_to_tir(ndev);
destroy_tir(ndev);
destroy_rqt(ndev);
teardown_virtqueues(ndev);
ndev->setup = false;
-out:
-   mutex_unlock(&ndev->reslock);
 }
 
 static void clear_vqs_ready(struct mlx5_vdpa_net *ndev)
@@ -2230,6 +2240,8 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
 
print_status(mvdev, status, true);
 
+   mutex_lock(&ndev->reslock);
+
if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {
if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
err = setup_driver(mvdev);
@@ -2239,16 +2251,19 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
}
} else {
mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to be 
cleared\n");
-   return;
+   goto err_clear;
}
}
 
ndev->mvdev.status = status;
+   mutex_unlock(&ndev->reslock);
return;
 
 err_setup:
mlx5_vdpa_destroy_mr(&ndev->mvdev);
ndev->mvdev.status |= VIRTIO_CONFIG_S_FAILED;
+err_clear:
+   mutex_unlock(&ndev->reslock);
 }
 
 static int mlx5_vdpa_reset(struct vdpa_device *vdev)
@@ -2258,6 +2273,8 @@ static 

Re: [PATCH V2 3/3] vdpa/mlx5: Use consistent RQT size

2022-03-28 Thread Jason Wang
My bad, dunno what happened.

Please ignore this series, I will repost.

Thanks

On Tue, Mar 29, 2022 at 12:08 PM Jason Wang  wrote:



[PATCH V2 2/3] vdpa: mlx5: synchronize driver status with CVQ

2022-03-28 Thread Jason Wang
Currently, CVQ doesn't have any synchronization with the driver
status, so the CVQ emulation code can run in the middle of:

1) device reset
2) device status change
3) map update

This can lead to several unexpected issues, such as trying to execute
a CVQ command after the driver has been torn down.

Fix this by using reslock to synchronize the CVQ emulation code with
driver status changes:

- protect the whole device reset, status changing and set_map()
  updating with reslock
- protect the CVQ handler with the reslock and check
  VIRTIO_CONFIG_S_DRIVER_OK in the CVQ handler

This will guarantee that:

1) CVQ handler won't work if VIRTIO_CONFIG_S_DRIVER_OK is not set
2) CVQ handler will see a consistent state of the driver instead of
   the partial one when it is running in the middle of the
   teardown_driver() or setup_driver().

Cc: 5262912ef3cfc ("vdpa/mlx5: Add support for control VQ and MAC setting")
Signed-off-by: Jason Wang 
---
Changes since V1:
- document the lock requirement
- protect the whole .set_map()
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 51 ++-
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index b2afd2b6fbca..53b8c1a68f90 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1616,11 +1616,17 @@ static void mlx5_cvq_kick_handler(struct work_struct *work)
mvdev = wqent->mvdev;
ndev = to_mlx5_vdpa_ndev(mvdev);
cvq = &mvdev->cvq;
+
+   mutex_lock(&ndev->reslock);
+
+   if (!(mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK))
+   goto out;
+
if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
-   return;
+   goto out;
 
if (!cvq->ready)
-   return;
+   goto out;
 
while (true) {
err = vringh_getdesc_iotlb(&cvq->vring, &cvq->riov, &cvq->wiov, &cvq->head,
@@ -1658,6 +1664,9 @@ static void mlx5_cvq_kick_handler(struct work_struct *work)
queue_work(mvdev->wq, &wqent->work);
break;
}
+
+out:
+   mutex_unlock(&ndev->reslock);
 }
 
 static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx)
@@ -2132,7 +2141,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb
goto err_mr;
 
if (!(mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK))
-   return 0;
+   goto err_mr;
 
restore_channels_info(ndev);
err = setup_driver(mvdev);
@@ -2147,12 +2156,14 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb
return err;
 }
 
+/* reslock must be held for this function */
 static int setup_driver(struct mlx5_vdpa_dev *mvdev)
 {
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
int err;
 
-   mutex_lock(&ndev->reslock);
+   WARN_ON(!mutex_is_locked(&ndev->reslock));
+
if (ndev->setup) {
mlx5_vdpa_warn(mvdev, "setup driver called for already setup 
driver\n");
err = 0;
@@ -2182,7 +2193,6 @@ static int setup_driver(struct mlx5_vdpa_dev *mvdev)
goto err_fwd;
}
ndev->setup = true;
-   mutex_unlock(&ndev->reslock);
 
return 0;
 
@@ -2193,23 +2203,23 @@ static int setup_driver(struct mlx5_vdpa_dev *mvdev)
 err_rqt:
teardown_virtqueues(ndev);
 out:
-   mutex_unlock(&ndev->reslock);
return err;
 }
 
+/* reslock must be held for this function */
 static void teardown_driver(struct mlx5_vdpa_net *ndev)
 {
-   mutex_lock(&ndev->reslock);
+
+   WARN_ON(!mutex_is_locked(&ndev->reslock));
+
if (!ndev->setup)
-   goto out;
+   return;
 
remove_fwd_to_tir(ndev);
destroy_tir(ndev);
destroy_rqt(ndev);
teardown_virtqueues(ndev);
ndev->setup = false;
-out:
-   mutex_unlock(&ndev->reslock);
 }
 
 static void clear_vqs_ready(struct mlx5_vdpa_net *ndev)
@@ -2230,6 +2240,8 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
 
print_status(mvdev, status, true);
 
+   mutex_lock(&ndev->reslock);
+
if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {
if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
err = setup_driver(mvdev);
@@ -2239,16 +2251,19 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
}
} else {
mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to be 
cleared\n");
-   return;
+   goto err_clear;
}
}
 
ndev->mvdev.status = status;
+   mutex_unlock(&ndev->reslock);
return;
 
 err_setup:
mlx5_vdpa_destroy_mr(&ndev->mvdev);
ndev->mvdev.status |= VIRTIO_CONFIG_S_FAILED;
+err_clear:
+   mutex_unlock(&ndev->reslock);
 }
 
 static int mlx5_vdpa_reset(struct vdpa_device *vdev)
@@ -2258,6 +2273,8 @@ static 

[PATCH V2 3/3] vdpa/mlx5: Use consistent RQT size

2022-03-28 Thread Jason Wang
From: Eli Cohen 

The current code evaluates RQT size based on the configured number of
virtqueues. This can raise an issue in the following scenario:

Assume MQ was negotiated.
1. mlx5_vdpa_set_map() gets called.
2. handle_ctrl_mq() is called setting cur_num_vqs to some value, lower
   than the configured max VQs.
3. A second set_map gets called, but now a smaller number of VQs is used
   to evaluate the size of the RQT.
4. handle_ctrl_mq() is called with a value larger than what the RQT can
   hold. This will emit errors and the driver state is compromised.

To fix this, we use a new field in struct mlx5_vdpa_net to hold the
required number of entries in the RQT. This value is evaluated in
mlx5_vdpa_set_driver_features() where we have the negotiated features
all set up.

In addition to that, we take the device's maximum RQT entry capability
into account early, when the device is added, so we don't need to
consider it again when creating the RQT.

Last, we remove the use of mlx5_vdpa_max_qps(), which just returns
max_vqs / 2, making the code clearer.

Fixes: 52893733f2c5 ("vdpa/mlx5: Add multiqueue support")
Signed-off-by: Eli Cohen 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 61 +++
 1 file changed, 21 insertions(+), 40 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 53b8c1a68f90..61bec1ed0bc9 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -161,6 +161,7 @@ struct mlx5_vdpa_net {
struct mlx5_flow_handle *rx_rule_mcast;
bool setup;
u32 cur_num_vqs;
+   u32 rqt_size;
struct notifier_block nb;
struct vdpa_callback config_cb;
struct mlx5_vdpa_wq_ent cvq_ent;
@@ -204,17 +205,12 @@ static __virtio16 cpu_to_mlx5vdpa16(struct mlx5_vdpa_dev *mvdev, u16 val)
return __cpu_to_virtio16(mlx5_vdpa_is_little_endian(mvdev), val);
 }
 
-static inline u32 mlx5_vdpa_max_qps(int max_vqs)
-{
-   return max_vqs / 2;
-}
-
 static u16 ctrl_vq_idx(struct mlx5_vdpa_dev *mvdev)
 {
if (!(mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_MQ)))
return 2;
 
-   return 2 * mlx5_vdpa_max_qps(mvdev->max_vqs);
+   return mvdev->max_vqs;
 }
 
 static bool is_ctrl_vq_idx(struct mlx5_vdpa_dev *mvdev, u16 idx)
@@ -1236,25 +1232,13 @@ static void teardown_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *
 static int create_rqt(struct mlx5_vdpa_net *ndev)
 {
__be32 *list;
-   int max_rqt;
void *rqtc;
int inlen;
void *in;
int i, j;
int err;
-   int num;
-
-   if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_MQ)))
-   num = 1;
-   else
-   num = ndev->cur_num_vqs / 2;
 
-   max_rqt = min_t(int, roundup_pow_of_two(num),
-   1 << MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size));
-   if (max_rqt < 1)
-   return -EOPNOTSUPP;
-
-   inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + max_rqt * MLX5_ST_SZ_BYTES(rq_num);
+   inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + ndev->rqt_size * MLX5_ST_SZ_BYTES(rq_num);
in = kzalloc(inlen, GFP_KERNEL);
if (!in)
return -ENOMEM;
@@ -1263,12 +1247,12 @@ static int create_rqt(struct mlx5_vdpa_net *ndev)
rqtc = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
 
MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q);
-   MLX5_SET(rqtc, rqtc, rqt_max_size, max_rqt);
+   MLX5_SET(rqtc, rqtc, rqt_max_size, ndev->rqt_size);
list = MLX5_ADDR_OF(rqtc, rqtc, rq_num[0]);
-   for (i = 0, j = 0; i < max_rqt; i++, j += 2)
-   list[i] = cpu_to_be32(ndev->vqs[j % (2 * num)].virtq_id);
+   for (i = 0, j = 0; i < ndev->rqt_size; i++, j += 2)
+   list[i] = cpu_to_be32(ndev->vqs[j % ndev->cur_num_vqs].virtq_id);
 
-   MLX5_SET(rqtc, rqtc, rqt_actual_size, max_rqt);
+   MLX5_SET(rqtc, rqtc, rqt_actual_size, ndev->rqt_size);
err = mlx5_vdpa_create_rqt(&ndev->mvdev, in, inlen, &ndev->res.rqtn);
kfree(in);
if (err)
@@ -1282,19 +1266,13 @@ static int create_rqt(struct mlx5_vdpa_net *ndev)
 static int modify_rqt(struct mlx5_vdpa_net *ndev, int num)
 {
__be32 *list;
-   int max_rqt;
void *rqtc;
int inlen;
void *in;
int i, j;
int err;
 
-   max_rqt = min_t(int, roundup_pow_of_two(ndev->cur_num_vqs / 2),
-   1 << MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size));
-   if (max_rqt < 1)
-   return -EOPNOTSUPP;
-
-   inlen = MLX5_ST_SZ_BYTES(modify_rqt_in) + max_rqt * MLX5_ST_SZ_BYTES(rq_num);
+   inlen = MLX5_ST_SZ_BYTES(modify_rqt_in) + ndev->rqt_size * MLX5_ST_SZ_BYTES(rq_num);
in = kzalloc(inlen, GFP_KERNEL);
if (!in)
return -ENOMEM;
@@ -1305,10 +1283,10 @@ static int modify_rqt(struct mlx5_vdpa_net *ndev, int num)

[PATCH V2 1/3] vdpa: mlx5: prevent cvq work from hogging CPU

2022-03-28 Thread Jason Wang
A userspace-triggerable infinite loop can happen in
mlx5_cvq_kick_handler() if userspace keeps sending a large number of
cvq requests.

Fix this by introducing a quota and re-queueing the work if we run out
of budget (currently the implicit budget is one). While at it, use a
per-device work struct to avoid on-demand memory allocation for the
cvq.

Fixes: 5262912ef3cfc ("vdpa/mlx5: Add support for control VQ and MAC setting")
Signed-off-by: Jason Wang 
---
Changes since V1:
- Using 1 as the budget
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index d0f91078600e..b2afd2b6fbca 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -163,6 +163,7 @@ struct mlx5_vdpa_net {
u32 cur_num_vqs;
struct notifier_block nb;
struct vdpa_callback config_cb;
+   struct mlx5_vdpa_wq_ent cvq_ent;
 };
 
 static void free_resources(struct mlx5_vdpa_net *ndev);
@@ -1616,10 +1617,10 @@ static void mlx5_cvq_kick_handler(struct work_struct *work)
ndev = to_mlx5_vdpa_ndev(mvdev);
cvq = &mvdev->cvq;
if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)))
-   goto out;
+   return;
 
if (!cvq->ready)
-   goto out;
+   return;
 
while (true) {
err = vringh_getdesc_iotlb(&cvq->vring, &cvq->riov, &cvq->wiov, &cvq->head,
@@ -1653,9 +1654,10 @@ static void mlx5_cvq_kick_handler(struct work_struct *work)
 
if (vringh_need_notify_iotlb(&cvq->vring))
vringh_notify(&cvq->vring);
+
+   queue_work(mvdev->wq, &wqent->work);
+   break;
}
-out:
-   kfree(wqent);
 }
 
 static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx)
@@ -1663,7 +1665,6 @@ static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx)
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
struct mlx5_vdpa_virtqueue *mvq;
-   struct mlx5_vdpa_wq_ent *wqent;
 
if (!is_index_valid(mvdev, idx))
return;
@@ -1672,13 +1673,7 @@ static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx)
if (!mvdev->cvq.ready)
return;
 
-   wqent = kzalloc(sizeof(*wqent), GFP_ATOMIC);
-   if (!wqent)
-   return;
-
-   wqent->mvdev = mvdev;
-   INIT_WORK(&wqent->work, mlx5_cvq_kick_handler);
-   queue_work(mvdev->wq, &wqent->work);
+   queue_work(mvdev->wq, &ndev->cvq_ent.work);
return;
}
 
@@ -2668,6 +2663,8 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
if (err)
goto err_mr;
 
+   ndev->cvq_ent.mvdev = mvdev;
+   INIT_WORK(&ndev->cvq_ent.work, mlx5_cvq_kick_handler);
mvdev->wq = create_singlethread_workqueue("mlx5_vdpa_wq");
if (!mvdev->wq) {
err = -ENOMEM;
-- 
2.18.1



Re: [PATCH] virtio-gpu: fix a missing check to avoid NULL dereference

2022-03-28 Thread Chia-I Wu
On Sat, Mar 26, 2022 at 10:09 PM Xiaomeng Tong  wrote:
>
> 'cache_ent' could be set NULL inside virtio_gpu_cmd_get_capset()
> and it will lead to a NULL dereference by a lately use of it
> (i.e., ptr = cache_ent->caps_cache). Fix it with a NULL check.
>
> Fixes: 62fb7a5e10962 ("virtio-gpu: add 3d/virgl support")
> Signed-off-by: Xiaomeng Tong 
Reviewed-by: Chia-I Wu 
> ---
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index c708bab555c6..b0f1c4d8fd23 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -579,8 +579,10 @@ static int virtio_gpu_get_caps_ioctl(struct drm_device *dev,
> spin_unlock(&vgdev->display_info_lock);
>
> /* not in cache - need to talk to hw */
> -   virtio_gpu_cmd_get_capset(vgdev, found_valid, args->cap_set_ver,
> +   ret = virtio_gpu_cmd_get_capset(vgdev, found_valid, args->cap_set_ver,
>   &cache_ent);
> +   if (ret)
> +   return ret;
> virtio_gpu_notify(vgdev);
>
>  copy_exit:
>
> base-commit: f443e374ae131c168a065ea1748feac6b2e76613
> --
> 2.17.1
>


Re: [PATCH v1] net/mlx5: Add support for configuring max device MTU

2022-03-28 Thread Michael S. Tsirkin
On Mon, Feb 21, 2022 at 02:19:27PM +0200, Eli Cohen wrote:
> Allow an admin creating a vdpa device to specify the max MTU for the
> net device.
> 
> For example, to create a device with max MTU of 1000, the following
> command can be used:
> 
> $ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 mtu 1000
> 
> This configuration mechanism assumes that vdpa is the sole real user of
> the function. mlx5_core could theoretically change the mtu of the
> function using the ip command on the mlx5_core net device but this
> should not be done.
> 
> Reviewed-by: Si-Wei Liu

missing space before <

> Signed-off-by: Eli Cohen 
> ---
> v0 -> v1:
> Update changelog


can you rebase pls? does not seem to apply

>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 32 ++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 6156cf6e9377..be095dc1134e 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2689,6 +2689,28 @@ static int event_handler(struct notifier_block *nb, unsigned long event, void *p
>   return ret;
>  }
>  
> +static int config_func_mtu(struct mlx5_core_dev *mdev, u16 mtu)
> +{
> + int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
> + void *in;
> + int err;
> +
> + in = kvzalloc(inlen, GFP_KERNEL);
> + if (!in)
> + return -ENOMEM;
> +
> + MLX5_SET(modify_nic_vport_context_in, in, field_select.mtu, 1);
> + MLX5_SET(modify_nic_vport_context_in, in, nic_vport_context.mtu,
> +  mtu + MLX5V_ETH_HARD_MTU);
> + MLX5_SET(modify_nic_vport_context_in, in, opcode,
> +  MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
> +
> + err = mlx5_cmd_exec_in(mdev, modify_nic_vport_context, in);
> +
> + kvfree(in);
> + return err;
> +}
> +
>  static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
>const struct vdpa_dev_set_config *add_config)
>  {
> @@ -2749,6 +2771,13 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
>   mutex_init(&ndev->reslock);
>   mutex_init(&ndev->numq_lock);
>   config = &ndev->config;
> +
> + if (add_config->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) {
> + err = config_func_mtu(mdev, add_config->net.mtu);
> + if (err)
> + goto err_mtu;
> + }
> +
>   err = query_mtu(mdev, &mtu);
>   if (err)
>   goto err_mtu;
> @@ -2867,7 +2896,8 @@ static int mlx5v_probe(struct auxiliary_device *adev,
>   mgtdev->mgtdev.device = mdev->device;
>   mgtdev->mgtdev.id_table = id_table;
>   mgtdev->mgtdev.config_attr_mask = BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR) |
> -                                   BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP);
> +                                   BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP) |
> +                                   BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU);
>   mgtdev->mgtdev.max_supported_vqs =
>   MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues) + 1;
>   mgtdev->mgtdev.supported_features = get_supported_features(mdev);
> -- 
> 2.35.1



Re: [PATCH] virito-pci-modern: Remove useless DMA-32 fallback configuration

2022-03-28 Thread Michael S. Tsirkin
Typo in subj, and pls make subject match actual file name (_ not -).


On Fri, Mar 18, 2022 at 12:58:58AM +, cgel@gmail.com wrote:
> From: Minghao Chi 
> 
> As stated in [1], dma_set_mask() with a 64-bit mask never fails if
> dev->dma_mask is non-NULL. So, if it fails, the 32-bit case will also
> fail for the same reason.
> 
> Simplify code and remove some dead code accordingly.
> 
> [1]: https://lkml.org/lkml/2021/6/7/398
> 
> Reported-by: Zeal Robot 
> Signed-off-by: Minghao Chi 
> ---
>  drivers/virtio/virtio_pci_modern_dev.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> index e8b3ff2b9fbc..dff0b15a239d 100644
> --- a/drivers/virtio/virtio_pci_modern_dev.c
> +++ b/drivers/virtio/virtio_pci_modern_dev.c
> @@ -255,9 +255,6 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev)
>   }
>  
>   err = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(64));
> - if (err)
> - err = dma_set_mask_and_coherent(&pci_dev->dev,
> - DMA_BIT_MASK(32));
>   if (err)
>   dev_warn(_dev->dev, "Failed to enable 64-bit or 32-bit DMA. 
>  Trying to continue, but this might not work.\n");
>  
> -- 
> 2.25.1



Re: [PATCH v2 1/2] dt-bindings: virtio: mmio: add optional wakeup-source property

2022-03-28 Thread Michael S. Tsirkin
On Fri, Mar 25, 2022 at 09:59:45AM +0800, Minghao Xue wrote:
> Some systems want to set the interrupt of virtio_mmio device
> as a wakeup source. On such systems, we'll use the existence
> of the "wakeup-source" property as a signal of requirement.
> 
> Signed-off-by: Minghao Xue 

I don't have enough of a clue about dt to review this.
Pls get some acks from people with DT expertise.

> ---
> v1 -> v2: rename property from "virtio,wakeup" to "wakeup-source"
> 
>  Documentation/devicetree/bindings/virtio/mmio.yaml | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml 
> b/Documentation/devicetree/bindings/virtio/mmio.yaml
> index 4b7a027..160b21b 100644
> --- a/Documentation/devicetree/bindings/virtio/mmio.yaml
> +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
> @@ -31,6 +31,10 @@ properties:
>  description: Required for devices making accesses thru an IOMMU.
>  maxItems: 1
>  
> +  wakeup-source:
> +type: boolean
> +description: Required for setting irq of a virtio_mmio device as wakeup source.
> +
>  required:
>- compatible
>- reg
> -- 
> 2.7.4



[PATCH v5 4/4] drivers/net/virtio_net: Added RSS hash report control.

2022-03-28 Thread Andrew Melnychenko
Now it's possible to control the supported hash flows.
Added hash flow set/get callbacks.
Also, disabling RXH_IP_SRC/DST for TCP disables them for UDP as well.
TCP and UDP support only:
ethtool -U eth0 rx-flow-hash tcp4 sd
RXH_IP_SRC + RXH_IP_DST
ethtool -U eth0 rx-flow-hash tcp4 sdfn
RXH_IP_SRC + RXH_IP_DST + RXH_L4_B_0_1 + RXH_L4_B_2_3
The disabling happens because the VirtioNET hash type for IP doesn't
check the L4 proto; it applies to all IP packets (TCP, UDP, ICMP, etc.).
For TCP and UDP, it's possible to set IP+PORT hashes.
But disabling IP hashes will disable them for TCP and UDP simultaneously.
It's possible to set IP+PORT for TCP/UDP and disable/enable IP
for everything else (UDP, ICMP, etc.).

Signed-off-by: Andrew Melnychenko 
---
 drivers/net/virtio_net.c | 141 ++-
 1 file changed, 140 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c9472c30e8a2..17eeb4f807e3 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -231,6 +231,7 @@ struct virtnet_info {
u8 rss_key_size;
u16 rss_indir_table_size;
u32 rss_hash_types_supported;
+   u32 rss_hash_types_saved;
 
/* Has control virtqueue */
bool has_cvq;
@@ -2278,6 +2279,7 @@ static void virtnet_init_default_rss(struct virtnet_info *vi)
int i = 0;
 
vi->ctrl->rss.hash_types = vi->rss_hash_types_supported;
+   vi->rss_hash_types_saved = vi->rss_hash_types_supported;
vi->ctrl->rss.indirection_table_mask = vi->rss_indir_table_size
? vi->rss_indir_table_size - 1 : 0;
vi->ctrl->rss.unclassified_queue = 0;
@@ -2293,6 +2295,121 @@ static void virtnet_init_default_rss(struct virtnet_info *vi)
netdev_rss_key_fill(vi->ctrl->rss.key, vi->rss_key_size);
 }
 
+static void virtnet_get_hashflow(const struct virtnet_info *vi, struct ethtool_rxnfc *info)
+{
+   info->data = 0;
+   switch (info->flow_type) {
+   case TCP_V4_FLOW:
+   if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_TCPv4) {
+   info->data = RXH_IP_SRC | RXH_IP_DST |
+RXH_L4_B_0_1 | RXH_L4_B_2_3;
+   } else if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_IPv4) {
+   info->data = RXH_IP_SRC | RXH_IP_DST;
+   }
+   break;
+   case TCP_V6_FLOW:
+   if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_TCPv6) {
+   info->data = RXH_IP_SRC | RXH_IP_DST |
+RXH_L4_B_0_1 | RXH_L4_B_2_3;
+   } else if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_IPv6) {
+   info->data = RXH_IP_SRC | RXH_IP_DST;
+   }
+   break;
+   case UDP_V4_FLOW:
+   if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_UDPv4) {
+   info->data = RXH_IP_SRC | RXH_IP_DST |
+RXH_L4_B_0_1 | RXH_L4_B_2_3;
+   } else if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_IPv4) {
+   info->data = RXH_IP_SRC | RXH_IP_DST;
+   }
+   break;
+   case UDP_V6_FLOW:
+   if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_UDPv6) {
+   info->data = RXH_IP_SRC | RXH_IP_DST |
+RXH_L4_B_0_1 | RXH_L4_B_2_3;
+   } else if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_IPv6) {
+   info->data = RXH_IP_SRC | RXH_IP_DST;
+   }
+   break;
+   case IPV4_FLOW:
+   if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_IPv4)
+   info->data = RXH_IP_SRC | RXH_IP_DST;
+
+   break;
+   case IPV6_FLOW:
+   if (vi->rss_hash_types_saved & VIRTIO_NET_RSS_HASH_TYPE_IPv6)
+   info->data = RXH_IP_SRC | RXH_IP_DST;
+
+   break;
+   default:
+   info->data = 0;
+   break;
+   }
+}
+
+static bool virtnet_set_hashflow(struct virtnet_info *vi, struct ethtool_rxnfc *info)
+{
+   u32 new_hashtypes = vi->rss_hash_types_saved;
+   bool is_disable = info->data & RXH_DISCARD;
+   bool is_l4 = info->data == (RXH_IP_SRC | RXH_IP_DST | RXH_L4_B_0_1 | RXH_L4_B_2_3);
+
+   /* supports only 'sd', 'sdfn' and 'r' */
+   if (!((info->data == (RXH_IP_SRC | RXH_IP_DST)) | is_l4 | is_disable))
+   return false;
+
+   switch (info->flow_type) {
+   case TCP_V4_FLOW:
+   new_hashtypes &= ~(VIRTIO_NET_RSS_HASH_TYPE_IPv4 | VIRTIO_NET_RSS_HASH_TYPE_TCPv4);
+   if (!is_disable)
+   new_hashtypes |= VIRTIO_NET_RSS_HASH_TYPE_IPv4
+   | (is_l4 ? 

[PATCH v5 3/4] drivers/net/virtio_net: Added RSS hash report.

2022-03-28 Thread Andrew Melnychenko
Added features for RSS hash report.
If a hash is provided, it is set on the skb.
Added checks for whether RSS and/or hash report are enabled together.

Signed-off-by: Andrew Melnychenko 
---
 drivers/net/virtio_net.c | 55 +++-
 1 file changed, 49 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b5f2bb426a7b..c9472c30e8a2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -227,6 +227,7 @@ struct virtnet_info {
 
/* Host supports rss and/or hash report */
bool has_rss;
+   bool has_rss_hash_report;
u8 rss_key_size;
u16 rss_indir_table_size;
u32 rss_hash_types_supported;
@@ -1148,6 +1149,35 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
return NULL;
 }
 
+static void virtio_skb_set_hash(const struct virtio_net_hdr_v1_hash *hdr_hash,
+   struct sk_buff *skb)
+{
+   enum pkt_hash_types rss_hash_type;
+
+   if (!hdr_hash || !skb)
+   return;
+
+   switch ((int)hdr_hash->hash_report) {
+   case VIRTIO_NET_HASH_REPORT_TCPv4:
+   case VIRTIO_NET_HASH_REPORT_UDPv4:
+   case VIRTIO_NET_HASH_REPORT_TCPv6:
+   case VIRTIO_NET_HASH_REPORT_UDPv6:
+   case VIRTIO_NET_HASH_REPORT_TCPv6_EX:
+   case VIRTIO_NET_HASH_REPORT_UDPv6_EX:
+   rss_hash_type = PKT_HASH_TYPE_L4;
+   break;
+   case VIRTIO_NET_HASH_REPORT_IPv4:
+   case VIRTIO_NET_HASH_REPORT_IPv6:
+   case VIRTIO_NET_HASH_REPORT_IPv6_EX:
+   rss_hash_type = PKT_HASH_TYPE_L3;
+   break;
+   case VIRTIO_NET_HASH_REPORT_NONE:
+   default:
+   rss_hash_type = PKT_HASH_TYPE_NONE;
+   }
+   skb_set_hash(skb, (unsigned int)hdr_hash->hash_value, rss_hash_type);
+}
+
 static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
void *buf, unsigned int len, void **ctx,
unsigned int *xdp_xmit,
@@ -1182,6 +1212,8 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
return;
 
hdr = skb_vnet_hdr(skb);
+   if (dev->features & NETIF_F_RXHASH && vi->has_rss_hash_report)
+   virtio_skb_set_hash((const struct virtio_net_hdr_v1_hash *)hdr, skb);
 
if (hdr->hdr.flags & VIRTIO_NET_HDR_F_DATA_VALID)
skb->ip_summed = CHECKSUM_UNNECESSARY;
@@ -2232,7 +2264,8 @@ static bool virtnet_commit_rss_command(struct virtnet_info *vi)
sg_set_buf(&sgs[3], vi->ctrl->rss.key, sg_buf_size);
 
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
- VIRTIO_NET_CTRL_MQ_RSS_CONFIG, sgs)) {
+ vi->has_rss ? VIRTIO_NET_CTRL_MQ_RSS_CONFIG
+ : VIRTIO_NET_CTRL_MQ_HASH_CONFIG, sgs)) {
dev_warn(>dev, "VIRTIONET issue with committing RSS 
sgs\n");
return false;
}
@@ -3231,6 +3264,8 @@ static bool virtnet_validate_features(struct virtio_device *vdev)
 VIRTNET_FAIL_ON(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR,
 "VIRTIO_NET_F_CTRL_VQ") ||
 VIRTNET_FAIL_ON(vdev, VIRTIO_NET_F_RSS,
+"VIRTIO_NET_F_CTRL_VQ") ||
+VIRTNET_FAIL_ON(vdev, VIRTIO_NET_F_HASH_REPORT,
 "VIRTIO_NET_F_CTRL_VQ"))) {
return false;
}
@@ -3366,8 +3401,13 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
vi->mergeable_rx_bufs = true;
 
-   if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS)) {
+   if (virtio_has_feature(vdev, VIRTIO_NET_F_HASH_REPORT))
+   vi->has_rss_hash_report = true;
+
+   if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS))
vi->has_rss = true;
+
+   if (vi->has_rss || vi->has_rss_hash_report) {
vi->rss_indir_table_size =
virtio_cread16(vdev, offsetof(struct virtio_net_config,
rss_max_indirection_table_length));
@@ -3383,8 +3423,11 @@ static int virtnet_probe(struct virtio_device *vdev)
 
dev->hw_features |= NETIF_F_RXHASH;
}
-   if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF) ||
-   virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
+
+   if (vi->has_rss_hash_report)
+   vi->hdr_len = sizeof(struct virtio_net_hdr_v1_hash);
+   else if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF) ||
+virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
vi->hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
else
vi->hdr_len = sizeof(struct virtio_net_hdr);
@@ -3451,7 +3494,7 @@ static int virtnet_probe(struct virtio_device *vdev)
}
}
 
-   if (vi->has_rss)
+   if (vi->has_rss || 

[PATCH v5 2/4] drivers/net/virtio_net: Added basic RSS support.

2022-03-28 Thread Andrew Melnychenko
Added features for RSS.
Added initialization, the RXHASH feature and ethtool ops.
By default, RSS/RXHASH is disabled.
Virtio RSS "IPv6 extensions" hashes are disabled.
Added ethtool ops to set the key and indirection table.

Signed-off-by: Andrew Melnychenko 
---
 drivers/net/virtio_net.c | 192 +--
 1 file changed, 186 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b9ed7c55d9a0..b5f2bb426a7b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -169,6 +169,24 @@ struct receive_queue {
struct xdp_rxq_info xdp_rxq;
 };
 
+/* This structure can contain an rss message with maximum settings for indirection table and key size.
+ * Note that the default structure describing the RSS configuration, virtio_net_rss_config,
+ * contains the same info but can't hold the table values.
+ * In any case, the structure is passed to the virtio hw through sg_buf split into parts,
+ * because table sizes may differ according to the device configuration.
+ */
+#define VIRTIO_NET_RSS_MAX_KEY_SIZE 40
+#define VIRTIO_NET_RSS_MAX_TABLE_LEN    128
+struct virtio_net_ctrl_rss {
+   u32 hash_types;
+   u16 indirection_table_mask;
+   u16 unclassified_queue;
+   u16 indirection_table[VIRTIO_NET_RSS_MAX_TABLE_LEN];
+   u16 max_tx_vq;
+   u8 hash_key_length;
+   u8 key[VIRTIO_NET_RSS_MAX_KEY_SIZE];
+};
+
 /* Control VQ buffers: protected by the rtnl lock */
 struct control_buf {
struct virtio_net_ctrl_hdr hdr;
@@ -178,6 +196,7 @@ struct control_buf {
u8 allmulti;
__virtio16 vid;
__virtio64 offloads;
+   struct virtio_net_ctrl_rss rss;
 };
 
 struct virtnet_info {
@@ -206,6 +225,12 @@ struct virtnet_info {
/* Host will merge rx buffers for big packets (shake it! shake it!) */
bool mergeable_rx_bufs;
 
+   /* Host supports rss and/or hash report */
+   bool has_rss;
+   u8 rss_key_size;
+   u16 rss_indir_table_size;
+   u32 rss_hash_types_supported;
+
/* Has control virtqueue */
bool has_cvq;
 
@@ -2184,6 +2209,57 @@ static void virtnet_get_ringparam(struct net_device *dev,
ring->tx_pending = ring->tx_max_pending;
 }
 
+static bool virtnet_commit_rss_command(struct virtnet_info *vi)
+{
+   struct net_device *dev = vi->dev;
+   struct scatterlist sgs[4];
+   unsigned int sg_buf_size;
+
+   /* prepare sgs */
+   sg_init_table(sgs, 4);
+
+   sg_buf_size = offsetof(struct virtio_net_ctrl_rss, indirection_table);
+   sg_set_buf(&sgs[0], &vi->ctrl->rss, sg_buf_size);
+
+   sg_buf_size = sizeof(uint16_t) * (vi->ctrl->rss.indirection_table_mask + 1);
+   sg_set_buf(&sgs[1], vi->ctrl->rss.indirection_table, sg_buf_size);
+
+   sg_buf_size = offsetof(struct virtio_net_ctrl_rss, key)
+   - offsetof(struct virtio_net_ctrl_rss, max_tx_vq);
+   sg_set_buf(&sgs[2], &vi->ctrl->rss.max_tx_vq, sg_buf_size);
+
+   sg_buf_size = vi->rss_key_size;
+   sg_set_buf(&sgs[3], vi->ctrl->rss.key, sg_buf_size);
+
+   if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
+ VIRTIO_NET_CTRL_MQ_RSS_CONFIG, sgs)) {
+   dev_warn(>dev, "VIRTIONET issue with committing RSS 
sgs\n");
+   return false;
+   }
+   return true;
+}
+
+static void virtnet_init_default_rss(struct virtnet_info *vi)
+{
+   u32 indir_val = 0;
+   int i = 0;
+
+   vi->ctrl->rss.hash_types = vi->rss_hash_types_supported;
+   vi->ctrl->rss.indirection_table_mask = vi->rss_indir_table_size
+   ? vi->rss_indir_table_size - 1 : 0;
+   vi->ctrl->rss.unclassified_queue = 0;
+
+   for (; i < vi->rss_indir_table_size; ++i) {
+   indir_val = ethtool_rxfh_indir_default(i, vi->curr_queue_pairs);
+   vi->ctrl->rss.indirection_table[i] = indir_val;
+   }
+
+   vi->ctrl->rss.max_tx_vq = vi->curr_queue_pairs;
+   vi->ctrl->rss.hash_key_length = vi->rss_key_size;
+
+   netdev_rss_key_fill(vi->ctrl->rss.key, vi->rss_key_size);
+}
+
 
 static void virtnet_get_drvinfo(struct net_device *dev,
struct ethtool_drvinfo *info)
@@ -2412,6 +2488,71 @@ static void virtnet_update_settings(struct virtnet_info *vi)
vi->duplex = duplex;
 }
 
+static u32 virtnet_get_rxfh_key_size(struct net_device *dev)
+{
+   return ((struct virtnet_info *)netdev_priv(dev))->rss_key_size;
+}
+
+static u32 virtnet_get_rxfh_indir_size(struct net_device *dev)
+{
+   return ((struct virtnet_info *)netdev_priv(dev))->rss_indir_table_size;
+}
+
+static int virtnet_get_rxfh(struct net_device *dev, u32 *indir, u8 *key, u8 *hfunc)
+{
+   struct virtnet_info *vi = netdev_priv(dev);
+   int i;
+
+   if (indir) {
+   for (i = 0; i < vi->rss_indir_table_size; ++i)
+   indir[i] = vi->ctrl->rss.indirection_table[i];
+   }
+
+   

[PATCH v5 1/4] drivers/net/virtio_net: Fixed padded vheader to use v1 with hash.

2022-03-28 Thread Andrew Melnychenko
The v1 header provides additional info about RSS.
Added changes to compute the proper header length.
In the next patches, the header may contain RSS hash info
for hash population.

Signed-off-by: Andrew Melnychenko 
---
 drivers/net/virtio_net.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a801ea40908f..b9ed7c55d9a0 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -242,13 +242,13 @@ struct virtnet_info {
 };
 
 struct padded_vnet_hdr {
-   struct virtio_net_hdr_mrg_rxbuf hdr;
+   struct virtio_net_hdr_v1_hash hdr;
/*
 * hdr is in a separate sg buffer, and data sg buffer shares same page
 * with this header sg. This padding makes next sg 16 byte aligned
 * after the header.
 */
-   char padding[4];
+   char padding[12];
 };
 
 static bool is_xdp_frame(void *ptr)
@@ -396,7 +396,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 
hdr_len = vi->hdr_len;
if (vi->mergeable_rx_bufs)
-   hdr_padded_len = sizeof(*hdr);
+   hdr_padded_len = hdr_len;
else
hdr_padded_len = sizeof(struct padded_vnet_hdr);
 
@@ -1266,7 +1266,8 @@ static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
  struct ewma_pkt_len *avg_pkt_len,
  unsigned int room)
 {
-   const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+   struct virtnet_info *vi = rq->vq->vdev->priv;
+   const size_t hdr_len = vi->hdr_len;
unsigned int len;
 
if (room)
@@ -2851,7 +2852,7 @@ static void virtnet_del_vqs(struct virtnet_info *vi)
  */
static unsigned int mergeable_min_buf_len(struct virtnet_info *vi, struct virtqueue *vq)
 {
-   const unsigned int hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+   const unsigned int hdr_len = vi->hdr_len;
unsigned int rq_size = virtqueue_get_vring_size(vq);
unsigned int packet_len = vi->big_packets ? IP_MAX_MTU : vi->dev->max_mtu;
unsigned int buf_len = hdr_len + ETH_HLEN + VLAN_HLEN + packet_len;
-- 
2.35.1



[PATCH v5 0/4] RSS support for VirtioNet.

2022-03-28 Thread Andrew Melnychenko
Virtio-net supports "hardware" RSS with toeplitz key.
Also, it allows receiving calculated hash in vheader
that may be used with RPS.
Added ethtools callbacks to manipulate RSS.

Technically hash calculation may be set only for
SRC+DST and SRC+DST+PORTSRC+PORTDST hashflows.
The completely disabling hash calculation for TCP or UDP
would disable hash calculation for IP.

RSS/RXHASH is disabled by default.

Changes since v4:
* Refactored code.
* Fixed sparse warnings, there would be warnings for using
  restricted fields in vnet header structure.

Changes since v3:
* Moved hunks a bit.
* Added indirection table zero size check
  for hash report only feature.
* Added virtio_skb_set_hash() helper instead of in-place routine.

Changes since v2:
* Fixed issue with calculating padded header length.
  During review/tests, there was found an issue that
  will crash the kernel if VIRTIO_NET_F_MRG_RXBUF
  was not set. (thx to Jason Wang )
* Refactored the code according to review.

Changes since v1:
* Refactored virtnet_set_hashflow.
* Refactored virtio_net_ctrl_rss.
* Moved hunks between patches a bit.

Changes since rfc:
* Code refactored.
* Patches reformatted.
* Added feature validation.

Andrew (1):
  drivers/net/virtio_net: Added RSS hash report control.

Andrew Melnychenko (3):
  drivers/net/virtio_net: Fixed padded vheader to use v1 with hash.
  drivers/net/virtio_net: Added basic RSS support.
  drivers/net/virtio_net: Added RSS hash report.

 drivers/net/virtio_net.c | 389 +--
 1 file changed, 376 insertions(+), 13 deletions(-)

-- 
2.35.1



Re: [PATCH v3 2/2] virtio-blk: support mq_ops->queue_rqs()

2022-03-28 Thread Stefan Hajnoczi
On Thu, Mar 24, 2022 at 11:04:50PM +0900, Suwan Kim wrote:
> @@ -367,6 +381,66 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>   return BLK_STS_OK;
>  }
>  
> +static bool virtblk_prep_rq_batch(struct virtio_blk_vq *vq, struct request *req)
> +{
> + struct virtio_blk *vblk = req->mq_hctx->queue->queuedata;
> + struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
> + unsigned long flags;
> + int num, err;
> +
> + req->mq_hctx->tags->rqs[req->tag] = req;
> +
> + if (virtblk_prep_rq(req->mq_hctx, vblk, req, vbr, &num) != BLK_STS_OK)
> + return false;
> +
> + spin_lock_irqsave(&vq->lock, flags);
> + err = virtblk_add_req(vq->vq, vbr, vbr->sg_table.sgl, num);
> + if (err) {
> + spin_unlock_irqrestore(&vq->lock, flags);
> + virtblk_unmap_data(req, vbr);
> + virtblk_cleanup_cmd(req);
> + return false;
> + }
> + spin_unlock_irqrestore(&vq->lock, flags);

Simplification:

  spin_lock_irqsave(&vq->lock, flags);
  err = virtblk_add_req(vq->vq, vbr, vbr->sg_table.sgl, num);
  spin_unlock_irqrestore(&vq->lock, flags);
  if (err) {
  virtblk_unmap_data(req, vbr);
  virtblk_cleanup_cmd(req);
  return false;
  }

> +
> + return true;
> +}
> +
> +static void virtio_queue_rqs(struct request **rqlist)
> +{
> + struct request *req, *next, *prev = NULL;
> + struct request *requeue_list = NULL;
> +
> + rq_list_for_each_safe(rqlist, req, next) {
> + struct virtio_blk_vq *vq = req->mq_hctx->driver_data;
> + unsigned long flags;
> + bool kick;
> +
> + if (!virtblk_prep_rq_batch(vq, req)) {
> + rq_list_move(rqlist, &requeue_list, req, prev);
> + req = prev;
> +
> + if (!req)
> + continue;
> + }
> +
> + if (!next || req->mq_hctx != next->mq_hctx) {
> + spin_lock_irqsave(&vq->lock, flags);

Did you try calling virtblk_add_req() here to avoid acquiring and
releasing the lock multiple times? In other words, do virtblk_prep_rq()
but wait until we get here to do virtblk_add_req().

I don't know if it has any measurable effect on performance or maybe the
code would become too complex, but I noticed that we're not fully
exploiting batching.
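
Something along these lines, reusing the names from this patch (an
untested sketch; error handling omitted, and the per-request sg count
would have to be stashed somewhere by virtblk_prep_rq(), shown here as a
hypothetical vbr->sg_num field):

	spin_lock_irqsave(&vq->lock, flags);
	rq_list_for_each(&submit_list, req) {
		struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);

		/* add the whole per-hctx batch under one lock acquisition */
		virtblk_add_req(vq->vq, vbr, vbr->sg_table.sgl, vbr->sg_num);
	}
	kick = virtqueue_kick_prepare(vq->vq);
	spin_unlock_irqrestore(&vq->lock, flags);
	if (kick)
		virtqueue_notify(vq->vq);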

> + kick = virtqueue_kick_prepare(vq->vq);
> + spin_unlock_irqrestore(&vq->lock, flags);
> + if (kick)
> + virtqueue_notify(vq->vq);
> +
> + req->rq_next = NULL;
> + *rqlist = next;
> + prev = NULL;
> + } else
> + prev = req;

What guarantees that req is still alive after we called
virtblk_add_req()? The device may have seen it and completed it already
by the time we get here.



Re: [PATCH v3 1/2] virtio-blk: support polling I/O

2022-03-28 Thread Stefan Hajnoczi
On Thu, Mar 24, 2022 at 11:04:49PM +0900, Suwan Kim wrote:
> +static int virtblk_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch 
> *iob)
> +{
> + struct virtio_blk_vq *vq = hctx->driver_data;
> + struct virtblk_req *vbr;
> + unsigned long flags;
> + unsigned int len;
> + int found = 0;
> +
> + spin_lock_irqsave(&vq->lock, flags);
> +
> + while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) {
> + struct request *req = blk_mq_rq_from_pdu(vbr);
> +
> + found++;
> + if (!blk_mq_add_to_batch(req, iob, vbr->status,
> + virtblk_complete_batch))
> + blk_mq_complete_request(req);
> + }
> +
> + spin_unlock_irqrestore(&vq->lock, flags);

virtblk_done() does:

  /* In case queue is stopped waiting for more buffers. */
  if (req_done)
  blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);

Is the same thing needed here in virtblk_poll() so that stopped queues
are restarted when requests complete?
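
i.e. something like this at the end of virtblk_poll() (a sketch only;
vblk would need to be fetched via hctx->queue->queuedata here):

	spin_unlock_irqrestore(&vq->lock, flags);

	if (found)
		blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);

	return found;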



Re:

2022-03-28 Thread Michael S. Tsirkin
On Mon, Mar 28, 2022 at 02:18:22PM +0800, Jason Wang wrote:
> On Mon, Mar 28, 2022 at 1:59 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Mar 28, 2022 at 12:56:41PM +0800, Jason Wang wrote:
> > > On Fri, Mar 25, 2022 at 6:10 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Fri, Mar 25, 2022 at 05:20:19PM +0800, Jason Wang wrote:
> > > > > On Fri, Mar 25, 2022 at 5:10 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, Mar 25, 2022 at 03:52:00PM +0800, Jason Wang wrote:
> > > > > > > On Fri, Mar 25, 2022 at 2:31 PM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Bcc:
> > > > > > > > Subject: Re: [PATCH 3/3] virtio: harden vring IRQ
> > > > > > > > Message-ID: <20220325021422-mutt-send-email-...@kernel.org>
> > > > > > > > Reply-To:
> > > > > > > > In-Reply-To: 
> > > > > > > >
> > > > > > > > On Fri, Mar 25, 2022 at 11:04:08AM +0800, Jason Wang wrote:
> > > > > > > > >
> > > > > > > > > On 2022/3/24 at 7:03 PM, Michael S. Tsirkin wrote:
> > > > > > > > > > On Thu, Mar 24, 2022 at 04:40:04PM +0800, Jason Wang wrote:
> > > > > > > > > > > This is a rework of the previous IRQ hardening that was done for
> > > > > > > > > > > virtio-pci, where several drawbacks were found and were reverted:
> > > > > > > > > > >
> > > > > > > > > > > 1) it tried to use IRQF_NO_AUTOEN, which is not friendly to
> > > > > > > > > > > affinity-managed IRQs used by some devices such as virtio-blk
> > > > > > > > > > > 2) it was done only for the PCI transport
> > > > > > > > > > >
> > > > > > > > > > > In this patch, we try to borrow the idea from the INTX IRQ hardening
> > > > > > > > > > > in the reverted commit 080cd7c3ac87 ("virtio-pci: harden INTX
> > > > > > > > > > > interrupts") by introducing an irq_soft_enabled variable in each
> > > > > > > > > > > virtio_device. Then we can toggle it during
> > > > > > > > > > > virtio_reset_device()/virtio_device_ready(). A synchronize_rcu() is
> > > > > > > > > > > used in virtio_reset_device() to synchronize with the IRQ handlers.
> > > > > > > > > > > In the future, we may provide config_ops for transports that don't
> > > > > > > > > > > use IRQ. With this, vring_interrupt() can check and return early if
> > > > > > > > > > > irq_soft_enabled is false. This leads to smp_load_acquire() being
> > > > > > > > > > > used, but the cost should be acceptable.
> > > > > > > > > > Maybe it should be but is it? Can't we use synchronize_irq 
> > > > > > > > > > instead?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > Even if we allow the transport driver to synchronize through
> > > > > > > > > synchronize_irq() we still need a check in the 
> > > > > > > > > vring_interrupt().
> > > > > > > > >
> > > > > > > > > We do something like the following previously:
> > > > > > > > >
> > > > > > > > > if (!READ_ONCE(vp_dev->intx_soft_enabled))
> > > > > > > > > return IRQ_NONE;
> > > > > > > > >
> > > > > > > > > But it looks like a bug since speculative read can be done 
> > > > > > > > > before the check
> > > > > > > > > where the interrupt handler can't see the uncommitted setup 
> > > > > > > > > which is done by
> > > > > > > > > the driver.
> > > > > > > >
> > > > > > > > I don't think so - if you sync after setting the value then
> > > > > > > > you are guaranteed that any handler running afterwards
> > > > > > > > will see the new value.
> > > > > > >
> > > > > > > The problem is not the disable but the enable.
> > > > > >
> > > > > > So a misbehaving device can lose interrupts? That's not a problem 
> > > > > > at all
> > > > > > imo.
> > > > >
> > > > > It's the interrupt raised before setting irq_soft_enabled to true:
> > > > >
> > > > > CPU 0 probe) driver specific setup (not committed)
> > > > > CPU 1 IRQ handler) read the uninitialized variable
> > > > > CPU 0 probe) set irq_soft_enabled to true
> > > > > CPU 1 IRQ handler) read irq_soft_enabled as true
> > > > > CPU 1 IRQ handler) use the uninitialized variable
> > > > >
> > > > > Thanks
> > > >
> > > > Yea, it hurts if you do it.  So do not do it then ;).
> > > >
> > > > irq_soft_enabled (I think driver_ok or status is a better name)
> > >
> > > I can change it to driver_ok.
> > >
> > > > should be initialized to false *before* irq is requested.
> > > >
> > > > And requesting irq commits all memory otherwise all drivers would be
> > > > broken,
> > >
> > > So I think we might be talking about different issues:
> > >
> > > 1) Whether request_irq() commits the previous setups, I think the
> > > answer is yes, since the spin_unlock of desc->lock (release) can
> > > guarantee this, though there seems to be no documentation around
> > > request_irq() that says this.
> > >
> > > And I can see at least drivers/video/fbdev/omap2/omapfb/dss/dispc.c is
> > > using smp_wmb() before the request_irq().
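
To make the ordering pattern in this exchange concrete: what is being debated
is a release store on the probe path paired with an acquire load in the
interrupt handler. A hedged sketch using the names from the thread
(irq_soft_enabled is the field proposed by the patch, not an existing kernel
API, and the surrounding functions are simplified):

	/* Probe path (CPU 0): publish only after driver-specific setup. */
	do_driver_specific_setup(vdev);			/* hypothetical helper */
	smp_store_release(&vdev->irq_soft_enabled, true);

	/*
	 * IRQ path (CPU 1): the acquire pairs with the release above, so a
	 * handler that observes 'true' also observes the completed setup.
	 */
	irqreturn_t vring_interrupt(int irq, void *_vq)
	{
		struct vring_virtqueue *vq = to_vvq(_vq);

		if (!smp_load_acquire(&vq->vq.vdev->irq_soft_enabled))
			return IRQ_NONE;

		/* ... normal virtqueue callback dispatch ... */
		return IRQ_HANDLED;
	}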

Re: [PATCH] virtio_ring: remove unnecessary to_vvq call in vring hot path

2022-03-28 Thread Stefano Garzarella

On Thu, Mar 24, 2022 at 03:33:40PM +0800, Xianting Tian wrote:

It passes '_vq' to virtqueue_use_indirect(), which then calls
to_vvq to get 'vq'; let's pass 'vq' directly. This avoids an
unnecessary call to to_vvq in the hot path.


It seems reasonable to me.



Other tiny optimization:
Add unlikely to "if (vq->vq.num_free < descs_used)".


Better to do this change in another patch.

Thanks,
Stefano
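
For reference, the hint being discussed only affects code placement and
static branch prediction; in the kernel it expands to GCC's __builtin_expect
(simplified from include/linux/compiler.h, ignoring the branch-profiling
variants):

	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)

The compiler moves the unlikely branch out of the straight-line path, which
keeps the hot path dense in the instruction cache; semantics are unchanged.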



Signed-off-by: Xianting Tian 
---
drivers/virtio/virtio_ring.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 962f1477b1fa..ab6d5f0cb579 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -205,11 +205,9 @@ struct vring_virtqueue {

#define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)

-static inline bool virtqueue_use_indirect(struct virtqueue *_vq,
+static inline bool virtqueue_use_indirect(struct vring_virtqueue *vq,
  unsigned int total_sg)
{
-   struct vring_virtqueue *vq = to_vvq(_vq);
-
/*
 * If the host supports indirect descriptor tables, and we have multiple
 * buffers, then go indirect. FIXME: tune this threshold
@@ -507,7 +505,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,

head = vq->free_head;

-   if (virtqueue_use_indirect(_vq, total_sg))
+   if (virtqueue_use_indirect(vq, total_sg))
desc = alloc_indirect_split(_vq, total_sg, gfp);
else {
desc = NULL;
@@ -527,7 +525,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
descs_used = total_sg;
}

-   if (vq->vq.num_free < descs_used) {
+   if (unlikely(vq->vq.num_free < descs_used)) {
pr_debug("Can't add buf len %i - avail = %i\n",
 descs_used, vq->vq.num_free);
/* FIXME: for historical reasons, we force a notify here if
@@ -1194,7 +1192,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,

BUG_ON(total_sg == 0);

-   if (virtqueue_use_indirect(_vq, total_sg)) {
+   if (virtqueue_use_indirect(vq, total_sg)) {
err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
in_sgs, data, gfp);
if (err != -ENOMEM) {
--
2.17.1



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re:

2022-03-28 Thread Jason Wang
On Mon, Mar 28, 2022 at 1:59 PM Michael S. Tsirkin  wrote:
>
> On Mon, Mar 28, 2022 at 12:56:41PM +0800, Jason Wang wrote:
> > On Fri, Mar 25, 2022 at 6:10 PM Michael S. Tsirkin  wrote:
> > >
> > > On Fri, Mar 25, 2022 at 05:20:19PM +0800, Jason Wang wrote:
> > > > On Fri, Mar 25, 2022 at 5:10 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Fri, Mar 25, 2022 at 03:52:00PM +0800, Jason Wang wrote:
> > > > > > On Fri, Mar 25, 2022 at 2:31 PM Michael S. Tsirkin 
> > > > > >  wrote:
> > > > > > >
> > > > > > > Bcc:
> > > > > > > Subject: Re: [PATCH 3/3] virtio: harden vring IRQ
> > > > > > > Message-ID: <20220325021422-mutt-send-email-...@kernel.org>
> > > > > > > Reply-To:
> > > > > > > In-Reply-To: 
> > > > > > >
> > > > > > > On Fri, Mar 25, 2022 at 11:04:08AM +0800, Jason Wang wrote:
> > > > > > > >
> > > > > > > > On 2022/3/24 7:03 PM, Michael S. Tsirkin wrote:
> > > > > > > > > On Thu, Mar 24, 2022 at 04:40:04PM +0800, Jason Wang wrote:
> > > > > > > > > > This is a rework of the previous IRQ hardening that was done for
> > > > > > > > > > virtio-pci, where several drawbacks were found and were reverted:
> > > > > > > > > >
> > > > > > > > > > 1) it tried to use IRQF_NO_AUTOEN, which is not friendly to
> > > > > > > > > > affinity-managed IRQs used by some devices such as virtio-blk
> > > > > > > > > > 2) it was done only for the PCI transport
> > > > > > > > > >
> > > > > > > > > > In this patch, we try to borrow the idea from the INTX IRQ hardening
> > > > > > > > > > in the reverted commit 080cd7c3ac87 ("virtio-pci: harden INTX
> > > > > > > > > > interrupts") by introducing an irq_soft_enabled variable in each
> > > > > > > > > > virtio_device. Then we can toggle it during
> > > > > > > > > > virtio_reset_device()/virtio_device_ready(). A synchronize_rcu() is
> > > > > > > > > > used in virtio_reset_device() to synchronize with the IRQ handlers.
> > > > > > > > > > In the future, we may provide config_ops for transports that don't
> > > > > > > > > > use IRQ. With this, vring_interrupt() can check and return early if
> > > > > > > > > > irq_soft_enabled is false. This leads to smp_load_acquire() being
> > > > > > > > > > used, but the cost should be acceptable.
> > > > > > > > > Maybe it should be but is it? Can't we use synchronize_irq 
> > > > > > > > > instead?
> > > > > > > >
> > > > > > > >
> > > > > > > Even if we allow the transport driver to synchronize through
> > > > > > > > synchronize_irq() we still need a check in the 
> > > > > > > > vring_interrupt().
> > > > > > > >
> > > > > > > > We do something like the following previously:
> > > > > > > >
> > > > > > > > if (!READ_ONCE(vp_dev->intx_soft_enabled))
> > > > > > > > return IRQ_NONE;
> > > > > > > >
> > > > > > > > But it looks like a bug since speculative read can be done 
> > > > > > > > before the check
> > > > > > > > where the interrupt handler can't see the uncommitted setup 
> > > > > > > > which is done by
> > > > > > > > the driver.
> > > > > > >
> > > > > > > I don't think so - if you sync after setting the value then
> > > > > > > you are guaranteed that any handler running afterwards
> > > > > > > will see the new value.
> > > > > >
> > > > > > The problem is not the disable but the enable.
> > > > >
> > > > > So a misbehaving device can lose interrupts? That's not a problem at 
> > > > > all
> > > > > imo.
> > > >
> > > > It's the interrupt raised before setting irq_soft_enabled to true:
> > > >
> > > > CPU 0 probe) driver specific setup (not committed)
> > > > CPU 1 IRQ handler) read the uninitialized variable
> > > > CPU 0 probe) set irq_soft_enabled to true
> > > > CPU 1 IRQ handler) read irq_soft_enabled as true
> > > > CPU 1 IRQ handler) use the uninitialized variable
> > > >
> > > > Thanks
> > >
> > > Yea, it hurts if you do it.  So do not do it then ;).
> > >
> > > irq_soft_enabled (I think driver_ok or status is a better name)
> >
> > I can change it to driver_ok.
> >
> > > should be initialized to false *before* irq is requested.
> > >
> > > And requesting irq commits all memory otherwise all drivers would be
> > > broken,
> >
> > So I think we might be talking about different issues:
> >
> > 1) Whether request_irq() commits the previous setups, I think the
> > answer is yes, since the spin_unlock of desc->lock (release) can
> > guarantee this, though there seems to be no documentation around
> > request_irq() that says this.
> >
> > And I can see at least drivers/video/fbdev/omap2/omapfb/dss/dispc.c is
> > using smp_wmb() before the request_irq().
> >
> > And even if the write is ordered, we still need the read to be
> > ordered to pair with it.
> >
> > > if it doesn't it just needs to be fixed, not worked around in
> > > virtio.
> >
> > 2) virtio drivers might do a lot of setups between