[GIT PULL] mlx5: last minute fixup

2022-05-18 Thread Michael S. Tsirkin
The following changes since commit 42226c989789d8da4af1de0c31070c96726d990c:

  Linux 5.18-rc7 (2022-05-15 18:08:58 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to acde3929492bcb9ceb0df1270230c422b1013798:

  vdpa/mlx5: Use consistent RQT size (2022-05-18 12:31:31 -0400)


mlx5: last minute fixup

The patch has been on the list for a while but as it was posted as part of a
thread it was missed.

Signed-off-by: Michael S. Tsirkin 


Eli Cohen (1):
  vdpa/mlx5: Use consistent RQT size

 drivers/vdpa/mlx5/net/mlx5_vnet.c | 61 ++-
 1 file changed, 21 insertions(+), 40 deletions(-)



Re: [PATCH] vhost_net: fix double fget()

2022-05-17 Thread Michael S. Tsirkin
On Tue, May 17, 2022 at 10:00:03PM +, Al Viro wrote:
> On Mon, May 16, 2022 at 04:44:19AM -0400, Michael S. Tsirkin wrote:
> > > Signed-off-by: Al Viro 
> > > Signed-off-by: Jason Wang 
> > 
> > Acked-by: Michael S. Tsirkin 
> > 
> > and this is stable material I guess.
> 
> It is, except that commit message ought to be cleaned up.  Something
> along the lines of
> 
> 
> Fix double fget() in vhost_net_set_backend()
> 
> Descriptor table is a shared resource; two fget() on the same descriptor
> may return different struct file references.  get_tap_ptr_ring() is
> called after we'd found (and pinned) the socket we'll be using and it
> tries to find the private tun/tap data structures associated with it.
> Redoing the lookup by the same file descriptor we'd used to get the
> socket is racy - we need to use the same struct file.
> 
> Thanks to Jason for spotting a braino in the original variant of patch -
> I'd missed the use of fd == -1 for disabling backend, and in that case
> we can end up with sock == NULL and sock != oldsock.
> 
> 
> Does the above sound sane for commit message?  And which tree would you
> prefer it to go through?  I can take it in vfs.git#fixes, or you could
> take it into your tree...

Acked-by: Michael S. Tsirkin 
for the new message and merging through your tree.

-- 
MST



Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio

2022-05-16 Thread Michael S. Tsirkin
On Mon, May 16, 2022 at 01:20:06PM +0200, Halil Pasic wrote:
> On Thu, 12 May 2022 11:31:08 +0800
> Jason Wang  wrote:
> 
> > > > It looks to me we need to use write_lock_irq()/write_unlock_irq() to
> > > > do the synchronization.
> > > >
> > > > And we probably need to keep the
> > > > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the
> > > > virtio_ccw_int_handler() to be called from process context (e.g from
> > > > the io_subchannel_quiesce()).
> > > >  
> > >
> > > Sounds correct.  
> > 
> > As Cornelia and Vineeth pointed out, in all the paths vring_interrupt
> > is called with irqs disabled.
> > 
> > So I will use spin_lock()/spin_unlock() in the next version.
> 
> Can we add some sort of assertion that, if the kernel is built with
> the corresponding debug features, will make sure this assumption holds
> (and warn if it does not)? That assertion would also document the fact.

Lockdep will do this automatically if you get it wrong, just like it
did here.

> If an assertion is not possible, I think we should at least place a
> strategic comment that documents our assumption.

That can't hurt.
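
For what it's worth, a minimal (untested) sketch of what such an assertion
could look like; the exact placement inside vring_interrupt() and the switch
to plain spin_lock() are assumptions based on the discussion above:

    /* drivers/virtio/virtio_ring.c -- sketch only, not a patch */
    irqreturn_t vring_interrupt(int irq, void *_vq)
    {
            struct vring_virtqueue *vq = to_vvq(_vq);

            /*
             * Document and check the assumption: every caller runs with
             * interrupts disabled, which is what makes plain
             * spin_lock()/spin_unlock() sufficient.  With lockdep
             * (CONFIG_PROVE_LOCKING) enabled this warns if the assumption
             * is ever violated; otherwise it compiles away.
             */
            lockdep_assert_irqs_disabled();

            /* ... rest of vring_interrupt() unchanged ... */
    }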

> Regards,
> Halil
> 
> > 
> > Thanks



Re: [PATCH v2 2/3] vdpa: Add a device object for vdpa management device

2022-05-16 Thread Michael S. Tsirkin
On Mon, May 16, 2022 at 06:31:18PM +0800, Yongji Xie wrote:
> On Mon, May 16, 2022 at 5:54 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, May 16, 2022 at 05:31:27PM +0800, Yongji Xie wrote:
> > > On Mon, May 16, 2022 at 5:14 PM Jason Wang  wrote:
> > > >
> > > >
> > > > On 2022/5/16 14:03, Xie Yongji wrote:
> > > > > Introduce a device object for vdpa management device to control
> > > > > its lifecycle. And the device name will be used to match
> > > > > VDPA_ATTR_MGMTDEV_DEV_NAME field of netlink message rather than
> > > > > using parent device name.
> > > > >
> > > > > With this patch applied, drivers should use vdpa_mgmtdev_alloc()
> > > > > or _vdpa_mgmtdev_alloc() to allocate a vDPA management device
> > > > > before calling vdpa_mgmtdev_register(). And some buggy empty
> > > > > release function can also be removed from the driver codes.
> > > > >
> > > > > Signed-off-by: Xie Yongji 
> > > > > ---
> > > > >   drivers/vdpa/ifcvf/ifcvf_main.c  | 11 ++--
> > > > >   drivers/vdpa/mlx5/net/mlx5_vnet.c| 11 ++--
> > > > >   drivers/vdpa/vdpa.c  | 92 
> > > > > 
> > > > >   drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 39 
> > > > >   drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 46 +-
> > > > >   drivers/vdpa/vdpa_user/vduse_dev.c   | 38 
> > > > >   include/linux/vdpa.h | 38 +++-
> > > > >   7 files changed, 168 insertions(+), 107 deletions(-)
> > > > >
> > > > > diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c 
> > > > > b/drivers/vdpa/ifcvf/ifcvf_main.c
> > > > > index 4366320fb68d..d4087c37cfdf 100644
> > > > > --- a/drivers/vdpa/ifcvf/ifcvf_main.c
> > > > > +++ b/drivers/vdpa/ifcvf/ifcvf_main.c
> > > > > @@ -821,10 +821,11 @@ static int ifcvf_probe(struct pci_dev *pdev, 
> > > > > const struct pci_device_id *id)
> > > > >   u32 dev_type;
> > > > >   int ret;
> > > > >
> > > > > - ifcvf_mgmt_dev = kzalloc(sizeof(struct ifcvf_vdpa_mgmt_dev), 
> > > > > GFP_KERNEL);
> > > > > - if (!ifcvf_mgmt_dev) {
> > > > > + ifcvf_mgmt_dev = vdpa_mgmtdev_alloc(struct ifcvf_vdpa_mgmt_dev,
> > > > > + mdev, dev_name(dev), dev);
> > > >
> > > >
> > > > Just wonder if it's better to make vDPA device a child of the mgmt
> > > > device instead of the PCI device?
> > > >
> > > > (Currently we use PCI device as the parent of the vDPA device, or at
> > > > least we can do this for the simulator which doesn't have a parent?)
> > > >
> > >
> > > Makes sense. I think we can do it for all vDPA drivers. Make sure the
> > > parent of the vDPA device is the vDPA management device.
> > >
> > > Thanks,
> > > Yongji
> >
> >
> > that's an ABI change though isn't it? parent is exposed in sysfs,
> > right?
> >
> 
> Hmm...yes. So it looks like we can't change it, right?
> 
> Thanks,
> Yongji

Afraid so. A way to find the pci device already exists I think, right?

-- 
MST


Re: [PATCH v2 2/3] vdpa: Add a device object for vdpa management device

2022-05-16 Thread Michael S. Tsirkin
On Mon, May 16, 2022 at 05:31:27PM +0800, Yongji Xie wrote:
> On Mon, May 16, 2022 at 5:14 PM Jason Wang  wrote:
> >
> >
> > On 2022/5/16 14:03, Xie Yongji wrote:
> > > Introduce a device object for vdpa management device to control
> > > its lifecycle. And the device name will be used to match
> > > VDPA_ATTR_MGMTDEV_DEV_NAME field of netlink message rather than
> > > using parent device name.
> > >
> > > With this patch applied, drivers should use vdpa_mgmtdev_alloc()
> > > or _vdpa_mgmtdev_alloc() to allocate a vDPA management device
> > > before calling vdpa_mgmtdev_register(). And some buggy empty
> > > release function can also be removed from the driver codes.
> > >
> > > Signed-off-by: Xie Yongji 
> > > ---
> > >   drivers/vdpa/ifcvf/ifcvf_main.c  | 11 ++--
> > >   drivers/vdpa/mlx5/net/mlx5_vnet.c| 11 ++--
> > >   drivers/vdpa/vdpa.c  | 92 
> > >   drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 39 
> > >   drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 46 +-
> > >   drivers/vdpa/vdpa_user/vduse_dev.c   | 38 
> > >   include/linux/vdpa.h | 38 +++-
> > >   7 files changed, 168 insertions(+), 107 deletions(-)
> > >
> > > diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c 
> > > b/drivers/vdpa/ifcvf/ifcvf_main.c
> > > index 4366320fb68d..d4087c37cfdf 100644
> > > --- a/drivers/vdpa/ifcvf/ifcvf_main.c
> > > +++ b/drivers/vdpa/ifcvf/ifcvf_main.c
> > > @@ -821,10 +821,11 @@ static int ifcvf_probe(struct pci_dev *pdev, const 
> > > struct pci_device_id *id)
> > >   u32 dev_type;
> > >   int ret;
> > >
> > > - ifcvf_mgmt_dev = kzalloc(sizeof(struct ifcvf_vdpa_mgmt_dev), 
> > > GFP_KERNEL);
> > > - if (!ifcvf_mgmt_dev) {
> > > + ifcvf_mgmt_dev = vdpa_mgmtdev_alloc(struct ifcvf_vdpa_mgmt_dev,
> > > + mdev, dev_name(dev), dev);
> >
> >
> > Just wonder if it's better to make vDPA device a child of the mgmt
> > device instead of the PCI device?
> >
> > (Currently we use PCI device as the parent of the vDPA device, or at
> > least we can do this for the simulator which doesn't have a parent?)
> >
> 
> Makes sense. I think we can do it for all vDPA drivers. Make sure the
> parent of the vDPA device is the vDPA management device.
> 
> Thanks,
> Yongji


that's an ABI change though isn't it? parent is exposed in sysfs,
right?


Re: [PATCH] vdpa/mlx5: Use consistent RQT size

2022-05-16 Thread Michael S. Tsirkin
On Mon, May 16, 2022 at 11:47:35AM +0300, Eli Cohen wrote:
> The current code evaluates RQT size based on the configured number of
> virtqueues. This can raise an issue in the following scenario:
> 
> Assume MQ was negotiated.
> 1. mlx5_vdpa_set_map() gets called.
> 2. handle_ctrl_mq() is called setting cur_num_vqs to some value, lower
>than the configured max VQs.
> 3. A second set_map gets called, but now a smaller number of VQs is used
>to evaluate the size of the RQT.
> 4. handle_ctrl_mq() is called with a value larger than what the RQT can
>hold. This will emit errors and the driver state is compromised.
> 
> To fix this, we use a new field in struct mlx5_vdpa_net to hold the
> required number of entries in the RQT. This value is evaluated in
> mlx5_vdpa_set_driver_features() where we have the negotiated features
> all set up.
> 
> In addition to that, we take into consideration the max capability of RQT
> entries early when the device is added so we don't need to consider
> it when creating the RQT.
> 
> Last, we remove the use of mlx5_vdpa_max_qps() which just returns the
> max_vqs / 2, and make the code clearer.
> 
> Fixes: 52893733f2c5 ("vdpa/mlx5: Add multiqueue support")
> Acked-by: Jason Wang 
> Signed-off-by: Eli Cohen 

I picked this up. Thanks!

> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 61 +++
>  1 file changed, 21 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 79001301b383..e0de44000d92 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -161,6 +161,7 @@ struct mlx5_vdpa_net {
>   struct mlx5_flow_handle *rx_rule_mcast;
>   bool setup;
>   u32 cur_num_vqs;
> + u32 rqt_size;
>   struct notifier_block nb;
>   struct vdpa_callback config_cb;
>   struct mlx5_vdpa_wq_ent cvq_ent;
> @@ -204,17 +205,12 @@ static __virtio16 cpu_to_mlx5vdpa16(struct 
> mlx5_vdpa_dev *mvdev, u16 val)
>   return __cpu_to_virtio16(mlx5_vdpa_is_little_endian(mvdev), val);
>  }
>  
> -static inline u32 mlx5_vdpa_max_qps(int max_vqs)
> -{
> - return max_vqs / 2;
> -}
> -
>  static u16 ctrl_vq_idx(struct mlx5_vdpa_dev *mvdev)
>  {
>   if (!(mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_MQ)))
>   return 2;
>  
> - return 2 * mlx5_vdpa_max_qps(mvdev->max_vqs);
> + return mvdev->max_vqs;
>  }
>  
>  static bool is_ctrl_vq_idx(struct mlx5_vdpa_dev *mvdev, u16 idx)
> @@ -1236,25 +1232,13 @@ static void teardown_vq(struct mlx5_vdpa_net *ndev, 
> struct mlx5_vdpa_virtqueue *
>  static int create_rqt(struct mlx5_vdpa_net *ndev)
>  {
>   __be32 *list;
> - int max_rqt;
>   void *rqtc;
>   int inlen;
>   void *in;
>   int i, j;
>   int err;
> - int num;
> -
> - if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_MQ)))
> - num = 1;
> - else
> - num = ndev->cur_num_vqs / 2;
>  
> - max_rqt = min_t(int, roundup_pow_of_two(num),
> - 1 << MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size));
> - if (max_rqt < 1)
> - return -EOPNOTSUPP;
> -
> - inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + max_rqt * 
> MLX5_ST_SZ_BYTES(rq_num);
> + inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + ndev->rqt_size * 
> MLX5_ST_SZ_BYTES(rq_num);
>   in = kzalloc(inlen, GFP_KERNEL);
>   if (!in)
>   return -ENOMEM;
> @@ -1263,12 +1247,12 @@ static int create_rqt(struct mlx5_vdpa_net *ndev)
>   rqtc = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
>  
>   MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q);
> - MLX5_SET(rqtc, rqtc, rqt_max_size, max_rqt);
> + MLX5_SET(rqtc, rqtc, rqt_max_size, ndev->rqt_size);
>   list = MLX5_ADDR_OF(rqtc, rqtc, rq_num[0]);
> - for (i = 0, j = 0; i < max_rqt; i++, j += 2)
> - list[i] = cpu_to_be32(ndev->vqs[j % (2 * num)].virtq_id);
> + for (i = 0, j = 0; i < ndev->rqt_size; i++, j += 2)
> + list[i] = cpu_to_be32(ndev->vqs[j % 
> ndev->cur_num_vqs].virtq_id);
>  
> - MLX5_SET(rqtc, rqtc, rqt_actual_size, max_rqt);
> + MLX5_SET(rqtc, rqtc, rqt_actual_size, ndev->rqt_size);
>   err = mlx5_vdpa_create_rqt(&ndev->mvdev, in, inlen, &ndev->res.rqtn);
>   kfree(in);
>   if (err)
> @@ -1282,19 +1266,13 @@ static int create_rqt(struct mlx5_vdpa_net *ndev)
>  static int modify_rqt(struct mlx5_vdpa_net *ndev, int num)
>  {
>   __be32 *list;
> - int max_rqt;
>   void *rqtc;
>   int inlen;
>   void *in;
>   int i, j;
>   int err;
>  
> - max_rqt = min_t(int, roundup_pow_of_two(ndev->cur_num_vqs / 2),
> - 1 << MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size));
> - if (max_rqt < 1)
> - return -EOPNOTSUPP;
> -
> - inlen = MLX5_ST_SZ_BYTES(modify_rqt_in) + max_rqt * 
> MLX5_ST_SZ_BYTES(rq_num);
> + inlen = MLX5_ST_SZ_BYTES(modify_rqt_in) + 

Re: [GIT PULL] virtio: last minute fixup

2022-05-16 Thread Michael S. Tsirkin
On Thu, May 12, 2022 at 10:10:34AM -0700, Linus Torvalds wrote:
> On Thu, May 12, 2022 at 6:30 AM Michael Ellerman  wrote:
> >
> > Links to other random places don't serve that function.
> 
> What "function"?
> 
> This is my argument. Those Link: things need to have a *reason*.
> 
> Saying "they are a change ID" is not a reason. That's just a random
> word-salad. You need to have an active reason that you can explain,
> not just say "look, I want to add a message ID to every commit".

So I want to go to my inbox and compare the patch as received with what
is in my tree.  What did I change? And I tweak both the patch content
and the subject when applying, so these are not good indicators.  Is this
at all convincing?

-- 
MST



Re: [PATCH] vhost_net: fix double fget()

2022-05-16 Thread Michael S. Tsirkin
On Mon, May 16, 2022 at 04:42:13PM +0800, Jason Wang wrote:
> From: Al Viro 
> 
> Here's another piece of code assuming that repeated fget() will yield the
> same opened file: in vhost_net_set_backend() we have
> 
> sock = get_socket(fd);
> if (IS_ERR(sock)) {
> r = PTR_ERR(sock);
> goto err_vq;
> }
> 
> /* start polling new socket */
> oldsock = vhost_vq_get_backend(vq);
> if (sock != oldsock) {
> ...
> vhost_vq_set_backend(vq, sock);
> ...
> if (index == VHOST_NET_VQ_RX)
> nvq->rx_ring = get_tap_ptr_ring(fd);
> 
> with
> static struct socket *get_socket(int fd)
> {
> struct socket *sock;
> 
> /* special case to disable backend */
> if (fd == -1)
> return NULL;
> sock = get_raw_socket(fd);
> if (!IS_ERR(sock))
> return sock;
> sock = get_tap_socket(fd);
> if (!IS_ERR(sock))
> return sock;
> return ERR_PTR(-ENOTSOCK);
> }
> and
> static struct ptr_ring *get_tap_ptr_ring(int fd)
> {
> struct ptr_ring *ring;
> struct file *file = fget(fd);
> 
> if (!file)
> return NULL;
> ring = tun_get_tx_ring(file);
> if (!IS_ERR(ring))
> goto out;
> ring = tap_get_ptr_ring(file);
> if (!IS_ERR(ring))
> goto out;
> ring = NULL;
> out:
> fput(file);
> return ring;
> }
> 
> Again, there is no promise that fd will resolve to the same thing for
> lookups in get_socket() and in get_tap_ptr_ring().  I'm not familiar
> enough with the guts of drivers/vhost to tell how easy it is to turn
> into attack, but it looks like trouble.  If nothing else, the pointer
> returned by tun_get_tx_ring() is not guaranteed to be pinned down by
> anything - the reference to sock will _usually_ suffice, but that
> doesn't help any if we get a different socket on that second fget().
> 
> One possible way to fix it would be the patch below; objections?
> 
> Signed-off-by: Al Viro 
> Signed-off-by: Jason Wang 

Acked-by: Michael S. Tsirkin 

and this is stable material I guess.

> ---
>  drivers/vhost/net.c | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 28ef323882fb..0bd7d91de792 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -1449,13 +1449,9 @@ static struct socket *get_raw_socket(int fd)
>   return ERR_PTR(r);
>  }
>  
> -static struct ptr_ring *get_tap_ptr_ring(int fd)
> +static struct ptr_ring *get_tap_ptr_ring(struct file *file)
>  {
>   struct ptr_ring *ring;
> - struct file *file = fget(fd);
> -
> - if (!file)
> - return NULL;
>   ring = tun_get_tx_ring(file);
>   if (!IS_ERR(ring))
>   goto out;
> @@ -1464,7 +1460,6 @@ static struct ptr_ring *get_tap_ptr_ring(int fd)
>   goto out;
>   ring = NULL;
>  out:
> - fput(file);
>   return ring;
>  }
>  
> @@ -1551,8 +1546,12 @@ static long vhost_net_set_backend(struct vhost_net *n, 
> unsigned index, int fd)
>   r = vhost_net_enable_vq(n, vq);
>   if (r)
>   goto err_used;
> - if (index == VHOST_NET_VQ_RX)
> - nvq->rx_ring = get_tap_ptr_ring(fd);
> + if (index == VHOST_NET_VQ_RX) {
> + if (sock)
> + nvq->rx_ring = get_tap_ptr_ring(sock->file);
> + else
> + nvq->rx_ring = NULL;
> + }
>  
>   oldubufs = nvq->ubufs;
>   nvq->ubufs = ubufs;
> -- 
> 2.25.1



Re: [PATCH v2 1/3] vduse: Pass management device pointer to vduse_dev_init_vdpa()

2022-05-16 Thread Michael S. Tsirkin
On Mon, May 16, 2022 at 02:03:40PM +0800, Xie Yongji wrote:
> Pass management device pointer from vdpa_dev_add() to
> vduse_dev_init_vdpa() rather than using the static
> variable directly.
> 
> No functional change.
> 
> Signed-off-by: Xie Yongji 

Could you pls add a cover letter explaining what is the patchset
trying to achieve? I think I can guess but I'd rather not guess ...


> ---
>  drivers/vdpa/vdpa_user/vduse_dev.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
> b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 160e40d03084..d3bf55a58cd2 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -1484,9 +1484,8 @@ static struct device vduse_mgmtdev = {
>   .release = vduse_mgmtdev_release,
>  };
>  
> -static struct vdpa_mgmt_dev mgmt_dev;
> -
> -static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
> +static int vduse_dev_init_vdpa(struct vduse_dev *dev,
> +struct vdpa_mgmt_dev *mdev, const char *name)
>  {
>   struct vduse_vdpa *vdev;
>   int ret;
> @@ -1509,7 +1508,7 @@ static int vduse_dev_init_vdpa(struct vduse_dev *dev, 
> const char *name)
>   }
>   set_dma_ops(&vdev->vdpa.dev, &vduse_dev_dma_ops);
>   vdev->vdpa.dma_dev = &vdev->vdpa.dev;
> - vdev->vdpa.mdev = &mgmt_dev;
> + vdev->vdpa.mdev = mdev;
>  
>   return 0;
>  }
> @@ -1526,7 +1525,7 @@ static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, 
> const char *name,
>   mutex_unlock(&vduse_lock);
>   return -EINVAL;
>   }
> - ret = vduse_dev_init_vdpa(dev, name);
> + ret = vduse_dev_init_vdpa(dev, mdev, name);
>   mutex_unlock(&vduse_lock);
>   if (ret)
>   return ret;
> -- 
> 2.20.1



Re: [PATCH v2 2/3] vdpa: Add a device object for vdpa management device

2022-05-16 Thread Michael S. Tsirkin
On Mon, May 16, 2022 at 02:03:41PM +0800, Xie Yongji wrote:
> Introduce a device object for vdpa management device to control
> its lifecycle. And the device name will be used to match
> VDPA_ATTR_MGMTDEV_DEV_NAME field of netlink message rather than
> using parent device name.
> 
> With this patch applied, drivers should use vdpa_mgmtdev_alloc()
> or _vdpa_mgmtdev_alloc() to allocate a vDPA management device
> before calling vdpa_mgmtdev_register(). And some buggy empty
> release function can also be removed from the driver codes.
> 
> Signed-off-by: Xie Yongji 
> ---
>  drivers/vdpa/ifcvf/ifcvf_main.c  | 11 ++--
>  drivers/vdpa/mlx5/net/mlx5_vnet.c| 11 ++--
>  drivers/vdpa/vdpa.c  | 92 
>  drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 39 
>  drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 46 +-
>  drivers/vdpa/vdpa_user/vduse_dev.c   | 38 
>  include/linux/vdpa.h | 38 +++-
>  7 files changed, 168 insertions(+), 107 deletions(-)
> 
> diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
> index 4366320fb68d..d4087c37cfdf 100644
> --- a/drivers/vdpa/ifcvf/ifcvf_main.c
> +++ b/drivers/vdpa/ifcvf/ifcvf_main.c
> @@ -821,10 +821,11 @@ static int ifcvf_probe(struct pci_dev *pdev, const 
> struct pci_device_id *id)
>   u32 dev_type;
>   int ret;
>  
> - ifcvf_mgmt_dev = kzalloc(sizeof(struct ifcvf_vdpa_mgmt_dev), 
> GFP_KERNEL);
> - if (!ifcvf_mgmt_dev) {
> + ifcvf_mgmt_dev = vdpa_mgmtdev_alloc(struct ifcvf_vdpa_mgmt_dev,
> + mdev, dev_name(dev), dev);
> + if (IS_ERR(ifcvf_mgmt_dev)) {
>   IFCVF_ERR(pdev, "Failed to alloc memory for the vDPA management 
> device\n");
> - return -ENOMEM;
> + return PTR_ERR(ifcvf_mgmt_dev);
>   }
>  
>   dev_type = get_dev_type(pdev);
> @@ -842,7 +843,6 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct 
> pci_device_id *id)
>   }
>  
>   ifcvf_mgmt_dev->mdev.ops = &ifcvf_vdpa_mgmt_dev_ops;
> - ifcvf_mgmt_dev->mdev.device = dev;
>   ifcvf_mgmt_dev->pdev = pdev;
>  
>   ret = pcim_enable_device(pdev);
> @@ -883,7 +883,7 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct 
> pci_device_id *id)
>   return 0;
>  
>  err:
> - kfree(ifcvf_mgmt_dev);
> + put_device(&ifcvf_mgmt_dev->mdev.device);
>   return ret;
>  }
>  
> @@ -893,7 +893,6 @@ static void ifcvf_remove(struct pci_dev *pdev)
>  
>   ifcvf_mgmt_dev = pci_get_drvdata(pdev);
>   vdpa_mgmtdev_unregister(&ifcvf_mgmt_dev->mdev);
> - kfree(ifcvf_mgmt_dev);
>  }
>  
>  static struct pci_device_id ifcvf_pci_ids[] = {
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 79001301b383..3a88609dcf13 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2828,12 +2828,12 @@ static int mlx5v_probe(struct auxiliary_device *adev,
>   struct mlx5_vdpa_mgmtdev *mgtdev;
>   int err;
>  
> - mgtdev = kzalloc(sizeof(*mgtdev), GFP_KERNEL);
> - if (!mgtdev)
> - return -ENOMEM;
> + mgtdev = vdpa_mgmtdev_alloc(struct mlx5_vdpa_mgmtdev, mgtdev,
> + dev_name(mdev->device), mdev->device);
> + if (IS_ERR(mgtdev))
> + return PTR_ERR(mgtdev);
>  
>   mgtdev->mgtdev.ops = &mdev_ops;
> - mgtdev->mgtdev.device = mdev->device;
>   mgtdev->mgtdev.id_table = id_table;
>   mgtdev->mgtdev.config_attr_mask = 
> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR) |
> 
> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP) |
> @@ -2852,7 +2852,7 @@ static int mlx5v_probe(struct auxiliary_device *adev,
>   return 0;
>  
>  reg_err:
> - kfree(mgtdev);
> + put_device(&mgtdev->mgtdev.device);
>   return err;
>  }
>  
> @@ -2862,7 +2862,6 @@ static void mlx5v_remove(struct auxiliary_device *adev)
>  
>   mgtdev = auxiliary_get_drvdata(adev);
>   vdpa_mgmtdev_unregister(&mgtdev->mgtdev);
> - kfree(mgtdev);
>  }
>  
>  static const struct auxiliary_device_id mlx5v_id_table[] = {
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> index 2b75c00b1005..3d3f98dd2bb6 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -337,21 +337,80 @@ void vdpa_unregister_driver(struct vdpa_driver *drv)
>  }
>  EXPORT_SYMBOL_GPL(vdpa_unregister_driver);
>  
> +static inline struct vdpa_mgmt_dev *to_vdpa_mgmt_dev(struct device *dev)
> +{
> + return container_of(dev, struct vdpa_mgmt_dev, device);
> +}
> +
> +static void vdpa_mgmtdev_release(struct device *dev)
> +{
> + kfree(to_vdpa_mgmt_dev(dev));
> +}
> +
>  /**
> - * vdpa_mgmtdev_register - register a vdpa management device
> + * __vdpa_mgmtdev_alloc - allocate and initilaize a vDPA management device

initialize


> + * @name: name of the vdpa management device
> + * @parent: the parent device; optional
> + * @size: size of the data 

Re: [PATCH RESEND V3 3/3] vdpa/mlx5: Use consistent RQT size

2022-05-16 Thread Michael S. Tsirkin
On Mon, May 16, 2022 at 08:17:18AM +, Eli Cohen wrote:
> Hi Michael,
> 
> When are you going to pull this fix?
> It fixes a real problem and was reviewed and acked.

Do I understand it correctly that this is a stand-alone patch?
Sorry, my process has been thrown off by it being labeled 3/3 but not
being part of a thread. Do not do this for single patches please.
And I suspect 0-day machinery didn't process it either.
Can you repost as a stand-alone patch please?
I will then process ASAP.

Thanks!

> > -Original Message-
> > From: Eli Cohen 
> > Sent: Wednesday, April 6, 2022 11:53 AM
> > To: m...@redhat.com; jasow...@redhat.com
> > Cc: hdan...@sina.com; virtualization@lists.linux-foundation.org; 
> > linux-ker...@vger.kernel.org; Eli Cohen 
> > Subject: [PATCH RESEND V3 3/3] vdpa/mlx5: Use consistent RQT size
> > 
> > The current code evaluates RQT size based on the configured number of
> > virtqueues. This can raise an issue in the following scenario:
> > 
> > Assume MQ was negotiated.
> > 1. mlx5_vdpa_set_map() gets called.
> > 2. handle_ctrl_mq() is called setting cur_num_vqs to some value, lower
> >than the configured max VQs.
> > 3. A second set_map gets called, but now a smaller number of VQs is used
> >to evaluate the size of the RQT.
> > 4. handle_ctrl_mq() is called with a value larger than what the RQT can
> >hold. This will emit errors and the driver state is compromised.
> > 
> > To fix this, we use a new field in struct mlx5_vdpa_net to hold the
> > required number of entries in the RQT. This value is evaluated in
> > mlx5_vdpa_set_driver_features() where we have the negotiated features
> > all set up.
> > 
> > In addition to that, we take into consideration the max capability of RQT
> > entries early when the device is added so we don't need to consider
> > it when creating the RQT.
> > 
> > Last, we remove the use of mlx5_vdpa_max_qps() which just returns the
> > max_vqs / 2, and make the code clearer.
> > 
> > Fixes: 52893733f2c5 ("vdpa/mlx5: Add multiqueue support")
> > Acked-by: Jason Wang 
> > Signed-off-by: Eli Cohen 
> > ---
> > V2 -> V3:
> > Fix typo in change log
> > Add acked-by Jason
> > 
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 61 +++
> >  1 file changed, 21 insertions(+), 40 deletions(-)
> > 
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 79001301b383..e0de44000d92 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -161,6 +161,7 @@ struct mlx5_vdpa_net {
> > struct mlx5_flow_handle *rx_rule_mcast;
> > bool setup;
> > u32 cur_num_vqs;
> > +   u32 rqt_size;
> > struct notifier_block nb;
> > struct vdpa_callback config_cb;
> > struct mlx5_vdpa_wq_ent cvq_ent;
> > @@ -204,17 +205,12 @@ static __virtio16 cpu_to_mlx5vdpa16(struct 
> > mlx5_vdpa_dev *mvdev, u16 val)
> > return __cpu_to_virtio16(mlx5_vdpa_is_little_endian(mvdev), val);
> >  }
> > 
> > -static inline u32 mlx5_vdpa_max_qps(int max_vqs)
> > -{
> > -   return max_vqs / 2;
> > -}
> > -
> >  static u16 ctrl_vq_idx(struct mlx5_vdpa_dev *mvdev)
> >  {
> > if (!(mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_MQ)))
> > return 2;
> > 
> > -   return 2 * mlx5_vdpa_max_qps(mvdev->max_vqs);
> > +   return mvdev->max_vqs;
> >  }
> > 
> >  static bool is_ctrl_vq_idx(struct mlx5_vdpa_dev *mvdev, u16 idx)
> > @@ -1236,25 +1232,13 @@ static void teardown_vq(struct mlx5_vdpa_net *ndev, 
> > struct mlx5_vdpa_virtqueue *
> >  static int create_rqt(struct mlx5_vdpa_net *ndev)
> >  {
> > __be32 *list;
> > -   int max_rqt;
> > void *rqtc;
> > int inlen;
> > void *in;
> > int i, j;
> > int err;
> > -   int num;
> > -
> > -   if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_MQ)))
> > -   num = 1;
> > -   else
> > -   num = ndev->cur_num_vqs / 2;
> > 
> > -   max_rqt = min_t(int, roundup_pow_of_two(num),
> > -   1 << MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size));
> > -   if (max_rqt < 1)
> > -   return -EOPNOTSUPP;
> > -
> > -   inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + max_rqt * 
> > MLX5_ST_SZ_BYTES(rq_num);
> > +   inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + ndev->rqt_size * 
> > MLX5_ST_SZ_BYTES(rq_num);
> > in = kzalloc(inlen, GFP_KERNEL);
> > if (!in)
> > return -ENOMEM;
> > @@ -1263,12 +1247,12 @@ static int create_rqt(struct mlx5_vdpa_net *ndev)
> > rqtc = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
> > 
> > MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q);
> > -   MLX5_SET(rqtc, rqtc, rqt_max_size, max_rqt);
> > +   MLX5_SET(rqtc, rqtc, rqt_max_size, ndev->rqt_size);
> > list = MLX5_ADDR_OF(rqtc, rqtc, rq_num[0]);
> > -   for (i = 0, j = 0; i < max_rqt; i++, j += 2)
> > -   list[i] = cpu_to_be32(ndev->vqs[j % (2 * num)].virtq_id);
> > +   for (i = 0, j = 0; i < ndev->rqt_size; i++, j += 2)
> > +

Re: [GIT PULL] virtio: last minute fixup

2022-05-13 Thread Michael S. Tsirkin
On Wed, May 11, 2022 at 02:24:23PM +0200, Jörg Rödel wrote:
> On Tue, May 10, 2022 at 11:23:11AM -0700, Linus Torvalds wrote:
> > And - once again - I want to complain about the "Link:" in that commit.
> 
> I have to say that for me (probably for others as well) those Link tags
> pointing to the patch submission have quite some value:
> 
>   1) First of all it is an easy proof that the patch was actually
>  submitted somewhere for public review before it went into a
>  maintainers tree.
> 
>   2) The patch submission is often the entry point to the
>  discussion which lead to this patch. From that email I can
>  see what was discussed and often there is even a link to
>  previous versions and the discussions that happened there. It
>  helps to better understand how a patch came to be the way it
>  is. I know this should ideally be part of the commit message,
>  but in reality this is what I also use the link tag for.
> 
>   3) When backporting a patch to a downstream kernel it often
>  helps a lot to see the whole patch-set the change was
>  submitted in, especially when it comes to fixes. With the
>  Link: tag the whole submission thread is easy to find.
> 
> I can stop adding them to patches if you want, but as I said, I think
> there is some value in them which make me want to keep them.
> 
> Regards,
> 
>   Joerg

Yea, me too ... Linus, will it be less problematic if it's a different
tag, other than Link? What if it's Message-Id: ? Still a
problem?


-- 
MST



Re: [PATCH v5 0/9] Introduce akcipher service for virtio-crypto

2022-05-13 Thread Michael S. Tsirkin
On Fri, May 13, 2022 at 06:19:10AM -0400, Michael S. Tsirkin wrote:
> On Thu, Apr 28, 2022 at 09:59:34PM +0800, zhenwei pi wrote:
> > Hi, Lei & MST
> > 
> > Daniel has started to review the akcipher framework and nettle & gcrypt
> > implementation, this part seems to be ready soon. Thanks a lot to Daniel!
> > 
> > And the last patch "crypto: Introduce RSA algorithm" handles akcipher
> > requests from guest and uses the new akcipher service. The new feature
> > can be used to test by the builtin driver. I would appreciate it if you
> > could review patch.
> 
> 
> I applied the first 6 patches. Tests need to address Daniel's comments.

Oh sorry, spoke too soon - I noticed mingw issues, and in fact Daniel noticed 
them too.
Pls address and repost the series. Thanks!

> > v4 -> v5:
> > - Move QCryptoAkCipher into akcipherpriv.h, and modify the related comments.
> > - Rename asn1_decoder.c to der.c.
> > - Code style fix: use 'cleanup' & 'error' labels.
> > - Allow autoptr type to auto-free.
> > - Add test cases for rsakey to handle DER error.
> > - Other minor fixes.
> > 
> > v3 -> v4:
> > - Coding style fix: Akcipher -> AkCipher, struct XXX -> XXX, Rsa -> RSA,
> > XXX-alg -> XXX-algo.
> > - Change version info in qapi/crypto.json, from 7.0 -> 7.1.
> > - Remove ecdsa from qapi/crypto.json, it would be introduced with the 
> > implementation later.
> > - Use QCryptoHashAlgorithm instead of QCryptoRSAHashAlgorithm(removed) in 
> > qapi/crypto.json.
> > - Rename arguments of qcrypto_akcipher_XXX to keep aligned with 
> > qcrypto_cipher_XXX(dec/enc/sign/verify -> in/out/in2), and add 
> > qcrypto_akcipher_max_XXX APIs.
> > - Add new API: qcrypto_akcipher_supports.
> > - Change the return value of qcrypto_akcipher_enc/dec/sign, these functions 
> > return the actual length of result.
> > - Separate ASN.1 source code and test case clean.
> > - Disable RSA raw encoding for akcipher-nettle.
> > - Separate RSA key parser into rsakey.{hc}, and implement it with 
> > builtin-asn1-decoder and nettle respectively.
> > - Implement RSA(pkcs1 and raw encoding) algorithm by gcrypt. This has 
> > higher priority than nettle.
> > - For some akcipher operations(eg, decryption of pkcs1pad(rsa)), the length 
> > of returned result may be less than the dst buffer size, return the actual 
> > length of result instead of the buffer length to the guest side. (in 
> > function virtio_crypto_akcipher_input_data_helper)
> > - Other minor changes.
> > 
> > Thanks to Daniel!
> > 
> > Eric pointed out this missing part of use case, send it here again.
> > 
> > In our plan, the feature is designed for HTTPS offloading case and other 
> > applications which use kernel RSA/ecdsa by keyctl syscall. The full picture 
> > is shown below:
> > 
> > 
> >   Nginx/openssl[1] ... Apps
> > Guest   -
> >virtio-crypto driver[2]
> > -
> >virtio-crypto backend[3]
> > Host-
> >   /  |  \
> >   builtin[4]   vhost keyctl[5] ...
> > 
> > 
> > [1] User applications can offload RSA calculation to kernel by keyctl 
> > syscall. There is no keyctl engine in openssl currently, we developed an 
> > engine and tried to contribute it to openssl upstream, but openssl 1.x does 
> > not accept new feature. Link:
> > https://github.com/openssl/openssl/pull/16689
> > 
> > This branch is available and maintained by Lei 
> > https://github.com/TousakaRin/openssl/tree/OpenSSL_1_1_1-kctl_engine
> > 
> > We tested nginx(change config file only) with openssl keyctl engine, it 
> > works fine.
> > 
> > [2] virtio-crypto driver is used to communicate with host side, send 
> > requests to host side to do asymmetric calculation.
> > https://lkml.org/lkml/2022/3/1/1425
> > 
> > [3] virtio-crypto backend handles requests from guest side, and forwards 
> > request to crypto backend driver of QEMU.
> > 
> > [4] Currently RSA is supported only in builtin driver. This driver is 
> > supposed to test the full feature without other software(Ex vhost process) 
> > and hardware dependence. ecdsa is introduced into qapi type without 
> > implementation, this may be implemented in Q3-2022 or later. If ecdsa type 
> > definition should be added with the implementation together, I'll remove 
> &

Re: [PATCH v5 0/9] Introduce akcipher service for virtio-crypto

2022-05-13 Thread Michael S. Tsirkin
On Thu, Apr 28, 2022 at 09:59:34PM +0800, zhenwei pi wrote:
> Hi, Lei & MST
> 
> Daniel has started to review the akcipher framework and nettle & gcrypt
> implementation, this part seems to be ready soon. Thanks a lot to Daniel!
> 
> And the last patch "crypto: Introduce RSA algorithm" handles akcipher
> requests from guest and uses the new akcipher service. The new feature
> can be used to test by the builtin driver. I would appreciate it if you
> could review patch.


I applied the first 6 patches. Tests need to address Daniel's comments.

> v4 -> v5:
> - Move QCryptoAkCipher into akcipherpriv.h, and modify the related comments.
> - Rename asn1_decoder.c to der.c.
> - Code style fix: use 'cleanup' & 'error' labels.
> - Allow autoptr type to auto-free.
> - Add test cases for rsakey to handle DER error.
> - Other minor fixes.
> 
> v3 -> v4:
> - Coding style fix: Akcipher -> AkCipher, struct XXX -> XXX, Rsa -> RSA,
> XXX-alg -> XXX-algo.
> - Change version info in qapi/crypto.json, from 7.0 -> 7.1.
> - Remove ecdsa from qapi/crypto.json, it would be introduced with the 
> implementation later.
> - Use QCryptoHashAlgorithm instead of QCryptoRSAHashAlgorithm(removed) in 
> qapi/crypto.json.
> - Rename arguments of qcrypto_akcipher_XXX to keep aligned with 
> qcrypto_cipher_XXX(dec/enc/sign/verify -> in/out/in2), and add 
> qcrypto_akcipher_max_XXX APIs.
> - Add new API: qcrypto_akcipher_supports.
> - Change the return value of qcrypto_akcipher_enc/dec/sign, these functions 
> return the actual length of result.
> - Separate ASN.1 source code and test case clean.
> - Disable RSA raw encoding for akcipher-nettle.
> - Separate RSA key parser into rsakey.{hc}, and implement it with 
> builtin-asn1-decoder and nettle respectively.
> - Implement RSA(pkcs1 and raw encoding) algorithm by gcrypt. This has higher 
> priority than nettle.
> - For some akcipher operations(eg, decryption of pkcs1pad(rsa)), the length 
> of returned result may be less than the dst buffer size, return the actual 
> length of result instead of the buffer length to the guest side. (in function 
> virtio_crypto_akcipher_input_data_helper)
> - Other minor changes.
> 
> Thanks to Daniel!
> 
> Eric pointed out this missing part of use case, send it here again.
> 
> In our plan, the feature is designed for HTTPS offloading case and other 
> applications which use kernel RSA/ecdsa by keyctl syscall. The full picture 
> is shown below:
> 
> 
>   Nginx/openssl[1] ... Apps
> Guest   -
>virtio-crypto driver[2]
> -
>virtio-crypto backend[3]
> Host-
>   /  |  \
>   builtin[4]   vhost keyctl[5] ...
> 
> 
> [1] User applications can offload RSA calculation to kernel by keyctl 
> syscall. There is no keyctl engine in openssl currently, we developed an 
> engine and tried to contribute it to openssl upstream, but openssl 1.x does 
> not accept new feature. Link:
> https://github.com/openssl/openssl/pull/16689
> 
> This branch is available and maintained by Lei 
> https://github.com/TousakaRin/openssl/tree/OpenSSL_1_1_1-kctl_engine
> 
> We tested nginx(change config file only) with openssl keyctl engine, it works 
> fine.
> 
> [2] virtio-crypto driver is used to communicate with host side, send requests 
> to host side to do asymmetric calculation.
> https://lkml.org/lkml/2022/3/1/1425
> 
> [3] virtio-crypto backend handles requests from guest side, and forwards 
> request to crypto backend driver of QEMU.
> 
> [4] Currently RSA is supported only in builtin driver. This driver is 
> supposed to test the full feature without other software(Ex vhost process) 
> and hardware dependence. ecdsa is introduced into qapi type without 
> implementation, this may be implemented in Q3-2022 or later. If ecdsa type 
> definition should be added with the implementation together, I'll remove this 
> in next version.
> 
> [5] keyctl backend is in development, we will post this feature in Q2-2022. 
> keyctl backend can use hardware acceleration(Ex, Intel QAT).
> 
> Setup the full environment, tested with Intel QAT on host side, the QPS of 
> HTTPS increase to ~200% in a guest.
> 
> VS PCI passthrough: the most important benefit of this solution makes the VM 
> migratable.
> 
> v2 -> v3:
> - Introduce akcipher types to qapi
> - Add test/benchmark suite for akcipher class
> - Separate 'virtio_crypto: Support virtio crypto asym operation' into:
>   - crypto: Introduce akcipher crypto class
>   - virtio-crypto: Introduce RSA algorithm
> 
> v1 -> v2:
> - Update virtio_crypto.h from v2 version of related kernel patch.
> 
> v1:
> - Support akcipher for virtio-crypto.
> - Introduce akcipher class.
> - Introduce ASN1 decoder into QEMU.
> - Implement RSA backend by nettle/hogweed.
> 
> Lei He (6):
>   qapi: crypto-akcipher: Introduce akcipher types 

Re: [PATCH v1] vdpa: Warn if MTU configured is too low

2022-05-11 Thread Michael S. Tsirkin
On Wed, May 11, 2022 at 01:56:42PM +0300, Eli Cohen wrote:
> Following the recommendation in virtio spec 1.1, a device offering
> VIRTIO_NET_F_MTU should set the mtu to at least 1280 bytes.
> 
> Print a warning if this recommendation is not met.
> 
> Signed-off-by: Eli Cohen 
> ---
> v0 -> v1:
>   change pr_warn to a netlink warning to userspace
> 
>  drivers/vdpa/vdpa.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> index 91f4c13c7c7c..0fb4a615f267 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -583,6 +583,9 @@ vdpa_nl_cmd_mgmtdev_get_dumpit(struct sk_buff *msg, 
> struct netlink_callback *cb)
>BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU) | \
>BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP))
>  
> +/* Recommended virtio spec 1.1 section 5.1.4.1 */

I'd add name of section here too.

> +#define VIRTIO_MIN_PREFERRED_MTU 1280
> +

Preferred is kind of confusing here. I guess you are trying to
say it's not mandatory, but I don't think this conveys that.

Recommended (matching text below)?


>  static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct 
> genl_info *info)
>  {
>   struct vdpa_dev_set_config config = {};
> @@ -634,6 +637,10 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff 
> *skb, struct genl_info *i
>   err = PTR_ERR(mdev);
>   goto err;
>   }
> + if ((mdev->supported_features & BIT_ULL(VIRTIO_NET_F_MTU)) &&
> + (config.mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU) &&
> + config.net.mtu < VIRTIO_MIN_PREFERRED_MTU))
> + NL_SET_ERR_MSG_MOD(info->extack, "MTU is below recommended 
> value\n");
>   if ((config.mask & mdev->config_attr_mask) != config.mask) {
>   NL_SET_ERR_MSG_MOD(info->extack,
>  "All provided attributes are not supported");


Pity we can't include the actual value here, but oh well. At least let's
include the recommended value, we can do that:
"MTU is below recommended value of " 
__stringify(VIRTIO_MIN_PREFERRED_MTU) "\n"
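
I.e. something along these lines (untested sketch combining the check from the
patch with the message above; __stringify() comes from <linux/stringify.h>):

    /* sketch only: report the recommended minimum in the extack message */
    if ((mdev->supported_features & BIT_ULL(VIRTIO_NET_F_MTU)) &&
        (config.mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) &&
        config.net.mtu < VIRTIO_MIN_PREFERRED_MTU)
            NL_SET_ERR_MSG_MOD(info->extack,
                               "MTU is below recommended value of "
                               __stringify(VIRTIO_MIN_PREFERRED_MTU) "\n");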


> @@ -1135,7 +1142,7 @@ static const struct nla_policy 
> vdpa_nl_policy[VDPA_ATTR_MAX + 1] = {
>   [VDPA_ATTR_DEV_NAME] = { .type = NLA_STRING },
>   [VDPA_ATTR_DEV_NET_CFG_MACADDR] = NLA_POLICY_ETH_ADDR,
>   /* virtio spec 1.1 section 5.1.4.1 for valid MTU range */
> - [VDPA_ATTR_DEV_NET_CFG_MTU] = NLA_POLICY_MIN(NLA_U16, 68),
> + [VDPA_ATTR_DEV_NET_CFG_MTU] = NLA_POLICY_MIN(NLA_U16, ETH_MIN_MTU),
>  };
>  
>  static const struct genl_ops vdpa_nl_ops[] = {
> -- 
> 2.35.1



Re: [PATCH] vdpa: Warn if MTU configured is too low

2022-05-11 Thread Michael S. Tsirkin
On Wed, May 11, 2022 at 05:34:25PM +0800, Jason Wang wrote:
> On Wed, May 11, 2022 at 4:48 PM Eli Cohen  wrote:
> >
> > Following the recommendation in virtio spec 1.1, a device offering
> > VIRTIO_NET_F_MTU should set the mtu to at least 1280 bytes.
> >
> > Print a warning if this recommendation is not met.
> >
> > Signed-off-by: Eli Cohen 
> 
> I wonder why it's a must?


It's a SHOULD in the spec.  I guess 1280 is to allow IPv6.

> > ---
> >  drivers/vdpa/vdpa.c | 9 -
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> > index 91f4c13c7c7c..961168fe9094 100644
> > --- a/drivers/vdpa/vdpa.c
> > +++ b/drivers/vdpa/vdpa.c
> > @@ -583,6 +583,9 @@ vdpa_nl_cmd_mgmtdev_get_dumpit(struct sk_buff *msg, 
> > struct netlink_callback *cb)
> >  BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU) | \
> >  BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP))
> >
> > +/* Recommended virtio spec 1.1 section 5.1.4.1 */
> > +#define VIRTIO_MIN_PREFERRED_MTU 1280
> > +
> >  static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct 
> > genl_info *info)
> >  {
> > struct vdpa_dev_set_config config = {};
> > @@ -634,6 +637,10 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff 
> > *skb, struct genl_info *i
> > err = PTR_ERR(mdev);
> > goto err;
> > }
> > +   if ((mdev->supported_features & BIT_ULL(VIRTIO_NET_F_MTU)) &&
> > +   (config.mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU) &&
> > +   config.net.mtu < VIRTIO_MIN_PREFERRED_MTU))
> 
> Should be <= ?
> 
> Thanks



> > +   pr_warn("MTU is below recommended value\n");
> > if ((config.mask & mdev->config_attr_mask) != config.mask) {
> > NL_SET_ERR_MSG_MOD(info->extack,
> >"All provided attributes are not 
> > supported");
> > @@ -1135,7 +1142,7 @@ static const struct nla_policy 
> > vdpa_nl_policy[VDPA_ATTR_MAX + 1] = {
> > [VDPA_ATTR_DEV_NAME] = { .type = NLA_STRING },
> > [VDPA_ATTR_DEV_NET_CFG_MACADDR] = NLA_POLICY_ETH_ADDR,
> > /* virtio spec 1.1 section 5.1.4.1 for valid MTU range */
> > -   [VDPA_ATTR_DEV_NET_CFG_MTU] = NLA_POLICY_MIN(NLA_U16, 68),
> > +   [VDPA_ATTR_DEV_NET_CFG_MTU] = NLA_POLICY_MIN(NLA_U16, ETH_MIN_MTU),
> >  };
> >
> >  static const struct genl_ops vdpa_nl_ops[] = {
> > --
> > 2.35.1
> >



Re: [GIT PULL] virtio: last minute fixup

2022-05-11 Thread Michael S. Tsirkin
On Tue, May 10, 2022 at 04:50:47PM -0700, Linus Torvalds wrote:
> On Tue, May 10, 2022 at 4:12 PM Nathan Chancellor  wrote:
> >
> > For what it's worth, as someone who is frequently tracking down and
> > reporting issues, a link to the mailing list post in the commit message
> > makes it much easier to get these reports into the right hands, as the
> > original posting is going to have all relevant parties in one location
> > and it will usually have all the context necessary to triage the
> > problem.
> 
> Honestly, I think such a thing would be trivial to automate with
> something like just a patch-id lookup, rather than a "Link:".
> 
> And such a lookup model ("where was this patch posted") would work for
> any patch (and often also find previous unmodified versions of
> it when it has been posted multiple times).
> 
> I suspect that most of the building blocks of such automation
> effectively already exists, since I think the lore infrastructure
> already integrates with patchwork, and patchwork already has a "look
> up by patch id".
> 
> Wouldn't it be cool if you had some webby interface to just go from
> commit SHA1 to patch ID to a lore.kernel.org lookup of where said
> patch was done?

Yes, that would be cool!

> Of course, I personally tend to just search by the commit contents
> instead, which works just about as well. If the first line of the
> commit isn't very unique, add a "f:author" to the search.
>
> IOW, I really don't find much value in the "Link to original
> submission", because that thing is *already* trivial to find, and the
> lore search is actually better in many ways (it also tends to find
> people *reporting* that commit, which is often what you really want -
> the reason you're doing the search is that there's something going on
> with it).
> 
> My argument here really is that "find where this commit was posted" is
> 
>  (a) not generally the most interesting thing
> 
>  (b) doesn't even need that "Link:" line.
> 
> but what *is* interesting, and where the "Link:" line is very useful,
> is finding where the original problem that *caused* that patch to be
> posted in the first place.
> 
> Yes, obviously you can find that original problem by searching too if
> the commit message has enough other information.
> 
> For example, if there is an oops quoted in the commit message, I have
> personally searched for parts of that kind of information to find the
> original report and discussion.
> 
> So that whole "searching is often an option" is true for pretty much
> _any_ Link:, but I think that for the whole "original submission" it's
> so mindless and can be automated that it really doesn't add much real
> value at all.
> 
> Linus

For me a problematic use-case is multiple versions of the patchset.
So I have a tree and I apply a patchset, start testing etc. Meanwhile the author
posts another version. At that point I want to know which version
I applied. Since people put that within [] in the subject, it
gets stripped off.

Thinking about it some more, how about sticking a link to the *cover
letter* in the commit, instead?  That would serve an extra useful purpose of
being able to figure out which patches are part of the same patchset.
And maybe change "Link:" to "Patchset:" or "Cover-letter:"?

-- 
MST



Re: [GIT PULL] virtio: last minute fixup

2022-05-11 Thread Michael S. Tsirkin
On Tue, May 10, 2022 at 11:23:11AM -0700, Linus Torvalds wrote:
> On Tue, May 10, 2022 at 5:24 AM Michael S. Tsirkin  wrote:
> >
> > A last minute fixup of the transitional ID numbers.
> > Important to get these right - if users start to depend on the
> > wrong ones they are very hard to fix.
> 
> Hmm. I've pulled this, but those numbers aren't exactly "new".
> 
> They've been that way since 5.14, so what makes you think people
> haven't already started depending on them?

Yes they have been in the header but they are not used by *Linux* yet.
My worry is for when we start using them and then someone backports
the patches without backporting the macro fix.
Maybe we should just drop these until there's a user, but I am
a bit wary of a step like this so late in the cycle.

> And - once again - I want to complain about the "Link:" in that commit.
> 
> It points to a completely useless patch submission. It doesn't point
> to anything useful at all.
> 
> I think it's a disease that likely comes from "b4", and people decided
> that "hey, I can use the -l parameter to add that Link: field", and it
> looks better that way.
> 
> And then they add it all the time, whether it makes any sense or not.
> 
> I've mainly noticed it with the -tip tree, but maybe that's just
> because I've happened to look at it.
> 
> I really hate those worthless links that basically add zero actual
> information to the commit.
> 
> The "Link" field is for _useful_ links. Not "let's add a link just
> because we can".
> 
>Linus


OK I will stop doing this.
I thought they are handy for when there are several versions of the
patch. It helps me make sure I applied the latest one. Saving the
message ID of the original mail in some other way would also be ok.
Any suggestions for a better way to do this?

-- 
MST



[GIT PULL] virtio: last minute fixup

2022-05-10 Thread Michael S. Tsirkin
The following changes since commit 1c80cf031e0204fde471558ee40183695773ce13:

  vdpa: mlx5: synchronize driver status with CVQ (2022-03-30 04:18:14 -0400)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 7ff960a6fe399fdcbca6159063684671ae57eee9:

  virtio: fix virtio transitional ids (2022-05-10 07:22:28 -0400)


virtio: last minute fixup

A last minute fixup of the transitional ID numbers.
Important to get these right - if users start to depend on the
wrong ones they are very hard to fix.

Signed-off-by: Michael S. Tsirkin 


Shunsuke Mie (1):
  virtio: fix virtio transitional ids

 include/uapi/linux/virtio_ids.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)



Re: [PATCH V4 8/9] virtio: harden vring IRQ

2022-05-10 Thread Michael S. Tsirkin
On Sat, May 07, 2022 at 03:19:53PM +0800, Jason Wang wrote:
> This is a rework on the previous IRQ hardening that is done for
> virtio-pci where several drawbacks were found and were reverted:
> 
> 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
>that is used by some device such as virtio-blk
> 2) done only for PCI transport
> 
> The vq->broken is re-used in this patch for implementing the IRQ
> hardening. The vq->broken is set to true during both initialization
> and reset. And the vq->broken is set to false in
> virtio_device_ready(). Then vring_interrupt can check and return when
> vq->broken is true. And in this case, switch to return IRQ_NONE to let
> the interrupt core aware of such invalid interrupt to prevent IRQ
> storm.
> 
> The reason of using a per queue variable instead of a per device one
> is that we may need it for per queue reset hardening in the future.
> 
> Note that the hardening is only done for vring interrupt since the
> config interrupt hardening is already done in commit 22b7050a024d7
> ("virtio: defer config changed notifications"). But the method that is
> used by config interrupt can't be reused by the vring interrupt
> handler because it uses spinlock to do the synchronization which is
> expensive.
> 
> Cc: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: "Paul E. McKenney" 
> Cc: Marc Zyngier 
> Cc: Halil Pasic 
> Cc: Cornelia Huck 
> Signed-off-by: Jason Wang 
> ---
>  drivers/virtio/virtio.c   | 15 ---
>  drivers/virtio/virtio_ring.c  | 11 +++
>  include/linux/virtio_config.h | 12 
>  3 files changed, 31 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 8dde44ea044a..696f5ba4f38e 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
>   * */
>  void virtio_reset_device(struct virtio_device *dev)
>  {
> + /*
> +  * The below virtio_synchronize_cbs() guarantees that any
> +  * interrupt for this line arriving after
> +  * virtio_synchronize_vqs() has completed is guaranteed to see
> +  * driver_ready == false.
> +  */
> + virtio_break_device(dev);
> + virtio_synchronize_cbs(dev);
> +
>   dev->config->reset(dev);
>  }
>  EXPORT_SYMBOL_GPL(virtio_reset_device);
> @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
>   dev->config_enabled = false;
>   dev->config_change_pending = false;
>  
> + INIT_LIST_HEAD(&dev->vqs);
> + spin_lock_init(&dev->vqs_list_lock);
> +
>   /* We always start by resetting the device, in case a previous
>* driver messed it up.  This also tests that code path a little. */
>   virtio_reset_device(dev);
> @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
>   /* Acknowledge that we've seen the device. */
>   virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>  
> - INIT_LIST_HEAD(&dev->vqs);
> - spin_lock_init(&dev->vqs_list_lock);
> -
>   /*
>* device_add() causes the bus infrastructure to look for a matching
>* driver.
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 5b7df7c455f0..9dfad2890d7a 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1690,7 +1690,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
>   vq->we_own_ring = true;
>   vq->notify = notify;
>   vq->weak_barriers = weak_barriers;
> - vq->broken = false;
> + vq->broken = true;
>   vq->last_used_idx = 0;
>   vq->event_triggered = false;
>   vq->num_added = 0;
> @@ -2136,8 +2136,11 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
>   return IRQ_NONE;
>   }
>  
> - if (unlikely(vq->broken))
> - return IRQ_HANDLED;
> + if (unlikely(vq->broken)) {
> + dev_warn_once(&vq->vq.vdev->dev,
> +   "virtio vring IRQ raised before DRIVER_OK");
> + return IRQ_NONE;
> + }
>  
>   /* Just a hint for performance: so it's ok that this can be racy! */
>   if (vq->event)
> @@ -2179,7 +2182,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
> index,
>   vq->we_own_ring = false;
>   vq->notify = notify;
>   vq->weak_barriers = weak_barriers;
> - vq->broken = false;
> + vq->broken = true;
>   vq->last_used_idx = 0;
>   vq->event_triggered = false;
>   vq->num_added = 0;
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index d8a2340f928e..23f1694cdbd5 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -256,6 +256,18 @@ void virtio_device_ready(struct virtio_device *dev)
>   unsigned status = dev->config->get_status(dev);
>  
>   BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> +
> + /*
> +  * The virtio_synchronize_cbs() makes sure vring_interrupt()
> 

Re: [PATCH V4 6/9] virtio-ccw: implement synchronize_cbs()

2022-05-10 Thread Michael S. Tsirkin
On Sat, May 07, 2022 at 03:19:51PM +0800, Jason Wang wrote:
> This patch tries to implement the synchronize_cbs() for ccw. For the
> vring_interrupt() that is called via virtio_airq_handler(), the
> synchronization is simply done via the airq_info's lock. For the
> vring_interrupt() that is called via virtio_ccw_int_handler(), a
> per-device spinlock for irq is introduced and used in the
> synchronization method.
> 
> Cc: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: "Paul E. McKenney" 
> Cc: Marc Zyngier 
> Cc: Halil Pasic 
> Cc: Cornelia Huck 
> Signed-off-by: Jason Wang 
> ---
>  drivers/s390/virtio/virtio_ccw.c | 27 +++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/drivers/s390/virtio/virtio_ccw.c 
> b/drivers/s390/virtio/virtio_ccw.c
> index d35e7a3f7067..001e1f0e6037 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -62,6 +62,7 @@ struct virtio_ccw_device {
>   unsigned int revision; /* Transport revision */
>   wait_queue_head_t wait_q;
>   spinlock_t lock;
> + rwlock_t irq_lock;
>   struct mutex io_lock; /* Serializes I/O requests */
>   struct list_head virtqueues;
>   bool is_thinint;
> @@ -984,6 +985,27 @@ static const char *virtio_ccw_bus_name(struct 
> virtio_device *vdev)
>   return dev_name(&vcdev->cdev->dev);
>  }
>  
> +static void virtio_ccw_synchronize_cbs(struct virtio_device *vdev)
> +{
> + struct virtio_ccw_device *vcdev = to_vc_device(vdev);
> + struct airq_info *info = vcdev->airq_info;
> +
> + if (info) {
> + /*
> +  * Synchronize with the vring_interrupt() with airq indicator
> +  */
> + write_lock(&info->lock);
> + write_unlock(&info->lock);
> + } else {
> + /*
> +  * Synchronize with the vring_interrupt() called by
> +  * virtio_ccw_int_handler().
> +  */
> + write_lock(&vcdev->irq_lock);
> + write_unlock(&vcdev->irq_lock);
> + }
> +}
> +
>  static const struct virtio_config_ops virtio_ccw_config_ops = {
>   .get_features = virtio_ccw_get_features,
>   .finalize_features = virtio_ccw_finalize_features,
> @@ -995,6 +1017,7 @@ static const struct virtio_config_ops 
> virtio_ccw_config_ops = {
>   .find_vqs = virtio_ccw_find_vqs,
>   .del_vqs = virtio_ccw_del_vqs,
>   .bus_name = virtio_ccw_bus_name,
> + .synchronize_cbs = virtio_ccw_synchronize_cbs,
>  };
>  
>  
> @@ -1079,6 +1102,7 @@ static void virtio_ccw_int_handler(struct ccw_device 
> *cdev,
>  {
>   __u32 activity = intparm & VIRTIO_CCW_INTPARM_MASK;
>   struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev);
> + unsigned long flags;
>   int i;
>   struct virtqueue *vq;
>  
> @@ -1106,6 +1130,7 @@ static void virtio_ccw_int_handler(struct ccw_device 
> *cdev,
>   vcdev->err = -EIO;
>   }
>   virtio_ccw_check_activity(vcdev, activity);
> + read_lock_irqsave(&vcdev->irq_lock, flags);
>   for_each_set_bit(i, indicators(vcdev),
>sizeof(*indicators(vcdev)) * BITS_PER_BYTE) {
>   /* The bit clear must happen before the vring kick. */

Cornelia sent a lockdep trace on this.

Basically I think this gets the irqsave/restore logic wrong.
It attempts to disable irqs in the handler (which already runs in
interrupt context), and it does not disable irqs in .synchronize_cbs.

As a result an interrupt might try to take the read lock while
.synchronize_cbs holds the write lock, resulting in a deadlock.

I think you want a plain read_lock in the handler and write_lock_irq
in .synchronize_cbs.
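
Roughly, as an untested sketch against this patch (reusing the
vcdev->irq_lock it introduces; the airq branch would stay as it is):

static void virtio_ccw_synchronize_cbs(struct virtio_device *vdev)
{
	struct virtio_ccw_device *vcdev = to_vc_device(vdev);

	/*
	 * Disable irqs while holding the writer lock, so the interrupt
	 * handler can never spin on the read lock on the same CPU that
	 * holds the write lock.
	 */
	write_lock_irq(&vcdev->irq_lock);
	write_unlock_irq(&vcdev->irq_lock);
}

and, in virtio_ccw_int_handler(), which already runs with irqs disabled:

	read_lock(&vcdev->irq_lock);
	for_each_set_bit(i, indicators(vcdev),
			 sizeof(*indicators(vcdev)) * BITS_PER_BYTE) {
		/* ... */
		vring_interrupt(0, vq);
	}
	read_unlock(&vcdev->irq_lock);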


> @@ -1114,6 +1139,7 @@ static void virtio_ccw_int_handler(struct ccw_device 
> *cdev,
>   vq = virtio_ccw_vq_by_ind(vcdev, i);
>   vring_interrupt(0, vq);
>   }
> + read_unlock_irqrestore(&vcdev->irq_lock, flags);
>   if (test_bit(0, indicators2(vcdev))) {
>   virtio_config_changed(&vcdev->vdev);
>   clear_bit(0, indicators2(vcdev));
> @@ -1284,6 +1310,7 @@ static int virtio_ccw_online(struct ccw_device *cdev)
>   init_waitqueue_head(&vcdev->wait_q);
>   INIT_LIST_HEAD(&vcdev->virtqueues);
>   spin_lock_init(&vcdev->lock);
> + rwlock_init(&vcdev->irq_lock);
>   mutex_init(&vcdev->io_lock);
>  
>   spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
> -- 
> 2.25.1



Re: [RFC PATCH] virtio: fix virtio transitional ids

2022-05-10 Thread Michael S. Tsirkin
On Tue, May 10, 2022 at 07:27:23PM +0900, Shunsuke Mie wrote:
> This commit fixes the transitional PCI device ID.
> 
> Fixes: d61914ea6ada ("virtio: update virtio id table, add transitional ids")
> Signed-off-by: Shunsuke Mie 

Absolutely! I don't understand how I could have missed this.

Applied, thanks!

> ---
>  include/uapi/linux/virtio_ids.h | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 80d76b75bccd..7aa2eb766205 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -73,12 +73,12 @@
>   * Virtio Transitional IDs
>   */
>  
> -#define VIRTIO_TRANS_ID_NET      1000 /* transitional virtio net */
> -#define VIRTIO_TRANS_ID_BLOCK    1001 /* transitional virtio block */
> -#define VIRTIO_TRANS_ID_BALLOON  1002 /* transitional virtio balloon */
> -#define VIRTIO_TRANS_ID_CONSOLE  1003 /* transitional virtio console */
> -#define VIRTIO_TRANS_ID_SCSI     1004 /* transitional virtio SCSI */
> -#define VIRTIO_TRANS_ID_RNG      1005 /* transitional virtio rng */
> -#define VIRTIO_TRANS_ID_9P       1009 /* transitional virtio 9p console */
> +#define VIRTIO_TRANS_ID_NET      0x1000 /* transitional virtio net */
> +#define VIRTIO_TRANS_ID_BLOCK    0x1001 /* transitional virtio block */
> +#define VIRTIO_TRANS_ID_BALLOON  0x1002 /* transitional virtio balloon */
> +#define VIRTIO_TRANS_ID_CONSOLE  0x1003 /* transitional virtio console */
> +#define VIRTIO_TRANS_ID_SCSI     0x1004 /* transitional virtio SCSI */
> +#define VIRTIO_TRANS_ID_RNG      0x1005 /* transitional virtio rng */
> +#define VIRTIO_TRANS_ID_9P       0x1009 /* transitional virtio 9p console */
>  
>  #endif /* _LINUX_VIRTIO_IDS_H */
> -- 
> 2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio_net: Remove unused case in virtio_skb_set_hash()

2022-05-09 Thread Michael S. Tsirkin
On Mon, May 09, 2022 at 09:14:32PM +0800, Tang Bin wrote:
> In this function, "VIRTIO_NET_HASH_REPORT_NONE" is included
> in "default", so it can be removed.
> 
> Signed-off-by: Tang Bin 

What's the point of this?

> ---
>  drivers/net/virtio_net.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 87838cbe3..b3e5d8637 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1172,7 +1172,6 @@ static void virtio_skb_set_hash(const struct 
> virtio_net_hdr_v1_hash *hdr_hash,
>   case VIRTIO_NET_HASH_REPORT_IPv6_EX:
>   rss_hash_type = PKT_HASH_TYPE_L3;
>   break;
> - case VIRTIO_NET_HASH_REPORT_NONE:
>   default:
>   rss_hash_type = PKT_HASH_TYPE_NONE;
>   }
> -- 
> 2.20.1.windows.1
> 
> 



Re: [PATCH v2 00/19] Control VQ support in vDPA

2022-05-09 Thread Michael S. Tsirkin
On Mon, May 09, 2022 at 11:42:10AM +0800, Jason Wang wrote:
> On Thu, Mar 31, 2022 at 2:05 AM Gautam Dawar  wrote:
> >
> > Hi All:
> >
> > This series tries to add the support for control virtqueue in vDPA.
> >
> > Control virtqueue is used by networking device for accepting various
> > commands from the driver. It's a must to support multiqueue and other
> > configurations.
> >
> > When used by vhost-vDPA bus driver for VM, the control virtqueue
> > should be shadowed via userspace VMM (Qemu) instead of being assigned
> > directly to Guest. This is because Qemu needs to know the device state
> > in order to start and stop device correctly (e.g for Live Migration).
> >
> > This requires isolating the memory mapping for the control virtqueue
> > presented by vhost-vDPA, to prevent the guest from accessing it directly.
> >
> > To achieve this, vDPA introduce two new abstractions:
> >
> > - address space: identified through address space id (ASID) and a set
> >  of memory mappings is maintained
> > - virtqueue group: the minimal set of virtqueues that must share an
> >  address space
> >
> > Device needs to advertise the following attributes to vDPA:
> >
> > - the number of address spaces supported in the device
> > - the number of virtqueue groups supported in the device
> > - the mappings from a specific virtqueue to its virtqueue groups
> >
> > The mappings from virtqueue to virtqueue groups is fixed and defined
> > by vDPA device driver. E.g:
> >
> > - For the device that has hardware ASID support, it can simply
> >   advertise a per virtqueue group.
> > - For the device that does not have hardware ASID support, it can
> >   simply advertise a single virtqueue group that contains all
> >   virtqueues. Or if it wants a software emulated control virtqueue, it
> >   can advertise two virtqueue groups, one is for cvq, another is for
> >   the rest virtqueues.
> >
> > vDPA also allow to change the association between virtqueue group and
> > address space. So in the case of control virtqueue, userspace
> > VMM(Qemu) may use a dedicated address space for the control virtqueue
> > group to isolate the memory mapping.
> >
> > The vhost/vhost-vDPA is also extend for the userspace to:
> >
> > - query the number of virtqueue groups and address spaces supported by
> >   the device
> > - query the virtqueue group for a specific virtqueue
> > - associate a virtqueue group with an address space
> > - send ASID based IOTLB commands
> >
> > This will help userspace VMM(Qemu) to detect whether the control vq
> > could be supported and isolate memory mappings of control virtqueue
> > from the others.
> >
> > To demonstrate the usage, vDPA simulator is extended to support
> > setting MAC address via a emulated control virtqueue.
> >
> > Please review.
> 
> Michael, this looks good to me, do you have comments on this?
> 
> Thanks


I'll merge this for next.

> >
> > Changes since RFC v2:
> >
> > - Fixed memory leak for asid 0 in vhost_vdpa_remove_as()
> > - Removed unnecessary NULL check for iotlb in vhost_vdpa_unmap() and
> >   changed its return type to void.
> > - Removed insignificant used_as member field from struct vhost_vdpa.
> > - Corrected the iommu parameter in call to vringh_set_iotlb() from
> >   vdpasim_set_group_asid()
> > - Fixed build errors with vdpa_sim_net
> > - Updated alibaba, vdpa_user and virtio_pci vdpa parent drivers to
> >   call updated vDPA APIs and ensured successful build
> > - Tested control (MAC address configuration) and data-path using
> >   single virtqueue pair on Xilinx (now AMD) SN1022 SmartNIC device
> >   and vdpa_sim_net software device using QEMU release at [1]
> > - Removed two extra blank lines after set_group_asid() in
> >   include/linux/vdpa.h
> >
> > Changes since v1:
> >
> > - Rebased the v1 patch series on vhost branch of MST vhost git repo
> >   git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/log/?h=vhost
> > - Updates to accommodate vdpa_sim changes from monolithic module in
> >   kernel used v1 patch series to current modularized class (net, block)
> >   based approach.
> > - Added new attributes (ngroups and nas) to "vdpasim_dev_attr" and
> >   propagated them from vdpa_sim_net to vdpa_sim
> > - Widened the data-type for "asid" member of vhost_msg_v2 to __u32
> >   to accommodate PASID
> > - Fixed the buildbot warnings
> > - Resolved all checkpatch.pl errors and warnings
> > - Tested both control and datapath with Xilinx Smartnic SN1000 series
> >   device using QEMU implementing the Shadow virtqueue and support for
> >   VQ groups and ASID available at [1]
> >
> > Changes since RFC:
> >
> > - tweak vhost uAPI documentation
> > - switch to use device specific IOTLB really in patch 4
> > - tweak the commit log
> > - fix that ASID in vhost is claimed to be 32 actually but 16bit
> >   actually
> > - fix use after free when using ASID with IOTLB batching requests
> > - switch to use Stefano's patch for having separated iov
> > - remove unused 

Re: [PATCH net-next 5/6] net: virtio: switch to netif_napi_add_weight()

2022-05-07 Thread Michael S. Tsirkin
On Fri, May 06, 2022 at 10:07:50AM -0700, Jakub Kicinski wrote:
> virtio netdev driver uses a custom napi weight, switch to the new
> API for setting custom weight.
> 
> Signed-off-by: Jakub Kicinski 

Acked-by: Michael S. Tsirkin 

> ---
> CC: m...@redhat.com
> CC: jasow...@redhat.com
> CC: virtualization@lists.linux-foundation.org
> ---
>  drivers/net/virtio_net.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ebb98b796352..db05b5e930be 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3313,8 +3313,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
>   INIT_DELAYED_WORK(&vi->refill, refill_work);
>   for (i = 0; i < vi->max_queue_pairs; i++) {
>   vi->rq[i].pages = NULL;
> - netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
> -napi_weight);
> + netif_napi_add_weight(vi->dev, &vi->rq[i].napi, virtnet_poll,
> +   napi_weight);
>   netif_napi_add_tx_weight(vi->dev, &vi->sq[i].napi,
>virtnet_poll_tx,
>napi_tx ? napi_weight : 0);
> -- 
> 2.34.1



Re: RE: [PATCH v5 5/5] virtio-crypto: enable retry for virtio-crypto-dev

2022-05-06 Thread Michael S. Tsirkin
On Fri, May 06, 2022 at 05:55:33PM +0800, zhenwei pi wrote:
> On 5/6/22 17:34, Gonglei (Arei) wrote:
> > 
> > 
> > > -Original Message-
> > > From: zhenwei pi [mailto:pizhen...@bytedance.com]
> > > Sent: Thursday, May 5, 2022 5:24 PM
> > > To: Gonglei (Arei) ; m...@redhat.com
> > > Cc: jasow...@redhat.com; herb...@gondor.apana.org.au;
> > > linux-ker...@vger.kernel.org; virtualization@lists.linux-foundation.org;
> > > linux-cry...@vger.kernel.org; helei.si...@bytedance.com;
> > > pizhen...@bytedance.com; da...@davemloft.net
> > > Subject: [PATCH v5 5/5] virtio-crypto: enable retry for virtio-crypto-dev
> > > 
> > > From: lei he 
> > > 
> > > Enable retry for virtio-crypto-dev, so that crypto-engine can process
> > > cipher-requests in parallel.
> > > 
> > > Cc: Michael S. Tsirkin 
> > > Cc: Jason Wang 
> > > Cc: Gonglei 
> > > Signed-off-by: lei he 
> > > Signed-off-by: zhenwei pi 
> > > ---
> > >   drivers/crypto/virtio/virtio_crypto_core.c | 3 ++-
> > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/crypto/virtio/virtio_crypto_core.c
> > > b/drivers/crypto/virtio/virtio_crypto_core.c
> > > index 60490ffa3df1..f67e0d4c1b0c 100644
> > > --- a/drivers/crypto/virtio/virtio_crypto_core.c
> > > +++ b/drivers/crypto/virtio/virtio_crypto_core.c
> > > @@ -144,7 +144,8 @@ static int virtcrypto_find_vqs(struct virtio_crypto 
> > > *vi)
> > >   spin_lock_init(&vi->data_vq[i].lock);
> > >   vi->data_vq[i].vq = vqs[i];
> > >   /* Initialize crypto engine */
> > > - vi->data_vq[i].engine = crypto_engine_alloc_init(dev, 1);
> > > + vi->data_vq[i].engine = crypto_engine_alloc_init_and_set(dev, true,
> > > + NULL, 1,
> > > + virtqueue_get_vring_size(vqs[i]));
> > 
> > Here the '1' can be 'true' too.
> > 
> > Sure, you can add
> > 
> > Reviewed-by: Gonglei 
> > 
> > Regards,
> > -Gonglei
> > 
> > >   if (!vi->data_vq[i].engine) {
> > >   ret = -ENOMEM;
> > >   goto err_engine;
> > > --
> > > 2.20.1
> > 
> 
> Thanks to Lei!
> 
> Hi, Michael
> I would appreciate it if you could apply this minor change, or should I
> send the v6 series? Which one do you prefer?
> 
> -- 


send v6 with acks and change pls



Re: PING: [PATCH v4 0/5] virtio-crypto: Improve performance

2022-05-04 Thread Michael S. Tsirkin
On Thu, May 05, 2022 at 03:14:40AM +, Gonglei (Arei) wrote:
> 
> 
> > -Original Message-
> > From: zhenwei pi [mailto:pizhen...@bytedance.com]
> > Sent: Thursday, May 5, 2022 10:35 AM
> > To: Gonglei (Arei) ; m...@redhat.com;
> > jasow...@redhat.com
> > Cc: herb...@gondor.apana.org.au; linux-ker...@vger.kernel.org;
> > virtualization@lists.linux-foundation.org; linux-cry...@vger.kernel.org;
> > helei.si...@bytedance.com; da...@davemloft.net
> > Subject: PING: [PATCH v4 0/5] virtio-crypto: Improve performance
> > 
> > Hi, Lei
> > 
> > Jason replied in another patch:
> > Still hundreds of lines of changes, I'd leave this change to other 
> > maintainers to
> > decide.
> > 
> > Quite frankly, the virtio crypto driver has changed very little in the
> > past, and the performance of the control queue is not good enough. I
> > suspect this driver is not widely used, so I'd like to rework a lot of
> > it; it would be best to complete this work in the 5.18 window.
> > 
> > This is a different point from Jason's. I would appreciate it if you
> > could give me any hint.
> > 
> 
> This is already in my todo list.
> 
> Regards,
> -Gonglei

It's been out a month though, which is not really acceptable review latency.
So I would apply this for next, but you need to address Dan Carpenter's
comment and look for similar patterns elsewhere in your patch.


> > On 4/24/22 18:41, zhenwei pi wrote:
> > > Hi, Lei
> > > I'd like to move helper and callback functions(Eg, 
> > > virtcrypto_clear_request
> > >   and virtcrypto_ctrlq_callback) from xx_core.c to xx_common.c, then
> > > the xx_core.c supports:
> > >- probe/remove/irq affinity seting for a virtio device
> > >- basic virtio related operations
> > >
> > > xx_common.c supports:
> > >- common helpers/functions for algos
> > >
> > > Do you have any suggestion about this?
> > >
> > > v3 -> v4:
> > >   - Don't create new file virtio_common.c, the new functions are added
> > > into virtio_crypto_core.c
> > >   - Split the first patch into two parts:
> > >   1, change code style,
> > >   2, use private buffer instead of shared buffer
> > >   - Remove relevant change.
> > >   - Other minor changes.
> > >
> > > v2 -> v3:
> > >   - Jason suggested that spliting the first patch into two part:
> > >   1, using private buffer
> > >   2, remove the busy polling
> > > Rework as Jason's suggestion, this makes the smaller change in
> > > each one and clear.
> > >
> > > v1 -> v2:
> > >   - Use kfree instead of kfree_sensitive for insensitive buffer.
> > >   - Several coding style fix.
> > >   - Use memory from current node, instead of memory close to device
> > >   - Add more message in commit, also explain why removing per-device
> > > request buffer.
> > >   - Add necessary comment in code to explain why using kzalloc to
> > > allocate struct virtio_crypto_ctrl_request.
> > >
> > > v1:
> > > The main point of this series is to improve the performance for virtio
> > > crypto:
> > > - Use wait mechanism instead of busy polling for ctrl queue, this
> > >reduces CPU and lock racing, it's possiable to create/destroy session
> > >parallelly, QPS increases from ~40K/s to ~200K/s.
> > > - Enable retry on crypto engine to improve performance for data queue,
> > >this allows the larger depth instead of 1.
> > > - Fix dst data length in akcipher service.
> > > - Other style fix.
> > >
> > > lei he (2):
> > >virtio-crypto: adjust dst_len at ops callback
> > >virtio-crypto: enable retry for virtio-crypto-dev
> > >
> > > zhenwei pi (3):
> > >virtio-crypto: change code style
> > >virtio-crypto: use private buffer for control request
> > >virtio-crypto: wait ctrl queue instead of busy polling
> > >
> > >   .../virtio/virtio_crypto_akcipher_algs.c  |  83 ++-
> > >   drivers/crypto/virtio/virtio_crypto_common.h  |  21 ++-
> > >   drivers/crypto/virtio/virtio_crypto_core.c|  55 ++-
> > >   .../virtio/virtio_crypto_skcipher_algs.c  | 140 --
> > >   4 files changed, 180 insertions(+), 119 deletions(-)
> > >
> > 
> > --
> > zhenwei pi



Re: [PATCH net-next 0/2] vsock/virtio: add support for device suspend/resume

2022-05-02 Thread Michael S. Tsirkin
On Thu, Apr 28, 2022 at 03:22:39PM +0200, Stefano Garzarella wrote:
> Vilas reported that virtio-vsock no longer worked properly after
> suspend/resume (echo mem >/sys/power/state).
> It was impossible to connect to the host and vice versa.
> 
> Indeed, the support has never been implemented.
> 
> This series implement .freeze and .restore callbacks of struct virtio_driver
> to support device suspend/resume.
> 
> The first patch factors our the code to initialize and delete VQs.
> The second patch uses that code to support device suspend/resume.
> 
> Signed-off-by: Stefano Garzarella 


Acked-by: Michael S. Tsirkin 

> Stefano Garzarella (2):
>   vsock/virtio: factor our the code to initialize and delete VQs
>   vsock/virtio: add support for device suspend/resume
> 
>  net/vmw_vsock/virtio_transport.c | 197 ---
>  1 file changed, 131 insertions(+), 66 deletions(-)
> 
> -- 
> 2.35.1



Re: [PATCH 3/3] virtio-pci: Use cpumask_available to fix compilation error

2022-04-28 Thread Michael S. Tsirkin
On Thu, Apr 28, 2022 at 11:48:01AM +0200, Christophe Marie Francois Dupont de 
Dinechin wrote:
> 
> 
> > On 15 Apr 2022, at 10:48, Michael S. Tsirkin  wrote:
> > 
> > On Thu, Apr 14, 2022 at 05:08:55PM +0200, Christophe de Dinechin wrote:
> >> With GCC 12 and defconfig, we get the following error:
> >> 
> >> |   CC  drivers/virtio/virtio_pci_common.o
> >> | drivers/virtio/virtio_pci_common.c: In function ‘vp_del_vqs’:
> >> | drivers/virtio/virtio_pci_common.c:257:29: error: the comparison will
> >> |  always evaluate as ‘true’ for the pointer operand in
> >> |  ‘vp_dev->msix_affinity_masks + (sizetype)((long unsigned int)i * 8)’
> >> |  must not be NULL [-Werror=address]
> >> |   257 | if (vp_dev->msix_affinity_masks[i])
> >> |   | ^~
> >> 
> >> This happens in the case where CONFIG_CPUMASK_OFFSTACK is not defined,
> >> since we typedef cpumask_var_t as an array. The compiler is essentially
> >> complaining that an array pointer cannot be NULL. This is not a very
> >> important warning, but there is a function called cpumask_available that
> >> seems to be defined just for that case, so the fix is easy.
> >> 
> >> Signed-off-by: Christophe de Dinechin 
> >> Signed-off-by: Christophe de Dinechin 
> > 
> > There was an alternate patch proposed for this by
> > Murilo Opsfelder Araujo. What do you think about that approach?
> 
> I responded on the other thread, but let me share the response here:
> 
> [to muri...@linux.ibm.com]
> Apologies for the delay in responding, broken laptop…
> 
> In the case where CONFIG_CPUMASK_OFFSTACK is not defined, we have:
> 
>   typedef struct cpumask cpumask_var_t[1];
> 
> So that vp_dev->msix_affinity_masks[i] is statically not null (that’s the 
> warning)
> but also a static pointer, so not kfree-safe IMO.


Not sure I understand what you are saying here.

> > 
> > 
> >> ---
> >> drivers/virtio/virtio_pci_common.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >> 
> >> diff --git a/drivers/virtio/virtio_pci_common.c 
> >> b/drivers/virtio/virtio_pci_common.c
> >> index d724f676608b..5c44a2f13c93 100644
> >> --- a/drivers/virtio/virtio_pci_common.c
> >> +++ b/drivers/virtio/virtio_pci_common.c
> >> @@ -254,7 +254,7 @@ void vp_del_vqs(struct virtio_device *vdev)
> >> 
> >>if (vp_dev->msix_affinity_masks) {
> >>for (i = 0; i < vp_dev->msix_vectors; i++)
> >> -  if (vp_dev->msix_affinity_masks[i])
> >> +  if (cpumask_available(vp_dev->msix_affinity_masks[i]))
> >>
> >> free_cpumask_var(vp_dev->msix_affinity_masks[i]);
> >>}
> >> 
> >> -- 
> >> 2.35.1
> > 


Re: [PATCH] virtio-pci: Remove wrong address verification in vp_del_vqs()

2022-04-28 Thread Michael S. Tsirkin
On Thu, Apr 28, 2022 at 11:55:31AM +0200, Christophe Marie Francois Dupont de 
Dinechin wrote:
> 
> 
> > On 28 Apr 2022, at 11:51, Christophe Marie Francois Dupont de Dinechin 
> >  wrote:
> > 
> > 
> > 
> >> On 28 Apr 2022, at 11:46, Christophe Marie Francois Dupont de Dinechin 
> >>  wrote:
> >> 
> >> 
> >> 
> >>> On 15 Apr 2022, at 05:51, Murilo Opsfelder Araújo  
> >>> wrote:
> >>> 
> >>> On 4/14/22 23:30, Murilo Opsfelder Araujo wrote:
>  GCC 12 enhanced -Waddress when comparing array address to null [0],
>  which warns:
>  drivers/virtio/virtio_pci_common.c: In function ‘vp_del_vqs’:
>  drivers/virtio/virtio_pci_common.c:257:29: warning: the comparison will 
>  always evaluate as ‘true’ for the pointer operand in 
>  ‘vp_dev->msix_affinity_masks + (sizetype)((long unsigned int)i * 256)’ 
>  must not be NULL [-Waddress]
>  257 | if (vp_dev->msix_affinity_masks[i])
>  | ^~
>  In fact, the verification is comparing the result of a pointer
>  arithmetic, the address "msix_affinity_masks + i", which will always
>  evaluate to true.
>  Under the hood, free_cpumask_var() calls kfree(), which is safe to pass
>  NULL, not requiring non-null verification. So remove the verification
>  to make compiler happy (happy compiler, happy life).
>  [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102103
>  Signed-off-by: Murilo Opsfelder Araujo 
>  ---
>  drivers/virtio/virtio_pci_common.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>  diff --git a/drivers/virtio/virtio_pci_common.c 
>  b/drivers/virtio/virtio_pci_common.c
>  index d724f676608b..5046efcffb4c 100644
>  --- a/drivers/virtio/virtio_pci_common.c
>  +++ b/drivers/virtio/virtio_pci_common.c
>  @@ -254,8 +254,7 @@ void vp_del_vqs(struct virtio_device *vdev)
>   if (vp_dev->msix_affinity_masks) {
>   for (i = 0; i < vp_dev->msix_vectors; i++)
>  -if (vp_dev->msix_affinity_masks[i])
>  -
>  free_cpumask_var(vp_dev->msix_affinity_masks[i]);
>  +
>  free_cpumask_var(vp_dev->msix_affinity_masks[i]);
>   }
>   if (vp_dev->msix_enabled) {
> >>> 
> >>> After I sent this message, I realized that Christophe (copied here)
> >>> had already proposed a fix:
> >>> 
> >>> https://lore.kernel.org/lkml/20220414150855.2407137-4-dinec...@redhat.com/
> >>> 
> >>> Christophe,
> >>> 
> >>> Since free_cpumask_var() calls kfree() and kfree() is null-safe,
> >>> can we just drop this null verification and call free_cpumask_var() right 
> >>> away?
> >> 
> >> Apologies for the delay in responding, broken laptop…
> >> 
> >> In the case where CONFIG_CPUMASK_OFFSTACK is not defined, we have:
> >> 
> >>typedef struct cpumask cpumask_var_t[1];
> >> 
> >> So that vp_dev->msix_affinity_masks[i] is statically not null (that’s the 
> >> warning)
> >> but also a static pointer, so not kfree-safe IMO.
> > 
> > … which also renders my own patch invalid :-/
> > 
> > Compiler warnings are good. Clearly not sufficient.
> 
> Ah, I just noticed that free_cpumask_var is a noop in that case.
> 
> So yes, your fix is better :-)

ACK then?
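
For reference, the check buys nothing in either configuration; roughly
paraphrasing include/linux/cpumask.h (worth double-checking against the
actual tree):

#ifdef CONFIG_CPUMASK_OFFSTACK
typedef struct cpumask *cpumask_var_t;
void free_cpumask_var(cpumask_var_t mask);	/* kfree(), so NULL-safe */
#else
typedef struct cpumask cpumask_var_t[1];
static inline void free_cpumask_var(cpumask_var_t mask)
{
	/* no-op: the mask is embedded in the containing allocation */
}
#endif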


Re: [PATCH V3 6/9] virtio-ccw: implement synchronize_cbs()

2022-04-28 Thread Michael S. Tsirkin
On Thu, Apr 28, 2022 at 02:02:16PM +0800, Jason Wang wrote:
> On Thu, Apr 28, 2022 at 1:55 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, Apr 28, 2022 at 01:51:59PM +0800, Jason Wang wrote:
> > > On Thu, Apr 28, 2022 at 1:24 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Thu, Apr 28, 2022 at 11:04:41AM +0800, Jason Wang wrote:
> > > > > > But my guess is that rwlock + some testing for the legacy indicator 
> > > > > > case
> > > > > > just to double check if there is a heavy regression despite of our
> > > > > > expectations to see none should do the trick.
> > > > >
> > > > > I suggest this, rwlock (for not airq) seems better than spinlock, but
> > > > > at worst case it will cause cache line bouncing. But I wonder if it's
> > > > > noticeable (anyhow it has been used for airq).
> > > > >
> > > > > Thanks
> > > >
> > > > Which existing rwlock does airq use right now? Can we take it to sync?
> > >
> > > It's the rwlock in airq_info, it has already been used in this patch.
> > >
> write_lock(&info->lock);
> write_unlock(&info->lock);
> > >
> > > But the problem is, it looks to me there could be a case that airq is
> > > not used, (virtio_ccw_int_hander()). That's why the patch use a
> > > spinlock, it could be optimized with using a rwlock as well.
> > >
> > > Thanks
> >
> > Ah, right. So let's take that on the legacy path too and Halil promises
> > to test to make sure performance isn't impacted too badly?
> 
> I think what you meant is using a dedicated rwlock instead of trying
> to reuse one of the airq_info locks.
> 
> If this is true, it should be fine.
> 
> Thanks

yes

> >
> > > >
> > > > --
> > > > MST
> > > >
> >



Re: [PATCH V3 6/9] virtio-ccw: implement synchronize_cbs()

2022-04-27 Thread Michael S. Tsirkin
On Thu, Apr 28, 2022 at 01:51:59PM +0800, Jason Wang wrote:
> On Thu, Apr 28, 2022 at 1:24 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, Apr 28, 2022 at 11:04:41AM +0800, Jason Wang wrote:
> > > > But my guess is that rwlock + some testing for the legacy indicator case
> > > > just to double check if there is a heavy regression despite of our
> > > > expectations to see none should do the trick.
> > >
> > > I suggest this, rwlock (for not airq) seems better than spinlock, but
> > > at worst case it will cause cache line bouncing. But I wonder if it's
> > > noticeable (anyhow it has been used for airq).
> > >
> > > Thanks
> >
> > Which existing rwlock does airq use right now? Can we take it to sync?
> 
> It's the rwlock in airq_info, it has already been used in this patch.
> 
> write_lock(&info->lock);
> write_unlock(&info->lock);
> 
> But the problem is, it looks to me there could be a case that airq is
> not used, (virtio_ccw_int_hander()). That's why the patch use a
> spinlock, it could be optimized with using a rwlock as well.
> 
> Thanks

Ah, right. So let's take that on the legacy path too and Halil promises
to test to make sure performance isn't impacted too badly?

> >
> > --
> > MST
> >



Re: [PATCH V3 6/9] virtio-ccw: implement synchronize_cbs()

2022-04-27 Thread Michael S. Tsirkin
On Thu, Apr 28, 2022 at 11:04:41AM +0800, Jason Wang wrote:
> > But my guess is that rwlock + some testing for the legacy indicator case
> > just to double check if there is a heavy regression despite of our
> > expectations to see none should do the trick.
> 
> I suggest this, rwlock (for not airq) seems better than spinlock, but
> at worst case it will cause cache line bouncing. But I wonder if it's
> noticeable (anyhow it has been used for airq).
> 
> Thanks

Which existing rwlock does airq use right now? Can we take it to sync?

-- 
MST



Re: [PATCH V3 6/9] virtio-ccw: implement synchronize_cbs()

2022-04-27 Thread Michael S. Tsirkin
On Thu, Apr 28, 2022 at 04:43:15AM +0200, Halil Pasic wrote:
> On Wed, 27 Apr 2022 11:27:03 +0200
> Cornelia Huck  wrote:
> 
> > On Tue, Apr 26 2022, "Michael S. Tsirkin"  wrote:
> > 
> > > On Tue, Apr 26, 2022 at 05:47:17PM +0200, Cornelia Huck wrote:  
> > >> On Mon, Apr 25 2022, "Michael S. Tsirkin"  wrote:
> > >>   
> > >> > On Mon, Apr 25, 2022 at 11:53:24PM -0400, Michael S. Tsirkin wrote:  
> > >> >> On Tue, Apr 26, 2022 at 11:42:45AM +0800, Jason Wang wrote:  
> > >> >> > 
> > >> >> > 在 2022/4/26 11:38, Michael S. Tsirkin 写道:  
> > >> >> > > On Mon, Apr 25, 2022 at 11:35:41PM -0400, Michael S. Tsirkin 
> > >> >> > > wrote:  
> > >> >> > > > On Tue, Apr 26, 2022 at 04:29:11AM +0200, Halil Pasic wrote:  
> > >> >> > > > > On Mon, 25 Apr 2022 09:59:55 -0400
> > >> >> > > > > "Michael S. Tsirkin"  wrote:
> > >> >> > > > >   
> > >> >> > > > > > On Mon, Apr 25, 2022 at 10:54:24AM +0200, Cornelia Huck 
> > >> >> > > > > > wrote:  
> > >> >> > > > > > > On Mon, Apr 25 2022, "Michael S. Tsirkin" 
> > >> >> > > > > > >  wrote:  
> > >> >> > > > > > > > On Mon, Apr 25, 2022 at 10:44:15AM +0800, Jason Wang 
> > >> >> > > > > > > > wrote:  
> > >> >> > > > > > > > > This patch tries to implement the synchronize_cbs() 
> > >> >> > > > > > > > > for ccw. For the
> > >> >> > > > > > > > > vring_interrupt() that is called via 
> > >> >> > > > > > > > > virtio_airq_handler(), the
> > >> >> > > > > > > > > synchronization is simply done via the airq_info's 
> > >> >> > > > > > > > > lock. For the
> > >> >> > > > > > > > > vring_interrupt() that is called via 
> > >> >> > > > > > > > > virtio_ccw_int_handler(), a per
> > >> >> > > > > > > > > device spinlock for irq is introduced ans used in the 
> > >> >> > > > > > > > > synchronization
> > >> >> > > > > > > > > method.
> > >> >> > > > > > > > > 
> > >> >> > > > > > > > > Cc: Thomas Gleixner 
> > >> >> > > > > > > > > Cc: Peter Zijlstra 
> > >> >> > > > > > > > > Cc: "Paul E. McKenney" 
> > >> >> > > > > > > > > Cc: Marc Zyngier 
> > >> >> > > > > > > > > Cc: Halil Pasic 
> > >> >> > > > > > > > > Cc: Cornelia Huck 
> > >> >> > > > > > > > > Signed-off-by: Jason Wang   
> > >> >> > > > > > > > 
> > >> >> > > > > > > > This is the only one that is giving me pause. Halil, 
> > >> >> > > > > > > > Cornelia,
> > >> >> > > > > > > > should we be concerned about the performance impact 
> > >> >> > > > > > > > here?
> > >> >> > > > > > > > Any chance it can be tested?  
> > >> >> > > > > > > We can have a bunch of devices using the same airq 
> > >> >> > > > > > > structure, and the
> > >> >> > > > > > > sync cb creates a choke point, same as 
> > >> >> > > > > > > registering/unregistering.  
> > >> >> > > > > > BTW can callbacks for multiple VQs run on multiple CPUs at 
> > >> >> > > > > > the moment?  
> > >> >> > > > > I'm not sure I understand the question.
> > >> >> > > > > 
> > >> >> > > > > I do think we can have multiple CPUs that are executing some 
> > >> >> > > > > portion of
> > >> >> > > > > virtio_ccw_int_handler(). So I guess the answer is y

Re: [PATCH V3 6/9] virtio-ccw: implement synchronize_cbs()

2022-04-27 Thread Michael S. Tsirkin
On Wed, Apr 27, 2022 at 03:57:57PM +0800, Jason Wang wrote:
> On Wed, Apr 27, 2022 at 2:30 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Apr 27, 2022 at 11:53:25AM +0800, Jason Wang wrote:
> > > On Tue, Apr 26, 2022 at 2:30 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Tue, Apr 26, 2022 at 12:07:39PM +0800, Jason Wang wrote:
> > > > > On Tue, Apr 26, 2022 at 11:55 AM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, Apr 25, 2022 at 11:53:24PM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Tue, Apr 26, 2022 at 11:42:45AM +0800, Jason Wang wrote:
> > > > > > > >
> > > > > > > > 在 2022/4/26 11:38, Michael S. Tsirkin 写道:
> > > > > > > > > On Mon, Apr 25, 2022 at 11:35:41PM -0400, Michael S. Tsirkin 
> > > > > > > > > wrote:
> > > > > > > > > > On Tue, Apr 26, 2022 at 04:29:11AM +0200, Halil Pasic wrote:
> > > > > > > > > > > On Mon, 25 Apr 2022 09:59:55 -0400
> > > > > > > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Mon, Apr 25, 2022 at 10:54:24AM +0200, Cornelia Huck 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > On Mon, Apr 25 2022, "Michael S. Tsirkin" 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > On Mon, Apr 25, 2022 at 10:44:15AM +0800, Jason 
> > > > > > > > > > > > > > Wang wrote:
> > > > > > > > > > > > > > > This patch tries to implement the 
> > > > > > > > > > > > > > > synchronize_cbs() for ccw. For the
> > > > > > > > > > > > > > > vring_interrupt() that is called via 
> > > > > > > > > > > > > > > virtio_airq_handler(), the
> > > > > > > > > > > > > > > synchronization is simply done via the 
> > > > > > > > > > > > > > > airq_info's lock. For the
> > > > > > > > > > > > > > > vring_interrupt() that is called via 
> > > > > > > > > > > > > > > virtio_ccw_int_handler(), a per
> > > > > > > > > > > > > > > device spinlock for irq is introduced ans used in 
> > > > > > > > > > > > > > > the synchronization
> > > > > > > > > > > > > > > method.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cc: Thomas Gleixner 
> > > > > > > > > > > > > > > Cc: Peter Zijlstra 
> > > > > > > > > > > > > > > Cc: "Paul E. McKenney" 
> > > > > > > > > > > > > > > Cc: Marc Zyngier 
> > > > > > > > > > > > > > > Cc: Halil Pasic 
> > > > > > > > > > > > > > > Cc: Cornelia Huck 
> > > > > > > > > > > > > > > Signed-off-by: Jason Wang 
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This is the only one that is giving me pause. 
> > > > > > > > > > > > > > Halil, Cornelia,
> > > > > > > > > > > > > > should we be concerned about the performance impact 
> > > > > > > > > > > > > > here?
> > > > > > > > > > > > > > Any chance it can be tested?
> > > > > > > > > > > > > We can have a bunch of devices using the same airq 
> > > > > > > > > > > > > structure, and the
> > > > > > > > > > > > > sync cb creates a choke point, same as 
> > > > > > > > > > > > > registering/unregistering.
> > > > > > > > > > > > BTW can callbacks for multiple VQs run on multiple CPUs 
> > > > > > > > >

Re: [PATCH V3 6/9] virtio-ccw: implement synchronize_cbs()

2022-04-27 Thread Michael S. Tsirkin
On Wed, Apr 27, 2022 at 11:53:25AM +0800, Jason Wang wrote:
> On Tue, Apr 26, 2022 at 2:30 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Apr 26, 2022 at 12:07:39PM +0800, Jason Wang wrote:
> > > On Tue, Apr 26, 2022 at 11:55 AM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, Apr 25, 2022 at 11:53:24PM -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, Apr 26, 2022 at 11:42:45AM +0800, Jason Wang wrote:
> > > > > >
> > > > > > 在 2022/4/26 11:38, Michael S. Tsirkin 写道:
> > > > > > > On Mon, Apr 25, 2022 at 11:35:41PM -0400, Michael S. Tsirkin 
> > > > > > > wrote:
> > > > > > > > On Tue, Apr 26, 2022 at 04:29:11AM +0200, Halil Pasic wrote:
> > > > > > > > > On Mon, 25 Apr 2022 09:59:55 -0400
> > > > > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > > > >
> > > > > > > > > > On Mon, Apr 25, 2022 at 10:54:24AM +0200, Cornelia Huck 
> > > > > > > > > > wrote:
> > > > > > > > > > > On Mon, Apr 25 2022, "Michael S. Tsirkin" 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > > On Mon, Apr 25, 2022 at 10:44:15AM +0800, Jason Wang 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > This patch tries to implement the synchronize_cbs() 
> > > > > > > > > > > > > for ccw. For the
> > > > > > > > > > > > > vring_interrupt() that is called via 
> > > > > > > > > > > > > virtio_airq_handler(), the
> > > > > > > > > > > > > synchronization is simply done via the airq_info's 
> > > > > > > > > > > > > lock. For the
> > > > > > > > > > > > > vring_interrupt() that is called via 
> > > > > > > > > > > > > virtio_ccw_int_handler(), a per
> > > > > > > > > > > > > device spinlock for irq is introduced ans used in the 
> > > > > > > > > > > > > synchronization
> > > > > > > > > > > > > method.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cc: Thomas Gleixner 
> > > > > > > > > > > > > Cc: Peter Zijlstra 
> > > > > > > > > > > > > Cc: "Paul E. McKenney" 
> > > > > > > > > > > > > Cc: Marc Zyngier 
> > > > > > > > > > > > > Cc: Halil Pasic 
> > > > > > > > > > > > > Cc: Cornelia Huck 
> > > > > > > > > > > > > Signed-off-by: Jason Wang 
> > > > > > > > > > > >
> > > > > > > > > > > > This is the only one that is giving me pause. Halil, 
> > > > > > > > > > > > Cornelia,
> > > > > > > > > > > > should we be concerned about the performance impact 
> > > > > > > > > > > > here?
> > > > > > > > > > > > Any chance it can be tested?
> > > > > > > > > > > We can have a bunch of devices using the same airq 
> > > > > > > > > > > structure, and the
> > > > > > > > > > > sync cb creates a choke point, same as 
> > > > > > > > > > > registering/unregistering.
> > > > > > > > > > BTW can callbacks for multiple VQs run on multiple CPUs at 
> > > > > > > > > > the moment?
> > > > > > > > > I'm not sure I understand the question.
> > > > > > > > >
> > > > > > > > > I do think we can have multiple CPUs that are executing some 
> > > > > > > > > portion of
> > > > > > > > > virtio_ccw_int_handler(). So I guess the answer is yes. 
> > > > > > > > > Connie what do you think?
> > > > > > > > >
> > > > > > > > > On the other hand we could also end up serializing 
> > > > > > > > > synchr

Re: [PATCH V3 6/9] virtio-ccw: implement synchronize_cbs()

2022-04-27 Thread Michael S. Tsirkin
On Wed, Apr 27, 2022 at 11:53:25AM +0800, Jason Wang wrote:
> On Tue, Apr 26, 2022 at 2:30 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Apr 26, 2022 at 12:07:39PM +0800, Jason Wang wrote:
> > > On Tue, Apr 26, 2022 at 11:55 AM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, Apr 25, 2022 at 11:53:24PM -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, Apr 26, 2022 at 11:42:45AM +0800, Jason Wang wrote:
> > > > > >
> > > > > > 在 2022/4/26 11:38, Michael S. Tsirkin 写道:
> > > > > > > On Mon, Apr 25, 2022 at 11:35:41PM -0400, Michael S. Tsirkin 
> > > > > > > wrote:
> > > > > > > > On Tue, Apr 26, 2022 at 04:29:11AM +0200, Halil Pasic wrote:
> > > > > > > > > On Mon, 25 Apr 2022 09:59:55 -0400
> > > > > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > > > >
> > > > > > > > > > On Mon, Apr 25, 2022 at 10:54:24AM +0200, Cornelia Huck 
> > > > > > > > > > wrote:
> > > > > > > > > > > On Mon, Apr 25 2022, "Michael S. Tsirkin" 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > > On Mon, Apr 25, 2022 at 10:44:15AM +0800, Jason Wang 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > This patch tries to implement the synchronize_cbs() 
> > > > > > > > > > > > > for ccw. For the
> > > > > > > > > > > > > vring_interrupt() that is called via 
> > > > > > > > > > > > > virtio_airq_handler(), the
> > > > > > > > > > > > > synchronization is simply done via the airq_info's 
> > > > > > > > > > > > > lock. For the
> > > > > > > > > > > > > vring_interrupt() that is called via 
> > > > > > > > > > > > > virtio_ccw_int_handler(), a per
> > > > > > > > > > > > > device spinlock for irq is introduced ans used in the 
> > > > > > > > > > > > > synchronization
> > > > > > > > > > > > > method.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cc: Thomas Gleixner 
> > > > > > > > > > > > > Cc: Peter Zijlstra 
> > > > > > > > > > > > > Cc: "Paul E. McKenney" 
> > > > > > > > > > > > > Cc: Marc Zyngier 
> > > > > > > > > > > > > Cc: Halil Pasic 
> > > > > > > > > > > > > Cc: Cornelia Huck 
> > > > > > > > > > > > > Signed-off-by: Jason Wang 
> > > > > > > > > > > >
> > > > > > > > > > > > This is the only one that is giving me pause. Halil, 
> > > > > > > > > > > > Cornelia,
> > > > > > > > > > > > should we be concerned about the performance impact 
> > > > > > > > > > > > here?
> > > > > > > > > > > > Any chance it can be tested?
> > > > > > > > > > > We can have a bunch of devices using the same airq 
> > > > > > > > > > > structure, and the
> > > > > > > > > > > sync cb creates a choke point, same as 
> > > > > > > > > > > registering/unregistering.
> > > > > > > > > > BTW can callbacks for multiple VQs run on multiple CPUs at 
> > > > > > > > > > the moment?
> > > > > > > > > I'm not sure I understand the question.
> > > > > > > > >
> > > > > > > > > I do think we can have multiple CPUs that are executing some 
> > > > > > > > > portion of
> > > > > > > > > virtio_ccw_int_handler(). So I guess the answer is yes. 
> > > > > > > > > Connie what do you think?
> > > > > > > > >
> > > > > > > > > On the other hand we could also end up serializing 
> > > > > > > > > synchro

Re: [PATCH V3 6/9] virtio-ccw: implement synchronize_cbs()

2022-04-26 Thread Michael S. Tsirkin
On Tue, Apr 26, 2022 at 05:47:17PM +0200, Cornelia Huck wrote:
> On Mon, Apr 25 2022, "Michael S. Tsirkin"  wrote:
> 
> > On Mon, Apr 25, 2022 at 11:53:24PM -0400, Michael S. Tsirkin wrote:
> >> On Tue, Apr 26, 2022 at 11:42:45AM +0800, Jason Wang wrote:
> >> > 
> >> > 在 2022/4/26 11:38, Michael S. Tsirkin 写道:
> >> > > On Mon, Apr 25, 2022 at 11:35:41PM -0400, Michael S. Tsirkin wrote:
> >> > > > On Tue, Apr 26, 2022 at 04:29:11AM +0200, Halil Pasic wrote:
> >> > > > > On Mon, 25 Apr 2022 09:59:55 -0400
> >> > > > > "Michael S. Tsirkin"  wrote:
> >> > > > > 
> >> > > > > > On Mon, Apr 25, 2022 at 10:54:24AM +0200, Cornelia Huck wrote:
> >> > > > > > > On Mon, Apr 25 2022, "Michael S. Tsirkin"  
> >> > > > > > > wrote:
> >> > > > > > > > On Mon, Apr 25, 2022 at 10:44:15AM +0800, Jason Wang wrote:
> >> > > > > > > > > This patch tries to implement the synchronize_cbs() for 
> >> > > > > > > > > ccw. For the
> >> > > > > > > > > vring_interrupt() that is called via 
> >> > > > > > > > > virtio_airq_handler(), the
> >> > > > > > > > > synchronization is simply done via the airq_info's lock. 
> >> > > > > > > > > For the
> >> > > > > > > > > vring_interrupt() that is called via 
> >> > > > > > > > > virtio_ccw_int_handler(), a per
> >> > > > > > > > > device spinlock for irq is introduced ans used in the 
> >> > > > > > > > > synchronization
> >> > > > > > > > > method.
> >> > > > > > > > > 
> >> > > > > > > > > Cc: Thomas Gleixner 
> >> > > > > > > > > Cc: Peter Zijlstra 
> >> > > > > > > > > Cc: "Paul E. McKenney" 
> >> > > > > > > > > Cc: Marc Zyngier 
> >> > > > > > > > > Cc: Halil Pasic 
> >> > > > > > > > > Cc: Cornelia Huck 
> >> > > > > > > > > Signed-off-by: Jason Wang 
> >> > > > > > > > 
> >> > > > > > > > This is the only one that is giving me pause. Halil, 
> >> > > > > > > > Cornelia,
> >> > > > > > > > should we be concerned about the performance impact here?
> >> > > > > > > > Any chance it can be tested?
> >> > > > > > > We can have a bunch of devices using the same airq structure, 
> >> > > > > > > and the
> >> > > > > > > sync cb creates a choke point, same as 
> >> > > > > > > registering/unregistering.
> >> > > > > > BTW can callbacks for multiple VQs run on multiple CPUs at the 
> >> > > > > > moment?
> >> > > > > I'm not sure I understand the question.
> >> > > > > 
> >> > > > > I do think we can have multiple CPUs that are executing some 
> >> > > > > portion of
> >> > > > > virtio_ccw_int_handler(). So I guess the answer is yes. Connie 
> >> > > > > what do you think?
> >> > > > > 
> >> > > > > On the other hand we could also end up serializing 
> >> > > > > synchronize_cbs()
> >> > > > > calls for different devices if they happen to use the same 
> >> > > > > airq_info. But
> >> > > > > this probably was not your question
> >> > > > 
> >> > > > I am less concerned about  synchronize_cbs being slow and more about
> >> > > > the slowdown in interrupt processing itself.
> >> > > > 
> >> > > > > > this patch serializes them on a spinlock.
> >> > > > > > 
> >> > > > > Those could then pile up on the newly introduced spinlock.
> 
> How bad would that be in practice? IIUC, we hit on the spinlock when
> - doing synchronize_cbs (should be rare)
> - processing queue interrupts for devices using per-device indicators
>   (which is the non-preferred path, which I would basically only expect
>   when running on an ancient or non-standard hypervisor)

this one is my concern. I am worried serializing everything on a single lock
will drastically regress performance here.


> - configuration change interrupts (should be rare)
> - during setup, reset, etc. (should not be a concern)
> 
> >> > > > > 
> >> > > > > Regards,
> >> > > > > Halil
> >> > > > Hmm yea ... not good.
> >> > > Is there any other way to synchronize with all callbacks?
> >> > 
> >> > 
> >> > Maybe using rwlock as airq handler?
> >> > 
> >> > Thanks
> >> > 
> >> 
> >> rwlock is still a shared cacheline bouncing between CPUs and
> >> a bunch of ordering instructions.
> >> Maybe something per-cpu + some IPIs to run things on all CPUs instead?
> >
> > ... and I think classic and device interrupts are different enough
> > here ...
> 
> You mean classic (per-device) and adapter interrupts, right?


Re: [PATCH v2] vduse: Fix NULL pointer dereference on sysfs access

2022-04-26 Thread Michael S. Tsirkin
On Tue, Apr 26, 2022 at 10:37:17PM +0800, Yongji Xie wrote:
> On Tue, Apr 26, 2022 at 10:18 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Apr 26, 2022 at 10:02:02PM +0800, Yongji Xie wrote:
> > > > This should not be needed, when your module is unloaded, all devices it
> > > > handled should be properly removed by it.
> > > >
> > >
> > > I see. But it's not easy to achieve that currently. Maybe we need
> > > something like DEVICE_NEEDS_RESET support in virtio core.
> >
> > Not sure what the connection is.
> >
> 
> If we want to forcibly remove all working vduse devices during module
> unload, we might need to send a DEVICE_NEEDS_RESET notification to the
> device driver so it can do some cleanup first, e.g., return errors for
> all inflight I/Os.
> 
> Thanks,
> Yongji

IMHO DEVICE_NEEDS_RESET won't help much with that; it's more for the
case where the device is still there but needs a reset to start working.

-- 
MST



Re: [PATCH v2] vduse: Fix NULL pointer dereference on sysfs access

2022-04-26 Thread Michael S. Tsirkin
On Tue, Apr 26, 2022 at 10:02:02PM +0800, Yongji Xie wrote:
> > This should not be needed, when your module is unloaded, all devices it
> > handled should be properly removed by it.
> >
> 
> I see. But it's not easy to achieve that currently. Maybe we need
> something like DEVICE_NEEDS_RESET support in virtio core.

Not sure what the connection is.

-- 
MST



Re: [PATCH v9 00/32] virtio pci support VIRTIO_F_RING_RESET (refactor vring)

2022-04-26 Thread Michael S. Tsirkin
On Wed, Apr 06, 2022 at 11:43:14AM +0800, Xuan Zhuo wrote:
> The virtio spec already supports the virtio queue reset function. This patch 
> set
> is to add this function to the kernel. The relevant virtio spec information is
> here:
> 
> https://github.com/oasis-tcs/virtio-spec/issues/124
> 
> Also regarding MMIO support for queue reset, I plan to support it after this
> patch is passed.

Regarding the spec, there's now an issue proposing
some changes to the interface. What do you think about that
proposal? Could you respond on that thread on the virtio TC mailing list?


> This patch set implements the refactoring of vring. Finally, the
> virtuque_resize() interface is provided based on the reset function of the
> transport layer.
> 
> Test environment:
> Host: 4.19.91
> Qemu: QEMU emulator version 6.2.50 (with vq reset support)
> Test Cmd:  ethtool -G eth1 rx $1 tx $2; ethtool -g eth1
> 
> The default is split mode, modify Qemu virtio-net to add PACKED feature 
> to test
> packed mode.
> 
> Qemu code:
> 
> https://github.com/fengidri/qemu/compare/89f3bfa3265554d1d591ee4d7f1197b6e3397e84...master
> 
> In order to simplify the review of this patch set, the function of reusing
> the old buffers after resize will be introduced in subsequent patch sets.
> 
> Please review. Thanks.
> 
> v9:
>   1. Provide a virtqueue_resize() interface directly
>   2. A patch set including vring resize, virtio pci reset, virtio-net resize
>   3. No more separate structs
> 
> v8:
>   1. Provide a virtqueue_reset() interface directly
>   2. Split the two patch sets, this is the first part
>   3. Add independent allocation helper for allocating state, extra
> 
> v7:
>   1. fix #6 subject typo
>   2. fix #6 ring_size_in_bytes is uninitialized
>   3. check by: make W=12
> 
> v6:
>   1. virtio_pci: use synchronize_irq(irq) to sync the irq callbacks
>   2. Introduce virtqueue_reset_vring() to implement the reset of vring during
>  the reset process. May use the old vring if num of the vq not change.
>   3. find_vqs() support sizes to special the max size of each vq
> 
> v5:
>   1. add virtio-net support set_ringparam
> 
> v4:
>   1. just the code of virtio, without virtio-net
>   2. Performing reset on a queue is divided into these steps:
> 1. reset_vq: reset one vq
> 2. recycle the buffer from vq by virtqueue_detach_unused_buf()
> 3. release the ring of the vq by vring_release_virtqueue()
> 4. enable_reset_vq: re-enable the reset queue
>   3. Simplify the parameters of enable_reset_vq()
>   4. add container structures for virtio_pci_common_cfg
> 
> v3:
>   1. keep vq, irq unreleased
> 
> Xuan Zhuo (32):
>   virtio: add helper virtqueue_get_vring_max_size()
>   virtio: struct virtio_config_ops add callbacks for queue_reset
>   virtio_ring: update the document of the virtqueue_detach_unused_buf
> for queue reset
>   virtio_ring: remove the arg vq of vring_alloc_desc_extra()
>   virtio_ring: extract the logic of freeing vring
>   virtio_ring: split: extract the logic of alloc queue
>   virtio_ring: split: extract the logic of alloc state and extra
>   virtio_ring: split: extract the logic of attach vring
>   virtio_ring: split: extract the logic of vq init
>   virtio_ring: split: introduce virtqueue_reinit_split()
>   virtio_ring: split: introduce virtqueue_resize_split()
>   virtio_ring: packed: extract the logic of alloc queue
>   virtio_ring: packed: extract the logic of alloc state and extra
>   virtio_ring: packed: extract the logic of attach vring
>   virtio_ring: packed: extract the logic of vq init
>   virtio_ring: packed: introduce virtqueue_reinit_packed()
>   virtio_ring: packed: introduce virtqueue_resize_packed()
>   virtio_ring: introduce virtqueue_resize()
>   virtio_pci: struct virtio_pci_common_cfg add queue_notify_data
>   virtio: queue_reset: add VIRTIO_F_RING_RESET
>   virtio_pci: queue_reset: update struct virtio_pci_common_cfg and
> option functions
>   virtio_pci: queue_reset: extract the logic of active vq for modern pci
>   virtio_pci: queue_reset: support VIRTIO_F_RING_RESET
>   virtio: find_vqs() add arg sizes
>   virtio_pci: support the arg sizes of find_vqs()
>   virtio_mmio: support the arg sizes of find_vqs()
>   virtio: add helper virtio_find_vqs_ctx_size()
>   virtio_net: set the default max ring size by find_vqs()
>   virtio_net: get ringparam by virtqueue_get_vring_max_size()
>   virtio_net: split free_unused_bufs()
>   virtio_net: support rx/tx queue resize
>   virtio_net: support set_ringparam
> 
>  arch/um/drivers/virtio_uml.c |   3 +-
>  drivers/net/virtio_net.c | 219 +++-
>  drivers/platform/mellanox/mlxbf-tmfifo.c |   3 +
>  drivers/remoteproc/remoteproc_virtio.c   |   3 +
>  drivers/s390/virtio/virtio_ccw.c |   4 +
>  drivers/virtio/virtio_mmio.c |  11 +-
>  drivers/virtio/virtio_pci_common.c   |  28 +-
>  drivers/virtio/virtio_pci_common.h   |   3 +-
>  

Re: [PATCH V3 6/9] virtio-ccw: implement synchronize_cbs()

2022-04-26 Thread Michael S. Tsirkin
On Tue, Apr 26, 2022 at 12:07:39PM +0800, Jason Wang wrote:
> On Tue, Apr 26, 2022 at 11:55 AM Michael S. Tsirkin  wrote:
> >
> > On Mon, Apr 25, 2022 at 11:53:24PM -0400, Michael S. Tsirkin wrote:
> > > On Tue, Apr 26, 2022 at 11:42:45AM +0800, Jason Wang wrote:
> > > >
> > > > 在 2022/4/26 11:38, Michael S. Tsirkin 写道:
> > > > > On Mon, Apr 25, 2022 at 11:35:41PM -0400, Michael S. Tsirkin wrote:
> > > > > > On Tue, Apr 26, 2022 at 04:29:11AM +0200, Halil Pasic wrote:
> > > > > > > On Mon, 25 Apr 2022 09:59:55 -0400
> > > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > >
> > > > > > > > On Mon, Apr 25, 2022 at 10:54:24AM +0200, Cornelia Huck wrote:
> > > > > > > > > On Mon, Apr 25 2022, "Michael S. Tsirkin"  
> > > > > > > > > wrote:
> > > > > > > > > > On Mon, Apr 25, 2022 at 10:44:15AM +0800, Jason Wang wrote:
> > > > > > > > > > > This patch tries to implement the synchronize_cbs() for 
> > > > > > > > > > > ccw. For the
> > > > > > > > > > > vring_interrupt() that is called via 
> > > > > > > > > > > virtio_airq_handler(), the
> > > > > > > > > > > synchronization is simply done via the airq_info's lock. 
> > > > > > > > > > > For the
> > > > > > > > > > > vring_interrupt() that is called via 
> > > > > > > > > > > virtio_ccw_int_handler(), a per
> > > > > > > > > > > device spinlock for irq is introduced ans used in the 
> > > > > > > > > > > synchronization
> > > > > > > > > > > method.
> > > > > > > > > > >
> > > > > > > > > > > Cc: Thomas Gleixner 
> > > > > > > > > > > Cc: Peter Zijlstra 
> > > > > > > > > > > Cc: "Paul E. McKenney" 
> > > > > > > > > > > Cc: Marc Zyngier 
> > > > > > > > > > > Cc: Halil Pasic 
> > > > > > > > > > > Cc: Cornelia Huck 
> > > > > > > > > > > Signed-off-by: Jason Wang 
> > > > > > > > > >
> > > > > > > > > > This is the only one that is giving me pause. Halil, 
> > > > > > > > > > Cornelia,
> > > > > > > > > > should we be concerned about the performance impact here?
> > > > > > > > > > Any chance it can be tested?
> > > > > > > > > We can have a bunch of devices using the same airq structure, 
> > > > > > > > > and the
> > > > > > > > > sync cb creates a choke point, same as 
> > > > > > > > > registering/unregistering.
> > > > > > > > BTW can callbacks for multiple VQs run on multiple CPUs at the 
> > > > > > > > moment?
> > > > > > > I'm not sure I understand the question.
> > > > > > >
> > > > > > > I do think we can have multiple CPUs that are executing some 
> > > > > > > portion of
> > > > > > > virtio_ccw_int_handler(). So I guess the answer is yes. Connie 
> > > > > > > what do you think?
> > > > > > >
> > > > > > > On the other hand we could also end up serializing 
> > > > > > > synchronize_cbs()
> > > > > > > calls for different devices if they happen to use the same 
> > > > > > > airq_info. But
> > > > > > > this probably was not your question
> > > > > >
> > > > > > I am less concerned about  synchronize_cbs being slow and more about
> > > > > > the slowdown in interrupt processing itself.
> > > > > >
> > > > > > > > this patch serializes them on a spinlock.
> > > > > > > >
> > > > > > > Those could then pile up on the newly introduced spinlock.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Halil
> > > > > > Hmm yea ... not good.
> > > > > Is there any other way to synchronize with all callbacks?
> > > >
> > > >
> > > > Maybe using rwlock as airq handler?
> > > >
> > > > Thanks
> > > >
> > >
> > > rwlock is still a shared cacheline bouncing between CPUs and
> > > a bunch of ordering instructions.
> 
> Yes, but it should be faster than spinlocks anyhow.
> 
> > > Maybe something per-cpu + some IPIs to run things on all CPUs instead?
> 
> Is this something like a customized version of synchronize_rcu_expedited()?

With interrupts running in an RCU read-side critical section?
Quite possibly that is also an option.
This will need a bunch of documentation, since this is not
a standard use of RCU, and we would probably need confirmation
from the RCU maintainers that whatever assumptions we make
are guaranteed to hold down the road.
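(To make the idea concrete - this is only a sketch with made-up placement,
assuming the handler body may legitimately be treated as a read-side
critical section:

    static void virtio_ccw_int_handler(struct ccw_device *cdev,
                                       unsigned long intparm,
                                       struct irb *irb)
    {
            rcu_read_lock();
            /* existing body: check activity, scan the indicators and
             * call vring_interrupt() for every pending virtqueue
             */
            rcu_read_unlock();
    }

    static void virtio_ccw_synchronize_cbs(struct virtio_device *vdev)
    {
            /* wait for all in-flight handlers to leave the read-side
             * critical section above
             */
            synchronize_rcu();
    }

whether doing this from hard interrupt context is a legitimate use of RCU
is exactly what would need the maintainers' confirmation.)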

> >
> > ... and I think classic and device interrupts are different enough
> > here ...
> 
> Yes.
> 
> Thanks
> 
> >
> > > > >
> > > > > > --
> > > > > > MST
> >

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V3 6/9] virtio-ccw: implement synchronize_cbs()

2022-04-25 Thread Michael S. Tsirkin
On Mon, Apr 25, 2022 at 11:53:24PM -0400, Michael S. Tsirkin wrote:
> On Tue, Apr 26, 2022 at 11:42:45AM +0800, Jason Wang wrote:
> > 
> > On 2022/4/26 11:38, Michael S. Tsirkin wrote:
> > > On Mon, Apr 25, 2022 at 11:35:41PM -0400, Michael S. Tsirkin wrote:
> > > > On Tue, Apr 26, 2022 at 04:29:11AM +0200, Halil Pasic wrote:
> > > > > On Mon, 25 Apr 2022 09:59:55 -0400
> > > > > "Michael S. Tsirkin"  wrote:
> > > > > 
> > > > > > On Mon, Apr 25, 2022 at 10:54:24AM +0200, Cornelia Huck wrote:
> > > > > > > On Mon, Apr 25 2022, "Michael S. Tsirkin"  wrote:
> > > > > > > > On Mon, Apr 25, 2022 at 10:44:15AM +0800, Jason Wang wrote:
> > > > > > > > > This patch tries to implement the synchronize_cbs() for ccw. 
> > > > > > > > > For the
> > > > > > > > > vring_interrupt() that is called via virtio_airq_handler(), 
> > > > > > > > > the
> > > > > > > > > synchronization is simply done via the airq_info's lock. For 
> > > > > > > > > the
> > > > > > > > > vring_interrupt() that is called via 
> > > > > > > > > virtio_ccw_int_handler(), a per
> > > > > > > > > device spinlock for irq is introduced ans used in the 
> > > > > > > > > synchronization
> > > > > > > > > method.
> > > > > > > > > 
> > > > > > > > > Cc: Thomas Gleixner 
> > > > > > > > > Cc: Peter Zijlstra 
> > > > > > > > > Cc: "Paul E. McKenney" 
> > > > > > > > > Cc: Marc Zyngier 
> > > > > > > > > Cc: Halil Pasic 
> > > > > > > > > Cc: Cornelia Huck 
> > > > > > > > > Signed-off-by: Jason Wang 
> > > > > > > > 
> > > > > > > > This is the only one that is giving me pause. Halil, Cornelia,
> > > > > > > > should we be concerned about the performance impact here?
> > > > > > > > Any chance it can be tested?
> > > > > > > We can have a bunch of devices using the same airq structure, and 
> > > > > > > the
> > > > > > > sync cb creates a choke point, same as registering/unregistering.
> > > > > > BTW can callbacks for multiple VQs run on multiple CPUs at the 
> > > > > > moment?
> > > > > I'm not sure I understand the question.
> > > > > 
> > > > > I do think we can have multiple CPUs that are executing some portion 
> > > > > of
> > > > > virtio_ccw_int_handler(). So I guess the answer is yes. Connie what 
> > > > > do you think?
> > > > > 
> > > > > On the other hand we could also end up serializing synchronize_cbs()
> > > > > calls for different devices if they happen to use the same airq_info. 
> > > > > But
> > > > > this probably was not your question
> > > > 
> > > > I am less concerned about  synchronize_cbs being slow and more about
> > > > the slowdown in interrupt processing itself.
> > > > 
> > > > > > this patch serializes them on a spinlock.
> > > > > > 
> > > > > Those could then pile up on the newly introduced spinlock.
> > > > > 
> > > > > Regards,
> > > > > Halil
> > > > Hmm yea ... not good.
> > > Is there any other way to synchronize with all callbacks?
> > 
> > 
> > Maybe using rwlock as airq handler?
> > 
> > Thanks
> > 
> 
> rwlock is still a shared cacheline bouncing between CPUs and
> a bunch of ordering instructions.
> Maybe something per-cpu + some IPIs to run things on all CPUs instead?

... and I think classic and device interrupts are different enough
here ...

> > > 
> > > > -- 
> > > > MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V3 6/9] virtio-ccw: implement synchronize_cbs()

2022-04-25 Thread Michael S. Tsirkin
On Tue, Apr 26, 2022 at 11:42:45AM +0800, Jason Wang wrote:
> 
> On 2022/4/26 11:38, Michael S. Tsirkin wrote:
> > On Mon, Apr 25, 2022 at 11:35:41PM -0400, Michael S. Tsirkin wrote:
> > > On Tue, Apr 26, 2022 at 04:29:11AM +0200, Halil Pasic wrote:
> > > > On Mon, 25 Apr 2022 09:59:55 -0400
> > > > "Michael S. Tsirkin"  wrote:
> > > > 
> > > > > On Mon, Apr 25, 2022 at 10:54:24AM +0200, Cornelia Huck wrote:
> > > > > > On Mon, Apr 25 2022, "Michael S. Tsirkin"  wrote:
> > > > > > > On Mon, Apr 25, 2022 at 10:44:15AM +0800, Jason Wang wrote:
> > > > > > > > This patch tries to implement the synchronize_cbs() for ccw. 
> > > > > > > > For the
> > > > > > > > vring_interrupt() that is called via virtio_airq_handler(), the
> > > > > > > > synchronization is simply done via the airq_info's lock. For the
> > > > > > > > vring_interrupt() that is called via virtio_ccw_int_handler(), 
> > > > > > > > a per
> > > > > > > > device spinlock for irq is introduced ans used in the 
> > > > > > > > synchronization
> > > > > > > > method.
> > > > > > > > 
> > > > > > > > Cc: Thomas Gleixner 
> > > > > > > > Cc: Peter Zijlstra 
> > > > > > > > Cc: "Paul E. McKenney" 
> > > > > > > > Cc: Marc Zyngier 
> > > > > > > > Cc: Halil Pasic 
> > > > > > > > Cc: Cornelia Huck 
> > > > > > > > Signed-off-by: Jason Wang 
> > > > > > > 
> > > > > > > This is the only one that is giving me pause. Halil, Cornelia,
> > > > > > > should we be concerned about the performance impact here?
> > > > > > > Any chance it can be tested?
> > > > > > We can have a bunch of devices using the same airq structure, and 
> > > > > > the
> > > > > > sync cb creates a choke point, same as registering/unregistering.
> > > > > BTW can callbacks for multiple VQs run on multiple CPUs at the moment?
> > > > I'm not sure I understand the question.
> > > > 
> > > > I do think we can have multiple CPUs that are executing some portion of
> > > > virtio_ccw_int_handler(). So I guess the answer is yes. Connie what do 
> > > > you think?
> > > > 
> > > > On the other hand we could also end up serializing synchronize_cbs()
> > > > calls for different devices if they happen to use the same airq_info. 
> > > > But
> > > > this probably was not your question
> > > 
> > > I am less concerned about  synchronize_cbs being slow and more about
> > > the slowdown in interrupt processing itself.
> > > 
> > > > > this patch serializes them on a spinlock.
> > > > > 
> > > > Those could then pile up on the newly introduced spinlock.
> > > > 
> > > > Regards,
> > > > Halil
> > > Hmm yea ... not good.
> > Is there any other way to synchronize with all callbacks?
> 
> 
> Maybe using rwlock as airq handler?
> 
> Thanks
> 

rwlock is still a shared cacheline bouncing between CPUs and
a bunch of ordering instructions.
Maybe something per-cpu + some IPIs to run things on all CPUs instead?
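Very roughly, and only as a sketch (it assumes vring_interrupt() always
runs with interrupts disabled on the local CPU, so a pending IPI can only
be serviced once the handler has returned):

    static void synchronize_cbs_ipi(void *unused)
    {
            /* nothing to do - we only care that every CPU ran this */
    }

    static void virtio_ccw_synchronize_cbs(struct virtio_device *vdev)
    {
            /*
             * Once every CPU has executed the empty function, any
             * handler that was running when we started has finished.
             * Nothing extra is taken in the hot interrupt path.
             */
            on_each_cpu(synchronize_cbs_ipi, NULL, 1);
    }

i.e. the cost moves from every interrupt to the (rare) synchronization
call.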

> > 
> > > -- 
> > > MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V3 6/9] virtio-ccw: implement synchronize_cbs()

2022-04-25 Thread Michael S. Tsirkin
On Mon, Apr 25, 2022 at 11:35:41PM -0400, Michael S. Tsirkin wrote:
> On Tue, Apr 26, 2022 at 04:29:11AM +0200, Halil Pasic wrote:
> > On Mon, 25 Apr 2022 09:59:55 -0400
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Mon, Apr 25, 2022 at 10:54:24AM +0200, Cornelia Huck wrote:
> > > > On Mon, Apr 25 2022, "Michael S. Tsirkin"  wrote:
> > > >   
> > > > > On Mon, Apr 25, 2022 at 10:44:15AM +0800, Jason Wang wrote:  
> > > > >> This patch tries to implement the synchronize_cbs() for ccw. For the
> > > > >> vring_interrupt() that is called via virtio_airq_handler(), the
> > > > >> synchronization is simply done via the airq_info's lock. For the
> > > > >> vring_interrupt() that is called via virtio_ccw_int_handler(), a per
> > > > >> device spinlock for irq is introduced and used in the synchronization
> > > > >> method.
> > > > >> 
> > > > >> Cc: Thomas Gleixner 
> > > > >> Cc: Peter Zijlstra 
> > > > >> Cc: "Paul E. McKenney" 
> > > > >> Cc: Marc Zyngier 
> > > > >> Cc: Halil Pasic 
> > > > >> Cc: Cornelia Huck 
> > > > >> Signed-off-by: Jason Wang   
> > > > >
> > > > >
> > > > > This is the only one that is giving me pause. Halil, Cornelia,
> > > > > should we be concerned about the performance impact here?
> > > > > Any chance it can be tested?  
> > > > 
> > > > We can have a bunch of devices using the same airq structure, and the
> > > > sync cb creates a choke point, same as registering/unregistering.  
> > > 
> > > BTW can callbacks for multiple VQs run on multiple CPUs at the moment?
> > 
> > I'm not sure I understand the question.
> > 
> > I do think we can have multiple CPUs that are executing some portion of
> > virtio_ccw_int_handler(). So I guess the answer is yes. Connie what do you 
> > think?
> > 
> > On the other hand we could also end up serializing synchronize_cbs()
> > calls for different devices if they happen to use the same airq_info. But
> > this probably was not your question
> 
> 
> I am less concerned about  synchronize_cbs being slow and more about
> the slowdown in interrupt processing itself.
> 
> > > this patch serializes them on a spinlock.
> > >
> > 
> > Those could then pile up on the newly introduced spinlock.
> > 
> > Regards,
> > Halil
> 
> Hmm yea ... not good.

Is there any other way to synchronize with all callbacks?

> -- 
> MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V3 6/9] virtio-ccw: implement synchronize_cbs()

2022-04-25 Thread Michael S. Tsirkin
On Tue, Apr 26, 2022 at 04:29:11AM +0200, Halil Pasic wrote:
> On Mon, 25 Apr 2022 09:59:55 -0400
> "Michael S. Tsirkin"  wrote:
> 
> > On Mon, Apr 25, 2022 at 10:54:24AM +0200, Cornelia Huck wrote:
> > > On Mon, Apr 25 2022, "Michael S. Tsirkin"  wrote:
> > >   
> > > > On Mon, Apr 25, 2022 at 10:44:15AM +0800, Jason Wang wrote:  
> > > >> This patch tries to implement the synchronize_cbs() for ccw. For the
> > > >> vring_interrupt() that is called via virtio_airq_handler(), the
> > > >> synchronization is simply done via the airq_info's lock. For the
> > > >> vring_interrupt() that is called via virtio_ccw_int_handler(), a per
> > > >> device spinlock for irq is introduced and used in the synchronization
> > > >> method.
> > > >> 
> > > >> Cc: Thomas Gleixner 
> > > >> Cc: Peter Zijlstra 
> > > >> Cc: "Paul E. McKenney" 
> > > >> Cc: Marc Zyngier 
> > > >> Cc: Halil Pasic 
> > > >> Cc: Cornelia Huck 
> > > >> Signed-off-by: Jason Wang   
> > > >
> > > >
> > > > This is the only one that is giving me pause. Halil, Cornelia,
> > > > should we be concerned about the performance impact here?
> > > > Any chance it can be tested?  
> > > 
> > > We can have a bunch of devices using the same airq structure, and the
> > > sync cb creates a choke point, same as registering/unregistering.  
> > 
> > BTW can callbacks for multiple VQs run on multiple CPUs at the moment?
> 
> I'm not sure I understand the question.
> 
> I do think we can have multiple CPUs that are executing some portion of
> virtio_ccw_int_handler(). So I guess the answer is yes. Connie what do you 
> think?
> 
> On the other hand we could also end up serializing synchronize_cbs()
> calls for different devices if they happen to use the same airq_info. But
> this probably was not your question


I am less concerned about  synchronize_cbs being slow and more about
the slowdown in interrupt processing itself.

> > this patch serializes them on a spinlock.
> >
> 
> Those could then pile up on the newly introduced spinlock.
> 
> Regards,
> Halil

Hmm yea ... not good.

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V3 7/9] virtio: allow to unbreak virtqueue

2022-04-25 Thread Michael S. Tsirkin
On Mon, Apr 25, 2022 at 02:44:06PM +0200, Cornelia Huck wrote:
> On Mon, Apr 25 2022, Jason Wang  wrote:
> 
> > This patch allows the virtio_break_device() to accept a boolean value
> > then we can unbreak the virtqueue.
> >
> > Signed-off-by: Jason Wang 
> > ---
> >  drivers/char/virtio_console.c  | 2 +-
> >  drivers/crypto/virtio/virtio_crypto_core.c | 2 +-
> >  drivers/s390/virtio/virtio_ccw.c   | 4 ++--
> >  drivers/virtio/virtio_pci_common.c | 2 +-
> >  drivers/virtio/virtio_ring.c   | 4 ++--
> >  include/linux/virtio.h | 2 +-
> >  6 files changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > index e3c430539a17..afede977f7b3 100644
> > --- a/drivers/char/virtio_console.c
> > +++ b/drivers/char/virtio_console.c
> > @@ -1958,7 +1958,7 @@ static void virtcons_remove(struct virtio_device 
> > *vdev)
> > spin_unlock_irq(_lock);
> >  
> > /* Device is going away, exit any polling for buffers */
> > -   virtio_break_device(vdev);
> > +   virtio_break_device(vdev, true);
> > if (use_multiport(portdev))
> > flush_work(>control_work);
> > else
> > diff --git a/drivers/crypto/virtio/virtio_crypto_core.c 
> > b/drivers/crypto/virtio/virtio_crypto_core.c
> > index c6f482db0bc0..fd17f3f2e958 100644
> > --- a/drivers/crypto/virtio/virtio_crypto_core.c
> > +++ b/drivers/crypto/virtio/virtio_crypto_core.c
> > @@ -215,7 +215,7 @@ static int virtcrypto_update_status(struct 
> > virtio_crypto *vcrypto)
> > dev_warn(>vdev->dev,
> > "Unknown status bits: 0x%x\n", status);
> >  
> > -   virtio_break_device(vcrypto->vdev);
> > +   virtio_break_device(vcrypto->vdev, true);
> > return -EPERM;
> > }
> >  
> > diff --git a/drivers/s390/virtio/virtio_ccw.c 
> > b/drivers/s390/virtio/virtio_ccw.c
> > index c19f07a82d62..9a963f5af5b5 100644
> > --- a/drivers/s390/virtio/virtio_ccw.c
> > +++ b/drivers/s390/virtio/virtio_ccw.c
> > @@ -1211,7 +1211,7 @@ static void virtio_ccw_remove(struct ccw_device *cdev)
> >  
> > if (vcdev && cdev->online) {
> > if (vcdev->device_lost)
> > -   virtio_break_device(>vdev);
> > +   virtio_break_device(>vdev, true);
> > unregister_virtio_device(>vdev);
> > spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
> > dev_set_drvdata(>dev, NULL);
> > @@ -1228,7 +1228,7 @@ static int virtio_ccw_offline(struct ccw_device *cdev)
> > if (!vcdev)
> > return 0;
> > if (vcdev->device_lost)
> > -   virtio_break_device(>vdev);
> > +   virtio_break_device(>vdev, true);
> > unregister_virtio_device(>vdev);
> > spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
> > dev_set_drvdata(>dev, NULL);
> > diff --git a/drivers/virtio/virtio_pci_common.c 
> > b/drivers/virtio/virtio_pci_common.c
> > index d724f676608b..39a711ddff30 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -583,7 +583,7 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> >  * layers can abort any ongoing operation.
> >  */
> > if (!pci_device_is_present(pci_dev))
> > -   virtio_break_device(_dev->vdev);
> > +   virtio_break_device(_dev->vdev, true);
> >  
> > pci_disable_sriov(pci_dev);
> >  
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index cfb028ca238e..6da13495a70c 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -2382,7 +2382,7 @@ EXPORT_SYMBOL_GPL(virtqueue_is_broken);
> >   * This should prevent the device from being used, allowing drivers to
> >   * recover.  You may need to grab appropriate locks to flush.
> >   */
> > -void virtio_break_device(struct virtio_device *dev)
> > +void virtio_break_device(struct virtio_device *dev, bool broken)
> 
> I think we need to be careful to say when it is safe to unset 'broken'.
> 
> The current callers set all queues to broken in case of surprise removal
> (ccw, pci), removal (console), or the device behaving badly
> (crypto). There's also code setting individual queues to broken. We do
> not want to undo any of these, unless the device has gone through a
> reset in the meanwhile. Maybe add:
> 
> "It is only safe to call this function to *remove* the broken flag for a
> device that is (re)transitioning to becoming usable; calling it that way
> during normal usage may have unpredictable consequences."
> 
> (Not sure how to word this; especially if we consider future usage of
> queue reset.)


Right. I would prefer __virtio_unbreak_device or something similar,
with a big comment explaining that it's only safe to call during probe.
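Something along these lines, mirroring virtio_break_device() in
virtio_ring.c (just a sketch, name and comment wording to taste):

    /*
     * __virtio_unbreak_device - undo virtio_break_device()
     *
     * This must only be used while the device is being set up (e.g.
     * during probe/restore, before DRIVER_OK), when no callbacks can
     * be running; calling it on a live device is a bug.
     */
    void __virtio_unbreak_device(struct virtio_device *dev)
    {
            struct virtqueue *_vq;

            spin_lock(&dev->vqs_list_lock);
            list_for_each_entry(_vq, &dev->vqs, list) {
                    struct vring_virtqueue *vq = to_vvq(_vq);

                    /* Pairs with READ_ONCE() in virtqueue_is_broken(). */
                    WRITE_ONCE(vq->broken, false);
            }
            spin_unlock(&dev->vqs_list_lock);
    }

That also avoids touching every existing virtio_break_device() caller
just to pass "true".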

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org

Re: [PATCH V3 6/9] virtio-ccw: implement synchronize_cbs()

2022-04-25 Thread Michael S. Tsirkin
On Mon, Apr 25, 2022 at 10:54:24AM +0200, Cornelia Huck wrote:
> On Mon, Apr 25 2022, "Michael S. Tsirkin"  wrote:
> 
> > On Mon, Apr 25, 2022 at 10:44:15AM +0800, Jason Wang wrote:
> >> This patch tries to implement the synchronize_cbs() for ccw. For the
> >> vring_interrupt() that is called via virtio_airq_handler(), the
> >> synchronization is simply done via the airq_info's lock. For the
> >> vring_interrupt() that is called via virtio_ccw_int_handler(), a per
> >> device spinlock for irq is introduced and used in the synchronization
> >> method.
> >> 
> >> Cc: Thomas Gleixner 
> >> Cc: Peter Zijlstra 
> >> Cc: "Paul E. McKenney" 
> >> Cc: Marc Zyngier 
> >> Cc: Halil Pasic 
> >> Cc: Cornelia Huck 
> >> Signed-off-by: Jason Wang 
> >
> >
> > This is the only one that is giving me pause. Halil, Cornelia,
> > should we be concerned about the performance impact here?
> > Any chance it can be tested?
> 
> We can have a bunch of devices using the same airq structure, and the
> sync cb creates a choke point, same as registering/unregistering.

BTW, can callbacks for multiple VQs run on multiple CPUs at the moment?
This patch serializes them on a spinlock.

> If
> invoking the sync cb is a rare operation (same as (un)registering), it
> should not affect interrupt processing for other devices too much, but
> it really should be rare.
> 
> For testing, you would probably want to use a setup with many devices
> that share the same airq area (you can fit a lot of devices if they have
> few queues), generate traffic on the queues, and then do something that
> triggers the callback (adding/removing a new device in a loop?)
> 
> I currently don't have such a setup handy; Halil, would you be able to
> test that?
> 
> >
> >> ---
> >>  drivers/s390/virtio/virtio_ccw.c | 27 +++
> >>  1 file changed, 27 insertions(+)
> >> 
> >> diff --git a/drivers/s390/virtio/virtio_ccw.c 
> >> b/drivers/s390/virtio/virtio_ccw.c
> >> index d35e7a3f7067..c19f07a82d62 100644
> >> --- a/drivers/s390/virtio/virtio_ccw.c
> >> +++ b/drivers/s390/virtio/virtio_ccw.c
> >> @@ -62,6 +62,7 @@ struct virtio_ccw_device {
> >>unsigned int revision; /* Transport revision */
> >>wait_queue_head_t wait_q;
> >>spinlock_t lock;
> >> +  spinlock_t irq_lock;
> >>struct mutex io_lock; /* Serializes I/O requests */
> >>struct list_head virtqueues;
> >>bool is_thinint;
> >> @@ -984,6 +985,27 @@ static const char *virtio_ccw_bus_name(struct 
> >> virtio_device *vdev)
> >>return dev_name(>cdev->dev);
> >>  }
> >>  
> >> +static void virtio_ccw_synchronize_cbs(struct virtio_device *vdev)
> >> +{
> >> +  struct virtio_ccw_device *vcdev = to_vc_device(vdev);
> >> +  struct airq_info *info = vcdev->airq_info;
> >> +
> >> +  /*
> >> +   * Synchronize with the vring_interrupt() called by
> >> +   * virtio_ccw_int_handler().
> >> +   */
> >> +  spin_lock(>irq_lock);
> >> +  spin_unlock(>irq_lock);
> >> +
> >> +  if (info) {
> >> +  /*
> >> +   * Synchronize with the vring_interrupt() with airq indicator
> >> +   */
> >> +  write_lock(>lock);
> >> +  write_unlock(>lock);
> >> +  }
> 
> I think we can make this an either/or operation (devices will either use
> classic interrupts or adapter interrupts)?
> 
> >> +}
> >> +
> >>  static const struct virtio_config_ops virtio_ccw_config_ops = {
> >>.get_features = virtio_ccw_get_features,
> >>.finalize_features = virtio_ccw_finalize_features,

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net v3] virtio_net: fix wrong buf address calculation when using xdp

2022-04-25 Thread Michael S. Tsirkin
  [   41.322193]  skb_release_data+0x13f/0x1c0
>  [   41.322902]  __kfree_skb+0x20/0x30
>  [   41.343870]  tcp_recvmsg_locked+0x671/0x880
>  [   41.363764]  tcp_recvmsg+0x5e/0x1c0
>  [   41.384102]  inet_recvmsg+0x42/0x100
>  [   41.406783]  ? sock_recvmsg+0x1d/0x70
>  [   41.428201]  sock_read_iter+0x84/0xd0
>  [   41.445592]  ? 0xa300
>  [   41.462442]  new_sync_read+0x148/0x160
>  [   41.479314]  ? 0xa300
>  [   41.496937]  vfs_read+0x138/0x190
>  [   41.517198]  ksys_read+0x87/0xc0
>  [   41.535336]  do_syscall_64+0x3b/0x90
>  [   41.551637]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>  [   41.568050] RIP: 0033:0x48765b
>  [   41.583955] Code: e8 4a 35 fe ff eb 88 cc cc cc cc cc cc cc cc e8 fb 7a 
> fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 
> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
>  [   41.632818] RSP: 002b:00c000a2f5b8 EFLAGS: 0212 ORIG_RAX: 
> 
>  [   41.664588] RAX: ffda RBX: 00c62000 RCX: 
> 0048765b
>  [   41.681205] RDX: 5e54 RSI: 00c000e66000 RDI: 
> 0016
>  [   41.697164] RBP: 00c000a2f608 R08: 0001 R09: 
> 01b4
>  [   41.713034] R10: 00b6 R11: 0212 R12: 
> 00e9
>  [   41.728755] R13: 0001 R14: 00c000a92000 R15: 
> 
>  [   41.744254]  
>  [   41.758585] Modules linked in: br_netfilter bridge veth netconsole 
> virtio_net
> 
>  and
> 
>  [   33.524802] BUG: Bad page state in process systemd-network  pfn:11e60
>  [   33.528617] page e05dc0147b00 e05dc04e7a00 8ae9851ec000 (1) 
> len 82 offset 252 metasize 4 hroom 0 hdr_len 12 data 8ae9851ec10c 
> data_meta 8ae9851ec108 data_end 8ae9851ec14e
>  [   33.529764] page:3792b5ba refcount:0 mapcount:-512 
> mapping: index:0x0 pfn:0x11e60
>  [   33.532463] flags: 0xfc000(node=0|zone=1|lastcpupid=0x1f)
>  [   33.532468] raw: 000fc000  dead0122 
> 
>  [   33.532470] raw:   fdff 
> 
>  [   33.532471] page dumped because: nonzero mapcount
>  [   33.532472] Modules linked in: br_netfilter bridge veth netconsole 
> virtio_net
>  [   33.532479] CPU: 0 PID: 791 Comm: systemd-network Kdump: loaded Not 
> tainted 5.18.0-rc1+ #37
>  [   33.532482] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> 1.15.0-1.fc35 04/01/2014
>  [   33.532484] Call Trace:
>  [   33.532496]  
>  [   33.532500]  dump_stack_lvl+0x45/0x5a
>  [   33.532506]  bad_page.cold+0x63/0x94
>  [   33.532510]  free_pcp_prepare+0x290/0x420
>  [   33.532515]  free_unref_page+0x1b/0x100
>  [   33.532518]  skb_release_data+0x13f/0x1c0
>  [   33.532524]  kfree_skb_reason+0x3e/0xc0
>  [   33.532527]  ip6_mc_input+0x23c/0x2b0
>  [   33.532531]  ip6_sublist_rcv_finish+0x83/0x90
>  [   33.532534]  ip6_sublist_rcv+0x22b/0x2b0
> 
> [3] XDP program to reproduce(xdp_pass.c):
>  #include 
>  #include 
> 
>  SEC("xdp_pass")
>  int xdp_pkt_pass(struct xdp_md *ctx)
>  {
>   bpf_xdp_adjust_head(ctx, -(int)32);
>   return XDP_PASS;
>  }
> 
>  char _license[] SEC("license") = "GPL";
> 
>  compile: clang -O2 -g -Wall -target bpf -c xdp_pass.c -o xdp_pass.o
>  load on virtio_net: ip link set enp1s0 xdpdrv obj xdp_pass.o sec xdp_pass
> 
> CC: sta...@vger.kernel.org
> CC: Jason Wang 
> CC: Xuan Zhuo 
> CC: Daniel Borkmann 
> CC: "Michael S. Tsirkin" 
> CC: virtualization@lists.linux-foundation.org
> Fixes: 8fb7da9e9907 ("virtio_net: get build_skb() buf by data ptr")
> Signed-off-by: Nikolay Aleksandrov 

Acked-by: Michael S. Tsirkin 
> ---
> v3: Add a comment explaining why offset and headroom are equal,
> no code changes
> v2: Recalculate headroom based on data, data_hard_start and data_meta
> 
>  drivers/net/virtio_net.c | 20 +++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 87838cbe38cf..cbba9d2e8f32 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1005,6 +1005,24 @@ static struct sk_buff *receive_mergeable(struct 
> net_device *dev,
>* xdp.data_meta were adjusted
>*/
>   len = xdp.data_end - xdp.data + vi->hdr_len + metasize;
> +
> + /* recalculate headroom if xdp.data or xdp_data_meta
> +  * were adjusted, note that offset should always point
> +  

Re: [PATCH V3 6/9] virtio-ccw: implement synchronize_cbs()

2022-04-25 Thread Michael S. Tsirkin
On Mon, Apr 25, 2022 at 10:44:15AM +0800, Jason Wang wrote:
> This patch tries to implement the synchronize_cbs() for ccw. For the
> vring_interrupt() that is called via virtio_airq_handler(), the
> synchronization is simply done via the airq_info's lock. For the
> vring_interrupt() that is called via virtio_ccw_int_handler(), a per
> device spinlock for irq is introduced and used in the synchronization
> method.
> 
> Cc: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: "Paul E. McKenney" 
> Cc: Marc Zyngier 
> Cc: Halil Pasic 
> Cc: Cornelia Huck 
> Signed-off-by: Jason Wang 


This is the only one that is giving me pause. Halil, Cornelia,
should we be concerned about the performance impact here?
Any chance it can be tested?

> ---
>  drivers/s390/virtio/virtio_ccw.c | 27 +++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/drivers/s390/virtio/virtio_ccw.c 
> b/drivers/s390/virtio/virtio_ccw.c
> index d35e7a3f7067..c19f07a82d62 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -62,6 +62,7 @@ struct virtio_ccw_device {
>   unsigned int revision; /* Transport revision */
>   wait_queue_head_t wait_q;
>   spinlock_t lock;
> + spinlock_t irq_lock;
>   struct mutex io_lock; /* Serializes I/O requests */
>   struct list_head virtqueues;
>   bool is_thinint;
> @@ -984,6 +985,27 @@ static const char *virtio_ccw_bus_name(struct 
> virtio_device *vdev)
>   return dev_name(>cdev->dev);
>  }
>  
> +static void virtio_ccw_synchronize_cbs(struct virtio_device *vdev)
> +{
> + struct virtio_ccw_device *vcdev = to_vc_device(vdev);
> + struct airq_info *info = vcdev->airq_info;
> +
> + /*
> +  * Synchronize with the vring_interrupt() called by
> +  * virtio_ccw_int_handler().
> +  */
> + spin_lock(>irq_lock);
> + spin_unlock(>irq_lock);
> +
> + if (info) {
> + /*
> +  * Synchronize with the vring_interrupt() with airq indicator
> +  */
> + write_lock(>lock);
> + write_unlock(>lock);
> + }
> +}
> +
>  static const struct virtio_config_ops virtio_ccw_config_ops = {
>   .get_features = virtio_ccw_get_features,
>   .finalize_features = virtio_ccw_finalize_features,
> @@ -995,6 +1017,7 @@ static const struct virtio_config_ops 
> virtio_ccw_config_ops = {
>   .find_vqs = virtio_ccw_find_vqs,
>   .del_vqs = virtio_ccw_del_vqs,
>   .bus_name = virtio_ccw_bus_name,
> + .synchronize_cbs = virtio_ccw_synchronize_cbs,
>  };
>  
>  
> @@ -1079,6 +1102,7 @@ static void virtio_ccw_int_handler(struct ccw_device 
> *cdev,
>  {
>   __u32 activity = intparm & VIRTIO_CCW_INTPARM_MASK;
>   struct virtio_ccw_device *vcdev = dev_get_drvdata(>dev);
> + unsigned long flags;
>   int i;
>   struct virtqueue *vq;
>  
> @@ -1106,6 +1130,7 @@ static void virtio_ccw_int_handler(struct ccw_device 
> *cdev,
>   vcdev->err = -EIO;
>   }
>   virtio_ccw_check_activity(vcdev, activity);
> + spin_lock_irqsave(>irq_lock, flags);
>   for_each_set_bit(i, indicators(vcdev),
>sizeof(*indicators(vcdev)) * BITS_PER_BYTE) {
>   /* The bit clear must happen before the vring kick. */
> @@ -1114,6 +1139,7 @@ static void virtio_ccw_int_handler(struct ccw_device 
> *cdev,
>   vq = virtio_ccw_vq_by_ind(vcdev, i);
>   vring_interrupt(0, vq);
>   }
> + spin_unlock_irqrestore(>irq_lock, flags);
>   if (test_bit(0, indicators2(vcdev))) {
>   virtio_config_changed(>vdev);
>   clear_bit(0, indicators2(vcdev));
> @@ -1284,6 +1310,7 @@ static int virtio_ccw_online(struct ccw_device *cdev)
>   init_waitqueue_head(>wait_q);
>   INIT_LIST_HEAD(>virtqueues);
>   spin_lock_init(>lock);
> + spin_lock_init(>irq_lock);
>   mutex_init(>io_lock);
>  
>   spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
> -- 
> 2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: virtio-net: Unpermitted usage of virtqueue before virtio driver initialization

2022-04-20 Thread Michael S. Tsirkin
On Wed, Apr 20, 2022 at 08:57:18PM +0200, Maciej Szymański wrote:
> On 20.04.2022 19:54, Michael S. Tsirkin wrote:
> > On Wed, Apr 20, 2022 at 04:58:51PM +0200, Maciej Szymański wrote:
> > > On 20.04.2022 13:10, Michael S. Tsirkin wrote:
> > > > On Wed, Apr 20, 2022 at 10:17:27AM +0200, Maciej Szymański wrote:
> > > > > > > > > > Hmm so we have this:
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > >if ((dev->features ^ features) & NETIF_F_GRO_HW) 
> > > > > > > > > > {
> > > > > > > > > >if (vi->xdp_enabled)
> > > > > > > > > >return -EBUSY;
> > > > > > > > > > 
> > > > > > > > > >if (features & NETIF_F_GRO_HW)
> > > > > > > > > >offloads = 
> > > > > > > > > > vi->guest_offloads_capable;
> > > > > > > > > >else
> > > > > > > > > >offloads = 
> > > > > > > > > > vi->guest_offloads_capable &
> > > > > > > > > > ~GUEST_OFFLOAD_GRO_HW_MASK;
> > > > > > > > > > 
> > > > > > > > > >err = virtnet_set_guest_offloads(vi, 
> > > > > > > > > > offloads);
> > > > > > > > > >if (err)
> > > > > > > > > >return err;
> > > > > > > > > >vi->guest_offloads = offloads;
> > > > > > > > > >}
> > > > > > > > > > 
> > > > > > > > > > which I guess should have prevented 
> > > > > > > > > > virtnet_set_guest_offloads
> > > > > > > > > > from ever running.
> > > > > > > > > > 
> > > > > > > > > >From your description it sounds like you have observed 
> > > > > > > > > > this
> > > > > > > > > > in practice, right?
> > > > > > > > > > 
> > > > > > > Yes. I have proprietary virtio-net device which advertises 
> > > > > > > following
> > > > > > > guest offload features :
> > > > > > > - VIRTIO_NET_F_GUEST_CSUM
> > > > > > > - VIRTIO_NET_F_GUEST_TSO4
> > > > > > > - VIRTIO_NET_F_GUEST_TSO6
> > > > > > > - VIRTIO_NET_F_GUEST_UFO
> > > > > > > 
> > > > > > > This feature set passes the condition in virtnet_set_features.
> > > > So why isn't dev->features equal to features?
> > > > 
> > > I just double verified and found that my device advertises
> > > VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6 but not
> > > VIRTIO_NET_F_GUEST_CSUM as mentioned before.
> > So, your device is out of spec:
> > 
> > VIRTIO_NET_F_GUEST_TSO4 Requires VIRTIO_NET_F_GUEST_CSUM.
> > 
> > And
> > 
> > The device MUST NOT offer a feature which requires another feature which 
> > was not offered.
> > 
> > 
> > Is this a production device? Can it be fixed?
> The problem seems to be more complicated. In fact
> VIRTIO_NET_F_GUEST_CSUM is offered by our device, but during feature
> negotiation it is being dropped.
> This most likely does not happen when we use MMIO, but for some reason
> happens in QEMU for VHOST_USER + PCI.
> I need to investigate this more deeply...


I don't see where Linux would drop it. I suspect it's dropped between
QEMU and vhost-user. I'd say let's fix it in the device first.
We can next consider marking vqs broken before the device is ready -
Jason, what do you think?
Finally, we can add code to avoid acking dependent features
if the feature they depend on has not been negotiated - doing
so is also a spec violation after all.
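Roughly something like this in virtnet_validate(), next to the existing
check that clears VIRTIO_NET_F_MTU (only a sketch):

    if (!virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM)) {
            /* The device is out of spec here: these offloads all
             * require GUEST_CSUM, so refuse to ack them.
             */
            __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
            __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
            __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_ECN);
            __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_UFO);
    }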


> 
> > 
> > > That leads to the following situation :
> > > 
> > > in virtio_probe :
> > > 
> > >if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> > >  dev->features |= NETIF_F_RXCSUM;
> > >if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > &

Re: virtio-net: Unpermitted usage of virtqueue before virtio driver initialization

2022-04-20 Thread Michael S. Tsirkin
On Wed, Apr 20, 2022 at 04:58:51PM +0200, Maciej Szymański wrote:
> On 20.04.2022 13:10, Michael S. Tsirkin wrote:
> > On Wed, Apr 20, 2022 at 10:17:27AM +0200, Maciej Szymański wrote:
> > > > > > > > Hmm so we have this:
> > > > > > > > 
> > > > > > > > 
> > > > > > > >   if ((dev->features ^ features) & NETIF_F_GRO_HW) {
> > > > > > > >   if (vi->xdp_enabled)
> > > > > > > >   return -EBUSY;
> > > > > > > > 
> > > > > > > >   if (features & NETIF_F_GRO_HW)
> > > > > > > >   offloads = vi->guest_offloads_capable;
> > > > > > > >   else
> > > > > > > >   offloads = vi->guest_offloads_capable 
> > > > > > > > &
> > > > > > > > ~GUEST_OFFLOAD_GRO_HW_MASK;
> > > > > > > > 
> > > > > > > >   err = virtnet_set_guest_offloads(vi, 
> > > > > > > > offloads);
> > > > > > > >   if (err)
> > > > > > > >   return err;
> > > > > > > >   vi->guest_offloads = offloads;
> > > > > > > >   }
> > > > > > > > 
> > > > > > > > which I guess should have prevented virtnet_set_guest_offloads
> > > > > > > > from ever running.
> > > > > > > > 
> > > > > > > >   From your description it sounds like you have observed this
> > > > > > > > in practice, right?
> > > > > > > > 
> > > > > Yes. I have proprietary virtio-net device which advertises following
> > > > > guest offload features :
> > > > > - VIRTIO_NET_F_GUEST_CSUM
> > > > > - VIRTIO_NET_F_GUEST_TSO4
> > > > > - VIRTIO_NET_F_GUEST_TSO6
> > > > > - VIRTIO_NET_F_GUEST_UFO
> > > > > 
> > > > > This feature set passes the condition in virtnet_set_features.
> > So why isn't dev->features equal to features?
> > 
> I just double verified and found that my device advertises
> VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6 but not
> VIRTIO_NET_F_GUEST_CSUM as mentioned before.

So, your device is out of spec:

VIRTIO_NET_F_GUEST_TSO4 Requires VIRTIO_NET_F_GUEST_CSUM.

And

The device MUST NOT offer a feature which requires another feature which was 
not offered.


Is this a production device? Can it be fixed?


> That leads to the following situation :
> 
> in virtio_probe :
> 
>   if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> dev->features |= NETIF_F_RXCSUM;
>   if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>   virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
> dev->features |= NETIF_F_GRO_HW;
>   if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
> dev->hw_features |= NETIF_F_GRO_HW;
>
>
> while in netdev_fix_features :
> 
>   if (!(features & NETIF_F_RXCSUM)) {
> /* NETIF_F_GRO_HW implies doing RXCSUM since every packet
>  * successfully merged by hardware must also have the
>  * checksum verified by hardware.  If the user does not
>  * want to enable RXCSUM, logically, we should disable GRO_HW.
>  */
> if (features & NETIF_F_GRO_HW) {
>   netdev_dbg(dev, "Dropping NETIF_F_GRO_HW since no RXCSUM
> feature.\n");
>   features &= ~NETIF_F_GRO_HW;
> }
>   }
> 
> As a result, dev->features and the features passed from
> __netdev_update_features differ exactly in the NETIF_F_GRO_HW bit.
> 
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: virtio-net: Unpermitted usage of virtqueue before virtio driver initialization

2022-04-20 Thread Michael S. Tsirkin
On Wed, Apr 20, 2022 at 10:17:27AM +0200, Maciej Szymański wrote:
> > > > > > Hmm so we have this:
> > > > > > 
> > > > > > 
> > > > > >  if ((dev->features ^ features) & NETIF_F_GRO_HW) {
> > > > > >  if (vi->xdp_enabled)
> > > > > >  return -EBUSY;
> > > > > > 
> > > > > >  if (features & NETIF_F_GRO_HW)
> > > > > >  offloads = vi->guest_offloads_capable;
> > > > > >  else
> > > > > >  offloads = vi->guest_offloads_capable &
> > > > > > ~GUEST_OFFLOAD_GRO_HW_MASK;
> > > > > > 
> > > > > >  err = virtnet_set_guest_offloads(vi, offloads);
> > > > > >  if (err)
> > > > > >  return err;
> > > > > >  vi->guest_offloads = offloads;
> > > > > >  }
> > > > > > 
> > > > > > which I guess should have prevented virtnet_set_guest_offloads
> > > > > > from ever running.
> > > > > > 
> > > > > >  From your description it sounds like you have observed this
> > > > > > in practice, right?
> > > > > > 
> > > Yes. I have proprietary virtio-net device which advertises following
> > > guest offload features :
> > > - VIRTIO_NET_F_GUEST_CSUM
> > > - VIRTIO_NET_F_GUEST_TSO4
> > > - VIRTIO_NET_F_GUEST_TSO6
> > > - VIRTIO_NET_F_GUEST_UFO
> > > 
> > > This feature set passes the condition in virtnet_set_features.

So why isn't dev->features equal to features?

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: virtio-net: Unpermitted usage of virtqueue before virtio driver initialization

2022-04-20 Thread Michael S. Tsirkin
On Wed, Apr 20, 2022 at 11:07:00AM +0800, Jason Wang wrote:
> On Tue, Apr 19, 2022 at 11:03 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Apr 19, 2022 at 04:12:31PM +0200, Maciej Szymański wrote:
> > > Hello,
> > >
> > > I've found a problem in virtio-net driver.
> > > If virtio-net backend device advertises guest offload features, there is
> > > an unpermitted usage of control virtqueue before driver is initialized.
> > > According to VIRTIO specification 2.1.2 :
> > > "The device MUST NOT consume buffers or send any used buffer
> > > notifications to the driver before DRIVER_OK."
> >
> > Right.
> >
> > > During an initialization, driver calls register_netdevice which invokes
> > > callback function virtnet_set_features from __netdev_update_features.
> > > If guest offload features are advertised by the device,
> > > virtnet_set_guest_offloads is using virtnet_send_command to write and
> > > read from VQ.
> > > That leads to initialization stuck as device is not permitted yet to use 
> > > VQ.
> >
> >
> >
> > Hmm so we have this:
> >
> >
> > if ((dev->features ^ features) & NETIF_F_GRO_HW) {
> > if (vi->xdp_enabled)
> > return -EBUSY;
> >
> > if (features & NETIF_F_GRO_HW)
> > offloads = vi->guest_offloads_capable;
> > else
> > offloads = vi->guest_offloads_capable &
> >~GUEST_OFFLOAD_GRO_HW_MASK;
> >
> > err = virtnet_set_guest_offloads(vi, offloads);
> > if (err)
> > return err;
> > vi->guest_offloads = offloads;
> > }
> >
> > which I guess should have prevented virtnet_set_guest_offloads from ever 
> > running.
> >
> > From your description it sounds like you have observed this
> > in practice, right?
> >
> >
> >
> > > I have attached a patch for kernel 5.18-rc3 which fixes the problem by
> > > deferring feature set after virtio driver initialization.
> > >
> > > Best Regards,
> > >
> > > --
> > > Maciej Szymański
> > > Senior Staff Engineer
> > >
> > > OpenSynergy GmbH
> > > Rotherstr. 20, 10245 Berlin
> > >
> > > Phone:+49 30 60 98 54 0 -86
> > > Fax:  +49 30 60 98 54 0 -99
> > > E-Mail:   maciej.szyman...@opensynergy.com
> > >
> > > www.opensynergy.com
> > >
> > > Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 
> > > 108616B
> > > Geschäftsführer/Managing Director: Regis Adjamah
> > >
> > > Please mind our privacy 
> > > notice<https://www.opensynergy.com/datenschutzerklaerung/privacy-notice-for-business-partners-pursuant-to-article-13-of-the-general-data-protection-regulation-gdpr/>
> > >  pursuant to Art. 13 GDPR. // Unsere Hinweise zum Datenschutz gem. Art. 
> > > 13 DSGVO finden Sie 
> > > hier.<https://www.opensynergy.com/de/datenschutzerklaerung/datenschutzhinweise-fuer-geschaeftspartner-gem-art-13-dsgvo/>
> >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 87838cb..a44462d 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -264,6 +264,8 @@ struct virtnet_info {
> > > unsigned long guest_offloads;
> > > unsigned long guest_offloads_capable;
> > >
> > > +   netdev_features_t features;
> > > +
> >
> > I don't much like how we are forced to keep a copy of features
> > here :( At least pls add a comment explaining what's going on,
> > who owns this etc.
> >
> > > /* failover when STANDBY feature enabled */
> > > struct failover *failover;
> > >  };
> > > @@ -2976,6 +2978,15 @@ static int virtnet_get_phys_port_name(struct 
> > > net_device *dev, char *buf,
> > >
> > >  static int virtnet_set_features(struct net_device *dev,
> > > netdev_features_t features)
> > > +{
> > > +   struct virtnet_info *vi = netdev_priv(dev);
> > > +   vi->features = features;
> > > +
> > > +   return 0;
> > > +}
> >
> >
> > Looks like this breaks changing features after initialization -
> > 

Re: virtio-net: Unpermitted usage of virtqueue before virtio driver initialization

2022-04-19 Thread Michael S. Tsirkin
On Tue, Apr 19, 2022 at 04:12:31PM +0200, Maciej Szymański wrote:
> Hello,
> 
> I've found a problem in virtio-net driver.
> If virtio-net backend device advertises guest offload features, there is
> an unpermitted usage of control virtqueue before driver is initialized.
> According to VIRTIO specification 2.1.2 :
> "The device MUST NOT consume buffers or send any used buffer
> notifications to the driver before DRIVER_OK."

Right.

> During an initialization, driver calls register_netdevice which invokes
> callback function virtnet_set_features from __netdev_update_features.
> If guest offload features are advertised by the device,
> virtnet_set_guest_offloads is using virtnet_send_command to write and
> read from VQ.
> That leads to initialization stuck as device is not permitted yet to use VQ.



Hmm so we have this:


if ((dev->features ^ features) & NETIF_F_GRO_HW) {
if (vi->xdp_enabled)
return -EBUSY;

if (features & NETIF_F_GRO_HW)
offloads = vi->guest_offloads_capable;
else
offloads = vi->guest_offloads_capable &
   ~GUEST_OFFLOAD_GRO_HW_MASK;

err = virtnet_set_guest_offloads(vi, offloads);
if (err)
return err;
vi->guest_offloads = offloads;
}

which I guess should have prevented virtnet_set_guest_offloads from ever 
running.

From your description it sounds like you have observed this
in practice, right?



> I have attached a patch for kernel 5.18-rc3 which fixes the problem by
> deferring feature set after virtio driver initialization.
> 
> Best Regards,
> 
> --
> Maciej Szymański
> Senior Staff Engineer
> 
> OpenSynergy GmbH
> Rotherstr. 20, 10245 Berlin
> 
> Phone:+49 30 60 98 54 0 -86
> Fax:  +49 30 60 98 54 0 -99
> E-Mail:   maciej.szyman...@opensynergy.com
> 
> www.opensynergy.com
> 
> Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
> Geschäftsführer/Managing Director: Regis Adjamah
> 

> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 87838cb..a44462d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -264,6 +264,8 @@ struct virtnet_info {
> unsigned long guest_offloads;
> unsigned long guest_offloads_capable;
>  
> +   netdev_features_t features;
> +

I don't much like how we are forced to keep a copy of features
here :( At least pls add a comment explaining what's going on,
who owns this etc.

> /* failover when STANDBY feature enabled */
> struct failover *failover;
>  };
> @@ -2976,6 +2978,15 @@ static int virtnet_get_phys_port_name(struct 
> net_device *dev, char *buf,
>  
>  static int virtnet_set_features(struct net_device *dev,
> netdev_features_t features)
> +{
> +   struct virtnet_info *vi = netdev_priv(dev);
> +   vi->features = features;
> +
> +   return 0;
> +}


Looks like this breaks changing features after initialization -
these will never be propagated to hardware now.

> +
> +static int virtnet_set_features_deferred(struct net_device *dev,
> +   netdev_features_t features)
>  {
> struct virtnet_info *vi = netdev_priv(dev);
> u64 offloads;
> @@ -3644,6 +3655,13 @@ static int virtnet_probe(struct virtio_device *vdev)
>  
> virtio_device_ready(vdev);
>  
> +   /* Deferred feature set after device ready */
> +   err = virtnet_set_features_deferred(dev, vi->features);


It seems that if this is called e.g. for a device without a CVQ and
there are things that actually need to change then it will BUG_ON.


> +   if (err) {
> +   pr_debug("virtio_net: set features failed\n");
> +   goto free_unregister_netdev;
> +   }
> +
> err = virtnet_cpu_notif_add(vi);
> if (err) {
> pr_debug("virtio_net: registering cpu notifier failed\n");
> 

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net 1/2] net/af_packet: adjust network header position for VLAN tagged packets

2022-04-19 Thread Michael S. Tsirkin
On Tue, Apr 19, 2022 at 09:56:02AM -0400, Willem de Bruijn wrote:
> > >
> > > We should also maintain feature consistency between packet_snd,
> > > tpacket_snd and to the limitations of its feature set to
> > > packet_sendmsg_spkt. The no_fcs is already lacking in tpacket_snd as
> > > far as I can tell. But packet_sendmsg_spkt also sets it and calls
> > > packet_parse_headers.
> >
> > Yes, I think we could fix the tpacket_snd() in another patch.
> >
> > There are also some duplicated codes in these *_snd functions.
> > I think we can move them out to one single function.
> 
> Please don't refactor this code. It will complicate future backports
> of stable fixes.

Hmm, I don't know offhand which duplication this refers to specifically,
so maybe it's not worth addressing here, but generally, not
cleaning up code just because of backports seems wrong ...

> > > Because this patch touches many other packets besides the ones
> > > intended, I am a bit concerned about unintended consequences. Perhaps
> >
> > Yes, makes sense.
> >
> > > stretching the definition of the flags to include VLAN is acceptable
> > > (unlike outright tunnels), but even then I would suggest for net-next.
> >
> > As I asked, I'm not familiar with virtio code. Do you think if I should
> > add a new VIRTIO_NET_HDR_GSO_VLAN flag? It's only a L2 flag without any L3
> > info. If I add something like VIRTIO_NET_HDR_GSO_VLAN_TCPV4/TCPV6/UDP. That
> > would add more combinations. Which doesn't like a good idea.
> 
> I would prefer a new flag to denote this type, so that we can be
> strict and only change the datapath for packets that have this flag
> set (and thus express the intent).
> 
> But the VIRTIO_NET_HDR types are defined in the virtio spec. The
> maintainers should probably chime in.

Yes, it's a UAPI extension, not to be done lightly. In this case IIUC
gso_type in the header is only u8 - 8 bits and 5 of these are already
used.  So I don't think the virtio TC will be all that happy to burn up
a bit unless a clear benefit can be demonstrated. 

I agree with the net-next proposal; I think it's more a feature than a
bugfix. In particular I think the Fixes tag can also be dropped, given that
IIUC GSO for VLAN packets didn't work even before that commit - right?

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: Re: [PATCH 1/4] virtio-crypto: wait ctrl queue instead of busy polling

2022-04-15 Thread Michael S. Tsirkin
On Fri, Apr 15, 2022 at 06:50:19PM +0800, zhenwei pi wrote:
> On 4/15/22 16:41, Michael S. Tsirkin wrote:
> > > diff --git a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c 
> > > b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
> > > index f3ec9420215e..bf7c1aa4be37 100644
> > > --- a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
> > > +++ b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
> > > @@ -102,107 +102,100 @@ static int 
> > > virtio_crypto_alg_akcipher_init_session(struct virtio_crypto_akcipher
> > >   {
> > >   struct scatterlist outhdr_sg, key_sg, inhdr_sg, *sgs[3];
> > >   struct virtio_crypto *vcrypto = ctx->vcrypto;
> > > + struct virtio_crypto_ctrl_request *vc_ctrl_req = NULL;
> > 
> > this is initialized down the road, I think you can skip = NULL here.
> > 
> OK.
> > >   uint8_t *pkey;
> > > - unsigned int inlen;
> > > - int err;
> > > + int err = -ENOMEM;
> > 
> > I would assign this in the single case where this value is used.
> > 
> OK
> > >   unsigned int num_out = 0, num_in = 0;
> > > + int node = dev_to_node(>vdev->dev);
> > are you sure it is
> > better to allocate close to device and not to current node
> > which is the default?
> > 
> Also with this part:
>  /* Internal representation of a data virtqueue */
> @@ -65,11 +66,6 @@ struct virtio_crypto {
>   /* Maximum size of per request */
>   u64 max_size;
> 
> - /* Control VQ buffers: protected by the ctrl_lock */
> - struct virtio_crypto_op_ctrl_req ctrl;
> - struct virtio_crypto_session_input input;
> - struct virtio_crypto_inhdr ctrl_status;
> -
>   unsigned long status;
>   atomic_t ref_count;
> 
> Orignally virtio crypto driver allocates ctrl_status per-device,
> and protects this with ctrl_lock. This is the reason why the control queue
> reaches the bottleneck of performance. I'll append this in the next version
> in commit message.
> 
> Instead of the single request buffer, declare struct
> virtio_crypto_ctrl_request {
> struct virtio_crypto_op_ctrl_req ctrl;
> struct virtio_crypto_session_input input;
> struct virtio_crypto_inhdr ctrl_status;
>   ... }
> 
> The motivation of this change is to allocate buffer from the same node with
> device during control queue operations.

But are you sure it's a win? Quite possibly it's a win to
have it close to the driver, not close to the device.
This kind of change is really best done separately, with some
testing showing it's a win. If that is too much to ask,
make it a separate patch and add some analysis explaining
why the device accesses the structure more than the driver does.
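To spell out the two options being compared (illustrative only):

    /* allocate near the device's NUMA node ... */
    vc_ctrl_req = kzalloc_node(sizeof(*vc_ctrl_req), GFP_KERNEL,
                               dev_to_node(&vcrypto->vdev->dev));

    /* ... versus the default, which allocates on the node of the CPU
     * currently running the driver code:
     */
    vc_ctrl_req = kzalloc(sizeof(*vc_ctrl_req), GFP_KERNEL);

Which one wins depends on who touches the buffer more, the CPU building
the request or the device DMAing into it - hence the request for numbers.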


> > 
> > >   pkey = kmemdup(key, keylen, GFP_ATOMIC);
> > >   if (!pkey)
> > >   return -ENOMEM;
> > > - spin_lock(>ctrl_lock);
> > > - memcpy(>ctrl.header, header, sizeof(vcrypto->ctrl.header));
> > > - memcpy(>ctrl.u, para, sizeof(vcrypto->ctrl.u));
> > > - vcrypto->input.status = cpu_to_le32(VIRTIO_CRYPTO_ERR);
> > > + vc_ctrl_req = kzalloc_node(sizeof(*vc_ctrl_req), GFP_KERNEL, node);
> > > + if (!vc_ctrl_req)
> > > + goto out;
> > 
> > do you need to allocate it with kzalloc?
> > is anything wrong with just keeping it part of device?
> > even if yes this change is better split in a separate patch, would make the 
> > patch smaller.
> Because there are padding field in
> virtio_crypto_op_ctrl_req_crypto_session_input, I suppose the
> original version also needs to clear padding field.
> So I use kzalloc to make sure that the padding field gets cleared.
> If this is reasonable, to separate this patch is OK to me, or I append this
> reason into commit message and comments in code.

Not sure I understand. Maybe add a code comment explaining
what is cleared and why.

> > > +
> > > +void virtcrypto_ctrlq_callback(struct virtqueue *vq)
> > > +{
> > > + struct virtio_crypto *vcrypto = vq->vdev->priv;
> > > + struct virtio_crypto_ctrl_request *vc_ctrl_req;
> > > + unsigned long flags;
> > > + unsigned int len;
> > > +
> > > + spin_lock_irqsave(>ctrl_lock, flags);
> > > + do {
> > > + virtqueue_disable_cb(vq);
> > > + while ((vc_ctrl_req = virtqueue_get_buf(vq, )) != NULL) {
> > 
> > 
> > you really need to break out of this loop if vq is broken,
> > virtqueue_get_buf will keep returning NULL in this case.
> > 
> I'm a little confused here, if vir

Re: [PATCH 3/3] virtio-pci: Use cpumask_available to fix compilation error

2022-04-15 Thread Michael S. Tsirkin
On Thu, Apr 14, 2022 at 05:08:55PM +0200, Christophe de Dinechin wrote:
> With GCC 12 and defconfig, we get the following error:
> 
> |   CC  drivers/virtio/virtio_pci_common.o
> | drivers/virtio/virtio_pci_common.c: In function ‘vp_del_vqs’:
> | drivers/virtio/virtio_pci_common.c:257:29: error: the comparison will
> |  always evaluate as ‘true’ for the pointer operand in
> |  ‘vp_dev->msix_affinity_masks + (sizetype)((long unsigned int)i * 8)’
> |  must not be NULL [-Werror=address]
> |   257 | if (vp_dev->msix_affinity_masks[i])
> |   | ^~
> 
> This happens in the case where CONFIG_CPUMASK_OFFSTACK is not defined,
> since we typedef cpumask_var_t as an array. The compiler is essentially
> complaining that an array pointer cannot be NULL. This is not a very
> important warning, but there is a function called cpumask_available that
> seems to be defined just for that case, so the fix is easy.
> 
> Signed-off-by: Christophe de Dinechin 
> Signed-off-by: Christophe de Dinechin 

There was an alternate patch proposed for this by
Murilo Opsfelder Araujo. What do you think about that approach?


> ---
>  drivers/virtio/virtio_pci_common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c 
> b/drivers/virtio/virtio_pci_common.c
> index d724f676608b..5c44a2f13c93 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -254,7 +254,7 @@ void vp_del_vqs(struct virtio_device *vdev)
>  
>   if (vp_dev->msix_affinity_masks) {
>   for (i = 0; i < vp_dev->msix_vectors; i++)
> - if (vp_dev->msix_affinity_masks[i])
> + if (cpumask_available(vp_dev->msix_affinity_masks[i]))
>   
> free_cpumask_var(vp_dev->msix_affinity_masks[i]);
>   }
>  
> -- 
> 2.35.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH 0/6] virtio: Solution to restrict memory access under Xen using xen-virtio DMA ops layer

2022-04-15 Thread Michael S. Tsirkin
On Thu, Apr 14, 2022 at 10:19:27PM +0300, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko 
> 
> Hello all.
> 
> The purpose of this RFC patch series is to add support for restricting memory 
> access under Xen using specific
> grant table based DMA ops layer. Patch series is based on Juergen Gross’ 
> initial work [1] which implies using
> grant references instead of raw guest physical addresses (GPA) for the virtio 
> communications (some kind of
> the software IOMMU).
> 
> The high level idea is to create new Xen’s grant table based DMA ops layer 
> for the guest Linux whose main
> purpose is to provide a special 64-bit DMA address which is formed by using 
> the grant reference (for a page
> to be shared with the backend) with offset and setting the highest address 
> bit (this is for the backend to
> be able to distinguish grant ref based DMA address from normal GPA). For this 
> to work we need the ability
> to allocate contiguous (consecutive) grant references for multi-page 
> allocations. And the backend then needs
> to offer VIRTIO_F_ACCESS_PLATFORM and VIRTIO_F_VERSION_1 feature bits (it 
> must support virtio-mmio modern
> transport for 64-bit addresses in the virtqueue).

I'm not enough of a xen expert to review this, and I didn't get
all patches, but I'm very happy to see that approach being
taken. VIRTIO_F_ACCESS_PLATFORM and VIRTIO_F_VERSION_1 are
exactly the way to declare not all of memory is accessible.
Thanks!
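(Purely as an illustration for readers of the archive: the address encoding
described above amounts to roughly the following. The macro and function
names are made up here and are not taken from the series.)

/* High bit tells the backend "grant reference, not a guest physical address". */
#define GRANT_DMA_ADDR_FLAG	(1ULL << 63)

static dma_addr_t grant_ref_to_dma(grant_ref_t ref, unsigned long offset)
{
	/* The grant ref selects the shared page, the low bits keep the offset. */
	return GRANT_DMA_ADDR_FLAG |
	       ((dma_addr_t)ref << PAGE_SHIFT) | (offset & ~PAGE_MASK);
}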

> Xen's grant mapping mechanism is the secure and safe solution to share pages 
> between domains which proven
> to work and works for years (in the context of traditional Xen PV drivers for 
> example). So far, the foreign
> mapping is used for the virtio backend to map and access guest memory. With 
> the foreign mapping, the backend
> is able to map arbitrary pages from the guest memory (or even from Dom0 
> memory). And as the result, the malicious
> backend which runs in a non-trusted domain can take advantage of this. 
> Instead, with the grant mapping
> the backend is only allowed to map pages which were explicitly granted by the 
> guest before and nothing else. 
> According to the discussions in various mainline threads this solution would 
> likely be welcome because it
> perfectly fits in the security model Xen provides. 
> 
> What is more, the grant table based solution requires zero changes to the Xen 
> hypervisor itself at least
> with virtio-mmio and DT (in comparison, for example, with "foreign mapping + 
> virtio-iommu" solution which would
> require the whole new complex emulator in hypervisor in addition to new 
> functionality/hypercall to pass IOVA
> from the virtio backend running elsewhere to the hypervisor and translate it 
> to the GPA before mapping into
> P2M or denying the foreign mapping request if no corresponding IOVA-GPA 
> mapping present in the IOMMU page table
> for that particular device). We only need to update toolstack to insert a new 
> "xen,dev-domid" property to
> the virtio-mmio device node when creating a guest device-tree (this is an 
> indicator for the guest to use grants
> and the ID of Xen domain where the corresponding backend resides, it is used 
> as an argument to the grant mapping
> APIs). It worth mentioning that toolstack patch is based on non  upstreamed 
> yet “Virtio support for toolstack
> on Arm” series which is on review now [2].
> 
> Please note the following:
> - Patch series only covers Arm and virtio-mmio (device-tree) for now. To 
> enable the restricted memory access
>   feature on Arm the following options should be set:
>   CONFIG_XEN_VIRTIO = y
>   CONFIG_XEN_HVM_VIRTIO_GRANT = y
> - Some callbacks in xen-virtio DMA ops layer (map_sg/unmap_sg, etc) are not 
> implemented yet as they are not
>   needed/used in the first prototype
> 
> Patch series is rebased on Linux 5.18-rc2 tag and tested on Renesas 
> Salvator-X board + H3 ES3.0 SoC (Arm64)
> with standalone userspace (non-Qemu) virtio-mmio based virtio-disk backend 
> running in Driver domain and Linux
> guest running on existing virtio-blk driver (frontend). No issues were 
> observed. Guest domain 'reboot/destroy'
> use-cases work properly. I have also tested other use-cases such as assigning 
> several virtio block devices
> or a mix of virtio and Xen PV block devices to the guest. 
> 
> 1. Xen changes located at (last patch):
> https://github.com/otyshchenko1/xen/commits/libxl_virtio_next
> 2. Linux changes located at:
> https://github.com/otyshchenko1/linux/commits/virtio_grant5
> 3. virtio-disk changes located at:
> https://github.com/otyshchenko1/virtio-disk/commits/virtio_grant
> 
> Any feedback/help would be highly appreciated.
> 
> [1] https://www.youtube.com/watch?v=IrlEdaIUDPk
> [2] 
> https://lore.kernel.org/xen-devel/1649442065-8332-1-git-send-email-olekst...@gmail.com/
> 
> Juergen Gross (2):
>   xen/grants: support allocating consecutive grants
>   virtio: add option to restrict memory access under Xen
> 
> Oleksandr Tyshchenko 

Re: [PATCH 1/4] virtio-crypto: wait ctrl queue instead of busy polling

2022-04-15 Thread Michael S. Tsirkin
On Fri, Apr 15, 2022 at 02:41:33PM +0800, zhenwei pi wrote:
> Originally, after submitting request into virtio crypto control
> queue, the guest side polls the result from the virt queue. There
> are two problems:
> 1, The queue depth is always 1, the performance of a virtio crypto
>device gets limited. Multi user processes share a single control
>queue, and hit spin lock race from control queue. Test on Intel
>Platinum 8260, a single worker gets ~35K/s create/close session
>operations, and 8 workers get ~40K/s operations with 800% CPU
>utilization.
> 2, The control request is supposed to get handled immediately, but
>in the current implementation of QEMU(v6.2), the vCPU thread kicks
>another thread to do this work, the latency also gets unstable.
>Tracking latency of virtio_crypto_alg_akcipher_close_session in 5s:
> usecs           : count
> 0 -> 1          : 0
> 2 -> 3          : 7
> 4 -> 7          : 72
> 8 -> 15         : 186485
> 16 -> 31        : 687
> 32 -> 63        : 5
> 64 -> 127       : 3
> 128 -> 255      : 1
> 256 -> 511      : 0
> 512 -> 1023     : 0
> 1024 -> 2047    : 0
> 2048 -> 4095    : 0
> 4096 -> 8191    : 0
> 8192 -> 16383   : 2
> 
> To improve the performance of control queue, wait completion instead
> of busy polling without lock race, and get completed by control queue
> callback. Test this patch, the guest side get ~200K/s operations with
> 300% CPU utilization.
> 
> Signed-off-by: zhenwei pi 
> ---
>  drivers/crypto/virtio/Makefile|   1 +
>  .../virtio/virtio_crypto_akcipher_algs.c  |  87 ++--
>  drivers/crypto/virtio/virtio_crypto_common.c  |  61 
>  drivers/crypto/virtio/virtio_crypto_common.h  |  23 ++-
>  drivers/crypto/virtio/virtio_crypto_core.c|   2 +-
>  .../virtio/virtio_crypto_skcipher_algs.c  | 134 --
>  6 files changed, 183 insertions(+), 125 deletions(-)
>  create mode 100644 drivers/crypto/virtio/virtio_crypto_common.c
> 
> diff --git a/drivers/crypto/virtio/Makefile b/drivers/crypto/virtio/Makefile
> index bfa6cbae342e..49c1fa80e465 100644
> --- a/drivers/crypto/virtio/Makefile
> +++ b/drivers/crypto/virtio/Makefile
> @@ -3,5 +3,6 @@ obj-$(CONFIG_CRYPTO_DEV_VIRTIO) += virtio_crypto.o
>  virtio_crypto-objs := \
>   virtio_crypto_skcipher_algs.o \
>   virtio_crypto_akcipher_algs.o \
> + virtio_crypto_common.o \
>   virtio_crypto_mgr.o \
>   virtio_crypto_core.o
> diff --git a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c 
> b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
> index f3ec9420215e..bf7c1aa4be37 100644
> --- a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
> +++ b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
> @@ -102,107 +102,100 @@ static int 
> virtio_crypto_alg_akcipher_init_session(struct virtio_crypto_akcipher
>  {
>   struct scatterlist outhdr_sg, key_sg, inhdr_sg, *sgs[3];
>   struct virtio_crypto *vcrypto = ctx->vcrypto;
> + struct virtio_crypto_ctrl_request *vc_ctrl_req = NULL;

this is initialized down the road, I think you can skip = NULL here.

>   uint8_t *pkey;
> - unsigned int inlen;
> - int err;
> + int err = -ENOMEM;

I would assign this in the single case where this value is used.

>   unsigned int num_out = 0, num_in = 0;
> + int node = dev_to_node(>vdev->dev);
>  
are you sure it is
better to allocate close to device and not to current node
which is the default?


>   pkey = kmemdup(key, keylen, GFP_ATOMIC);
>   if (!pkey)
>   return -ENOMEM;
>  
> - spin_lock(>ctrl_lock);
> - memcpy(>ctrl.header, header, sizeof(vcrypto->ctrl.header));
> - memcpy(>ctrl.u, para, sizeof(vcrypto->ctrl.u));
> - vcrypto->input.status = cpu_to_le32(VIRTIO_CRYPTO_ERR);
> + vc_ctrl_req = kzalloc_node(sizeof(*vc_ctrl_req), GFP_KERNEL, node);
> + if (!vc_ctrl_req)
> + goto out;

do you need to allocate it with kzalloc?
is anything wrong with just keeping it part of device?
even if yes this change is better split in a separate patch, would make the 
patch smaller.
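(As an aside: independent of where the request struct lives, the
wait-instead-of-poll flow the changelog describes is essentially the standard
completion pattern below. A sketch only, using the generic completion API,
not the patch's exact code.)

	struct completion done;

	init_completion(&done);

	/* submit the control request and kick the device */
	err = virtqueue_add_sgs(vq, sgs, num_out, num_in, req, GFP_ATOMIC);
	if (!err)
		virtqueue_kick(vq);

	/*
	 * Sleep until the ctrl queue callback calls complete(&done) when
	 * the device has consumed the request, instead of busy polling.
	 */
	wait_for_completion(&done);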



>  
> - sg_init_one(_sg, >ctrl, sizeof(vcrypto->ctrl));
> + memcpy(_ctrl_req->ctrl.header, header, 
> sizeof(vc_ctrl_req->ctrl.header));
> + memcpy(_ctrl_req->ctrl.u, para, sizeof(vc_ctrl_req->ctrl.u));
> + sg_init_one(_sg, _ctrl_req->ctrl, 

Re: [PATCH] vDPA/ifcvf: allow userspace to suspend a queue

2022-04-13 Thread Michael S. Tsirkin
On Wed, Apr 13, 2022 at 04:25:22PM +0800, Jason Wang wrote:
> On Mon, Apr 11, 2022 at 11:18 AM Zhu Lingshan  wrote:
> >
> > Formerly, ifcvf driver has implemented a lazy-initialization mechanism
> > for the virtqueues, it would store all virtqueue config fields that
> > passed down from the userspace, then load them to the virtqueues and
> > enable the queues upon DRIVER_OK.
> >
> > To allow the userspace to suspend a virtqueue,
> > this commit passes queue_enable to the virtqueue directly through
> > set_vq_ready().
> >
> > This feature requires, and this commit implements, all virtqueue
> > ops (set_vq_addr, set_vq_num and set_vq_ready) taking immediate
> > action rather than lazy initialization, so ifcvf_hw_enable() is retired.
> >
> > To avoid losing virtqueue configurations caused by multiple
> > rounds of reset(), this commit also refactors the device reset
> > routine: now it simply resets the config handler and the virtqueues,
> > and calls device-reset() only once.
> >
> > Signed-off-by: Zhu Lingshan 
> > ---
> >  drivers/vdpa/ifcvf/ifcvf_base.c | 94 -
> >  drivers/vdpa/ifcvf/ifcvf_base.h | 11 ++--
> >  drivers/vdpa/ifcvf/ifcvf_main.c | 57 +---
> >  3 files changed, 75 insertions(+), 87 deletions(-)
> >
> > diff --git a/drivers/vdpa/ifcvf/ifcvf_base.c 
> > b/drivers/vdpa/ifcvf/ifcvf_base.c
> > index 48c4dadb0c7c..19eb0dcac123 100644
> > --- a/drivers/vdpa/ifcvf/ifcvf_base.c
> > +++ b/drivers/vdpa/ifcvf/ifcvf_base.c
> > @@ -175,16 +175,12 @@ u8 ifcvf_get_status(struct ifcvf_hw *hw)
> >  void ifcvf_set_status(struct ifcvf_hw *hw, u8 status)
> >  {
> > vp_iowrite8(status, >common_cfg->device_status);
> > +   vp_ioread8(>common_cfg->device_status);
> 
> This looks confusing, the name of the function is to set the status
> but what actually implemented here is to get the status.
> 
> >  }
> >
> >  void ifcvf_reset(struct ifcvf_hw *hw)
> >  {
> > -   hw->config_cb.callback = NULL;
> > -   hw->config_cb.private = NULL;
> > -
> > ifcvf_set_status(hw, 0);
> > -   /* flush set_status, make sure VF is stopped, reset */
> > -   ifcvf_get_status(hw);
> >  }
> >
> >  static void ifcvf_add_status(struct ifcvf_hw *hw, u8 status)
> > @@ -331,68 +327,94 @@ int ifcvf_set_vq_state(struct ifcvf_hw *hw, u16 qid, 
> > u16 num)
> > ifcvf_lm = (struct ifcvf_lm_cfg __iomem *)hw->lm_cfg;
> > q_pair_id = qid / hw->nr_vring;
> > avail_idx_addr = _lm->vring_lm_cfg[q_pair_id].idx_addr[qid % 
> > 2];
> > -   hw->vring[qid].last_avail_idx = num;
> > vp_iowrite16(num, avail_idx_addr);
> > +   vp_ioread16(avail_idx_addr);
> 
> This looks like a bug fix.

is this to flush out the status write?  pls add a comment
explaining when and why it's needed.
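Something like this, perhaps (the comment wording is only a suggestion):

	vp_iowrite16(num, avail_idx_addr);
	/*
	 * The write above may be posted; read the register back so it is
	 * known to have reached the device before we return success to
	 * the caller.
	 */
	vp_ioread16(avail_idx_addr);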

> >
> > return 0;
> >  }
> >
> > -static int ifcvf_hw_enable(struct ifcvf_hw *hw)
> > +void ifcvf_set_vq_num(struct ifcvf_hw *hw, u16 qid, u32 num)
> >  {
> > -   struct virtio_pci_common_cfg __iomem *cfg;
> > -   u32 i;
> > +   struct virtio_pci_common_cfg __iomem *cfg = hw->common_cfg;
> >
> > -   cfg = hw->common_cfg;
> > -   for (i = 0; i < hw->nr_vring; i++) {
> > -   if (!hw->vring[i].ready)
> > -   break;
> > +   vp_iowrite16(qid, >queue_select);
> > +   vp_iowrite16(num, >queue_size);
> > +   vp_ioread16(>queue_size);
> > +}
> >
> > -   vp_iowrite16(i, >queue_select);
> > -   vp_iowrite64_twopart(hw->vring[i].desc, >queue_desc_lo,
> > ->queue_desc_hi);
> > -   vp_iowrite64_twopart(hw->vring[i].avail, 
> > >queue_avail_lo,
> > - >queue_avail_hi);
> > -   vp_iowrite64_twopart(hw->vring[i].used, >queue_used_lo,
> > ->queue_used_hi);
> > -   vp_iowrite16(hw->vring[i].size, >queue_size);
> > -   ifcvf_set_vq_state(hw, i, hw->vring[i].last_avail_idx);
> > -   vp_iowrite16(1, >queue_enable);
> > -   }
> > +int ifcvf_set_vq_address(struct ifcvf_hw *hw, u16 qid, u64 desc_area,
> > +u64 driver_area, u64 device_area)
> > +{
> > +   struct virtio_pci_common_cfg __iomem *cfg = hw->common_cfg;
> > +
> > +   vp_iowrite16(qid, >queue_select);
> > +   vp_iowrite64_twopart(desc_area, >queue_desc_lo,
> > +>queue_desc_hi);
> > +   vp_iowrite64_twopart(driver_area, >queue_avail_lo,
> > +>queue_avail_hi);
> > +   vp_iowrite64_twopart(device_area, >queue_used_lo,
> > +>queue_used_hi);
> > +   /* to flush IO */
> > +   vp_ioread16(>queue_select);
> 
> Why do we need to flush I/O here?
> 
> >
> > return 0;
> >  }
> >
> > -static void ifcvf_hw_disable(struct ifcvf_hw *hw)
> > +void ifcvf_set_vq_ready(struct ifcvf_hw *hw, u16 qid, bool ready)
> >  {
> > -   u32 i;
> > +   struct 

Re:

2022-04-12 Thread Michael S. Tsirkin
On Tue, Mar 29, 2022 at 10:35:21AM +0200, Thomas Gleixner wrote:
> On Mon, Mar 28 2022 at 06:40, Michael S. Tsirkin wrote:
> > On Mon, Mar 28, 2022 at 02:18:22PM +0800, Jason Wang wrote:
> >> > > So I think we might talk different issues:
> >> > >
> >> > > 1) Whether request_irq() commits the previous setups, I think the
> >> > > answer is yes, since the spin_unlock of desc->lock (release) can
> >> > > guarantee this though there seems no documentation around
> >> > > request_irq() to say this.
> >> > >
> >> > > And I can see at least drivers/video/fbdev/omap2/omapfb/dss/dispc.c is
> >> > > using smp_wmb() before the request_irq().
> 
> That's a complete bogus example especially as there is not a single
> smp_rmb() which pairs with the smp_wmb().
> 
> >> > > And even if write is ordered we still need read to be ordered to be
> >> > > paired with that.
> >
> > IMO it synchronizes with the CPU to which irq is
> > delivered. Otherwise basically all drivers would be broken,
> > wouldn't they be?
> > I don't know whether it's correct on all platforms, but if not
> > we need to fix request_irq.
> 
> There is nothing to fix:
> 
> request_irq()
>raw_spin_lock_irq(desc->lock);   // ACQUIRE
>
>raw_spin_unlock_irq(desc->lock); // RELEASE
> 
> interrupt()
>raw_spin_lock(desc->lock);   // ACQUIRE
>set status to IN_PROGRESS
>raw_spin_unlock(desc->lock); // RELEASE
>invoke handler()
> 
> So anything which the driver set up _before_ request_irq() is visible to
> the interrupt handler. No?
> 
> >> What happens if an interrupt is raised in the middle like:
> >> 
> >> smp_store_release(dev->irq_soft_enabled, true)
> >> IRQ handler
> >> synchornize_irq()
> 
> This is bogus. The obvious order of things is:
> 
> dev->ok = false;
> request_irq();
> 
> moar_setup();
> synchronize_irq();  // ACQUIRE + RELEASE
> dev->ok = true;
> 
> The reverse operation on teardown:
> 
> dev->ok = false;
> synchronize_irq();  // ACQUIRE + RELEASE
> 
> teardown();
> 
> So in both cases a simple check in the handler is sufficient:
> 
> handler()
> if (!dev->ok)
>   return;

Does this need to be if (!READ_ONCE(dev->ok)) ?
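i.e., spelled out, a sketch of the pattern under discussion (not a concrete
patch; mydev and the handler are just placeholder names):

struct mydev {
	bool ok;
	/* ... driver state ... */
};

static irqreturn_t mydev_interrupt(int irq, void *data)
{
	struct mydev *dev = data;

	if (!READ_ONCE(dev->ok))	/* pairs with WRITE_ONCE() in setup */
		return IRQ_NONE;

	/* ... normal handling ... */
	return IRQ_HANDLED;
}

	/* setup side */
	dev->ok = false;
	request_irq(irq, mydev_interrupt, 0, "mydev", dev);
	/* moar_setup(); */
	synchronize_irq(irq);		/* ACQUIRE + RELEASE */
	WRITE_ONCE(dev->ok, true);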



> I'm not understanding what you folks are trying to "fix" here. If any
> driver does this in the wrong order, then the driver is broken.
> 
> Sure, you can do the same with:
> 
> dev->ok = false;
> request_irq();
> moar_setup();
> smp_wmb();
> dev->ok = true;
> 
> for the price of a smp_rmb() in the interrupt handler:
> 
> handler()
> if (!dev->ok)
>   return;
> smp_rmb();
> 
> but that's only working for the setup case correctly and not for
> teardown.
> 
> Thanks,
> 
> tglx

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 4/5] virtio-pci: implement synchronize_vqs()

2022-04-11 Thread Michael S. Tsirkin
On Mon, Apr 11, 2022 at 04:22:19PM +0800, Jason Wang wrote:
> On Sun, Apr 10, 2022 at 3:51 PM Michael S. Tsirkin  wrote:
> >
> > On Fri, Apr 08, 2022 at 03:03:07PM +0200, Halil Pasic wrote:
> > > On Wed, 06 Apr 2022 15:04:32 +0200
> > > Cornelia Huck  wrote:
> > >
> > > > On Wed, Apr 06 2022, "Michael S. Tsirkin"  wrote:
> > > >
> > > > > On Wed, Apr 06, 2022 at 04:35:37PM +0800, Jason Wang wrote:
> > > > >> This patch implements PCI version of synchronize_vqs().
> > > > >>
> > > > >> Cc: Thomas Gleixner 
> > > > >> Cc: Peter Zijlstra 
> > > > >> Cc: "Paul E. McKenney" 
> > > > >> Cc: Marc Zyngier 
> > > > >> Signed-off-by: Jason Wang 
> > > > >
> > > > > Please add implementations at least for ccw and mmio.
> > > >
> > > > I'm not sure what (if anything) can/should be done for ccw...
> > >
> > > If nothing needs to be done I would like to have at least a comment in
> > > the code that explains why. So that somebody who reads the code
> > > doesn't wonder: why is virtio-ccw not implementing that callback.
> >
> > Right.
> >
> > I am currently thinking instead of making this optional in the
> > core we should make it mandatory, and have transports which do not
> > need to sync have an empty stub with documentation explaining why.
> >
> > Also, do we want to document this sync is explicitly for irq enable/disable?
> > synchronize_irq_enable_disable?
> 
> I would not since the transport is not guaranteed to use an interrupt
> for callbacks.

OK, but let's then document this in more detail.
More readers will wonder what the callback is
trying to accomplish, and Halil requested that as well.

For example, let's document why sync is required on enable.

> >
> >
> > > >
> > > > >
> > > > >> ---
> > > > >>  drivers/virtio/virtio_pci_common.c | 14 ++
> > > > >>  drivers/virtio/virtio_pci_common.h |  2 ++
> > > > >>  drivers/virtio/virtio_pci_legacy.c |  1 +
> > > > >>  drivers/virtio/virtio_pci_modern.c |  2 ++
> > > > >>  4 files changed, 19 insertions(+)
> > > > >>
> > > > >> diff --git a/drivers/virtio/virtio_pci_common.c 
> > > > >> b/drivers/virtio/virtio_pci_common.c
> > > > >> index d724f676608b..b78c8bc93a97 100644
> > > > >> --- a/drivers/virtio/virtio_pci_common.c
> > > > >> +++ b/drivers/virtio/virtio_pci_common.c
> > > > >> @@ -37,6 +37,20 @@ void vp_synchronize_vectors(struct virtio_device 
> > > > >> *vdev)
> > > > >>  synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > >>  }
> > > > >>
> > > > >> +void vp_synchronize_vqs(struct virtio_device *vdev)
> > > > >> +{
> > > > >> +struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > >> +int i;
> > > > >> +
> > > > >> +if (vp_dev->intx_enabled) {
> > > > >> +synchronize_irq(vp_dev->pci_dev->irq);
> > > > >> +return;
> > > > >> +}
> > > > >> +
> > > > >> +for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > >> +synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > >> +}
> > > > >> +
> > > >
> > > > ...given that this seems to synchronize threaded interrupt handlers?
> > > > Halil, do you think ccw needs to do anything? (AFAICS, we only have one
> > > > 'irq' for channel devices anyway, and the handler just calls the
> > > > relevant callbacks directly.)
> > >
> > > Sorry I don't understand enough yet. A more verbose documentation on
> > > "virtio_synchronize_vqs - synchronize with virtqueue callbacks" would
> > > surely benefit me. It may be more than enough for a back-belt but it
> > > ain't enough for me to tell what is the callback supposed to accomplish.
> > >
> > > I will have to study this discussion and the code more thoroughly.
> > > Tentatively I side with Jason and Michael in a sense, that I don't
> > > believe virtio-ccw is safe against rough interrupts.
> 
> That's my feeling as well.
> 
> Thanks
> 
> > >
> > > Sorry for the late response. I intend to revisit this on Monday. If
> > > I don't please feel encouraged to ping.
> > >
> > > Regards,
> > > Halil
> >

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 4/5] virtio-pci: implement synchronize_vqs()

2022-04-10 Thread Michael S. Tsirkin
On Fri, Apr 08, 2022 at 03:03:07PM +0200, Halil Pasic wrote:
> On Wed, 06 Apr 2022 15:04:32 +0200
> Cornelia Huck  wrote:
> 
> > On Wed, Apr 06 2022, "Michael S. Tsirkin"  wrote:
> > 
> > > On Wed, Apr 06, 2022 at 04:35:37PM +0800, Jason Wang wrote:  
> > >> This patch implements PCI version of synchronize_vqs().
> > >> 
> > >> Cc: Thomas Gleixner 
> > >> Cc: Peter Zijlstra 
> > >> Cc: "Paul E. McKenney" 
> > >> Cc: Marc Zyngier 
> > >> Signed-off-by: Jason Wang   
> > >
> > > Please add implementations at least for ccw and mmio.  
> > 
> > I'm not sure what (if anything) can/should be done for ccw...
> 
> If nothing needs to be done I would like to have at least a comment in
> the code that explains why. So that somebody who reads the code
> doesn't wonder: why is virtio-ccw not implementing that callback.

Right.

I am currently thinking instead of making this optional in the
core we should make it mandatory, and have transports which do not
need to sync have an empty stub with documentation explaining why.

Also, do we want to document this sync is explicitly for irq enable/disable?
synchronize_irq_enable_disable?
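Concretely, for a transport that turns out not to need it, the empty stub
could look like the sketch below (hypothetical: the name is invented, and
whether ccw really needs nothing here is exactly the open question):

static void virtio_ccw_synchronize_cbs(struct virtio_device *vdev)
{
	/*
	 * Intentionally empty.  Document here why callbacks for this
	 * transport cannot race with the caller, or implement a real
	 * synchronization if they can.
	 */
}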


> > 
> > >  
> > >> ---
> > >>  drivers/virtio/virtio_pci_common.c | 14 ++
> > >>  drivers/virtio/virtio_pci_common.h |  2 ++
> > >>  drivers/virtio/virtio_pci_legacy.c |  1 +
> > >>  drivers/virtio/virtio_pci_modern.c |  2 ++
> > >>  4 files changed, 19 insertions(+)
> > >> 
> > >> diff --git a/drivers/virtio/virtio_pci_common.c 
> > >> b/drivers/virtio/virtio_pci_common.c
> > >> index d724f676608b..b78c8bc93a97 100644
> > >> --- a/drivers/virtio/virtio_pci_common.c
> > >> +++ b/drivers/virtio/virtio_pci_common.c
> > >> @@ -37,6 +37,20 @@ void vp_synchronize_vectors(struct virtio_device 
> > >> *vdev)
> > >>  synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > >>  }
> > >>  
> > >> +void vp_synchronize_vqs(struct virtio_device *vdev)
> > >> +{
> > >> +struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > >> +int i;
> > >> +
> > >> +if (vp_dev->intx_enabled) {
> > >> +synchronize_irq(vp_dev->pci_dev->irq);
> > >> +return;
> > >> +}
> > >> +
> > >> +for (i = 0; i < vp_dev->msix_vectors; ++i)
> > >> +synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > >> +}
> > >> +  
> > 
> > ...given that this seems to synchronize threaded interrupt handlers?
> > Halil, do you think ccw needs to do anything? (AFAICS, we only have one
> > 'irq' for channel devices anyway, and the handler just calls the
> > relevant callbacks directly.)
> 
> Sorry I don't understand enough yet. A more verbose documentation on
> "virtio_synchronize_vqs - synchronize with virtqueue callbacks" would
> surely benefit me. It may be more than enough for a back-belt but it
> ain't enough for me to tell what is the callback supposed to accomplish.
> 
> I will have to study this discussion and the code more thoroughly.
> Tentatively I side with Jason and Michael in a sense, that I don't
> believe virtio-ccw is safe against rough interrupts.
> 
> Sorry for the late response. I intend to revisit this on Monday. If
> I don't please feel encouraged to ping.
> 
> Regards,
> Halil

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 4/5] virtio-pci: implement synchronize_vqs()

2022-04-06 Thread Michael S. Tsirkin
On Wed, Apr 06, 2022 at 03:04:32PM +0200, Cornelia Huck wrote:
> On Wed, Apr 06 2022, "Michael S. Tsirkin"  wrote:
> 
> > On Wed, Apr 06, 2022 at 04:35:37PM +0800, Jason Wang wrote:
> >> This patch implements PCI version of synchronize_vqs().
> >> 
> >> Cc: Thomas Gleixner 
> >> Cc: Peter Zijlstra 
> >> Cc: "Paul E. McKenney" 
> >> Cc: Marc Zyngier 
> >> Signed-off-by: Jason Wang 
> >
> > Please add implementations at least for ccw and mmio.
> 
> I'm not sure what (if anything) can/should be done for ccw...
> 
> >
> >> ---
> >>  drivers/virtio/virtio_pci_common.c | 14 ++
> >>  drivers/virtio/virtio_pci_common.h |  2 ++
> >>  drivers/virtio/virtio_pci_legacy.c |  1 +
> >>  drivers/virtio/virtio_pci_modern.c |  2 ++
> >>  4 files changed, 19 insertions(+)
> >> 
> >> diff --git a/drivers/virtio/virtio_pci_common.c 
> >> b/drivers/virtio/virtio_pci_common.c
> >> index d724f676608b..b78c8bc93a97 100644
> >> --- a/drivers/virtio/virtio_pci_common.c
> >> +++ b/drivers/virtio/virtio_pci_common.c
> >> @@ -37,6 +37,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> >>synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> >>  }
> >>  
> >> +void vp_synchronize_vqs(struct virtio_device *vdev)
> >> +{
> >> +  struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >> +  int i;
> >> +
> >> +  if (vp_dev->intx_enabled) {
> >> +  synchronize_irq(vp_dev->pci_dev->irq);
> >> +  return;
> >> +  }
> >> +
> >> +  for (i = 0; i < vp_dev->msix_vectors; ++i)
> >> +  synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> >> +}
> >> +
> 
> ...given that this seems to synchronize threaded interrupt handlers?

No, any handlers at all. The point is to make sure any memory changes
made prior to this op are visible to callbacks.

Jason, maybe add that to the documentation?
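e.g. something along these lines (the wording is only a suggestion):

/**
 * virtio_synchronize_vqs - synchronize with virtqueue callbacks
 * @dev: the virtio device
 *
 * Waits until all in-flight virtqueue callbacks have finished.  After this
 * returns, any memory write done before the call is guaranteed to be
 * visible to callbacks that run afterwards, regardless of how the
 * transport delivers them (MSI-X, INTx, or something else entirely).
 */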

> Halil, do you think ccw needs to do anything? (AFAICS, we only have one
> 'irq' for channel devices anyway, and the handler just calls the
> relevant callbacks directly.)

Then you need to synchronize with that.

> >>  /* the notify function used when creating a virt queue */
> >>  bool vp_notify(struct virtqueue *vq)
> >>  {

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 5/5] virtio: harden vring IRQ

2022-04-06 Thread Michael S. Tsirkin
On Wed, Apr 06, 2022 at 04:35:38PM +0800, Jason Wang wrote:
> This is a rework of the previous IRQ hardening that was done for
> virtio-pci, where several drawbacks were found and the commits were reverted:
> 
> 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
>that is used by some device such as virtio-blk
> 2) done only for PCI transport
> 
> In this patch, we try to borrow the idea from the INTX IRQ hardening
> in the reverted commit 080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
> by introducing a global device_ready variable for each
> virtio_device. Then we can toggle it during
> virtio_reset_device()/virtio_device_ready(). A
> virtio_synchronize_vqs() is used in both virtio_device_ready() and
> virtio_reset_device() to synchronize with the vring callbacks. With
> this, vring_interrupt() can check and return early if driver_ready is
> false.
> 
> Note that the hardening is only done for vring interrupt since the
> config interrupt hardening is already done in commit 22b7050a024d7
> ("virtio: defer config changed notifications"). But the method that is
> used by config interrupt can't be reused by the vring interrupt
> handler because it uses spinlock to do the synchronization which is
> expensive.
> 
> Cc: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: "Paul E. McKenney" 
> Cc: Marc Zyngier 
> Signed-off-by: Jason Wang 
> ---
>  drivers/virtio/virtio.c   | 11 +++
>  drivers/virtio/virtio_ring.c  |  9 -
>  include/linux/virtio.h|  2 ++
>  include/linux/virtio_config.h |  8 
>  4 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 8dde44ea044a..2f3a6f8e3d9c 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -220,6 +220,17 @@ static int virtio_features_ok(struct virtio_device *dev)
>   * */
>  void virtio_reset_device(struct virtio_device *dev)
>  {
> + if (READ_ONCE(dev->driver_ready)) {
> + /*
> +  * The below virtio_synchronize_vqs() guarantees that any
> +  * interrupt for this line arriving after
> +  * virtio_synchronize_vqs() has completed is guaranteed to see
> +  * driver_ready == false.
> +  */
> + WRITE_ONCE(dev->driver_ready, false);
> + virtio_synchronize_vqs(dev);
> + }
> +
>   dev->config->reset(dev);
>  }
>  EXPORT_SYMBOL_GPL(virtio_reset_device);
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index cfb028ca238e..a4592e55c9f8 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2127,10 +2127,17 @@ static inline bool more_used(const struct 
> vring_virtqueue *vq)
>   return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
>  }
>  
> -irqreturn_t vring_interrupt(int irq, void *_vq)
> +irqreturn_t vring_interrupt(int irq, void *v)
>  {
> + struct virtqueue *_vq = v;
> + struct virtio_device *vdev = _vq->vdev;
>   struct vring_virtqueue *vq = to_vvq(_vq);
>  
> + if (!READ_ONCE(vdev->driver_ready)) {


I am not sure why we need READ_ONCE here, it's done under lock.


Accordingly, same thing above for READ_ONCE and WRITE_ONCE.


> + dev_warn_once(>dev, "virtio vring IRQ raised before 
> DRIVER_OK");
> + return IRQ_NONE;
> + }
> +
>   if (!more_used(vq)) {
>   pr_debug("virtqueue interrupt with no work for %p\n", vq);
>   return IRQ_NONE;
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 5464f398912a..dfa2638a293e 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -95,6 +95,7 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq);
>   * @failed: saved value for VIRTIO_CONFIG_S_FAILED bit (for restore)
>   * @config_enabled: configuration change reporting enabled
>   * @config_change_pending: configuration change reported while disabled
> + * @driver_ready: whehter the driver is ready (e.g for vring callbacks)
>   * @config_lock: protects configuration change reporting
>   * @dev: underlying device.
>   * @id: the device type identification (used to match it with a driver).
> @@ -109,6 +110,7 @@ struct virtio_device {
>   bool failed;
>   bool config_enabled;
>   bool config_change_pending;
> + bool driver_ready;
>   spinlock_t config_lock;
>   spinlock_t vqs_list_lock; /* Protects VQs list access */
>   struct device dev;
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 08b73d9bbff2..c9e207bf2c9c 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -246,6 +246,14 @@ void virtio_device_ready(struct virtio_device *dev)
>  {
>   unsigned status = dev->config->get_status(dev);
>  
> + virtio_synchronize_vqs(dev);
> +/*
> + * The above virtio_synchronize_vqs() make sure


makes sure

> + * vring_interrupt() will 

Re: [PATCH V2 4/5] virtio-pci: implement synchronize_vqs()

2022-04-06 Thread Michael S. Tsirkin
On Wed, Apr 06, 2022 at 04:35:37PM +0800, Jason Wang wrote:
> This patch implements PCI version of synchronize_vqs().
> 
> Cc: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: "Paul E. McKenney" 
> Cc: Marc Zyngier 
> Signed-off-by: Jason Wang 

Please add implementations at least for ccw and mmio.

> ---
>  drivers/virtio/virtio_pci_common.c | 14 ++
>  drivers/virtio/virtio_pci_common.h |  2 ++
>  drivers/virtio/virtio_pci_legacy.c |  1 +
>  drivers/virtio/virtio_pci_modern.c |  2 ++
>  4 files changed, 19 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c 
> b/drivers/virtio/virtio_pci_common.c
> index d724f676608b..b78c8bc93a97 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -37,6 +37,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
>   synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
>  }
>  
> +void vp_synchronize_vqs(struct virtio_device *vdev)
> +{
> + struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> + int i;
> +
> + if (vp_dev->intx_enabled) {
> + synchronize_irq(vp_dev->pci_dev->irq);
> + return;
> + }
> +
> + for (i = 0; i < vp_dev->msix_vectors; ++i)
> + synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> +}
> +
>  /* the notify function used when creating a virt queue */
>  bool vp_notify(struct virtqueue *vq)
>  {
> diff --git a/drivers/virtio/virtio_pci_common.h 
> b/drivers/virtio/virtio_pci_common.h
> index eb17a29fc7ef..2b84d5c1b5bc 100644
> --- a/drivers/virtio/virtio_pci_common.h
> +++ b/drivers/virtio/virtio_pci_common.h
> @@ -105,6 +105,8 @@ static struct virtio_pci_device *to_vp_device(struct 
> virtio_device *vdev)
>  void vp_synchronize_vectors(struct virtio_device *vdev);
>  /* the notify function used when creating a virt queue */
>  bool vp_notify(struct virtqueue *vq);
> +/* synchronize with callbacks */
> +void vp_synchronize_vqs(struct virtio_device *vdev);
>  /* the config->del_vqs() implementation */
>  void vp_del_vqs(struct virtio_device *vdev);
>  /* the config->find_vqs() implementation */
> diff --git a/drivers/virtio/virtio_pci_legacy.c 
> b/drivers/virtio/virtio_pci_legacy.c
> index 6f4e34ce96b8..5a9e62320edc 100644
> --- a/drivers/virtio/virtio_pci_legacy.c
> +++ b/drivers/virtio/virtio_pci_legacy.c
> @@ -192,6 +192,7 @@ static const struct virtio_config_ops 
> virtio_pci_config_ops = {
>   .reset  = vp_reset,
>   .find_vqs   = vp_find_vqs,
>   .del_vqs= vp_del_vqs,
> + .synchronize_vqs = vp_synchronize_vqs,
>   .get_features   = vp_get_features,
>   .finalize_features = vp_finalize_features,
>   .bus_name   = vp_bus_name,
> diff --git a/drivers/virtio/virtio_pci_modern.c 
> b/drivers/virtio/virtio_pci_modern.c
> index a2671a20ef77..584850389855 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -394,6 +394,7 @@ static const struct virtio_config_ops 
> virtio_pci_config_nodev_ops = {
>   .reset  = vp_reset,
>   .find_vqs   = vp_modern_find_vqs,
>   .del_vqs= vp_del_vqs,
> + .synchronize_vqs = vp_synchronize_vqs,
>   .get_features   = vp_get_features,
>   .finalize_features = vp_finalize_features,
>   .bus_name   = vp_bus_name,
> @@ -411,6 +412,7 @@ static const struct virtio_config_ops 
> virtio_pci_config_ops = {
>   .reset  = vp_reset,
>   .find_vqs   = vp_modern_find_vqs,
>   .del_vqs= vp_del_vqs,
> + .synchronize_vqs = vp_synchronize_vqs,
>   .get_features   = vp_get_features,
>   .finalize_features = vp_finalize_features,
>   .bus_name   = vp_bus_name,
> -- 
> 2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 3/5] virtio: introduce config op to synchronize vring callbacks

2022-04-06 Thread Michael S. Tsirkin
On Wed, Apr 06, 2022 at 04:35:36PM +0800, Jason Wang wrote:
> This patch introduce

introduces

> a new

new

> virtio config ops to vring
> callbacks. Transport specific method is required to call
> synchornize_irq() on the IRQs. For the transport that doesn't provide
> synchronize_vqs(), use synchornize_rcu() as a fallback.
> 
> Cc: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: "Paul E. McKenney" 
> Cc: Marc Zyngier 
> Signed-off-by: Jason Wang 
> ---
>  include/linux/virtio_config.h | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index b341dd62aa4d..08b73d9bbff2 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -57,6 +57,8 @@ struct virtio_shm_region {
>   *   include a NULL entry for vqs unused by driver
>   *   Returns 0 on success or error status
>   * @del_vqs: free virtqueues found by find_vqs().
> + * @synchronize_vqs: synchronize with the virtqueue callbacks.
> + *   vdev: the virtio_device

I think I prefer synchronize_callbacks

>   * @get_features: get the array of feature bits for this device.
>   *   vdev: the virtio_device
>   *   Returns the first 64 feature bits (all we currently need).
> @@ -89,6 +91,7 @@ struct virtio_config_ops {
>   const char * const names[], const bool *ctx,
>   struct irq_affinity *desc);
>   void (*del_vqs)(struct virtio_device *);
> + void (*synchronize_vqs)(struct virtio_device *);
>   u64 (*get_features)(struct virtio_device *vdev);
>   int (*finalize_features)(struct virtio_device *vdev);
>   const char *(*bus_name)(struct virtio_device *vdev);
> @@ -217,6 +220,19 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev, 
> unsigned nvqs,
> desc);
>  }
>  
> +/**
> + * virtio_synchronize_vqs - synchronize with virtqueue callbacks
> + * @vdev: the device
> + */
> +static inline
> +void virtio_synchronize_vqs(struct virtio_device *dev)
> +{
> + if (dev->config->synchronize_vqs)
> + dev->config->synchronize_vqs(dev);
> + else
> + synchronize_rcu();

I am not sure about this fallback and the latency impact.
Maybe synchronize_rcu_expedited is better here.
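i.e. (sketch):

	if (dev->config->synchronize_vqs)
		dev->config->synchronize_vqs(dev);
	else
		synchronize_rcu_expedited();

The expedited variant trades a few IPIs for a much shorter grace period,
which seems acceptable on a slow path like device reset/ready.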

> +}
> +
>  /**
>   * virtio_device_ready - enable vq use in probe function
>   * @vdev: the device
> -- 
> 2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 2/5] virtio: use virtio_reset_device() when possible

2022-04-06 Thread Michael S. Tsirkin
On Wed, Apr 06, 2022 at 04:35:35PM +0800, Jason Wang wrote:
> This allows us to do common extension without duplicating codes.

codes -> code

> Cc: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: "Paul E. McKenney" 
> Cc: Marc Zyngier 
> Signed-off-by: Jason Wang 
> ---
>  drivers/virtio/virtio.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 75c8d560bbd3..8dde44ea044a 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -430,7 +430,7 @@ int register_virtio_device(struct virtio_device *dev)
>  
>   /* We always start by resetting the device, in case a previous
>* driver messed it up.  This also tests that code path a little. */
> - dev->config->reset(dev);
> + virtio_reset_device(dev);
>  
>   /* Acknowledge that we've seen the device. */
>   virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> @@ -496,7 +496,7 @@ int virtio_device_restore(struct virtio_device *dev)
>  
>   /* We always start by resetting the device, in case a previous
>* driver messed it up. */
> - dev->config->reset(dev);
> + virtio_reset_device(dev);
>  
>   /* Acknowledge that we've seen the device. */
>   virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> -- 
> 2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 1/5] virtio: use virtio_device_ready() in virtio_device_restore()

2022-04-06 Thread Michael S. Tsirkin
Patch had the wrong mime type. I managed to extract it but pls fix.

> 
> 
> From: Stefano Garzarella 
> 
> It will allows us

will allow us

> to do extension on virtio_device_ready() without
> duplicating codes.

code

> 
> Cc: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: "Paul E. McKenney" 
> Cc: Marc Zyngier 
> Signed-off-by: Stefano Garzarella 
> Signed-off-by: Jason Wang 
> ---
>  drivers/virtio/virtio.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 22f15f444f75..75c8d560bbd3 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -526,8 +526,9 @@ int virtio_device_restore(struct virtio_device *dev)
>   goto err;
>   }
>  
> - /* Finally, tell the device we're all set */
> - virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> + /* If restore didn't do it, mark device DRIVER_OK ourselves. */
> + if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
> + virtio_device_ready(dev);
>  
>   virtio_config_enable(dev);

it's unfortunate that this adds an extra vmexit since virtio_device_ready
calls get_status too.

We now have:

static inline
void virtio_device_ready(struct virtio_device *dev)
{
unsigned status = dev->config->get_status(dev);

BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
}


I propose adding a helper and putting common code there.
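One possible shape (a sketch only; the helper name is made up):

static inline
void virtio_device_ready_status(struct virtio_device *dev, unsigned status)
{
	BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
	dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
}

static inline
void virtio_device_ready(struct virtio_device *dev)
{
	virtio_device_ready_status(dev, dev->config->get_status(dev));
}

The restore path could then read the status once and reuse it, avoiding the
second get_status() exit.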

>  
> -- 
> 2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 0/5] rework on the IRQ hardening of virtio

2022-04-06 Thread Michael S. Tsirkin
On Wed, Apr 06, 2022 at 04:35:33PM +0800, Jason Wang wrote:
> Hi All:
> 
> This is a rework of the IRQ hardening for virtio that was done
> previously by the following commits, which have been reverted:
> 
> 9e35276a5344 ("virtio_pci: harden MSI-X interrupts")
> 080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
> 
> The reason is that it depends on the IRQF_NO_AUTOEN which may conflict
> with the assumption of the affinity managed IRQ that is used by some
> virtio drivers. And what's more, it is only done for virtio-pci but
> not other transports.
> 
> In this rework, I try to implement a general virtio solution which
> borrows the idea of the INTX hardening by introducing a boolean for
> virtqueue callback enabling and toggling it in virtio_device_ready()
> and virtio_reset_device(). Then vring_interrupt() can simply check and
> return early if the driver is not ready.


All of a sudden all patches have the wrong mime type.

It is application/octet-stream; should be text/plain

Pls fix and repost, thanks!

> Please review.
> 
> Changes since v1:
> 
> - Use transport specific irq synchronization method when possible
> - Drop the module parameter and enable the hardening unconditionally
> - Tweak the barrier/ordering facilities used in the code
> - Rename irq_soft_enabled to driver_ready
> - Avoid unnecessary IRQ synchronization (e.g. during boot)
> 
> Jason Wang (4):
>   virtio: use virtio_reset_device() when possible
>   virtio: introduce config op to synchronize vring callbacks
>   virtio-pci: implement synchronize_vqs()
>   virtio: harden vring IRQ
> 
> Stefano Garzarella (1):
>   virtio: use virtio_device_ready() in virtio_device_restore()
> 
>  drivers/virtio/virtio.c| 20 
>  drivers/virtio/virtio_pci_common.c | 14 ++
>  drivers/virtio/virtio_pci_common.h |  2 ++
>  drivers/virtio/virtio_pci_legacy.c |  1 +
>  drivers/virtio/virtio_pci_modern.c |  2 ++
>  drivers/virtio/virtio_ring.c   |  9 -
>  include/linux/virtio.h |  2 ++
>  include/linux/virtio_config.h  | 24 
>  8 files changed, 69 insertions(+), 5 deletions(-)
> 
> -- 
> 2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 0/4] Introduce akcipher service for virtio-crypto

2022-04-06 Thread Michael S. Tsirkin
On Tue, Apr 05, 2022 at 10:33:42AM +0200, Cornelia Huck wrote:
> On Tue, Apr 05 2022, "Michael S. Tsirkin"  wrote:
> 
> > On Mon, Apr 04, 2022 at 05:39:24PM +0200, Cornelia Huck wrote:
> >> On Mon, Mar 07 2022, "Michael S. Tsirkin"  wrote:
> >> 
> >> > On Mon, Mar 07, 2022 at 10:42:30AM +0800, zhenwei pi wrote:
> >> >> Hi, Michael & Lei
> >> >> 
> >> >> The full patchset has been reviewed by Gonglei, thanks to Gonglei.
> >> >> Should I modify the virtio crypto specification(use "__le32 
> >> >> akcipher_algo;"
> >> >> instead of "__le32 reserve;" only, see v1->v2 change), and start a new 
> >> >> issue
> >> >> for a revoting procedure?
> >> >
> >> > You can but not it probably will be deferred to 1.3. OK with you?
> >> >
> >> >> Also cc Cornelia Huck.
> >> 
> >> [Apologies, I'm horribly behind on my email backlog, and on virtio
> >> things in general :(]
> >> 
> >> The akcipher update had been deferred for 1.2, so I think it will be 1.3
> >> material. However, I just noticed while browsing the fine lwn.net merge
> >> window summary that this seems to have been merged already. That
> >> situation is less than ideal, although I don't expect any really bad
> >> problems, given that there had not been any negative feedback for the
> >> spec proposal that I remember.
> >
> > Let's open a 1.3 branch? What do you think?
> 
> Yes, that's probably best, before things start piling up.

OK, want to do it? And we can then start voting on 1.3 things
straight away.

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 8/8] virtio_ring.h: do not include from exported header

2022-04-05 Thread Michael S. Tsirkin
On Tue, Apr 05, 2022 at 08:29:36AM +0200, Arnd Bergmann wrote:
> On Tue, Apr 5, 2022 at 7:35 AM Christoph Hellwig  wrote:
> >
> > On Mon, Apr 04, 2022 at 10:04:02AM +0200, Arnd Bergmann wrote:
> > > The header is shared between kernel and other projects using virtio, such 
> > > as
> > > qemu and any boot loaders booting from virtio devices. It's not 
> > > technically a
> > > /kernel/ ABI, but it is an ABI and for practical reasons the kernel 
> > > version is
> > > maintained as the master copy if I understand it correctly.
> >
> > Besides that fact that as you correctly states these are not a UAPI at
> > all, qemu and bootloades are not specific to Linux and can't require a
> > specific kernel version.  So the same thing we do for file system
> > formats or network protocols applies here:  just copy the damn header.
> > And as stated above any reasonably portable userspace needs to have a
> > copy anyway.
> 
> I think the users all have their own copies, at least the ones I could
> find on codesearch.debian.org.

kvmtool does not seem to have its own copy, just grep vring_init.

> However, there are 27 virtio_*.h
> files in include/uapi/linux that probably should stay together for
> the purpose of defining the virtio protocol, and some others might
> be uapi relevant.
> 
> I see that at least include/uapi/linux/vhost.h has ioctl() definitions
> in it, and includes the virtio_ring.h header indirectly.
> 
> Adding the virtio maintainers to Cc to see if they can provide
> more background on this.
> 
> > If it is just as a "master copy" it can live in drivers/virtio/, just
> > like we do for other formats.
> 
> It has to be in include/linux/ at least because it's used by a number
> of drivers outside of drivers/virtio/.
> 
> Arnd
> ___
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 8/8] virtio_ring.h: do not include from exported header

2022-04-05 Thread Michael S. Tsirkin
On Tue, Apr 05, 2022 at 12:01:35AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 05, 2022 at 08:29:36AM +0200, Arnd Bergmann wrote:
> > I think the users all have their own copies, at least the ones I could
> > find on codesearch.debian.org. However, there are 27 virtio_*.h
> > files in include/uapi/linux that probably should stay together for
> > the purpose of defining the virtio protocol, and some others might
> > be uapi relevant.
> > 
> > I see that at least include/uapi/linux/vhost.h has ioctl() definitions
> > in it, and includes the virtio_ring.h header indirectly.
> 
> Uhh.  We had a similar mess (but at a smaller scale) in nvme, where
> the uapi nvme.h contained both the UAPI and the protocol definition.
> We took a hard break to only have a nvme_ioctl.h in the uapi header
> and linux/nvme.h for the protocol.  This did break a bit of userspace
> compilation (but not running obviously) at the time, but really made
> the headers much easier to main.  Some userspace keeps on copying
> nvme.h with the protocol definitions.

So far we are quite happy with the status quo, I don't see any issues
maintaining the headers. And yes, through vhost and vringh they are part
of UAPI.

Yes, users have their own copies but they sync with the kernel.

That's the general picture. Specifically, the vring_init thing is a legacy
helper used by kvmtool and maybe others, and it inits the ring in the way
that vring/virtio expect.  It has been there since day 1 and we are careful not
to add more stuff like that, so I don't see a lot of gain from incurring
this pain for users.
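For reference, the helper in question is tiny; roughly this (paraphrased
from include/uapi/linux/virtio_ring.h, so treat details as approximate):

static inline void vring_init(struct vring *vr, unsigned int num, void *p,
			      unsigned long align)
{
	vr->num = num;
	vr->desc = p;
	vr->avail = (struct vring_avail *)((char *)p +
					   num * sizeof(struct vring_desc));
	vr->used = (void *)(((uintptr_t)&vr->avail->ring[num] +
			     sizeof(__virtio16) + align - 1) & ~(align - 1));
}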

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 0/4] Introduce akcipher service for virtio-crypto

2022-04-04 Thread Michael S. Tsirkin
On Mon, Apr 04, 2022 at 05:39:24PM +0200, Cornelia Huck wrote:
> On Mon, Mar 07 2022, "Michael S. Tsirkin"  wrote:
> 
> > On Mon, Mar 07, 2022 at 10:42:30AM +0800, zhenwei pi wrote:
> >> Hi, Michael & Lei
> >> 
> >> The full patchset has been reviewed by Gonglei, thanks to Gonglei.
> >> Should I modify the virtio crypto specification(use "__le32 akcipher_algo;"
> >> instead of "__le32 reserve;" only, see v1->v2 change), and start a new 
> >> issue
> >> for a revoting procedure?
> >
> > You can but not it probably will be deferred to 1.3. OK with you?
> >
> >> Also cc Cornelia Huck.
> 
> [Apologies, I'm horribly behind on my email backlog, and on virtio
> things in general :(]
> 
> The akcipher update had been deferred for 1.2, so I think it will be 1.3
> material. However, I just noticed while browsing the fine lwn.net merge
> window summary that this seems to have been merged already. That
> situation is less than ideal, although I don't expect any really bad
> problems, given that there had not been any negative feedback for the
> spec proposal that I remember.

Let's open a 1.3 branch? What do you think?

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] virtio_ring: add unlikely annotation for free descs check

2022-04-04 Thread Michael S. Tsirkin
On Mon, Apr 04, 2022 at 11:11:16PM +0800, Xianting Tian wrote:
> I can't find it in next branch, will you apply this patch?

yes, thanks!

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RESEND V2 3/3] vdpa/mlx5: Use consistent RQT size

2022-04-04 Thread Michael S. Tsirkin
On Mon, Apr 04, 2022 at 11:07:36AM +, Eli Cohen wrote:
> > From: Michael S. Tsirkin 
> > Sent: Monday, April 4, 2022 1:35 PM
> > To: Jason Wang 
> > Cc: Eli Cohen ; hdan...@sina.com; 
> > virtualization@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> > Subject: Re: [PATCH RESEND V2 3/3] vdpa/mlx5: Use consistent RQT size
> > 
> > On Tue, Mar 29, 2022 at 12:21:09PM +0800, Jason Wang wrote:
> > > From: Eli Cohen 
> > >
> > > The current code evaluates RQT size based on the configured number of
> > > virtqueues. This can raise an issue in the following scenario:
> > >
> > > Assume MQ was negotiated.
> > > 1. mlx5_vdpa_set_map() gets called.
> > > 2. handle_ctrl_mq() is called setting cur_num_vqs to some value, lower
> > >than the configured max VQs.
> > > 3. A second set_map gets called, but now a smaller number of VQs is used
> > >to evaluate the size of the RQT.
> > > 4. handle_ctrl_mq() is called with a value larger than what the RQT can
> > >hold. This will emit errors and the driver state is compromised.
> > >
> > > To fix this, we use a new field in struct mlx5_vdpa_net to hold the
> > > required number of entries in the RQT. This value is evaluated in
> > > mlx5_vdpa_set_driver_features() where we have the negotiated features
> > > all set up.
> > >
> > > In addtion
> > 
> > addition?
> 
> Do you need me to send another version?

It's a bit easier that way but I can handle it manually too.

> If so, let's wait for Jason's reply.

Right.

> > 
> > > to that, we take into consideration the max capability of RQT
> > > entries early when the device is added so we don't need to take consider
> > > it when creating the RQT.
> > >
> > > Last, we remove the use of mlx5_vdpa_max_qps() which just returns the
> > > max_vas / 2 and make the code clearer.
> > >
> > > Fixes: 52893733f2c5 ("vdpa/mlx5: Add multiqueue support")
> > > Signed-off-by: Eli Cohen 
> > 
> > Jason I don't have your ack or S.O.B on this one.
> > 
> > 
> > > ---
> > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 61 +++
> > >  1 file changed, 21 insertions(+), 40 deletions(-)
> > >
> > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > index 53b8c1a68f90..61bec1ed0bc9 100644
> > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > @@ -161,6 +161,7 @@ struct mlx5_vdpa_net {
> > >   struct mlx5_flow_handle *rx_rule_mcast;
> > >   bool setup;
> > >   u32 cur_num_vqs;
> > > + u32 rqt_size;
> > >   struct notifier_block nb;
> > >   struct vdpa_callback config_cb;
> > >   struct mlx5_vdpa_wq_ent cvq_ent;
> > > @@ -204,17 +205,12 @@ static __virtio16 cpu_to_mlx5vdpa16(struct 
> > > mlx5_vdpa_dev *mvdev, u16 val)
> > >   return __cpu_to_virtio16(mlx5_vdpa_is_little_endian(mvdev), val);
> > >  }
> > >
> > > -static inline u32 mlx5_vdpa_max_qps(int max_vqs)
> > > -{
> > > - return max_vqs / 2;
> > > -}
> > > -
> > >  static u16 ctrl_vq_idx(struct mlx5_vdpa_dev *mvdev)
> > >  {
> > >   if (!(mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_MQ)))
> > >   return 2;
> > >
> > > - return 2 * mlx5_vdpa_max_qps(mvdev->max_vqs);
> > > + return mvdev->max_vqs;
> > >  }
> > >
> > >  static bool is_ctrl_vq_idx(struct mlx5_vdpa_dev *mvdev, u16 idx)
> > > @@ -1236,25 +1232,13 @@ static void teardown_vq(struct mlx5_vdpa_net 
> > > *ndev, struct mlx5_vdpa_virtqueue *
> > >  static int create_rqt(struct mlx5_vdpa_net *ndev)
> > >  {
> > >   __be32 *list;
> > > - int max_rqt;
> > >   void *rqtc;
> > >   int inlen;
> > >   void *in;
> > >   int i, j;
> > >   int err;
> > > - int num;
> > > -
> > > - if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_MQ)))
> > > - num = 1;
> > > - else
> > > - num = ndev->cur_num_vqs / 2;
> > >
> > > - max_rqt = min_t(int, roundup_pow_of_two(num),
> > > - 1 << MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size));
> > > - if (max_rqt < 1)
> > > - return -EOPNOTSUPP;
> > > -
> > > - inlen = MLX5_ST_SZ_BYTES(create_rqt_in

Re: [PATCH RESEND V2 3/3] vdpa/mlx5: Use consistent RQT size

2022-04-04 Thread Michael S. Tsirkin
On Tue, Mar 29, 2022 at 12:21:09PM +0800, Jason Wang wrote:
> From: Eli Cohen 
> 
> The current code evaluates RQT size based on the configured number of
> virtqueues. This can raise an issue in the following scenario:
> 
> Assume MQ was negotiated.
> 1. mlx5_vdpa_set_map() gets called.
> 2. handle_ctrl_mq() is called setting cur_num_vqs to some value, lower
>than the configured max VQs.
> 3. A second set_map gets called, but now a smaller number of VQs is used
>to evaluate the size of the RQT.
> 4. handle_ctrl_mq() is called with a value larger than what the RQT can
>hold. This will emit errors and the driver state is compromised.
> 
> To fix this, we use a new field in struct mlx5_vdpa_net to hold the
> required number of entries in the RQT. This value is evaluated in
> mlx5_vdpa_set_driver_features() where we have the negotiated features
> all set up.
> 
> In addtion

addition?

> to that, we take into consideration the max capability of RQT
> entries early when the device is added so we don't need to take consider
> it when creating the RQT.
> 
> Last, we remove the use of mlx5_vdpa_max_qps() which just returns the
> max_vas / 2 and make the code clearer.
> 
> Fixes: 52893733f2c5 ("vdpa/mlx5: Add multiqueue support")
> Signed-off-by: Eli Cohen 

Jason I don't have your ack or S.O.B on this one.


> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 61 +++
>  1 file changed, 21 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 53b8c1a68f90..61bec1ed0bc9 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -161,6 +161,7 @@ struct mlx5_vdpa_net {
>   struct mlx5_flow_handle *rx_rule_mcast;
>   bool setup;
>   u32 cur_num_vqs;
> + u32 rqt_size;
>   struct notifier_block nb;
>   struct vdpa_callback config_cb;
>   struct mlx5_vdpa_wq_ent cvq_ent;
> @@ -204,17 +205,12 @@ static __virtio16 cpu_to_mlx5vdpa16(struct 
> mlx5_vdpa_dev *mvdev, u16 val)
>   return __cpu_to_virtio16(mlx5_vdpa_is_little_endian(mvdev), val);
>  }
>  
> -static inline u32 mlx5_vdpa_max_qps(int max_vqs)
> -{
> - return max_vqs / 2;
> -}
> -
>  static u16 ctrl_vq_idx(struct mlx5_vdpa_dev *mvdev)
>  {
>   if (!(mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_MQ)))
>   return 2;
>  
> - return 2 * mlx5_vdpa_max_qps(mvdev->max_vqs);
> + return mvdev->max_vqs;
>  }
>  
>  static bool is_ctrl_vq_idx(struct mlx5_vdpa_dev *mvdev, u16 idx)
> @@ -1236,25 +1232,13 @@ static void teardown_vq(struct mlx5_vdpa_net *ndev, 
> struct mlx5_vdpa_virtqueue *
>  static int create_rqt(struct mlx5_vdpa_net *ndev)
>  {
>   __be32 *list;
> - int max_rqt;
>   void *rqtc;
>   int inlen;
>   void *in;
>   int i, j;
>   int err;
> - int num;
> -
> - if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_MQ)))
> - num = 1;
> - else
> - num = ndev->cur_num_vqs / 2;
>  
> - max_rqt = min_t(int, roundup_pow_of_two(num),
> - 1 << MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size));
> - if (max_rqt < 1)
> - return -EOPNOTSUPP;
> -
> - inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + max_rqt * 
> MLX5_ST_SZ_BYTES(rq_num);
> + inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + ndev->rqt_size * 
> MLX5_ST_SZ_BYTES(rq_num);
>   in = kzalloc(inlen, GFP_KERNEL);
>   if (!in)
>   return -ENOMEM;
> @@ -1263,12 +1247,12 @@ static int create_rqt(struct mlx5_vdpa_net *ndev)
>   rqtc = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
>  
>   MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q);
> - MLX5_SET(rqtc, rqtc, rqt_max_size, max_rqt);
> + MLX5_SET(rqtc, rqtc, rqt_max_size, ndev->rqt_size);
>   list = MLX5_ADDR_OF(rqtc, rqtc, rq_num[0]);
> - for (i = 0, j = 0; i < max_rqt; i++, j += 2)
> - list[i] = cpu_to_be32(ndev->vqs[j % (2 * num)].virtq_id);
> + for (i = 0, j = 0; i < ndev->rqt_size; i++, j += 2)
> + list[i] = cpu_to_be32(ndev->vqs[j % 
> ndev->cur_num_vqs].virtq_id);
>  
> - MLX5_SET(rqtc, rqtc, rqt_actual_size, max_rqt);
> + MLX5_SET(rqtc, rqtc, rqt_actual_size, ndev->rqt_size);
>   err = mlx5_vdpa_create_rqt(&ndev->mvdev, in, inlen, &ndev->res.rqtn);
>   kfree(in);
>   if (err)
> @@ -1282,19 +1266,13 @@ static int create_rqt(struct mlx5_vdpa_net *ndev)
>  static int modify_rqt(struct mlx5_vdpa_net *ndev, int num)
>  {
>   __be32 *list;
> - int max_rqt;
>   void *rqtc;
>   int inlen;
>   void *in;
>   int i, j;
>   int err;
>  
> - max_rqt = min_t(int, roundup_pow_of_two(ndev->cur_num_vqs / 2),
> - 1 << MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size));
> - if (max_rqt < 1)
> - return -EOPNOTSUPP;
> -
> - inlen = MLX5_ST_SZ_BYTES(modify_rqt_in) + max_rqt * 
> MLX5_ST_SZ_BYTES(rq_num);
> + inlen = 

[GIT PULL] virtio: fixes, cleanups

2022-04-04 Thread Michael S. Tsirkin
The following changes since commit ad6dc1daaf29f97f23cc810d60ee01c0e83f4c6b:

  vdpa/mlx5: Avoid processing works if workqueue was destroyed (2022-03-28 
16:54:30 -0400)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 1c80cf031e0204fde471558ee40183695773ce13:

  vdpa: mlx5: synchronize driver status with CVQ (2022-03-30 04:18:14 -0400)


virtio: fixes, cleanups

A couple of mlx5 fixes related to cvq
A couple of reverts dropping useless code (code that used it got reverted
earlier)

Signed-off-by: Michael S. Tsirkin 


Jason Wang (2):
  vdpa: mlx5: prevent cvq work from hogging CPU
  vdpa: mlx5: synchronize driver status with CVQ

Michael S. Tsirkin (2):
  Revert "virtio: use virtio_device_ready() in virtio_device_restore()"
  Revert "virtio_config: introduce a new .enable_cbs method"

 drivers/vdpa/mlx5/net/mlx5_vnet.c | 62 ++-
 drivers/virtio/virtio.c   |  5 ++--
 include/linux/virtio_config.h |  6 
 3 files changed, 43 insertions(+), 30 deletions(-)



[GIT PULL] virtio: features, fixes

2022-03-31 Thread Michael S. Tsirkin
The following changes since commit f443e374ae131c168a065ea1748feac6b2e76613:

  Linux 5.17 (2022-03-20 13:14:17 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to ad6dc1daaf29f97f23cc810d60ee01c0e83f4c6b:

  vdpa/mlx5: Avoid processing works if workqueue was destroyed (2022-03-28 
16:54:30 -0400)


virtio: features, fixes

vdpa generic device type support
More virtio hardening for broken devices
On the same theme, revert some virtio hotplug hardening patches -
they were misusing some interrupt flags, so they have to be reverted.
RSS support in virtio-net
max device MTU support in mlx5 vdpa
akcipher support in virtio-crypto
shared IRQ support in ifcvf vdpa
a minor performance improvement in vhost
Enable virtio mem for ARM64
beginnings of advance dma support

Cleanups, fixes all over the place.

Signed-off-by: Michael S. Tsirkin 


Andrew Melnychenko (4):
  drivers/net/virtio_net: Fixed padded vheader to use v1 with hash.
  drivers/net/virtio_net: Added basic RSS support.
  drivers/net/virtio_net: Added RSS hash report.
  drivers/net/virtio_net: Added RSS hash report control.

Anirudh Rayabharam (1):
  vhost: handle error while adding split ranges to iotlb

Eli Cohen (2):
  net/mlx5: Add support for configuring max device MTU
  vdpa/mlx5: Avoid processing works if workqueue was destroyed

Gautam Dawar (1):
  Add definition of VIRTIO_F_IN_ORDER feature bit

Gavin Shan (1):
  drivers/virtio: Enable virtio mem for ARM64

Jason Wang (2):
  Revert "virtio-pci: harden INTX interrupts"
  Revert "virtio_pci: harden MSI-X interrupts"

Keir Fraser (1):
  virtio: pci: check bar values read from virtio config space

Longpeng (3):
  vdpa: support exposing the config size to userspace
  vdpa: change the type of nvqs to u32
  vdpa: support exposing the count of vqs to userspace

Miaohe Lin (1):
  mm/balloon_compaction: make balloon page compaction callbacks static

Michael Qiu (1):
  vdpa/mlx5: re-create forwarding rules after mac modified

Michael S. Tsirkin (2):
  tools/virtio: fix after premapped buf support
  tools/virtio: compile with -pthread

Stefano Garzarella (2):
  vhost: cache avail index in vhost_enable_notify()
  virtio: use virtio_device_ready() in virtio_device_restore()

Xuan Zhuo (3):
  virtio_ring: rename vring_unmap_state_packed() to 
vring_unmap_extra_packed()
  virtio_ring: remove flags check for unmap split indirect desc
  virtio_ring: remove flags check for unmap packed indirect desc

Zhu Lingshan (5):
  vDPA/ifcvf: make use of virtio pci modern IO helpers in ifcvf
  vhost_vdpa: don't setup irq offloading when irq_num < 0
  vDPA/ifcvf: implement device MSIX vector allocator
  vDPA/ifcvf: implement shared IRQ feature
  vDPA/ifcvf: cacheline alignment for ifcvf_hw

zhenwei pi (4):
  virtio_crypto: Introduce VIRTIO_CRYPTO_NOSPC
  virtio-crypto: introduce akcipher service
  virtio-crypto: implement RSA algorithm
  virtio-crypto: rename skcipher algs

 drivers/crypto/virtio/Kconfig  |   3 +
 drivers/crypto/virtio/Makefile |   3 +-
 .../crypto/virtio/virtio_crypto_akcipher_algs.c| 585 +
 drivers/crypto/virtio/virtio_crypto_common.h   |   7 +-
 drivers/crypto/virtio/virtio_crypto_core.c |   6 +-
 drivers/crypto/virtio/virtio_crypto_mgr.c  |  17 +-
 ...crypto_algs.c => virtio_crypto_skcipher_algs.c} |   4 +-
 drivers/net/virtio_net.c   | 389 +-
 drivers/vdpa/ifcvf/ifcvf_base.c| 140 ++---
 drivers/vdpa/ifcvf/ifcvf_base.h|  24 +-
 drivers/vdpa/ifcvf/ifcvf_main.c| 323 ++--
 drivers/vdpa/mlx5/net/mlx5_vnet.c  |  84 ++-
 drivers/vdpa/vdpa.c|   6 +-
 drivers/vhost/iotlb.c  |   6 +-
 drivers/vhost/vdpa.c   |  45 +-
 drivers/vhost/vhost.c  |   3 +-
 drivers/virtio/Kconfig |   7 +-
 drivers/virtio/virtio.c|   5 +-
 drivers/virtio/virtio_pci_common.c |  48 +-
 drivers/virtio/virtio_pci_common.h |   7 +-
 drivers/virtio/virtio_pci_legacy.c |   5 +-
 drivers/virtio/virtio_pci_modern.c |  18 +-
 drivers/virtio/virtio_pci_modern_dev.c |   9 +-
 drivers/virtio/virtio_ring.c   |  53 +-
 include/linux/balloon_compaction.h |  22 -
 include/linux/vdpa.h   |   9 +-
 include/uapi/linux/vhost.h |   7 +
 include/

Re: [PATCH v2 0/9] virtio: support advance DMA

2022-03-30 Thread Michael S. Tsirkin
On Wed, Mar 30, 2022 at 06:51:53PM +0800, Xuan Zhuo wrote:
> On Wed, 30 Mar 2022 06:51:03 -0400, "Michael S. Tsirkin"  
> wrote:
> > On Wed, Mar 30, 2022 at 05:03:32PM +0800, Xuan Zhuo wrote:
> > > On Wed, 30 Mar 2022 16:38:18 +0800, Jason Wang  
> > > wrote:
> > > > On Wed, Mar 30, 2022 at 2:59 PM Xuan Zhuo  
> > > > wrote:
> > > > >
> > > > > On Wed, 30 Mar 2022 14:56:17 +0800, Jason Wang  
> > > > > wrote:
> > > > > > On Wed, Mar 30, 2022 at 2:34 PM Michael S. Tsirkin 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Thu, Feb 24, 2022 at 07:03:53PM +0800, Xuan Zhuo wrote:
> > > > > > > > virtqueue_add() only supports virtual addresses, dma is 
> > > > > > > > completed in
> > > > > > > > virtqueue_add().
> > > > > > > >
> > > > > > > > In some scenarios (such as the AF_XDP scenario), DMA is 
> > > > > > > > completed in advance, so
> > > > > > > > it is necessary for us to support passing the DMA address to 
> > > > > > > > virtqueue_add().
> > > > > > >
> > > > > > > I picked up a couple of patches. Others are waiting for some acks
> > > > > > > (Jason?) and improved commit logs for documentation.
> > > > > >
> > > > > > I will review them.
> > > > >
> > > > > hi, the core code of premapped, I will merge it into 'virtio pci 
> > > > > support
> > > > > VIRTIO_F_RING_RESET' because this function will be used when reusing 
> > > > > the buffer
> > > > > after resize.
> > > >
> > > > I still prefer not to do that.
> > > >
> > > > We can make rest work for resize first and add pre mapping on top. It
> > > > will simplify the review.
> > >
> > > Yes, I am also worried about the review problem, the number of my local 
> > > resize
> > > patch has reached 44 (including reuse bufs).
> > >
> > > hi, Michael, can we implement resize on top of v8 first? (drop unused 
> > > bufs directly)
> > >
> > > Then we implement premmapd and reuse the bufs after resize.
> > >
> > > We need to get the address (DMA address) and len from the reset ring and 
> > > submit
> > > it to the new vq through virtqueue_add(). So let virtqueue_add() support
> > > premapped first.
> > >
> > > Thanks.
> >
> > Not sure I understand.
> > So the plan is
> > - remap
> > - resize on top
> > ?
> 
> #1 resize with drop unused bufs directly
> #2 premapped and resize support reuse the unused bufs
> 
> This way "premaped" will have a user.
> 
> Between #1 and #2, I may submit some code with optimized formatting, because
> jason doesn't like my introduction of struct vring_split and struct 
> vring_packed
> in #1 for passing parameters between extracted functions.
> 
> Thanks.

So let's start with 1. Direct drop for now is ok.


> 
> >
> >
> >
> > >
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > > v2:
> > > > > > > > 1. rename predma -> premapped
> > > > > > > > 2. virtio net xdp tx use virtio dma api
> > > > > > > >
> > > > > > > > v1:
> > > > > > > >1. All sgs requested at one time are required to be unified 
> > > > > > > > PREDMA, and several
> > > > > > > >   of them are not supported to be PREDMA
> > > > > > > >2. virtio_dma_map() is removed from this patch set and will 
> > > > > > > > be submitted
> > > > > > > >   together with the next time AF_XDP supports virtio dma
> > > > > > > >3. Added patch #2 #3 to remove the check for flags when 
> > > > > > > > performing unmap
> > > > > > > >   indirect desc
> > > > > > > >
> > > > > > > > Xuan Zhuo (9):
> > > > > > > >   virtio_ring: rename vring_unmap_state_packed() to
> > > > > > > > vring_unmap_extra_packed()
> > > > > > > >   virtio_ring: remove flags check for unmap split indirect desc
> > > > > > > >   virtio_ring: remove flags check for unmap packed indirect desc
> > > > > > > >   virtio_ring: virtqueue_add() support premapped
> > > > > > > >   virtio_ring: split: virtqueue_add_split() support premapped
> > > > > > > >   virtio_ring: packed: virtqueue_add_packed() support premapped
> > > > > > > >   virtio_ring: add api virtio_dma_map() for advance dma
> > > > > > > >   virtio_ring: introduce virtqueue_add_outbuf_premapped()
> > > > > > > >   virtio_net: xdp xmit use virtio dma api
> > > > > > > >
> > > > > > > >  drivers/net/virtio_net.c |  42 +-
> > > > > > > >  drivers/virtio/virtio_ring.c | 280 
> > > > > > > > ++-
> > > > > > > >  include/linux/virtio.h   |  12 ++
> > > > > > > >  3 files changed, 254 insertions(+), 80 deletions(-)
> > > > > > > >
> > > > > > > > --
> > > > > > > > 2.31.0
> > > > > > >
> > > > > >
> > > > >
> > > >
> >



Re: [PATCH v2 0/9] virtio: support advance DMA

2022-03-30 Thread Michael S. Tsirkin
On Wed, Mar 30, 2022 at 05:03:32PM +0800, Xuan Zhuo wrote:
> On Wed, 30 Mar 2022 16:38:18 +0800, Jason Wang  wrote:
> > On Wed, Mar 30, 2022 at 2:59 PM Xuan Zhuo  
> > wrote:
> > >
> > > On Wed, 30 Mar 2022 14:56:17 +0800, Jason Wang  
> > > wrote:
> > > > On Wed, Mar 30, 2022 at 2:34 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Thu, Feb 24, 2022 at 07:03:53PM +0800, Xuan Zhuo wrote:
> > > > > > virtqueue_add() only supports virtual addresses, dma is completed in
> > > > > > virtqueue_add().
> > > > > >
> > > > > > In some scenarios (such as the AF_XDP scenario), DMA is completed 
> > > > > > in advance, so
> > > > > > it is necessary for us to support passing the DMA address to 
> > > > > > virtqueue_add().
> > > > >
> > > > > I picked up a couple of patches. Others are waiting for some acks
> > > > > (Jason?) and improved commit logs for documentation.
> > > >
> > > > I will review them.
> > >
> > > hi, the core code of premapped, I will merge it into 'virtio pci support
> > > VIRTIO_F_RING_RESET' because this function will be used when reusing the 
> > > buffer
> > > after resize.
> >
> > I still prefer not to do that.
> >
> > We can make rest work for resize first and add pre mapping on top. It
> > will simplify the review.
> 
> Yes, I am also worried about the review problem, the number of my local resize
> patch has reached 44 (including reuse bufs).
> 
> hi, Michael, can we implement resize on top of v8 first? (drop unused bufs 
> directly)
> 
> Then we implement premmapd and reuse the bufs after resize.
> 
> We need to get the address (DMA address) and len from the reset ring and 
> submit
> it to the new vq through virtqueue_add(). So let virtqueue_add() support
> premapped first.
> 
> Thanks.

Not sure I understand.
So the plan is
- remap
- resize on top
?



> 
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks!
> > > > >
> > > > > > v2:
> > > > > > 1. rename predma -> premapped
> > > > > > 2. virtio net xdp tx use virtio dma api
> > > > > >
> > > > > > v1:
> > > > > >1. All sgs requested at one time are required to be unified 
> > > > > > PREDMA, and several
> > > > > >   of them are not supported to be PREDMA
> > > > > >2. virtio_dma_map() is removed from this patch set and will be 
> > > > > > submitted
> > > > > >   together with the next time AF_XDP supports virtio dma
> > > > > >3. Added patch #2 #3 to remove the check for flags when 
> > > > > > performing unmap
> > > > > >   indirect desc
> > > > > >
> > > > > > Xuan Zhuo (9):
> > > > > >   virtio_ring: rename vring_unmap_state_packed() to
> > > > > > vring_unmap_extra_packed()
> > > > > >   virtio_ring: remove flags check for unmap split indirect desc
> > > > > >   virtio_ring: remove flags check for unmap packed indirect desc
> > > > > >   virtio_ring: virtqueue_add() support premapped
> > > > > >   virtio_ring: split: virtqueue_add_split() support premapped
> > > > > >   virtio_ring: packed: virtqueue_add_packed() support premapped
> > > > > >   virtio_ring: add api virtio_dma_map() for advance dma
> > > > > >   virtio_ring: introduce virtqueue_add_outbuf_premapped()
> > > > > >   virtio_net: xdp xmit use virtio dma api
> > > > > >
> > > > > >  drivers/net/virtio_net.c |  42 +-
> > > > > >  drivers/virtio/virtio_ring.c | 280 
> > > > > > ++-
> > > > > >  include/linux/virtio.h   |  12 ++
> > > > > >  3 files changed, 254 insertions(+), 80 deletions(-)
> > > > > >
> > > > > > --
> > > > > > 2.31.0
> > > > >
> > > >
> > >
> >



Re: [PATCH v3] virtio: pci: check bar values read from virtio config space

2022-03-30 Thread Michael S. Tsirkin
On Wed, Mar 30, 2022 at 10:43:40AM +0800, Jason Wang wrote:
> On Wed, Mar 23, 2022 at 10:07 PM Keir Fraser  wrote:
> >
> > virtio pci config structures may in future have non-standard bar
> > values in the bar field. We should anticipate this by skipping any
> > structures containing such a reserved value.
> >
> > The bar value should never change: check for harmful modified values
> > we re-read it from the config space in vp_modern_map_capability().
> >
> > Also clean up an existing check to consistently use PCI_STD_NUM_BARS.
> >
> > Signed-off-by: Keir Fraser 
> 
> Acked-by: Jason Wang 
> 


Thanks! I don't want to rebase anymore though, so I'm not adding this ack.
Sorry!

> >  drivers/virtio/virtio_pci_modern.c | 12 +---
> >  drivers/virtio/virtio_pci_modern_dev.c |  9 -
> >  2 files changed, 17 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_pci_modern.c 
> > b/drivers/virtio/virtio_pci_modern.c
> > index 5455bc041fb6..6adfcd0297a7 100644
> > --- a/drivers/virtio/virtio_pci_modern.c
> > +++ b/drivers/virtio/virtio_pci_modern.c
> > @@ -293,7 +293,7 @@ static int virtio_pci_find_shm_cap(struct pci_dev *dev, 
> > u8 required_id,
> >
> > for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR); pos > 0;
> >  pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
> > -   u8 type, cap_len, id;
> > +   u8 type, cap_len, id, res_bar;
> > u32 tmp32;
> > u64 res_offset, res_length;
> >
> > @@ -315,9 +315,14 @@ static int virtio_pci_find_shm_cap(struct pci_dev 
> > *dev, u8 required_id,
> > if (id != required_id)
> > continue;
> >
> > -   /* Type, and ID match, looks good */
> > pci_read_config_byte(dev, pos + offsetof(struct 
> > virtio_pci_cap,
> > -bar), &bar);
> > +bar), &res_bar);
> > +   if (res_bar >= PCI_STD_NUM_BARS)
> > +   continue;
> > +
> > +   /* Type and ID match, and the BAR value isn't reserved.
> > +* Looks good.
> > +*/
> >
> > /* Read the lower 32bit of length and offset */
> > pci_read_config_dword(dev, pos + offsetof(struct 
> > virtio_pci_cap,
> > @@ -337,6 +342,7 @@ static int virtio_pci_find_shm_cap(struct pci_dev *dev, 
> > u8 required_id,
> >  length_hi), &tmp32);
> > res_length |= ((u64)tmp32) << 32;
> >
> > +   *bar = res_bar;
> > *offset = res_offset;
> > *len = res_length;
> >
> > diff --git a/drivers/virtio/virtio_pci_modern_dev.c 
> > b/drivers/virtio/virtio_pci_modern_dev.c
> > index e8b3ff2b9fbc..591738ad3d56 100644
> > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > @@ -35,6 +35,13 @@ vp_modern_map_capability(struct virtio_pci_modern_device 
> > *mdev, int off,
> > pci_read_config_dword(dev, off + offsetof(struct virtio_pci_cap, 
> > length),
> >   &length);
> >
> > +   /* Check if the BAR may have changed since we requested the region. 
> > */
> > +   if (bar >= PCI_STD_NUM_BARS || !(mdev->modern_bars & (1 << bar))) {
> > +   dev_err(&dev->dev,
> > +   "virtio_pci: bar unexpectedly changed to %u\n", 
> > bar);
> > +   return NULL;
> > +   }
> > +
> > if (length <= start) {
> > dev_err(&dev->dev,
> > "virtio_pci: bad capability len %u (>%u 
> > expected)\n",
> > @@ -120,7 +127,7 @@ static inline int virtio_pci_find_capability(struct 
> > pci_dev *dev, u8 cfg_type,
> >  );
> >
> > /* Ignore structures with reserved BAR values */
> > -   if (bar > 0x5)
> > +   if (bar >= PCI_STD_NUM_BARS)
> > continue;
> >
> > if (type == cfg_type) {
> > --
> > 2.35.1.894.gb6a874cedc-goog
> >



Re: [PATCH v2 0/9] virtio: support advance DMA

2022-03-30 Thread Michael S. Tsirkin
On Thu, Feb 24, 2022 at 07:03:53PM +0800, Xuan Zhuo wrote:
> virtqueue_add() only supports virtual addresses, dma is completed in
> virtqueue_add().
> 
> In some scenarios (such as the AF_XDP scenario), DMA is completed in advance, 
> so
> it is necessary for us to support passing the DMA address to virtqueue_add().

I picked up a couple of patches. Others are waiting for some acks
(Jason?) and improved commit logs for documentation.

Thanks!
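
As an illustration of what the premapped flow aims at (the
virtqueue_add_outbuf_premapped() signature below is a guess based on the
patch titles, not the actual series):

/* Hypothetical sketch: the driver maps the buffer itself and hands the
 * ring a DMA address instead of a virtual address. */
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>
#include <linux/virtio.h>

static int sketch_xmit_premapped(struct virtqueue *vq, struct device *dma_dev,
                                 void *buf, size_t len)
{
        struct scatterlist sg;
        dma_addr_t addr;

        /* DMA is completed in advance, outside virtqueue_add() */
        addr = dma_map_single(dma_dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(dma_dev, addr))
                return -ENOMEM;

        sg_init_table(&sg, 1);
        sg_dma_address(&sg) = addr;
        sg_dma_len(&sg) = len;

        /* hypothetical helper: the sg already carries DMA addresses */
        return virtqueue_add_outbuf_premapped(vq, &sg, 1, buf, GFP_ATOMIC);
}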

> v2:
> 1. rename predma -> premapped
> 2. virtio net xdp tx use virtio dma api
> 
> v1:
>1. All sgs requested at one time are required to be unified PREDMA, and 
> several
>   of them are not supported to be PREDMA
>2. virtio_dma_map() is removed from this patch set and will be submitted
>   together with the next time AF_XDP supports virtio dma
>3. Added patch #2 #3 to remove the check for flags when performing unmap
>   indirect desc
> 
> Xuan Zhuo (9):
>   virtio_ring: rename vring_unmap_state_packed() to
> vring_unmap_extra_packed()
>   virtio_ring: remove flags check for unmap split indirect desc
>   virtio_ring: remove flags check for unmap packed indirect desc
>   virtio_ring: virtqueue_add() support premapped
>   virtio_ring: split: virtqueue_add_split() support premapped
>   virtio_ring: packed: virtqueue_add_packed() support premapped
>   virtio_ring: add api virtio_dma_map() for advance dma
>   virtio_ring: introduce virtqueue_add_outbuf_premapped()
>   virtio_net: xdp xmit use virtio dma api
> 
>  drivers/net/virtio_net.c |  42 +-
>  drivers/virtio/virtio_ring.c | 280 ++-
>  include/linux/virtio.h   |  12 ++
>  3 files changed, 254 insertions(+), 80 deletions(-)
> 
> --
> 2.31.0



[PATCH 2/2] Revert "virtio_config: introduce a new .enable_cbs method"

2022-03-30 Thread Michael S. Tsirkin
This reverts commit d50497eb4e554e1f0351e1836ee7241c059592e6.

The new callback ended up not being used, and it's asymmetrical:
just enable, no disable.

Signed-off-by: Michael S. Tsirkin 
---
 include/linux/virtio_config.h | 6 --
 1 file changed, 6 deletions(-)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index dafdc7f48c01..b341dd62aa4d 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -23,8 +23,6 @@ struct virtio_shm_region {
  *   any of @get/@set, @get_status/@set_status, or @get_features/
  *   @finalize_features are NOT safe to be called from an atomic
  *   context.
- * @enable_cbs: enable the callbacks
- *  vdev: the virtio_device
  * @get: read the value of a configuration field
  * vdev: the virtio_device
  * offset: the offset of the configuration field
@@ -78,7 +76,6 @@ struct virtio_shm_region {
  */
 typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
-   void (*enable_cbs)(struct virtio_device *vdev);
void (*get)(struct virtio_device *vdev, unsigned offset,
void *buf, unsigned len);
void (*set)(struct virtio_device *vdev, unsigned offset,
@@ -233,9 +230,6 @@ void virtio_device_ready(struct virtio_device *dev)
 {
unsigned status = dev->config->get_status(dev);
 
-   if (dev->config->enable_cbs)
-  dev->config->enable_cbs(dev);
-
BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
 }
-- 
MST



[PATCH 1/2] Revert "virtio: use virtio_device_ready() in virtio_device_restore()"

2022-03-30 Thread Michael S. Tsirkin
This reverts commit 8d65bc9a5be3f23c5e2ab36b6b8ef40095165b18.

We reverted the problematic changes, so there is no more need for
workarounds on restore.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 75c8d560bbd3..22f15f444f75 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -526,9 +526,8 @@ int virtio_device_restore(struct virtio_device *dev)
goto err;
}
 
-   /* If restore didn't do it, mark device DRIVER_OK ourselves. */
-   if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
-   virtio_device_ready(dev);
+   /* Finally, tell the device we're all set */
+   virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
 
virtio_config_enable(dev);
 
-- 
MST



Re: [PATCH 1/3] virtio: use virtio_device_ready() in virtio_device_restore()

2022-03-29 Thread Michael S. Tsirkin
On Fri, Mar 25, 2022 at 11:05:22AM +0800, Jason Wang wrote:
> 
> 在 2022/3/24 下午7:31, Stefano Garzarella 写道:
> > On Thu, Mar 24, 2022 at 07:07:09AM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Mar 24, 2022 at 12:03:07PM +0100, Stefano Garzarella wrote:
> > > > On Thu, Mar 24, 2022 at 06:48:05AM -0400, Michael S. Tsirkin wrote:
> > > > > On Thu, Mar 24, 2022 at 04:40:02PM +0800, Jason Wang wrote:
> > > > > > From: Stefano Garzarella 
> > > > > >
> > > > > > This avoids setting DRIVER_OK twice for those drivers that call
> > > > > > virtio_device_ready() in the .restore
> > > > >
> > > > > Is this trying to say it's faster?
> > > > 
> > > > Nope, I mean, when I wrote the original version, I meant to do the same
> > > > things that we do in virtio_dev_probe() where we called
> > > > virtio_device_ready() which not only set the state, but also called
> > > > .enable_cbs callback.
> > > > 
> > > > Was this a side effect and maybe more compliant with the spec?
> > > 
> > > 
> > > Sorry I don't understand the question. it says "avoids setting
> > > DRIVER_OK twice" -
> > > why is that advantageous and worth calling out in the commit log?
> > 
> > I just wanted to say that it seems strange to set DRIVER_OK twice if we
> > read the spec. I don't think it's wrong, but weird.
> > 
> > Yes, maybe we should rewrite the commit message saying that we want to
> > use virtio_device_ready() everywhere to complete the setup before
> > setting DRIVER_OK so we can do all the necessary operations inside (like
> > in patch 3 or call enable_cbs).
> > 
> > Jason rewrote the commit log, so I don't know if he agrees.
> > 
> > Thanks,
> > Stefano
> 
> 
> I agree, I will tweak the log in V2.
> 
> Thanks

Still waiting for that v2.


Re:

2022-03-29 Thread Michael S. Tsirkin
On Wed, Mar 30, 2022 at 10:40:59AM +0800, Jason Wang wrote:
> On Tue, Mar 29, 2022 at 10:09 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Mar 29, 2022 at 03:12:14PM +0800, Jason Wang wrote:
> > > > > > > > And requesting irq commits all memory otherwise all drivers 
> > > > > > > > would be
> > > > > > > > broken,
> > > > > > >
> > > > > > > So I think we might talk different issues:
> > > > > > >
> > > > > > > 1) Whether request_irq() commits the previous setups, I think the
> > > > > > > answer is yes, since the spin_unlock of desc->lock (release) can
> > > > > > > guarantee this though there seems no documentation around
> > > > > > > request_irq() to say this.
> > > > > > >
> > > > > > > And I can see at least 
> > > > > > > drivers/video/fbdev/omap2/omapfb/dss/dispc.c is
> > > > > > > using smp_wmb() before the request_irq().
> > > > > > >
> > > > > > > And even if write is ordered we still need read to be ordered to 
> > > > > > > be
> > > > > > > paired with that.
> > > >
> > > > IMO it synchronizes with the CPU to which irq is
> > > > delivered. Otherwise basically all drivers would be broken,
> > > > wouldn't they be?
> > >
> > > I guess it's because most of the drivers don't care much about the
> > > buggy/malicious device.  And most of the devices may require an extra
> > > step to enable device IRQ after request_irq(). Or it's the charge of
> > > the driver to do the synchronization.
> >
> > It is true that the use-case of malicious devices is somewhat boutique.
> > But I think most drivers do want to have their hotplug routines to be
> > robust, yes.
> >
> > > > I don't know whether it's correct on all platforms, but if not
> > > > we need to fix request_irq.
> > > >
> > > > > > >
> > > > > > > > if it doesn't it just needs to be fixed, not worked around in
> > > > > > > > virtio.
> > > > > > >
> > > > > > > 2) virtio drivers might do a lot of setups between request_irq() 
> > > > > > > and
> > > > > > > virtio_device_ready():
> > > > > > >
> > > > > > > request_irq()
> > > > > > > driver specific setups
> > > > > > > virtio_device_ready()
> > > > > > >
> > > > > > > CPU 0 probe) request_irq()
> > > > > > > CPU 1 IRQ handler) read the uninitialized variable
> > > > > > > CPU 0 probe) driver specific setups
> > > > > > > CPU 0 probe) smp_store_release(intr_soft_enabled, true), commit 
> > > > > > > the setups
> > > > > > > CPU 1 IRQ handler) read irq_soft_enable as true
> > > > > > > CPU 1 IRQ handler) use the uninitialized variable
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > >
> > > > > > As I said, virtio_device_ready needs to do synchronize_irq.
> > > > > > That will guarantee all setup is visible to the specific IRQ,
> > > > >
> > > > > Only the interrupt after synchronize_irq() returns.
> > > >
> > > > Anything else is a buggy device though.
> > >
> > > Yes, but the goal of this patch is to prevent the possible attack from
> > > buggy(malicious) devices.
> >
> > Right. However if a driver of a *buggy* device somehow sees driver_ok =
> > false even though it's actually initialized, that is not a deal breaker
> > as that does not open us up to an attack.
> >
> > > >
> > > > > >this
> > > > > > is what it's point is.
> > > > >
> > > > > What happens if an interrupt is raised in the middle like:
> > > > >
> > > > > smp_store_release(dev->irq_soft_enabled, true)
> > > > > IRQ handler
> > > > > synchornize_irq()
> > > > >
> > > > > If we don't enforce a reading order, the IRQ handler may still see the
> > > > > uninitialized variable.
> > > > >
> > > > > Thanks
> > > >
> &

Re:

2022-03-29 Thread Michael S. Tsirkin
On Wed, Mar 30, 2022 at 10:38:06AM +0800, Jason Wang wrote:
> On Wed, Mar 30, 2022 at 6:04 AM Michael S. Tsirkin  wrote:
> >
> > On Tue, Mar 29, 2022 at 08:13:57PM +0200, Thomas Gleixner wrote:
> > > On Tue, Mar 29 2022 at 10:37, Michael S. Tsirkin wrote:
> > > > On Tue, Mar 29, 2022 at 10:35:21AM +0200, Thomas Gleixner wrote:
> > > > We are trying to fix the driver since at the moment it does not
> > > > have the dev->ok flag at all.
> > > >
> > > > And I suspect virtio is not alone in that.
> > > > So it would have been nice if there was a standard flag
> > > > replacing the driver-specific dev->ok above, and ideally
> > > > would also handle the case of an interrupt triggering
> > > > too early by deferring the interrupt until the flag is set.
> > > >
> > > > And in fact, it does kind of exist: IRQF_NO_AUTOEN, and you would call
> > > > enable_irq instead of dev->ok = true, except
> > > > - it doesn't work with affinity managed IRQs
> > > > - it does not work with shared IRQs
> > > >
> > > > So using dev->ok as you propose above seems better at this point.
> > >
> > > Unless there is a big enough amount of drivers which could make use of a
> > > generic mechanism for that.
> > >
> > > >> If any driver does this in the wrong order, then the driver is
> > > >> broken.
> > > >
> > > > I agree, however:
> > > > $ git grep synchronize_irq `git grep -l request_irq drivers/net/`|wc -l
> > > > 113
> > > > $ git grep -l request_irq drivers/net/|wc -l
> > > > 397
> > > >
> > > > I suspect there are more drivers which in theory need the
> > > > synchronize_irq dance but in practice do not execute it.
> > >
> > > That really depends on when the driver requests the interrupt, when
> > > it actually enables the interrupt in the device itself
> >
> > This last point does not matter since we are talking about protecting
> > against buggy/malicious devices. They can inject the interrupt anyway
> > even if driver did not configure it.
> >
> > > and how the
> > > interrupt service routine works.
> > >
> > > So just doing that grep dance does not tell much. You really have to do
> > > a case by case analysis.
> > >
> > > Thanks,
> > >
> > > tglx
> >
> >
> > I agree. In fact, at least for network the standard approach is to
> > request interrupts in the open call, virtio net is unusual
> > in doing it in probe. We should consider changing that.
> > Jason?
> 
> This probably works only for virtio-net and it looks like not trivial
> since we don't have a specific core API to request interrupts.
> 
> Thanks

We'll need a new API, for sure. E.g.  find vqs with no
callback on probe, and then virtio_request_vq_callbacks separately.

The existing API that specifies callbacks during find vqs
can be used by other drivers.
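
Very roughly, and purely as an illustration - virtio_request_vq_callbacks()
does not exist, its name and signature below are guesses at the proposed
split:

/* Hypothetical sketch of the probe/open split suggested above. */
#include <linux/virtio.h>
#include <linux/virtio_config.h>

static void rx_done(struct virtqueue *vq) { }
static void tx_done(struct virtqueue *vq) { }

static int sketch_probe(struct virtio_device *vdev, struct virtqueue *vqs[2])
{
        /* NULL callbacks: no vq interrupt work can run during probe */
        vq_callback_t *cbs[2] = { NULL, NULL };
        static const char * const names[2] = { "rx", "tx" };

        return virtio_find_vqs(vdev, 2, vqs, cbs, names, NULL);
}

static int sketch_open(struct virtio_device *vdev, struct virtqueue *vqs[2])
{
        vq_callback_t *cbs[2] = { rx_done, tx_done };

        /* hypothetical helper: attach callbacks once setup is complete */
        return virtio_request_vq_callbacks(vdev, 2, vqs, cbs);
}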

> >
> > --
> > MST
> >



Re:

2022-03-29 Thread Michael S. Tsirkin
On Tue, Mar 29, 2022 at 08:13:57PM +0200, Thomas Gleixner wrote:
> On Tue, Mar 29 2022 at 10:37, Michael S. Tsirkin wrote:
> > On Tue, Mar 29, 2022 at 10:35:21AM +0200, Thomas Gleixner wrote:
> > We are trying to fix the driver since at the moment it does not
> > have the dev->ok flag at all.
> >
> > And I suspect virtio is not alone in that.
> > So it would have been nice if there was a standard flag
> > replacing the driver-specific dev->ok above, and ideally
> > would also handle the case of an interrupt triggering
> > too early by deferring the interrupt until the flag is set.
> >
> > And in fact, it does kind of exist: IRQF_NO_AUTOEN, and you would call
> > enable_irq instead of dev->ok = true, except
> > - it doesn't work with affinity managed IRQs
> > - it does not work with shared IRQs
> >
> > So using dev->ok as you propose above seems better at this point.
> 
> Unless there is a big enough amount of drivers which could make use of a
> generic mechanism for that.
> 
> >> If any driver does this in the wrong order, then the driver is
> >> broken.
> > 
> > I agree, however:
> > $ git grep synchronize_irq `git grep -l request_irq drivers/net/`|wc -l
> > 113
> > $ git grep -l request_irq drivers/net/|wc -l
> > 397
> >
> > I suspect there are more drivers which in theory need the
> > synchronize_irq dance but in practice do not execute it.
> 
> That really depends on when the driver requests the interrupt, when
> it actually enables the interrupt in the device itself

This last point does not matter since we are talking about protecting
against buggy/malicious devices. They can inject the interrupt anyway
even if driver did not configure it.

> and how the
> interrupt service routine works.
> 
> So just doing that grep dance does not tell much. You really have to do
> a case by case analysis.
> 
> Thanks,
> 
> tglx


I agree. In fact, at least for network the standard approach is to
request interrupts in the open call, virtio net is unusual
in doing it in probe. We should consider changing that.
Jason?
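
For comparison, the conventional shape in network drivers is roughly the
following (made-up names, not virtio-net code):

/* Sketch: IRQs requested in .ndo_open, released in .ndo_stop. */
#include <linux/interrupt.h>
#include <linux/netdevice.h>

struct sketch_priv {
        unsigned int irq;
};

static irqreturn_t sketch_intr(int irq, void *data)
{
        return IRQ_HANDLED;
}

static int sketch_open(struct net_device *ndev)
{
        struct sketch_priv *priv = netdev_priv(ndev);

        /* by the time open runs, the device and rings are fully set up */
        return request_irq(priv->irq, sketch_intr, 0, ndev->name, ndev);
}

static int sketch_stop(struct net_device *ndev)
{
        struct sketch_priv *priv = netdev_priv(ndev);

        free_irq(priv->irq, ndev);
        return 0;
}

static const struct net_device_ops sketch_netdev_ops = {
        .ndo_open = sketch_open,
        .ndo_stop = sketch_stop,
};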

-- 
MST



Re:

2022-03-29 Thread Michael S. Tsirkin
On Tue, Mar 29, 2022 at 10:35:21AM +0200, Thomas Gleixner wrote:
> On Mon, Mar 28 2022 at 06:40, Michael S. Tsirkin wrote:
> > On Mon, Mar 28, 2022 at 02:18:22PM +0800, Jason Wang wrote:
> >> > > So I think we might talk different issues:
> >> > >
> >> > > 1) Whether request_irq() commits the previous setups, I think the
> >> > > answer is yes, since the spin_unlock of desc->lock (release) can
> >> > > guarantee this though there seems no documentation around
> >> > > request_irq() to say this.
> >> > >
> >> > > And I can see at least drivers/video/fbdev/omap2/omapfb/dss/dispc.c is
> >> > > using smp_wmb() before the request_irq().
> 
> That's a complete bogus example especially as there is not a single
> smp_rmb() which pairs with the smp_wmb().
> 
> >> > > And even if write is ordered we still need read to be ordered to be
> >> > > paired with that.
> >
> > IMO it synchronizes with the CPU to which irq is
> > delivered. Otherwise basically all drivers would be broken,
> > wouldn't they be?
> > I don't know whether it's correct on all platforms, but if not
> > we need to fix request_irq.
> 
> There is nothing to fix:
> 
> request_irq()
>raw_spin_lock_irq(desc->lock);   // ACQUIRE
>
>raw_spin_unlock_irq(desc->lock); // RELEASE
> 
> interrupt()
>raw_spin_lock(desc->lock);   // ACQUIRE
>set status to IN_PROGRESS
>raw_spin_unlock(desc->lock); // RELEASE
>invoke handler()
> 
> So anything which the driver set up _before_ request_irq() is visible to
> the interrupt handler. No?
> >> What happens if an interrupt is raised in the middle like:
> >> 
> >> smp_store_release(dev->irq_soft_enabled, true)
> >> IRQ handler
> >> synchornize_irq()
> 
> This is bogus. The obvious order of things is:
> 
> dev->ok = false;
> request_irq();
> 
> moar_setup();
> synchronize_irq();  // ACQUIRE + RELEASE
> dev->ok = true;
> 
> The reverse operation on teardown:
> 
> dev->ok = false;
> synchronize_irq();  // ACQUIRE + RELEASE
> 
> teardown();
> 
> So in both cases a simple check in the handler is sufficient:
> 
> handler()
> if (!dev->ok)
>   return;


Thanks a lot for the analysis, Thomas. This is more or less what I was
thinking.

> 
> I'm not understanding what you folks are trying to "fix" here.

We are trying to fix the driver since at the moment it does not
have the dev->ok flag at all.


And I suspect virtio is not alone in that.
So it would have been nice if there was a standard flag
replacing the driver-specific dev->ok above, and ideally
would also handle the case of an interrupt triggering
too early by deferring the interrupt until the flag is set.

And in fact, it does kind of exist: IRQF_NO_AUTOEN, and you would call
enable_irq instead of dev->ok = true, except
- it doesn't work with affinity managed IRQs
- it does not work with shared IRQs

So using dev->ok as you propose above seems better at this point.
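
To make the comparison concrete, a minimal sketch of the IRQF_NO_AUTOEN
variant (my_dev, my_handler() and moar_setup() are made-up names, not code
from any existing driver):

/*
 * Sketch only: the IRQ stays masked until enable_irq(), which plays
 * the role of dev->ok = true above.
 */
#include <linux/interrupt.h>

struct my_dev {
        unsigned int irq;
};

static void moar_setup(struct my_dev *dev)
{
        /* driver specific setup that the handler depends on */
}

static irqreturn_t my_handler(int irq, void *data)
{
        /*
         * The line is still disabled until enable_irq() below runs,
         * so everything done in moar_setup() is visible here.
         */
        return IRQ_HANDLED;
}

static int my_probe(struct my_dev *dev)
{
        /* IRQF_NO_AUTOEN: request_irq() does not enable the interrupt */
        int err = request_irq(dev->irq, my_handler, IRQF_NO_AUTOEN,
                              "my_dev", dev);
        if (err)
                return err;

        moar_setup(dev);
        enable_irq(dev->irq);
        return 0;
}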

> If any
> driver does this in the wrong order, then the driver is broken.

I agree, however:
$ git grep synchronize_irq `git grep -l request_irq drivers/net/`|wc -l
113
$ git grep -l request_irq drivers/net/|wc -l
397

I suspect there are more drivers which in theory need the
synchronize_irq dance but in practice do not execute it.


> Sure, you can do the same with:
> 
> dev->ok = false;
> request_irq();
> moar_setup();
> smp_wmb();
> dev->ok = true;
> 
> for the price of a smp_rmb() in the interrupt handler:
> 
> handler()
> if (!dev->ok)
>   return;
> smp_rmb();
> 
> but that's only working for the setup case correctly and not for
> teardown.
> 
> Thanks,
> 
> tglx



Re:

2022-03-29 Thread Michael S. Tsirkin
On Tue, Mar 29, 2022 at 03:12:14PM +0800, Jason Wang wrote:
> > > > > > And requesting irq commits all memory otherwise all drivers would be
> > > > > > broken,
> > > > >
> > > > > So I think we might talk different issues:
> > > > >
> > > > > 1) Whether request_irq() commits the previous setups, I think the
> > > > > answer is yes, since the spin_unlock of desc->lock (release) can
> > > > > guarantee this though there seems no documentation around
> > > > > request_irq() to say this.
> > > > >
> > > > > And I can see at least drivers/video/fbdev/omap2/omapfb/dss/dispc.c is
> > > > > using smp_wmb() before the request_irq().
> > > > >
> > > > > And even if write is ordered we still need read to be ordered to be
> > > > > paired with that.
> >
> > IMO it synchronizes with the CPU to which irq is
> > delivered. Otherwise basically all drivers would be broken,
> > wouldn't they be?
> 
> I guess it's because most of the drivers don't care much about the
> buggy/malicious device.  And most of the devices may require an extra
> step to enable device IRQ after request_irq(). Or it's the charge of
> the driver to do the synchronization.

It is true that the use-case of malicious devices is somewhat boutique.
But I think most drivers do want to have their hotplug routines to be
robust, yes.

> > I don't know whether it's correct on all platforms, but if not
> > we need to fix request_irq.
> >
> > > > >
> > > > > > if it doesn't it just needs to be fixed, not worked around in
> > > > > > virtio.
> > > > >
> > > > > 2) virtio drivers might do a lot of setups between request_irq() and
> > > > > virtio_device_ready():
> > > > >
> > > > > request_irq()
> > > > > driver specific setups
> > > > > virtio_device_ready()
> > > > >
> > > > > CPU 0 probe) request_irq()
> > > > > CPU 1 IRQ handler) read the uninitialized variable
> > > > > CPU 0 probe) driver specific setups
> > > > > CPU 0 probe) smp_store_release(intr_soft_enabled, true), commit the 
> > > > > setups
> > > > > CPU 1 IRQ handler) read irq_soft_enable as true
> > > > > CPU 1 IRQ handler) use the uninitialized variable
> > > > >
> > > > > Thanks
> > > >
> > > >
> > > > As I said, virtio_device_ready needs to do synchronize_irq.
> > > > That will guarantee all setup is visible to the specific IRQ,
> > >
> > > Only the interrupt after synchronize_irq() returns.
> >
> > Anything else is a buggy device though.
> 
> Yes, but the goal of this patch is to prevent the possible attack from
> buggy(malicious) devices.

Right. However if a driver of a *buggy* device somehow sees driver_ok =
false even though it's actually initialized, that is not a deal breaker
as that does not open us up to an attack.

> >
> > > >this
> > > > is what it's point is.
> > >
> > > What happens if an interrupt is raised in the middle like:
> > >
> > > smp_store_release(dev->irq_soft_enabled, true)
> > > IRQ handler
> > > synchornize_irq()
> > >
> > > If we don't enforce a reading order, the IRQ handler may still see the
> > > uninitialized variable.
> > >
> > > Thanks
> >
> > IMHO variables should be initialized before request_irq
> > to a value meaning "not a valid interrupt".
> > Specifically driver_ok = false.
> > Handler in the scenario you describe will then see !driver_ok
> > and exit immediately.
> 
> So just to make sure we're on the same page.
> 
> 1) virtio_reset_device() will set the driver_ok to false;
> 2) virtio_device_ready() will set the driver_ok to true
> 
> So for virtio drivers, it often did:
> 
> 1) virtio_reset_device()
> 2) find_vqs() which will call request_irq()
> 3) other driver specific setups
> 4) virtio_device_ready()
> 
> In virtio_device_ready(), the patch perform the following currently:
> 
> smp_store_release(driver_ok, true);
> set_status(DRIVER_OK);
> 
> Per your suggestion, to add synchronize_irq() after
> smp_store_release() so we had
> 
> smp_store_release(driver_ok, true);
> synchornize_irq()
> set_status(DRIVER_OK)
> 
> Suppose there's a interrupt raised before the synchronize_irq(), if we do:
> 
> if (READ_ONCE(driver_ok)) {
>   vq->callback()
> }
> 
> It will see the driver_ok as true but how can we make sure
> vq->callback sees the driver specific setups (3) above?
> 
> And an example is virtio_scsi():
> 
> virtio_reset_device()
> virtscsi_probe()
> virtscsi_init()
> virtio_find_vqs()
> ...
> virtscsi_init_vq(&vscsi->event_vq, vqs[1])
> 
> virtio_device_ready()
> 
> In virtscsi_event_done():
> 
> virtscsi_event_done():
> virtscsi_vq_done(vscsi, &vscsi->event_vq, ...);
> 
> We need to make sure the even_done reads driver_ok before read 
> vscsi->event_vq.
> 
> Thanks


See the response by Thomas. A simple if (!dev->driver_ok) should be enough;
it's all under a lock.
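
For reference, a rough sketch of what that check could look like in the
vring interrupt path. The driver_ok field is an assumption for illustration
only, not the actual patch:

/*
 * Sketch: "driver_ok" is an illustrative bool on struct virtio_device,
 * set by virtio_device_ready() before it calls synchronize_irq() and
 * cleared by virtio_reset_device().
 */
irqreturn_t vring_interrupt(int irq, void *_vq)
{
        struct vring_virtqueue *vq = to_vvq(_vq);

        /*
         * Reject interrupts raised before virtio_device_ready(); as
         * discussed above, the handler runs under the irq descriptor
         * lock, so a plain load is used here.
         */
        if (!vq->vq.vdev->driver_ok)
                return IRQ_NONE;

        if (!more_used(vq)) {
                pr_debug("virtqueue interrupt with no work for %p\n", vq);
                return IRQ_NONE;
        }

        if (vq->vq.callback)
                vq->vq.callback(&vq->vq);

        return IRQ_HANDLED;
}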

> >
> >
> > > >
> > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > > We use smp_store_relase()
> > > > > > > > > to make sure the driver commits the setup before enabling the 
> > > > > > > > > irq. It
> > > > > > > > > means the read needs to be 

Re: [PATCH v1] net/mlx5: Add support for configuring max device MTU

2022-03-28 Thread Michael S. Tsirkin
On Mon, Feb 21, 2022 at 02:19:27PM +0200, Eli Cohen wrote:
> Allow an admin creating a vdpa device to specify the max MTU for the
> net device.
> 
> For example, to create a device with max MTU of 1000, the following
> command can be used:
> 
> $ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 mtu 1000
> 
> This configuration mechanism assumes that vdpa is the sole real user of
> the function. mlx5_core could theoretically change the mtu of the
> function using the ip command on the mlx5_core net device but this
> should not be done.
> 
> Reviewed-by: Si-Wei Liu

missing space before <

> Signed-off-by: Eli Cohen 
> ---
> v0 -> v1:
> Update changelog


can you rebase pls? does not seem to apply

>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 32 ++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 6156cf6e9377..be095dc1134e 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2689,6 +2689,28 @@ static int event_handler(struct notifier_block *nb, 
> unsigned long event, void *p
>   return ret;
>  }
>  
> +static int config_func_mtu(struct mlx5_core_dev *mdev, u16 mtu)
> +{
> + int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
> + void *in;
> + int err;
> +
> + in = kvzalloc(inlen, GFP_KERNEL);
> + if (!in)
> + return -ENOMEM;
> +
> + MLX5_SET(modify_nic_vport_context_in, in, field_select.mtu, 1);
> + MLX5_SET(modify_nic_vport_context_in, in, nic_vport_context.mtu,
> +  mtu + MLX5V_ETH_HARD_MTU);
> + MLX5_SET(modify_nic_vport_context_in, in, opcode,
> +  MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
> +
> + err = mlx5_cmd_exec_in(mdev, modify_nic_vport_context, in);
> +
> + kvfree(in);
> + return err;
> +}
> +
>  static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
>const struct vdpa_dev_set_config *add_config)
>  {
> @@ -2749,6 +2771,13 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
> *v_mdev, const char *name,
>   mutex_init(&ndev->reslock);
>   mutex_init(&ndev->numq_lock);
>   config = &ndev->config;
> +
> + if (add_config->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) {
> + err = config_func_mtu(mdev, add_config->net.mtu);
> + if (err)
> + goto err_mtu;
> + }
> +
>   err = query_mtu(mdev, &mtu);
>   if (err)
>   goto err_mtu;
> @@ -2867,7 +2896,8 @@ static int mlx5v_probe(struct auxiliary_device *adev,
>   mgtdev->mgtdev.device = mdev->device;
>   mgtdev->mgtdev.id_table = id_table;
>   mgtdev->mgtdev.config_attr_mask = 
> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR) |
> -   
> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP);
> +   
> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP) |
> +   BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU);
>   mgtdev->mgtdev.max_supported_vqs =
>   MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues) + 1;
>   mgtdev->mgtdev.supported_features = get_supported_features(mdev);
> -- 
> 2.35.1



Re: [PATCH] virito-pci-modern: Remove useless DMA-32 fallback configuration

2022-03-28 Thread Michael S. Tsirkin
Typo in subj, and pls make subject match actual file name (_ not -).


On Fri, Mar 18, 2022 at 12:58:58AM +, cgel@gmail.com wrote:
> From: Minghao Chi 
> 
> As stated in [1], dma_set_mask() with a 64-bit mask will never fail if
> dev->dma_mask is non-NULL.
> So, if it fails, the 32 bits case will also fail for the same reason.
> 
> Simplify code and remove some dead code accordingly.
> 
> [1]: https://lkml.org/lkml/2021/6/7/398
> 
> Reported-by: Zeal Robot 
> Signed-off-by: Minghao Chi 
> ---
>  drivers/virtio/virtio_pci_modern_dev.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_modern_dev.c 
> b/drivers/virtio/virtio_pci_modern_dev.c
> index e8b3ff2b9fbc..dff0b15a239d 100644
> --- a/drivers/virtio/virtio_pci_modern_dev.c
> +++ b/drivers/virtio/virtio_pci_modern_dev.c
> @@ -255,9 +255,6 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev)
>   }
>  
>   err = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(64));
> - if (err)
> - err = dma_set_mask_and_coherent(&pci_dev->dev,
> - DMA_BIT_MASK(32));
>   if (err)
>   dev_warn(&pci_dev->dev, "Failed to enable 64-bit or 32-bit DMA. 
>  Trying to continue, but this might not work.\n");
>  
> -- 
> 2.25.1



Re: [PATCH v2 1/2] dt-bindings: virtio: mmio: add optional wakeup-source property

2022-03-28 Thread Michael S. Tsirkin
On Fri, Mar 25, 2022 at 09:59:45AM +0800, Minghao Xue wrote:
> Some systems want to set the interrupt of virtio_mmio device
> as a wakeup source. On such systems, we'll use the existence
> of the "wakeup-source" property as a signal of requirement.
> 
> Signed-off-by: Minghao Xue 

I don't have enough of a clue about dt to review this.
Pls get some acks from people with DT expertise.

> ---
> v1 -> v2: rename property from "virtio,wakeup" to "wakeup-source"
> 
>  Documentation/devicetree/bindings/virtio/mmio.yaml | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml 
> b/Documentation/devicetree/bindings/virtio/mmio.yaml
> index 4b7a027..160b21b 100644
> --- a/Documentation/devicetree/bindings/virtio/mmio.yaml
> +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml
> @@ -31,6 +31,10 @@ properties:
>  description: Required for devices making accesses thru an IOMMU.
>  maxItems: 1
>  
> +  wakeup-source:
> +type: boolean
> +description: Required for setting irq of a virtio_mmio device as wakeup 
> source.
> +
>  required:
>- compatible
>- reg
> -- 
> 2.7.4



Re:

2022-03-28 Thread Michael S. Tsirkin
On Mon, Mar 28, 2022 at 02:18:22PM +0800, Jason Wang wrote:
> On Mon, Mar 28, 2022 at 1:59 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Mar 28, 2022 at 12:56:41PM +0800, Jason Wang wrote:
> > > On Fri, Mar 25, 2022 at 6:10 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Fri, Mar 25, 2022 at 05:20:19PM +0800, Jason Wang wrote:
> > > > > On Fri, Mar 25, 2022 at 5:10 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, Mar 25, 2022 at 03:52:00PM +0800, Jason Wang wrote:
> > > > > > > On Fri, Mar 25, 2022 at 2:31 PM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Bcc:
> > > > > > > > Subject: Re: [PATCH 3/3] virtio: harden vring IRQ
> > > > > > > > Message-ID: <20220325021422-mutt-send-email-...@kernel.org>
> > > > > > > > Reply-To:
> > > > > > > > In-Reply-To: 
> > > > > > > >
> > > > > > > > On Fri, Mar 25, 2022 at 11:04:08AM +0800, Jason Wang wrote:
> > > > > > > > >
> > > > > > > > > 在 2022/3/24 下午7:03, Michael S. Tsirkin 写道:
> > > > > > > > > > On Thu, Mar 24, 2022 at 04:40:04PM +0800, Jason Wang wrote:
> > > > > > > > > > > This is a rework on the previous IRQ hardening that is 
> > > > > > > > > > > done for
> > > > > > > > > > > virtio-pci where several drawbacks were found and were 
> > > > > > > > > > > reverted:
> > > > > > > > > > >
> > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to 
> > > > > > > > > > > affinity managed IRQ
> > > > > > > > > > > that is used by some device such as virtio-blk
> > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > >
> > > > > > > > > > > In this patch, we tries to borrow the idea from the INTX 
> > > > > > > > > > > IRQ hardening
> > > > > > > > > > > in the reverted commit 080cd7c3ac87 ("virtio-pci: harden 
> > > > > > > > > > > INTX interrupts")
> > > > > > > > > > > by introducing a global irq_soft_enabled variable for each
> > > > > > > > > > > virtio_device. Then we can to toggle it during
> > > > > > > > > > > virtio_reset_device()/virtio_device_ready(). A 
> > > > > > > > > > > synchornize_rcu() is
> > > > > > > > > > > used in virtio_reset_device() to synchronize with the IRQ 
> > > > > > > > > > > handlers. In
> > > > > > > > > > > the future, we may provide config_ops for the transport 
> > > > > > > > > > > that doesn't
> > > > > > > > > > > use IRQ. With this, vring_interrupt() can return check 
> > > > > > > > > > > and early if
> > > > > > > > > > > irq_soft_enabled is false. This lead to 
> > > > > > > > > > > smp_load_acquire() to be used
> > > > > > > > > > > but the cost should be acceptable.
> > > > > > > > > > Maybe it should be but is it? Can't we use synchronize_irq 
> > > > > > > > > > instead?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Even if we allow the transport driver to synchornize through
> > > > > > > > > synchronize_irq() we still need a check in the 
> > > > > > > > > vring_interrupt().
> > > > > > > > >
> > > > > > > > > We do something like the following previously:
> > > > > > > > >
> > > > > > > > > if (!READ_ONCE(vp_dev->intx_soft_enabled))
> > > > > > > > > return IRQ_NONE;
> > > > > > > > >
> > > > > > > > > But it looks like a bug since speculative read can be done 
> > > > &

Re:

2022-03-27 Thread Michael S. Tsirkin
On Mon, Mar 28, 2022 at 12:56:41PM +0800, Jason Wang wrote:
> On Fri, Mar 25, 2022 at 6:10 PM Michael S. Tsirkin  wrote:
> >
> > On Fri, Mar 25, 2022 at 05:20:19PM +0800, Jason Wang wrote:
> > > On Fri, Mar 25, 2022 at 5:10 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Fri, Mar 25, 2022 at 03:52:00PM +0800, Jason Wang wrote:
> > > > > On Fri, Mar 25, 2022 at 2:31 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > Bcc:
> > > > > > Subject: Re: [PATCH 3/3] virtio: harden vring IRQ
> > > > > > Message-ID: <20220325021422-mutt-send-email-...@kernel.org>
> > > > > > Reply-To:
> > > > > > In-Reply-To: 
> > > > > >
> > > > > > On Fri, Mar 25, 2022 at 11:04:08AM +0800, Jason Wang wrote:
> > > > > > >
> > > > > > > 在 2022/3/24 下午7:03, Michael S. Tsirkin 写道:
> > > > > > > > On Thu, Mar 24, 2022 at 04:40:04PM +0800, Jason Wang wrote:
> > > > > > > > > This is a rework on the previous IRQ hardening that is done 
> > > > > > > > > for
> > > > > > > > > virtio-pci where several drawbacks were found and were 
> > > > > > > > > reverted:
> > > > > > > > >
> > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to 
> > > > > > > > > affinity managed IRQ
> > > > > > > > > that is used by some device such as virtio-blk
> > > > > > > > > 2) done only for PCI transport
> > > > > > > > >
> > > > > > > > > In this patch, we tries to borrow the idea from the INTX IRQ 
> > > > > > > > > hardening
> > > > > > > > > in the reverted commit 080cd7c3ac87 ("virtio-pci: harden INTX 
> > > > > > > > > interrupts")
> > > > > > > > > by introducing a global irq_soft_enabled variable for each
> > > > > > > > > virtio_device. Then we can to toggle it during
> > > > > > > > > virtio_reset_device()/virtio_device_ready(). A 
> > > > > > > > > synchornize_rcu() is
> > > > > > > > > used in virtio_reset_device() to synchronize with the IRQ 
> > > > > > > > > handlers. In
> > > > > > > > > the future, we may provide config_ops for the transport that 
> > > > > > > > > doesn't
> > > > > > > > > use IRQ. With this, vring_interrupt() can return check and 
> > > > > > > > > early if
> > > > > > > > > irq_soft_enabled is false. This lead to smp_load_acquire() to 
> > > > > > > > > be used
> > > > > > > > > but the cost should be acceptable.
> > > > > > > > Maybe it should be but is it? Can't we use synchronize_irq 
> > > > > > > > instead?
> > > > > > >
> > > > > > >
> > > > > > > Even if we allow the transport driver to synchornize through
> > > > > > > synchronize_irq() we still need a check in the vring_interrupt().
> > > > > > >
> > > > > > > We do something like the following previously:
> > > > > > >
> > > > > > > if (!READ_ONCE(vp_dev->intx_soft_enabled))
> > > > > > > return IRQ_NONE;
> > > > > > >
> > > > > > > But it looks like a bug since speculative read can be done before 
> > > > > > > the check
> > > > > > > where the interrupt handler can't see the uncommitted setup which 
> > > > > > > is done by
> > > > > > > the driver.
> > > > > >
> > > > > > I don't think so - if you sync after setting the value then
> > > > > > you are guaranteed that any handler running afterwards
> > > > > > will see the new value.
> > > > >
> > > > > The problem is not disabled but the enable.
> > > >
> > > > So a misbehaving device can lose interrupts? That's not a problem at all
> > > > imo.
> > >
> > > It's the interrupt raised before setting irq_soft_enabled

Re: [PATCH v8 16/16] virtio_ring: introduce virtqueue_resize()

2022-03-25 Thread Michael S. Tsirkin
On Mon, Mar 14, 2022 at 05:34:55PM +0800, Xuan Zhuo wrote:
> Introduce virtqueue_resize() to implement resizing of the vring.
> Based on this, the driver can dynamically adjust the size of the vring,
> for example via ethtool -G.
> 
> virtqueue_resize() implements the resize based on the vq reset function.
> If allocating a new vring fails, it gives up the resize and keeps using
> the original vring.
> 
> During this process, if re-enabling the reset vq fails, the vq can no
> longer be used, although the probability of this situation is not high.
> 
> The recycle parameter is used to recycle the buffers that are no longer
> used.
> 
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/virtio/virtio_ring.c | 67 
>  include/linux/virtio.h   |  3 ++
>  2 files changed, 70 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index fb0abf9a2f57..b1dde086a8a4 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2528,6 +2528,73 @@ struct virtqueue *vring_create_virtqueue(
>  }
>  EXPORT_SYMBOL_GPL(vring_create_virtqueue);
>  
> +/**
> + * virtqueue_resize - resize the vring of vq
> + * @vq: the struct virtqueue we're talking about.
> + * @num: new ring num
> + * @recycle: callback to recycle buffers that are no longer used
> + *
> + * When it is really necessary to create a new vring, it will set the
> + * current vq into the reset state. Then it calls the passed callback to
> + * recycle the buffers that are no longer used. Only after the new vring
> + * is successfully created is the old vring released.
> + *
> + * Caller must ensure we don't call this with other virtqueue operations
> + * at the same time (except where noted).
> + *
> + * Returns zero or a negative error.
> + * -ENOMEM: failed to create the new vring, but the vq can still work
> + * -EBUSY:  failed to reset/re-enable the vq; the vq may no longer work
> + * -ENOENT: resize is not supported
> + * -E2BIG/-EINVAL: invalid num parameter
> + */
> +int virtqueue_resize(struct virtqueue *vq, u32 num,
> +  void (*recycle)(struct virtqueue *vq, void *buf))
> +{
> + struct virtio_device *vdev = vq->vdev;
> + void *buf;
> + int err;
> +
> + if (num > vq->num_max)
> + return -E2BIG;
> +
> + if (!num)
> + return -EINVAL;
> +
> + if (to_vvq(vq)->packed.vring.num == num)
> + return 0;
> +
> + if (!vq->vdev->config->reset_vq)
> + return -ENOENT;
> +
> + if (!vq->vdev->config->enable_reset_vq)
> + return -ENOENT;
> +
> + err = vq->vdev->config->reset_vq(vq);
> + if (err) {
> + if (err != -ENOENT)
> + err = -EBUSY;
> + return err;
> + }
> +
> + while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> + recycle(vq, buf);


So all this callback can do now is drop all the buffers, and I think that
is not great.  Can we store them and invoke the callback after the queue
is re-enabled?
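
Something along these lines is what I have in mind (untested sketch only:
the stash array, its sizing and the surrounding error handling are
illustrative, not part of the posted patch):

        void **stash;
        unsigned int n = 0, i;

        /* the vring size is an upper bound on the number of unused buffers */
        stash = kvmalloc_array(virtqueue_get_vring_size(vq),
                               sizeof(*stash), GFP_KERNEL);
        if (!stash)
                return -ENOMEM;

        /* detach as before, but keep the buffers instead of recycling now */
        while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
                stash[n++] = buf;

        /* ... resize the ring and re-enable the vq as in the patch ... */

        /* only now hand the old buffers back to the driver */
        for (i = 0; i < n; i++)
                recycle(vq, stash[i]);
        kvfree(stash);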


> +
> + if (virtio_has_feature(vdev, VIRTIO_F_RING_PACKED))
> + err = virtqueue_resize_packed(vq, num);
> + else
> + err = virtqueue_resize_split(vq, num);
> +
> + if (err)
> + err = -ENOMEM;
> +
> + if (vq->vdev->config->enable_reset_vq(vq))
> + return -EBUSY;
> +
> + return err;
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_resize);
> +
>  /* Only available for split ring */
>  struct virtqueue *vring_new_virtqueue(unsigned int index,
> unsigned int num,
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index d59adc4be068..c86ff02e0ca0 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -91,6 +91,9 @@ dma_addr_t virtqueue_get_desc_addr(struct virtqueue *vq);
>  dma_addr_t virtqueue_get_avail_addr(struct virtqueue *vq);
>  dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq);
>  
> +int virtqueue_resize(struct virtqueue *vq, u32 num,
> +  void (*recycle)(struct virtqueue *vq, void *buf));
> +
>  /**
>   * virtio_device - representation of a device using virtio
>   * @index: unique position on the virtio bus
> -- 
> 2.31.0
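
Just for context, the ethtool -G path this is meant for would end up doing
something roughly like the following on the driver side (hypothetical
sketch; the rx queue structure and the free/refill helpers are made up,
they are not from any posted driver patch):

        struct my_rx_queue {                    /* hypothetical driver state */
                struct virtqueue *vq;
        };

        static void my_recycle_rx_buf(struct virtqueue *vq, void *buf)
        {
                /* hypothetical helper: return the buffer to the driver's pool */
                my_free_rx_buf(vq->vdev->priv, buf);
        }

        static int my_resize_rx(struct my_rx_queue *rq, u32 ring_num)
        {
                int err;

                err = virtqueue_resize(rq->vq, ring_num, my_recycle_rx_buf);
                if (err)
                        return err;

                /* refill the (possibly larger) ring with fresh buffers */
                return my_refill_rx(rq);
        }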

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re:

2022-03-25 Thread Michael S. Tsirkin
On Fri, Mar 25, 2022 at 05:20:19PM +0800, Jason Wang wrote:
> On Fri, Mar 25, 2022 at 5:10 PM Michael S. Tsirkin  wrote:
> >
> > On Fri, Mar 25, 2022 at 03:52:00PM +0800, Jason Wang wrote:
> > > On Fri, Mar 25, 2022 at 2:31 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > Bcc:
> > > > Subject: Re: [PATCH 3/3] virtio: harden vring IRQ
> > > > Message-ID: <20220325021422-mutt-send-email-...@kernel.org>
> > > > Reply-To:
> > > > In-Reply-To: 
> > > >
> > > > On Fri, Mar 25, 2022 at 11:04:08AM +0800, Jason Wang wrote:
> > > > >
> > > > > On 2022/3/24 7:03 PM, Michael S. Tsirkin wrote:
> > > > > > On Thu, Mar 24, 2022 at 04:40:04PM +0800, Jason Wang wrote:
> > > > > > > This is a rework of the previous IRQ hardening that was done for
> > > > > > > virtio-pci, where several drawbacks were found and the changes
> > > > > > > were reverted:
> > > > > > >
> > > > > > > 1) it tried to use IRQF_NO_AUTOEN, which is not friendly to the
> > > > > > > affinity-managed IRQs used by some devices such as virtio-blk
> > > > > > > 2) it was done only for the PCI transport
> > > > > > >
> > > > > > > In this patch, we try to borrow the idea from the INTX IRQ
> > > > > > > hardening in the reverted commit 080cd7c3ac87 ("virtio-pci: harden
> > > > > > > INTX interrupts") by introducing a global irq_soft_enabled
> > > > > > > variable for each virtio_device. Then we can toggle it during
> > > > > > > virtio_reset_device()/virtio_device_ready(). A synchronize_rcu()
> > > > > > > is used in virtio_reset_device() to synchronize with the IRQ
> > > > > > > handlers. In the future, we may provide config_ops for transports
> > > > > > > that don't use IRQs. With this, vring_interrupt() can check
> > > > > > > irq_soft_enabled and return early if it is false. This leads to
> > > > > > > smp_load_acquire() being used, but the cost should be acceptable.
> > > > > > Maybe it should be but is it? Can't we use synchronize_irq instead?
> > > > >
> > > > >
> > > > > Even if we allow the transport driver to synchronize through
> > > > > synchronize_irq(), we still need a check in vring_interrupt().
> > > > >
> > > > > We did something like the following previously:
> > > > >
> > > > > if (!READ_ONCE(vp_dev->intx_soft_enabled))
> > > > > return IRQ_NONE;
> > > > >
> > > > > But it looks like a bug, since a speculative read can be done before
> > > > > the check, in which case the interrupt handler can't see the
> > > > > uncommitted setup done by the driver.
> > > >
> > > > I don't think so - if you sync after setting the value then
> > > > you are guaranteed that any handler running afterwards
> > > > will see the new value.
> > >
> > > The problem is not the disable path but the enable path.
> >
> > So a misbehaving device can lose interrupts? That's not a problem at all
> > imo.
> 
> It's the interrupt raised before setting irq_soft_enabled to true:
> 
> CPU 0 probe) driver-specific setup (not committed)
> CPU 1 IRQ handler) read the uninitialized variable
> CPU 0 probe) set irq_soft_enabled to true
> CPU 1 IRQ handler) read irq_soft_enable as true
> CPU 1 IRQ handler) use the uninitialized variable
> 
> Thanks

Yea, it hurts if you do it.  So do not do it then ;).

irq_soft_enabled (I think driver_ok or status is a better name)
should be initialized to false *before* irq is requested.

And requesting an irq commits all memory; otherwise all drivers would be
broken. If it doesn't, that needs to be fixed, not worked around in
virtio.
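
To spell out the ordering being argued about, a minimal sketch (the field
name, the flags and the exact call sites are illustrative only, this is
not the actual patch):

        /* probe path: the flag starts out false before the IRQ exists */
        vdev->driver_ok = false;
        err = request_irq(irq, vring_interrupt, 0, "virtio-vq", vq);
        if (err)
                return err;

        /* ... driver-specific setup: callbacks, buffers, etc. ... */

        /* publish the setup; pairs with the acquire below */
        smp_store_release(&vdev->driver_ok, true);

        /* vring_interrupt(): bail out until the driver is ready */
        if (!smp_load_acquire(&vdev->driver_ok))
                return IRQ_NONE;

Whether the release/acquire pair is really needed, or whether the barriers
implied by request_irq() and the device reset are enough, is exactly the
question here.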


> >
> > > We use smp_store_release()
> > > to make sure the driver commits the setup before enabling the irq. It
> > > means the read needs to be ordered as well in vring_interrupt().
> > >
> > > >
> > > > Although I couldn't find anything about this in memory-barriers.txt

Re:

2022-03-25 Thread Michael S. Tsirkin
On Fri, Mar 25, 2022 at 03:52:00PM +0800, Jason Wang wrote:
> On Fri, Mar 25, 2022 at 2:31 PM Michael S. Tsirkin  wrote:
> >
> > Bcc:
> > Subject: Re: [PATCH 3/3] virtio: harden vring IRQ
> > Message-ID: <20220325021422-mutt-send-email-...@kernel.org>
> > Reply-To:
> > In-Reply-To: 
> >
> > On Fri, Mar 25, 2022 at 11:04:08AM +0800, Jason Wang wrote:
> > >
> > > On 2022/3/24 7:03 PM, Michael S. Tsirkin wrote:
> > > > On Thu, Mar 24, 2022 at 04:40:04PM +0800, Jason Wang wrote:
> > > > > This is a rework of the previous IRQ hardening that was done for
> > > > > virtio-pci, where several drawbacks were found and the changes were
> > > > > reverted:
> > > > >
> > > > > 1) it tried to use IRQF_NO_AUTOEN, which is not friendly to the
> > > > > affinity-managed IRQs used by some devices such as virtio-blk
> > > > > 2) it was done only for the PCI transport
> > > > >
> > > > > In this patch, we try to borrow the idea from the INTX IRQ hardening
> > > > > in the reverted commit 080cd7c3ac87 ("virtio-pci: harden INTX
> > > > > interrupts") by introducing a global irq_soft_enabled variable for
> > > > > each virtio_device. Then we can toggle it during
> > > > > virtio_reset_device()/virtio_device_ready(). A synchronize_rcu() is
> > > > > used in virtio_reset_device() to synchronize with the IRQ handlers.
> > > > > In the future, we may provide config_ops for transports that don't
> > > > > use IRQs. With this, vring_interrupt() can check irq_soft_enabled and
> > > > > return early if it is false. This leads to smp_load_acquire() being
> > > > > used, but the cost should be acceptable.
> > > > Maybe it should be but is it? Can't we use synchronize_irq instead?
> > >
> > >
> > > Even if we allow the transport driver to synchronize through
> > > synchronize_irq(), we still need a check in vring_interrupt().
> > >
> > > We did something like the following previously:
> > >
> > > if (!READ_ONCE(vp_dev->intx_soft_enabled))
> > > return IRQ_NONE;
> > >
> > > But it looks like a bug, since a speculative read can be done before the
> > > check, in which case the interrupt handler can't see the uncommitted
> > > setup done by the driver.
> >
> > I don't think so - if you sync after setting the value then
> > you are guaranteed that any handler running afterwards
> > will see the new value.
> 
> The problem is not the disable path but the enable path.

So a misbehaving device can lose interrupts? That's not a problem at all
imo.

> We use smp_store_release()
> to make sure the driver commits the setup before enabling the irq. It
> means the read needs to be ordered as well in vring_interrupt().
> 
> >
> > Although I couldn't find anything about this in memory-barriers.txt
> > which surprises me.
> >
> > CC Paul to help make sure I'm right.
> >
> >
> > >
> > > >
> > > > > To avoid breaking legacy devices which can send an IRQ before DRIVER_OK,
> > > > > a module parameter is introduced to enable the hardening, so the
> > > > > hardening is disabled by default.
> > > > Which devices are these? How come they send an interrupt before there
> > > > are any buffers in any queues?
> > >
> > >
> > > I copied this from the commit log for 22b7050a024d7
> > >
> > > "
> > >
> > > This change will also benefit old hypervisors (before 2009)
> > > that send interrupts without checking DRIVER_OK: previously,
> > > the callback could race with driver-specific initialization.
> > > "
> > >
> > > If this is only for config interrupt, I can remove the above log.
> >
> >
> > This is only for config interrupt.
> 
> Ok.
> 
> >
> > >
> > > >
> > > > > Note that the hardening is only done for vring interrupt since the
> > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > handler because it uses spinlock to do the synchronization which is
> > > > > expensive.
> > > 
