Re: [PATCH net] virtio_net: bugfix overflow inside xdp_linearize_page()

2023-04-13 Thread Jason Wang
On Thu, Apr 13, 2023 at 8:19 PM Xuan Zhuo  wrote:
>
> Here we copy the data from the original buf to the new page, but we
> do not check whether the copy may overflow.
>
> As long as the size received (including the vnet header) is greater than
> 3840 (PAGE_SIZE - VIRTIO_XDP_HEADROOM), the memcpy will overflow.
>
> And this is entirely possible whenever the MTU is large, such as 4096.
> In our test environment this causes a crash. Since the crash is caused
> by the overwritten memory, the dump is meaningless, so I do not include it.
>
> Signed-off-by: Xuan Zhuo 

Missing a Fixes tag?

Other than this,

Acked-by: Jason Wang 

Thanks

> ---
>  drivers/net/virtio_net.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2396c28c0122..ea1bd4bb326d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -814,8 +814,13 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
>int page_off,
>unsigned int *len)
>  {
> -   struct page *page = alloc_page(GFP_ATOMIC);
> +   int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +   struct page *page;
> +
> +   if (page_off + *len + tailroom > PAGE_SIZE)
> +   return NULL;
>
> +   page = alloc_page(GFP_ATOMIC);
> if (!page)
> return NULL;
>
> @@ -823,7 +828,6 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> page_off += *len;
>
> while (--*num_buf) {
> -   int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> unsigned int buflen;
> void *buf;
> int off;
> --
> 2.32.0.3.g01195cf9f
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next V2 2/2] virtio-net: sleep instead of busy waiting for cvq command

2023-04-13 Thread Jason Wang
Adding netdev.

On Fri, Apr 14, 2023 at 1:09 PM Jason Wang  wrote:
>
> On Thu, Apr 13, 2023 at 3:31 PM Xuan Zhuo  wrote:
> >
> > On Thu, 13 Apr 2023 14:40:27 +0800, Jason Wang  wrote:
> > > We used to busy wait on the cvq command; this tends to be
> > > problematic since there is no way to schedule another process which
> > > may serve the control virtqueue. This might be the case when the
> > > control virtqueue is emulated by software. This patch switches to
> > > using a completion to allow the CPU to sleep instead of busy
> > > waiting for the cvq command.
> > >
> > > Signed-off-by: Jason Wang 
> > > ---
> > > Changes since V1:
> > > - use completion for simplicity
> > > - don't try to harden the CVQ command which requires more thought
> > > Changes since RFC:
> > > - break the device when timeout
> > > - get buffer manually since the virtio core checks more_used() instead
> > > ---
> > >  drivers/net/virtio_net.c | 21 ++---
> > >  1 file changed, 14 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 2e56bbf86894..d3eb8fd6c9dc 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -19,6 +19,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > @@ -295,6 +296,8 @@ struct virtnet_info {
> > >
> > >   /* failover when STANDBY feature enabled */
> > >   struct failover *failover;
> > > +
> > > + struct completion completion;
> > >  };
> > >
> > >  struct padded_vnet_hdr {
> > > @@ -1709,6 +1712,13 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
> > >   return !oom;
> > >  }
> > >
> > > +static void virtnet_cvq_done(struct virtqueue *cvq)
> > > +{
> > > + struct virtnet_info *vi = cvq->vdev->priv;
> > > +
> > > + complete(&vi->completion);
> > > +}
> > > +
> > >  static void skb_recv_done(struct virtqueue *rvq)
> > >  {
> > >   struct virtnet_info *vi = rvq->vdev->priv;
> > > @@ -2169,12 +2179,8 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
> > >   if (unlikely(!virtqueue_kick(vi->cvq)))
> > >   return vi->ctrl->status == VIRTIO_NET_OK;
> > >
> > > - /* Spin for a response, the kick causes an ioport write, trapping
> > > -  * into the hypervisor, so the request should be handled immediately.
> > > -  */
> > > - while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > -!virtqueue_is_broken(vi->cvq))
> > > - cpu_relax();
> > > + wait_for_completion(&vi->completion);
> > > + virtqueue_get_buf(vi->cvq, &tmp);
> > >
> > >   return vi->ctrl->status == VIRTIO_NET_OK;
> > >  }
> > > @@ -3672,7 +3678,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > >
> > >   /* Parameters for control virtqueue, if any */
> > >   if (vi->has_cvq) {
> > > - callbacks[total_vqs - 1] = NULL;
> > > + callbacks[total_vqs - 1] = virtnet_cvq_done;
> >
> > This depends on the interrupt, right?
>
> Not necessarily; we have an ISR, at least for PCI:
>
> static irqreturn_t vp_interrupt(int irq, void *opaque)
> {
> struct virtio_pci_device *vp_dev = opaque;
> u8 isr;
>
> /* reading the ISR has the effect of also clearing it so it's very
>  * important to save off the value. */
> isr = ioread8(vp_dev->isr);
> ...
> }
>
> >
> > I worry that there may be some devices that do not support interrupts
> > on the cvq.
>
> Is the device using INTX or MSI-X?
>
> > Although this may not be in line with the spec, it may cause problems
> > on devices that work normally at present.
>
> Then the implementation is buggy; it might not work for drivers other
> than Linux's. Working around such a buggy implementation is suboptimal.
>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >   names[total_vqs - 1] = "control";
> > >   }
> > >
> > > @@ -4122,6 +4128,7 @@ static int virtnet_probe(struct virtio_device *vdev)
> > >   if (vi->has_rss || vi->has_rss_hash_report)
> > >   virtnet_init_default_rss(vi);
> > >
> > > + init_completion(&vi->completion);
> > >   enable_rx_mode_work(vi);
> > >
> > >   /* serialize netdev register + virtio_device_ready() with ndo_open() */
> > > --
> > > 2.25.1
> > >
> >


Re: [PATCH net-next V2 1/2] virtio-net: convert rx mode setting to use workqueue

2023-04-13 Thread Jason Wang
Forget to cc netdev, adding.

On Fri, Apr 14, 2023 at 12:25 AM Michael S. Tsirkin  wrote:
>
> On Thu, Apr 13, 2023 at 02:40:26PM +0800, Jason Wang wrote:
> > This patch converts rx mode setting to be done in a workqueue. This
> > is a must to allow sleeping while waiting for the cvq command to
> > complete, since the current code is executed under the addr spin lock.
> >
> > Signed-off-by: Jason Wang 
>
> I don't like this frankly. This means that setting RX mode which would
> previously be reliable, now becomes unreliable.

It is "unreliable" by design:

  void(*ndo_set_rx_mode)(struct net_device *dev);

> - first of all configuration is no longer immediate

Is immediacy a hard requirement? I can see a workqueue being used at least in:

mlx5e, ipoib, efx, ...

>   and there is no way for driver to find out when
>   it actually took effect

But we know rx mode is best effort; e.g. it doesn't work for vhost, and
we have lived with this for years.

> - second, if device fails command, this is also not
>   propagated to driver, again no way for driver to find out
>
> VDUSE needs to be fixed to do tricks to fix this
> without breaking normal drivers.

It's not specific to VDUSE. For example, when using virtio-net in the
UP environment with any software cvq (like mlx5 via vDPA or cma
transport).

Thanks

>
>
> > ---
> > Changes since V1:
> > - use RTNL to synchronize rx mode worker
> > ---
> >  drivers/net/virtio_net.c | 55 +---
> >  1 file changed, 52 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index e2560b6f7980..2e56bbf86894 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -265,6 +265,12 @@ struct virtnet_info {
> >   /* Work struct for config space updates */
> >   struct work_struct config_work;
> >
> > + /* Work struct for config rx mode */
> > + struct work_struct rx_mode_work;
> > +
> > + /* Is rx mode work enabled? */
> > + bool rx_mode_work_enabled;
> > +
> >   /* Does the affinity hint is set for virtqueues? */
> >   bool affinity_hint_set;
> >
> > @@ -388,6 +394,20 @@ static void disable_delayed_refill(struct virtnet_info *vi)
> >   spin_unlock_bh(&vi->refill_lock);
> >  }
> >
> > +static void enable_rx_mode_work(struct virtnet_info *vi)
> > +{
> > + rtnl_lock();
> > + vi->rx_mode_work_enabled = true;
> > + rtnl_unlock();
> > +}
> > +
> > +static void disable_rx_mode_work(struct virtnet_info *vi)
> > +{
> > + rtnl_lock();
> > + vi->rx_mode_work_enabled = false;
> > + rtnl_unlock();
> > +}
> > +
> >  static void virtqueue_napi_schedule(struct napi_struct *napi,
> >   struct virtqueue *vq)
> >  {
> > @@ -2310,9 +2330,11 @@ static int virtnet_close(struct net_device *dev)
> >   return 0;
> >  }
> >
> > -static void virtnet_set_rx_mode(struct net_device *dev)
> > +static void virtnet_rx_mode_work(struct work_struct *work)
> >  {
> > - struct virtnet_info *vi = netdev_priv(dev);
> > + struct virtnet_info *vi =
> > + container_of(work, struct virtnet_info, rx_mode_work);
> > + struct net_device *dev = vi->dev;
> >   struct scatterlist sg[2];
> >   struct virtio_net_ctrl_mac *mac_data;
> >   struct netdev_hw_addr *ha;
> > @@ -2325,6 +2347,8 @@ static void virtnet_set_rx_mode(struct net_device *dev)
> >   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX))
> >   return;
> >
> > + rtnl_lock();
> > +
> >   vi->ctrl->promisc = ((dev->flags & IFF_PROMISC) != 0);
> >   vi->ctrl->allmulti = ((dev->flags & IFF_ALLMULTI) != 0);
> >
> > @@ -2342,14 +2366,19 @@ static void virtnet_set_rx_mode(struct net_device *dev)
> >   dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n",
> >vi->ctrl->allmulti ? "en" : "dis");
> >
> > + netif_addr_lock_bh(dev);
> > +
> >   uc_count = netdev_uc_count(dev);
> >   mc_count = netdev_mc_count(dev);
> >   /* MAC filter - use one buffer for both lists */
> >   buf = kzalloc(((uc_count + mc_count) * ETH_ALEN) +
> > (2 * sizeof(mac_data->entries)), GFP_ATOMIC);
> >   mac_data = buf;
> > - if (!buf)
> > + if (!buf) {
> > + netif_addr_unlock_bh(dev);
> > + rtnl_unlock();
> >   return;
> > + }
> >
> >   sg_init_table(sg, 2);
> >
> > @@ -2370,6 +2399,8 @@ static void virtnet_set_rx_mode(struct net_device *dev)
> >   netdev_for_each_mc_addr(ha, dev)
> >   memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
> >
> > + netif_addr_unlock_bh(dev);
> > +
> >   sg_set_buf(&sg[1], mac_data,
> >  sizeof(mac_data->entries) + (mc_count * ETH_ALEN));
> >
> > @@ -2377,9 +2408,19 @@ static void virtnet_set_rx_mode(struct net_device *dev)
> > VIRTIO_NET_CTRL_MAC_TABLE_SET, sg))
> >   

Re: [PATCH net-next V2 0/2] virtio-net: don't busy poll for cvq command

2023-04-13 Thread Jason Wang
On Thu, Apr 13, 2023 at 10:04 PM Jakub Kicinski  wrote:
>
> On Thu, 13 Apr 2023 14:40:25 +0800 Jason Wang wrote:
> > The code used to busy poll for the cvq command, which turns out to
> > have several side effects:
> >
> > 1) infinite poll for buggy devices
> > 2) bad interaction with scheduler
> >
> > So this series tries to use sleep instead of busy polling. In this
> > version, I take a step back: the hardening part is not implemented
> > and is left for future investigation. We used to agree on using
> > interruptible sleep, but it doesn't work for a general workqueue.
>
> CC: netdev missing?

My bad. Will cc netdev for any discussion.

Thanks

>


Re: [PATCH v6 11/11] vhost: allow userspace to create workers

2023-04-13 Thread Jason Wang
On Fri, Apr 14, 2023 at 6:36 AM Mike Christie
 wrote:
>
> On 4/12/23 2:56 AM, Jason Wang wrote:
> >> I can spin another patchset with the single ioctl design so we can compare.
> > So I'm fine with this approach. One last question, I see this:
> >
> > /* By default, a device gets one vhost_worker that its virtqueues share. This */
> >
> > I'm wondering if it is better to have a vhost_get_worker() to return
> > the worker_id of a specific queue. In the future, this might be useful
> > for devices that have multiple workers by default?
>
> Yeah, it would be useful. Just 2 questions. You mean an ioctl command to get
> this info right? I had added one in one version a while back, but dropped it
> because for some reason I thought we were going to do a generic vhost sysfs
> type of interface. We are not doing sysfs right?

Probably, but we need to make sure that emulatorpin can work for libvirt in the end.

Thanks

>



Re: [PATCH net-next V2 1/2] virtio-net: convert rx mode setting to use workqueue

2023-04-13 Thread Michael S. Tsirkin
On Thu, Apr 13, 2023 at 02:40:26PM +0800, Jason Wang wrote:
> This patch converts rx mode setting to be done in a workqueue. This
> is a must to allow sleeping while waiting for the cvq command to
> complete, since the current code is executed under the addr spin lock.
> 
> Signed-off-by: Jason Wang 

I don't like this frankly. This means that setting RX mode which would
previously be reliable, now becomes unreliable.
- first of all configuration is no longer immediate
  and there is no way for driver to find out when
  it actually took effect
- second, if device fails command, this is also not
  propagated to driver, again no way for driver to find out 

VDUSE needs to be fixed to do tricks to fix this
without breaking normal drivers.


> ---
> Changes since V1:
> - use RTNL to synchronize rx mode worker
> ---
>  drivers/net/virtio_net.c | 55 +---
>  1 file changed, 52 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index e2560b6f7980..2e56bbf86894 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -265,6 +265,12 @@ struct virtnet_info {
>   /* Work struct for config space updates */
>   struct work_struct config_work;
>  
> + /* Work struct for config rx mode */
> + struct work_struct rx_mode_work;
> +
> + /* Is rx mode work enabled? */
> + bool rx_mode_work_enabled;
> +
>   /* Does the affinity hint is set for virtqueues? */
>   bool affinity_hint_set;
>  
> @@ -388,6 +394,20 @@ static void disable_delayed_refill(struct virtnet_info *vi)
>   spin_unlock_bh(&vi->refill_lock);
>  }
>  
> +static void enable_rx_mode_work(struct virtnet_info *vi)
> +{
> + rtnl_lock();
> + vi->rx_mode_work_enabled = true;
> + rtnl_unlock();
> +}
> +
> +static void disable_rx_mode_work(struct virtnet_info *vi)
> +{
> + rtnl_lock();
> + vi->rx_mode_work_enabled = false;
> + rtnl_unlock();
> +}
> +
>  static void virtqueue_napi_schedule(struct napi_struct *napi,
>   struct virtqueue *vq)
>  {
> @@ -2310,9 +2330,11 @@ static int virtnet_close(struct net_device *dev)
>   return 0;
>  }
>  
> -static void virtnet_set_rx_mode(struct net_device *dev)
> +static void virtnet_rx_mode_work(struct work_struct *work)
>  {
> - struct virtnet_info *vi = netdev_priv(dev);
> + struct virtnet_info *vi =
> + container_of(work, struct virtnet_info, rx_mode_work);
> + struct net_device *dev = vi->dev;
>   struct scatterlist sg[2];
>   struct virtio_net_ctrl_mac *mac_data;
>   struct netdev_hw_addr *ha;
> @@ -2325,6 +2347,8 @@ static void virtnet_set_rx_mode(struct net_device *dev)
>   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX))
>   return;
>  
> + rtnl_lock();
> +
>   vi->ctrl->promisc = ((dev->flags & IFF_PROMISC) != 0);
>   vi->ctrl->allmulti = ((dev->flags & IFF_ALLMULTI) != 0);
>  
> @@ -2342,14 +2366,19 @@ static void virtnet_set_rx_mode(struct net_device *dev)
>   dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n",
>vi->ctrl->allmulti ? "en" : "dis");
>  
> + netif_addr_lock_bh(dev);
> +
>   uc_count = netdev_uc_count(dev);
>   mc_count = netdev_mc_count(dev);
>   /* MAC filter - use one buffer for both lists */
>   buf = kzalloc(((uc_count + mc_count) * ETH_ALEN) +
> (2 * sizeof(mac_data->entries)), GFP_ATOMIC);
>   mac_data = buf;
> - if (!buf)
> + if (!buf) {
> + netif_addr_unlock_bh(dev);
> + rtnl_unlock();
>   return;
> + }
>  
>   sg_init_table(sg, 2);
>  
> @@ -2370,6 +2399,8 @@ static void virtnet_set_rx_mode(struct net_device *dev)
>   netdev_for_each_mc_addr(ha, dev)
>   memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
>  
> + netif_addr_unlock_bh(dev);
> +
>   sg_set_buf(&sg[1], mac_data,
>  sizeof(mac_data->entries) + (mc_count * ETH_ALEN));
>  
> @@ -2377,9 +2408,19 @@ static void virtnet_set_rx_mode(struct net_device *dev)
> VIRTIO_NET_CTRL_MAC_TABLE_SET, sg))
>   dev_warn(&dev->dev, "Failed to set MAC filter table.\n");
>  
> + rtnl_unlock();
> +
>   kfree(buf);
>  }
>  
> +static void virtnet_set_rx_mode(struct net_device *dev)
> +{
> + struct virtnet_info *vi = netdev_priv(dev);
> +
> + if (vi->rx_mode_work_enabled)
> + schedule_work(&vi->rx_mode_work);
> +}
> +
>  static int virtnet_vlan_rx_add_vid(struct net_device *dev,
>  __be16 proto, u16 vid)
>  {
> @@ -3150,6 +3191,8 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>  
>   /* Make sure no work handler is accessing the device */
>   flush_work(&vi->config_work);
> + disable_rx_mode_work(vi);
> + flush_work(&vi->rx_mode_work);
>  
>   netif_tx_lock_bh(vi->dev);
>   netif_device_detach(vi->dev);

So 

Re: virtio-iommu hotplug issue

2023-04-13 Thread Eric Auger
Hi,

On 4/13/23 13:01, Akihiko Odaki wrote:
> On 2023/04/13 19:40, Jean-Philippe Brucker wrote:
>> Hello,
>>
>> On Thu, Apr 13, 2023 at 01:49:43PM +0900, Akihiko Odaki wrote:
>>> Hi,
>>>
>>> Recently I encountered a problem with the combination of Linux's
>>> virtio-iommu driver and QEMU when a SR-IOV virtual function gets
>>> disabled.
>>> I'd like to ask you what kind of solution is appropriate here and
>>> implement
>>> the solution if possible.
>>>
>>> A PCIe device implementing the SR-IOV specification exports a virtual
>>> function, and the guest can enable or disable it at runtime by
>>> writing to a
>>> configuration register. This effectively looks like a PCI device is
>>> hotplugged for the guest.
>>
>> Just so I understand this better: the guest gets a whole PCIe device PF
>> that implements SR-IOV, and so the guest can dynamically create VFs? 
>> Out
>> of curiosity, is that a hardware device assigned to the guest with VFIO,
>> or a device emulated by QEMU?
>
> Yes, that's right. The guest can dynamically create and delete VFs.
> The device is emulated by QEMU: igb, an Intel NIC recently added to
> QEMU and projected to be released as part of QEMU 8.0.
From the description below I understand that you then bind this emulated
device to VFIO in the guest, correct?
>
>>
>>> In such a case, the kernel assumes the endpoint is
>>> detached from the virtio-iommu domain, but QEMU actually does not
>>> detach it.
The QEMU virtio-iommu device executes commands from the virtio-iommu
driver, and my understanding is that the VFIO infra is not at fault here.
As Jean suggested, a detach command is probably being missed.
>>>
>>> This inconsistent view of the removed device sometimes prevents the
>>> VM from
>>> correctly performing the following procedure, for example:
>>> 1. Enable a VF.
>>> 2. Disable the VF.
>>> 3. Open a vfio container.
>>> 4. Open the group which the PF belongs to.
>>> 5. Add the group to the vfio container.
>>> 6. Map some memory region.
>>> 7. Close the group.
>>> 8. Close the vfio container.
>>> 9. Repeat 3-8
>>>
>>> When the VF gets disabled, the kernel assumes the endpoint is
>>> detached from
>>> the IOMMU domain, but QEMU actually doesn't detach it. Later, the
>>> domain
>>> will be reused in step 3-8.
>>>
>>> In step 7, the PF will be detached, and the kernel thinks there is no
>>> endpoint attached and the mapping the domain holds is cleared, but
>>> the VF
>>> endpoint is still attached and the mapping is kept intact.
>>>
>>> In step 9, the same domain will be reused again, and the kernel
>>> requests to
>>> create a new mapping, but it will conflict with the existing mapping
>>> and
>>> result in -EINVAL.
>>>
>>> This problem can be fixed by either of:
>>> - requesting the detachment of the endpoint from the guest when the PCI
>>> device is unplugged (the VF is disabled)
>>
>> Yes, I think this is an issue in the virtio-iommu driver, which
>> should be
>> sending a DETACH request when the VF is disabled, likely from
>> viommu_release_device(). I'll work on a fix unless you would like to
>> do it
>
> It will be nice if you prepare a fix. I will test your patch with my
> workload if you share it with me.

I can help testing too

Thanks

Eric
>
> Regards,
> Akihiko Odaki
>
>>
>>> - detecting that the PCI device is gone and automatically detach it on
>>> QEMU-side.
>>>
>>> It is not completely clear for me which solution is more appropriate
>>> as the
>>> virtio-iommu specification is written in a way independent of the
>>> endpoint
>>> mechanism and does not say what should be done when a PCI device is
>>> unplugged.
>>
>> Yes, I'm not sure it's in scope for the specification, it's more about
>> software guidance
>>
>> Thanks,
>> Jean
>


Re: [PATCH net-next V2 0/2] virtio-net: don't busy poll for cvq command

2023-04-13 Thread Maxime Coquelin



On 4/13/23 15:02, Maxime Coquelin wrote:

Hi Jason,

On 4/13/23 08:40, Jason Wang wrote:

Hi all:

The code used to busy poll for the cvq command, which turns out to
have several side effects:

1) infinite poll for buggy devices
2) bad interaction with scheduler

So this series tries to use sleep instead of busy polling. In this
version, I take a step back: the hardening part is not implemented
and is left for future investigation. We used to agree on using
interruptible sleep, but it doesn't work for a general workqueue.

Please review.


Thanks for working on this.
My DPDK VDUSE RFC missed setting the interrupt; as Xuan Zhuo
highlighted, that makes the vdpa dev add/del commands freeze:
[<0>] device_del+0x37/0x3d0
[<0>] device_unregister+0x13/0x60
[<0>] unregister_virtio_device+0x11/0x20
[<0>] device_release_driver_internal+0x193/0x200
[<0>] bus_remove_device+0xbf/0x130
[<0>] device_del+0x174/0x3d0
[<0>] device_unregister+0x13/0x60
[<0>] vdpa_nl_cmd_dev_del_set_doit+0x66/0xe0 [vdpa]
[<0>] genl_family_rcv_msg_doit.isra.0+0xb8/0x100
[<0>] genl_rcv_msg+0x151/0x290
[<0>] netlink_rcv_skb+0x54/0x100
[<0>] genl_rcv+0x24/0x40
[<0>] netlink_unicast+0x217/0x340
[<0>] netlink_sendmsg+0x23e/0x4a0
[<0>] sock_sendmsg+0x8f/0xa0
[<0>] __sys_sendto+0xfc/0x170
[<0>] __x64_sys_sendto+0x20/0x30
[<0>] do_syscall_64+0x59/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x72/0xdc

Once fixed on DPDK side (you can use my vduse_v1 branch [0] for
testing), it works fine:

Tested-by: Maxime Coquelin 

For the potential missing interrupt with non-compliant devices, I guess
it could be handled by the hardening work, as the same thing could
happen if the VDUSE application crashed, for example.

Regards,
Maxime

[0]: https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commits/vduse_v1/


Thanks

Changes since V1:
- use RTNL to synchronize rx mode worker
- use completion for simplicity
- don't try to harden CVQ command

Changes since RFC:

- switch to use BAD_RING in virtio_break_device()
- check virtqueue_is_broken() after being woken up
- use more_used() instead of virtqueue_get_buf() to allow the caller
  to get buffers afterwards
- break the virtio-net device when timeout
- get buffer manually since the virtio core checks more_used() instead

Jason Wang (2):
   virtio-net: convert rx mode setting to use workqueue
   virtio-net: sleep instead of busy waiting for cvq command

  drivers/net/virtio_net.c | 76 ++--
  1 file changed, 66 insertions(+), 10 deletions(-)





[PATCH net] virtio_net: bugfix overflow inside xdp_linearize_page()

2023-04-13 Thread Xuan Zhuo
Here we copy the data from the original buf to the new page, but we
do not check whether the copy may overflow.

If the size received (including vnethdr) is greater than 3840
(PAGE_SIZE - VIRTIO_XDP_HEADROOM), the memcpy will overflow.

This is entirely possible when the MTU is large, such as 4096. In our
test environment this causes a crash. Since the crash is caused by the
overwritten memory, the crash dump itself is meaningless, so I do not
include it.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2396c28c0122..ea1bd4bb326d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -814,8 +814,13 @@ static struct page *xdp_linearize_page(struct 
receive_queue *rq,
   int page_off,
   unsigned int *len)
 {
-   struct page *page = alloc_page(GFP_ATOMIC);
+   int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+   struct page *page;
+
+   if (page_off + *len + tailroom > PAGE_SIZE)
+   return NULL;
 
+   page = alloc_page(GFP_ATOMIC);
if (!page)
return NULL;
 
@@ -823,7 +828,6 @@ static struct page *xdp_linearize_page(struct receive_queue 
*rq,
page_off += *len;
 
while (--*num_buf) {
-   int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
unsigned int buflen;
void *buf;
int off;
-- 
2.32.0.3.g01195cf9f



Re: virtio-iommu hotplug issue

2023-04-13 Thread Jean-Philippe Brucker
Hello,

On Thu, Apr 13, 2023 at 01:49:43PM +0900, Akihiko Odaki wrote:
> Hi,
> 
> Recently I encountered a problem with the combination of Linux's
> virtio-iommu driver and QEMU when an SR-IOV virtual function gets disabled.
> I'd like to ask you what kind of solution is appropriate here and implement
> the solution if possible.
> 
> A PCIe device implementing the SR-IOV specification exports a virtual
> function, and the guest can enable or disable it at runtime by writing to a
> configuration register. This effectively looks like a PCI device is
> hotplugged for the guest.

Just so I understand this better: the guest gets a whole PCIe device PF
that implements SR-IOV, and so the guest can dynamically create VFs?  Out
of curiosity, is that a hardware device assigned to the guest with VFIO,
or a device emulated by QEMU?

> In such a case, the kernel assumes the endpoint is
> detached from the virtio-iommu domain, but QEMU actually does not detach it.
> 
> This inconsistent view of the removed device sometimes prevents the VM from
> correctly performing the following procedure, for example:
> 1. Enable a VF.
> 2. Disable the VF.
> 3. Open a vfio container.
> 4. Open the group which the PF belongs to.
> 5. Add the group to the vfio container.
> 6. Map some memory region.
> 7. Close the group.
> 8. Close the vfio container.
> 9. Repeat 3-8
> 
> When the VF gets disabled, the kernel assumes the endpoint is detached from
> the IOMMU domain, but QEMU actually doesn't detach it. Later, the domain
> will be reused in step 3-8.
> 
> In step 7, the PF will be detached, and the kernel thinks there is no
> endpoint attached and the mapping the domain holds is cleared, but the VF
> endpoint is still attached and the mapping is kept intact.
> 
> In step 9, the same domain will be reused again, and the kernel requests to
> create a new mapping, but it will conflict with the existing mapping and
> result in -EINVAL.
> 
> This problem can be fixed by either of:
> - requesting the detachment of the endpoint from the guest when the PCI
> device is unplugged (the VF is disabled)

Yes, I think this is an issue in the virtio-iommu driver, which should be
sending a DETACH request when the VF is disabled, likely from
viommu_release_device(). I'll work on a fix unless you would like to do it
yourself.

> - detecting that the PCI device is gone and automatically detach it on
> QEMU-side.
> 
> It is not completely clear for me which solution is more appropriate as the
> virtio-iommu specification is written in a way independent of the endpoint
> mechanism and does not say what should be done when a PCI device is
> unplugged.

Yes, I'm not sure it's in scope for the specification; it's more about
software guidance.

Thanks,
Jean


Re: [RFC PATCH v1] vsock/loopback: don't disable irqs for queue access

2023-04-13 Thread Stefano Garzarella

On Sun, Apr 09, 2023 at 10:17:51PM +0300, Arseniy Krasnov wrote:

This replaces 'skb_queue_tail()' with 'virtio_vsock_skb_queue_tail()'.
The first one uses 'spin_lock_irqsave()', the second uses 'spin_lock_bh()'.
There is no need to disable interrupts in the loopback transport as
there is no access to the queue with skbs from interrupt context. Both
virtio and vhost transports work in the same way.


Yep, this is a good point!



Signed-off-by: Arseniy Krasnov 
---
net/vmw_vsock/vsock_loopback.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)


LGTM! (net-next material)

Reviewed-by: Stefano Garzarella 

Thanks,
Stefano



diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index e3afc0c866f5..5c6360df1f31 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -31,8 +31,7 @@ static int vsock_loopback_send_pkt(struct sk_buff *skb)
struct vsock_loopback *vsock = &the_vsock_loopback;
int len = skb->len;

-   skb_queue_tail(&vsock->pkt_queue, skb);
-
+   virtio_vsock_skb_queue_tail(&vsock->pkt_queue, skb);
	queue_work(vsock->workqueue, &vsock->pkt_work);

return len;
--
2.25.1





Re: [PATCH] gpio: aggregator: Add interrupt support

2023-04-13 Thread Geert Uytterhoeven
Hi Kamel,

On Thu, Apr 13, 2023 at 9:48 AM  wrote:
> On 2021-10-04 14:44, Geert Uytterhoeven wrote:
> What is the status of this patch? Are there any remaining
> changes to be made?

You mean commit a00128dfc8fc0cc8 ("gpio: aggregator: Add interrupt
support") in v5.17?

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

[PATCH 0/2] virtio: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-04-13 Thread Alvaro Karsz
Add VIRTIO_F_NOTIFICATION_DATA feature support for the MMIO, channel
I/O, modern PCI and vDPA transports.

This patchset bundles 2 patches that were sent separately to the mailing
lists.

The first one [1] adds support for the MMIO, channel I/O and modern PCI 
transports.
The second one [2] adds support for the vDPA transport.

[1] https://lore.kernel.org/lkml/20230324195029.2410503-1-vik...@daynix.com/
[2] 
https://lore.kernel.org/virtualization/20230409070706.3288876-1-alvaro.ka...@solid-run.com/

Alvaro Karsz (1):
  virtio-vdpa: add VIRTIO_F_NOTIFICATION_DATA feature support

Viktor Prutyanov (1):
  virtio: add VIRTIO_F_NOTIFICATION_DATA feature support

 drivers/s390/virtio/virtio_ccw.c   | 22 +++---
 drivers/virtio/virtio_mmio.c   | 18 +-
 drivers/virtio/virtio_pci_modern.c | 17 -
 drivers/virtio/virtio_ring.c   | 19 +++
 drivers/virtio/virtio_vdpa.c   | 23 +--
 include/linux/vdpa.h   |  9 +
 include/linux/virtio_ring.h|  2 ++
 include/uapi/linux/virtio_config.h |  6 ++
 8 files changed, 109 insertions(+), 7 deletions(-)

-- 
2.34.1



[PATCH 2/2] virtio-vdpa: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-04-13 Thread Alvaro Karsz
Add VIRTIO_F_NOTIFICATION_DATA support for vDPA transport.
If this feature is negotiated, the driver passes extra data when kicking
a virtqueue.

A device that offers this feature needs to implement the
kick_vq_with_data callback.

kick_vq_with_data receives the vDPA device and data.
data includes:
16 bits vqn and 16 bits next available index for split virtqueues.
16 bits vqn, 15 least significant bits of next available index
and 1 bit next_wrap for packed virtqueues.

This patch follows a patch [1] by Viktor Prutyanov which adds support
for the MMIO, channel I/O and modern PCI transports.

Signed-off-by: Alvaro Karsz 
---
 drivers/virtio/virtio_vdpa.c | 23 +--
 include/linux/vdpa.h |  9 +
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index d7f5af62dda..737c1f36d32 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -112,6 +112,17 @@ static bool virtio_vdpa_notify(struct virtqueue *vq)
return true;
 }
 
+static bool virtio_vdpa_notify_with_data(struct virtqueue *vq)
+{
+   struct vdpa_device *vdpa = vd_get_vdpa(vq->vdev);
+   const struct vdpa_config_ops *ops = vdpa->config;
+   u32 data = vring_notification_data(vq);
+
+   ops->kick_vq_with_data(vdpa, data);
+
+   return true;
+}
+
 static irqreturn_t virtio_vdpa_config_cb(void *private)
 {
struct virtio_vdpa_device *vd_dev = private;
@@ -138,6 +149,7 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned 
int index,
struct device *dma_dev;
const struct vdpa_config_ops *ops = vdpa->config;
struct virtio_vdpa_vq_info *info;
+   bool (*notify)(struct virtqueue *vq) = virtio_vdpa_notify;
struct vdpa_callback cb;
struct virtqueue *vq;
u64 desc_addr, driver_addr, device_addr;
@@ -154,6 +166,14 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned 
int index,
if (index >= vdpa->nvqs)
return ERR_PTR(-ENOENT);
 
+   /* We cannot accept VIRTIO_F_NOTIFICATION_DATA without 
kick_vq_with_data */
+   if (__virtio_test_bit(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+   if (ops->kick_vq_with_data)
+   notify = virtio_vdpa_notify_with_data;
+   else
+   __virtio_clear_bit(vdev, VIRTIO_F_NOTIFICATION_DATA);
+   }
+
/* Queue shouldn't already be set up. */
if (ops->get_vq_ready(vdpa, index))
return ERR_PTR(-ENOENT);
@@ -183,8 +203,7 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned 
int index,
dma_dev = vdpa_get_dma_dev(vdpa);
vq = vring_create_virtqueue_dma(index, max_num, align, vdev,
true, may_reduce_num, ctx,
-   virtio_vdpa_notify, callback,
-   name, dma_dev);
+   notify, callback, name, dma_dev);
if (!vq) {
err = -ENOMEM;
goto error_new_virtqueue;
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 43f59ef10cc..04cdaad77dd 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -143,6 +143,14 @@ struct vdpa_map_file {
  * @kick_vq:   Kick the virtqueue
  * @vdev: vdpa device
  * @idx: virtqueue index
+ * @kick_vq_with_data: Kick the virtqueue and supply extra data
+ * (only if VIRTIO_F_NOTIFICATION_DATA is 
negotiated)
+ * @vdev: vdpa device
+ * @data for split virtqueue:
+ * 16 bits vqn and 16 bits next available index.
+ * @data for packed virtqueue:
+ * 16 bits vqn, 15 least significant bits of
+ * next available index and 1 bit next_wrap.
  * @set_vq_cb: Set the interrupt callback function for
  * a virtqueue
  * @vdev: vdpa device
@@ -300,6 +308,7 @@ struct vdpa_config_ops {
  u64 device_area);
void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
+   void (*kick_vq_with_data)(struct vdpa_device *vdev, u32 data);
void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
  struct vdpa_callback *cb);
void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
-- 
2.34.1



[PATCH 1/2] virtio: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-04-13 Thread Alvaro Karsz
From: Viktor Prutyanov 

According to VirtIO spec v1.2, VIRTIO_F_NOTIFICATION_DATA feature
indicates that the driver passes extra data along with the queue
notifications.

In a split queue case, the extra data is 16-bit available index. In a
packed queue case, the extra data is 1-bit wrap counter and 15-bit
available index.

Add support for this feature for MMIO, channel I/O and modern PCI
transports.

Signed-off-by: Viktor Prutyanov 
Acked-by: Jason Wang 
Reviewed-by: Xuan Zhuo 
---
 drivers/s390/virtio/virtio_ccw.c   | 22 +++---
 drivers/virtio/virtio_mmio.c   | 18 +-
 drivers/virtio/virtio_pci_modern.c | 17 -
 drivers/virtio/virtio_ring.c   | 19 +++
 include/linux/virtio_ring.h|  2 ++
 include/uapi/linux/virtio_config.h |  6 ++
 6 files changed, 79 insertions(+), 5 deletions(-)

diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index 954fc31b4bc..02922768b12 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -391,7 +391,7 @@ static void virtio_ccw_drop_indicator(struct 
virtio_ccw_device *vcdev,
ccw_device_dma_free(vcdev->cdev, thinint_area, sizeof(*thinint_area));
 }
 
-static bool virtio_ccw_kvm_notify(struct virtqueue *vq)
+static inline bool virtio_ccw_do_kvm_notify(struct virtqueue *vq, u32 data)
 {
struct virtio_ccw_vq_info *info = vq->priv;
struct virtio_ccw_device *vcdev;
@@ -402,12 +402,22 @@ static bool virtio_ccw_kvm_notify(struct virtqueue *vq)
BUILD_BUG_ON(sizeof(struct subchannel_id) != sizeof(unsigned int));
info->cookie = kvm_hypercall3(KVM_S390_VIRTIO_CCW_NOTIFY,
  *((unsigned int *)&schid),
- vq->index, info->cookie);
+ data, info->cookie);
if (info->cookie < 0)
return false;
return true;
 }
 
+static bool virtio_ccw_kvm_notify(struct virtqueue *vq)
+{
+   return virtio_ccw_do_kvm_notify(vq, vq->index);
+}
+
+static bool virtio_ccw_kvm_notify_with_data(struct virtqueue *vq)
+{
+   return virtio_ccw_do_kvm_notify(vq, vring_notification_data(vq));
+}
+
 static int virtio_ccw_read_vq_conf(struct virtio_ccw_device *vcdev,
   struct ccw1 *ccw, int index)
 {
@@ -495,6 +505,7 @@ static struct virtqueue *virtio_ccw_setup_vq(struct 
virtio_device *vdev,
 struct ccw1 *ccw)
 {
struct virtio_ccw_device *vcdev = to_vc_device(vdev);
+   bool (*notify)(struct virtqueue *vq);
int err;
struct virtqueue *vq = NULL;
struct virtio_ccw_vq_info *info;
@@ -502,6 +513,11 @@ static struct virtqueue *virtio_ccw_setup_vq(struct 
virtio_device *vdev,
unsigned long flags;
bool may_reduce;
 
+   if (__virtio_test_bit(vdev, VIRTIO_F_NOTIFICATION_DATA))
+   notify = virtio_ccw_kvm_notify_with_data;
+   else
+   notify = virtio_ccw_kvm_notify;
+
/* Allocate queue. */
info = kzalloc(sizeof(struct virtio_ccw_vq_info), GFP_KERNEL);
if (!info) {
@@ -524,7 +540,7 @@ static struct virtqueue *virtio_ccw_setup_vq(struct 
virtio_device *vdev,
may_reduce = vcdev->revision > 0;
vq = vring_create_virtqueue(i, info->num, KVM_VIRTIO_CCW_RING_ALIGN,
vdev, true, may_reduce, ctx,
-   virtio_ccw_kvm_notify, callback, name);
+   notify, callback, name);
 
if (!vq) {
/* For now, we fail if we can't get the requested size. */
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 3ff746e3f24..dd4424c1423 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -285,6 +285,16 @@ static bool vm_notify(struct virtqueue *vq)
return true;
 }
 
+static bool vm_notify_with_data(struct virtqueue *vq)
+{
+   struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
+   u32 data = vring_notification_data(vq);
+
+   writel(data, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
+
+   return true;
+}
+
 /* Notify all virtqueues on an interrupt. */
 static irqreturn_t vm_interrupt(int irq, void *opaque)
 {
@@ -363,12 +373,18 @@ static struct virtqueue *vm_setup_vq(struct virtio_device 
*vdev, unsigned int in
  const char *name, bool ctx)
 {
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+   bool (*notify)(struct virtqueue *vq);
struct virtio_mmio_vq_info *info;
struct virtqueue *vq;
unsigned long flags;
unsigned int num;
int err;
 
+   if (__virtio_test_bit(vdev, VIRTIO_F_NOTIFICATION_DATA))
+   notify = vm_notify_with_data;
+   else
+   notify = vm_notify;
+
if (!name)
return 

Re: [PATCH] tools/virtio/ringtest: Replace obsolete memalign() with posix_memalign()

2023-04-13 Thread Xuan Zhuo
On Wed, 12 Apr 2023 03:25:36 -0400, Deming Wang  wrote:
> memalign() is obsolete according to its manpage.
>
> Replace memalign() with posix_memalign() and remove malloc.h include
> that was there for memalign().
>
> As a pointer is passed into posix_memalign(), initialize *p to NULL
> to silence a warning about the function's return value being used as
> uninitialized (which is not valid anyway because the error is properly
> checked before p is returned).
>
> Signed-off-by: Deming Wang 

LGTM

Reviewed-by: Xuan Zhuo 

> ---
>  tools/virtio/ringtest/ptr_ring.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/tools/virtio/ringtest/ptr_ring.c 
> b/tools/virtio/ringtest/ptr_ring.c
> index c9b26335f891..a0bf4978eace 100644
> --- a/tools/virtio/ringtest/ptr_ring.c
> +++ b/tools/virtio/ringtest/ptr_ring.c
> @@ -26,9 +26,12 @@ typedef int gfp_t;
>
>  static void *kmalloc(unsigned size, gfp_t gfp)
>  {
> - void *p = memalign(64, size);
> - if (!p)
> - return p;
> + void *p;
> + int ret;
> +
> + ret = posix_memalign(&p, 64, size);
> + if (ret)
> + return NULL;
>
>   if (gfp & __GFP_ZERO)
>   memset(p, 0, size);
> --
> 2.27.0
>


Re: [PATCH v6] virtio: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-04-13 Thread Alvaro Karsz
> Hmm.  So it seems we need to first apply yours then this patch,
> is that right? Or the other way around? What is the right way to make it not 
> break bisect?
> Do you mind including this patch with yours in a patchset
> in the correct order?

Ok, I'll create a patchset.

Thanks,


[PATCH v4 2/2] vdpa/snet: support the suspend vDPA callback

2023-04-13 Thread Alvaro Karsz
When suspend is called, the driver sends a suspend command to the DPU
through the control mechanism.

Signed-off-by: Alvaro Karsz 
---
 drivers/vdpa/solidrun/snet_ctrl.c |  6 ++
 drivers/vdpa/solidrun/snet_main.c | 15 +++
 drivers/vdpa/solidrun/snet_vdpa.h |  1 +
 3 files changed, 22 insertions(+)

diff --git a/drivers/vdpa/solidrun/snet_ctrl.c 
b/drivers/vdpa/solidrun/snet_ctrl.c
index 10cde502f1a..3858738643b 100644
--- a/drivers/vdpa/solidrun/snet_ctrl.c
+++ b/drivers/vdpa/solidrun/snet_ctrl.c
@@ -15,6 +15,7 @@
 enum snet_ctrl_opcodes {
SNET_CTRL_OP_DESTROY = 1,
SNET_CTRL_OP_READ_VQ_STATE,
+   SNET_CTRL_OP_SUSPEND,
 };
 
 #define SNET_CTRL_TIMEOUT  200
@@ -322,3 +323,8 @@ int snet_read_vq_state(struct snet *snet, u16 idx, struct 
vdpa_vq_state *state)
return snet_ctrl_read_from_dpu(snet, SNET_CTRL_OP_READ_VQ_STATE, idx, 
state,
   sizeof(*state));
 }
+
+int snet_suspend_dev(struct snet *snet)
+{
+   return snet_send_ctrl_msg(snet, SNET_CTRL_OP_SUSPEND, 0);
+}
diff --git a/drivers/vdpa/solidrun/snet_main.c 
b/drivers/vdpa/solidrun/snet_main.c
index 86769f436b4..7359599e09e 100644
--- a/drivers/vdpa/solidrun/snet_main.c
+++ b/drivers/vdpa/solidrun/snet_main.c
@@ -483,6 +483,20 @@ static void snet_set_config(struct vdpa_device *vdev, 
unsigned int offset,
iowrite8(*buf_ptr++, cfg_ptr + i);
 }
 
+static int snet_suspend(struct vdpa_device *vdev)
+{
+   struct snet *snet = vdpa_to_snet(vdev);
+   int ret;
+
+   ret = snet_suspend_dev(snet);
+   if (ret)
+   SNET_ERR(snet->pdev, "SNET[%u] suspend failed, err: %d\n", 
snet->sid, ret);
+   else
+   SNET_DBG(snet->pdev, "Suspend SNET[%u] device\n", snet->sid);
+
+   return ret;
+}
+
 static const struct vdpa_config_ops snet_config_ops = {
.set_vq_address = snet_set_vq_address,
.set_vq_num = snet_set_vq_num,
@@ -508,6 +522,7 @@ static const struct vdpa_config_ops snet_config_ops = {
.set_status = snet_set_status,
.get_config = snet_get_config,
.set_config = snet_set_config,
+   .suspend= snet_suspend,
 };
 
 static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet)
diff --git a/drivers/vdpa/solidrun/snet_vdpa.h 
b/drivers/vdpa/solidrun/snet_vdpa.h
index 09ff676e7a2..3c78d4e7d48 100644
--- a/drivers/vdpa/solidrun/snet_vdpa.h
+++ b/drivers/vdpa/solidrun/snet_vdpa.h
@@ -203,5 +203,6 @@ void psnet_create_hwmon(struct pci_dev *pdev);
 void snet_ctrl_clear(struct snet *snet);
 int snet_destroy_dev(struct snet *snet);
 int snet_read_vq_state(struct snet *snet, u16 idx, struct vdpa_vq_state 
*state);
+int snet_suspend_dev(struct snet *snet);
 
 #endif //_SNET_VDPA_H_
-- 
2.34.1



[PATCH v4 1/2] vdpa/snet: support getting and setting VQ state

2023-04-13 Thread Alvaro Karsz
This patch adds the get_vq_state and set_vq_state vDPA callbacks.

In order to get the VQ state, the state needs to be read from the DPU.
In order to allow that, the old messaging mechanism is replaced with a new,
flexible control mechanism.
This mechanism allows to read data from the DPU.

The mechanism can be used if the negotiated config version is 2 or
higher.

If the new mechanism is used when the config version is 1, it will call
snet_send_ctrl_msg_old, which is config 1 compatible.

Signed-off-by: Alvaro Karsz 
---
 drivers/vdpa/solidrun/Makefile |   1 +
 drivers/vdpa/solidrun/snet_ctrl.c  | 324 +
 drivers/vdpa/solidrun/snet_hwmon.c |   2 +-
 drivers/vdpa/solidrun/snet_main.c  | 112 --
 drivers/vdpa/solidrun/snet_vdpa.h  |  19 +-
 5 files changed, 387 insertions(+), 71 deletions(-)
 create mode 100644 drivers/vdpa/solidrun/snet_ctrl.c

diff --git a/drivers/vdpa/solidrun/Makefile b/drivers/vdpa/solidrun/Makefile
index c0aa3415bf7..9116252cd5f 100644
--- a/drivers/vdpa/solidrun/Makefile
+++ b/drivers/vdpa/solidrun/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_SNET_VDPA) += snet_vdpa.o
 snet_vdpa-$(CONFIG_SNET_VDPA) += snet_main.o
+snet_vdpa-$(CONFIG_SNET_VDPA) += snet_ctrl.o
 ifdef CONFIG_HWMON
 snet_vdpa-$(CONFIG_SNET_VDPA) += snet_hwmon.o
 endif
diff --git a/drivers/vdpa/solidrun/snet_ctrl.c 
b/drivers/vdpa/solidrun/snet_ctrl.c
new file mode 100644
index 000..10cde502f1a
--- /dev/null
+++ b/drivers/vdpa/solidrun/snet_ctrl.c
@@ -0,0 +1,324 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * SolidRun DPU driver for control plane
+ *
+ * Copyright (C) 2022-2023 SolidRun
+ *
+ * Author: Alvaro Karsz 
+ *
+ */
+
+#include 
+
+#include "snet_vdpa.h"
+
+enum snet_ctrl_opcodes {
+   SNET_CTRL_OP_DESTROY = 1,
+   SNET_CTRL_OP_READ_VQ_STATE,
+};
+
+#define SNET_CTRL_TIMEOUT  200
+
+#define SNET_CTRL_DATA_SIZE_MASK   0x
+#define SNET_CTRL_IN_PROCESS_MASK  0x0001
+#define SNET_CTRL_CHUNK_RDY_MASK   0x0002
+#define SNET_CTRL_ERROR_MASK   0x0FFC
+
+#define SNET_VAL_TO_ERR(val)   (-(((val) & SNET_CTRL_ERROR_MASK) >> 18))
+#define SNET_EMPTY_CTRL(val)   (((val) & SNET_CTRL_ERROR_MASK) || \
+   !((val) & SNET_CTRL_IN_PROCESS_MASK))
+#define SNET_DATA_READY(val)   ((val) & (SNET_CTRL_ERROR_MASK | SNET_CTRL_CHUNK_RDY_MASK))
+
+/* Control register used to read data from the DPU */
+struct snet_ctrl_reg_ctrl {
+   /* Chunk size in 4B words */
+   u16 data_size;
+   /* We are in the middle of a command */
+   u16 in_process:1;
+   /* A data chunk is ready and can be consumed */
+   u16 chunk_ready:1;
+   /* Error code */
+   u16 error:10;
+   /* Saved for future usage */
+   u16 rsvd:4;
+};
+
+/* Opcode register */
+struct snet_ctrl_reg_op {
+   u16 opcode;
+   /* Only if VQ index is relevant for the command */
+   u16 vq_idx;
+};
+
+struct snet_ctrl_regs {
+   struct snet_ctrl_reg_op op;
+   struct snet_ctrl_reg_ctrl ctrl;
+   u32 rsvd;
+   u32 data[];
+};
+
+static struct snet_ctrl_regs __iomem *snet_get_ctrl(struct snet *snet)
+{
+   return snet->bar + snet->psnet->cfg.ctrl_off;
+}
+
+static int snet_wait_for_empty_ctrl(struct snet_ctrl_regs __iomem *regs)
+{
+   u32 val;
+
+   return readx_poll_timeout(ioread32, &regs->ctrl, val, SNET_EMPTY_CTRL(val), 10,
+ SNET_CTRL_TIMEOUT);
+}
+
+static int snet_wait_for_empty_op(struct snet_ctrl_regs __iomem *regs)
+{
+   u32 val;
+
+   return readx_poll_timeout(ioread32, &regs->op, val, !val, 10, SNET_CTRL_TIMEOUT);
+}
+
+static int snet_wait_for_data(struct snet_ctrl_regs __iomem *regs)
+{
+   u32 val;
+
+   return readx_poll_timeout(ioread32, &regs->ctrl, val, SNET_DATA_READY(val), 10,
+ SNET_CTRL_TIMEOUT);
+}
+
+static u32 snet_read32_word(struct snet_ctrl_regs __iomem *ctrl_regs, u16 
word_idx)
+{
+   return ioread32(&ctrl_regs->data[word_idx]);
+}
+
+static u32 snet_read_ctrl(struct snet_ctrl_regs __iomem *ctrl_regs)
+{
+   return ioread32(&ctrl_regs->ctrl);
+}
+
+static void snet_write_ctrl(struct snet_ctrl_regs __iomem *ctrl_regs, u32 val)
+{
+   iowrite32(val, &ctrl_regs->ctrl);
+}
+
+static void snet_write_op(struct snet_ctrl_regs __iomem *ctrl_regs, u32 val)
+{
+   iowrite32(val, &ctrl_regs->op);
+}
+
+static int snet_wait_for_dpu_completion(struct snet_ctrl_regs __iomem 
*ctrl_regs)
+{
+   /* Wait until the DPU finishes completely.
+* It will clear the opcode register.
+*/
+   return snet_wait_for_empty_op(ctrl_regs);
+}
+
+/* Reading ctrl from the DPU:
+ * buf_size must be 4B aligned
+ *
+ * Steps:
+ *
+ * (1) Verify that the DPU is not in the middle of another operation by
+ * reading the in_process and error bits in the control register.
+ * (2) Write the request opcode and the VQ idx in 

[PATCH v4 0/2] vdpa/snet: support [s/g]et_vq_state and suspend

2023-04-13 Thread Alvaro Karsz
Add more vDPA callbacks.

[s/g]et_vq_state is added in patch 1, including a new control mechanism
to read data from the DPU.

suspend is added in patch 2.

Link to v1:
https://lore.kernel.org/virtualization/20230402125219.1084754-1-alvaro.ka...@solid-run.com/
Link to v2:
https://lore.kernel.org/virtualization/20230409091024.3437405-1-alvaro.ka...@solid-run.com/
Link to v3:
https://lore.kernel.org/virtualization/20230410100237.3499578-1-alvaro.ka...@solid-run.com/

Changelog:

v2:
- Remove wmb() from snet_ctrl_read_from_dpu - Patch 1.
v3:
- Usage of spinlocks to serialize access to control registers -
  Patch 1.
v4:
- Rename the struct vdpa_vq_state member of struct snet_vq from
  initial_state to vq_state - Patch 1.

Alvaro Karsz (2):
  vdpa/snet: support getting and setting VQ state
  vdpa/snet: support the suspend vDPA callback

 drivers/vdpa/solidrun/Makefile |   1 +
 drivers/vdpa/solidrun/snet_ctrl.c  | 330 +
 drivers/vdpa/solidrun/snet_hwmon.c |   2 +-
 drivers/vdpa/solidrun/snet_main.c  | 127 ++-
 drivers/vdpa/solidrun/snet_vdpa.h  |  20 +-
 5 files changed, 409 insertions(+), 71 deletions(-)
 create mode 100644 drivers/vdpa/solidrun/snet_ctrl.c

-- 
2.34.1



Re: [PATCH net-next V2 2/2] virtio-net: sleep instead of busy waiting for cvq command

2023-04-13 Thread Xuan Zhuo
On Thu, 13 Apr 2023 14:40:27 +0800, Jason Wang  wrote:
> We used to busy wait on the cvq command; this tends to be
> problematic since there is no way to schedule another process which
> may serve the control virtqueue. This might be the case when the
> control virtqueue is emulated by software. This patch switches to
> using a completion to allow the CPU to sleep instead of busy waiting
> for the cvq command.
>
> Signed-off-by: Jason Wang 
> ---
> Changes since V1:
> - use completion for simplicity
> - don't try to harden the CVQ command which requires more thought
> Changes since RFC:
> - break the device on timeout
> - get the buffer manually since the virtio core checks more_used() instead
> ---
>  drivers/net/virtio_net.c | 21 ++---
>  1 file changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2e56bbf86894..d3eb8fd6c9dc 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -295,6 +296,8 @@ struct virtnet_info {
>
>   /* failover when STANDBY feature enabled */
>   struct failover *failover;
> +
> + struct completion completion;
>  };
>
>  struct padded_vnet_hdr {
> @@ -1709,6 +1712,13 @@ static bool try_fill_recv(struct virtnet_info *vi, 
> struct receive_queue *rq,
>   return !oom;
>  }
>
> +static void virtnet_cvq_done(struct virtqueue *cvq)
> +{
> + struct virtnet_info *vi = cvq->vdev->priv;
> +
> + complete(&vi->completion);
> +}
> +
>  static void skb_recv_done(struct virtqueue *rvq)
>  {
>   struct virtnet_info *vi = rvq->vdev->priv;
> @@ -2169,12 +2179,8 @@ static bool virtnet_send_command(struct virtnet_info 
> *vi, u8 class, u8 cmd,
>   if (unlikely(!virtqueue_kick(vi->cvq)))
>   return vi->ctrl->status == VIRTIO_NET_OK;
>
> - /* Spin for a response, the kick causes an ioport write, trapping
> -  * into the hypervisor, so the request should be handled immediately.
> -  */
> - while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> -!virtqueue_is_broken(vi->cvq))
> - cpu_relax();
> + wait_for_completion(&vi->completion);
> + virtqueue_get_buf(vi->cvq, &tmp);
>
>   return vi->ctrl->status == VIRTIO_NET_OK;
>  }
> @@ -3672,7 +3678,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
>
>   /* Parameters for control virtqueue, if any */
>   if (vi->has_cvq) {
> - callbacks[total_vqs - 1] = NULL;
> + callbacks[total_vqs - 1] = virtnet_cvq_done;

This depends on the interrupt, right?

I worry that there may be some devices that do not support interrupts on the cvq.
Although this may not be in line with the spec, it may cause problems on devices
that work normally at present.

Thanks.


>   names[total_vqs - 1] = "control";
>   }
>
> @@ -4122,6 +4128,7 @@ static int virtnet_probe(struct virtio_device *vdev)
>   if (vi->has_rss || vi->has_rss_hash_report)
>   virtnet_init_default_rss(vi);
>
> + init_completion(&vi->completion);
>   enable_rx_mode_work(vi);
>
>   /* serialize netdev register + virtio_device_ready() with ndo_open() */
> --
> 2.25.1
>


[PATCH] MAINTAINERS: make me a reviewer of VIRTIO CORE AND NET DRIVERS

2023-04-13 Thread Xuan Zhuo
First of all, I personally love open source, Linux and virtio, and I
have participated in community work such as virtio for a long time.

I think I am familiar enough with virtio/virtio-net and am adequate as a
reviewer.

Whenever there is a patch or bug, I would like to get pinged so that I
can give feedback on it.

For me personally, being a reviewer is an honor and a responsibility,
and it also makes it easier for me to participate in virtio-related
work. I will spend more time reviewing virtio patches to better advance
virtio development.

I have made some contributions to virtio/virtio-net and provided some
support for it.

* per-queue reset
* virtio-net xdp
* some bug fix
* ..

I humbly request to be granted the reviewer role for the virtio core
and net drivers.

Signed-off-by: Xuan Zhuo 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index cacd6074fb89..700b00a9e225 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22064,6 +22064,7 @@ F:  include/uapi/linux/virtio_console.h
 VIRTIO CORE AND NET DRIVERS
 M: "Michael S. Tsirkin" 
 M: Jason Wang 
+R: Xuan Zhuo 
 L: virtualization@lists.linux-foundation.org
 S: Maintained
 F: Documentation/ABI/testing/sysfs-bus-vdpa
-- 
2.32.0.3.g01195cf9f



[PATCH net-next V2 2/2] virtio-net: sleep instead of busy waiting for cvq command

2023-04-13 Thread Jason Wang
We used to busy wait on the cvq command; this tends to be
problematic since there is no way to schedule another process which
may serve the control virtqueue. This might be the case when the
control virtqueue is emulated by software. This patch switches to
using a completion to allow the CPU to sleep instead of busy waiting
for the cvq command.

Signed-off-by: Jason Wang 
---
Changes since V1:
- use completion for simplicity
- don't try to harden the CVQ command which requires more thought
Changes since RFC:
- break the device when timeout
- get buffer manually since the virtio core check more_used() instead
---
 drivers/net/virtio_net.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2e56bbf86894..d3eb8fd6c9dc 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -295,6 +296,8 @@ struct virtnet_info {
 
/* failover when STANDBY feature enabled */
struct failover *failover;
+
+   struct completion completion;
 };
 
 struct padded_vnet_hdr {
@@ -1709,6 +1712,13 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
return !oom;
 }
 
+static void virtnet_cvq_done(struct virtqueue *cvq)
+{
+   struct virtnet_info *vi = cvq->vdev->priv;
+
+   complete(&vi->completion);
+}
+
 static void skb_recv_done(struct virtqueue *rvq)
 {
struct virtnet_info *vi = rvq->vdev->priv;
@@ -2169,12 +2179,8 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
if (unlikely(!virtqueue_kick(vi->cvq)))
return vi->ctrl->status == VIRTIO_NET_OK;
 
-   /* Spin for a response, the kick causes an ioport write, trapping
-* into the hypervisor, so the request should be handled immediately.
-*/
-   while (!virtqueue_get_buf(vi->cvq, &tmp) &&
-  !virtqueue_is_broken(vi->cvq))
-   cpu_relax();
+   wait_for_completion(&vi->completion);
+   virtqueue_get_buf(vi->cvq, &tmp);
 
return vi->ctrl->status == VIRTIO_NET_OK;
 }
@@ -3672,7 +3678,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
 
/* Parameters for control virtqueue, if any */
if (vi->has_cvq) {
-   callbacks[total_vqs - 1] = NULL;
+   callbacks[total_vqs - 1] = virtnet_cvq_done;
names[total_vqs - 1] = "control";
}
 
@@ -4122,6 +4128,7 @@ static int virtnet_probe(struct virtio_device *vdev)
if (vi->has_rss || vi->has_rss_hash_report)
virtnet_init_default_rss(vi);
 
+   init_completion(&vi->completion);
enable_rx_mode_work(vi);
 
/* serialize netdev register + virtio_device_ready() with ndo_open() */
-- 
2.25.1


[PATCH net-next V2 1/2] virtio-net: convert rx mode setting to use workqueue

2023-04-13 Thread Jason Wang
This patch converts rx mode setting to be done in a workqueue; this is
a must to allow sleeping while waiting for the cvq command to
respond, since the current code is executed under the addr spin lock.

Signed-off-by: Jason Wang 
---
Changes since V1:
- use RTNL to synchronize rx mode worker
---
 drivers/net/virtio_net.c | 55 +---
 1 file changed, 52 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e2560b6f7980..2e56bbf86894 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -265,6 +265,12 @@ struct virtnet_info {
/* Work struct for config space updates */
struct work_struct config_work;
 
+   /* Work struct for config rx mode */
+   struct work_struct rx_mode_work;
+
+   /* Is rx mode work enabled? */
+   bool rx_mode_work_enabled;
+
/* Does the affinity hint is set for virtqueues? */
bool affinity_hint_set;
 
@@ -388,6 +394,20 @@ static void disable_delayed_refill(struct virtnet_info *vi)
spin_unlock_bh(&vi->refill_lock);
 }
 
+static void enable_rx_mode_work(struct virtnet_info *vi)
+{
+   rtnl_lock();
+   vi->rx_mode_work_enabled = true;
+   rtnl_unlock();
+}
+
+static void disable_rx_mode_work(struct virtnet_info *vi)
+{
+   rtnl_lock();
+   vi->rx_mode_work_enabled = false;
+   rtnl_unlock();
+}
+
 static void virtqueue_napi_schedule(struct napi_struct *napi,
struct virtqueue *vq)
 {
@@ -2310,9 +2330,11 @@ static int virtnet_close(struct net_device *dev)
return 0;
 }
 
-static void virtnet_set_rx_mode(struct net_device *dev)
+static void virtnet_rx_mode_work(struct work_struct *work)
 {
-   struct virtnet_info *vi = netdev_priv(dev);
+   struct virtnet_info *vi =
+   container_of(work, struct virtnet_info, rx_mode_work);
+   struct net_device *dev = vi->dev;
struct scatterlist sg[2];
struct virtio_net_ctrl_mac *mac_data;
struct netdev_hw_addr *ha;
@@ -2325,6 +2347,8 @@ static void virtnet_set_rx_mode(struct net_device *dev)
if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX))
return;
 
+   rtnl_lock();
+
vi->ctrl->promisc = ((dev->flags & IFF_PROMISC) != 0);
vi->ctrl->allmulti = ((dev->flags & IFF_ALLMULTI) != 0);
 
@@ -2342,14 +2366,19 @@ static void virtnet_set_rx_mode(struct net_device *dev)
dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n",
 vi->ctrl->allmulti ? "en" : "dis");
 
+   netif_addr_lock_bh(dev);
+
uc_count = netdev_uc_count(dev);
mc_count = netdev_mc_count(dev);
/* MAC filter - use one buffer for both lists */
buf = kzalloc(((uc_count + mc_count) * ETH_ALEN) +
  (2 * sizeof(mac_data->entries)), GFP_ATOMIC);
mac_data = buf;
-   if (!buf)
+   if (!buf) {
+   netif_addr_unlock_bh(dev);
+   rtnl_unlock();
return;
+   }
 
sg_init_table(sg, 2);
 
@@ -2370,6 +2399,8 @@ static void virtnet_set_rx_mode(struct net_device *dev)
netdev_for_each_mc_addr(ha, dev)
memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
 
+   netif_addr_unlock_bh(dev);
+
sg_set_buf(&sg[1], mac_data,
   sizeof(mac_data->entries) + (mc_count * ETH_ALEN));
 
@@ -2377,9 +2408,19 @@ static void virtnet_set_rx_mode(struct net_device *dev)
  VIRTIO_NET_CTRL_MAC_TABLE_SET, sg))
dev_warn(&dev->dev, "Failed to set MAC filter table.\n");
 
+   rtnl_unlock();
+
kfree(buf);
 }
 
+static void virtnet_set_rx_mode(struct net_device *dev)
+{
+   struct virtnet_info *vi = netdev_priv(dev);
+
+   if (vi->rx_mode_work_enabled)
schedule_work(&vi->rx_mode_work);
+}
+
 static int virtnet_vlan_rx_add_vid(struct net_device *dev,
   __be16 proto, u16 vid)
 {
@@ -3150,6 +3191,8 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
 
/* Make sure no work handler is accessing the device */
flush_work(&vi->config_work);
+   disable_rx_mode_work(vi);
+   flush_work(&vi->rx_mode_work);
 
netif_tx_lock_bh(vi->dev);
netif_device_detach(vi->dev);
@@ -3172,6 +3215,7 @@ static int virtnet_restore_up(struct virtio_device *vdev)
virtio_device_ready(vdev);
 
enable_delayed_refill(vi);
+   enable_rx_mode_work(vi);
 
if (netif_running(vi->dev)) {
err = virtnet_open(vi->dev);
@@ -3969,6 +4013,7 @@ static int virtnet_probe(struct virtio_device *vdev)
vdev->priv = vi;
 
INIT_WORK(&vi->config_work, virtnet_config_changed_work);
+   INIT_WORK(&vi->rx_mode_work, virtnet_rx_mode_work);
spin_lock_init(&vi->refill_lock);
 
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) {
@@ -4077,6 +4122,8 @@ static int virtnet_probe(struct 

[PATCH net-next V2 0/2] virtio-net: don't busy poll for cvq command

2023-04-13 Thread Jason Wang
Hi all:

The code used to busy poll for the cvq command, which turns out to have
several side effects:

1) infinite polling for buggy devices
2) bad interaction with the scheduler

So this series tries to sleep instead of busy polling. In this
version, I take a step back: the hardening part is not implemented and
is left for future investigation. We used to agree on interruptible
sleep, but it doesn't work for a general workqueue.

Please review.

Thanks

Changes since V1:
- use RTNL to synchronize rx mode worker
- use completion for simplicity
- don't try to harden CVQ command

Changes since RFC:

- switch to use BAD_RING in virtio_break_device()
- check virtqueue_is_broken() after being woken up
- use more_used() instead of virtqueue_get_buf() to allow caller to
  get buffers afterwards
  - break the virtio-net device when timeout
  - get buffer manually since the virtio core check more_used() instead

Jason Wang (2):
  virtio-net: convert rx mode setting to use workqueue
  virtio-net: sleep instead of busy waiting for cvq command

 drivers/net/virtio_net.c | 76 ++--
 1 file changed, 66 insertions(+), 10 deletions(-)

-- 
2.25.1
