Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-20 Thread Michael S. Tsirkin
On Thu, Jul 20, 2023 at 01:26:20PM -0700, Shannon Nelson wrote:
> On 7/20/23 1:38 AM, Jason Wang wrote:
> > 
> > Adding cond_resched() to the command waiting loop for better
> > cooperation with the scheduler. This gives the CPU a chance to run
> > other tasks (e.g. workqueues) instead of busy looping when preemption
> > is not allowed on a device whose CVQ might be slow.
> > 
> > Signed-off-by: Jason Wang 
> 
> This still leaves hung processes, but at least it doesn't pin the CPU any
> more.  Thanks.
> Reviewed-by: Shannon Nelson 
> 

I'd like to see a full solution:
1- block until interrupt
2- still handle surprise removal correctly by waking up the waiter in that case (rough sketch below)
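
Something along these lines maybe - a completely untested sketch, where
the cvq_done completion and the control-vq callback are assumptions
(virtio-net has neither today), and the removal/broken path would also
have to complete() the waiter:

	/* hypothetical: registered as the control virtqueue's callback */
	static void virtnet_cvq_done(struct virtqueue *cvq)
	{
		struct virtnet_info *vi = cvq->vdev->priv;

		complete(&vi->cvq_done);	/* assumed new struct completion field */
	}

	static bool virtnet_wait_command(struct virtnet_info *vi)
	{
		unsigned int tmp;

		for (;;) {
			if (virtqueue_get_buf(vi->cvq, &tmp))
				return vi->ctrl->status == VIRTIO_NET_OK;
			/* surprise removal: vq marked broken and waiter woken */
			if (virtqueue_is_broken(vi->cvq))
				return false;
			/* block until the interrupt (or wake-up) arrives */
			wait_for_completion(&vi->cvq_done);
		}
	}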



> > ---
> >   drivers/net/virtio_net.c | 4 +++-
> >   1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 9f3b1d6ac33d..e7533f29b219 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -2314,8 +2314,10 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
> >  	 * into the hypervisor, so the request should be handled immediately.
> >  	 */
> >  	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > -	       !virtqueue_is_broken(vi->cvq))
> > +	       !virtqueue_is_broken(vi->cvq)) {
> > +		cond_resched();
> >  		cpu_relax();
> > +	}
> > 
> >  return vi->ctrl->status == VIRTIO_NET_OK;
> >   }
> > --
> > 2.39.3
> > 



Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-20 Thread Michael S. Tsirkin
On Thu, Jul 20, 2023 at 08:31:13AM -0700, Shannon Nelson wrote:
> On 7/20/23 1:38 AM, Jason Wang wrote:
> > 
> > Adding cond_resched() to the command waiting loop for better
> > cooperation with the scheduler. This gives the CPU a chance to run
> > other tasks (e.g. workqueues) instead of busy looping when preemption
> > is not allowed on a device whose CVQ might be slow.
> > 
> > Signed-off-by: Jason Wang 
> > ---
> >   drivers/net/virtio_net.c | 4 +++-
> >   1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 9f3b1d6ac33d..e7533f29b219 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -2314,8 +2314,10 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
> >  	 * into the hypervisor, so the request should be handled immediately.
> >  	 */
> >  	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > -	       !virtqueue_is_broken(vi->cvq))
> > +	       !virtqueue_is_broken(vi->cvq)) {
> > +		cond_resched();
> >  		cpu_relax();
> > +	}
> 
> The cover letter suggests that this addresses the infinite poll for buggy
> devices, but I don't see how that is resolved here.  This should make it a
> little nicer to the system, but it still is going to poll forever on a
> device that has gone catatonic.  Is there a reason that I'm missing that we
> don't have a polling limit here?
> 
> sln

We don't know what the limit would be. But given it's a workqueue
now, why does it still have to poll as opposed to blocking?


> > 
> >  return vi->ctrl->status == VIRTIO_NET_OK;
> >   }
> > --
> > 2.39.3
> > 



Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-20 Thread Shannon Nelson via Virtualization

On 7/20/23 1:38 AM, Jason Wang wrote:


Adding cond_resched() to the command waiting loop for better
cooperation with the scheduler. This gives the CPU a chance to run
other tasks (e.g. workqueues) instead of busy looping when preemption
is not allowed on a device whose CVQ might be slow.

Signed-off-by: Jason Wang 


This still leaves hung processes, but at least it doesn't pin the CPU 
any more.  Thanks.


Reviewed-by: Shannon Nelson 



---
  drivers/net/virtio_net.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9f3b1d6ac33d..e7533f29b219 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2314,8 +2314,10 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
 	 * into the hypervisor, so the request should be handled immediately.
 	 */
 	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
-	       !virtqueue_is_broken(vi->cvq))
+	       !virtqueue_is_broken(vi->cvq)) {
+		cond_resched();
 		cpu_relax();
+	}

 return vi->ctrl->status == VIRTIO_NET_OK;
  }
--
2.39.3




Re: [PATCH net-next v4 1/2] virtio-net: convert rx mode setting to use workqueue

2023-07-20 Thread Shannon Nelson via Virtualization

On 7/20/23 1:38 AM, Jason Wang wrote:


This patch converts rx mode setting to be done in a workqueue; this is
a prerequisite for being able to sleep while waiting for the cvq
command to respond, since the current code is executed under the addr
spin lock.

Note that we need to disable and flush the workqueue during freeze,
which means the rx mode setting is lost after resuming. This is not a
bug introduced by this patch, as we never tried to restore the rx mode
setting during resume.

Signed-off-by: Jason Wang 


Reviewed-by: Shannon Nelson 


---
  drivers/net/virtio_net.c | 55 +---
  1 file changed, 52 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d67b36fdba0d..9f3b1d6ac33d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -274,6 +274,12 @@ struct virtnet_info {
 /* Work struct for config space updates */
 struct work_struct config_work;

+   /* Work struct for setting rx mode */
+   struct work_struct rx_mode_work;
+
+   /* OK to queue work setting RX mode? */
+   bool rx_mode_work_enabled;
+
 /* Does the affinity hint is set for virtqueues? */
 bool affinity_hint_set;

@@ -397,6 +403,20 @@ static void disable_delayed_refill(struct virtnet_info *vi)
 	spin_unlock_bh(&vi->refill_lock);
  }

+static void enable_rx_mode_work(struct virtnet_info *vi)
+{
+   rtnl_lock();
+   vi->rx_mode_work_enabled = true;
+   rtnl_unlock();
+}
+
+static void disable_rx_mode_work(struct virtnet_info *vi)
+{
+   rtnl_lock();
+   vi->rx_mode_work_enabled = false;
+   rtnl_unlock();
+}
+
  static void virtqueue_napi_schedule(struct napi_struct *napi,
 struct virtqueue *vq)
  {
@@ -2448,9 +2468,11 @@ static int virtnet_close(struct net_device *dev)
 return 0;
  }

-static void virtnet_set_rx_mode(struct net_device *dev)
+static void virtnet_rx_mode_work(struct work_struct *work)
  {
-   struct virtnet_info *vi = netdev_priv(dev);
+   struct virtnet_info *vi =
+   container_of(work, struct virtnet_info, rx_mode_work);
+   struct net_device *dev = vi->dev;
 struct scatterlist sg[2];
 struct virtio_net_ctrl_mac *mac_data;
 struct netdev_hw_addr *ha;
@@ -2463,6 +2485,8 @@ static void virtnet_set_rx_mode(struct net_device *dev)
 if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX))
 return;

+   rtnl_lock();
+
 vi->ctrl->promisc = ((dev->flags & IFF_PROMISC) != 0);
 vi->ctrl->allmulti = ((dev->flags & IFF_ALLMULTI) != 0);

@@ -2480,14 +2504,19 @@ static void virtnet_set_rx_mode(struct net_device *dev)
 		dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n",
  vi->ctrl->allmulti ? "en" : "dis");

+   netif_addr_lock_bh(dev);
+
 uc_count = netdev_uc_count(dev);
 mc_count = netdev_mc_count(dev);
 /* MAC filter - use one buffer for both lists */
 buf = kzalloc(((uc_count + mc_count) * ETH_ALEN) +
   (2 * sizeof(mac_data->entries)), GFP_ATOMIC);
 mac_data = buf;
-   if (!buf)
+   if (!buf) {
+   netif_addr_unlock_bh(dev);
+   rtnl_unlock();
 return;
+   }

 sg_init_table(sg, 2);

@@ -2508,6 +2537,8 @@ static void virtnet_set_rx_mode(struct net_device *dev)
 netdev_for_each_mc_addr(ha, dev)
 		memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);

+   netif_addr_unlock_bh(dev);
+
 	sg_set_buf(&sg[1], mac_data,
sizeof(mac_data->entries) + (mc_count * ETH_ALEN));

@@ -2515,9 +2546,19 @@ static void virtnet_set_rx_mode(struct net_device *dev)
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
 				  VIRTIO_NET_CTRL_MAC_TABLE_SET, sg))
 		dev_warn(&dev->dev, "Failed to set MAC filter table.\n");

+   rtnl_unlock();
+
 kfree(buf);
  }

+static void virtnet_set_rx_mode(struct net_device *dev)
+{
+   struct virtnet_info *vi = netdev_priv(dev);
+
+   if (vi->rx_mode_work_enabled)
+   schedule_work(>rx_mode_work);
+}
+
  static int virtnet_vlan_rx_add_vid(struct net_device *dev,
__be16 proto, u16 vid)
  {
@@ -3286,6 +3327,8 @@ static void virtnet_freeze_down(struct virtio_device *vdev)

 /* Make sure no work handler is accessing the device */
 	flush_work(&vi->config_work);
+   disable_rx_mode_work(vi);
+	flush_work(&vi->rx_mode_work);

 netif_tx_lock_bh(vi->dev);
 netif_device_detach(vi->dev);
@@ -3308,6 +3351,7 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 virtio_device_ready(vdev);

 enable_delayed_refill(vi);
+   enable_rx_mode_work(vi);

 if (netif_running(vi->dev)) {
 err = virtnet_open(vi->dev);
@@ -4121,6 +4165,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 vdev->priv = vi;

 

Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-20 Thread Michael S. Tsirkin
On Wed, Jul 19, 2023 at 11:22:42PM -0700, Christoph Hellwig wrote:
> On Thu, Jul 13, 2023 at 10:51:59AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Jul 13, 2023 at 04:15:16AM -0700, Christoph Hellwig wrote:
> > > On Mon, Jul 10, 2023 at 11:42:32AM +0800, Xuan Zhuo wrote:
> > > > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > > > caller can do dma operation in advance. The purpose is to keep memory
> > > > mapped across multiple add/get buf operations.
> > > 
> > > This is just poking holes into the abstraction..
> > 
> > More specifically?
> 
> Because now you expose a device that can't be used for the non-dma
> mapping case and should be hidden.


Ah, ok.
Well, I think we can add wrappers like virtio_dma_sync and so on.
These are NOPs for non-dma, so passing the dma device is harmless.
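
E.g. something like this rough sketch - the name is made up and it pokes
at virtio_ring.c internals (to_vvq()/vring_dma_dev()), so treat it as an
illustration rather than a proposal:

	void virtqueue_dma_sync_single_for_cpu(struct virtqueue *_vq,
					       dma_addr_t addr, size_t size,
					       enum dma_data_direction dir)
	{
		struct vring_virtqueue *vq = to_vvq(_vq);

		/* NOP when this vq does not go through the DMA API */
		if (!vq->use_dma_api)
			return;

		dma_sync_single_for_cpu(vring_dma_dev(vq), addr, size, dir);
	}
	EXPORT_SYMBOL_GPL(virtqueue_dma_sync_single_for_cpu);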

-- 
MST



Re: [PATCH v1] virtio-pci: Fix legacy device flag setting error in probe

2023-07-20 Thread Michael S. Tsirkin
On Thu, Jul 20, 2023 at 10:27:04AM +0800, Jason Wang wrote:
> On Wed, Jul 19, 2023 at 11:46 PM Feng Liu  wrote:
> >
> > The 'is_legacy' flag is used to differentiate between legacy and modern
> > devices. Currently, it is based on the value of vp_dev->ldev.ioaddr.
> > However, because struct virtio_pci_legacy_device and struct
> > virtio_pci_modern_device share memory through a union, when
> > virtio_pci_modern_probe modifies the content of struct
> > virtio_pci_modern_device it also affects the content of struct
> > virtio_pci_legacy_device, and ldev.ioaddr is no longer zero, causing
> > the 'is_legacy' flag to be set to true. To resolve the issue, mark
> > 'is_legacy' as true when a legacy device is probed and keep it false
> > when a modern device is probed.
> >
> > Fixes: 4f0fc22534e3 ("virtio_pci: Optimize virtio_pci_device structure size")
> > Signed-off-by: Feng Liu 
> > Reviewed-by: Parav Pandit 
> > Reviewed-by: Jiri Pirko 
> > ---
> >  drivers/virtio/virtio_pci_common.c | 2 --
> >  drivers/virtio/virtio_pci_legacy.c | 1 +
> >  2 files changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > index a6c86f916dbd..c2524a7207cf 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -557,8 +557,6 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
> >
> > pci_set_master(pci_dev);
> >
> > -   vp_dev->is_legacy = vp_dev->ldev.ioaddr ? true : false;
> > -
> > rc = register_virtio_device(&vp_dev->vdev);
> > reg_dev = vp_dev;
> > if (rc)
> > diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> > index 2257f1b3d8ae..d9cbb02b35a1 100644
> > --- a/drivers/virtio/virtio_pci_legacy.c
> > +++ b/drivers/virtio/virtio_pci_legacy.c
> > @@ -223,6 +223,7 @@ int virtio_pci_legacy_probe(struct virtio_pci_device *vp_dev)
> > vp_dev->config_vector = vp_config_vector;
> > vp_dev->setup_vq = setup_vq;
> > vp_dev->del_vq = del_vq;
> > +   vp_dev->is_legacy = true;
> 
> This seems to break force_legacy for modern devices:
> 
> if (force_legacy) {
> rc = virtio_pci_legacy_probe(vp_dev);
> /* Also try modern mode if we can't map BAR0 (no IO space). */
> if (rc == -ENODEV || rc == -ENOMEM)
> rc = virtio_pci_modern_probe(vp_dev);
> 
> Thanks

I don't see the breakage here - can you explain a bit more?

> >
> > return 0;
> >  }
> > --
> > 2.37.1 (Apple Git-137.1)
> >


Re: [PATCH v1] virtio-pci: Fix legacy device flag setting error in probe

2023-07-20 Thread Feng Liu via Virtualization



On 2023-07-19 p.m.10:27, Jason Wang wrote:



On Wed, Jul 19, 2023 at 11:46 PM Feng Liu  wrote:


The 'is_legacy' flag is used to differentiate between legacy and modern
devices. Currently, it is based on the value of vp_dev->ldev.ioaddr.
However, because struct virtio_pci_legacy_device and struct
virtio_pci_modern_device share memory through a union, when
virtio_pci_modern_probe modifies the content of struct
virtio_pci_modern_device it also affects the content of struct
virtio_pci_legacy_device, and ldev.ioaddr is no longer zero, causing
the 'is_legacy' flag to be set to true. To resolve the issue, mark
'is_legacy' as true when a legacy device is probed and keep it false
when a modern device is probed.

Fixes: 4f0fc22534e3 ("virtio_pci: Optimize virtio_pci_device structure size")
Signed-off-by: Feng Liu 
Reviewed-by: Parav Pandit 
Reviewed-by: Jiri Pirko 
---
  drivers/virtio/virtio_pci_common.c | 2 --
  drivers/virtio/virtio_pci_legacy.c | 1 +
  2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index a6c86f916dbd..c2524a7207cf 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -557,8 +557,6 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,

 pci_set_master(pci_dev);

-   vp_dev->is_legacy = vp_dev->ldev.ioaddr ? true : false;
-
 	rc = register_virtio_device(&vp_dev->vdev);
 reg_dev = vp_dev;
 if (rc)
diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index 2257f1b3d8ae..d9cbb02b35a1 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -223,6 +223,7 @@ int virtio_pci_legacy_probe(struct virtio_pci_device *vp_dev)
 vp_dev->config_vector = vp_config_vector;
 vp_dev->setup_vq = setup_vq;
 vp_dev->del_vq = del_vq;
+   vp_dev->is_legacy = true;


This seems to break force_legacy for modern devices:

 if (force_legacy) {
 rc = virtio_pci_legacy_probe(vp_dev);
 /* Also try modern mode if we can't map BAR0 (no IO space). */
 if (rc == -ENODEV || rc == -ENOMEM)
 rc = virtio_pci_modern_probe(vp_dev);

Thanks



Hi, Jason

In the case of force_legacy, if there is no IO space, the function returns 
directly after vp_legacy_probe() and never runs vp_dev->is_legacy = 
true. Because vp_dev is allocated with kzalloc, vp_dev->is_legacy 
defaults to false, which is what a modern device expects, so 
modern devices are not broken.


What do you think?

int virtio_pci_legacy_probe(struct virtio_pci_device *vp_dev)
{
[...]

rc = vp_legacy_probe(ldev);
if (rc)
return rc;  /* if no IO space, function will return from here */

[...]
vp_dev->is_legacy = true;
}
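
So with this patch the force_legacy path effectively becomes (annotated
sketch of the snippet you quoted, nothing more):

	if (force_legacy) {
		/* sets vp_dev->is_legacy = true only on success */
		rc = virtio_pci_legacy_probe(vp_dev);
		/* Also try modern mode if we can't map BAR0 (no IO space). */
		if (rc == -ENODEV || rc == -ENOMEM)
			/* is_legacy stays false: vp_dev was kzalloc'ed */
			rc = virtio_pci_modern_probe(vp_dev);
	}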




 return 0;
  }
--
2.37.1 (Apple Git-137.1)





Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-20 Thread Shannon Nelson via Virtualization

On 7/20/23 1:38 AM, Jason Wang wrote:


Adding cond_resched() to the command waiting loop for better
cooperation with the scheduler. This gives the CPU a chance to run
other tasks (e.g. workqueues) instead of busy looping when preemption
is not allowed on a device whose CVQ might be slow.

Signed-off-by: Jason Wang 
---
  drivers/net/virtio_net.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9f3b1d6ac33d..e7533f29b219 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2314,8 +2314,10 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
 	 * into the hypervisor, so the request should be handled immediately.
 	 */
 	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
-	       !virtqueue_is_broken(vi->cvq))
+	       !virtqueue_is_broken(vi->cvq)) {
+		cond_resched();
 		cpu_relax();
+	}


The cover letter suggests that this addresses the infinite poll for 
buggy devices, but I don't see how that is resolved here.  This should 
make it a little nicer to the system, but it still is going to poll 
forever on a device that has gone catatonic.  Is there a reason that I'm 
missing that we don't have a polling limit here?
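
What I had in mind is a bounded poll along these lines - roughly what the
RFC version of this series did by breaking the device on timeout; the
100ms budget below is an arbitrary number, purely for illustration:

	unsigned long timeout = jiffies + msecs_to_jiffies(100);
	unsigned int tmp;

	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
	       !virtqueue_is_broken(vi->cvq)) {
		if (time_after(jiffies, timeout)) {
			/* give up on a catatonic device */
			virtio_break_device(vi->vdev);
			return false;
		}
		cond_resched();
		cpu_relax();
	}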


sln



 return vi->ctrl->status == VIRTIO_NET_OK;
  }
--
2.39.3




Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads

2023-07-20 Thread Michael S. Tsirkin
On Thu, Feb 02, 2023 at 05:25:17PM -0600, Mike Christie wrote:
> For vhost workers we use the kthread API which inherits its values from
> and checks against the kthreadd thread. This results in the wrong RLIMITs
> being checked, so while tools like libvirt try to control the number of
> threads based on the nproc rlimit setting we can end up creating more
> threads than the user wanted.
> 
> This patch has us use the vhost_task helpers which will inherit its
> values/checks from the thread that owns the device similar to if we did
> a clone in userspace. The vhost threads will now be counted in the nproc
> rlimits. And we get features like cgroups and mm sharing automatically,
> so we can remove those calls.
> 
> Signed-off-by: Mike Christie 
> Acked-by: Michael S. Tsirkin 



Hi Mike,
So this seems to have caused a measurable regression in networking
performance (about 30%). Take a look here; there's a zip file
with detailed measurements attached:

https://bugzilla.redhat.com/show_bug.cgi?id=603


Could you take a look please?
You can also ask the reporter questions there, assuming you
have or can create a (free) account.



> ---
>  drivers/vhost/vhost.c | 58 ---
>  drivers/vhost/vhost.h |  4 +--
>  2 files changed, 13 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 74378d241f8d..d3c7c37b69a7 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -22,11 +22,11 @@
> >  #include <linux/slab.h>
> >  #include <linux/vmalloc.h>
> >  #include <linux/kthread.h>
> > -#include <linux/cgroup.h>
> >  #include <linux/module.h>
> >  #include <linux/sort.h>
> >  #include <linux/sched/mm.h>
> >  #include <linux/sched/signal.h>
> > +#include <linux/sched/vhost_task.h>
> >  #include <linux/interval_tree_generic.h>
> >  #include <linux/nospec.h>
> >  #include <linux/kcov.h>
> > @@ -256,7 +256,7 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
>* test_and_set_bit() implies a memory barrier.
>*/
> >   llist_add(&work->node, &dev->worker->work_list);
> - wake_up_process(dev->worker->task);
> + wake_up_process(dev->worker->vtsk->task);
>   }
>  }
>  EXPORT_SYMBOL_GPL(vhost_work_queue);
> @@ -336,17 +336,14 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>  static int vhost_worker(void *data)
>  {
>   struct vhost_worker *worker = data;
> - struct vhost_dev *dev = worker->dev;
>   struct vhost_work *work, *work_next;
>   struct llist_node *node;
>  
> - kthread_use_mm(dev->mm);
> -
>   for (;;) {
>   /* mb paired w/ kthread_stop */
>   set_current_state(TASK_INTERRUPTIBLE);
>  
> - if (kthread_should_stop()) {
> + if (vhost_task_should_stop(worker->vtsk)) {
>   __set_current_state(TASK_RUNNING);
>   break;
>   }
> @@ -368,7 +365,7 @@ static int vhost_worker(void *data)
>   schedule();
>   }
>   }
> - kthread_unuse_mm(dev->mm);
> +
>   return 0;
>  }
>  
> @@ -509,31 +506,6 @@ long vhost_dev_check_owner(struct vhost_dev *dev)
>  }
>  EXPORT_SYMBOL_GPL(vhost_dev_check_owner);
>  
> -struct vhost_attach_cgroups_struct {
> - struct vhost_work work;
> - struct task_struct *owner;
> - int ret;
> -};
> -
> -static void vhost_attach_cgroups_work(struct vhost_work *work)
> -{
> - struct vhost_attach_cgroups_struct *s;
> -
> - s = container_of(work, struct vhost_attach_cgroups_struct, work);
> - s->ret = cgroup_attach_task_all(s->owner, current);
> -}
> -
> -static int vhost_attach_cgroups(struct vhost_dev *dev)
> -{
> - struct vhost_attach_cgroups_struct attach;
> -
> - attach.owner = current;
> > - vhost_work_init(&attach.work, vhost_attach_cgroups_work);
> > - vhost_work_queue(dev, &attach.work);
> - vhost_dev_flush(dev);
> - return attach.ret;
> -}
> -
>  /* Caller should have device mutex */
>  bool vhost_dev_has_owner(struct vhost_dev *dev)
>  {
> @@ -580,14 +552,14 @@ static void vhost_worker_free(struct vhost_dev *dev)
>  
>   dev->worker = NULL;
> >   WARN_ON(!llist_empty(&worker->work_list));
> - kthread_stop(worker->task);
> + vhost_task_stop(worker->vtsk);
>   kfree(worker);
>  }
>  
>  static int vhost_worker_create(struct vhost_dev *dev)
>  {
>   struct vhost_worker *worker;
> - struct task_struct *task;
> + struct vhost_task *vtsk;
>   int ret;
>  
>   worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT);
> @@ -595,27 +567,19 @@ static int vhost_worker_create(struct vhost_dev *dev)
>   return -ENOMEM;
>  
>   dev->worker = worker;
> - worker->dev = dev;
>   worker->kcov_handle = kcov_common_handle();
> >   init_llist_head(&worker->work_list);
>  
> - task = kthread_create(vhost_worker, worker, "vhost-%d", current->pid);
> - if (IS_ERR(task)) {
> - ret = PTR_ERR(task);
> + vtsk = vhost_task_create(vhost_worker, worker, NUMA_NO_NODE);
> + if (!vtsk) {
> + ret = -ENOMEM;
>   goto free_worker;
>   }
>  
> - worker->task = task;
> - wake_up_process(task); /* 

[PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-20 Thread Jason Wang
Adding cond_resched() to the command waiting loop for better
cooperation with the scheduler. This gives the CPU a chance to run
other tasks (e.g. workqueues) instead of busy looping when preemption
is not allowed on a device whose CVQ might be slow.

Signed-off-by: Jason Wang 
---
 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9f3b1d6ac33d..e7533f29b219 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2314,8 +2314,10 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
 	 * into the hypervisor, so the request should be handled immediately.
 	 */
 	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
-	       !virtqueue_is_broken(vi->cvq))
+	       !virtqueue_is_broken(vi->cvq)) {
+		cond_resched();
 		cpu_relax();
+	}
 
return vi->ctrl->status == VIRTIO_NET_OK;
 }
-- 
2.39.3



[PATCH net-next v4 1/2] virtio-net: convert rx mode setting to use workqueue

2023-07-20 Thread Jason Wang
This patch converts rx mode setting to be done in a workqueue; this is
a prerequisite for being able to sleep while waiting for the cvq
command to respond, since the current code is executed under the addr
spin lock.

Note that we need to disable and flush the workqueue during freeze,
which means the rx mode setting is lost after resuming. This is not a
bug introduced by this patch, as we never tried to restore the rx mode
setting during resume.

Signed-off-by: Jason Wang 
---
 drivers/net/virtio_net.c | 55 +---
 1 file changed, 52 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d67b36fdba0d..9f3b1d6ac33d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -274,6 +274,12 @@ struct virtnet_info {
/* Work struct for config space updates */
struct work_struct config_work;
 
+   /* Work struct for setting rx mode */
+   struct work_struct rx_mode_work;
+
+   /* OK to queue work setting RX mode? */
+   bool rx_mode_work_enabled;
+
/* Does the affinity hint is set for virtqueues? */
bool affinity_hint_set;
 
@@ -397,6 +403,20 @@ static void disable_delayed_refill(struct virtnet_info *vi)
 	spin_unlock_bh(&vi->refill_lock);
 }
 
+static void enable_rx_mode_work(struct virtnet_info *vi)
+{
+   rtnl_lock();
+   vi->rx_mode_work_enabled = true;
+   rtnl_unlock();
+}
+
+static void disable_rx_mode_work(struct virtnet_info *vi)
+{
+   rtnl_lock();
+   vi->rx_mode_work_enabled = false;
+   rtnl_unlock();
+}
+
 static void virtqueue_napi_schedule(struct napi_struct *napi,
struct virtqueue *vq)
 {
@@ -2448,9 +2468,11 @@ static int virtnet_close(struct net_device *dev)
return 0;
 }
 
-static void virtnet_set_rx_mode(struct net_device *dev)
+static void virtnet_rx_mode_work(struct work_struct *work)
 {
-   struct virtnet_info *vi = netdev_priv(dev);
+   struct virtnet_info *vi =
+   container_of(work, struct virtnet_info, rx_mode_work);
+   struct net_device *dev = vi->dev;
struct scatterlist sg[2];
struct virtio_net_ctrl_mac *mac_data;
struct netdev_hw_addr *ha;
@@ -2463,6 +2485,8 @@ static void virtnet_set_rx_mode(struct net_device *dev)
if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX))
return;
 
+   rtnl_lock();
+
vi->ctrl->promisc = ((dev->flags & IFF_PROMISC) != 0);
vi->ctrl->allmulti = ((dev->flags & IFF_ALLMULTI) != 0);
 
@@ -2480,14 +2504,19 @@ static void virtnet_set_rx_mode(struct net_device *dev)
 		dev_warn(&dev->dev, "Failed to %sable allmulti mode.\n",
 vi->ctrl->allmulti ? "en" : "dis");
 
+   netif_addr_lock_bh(dev);
+
uc_count = netdev_uc_count(dev);
mc_count = netdev_mc_count(dev);
/* MAC filter - use one buffer for both lists */
buf = kzalloc(((uc_count + mc_count) * ETH_ALEN) +
  (2 * sizeof(mac_data->entries)), GFP_ATOMIC);
mac_data = buf;
-   if (!buf)
+   if (!buf) {
+   netif_addr_unlock_bh(dev);
+   rtnl_unlock();
return;
+   }
 
sg_init_table(sg, 2);
 
@@ -2508,6 +2537,8 @@ static void virtnet_set_rx_mode(struct net_device *dev)
netdev_for_each_mc_addr(ha, dev)
 		memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
 
+   netif_addr_unlock_bh(dev);
+
 	sg_set_buf(&sg[1], mac_data,
   sizeof(mac_data->entries) + (mc_count * ETH_ALEN));
 
@@ -2515,9 +2546,19 @@ static void virtnet_set_rx_mode(struct net_device *dev)
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
 				  VIRTIO_NET_CTRL_MAC_TABLE_SET, sg))
 		dev_warn(&dev->dev, "Failed to set MAC filter table.\n");
 
+   rtnl_unlock();
+
kfree(buf);
 }
 
+static void virtnet_set_rx_mode(struct net_device *dev)
+{
+   struct virtnet_info *vi = netdev_priv(dev);
+
+   if (vi->rx_mode_work_enabled)
+   schedule_work(>rx_mode_work);
+}
+
 static int virtnet_vlan_rx_add_vid(struct net_device *dev,
   __be16 proto, u16 vid)
 {
@@ -3286,6 +3327,8 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
 
/* Make sure no work handler is accessing the device */
 	flush_work(&vi->config_work);
+   disable_rx_mode_work(vi);
+	flush_work(&vi->rx_mode_work);
 
netif_tx_lock_bh(vi->dev);
netif_device_detach(vi->dev);
@@ -3308,6 +3351,7 @@ static int virtnet_restore_up(struct virtio_device *vdev)
virtio_device_ready(vdev);
 
enable_delayed_refill(vi);
+   enable_rx_mode_work(vi);
 
if (netif_running(vi->dev)) {
err = virtnet_open(vi->dev);
@@ -4121,6 +4165,7 @@ static int virtnet_probe(struct virtio_device *vdev)
vdev->priv = vi;
 
 	INIT_WORK(&vi->config_work, virtnet_config_changed_work);
+	INIT_WORK(&vi->rx_mode_work, virtnet_rx_mode_work);
   

[PATCH net-next v4 0/2] virtio-net: don't busy poll for cvq command

2023-07-20 Thread Jason Wang
Hi all:

The code used to busy poll for the cvq command, which turns out to have
several side effects:

1) infinite poll for buggy devices
2) bad interaction with scheduler

So this series tries to use cond_resched() in the waiting loop. Before
doing this we first need to make sure the cvq command is not executed in
an atomic context, so we first convert rx mode handling to a
workqueue.

Note that this doesn't try to solve the case where a malicious device
may block the networking stack (RTNL) or break the freezer, which requires
more thought on how to gracefully exit and resend the commands.

Please review.

Changes since V3:

- Tweak the comments
- Tweak the changelog to explain the rx mode lost after resuming.
- No functional changes

Changes since V2:

- Don't use interrupt but cond_resched()

Changes since V1:

- use RTNL to synchronize rx mode worker
- use completion for simplicity
- don't try to harden CVQ command

Changes since RFC:

- switch to use BAD_RING in virtio_break_device()
- check virtqueue_is_broken() after being woken up
- use more_used() instead of virtqueue_get_buf() to allow caller to
  get buffers afterwards
- break the virtio-net device when timeout
- get buffer manually since the virtio core check more_used() instead

Jason Wang (2):
  virtio-net: convert rx mode setting to use workqueue
  virtio-net: add cond_resched() to the command waiting loop

 drivers/net/virtio_net.c | 59 +---
 1 file changed, 55 insertions(+), 4 deletions(-)

-- 
2.39.3



Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page

2023-07-20 Thread Christoph Hellwig
On Thu, Jul 20, 2023 at 03:41:56PM +0800, Jason Wang wrote:
> > Did you actually check that it works though?
> > Looks like with swiotlb you need to synch to trigger a copy
> > before unmap, and I don't see where it's done in the current
> > patch.
> 
> And this is needed for XDP_REDIRECT as well.

DMA always needs proper syncs, be that for swiotlb or for cache
maintenance, yes.
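
For a driver that premaps its own buffers that means something like the
following (illustrative only; the device, address, length and direction
depend on the actual mapping):

	/* before the CPU (or an XDP program) reads a completed RX buffer;
	 * with swiotlb this is what copies data out of the bounce buffer */
	dma_sync_single_for_cpu(dma_dev, addr, len, DMA_FROM_DEVICE);

	/* ... and again before handing the buffer back to the device */
	dma_sync_single_for_device(dma_dev, addr, len, DMA_FROM_DEVICE);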


Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page

2023-07-20 Thread Jason Wang
On Thu, Jul 20, 2023 at 2:23 PM Christoph Hellwig  wrote:
>
> Hi Jason,
>
> can you please resend your reply with proper quoting?  I had to give
> up after multiple pages of scrolling without finding anything that
> you added to the full quote.

I guess it's this part?

> > > You should also test without iommu but with swiotlb=force
> >
> >
> > For swiotlb, merged DMA has no benefit, because we still need to copy
> > data from the swiotlb buffer to the original buffer.
> > The benefit of merged DMA is to reduce the number of operations on the
> > iommu device.
> >
> > I did some tests for this. The result is the same.
> >
> > Thanks.
> >
>
> Did you actually check that it works though?
> Looks like with swiotlb you need to synch to trigger a copy
> before unmap, and I don't see where it's done in the current
> patch.

And this is needed for XDP_REDIRECT as well.

Thanks


Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-20 Thread Xuan Zhuo
On Wed, 19 Jul 2023 23:57:51 -0700, Christoph Hellwig wrote:
> On Thu, Jul 20, 2023 at 02:45:14PM +0800, Xuan Zhuo wrote:
> >  virtqueue_dma_dev() returns the device that works with the DMA APIs.
> >  That can then be used like other devices. So what is the problem?
> >
> >  I have always thought the code path without the DMA APIs is the
> >  troublesome part for you.
>
> Because we now have an API where the upper level drivers sometimes
> see the dma device and sometimes not.

Having no dma device is just for the old devices.

The API without a DMA dev is only for compatibility with older devices. We
can't give up on these old devices, but we also have to embrace new features.

> This will be abused and cause
> trouble sooner than you can say "layering".

I don't understand what the possible trouble here is.

When there is no dma device, the driver just does the same thing as before.
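
The driver side I have in mind looks roughly like this (sketch only; the
premap bookkeeping and error handling are omitted):

	static int example_premap(struct virtqueue *vq, struct page *page,
				  dma_addr_t *addr)
	{
		struct device *dma_dev = virtqueue_dma_dev(vq);

		if (!dma_dev) {
			/* old devices: no dma device, behave exactly as
			 * before and let the virtio core handle (or skip)
			 * the mapping */
			*addr = 0;
			return 0;
		}

		/* new devices: premap and keep the mapping across
		 * multiple add/get buf operations */
		*addr = dma_map_page(dma_dev, page, 0, PAGE_SIZE,
				     DMA_FROM_DEVICE);
		if (dma_mapping_error(dma_dev, *addr))
			return -ENOMEM;

		return 0;
	}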

Thanks.




Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-20 Thread Christoph Hellwig
On Thu, Jul 20, 2023 at 02:45:14PM +0800, Xuan Zhuo wrote:
>  virtqueue_dma_dev() returns the device that works with the DMA APIs.
>  That can then be used like other devices. So what is the problem?
> 
>  I have always thought the code path without the DMA APIs is the
>  troublesome part for you.

Because we now have an API where the upper level drivers sometimes
see the dma device and sometimes not.  This will be abused and cause
trouble sooner than you can say "layering".


Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-20 Thread Christoph Hellwig
On Thu, Jul 13, 2023 at 10:51:59AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 13, 2023 at 04:15:16AM -0700, Christoph Hellwig wrote:
> > On Mon, Jul 10, 2023 at 11:42:32AM +0800, Xuan Zhuo wrote:
> > > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > > caller can do dma operation in advance. The purpose is to keep memory
> > > mapped across multiple add/get buf operations.
> > 
> > This is just poking holes into the abstraction..
> 
> More specifically?

Because now you expose a device that can't be used for the non-dma
mapping case and should be hidden.


Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page

2023-07-20 Thread Christoph Hellwig
Hi Jason,

can you please resend your reply with proper quoting?  I had to give
up after multiple pages of scrolling without finding anything that
you added to the full quote.



Re: [PATCH vhost v11 03/10] virtio_ring: introduce virtqueue_set_premapped()

2023-07-20 Thread Christoph Hellwig
On Thu, Jul 13, 2023 at 10:47:23AM -0400, Michael S. Tsirkin wrote:
> There are a gazillion virtio drivers and most of them just use the
> virtio API, without bothering with these micro-optimizations.  virtio
> already tracks addresses so mapping/unmapping them for DMA is easier
> done in the core.  It's only networking and only with XDP where the
> difference becomes measurable.

Yes, but now you have two differing code paths (which then branch into
another two with the fake DMA mappings).  I'm really worried about
the madness that follows, like the USB dma mapping code that is a
constant source of trouble.



Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-20 Thread Xuan Zhuo
On Wed, 19 Jul 2023 23:22:42 -0700, Christoph Hellwig wrote:
> On Thu, Jul 13, 2023 at 10:51:59AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Jul 13, 2023 at 04:15:16AM -0700, Christoph Hellwig wrote:
> > > On Mon, Jul 10, 2023 at 11:42:32AM +0800, Xuan Zhuo wrote:
> > > > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > > > caller can do dma operation in advance. The purpose is to keep memory
> > > > mapped across multiple add/get buf operations.
> > >
> > > This is just poking holes into the abstraction..
> >
> > More specifically?
>
> Because now you expose a device that can't be used for the non-dma
> mapping case and should be hidden.

 Sorry, I don't get it.

 virtqueue_dma_dev() returns the device that works with the DMA APIs.
 That can then be used like other devices. So what is the problem?

 I have always thought the code path without the DMA APIs is the
 troublesome part for you.

 Thanks.
