Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-08-10 Thread Michael S. Tsirkin
On Fri, Aug 11, 2023 at 10:23:15AM +0800, Jason Wang wrote:
> On Fri, Aug 11, 2023 at 3:41 AM Michael S. Tsirkin  wrote:
> >
> > On Tue, Aug 08, 2023 at 10:30:56AM +0800, Jason Wang wrote:
> > > On Mon, Jul 31, 2023 at 2:30 PM Jason Wang  wrote:
> > > >
> > > > On Thu, Jul 27, 2023 at 5:46 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Thu, Jul 27, 2023 at 04:59:33PM +0800, Jason Wang wrote:
> > > > > > > They really shouldn't - any NIC that takes forever to
> > > > > > > program will create issues in the networking stack.
> > > > > >
> > > > > > Unfortunately, it's not rare as the device/cvq could be implemented
> > > > > > via firmware or software.
> > > > >
> > > > > Currently that means one either has sane firmware with a scheduler that
> > > > > can meet deadlines, or loses the ability to report errors back.
> > > > >
> > > > > > > But if they do they can always set this flag too.
> > > > > >
> > > > > > This may have false negatives and may confuse the management.
> > > > > >
> > > > > > Maybe we can extend the networking core to allow some device 
> > > > > > specific
> > > > > > configurations to be done with device specific lock without rtnl. 
> > > > > > For
> > > > > > example, split the set_channels to
> > > > > >
> > > > > > pre_set_channels
> > > > > > set_channels
> > > > > > post_set_channels
> > > > > >
> > > > > > The device specific part could be done in pre and post without a 
> > > > > > rtnl lock?
> > > > > >
> > > > > > Thanks
> > > > >
> > > > >
> > > > > Would the benefit be that errors can be reported to userspace then?
> > > > > Then maybe.  I think you will have to show how this works for at least
> > > > > one card besides virtio.
> > > >
> > > > Even for virtio, this seems not easy, as e.g. the
> > > > virtnet_send_command() and netif_set_real_num_tx_queues() need to
> > > > appear to be atomic to the networking core.
> > > >
> > > > I wonder if we can re-consider the way of a timeout here and choose a
> > > > sane value as a start.
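A minimal sketch of the bounded wait being discussed, assuming the current
virtnet_send_command() polling structure; the helper name and the 10 s budget
are invented here purely for illustration, not taken from the series:

#include <linux/jiffies.h>
#include <linux/sched.h>
#include <linux/virtio.h>

/* Illustrative only: a bounded variant of the cvq busy-wait loop. */
static bool virtnet_cvq_wait_sketch(struct virtqueue *cvq)
{
        unsigned long deadline = jiffies + 10 * HZ;     /* hypothetical budget */
        unsigned int len;

        while (!virtqueue_get_buf(cvq, &len) && !virtqueue_is_broken(cvq)) {
                if (time_after(jiffies, deadline))
                        return false;   /* give up; recovery is the hard part */
                cond_resched();         /* what this series adds to the loop */
        }
        return true;
}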
> > >
> > > Michael, any more input on this?
> > >
> > > Thanks
> >
> > I think this is just mission creep. We are trying to fix
> > vduse - let's do that for starters.
> >
> > Recovering from firmware timeouts is far from trivial and
> > just assuming that just because it timed out it will not
> > access memory is just as likely to cause memory corruption
> > with worse results than an infinite spin.
> 
> Yes, this might require support not only in the driver
> 
> >
> > I propose we fix this for vduse and assume hardware/firmware
> > is well behaved.
> 
> One major case is re-connection; in that case it might take longer
> than the kernel virtio-net driver expects.
> So we can have a timeout in VDUSE and trap CVQ, then VDUSE can return
> and fail early?

Ugh more mission creep. not at all my point. vduse should cache
values in the driver, until someone manages to change
net core to be more friendly to userspace devices.

> 
> > Or maybe not well behaved firmware will
> > set the flag losing error reporting ability.
> 
> This might be hard since it means not only the set but also the get is
> unreliable.
> 
> Thanks

/me shrugs



> >
> >
> >
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > >
> > > > > --
> > > > > MST
> > > > >
> >


Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-08-10 Thread Jason Wang
On Fri, Aug 11, 2023 at 3:41 AM Michael S. Tsirkin  wrote:
>
> On Tue, Aug 08, 2023 at 10:30:56AM +0800, Jason Wang wrote:
> > On Mon, Jul 31, 2023 at 2:30 PM Jason Wang  wrote:
> > >
> > > On Thu, Jul 27, 2023 at 5:46 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Thu, Jul 27, 2023 at 04:59:33PM +0800, Jason Wang wrote:
> > > > > > They really shouldn't - any NIC that takes forever to
> > > > > > program will create issues in the networking stack.
> > > > >
> > > > > Unfortunately, it's not rare as the device/cvq could be implemented
> > > > > via firmware or software.
> > > >
> > > > Currently that means one either has sane firmware with a scheduler that
> > > > can meet deadlines, or loses the ability to report errors back.
> > > >
> > > > > > But if they do they can always set this flag too.
> > > > >
> > > > > This may have false negatives and may confuse the management.
> > > > >
> > > > > Maybe we can extend the networking core to allow some device specific
> > > > > configurations to be done with device specific lock without rtnl. For
> > > > > example, split the set_channels to
> > > > >
> > > > > pre_set_channels
> > > > > set_channels
> > > > > post_set_channels
> > > > >
> > > > > The device specific part could be done in pre and post without a rtnl 
> > > > > lock?
> > > > >
> > > > > Thanks
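For illustration, one possible shape of the split proposed above; none of
these pre/post hooks exist in struct ethtool_ops today, and the layout below
is only a sketch of the idea:

#include <linux/ethtool.h>
#include <linux/netdevice.h>

/* Hypothetical sketch, not an existing kernel API. */
struct ethtool_ops_split_sketch {
        /* device-specific preparation, under a per-device lock, no RTNL */
        int (*pre_set_channels)(struct net_device *dev,
                                struct ethtool_channels *ch);
        /* the part that must still appear atomic to the networking core */
        int (*set_channels)(struct net_device *dev,
                            struct ethtool_channels *ch);
        /* slow device/firmware work, again outside RTNL */
        int (*post_set_channels)(struct net_device *dev,
                                 struct ethtool_channels *ch);
};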
> > > >
> > > >
> > > > Would the benefit be that errors can be reported to userspace then?
> > > > Then maybe.  I think you will have to show how this works for at least
> > > > one card besides virtio.
> > >
> > > Even for virtio, this seems not easy, as e.g. the
> > > virtnet_send_command() and netif_set_real_num_tx_queues() need to
> > > appear to be atomic to the networking core.
> > >
> > > I wonder if we can re-consider the way of a timeout here and choose a
> > > sane value as a start.
> >
> > Michael, any more input on this?
> >
> > Thanks
>
> I think this is just mission creep. We are trying to fix
> vduse - let's do that for starters.
>
> Recovering from firmware timeouts is far from trivial and
> just assuming that just because it timed out it will not
> access memory is just as likely to cause memory corruption
> with worse results than an infinite spin.

Yes, this might require support not only in the driver

>
> I propose we fix this for vduse and assume hardware/firmware
> is well behaved.

One major case is re-connection; in that case it might take longer
than the kernel virtio-net driver expects.

So we can have a timeout in VDUSE and trap CVQ, then VDUSE can return
and fail early?

> Or maybe not well behaved firmware will
> set the flag losing error reporting ability.

This might be hard since it means not only the set but also the get is
unreliable.

Thanks

>
>
>
> > >
> > > Thanks
> > >
> > > >
> > > >
> > > > --
> > > > MST
> > > >
>


Re: [PATCH] virtio_mmio: add suspend and resume calls for virtio_mmio devices

2023-08-10 Thread Jason Wang
On Fri, Aug 11, 2023 at 3:45 AM Michael S. Tsirkin  wrote:
>
> On Fri, Jul 28, 2023 at 12:31:27PM +0530, Anvesh Jain P wrote:
> > Handle suspend and resume calls for virtio mmio devices from
> > PM core. Expose these notifications to virtio drivers that can quiesce and
> > resume vq operations. Update virtio pm ops to handle freeze & restore and
> > suspend & resume callbacks separately.
> >
> > Signed-off-by: Anvesh Jain P 
> > Signed-off-by: Venkata Rao Kakani 
>
> okay, but what is the motivation? does this addition of 200 LOC
> achieve something new? what is the issue fixed here?

+1

And I think we need an example that can use the new ops.

Thanks
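As a strawman for such an example, a hypothetical driver wiring the
.suspend/.resume callbacks this patch adds to struct virtio_driver might look
roughly like this (the foo_* names are made up and the bodies are stubs):

#include <linux/virtio.h>

static int foo_suspend(struct virtio_device *vdev)
{
        /* quiesce: stop queueing new requests, drain in-flight buffers */
        return 0;
}

static int foo_resume(struct virtio_device *vdev)
{
        /* restart vq processing once the core has marked the device ready */
        return 0;
}

static struct virtio_driver foo_driver = {
        /* .driver.name, .id_table, .probe, .remove as usual */
        .suspend = foo_suspend,
        .resume  = foo_resume,
};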

>
>
> > ---
> >  drivers/virtio/virtio.c  | 112 +++
> >  drivers/virtio/virtio_mmio.c |  50 +++-
> >  include/linux/virtio.h   |  12 
> >  3 files changed, 173 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index 3893dc29eb26..c6f25a267600 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -551,6 +551,118 @@ int virtio_device_restore(struct virtio_device *dev)
> >   return ret;
> >  }
> >  EXPORT_SYMBOL_GPL(virtio_device_restore);
> > +
> > +int virtio_device_suspend(struct virtio_device *dev)
> > +{
> > + struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> > +
> > + virtio_config_disable(dev);
> > +
> > + dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
> > +
> > + if (drv && drv->suspend)
> > + return drv->suspend(dev);
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_device_suspend);
> > +
> > +int virtio_device_resume(struct virtio_device *dev)
> > +{
> > + struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> > + int ret;
> > +
> > + if (drv && drv->resume) {
> > + ret = drv->resume(dev);
> > + if (ret)
> > + goto err;
> > +
> > + if (!(dev->config->get_status(dev) & 
> > VIRTIO_CONFIG_S_DRIVER_OK))
> > + virtio_device_ready(dev);
> > +
> > + virtio_config_enable(dev);
> > + }
> > +
> > + return 0;
> > +err:
> > + virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_device_resume);
> > +
> > +int virtio_device_suspend_late(struct virtio_device *dev)
> > +{
> > + struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> > +
> > + virtio_config_disable(dev);
> > +
> > + dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
> > +
> > + if (drv && drv->suspend_late)
> > + return drv->suspend_late(dev);
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_device_suspend_late);
> > +
> > +int virtio_device_resume_early(struct virtio_device *dev)
> > +{
> > + struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> > + int ret;
> > +
> > + if (drv && drv->resume_early) {
> > + ret = drv->resume_early(dev);
> > + if (ret)
> > + goto err;
> > + if (!(dev->config->get_status(dev) & 
> > VIRTIO_CONFIG_S_DRIVER_OK))
> > + virtio_device_ready(dev);
> > +
> > + virtio_config_enable(dev);
> > + }
> > +
> > + return 0;
> > +err:
> > + virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_device_resume_early);
> > +
> > +int virtio_device_suspend_noirq(struct virtio_device *dev)
> > +{
> > + struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> > +
> > + virtio_config_disable(dev);
> > +
> > + dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
> > +
> > + if (drv && drv->suspend_noirq)
> > + return drv->suspend_noirq(dev);
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_device_suspend_noirq);
> > +
> > +int virtio_device_resume_noirq(struct virtio_device *dev)
> > +{
> > + struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> > + int ret;
> > +
> > + if (drv && drv->resume_noirq) {
> > + ret = drv->resume_noirq(dev);
> > + if (ret)
> > + goto err;
> > + if (!(dev->config->get_status(dev) & 
> > VIRTIO_CONFIG_S_DRIVER_OK))
> > + virtio_device_ready(dev);
> > +
> > + virtio_config_enable(dev);
> > + }
> > +
> > + return 0;
> > +err:
> > + virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_device_resume_noirq);
> >  #endif
> >
> >  static int virtio_init(void)
> > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > index a46a4a29e929..9385c7e65da9 100644
> > --- a/drivers/virtio/virtio_mmio.c
> > +++ b/drivers/virtio/virtio_mmio.c
> > @@ -596,9 +596,57 @@ static int virtio_mmio_restore(struct device 

[mst-vhost:vhost 34/46] drivers/vdpa/vdpa_user/vduse_dev.c:1812:23: error: use of undeclared identifier 'VIRTIO_RING_F_INDIRECT_DESC'

2023-08-10 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost
head:   bb59e1f960bd07f70a4b3d8de99bfd8d71835199
commit: 334f48a83105ebe129a660d1ea1a0c29f87d50c7 [34/46] vduse: Temporarily 
disable control queue features
config: x86_64-buildonly-randconfig-r001-20230811 
(https://download.01.org/0day-ci/archive/20230811/202308110712.wcqoog00-...@intel.com/config)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git 
ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce: 
(https://download.01.org/0day-ci/archive/20230811/202308110712.wcqoog00-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202308110712.wcqoog00-...@intel.com/

All errors (new ones prefixed by >>):

>> drivers/vdpa/vdpa_user/vduse_dev.c:1812:23: error: use of undeclared 
>> identifier 'VIRTIO_RING_F_INDIRECT_DESC'
   config->features &= VDUSE_NET_VALID_FEATURES_MASK;
   ^
   drivers/vdpa/vdpa_user/vduse_dev.c:66:11: note: expanded from macro 
'VDUSE_NET_VALID_FEATURES_MASK'
BIT_ULL(VIRTIO_RING_F_INDIRECT_DESC) | \
^
>> drivers/vdpa/vdpa_user/vduse_dev.c:1812:23: error: use of undeclared 
>> identifier 'VIRTIO_F_EVENT_IDX'
   drivers/vdpa/vdpa_user/vduse_dev.c:67:11: note: expanded from macro 
'VDUSE_NET_VALID_FEATURES_MASK'
BIT_ULL(VIRTIO_F_EVENT_IDX) |  \
^
>> drivers/vdpa/vdpa_user/vduse_dev.c:1812:23: error: use of undeclared 
>> identifier 'VIRTIO_F_IOMMU_PLATFORM'
   drivers/vdpa/vdpa_user/vduse_dev.c:69:11: note: expanded from macro 
'VDUSE_NET_VALID_FEATURES_MASK'
BIT_ULL(VIRTIO_F_IOMMU_PLATFORM) | \
^
   drivers/vdpa/vdpa_user/vduse_dev.c:2007:51: warning: shift count >= width of 
type [-Wshift-count-overflow]
   ret = dma_set_mask_and_coherent(>vdpa.dev, DMA_BIT_MASK(64));
^~~~
   include/linux/dma-mapping.h:77:54: note: expanded from macro 'DMA_BIT_MASK'
   #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
^ ~~~
   1 warning and 3 errors generated.


vim +/VIRTIO_RING_F_INDIRECT_DESC +1812 drivers/vdpa/vdpa_user/vduse_dev.c

  1804  
  1805  static void vduse_dev_features_filter(struct vduse_dev_config *config)
  1806  {
  1807  /*
  1808   * Temporarily filter out virtio-net's control virtqueue and features
  1809   * that depend on it while CVQ is being made more robust for VDUSE.
  1810   */
  1811  if (config->device_id == VIRTIO_ID_NET)
> 1812  config->features &= VDUSE_NET_VALID_FEATURES_MASK;
  1813  }
  1814  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Re: [PATCH 1/2] vdpa/mlx5: Fix mr->initialized semantics

2023-08-10 Thread Si-Wei Liu



On 8/9/2023 8:10 PM, Jason Wang wrote:

On Thu, Aug 10, 2023 at 8:40 AM Si-Wei Liu  wrote:



On 8/8/2023 11:52 PM, Jason Wang wrote:

On Wed, Aug 9, 2023 at 6:58 AM Si-Wei Liu  wrote:


On 8/7/2023 8:00 PM, Jason Wang wrote:

On Fri, Aug 4, 2023 at 1:58 AM Si-Wei Liu  wrote:

On 8/3/2023 1:03 AM, Jason Wang wrote:

On Thu, Aug 3, 2023 at 1:13 AM Dragos Tatulea  wrote:

The mr->initialized flag is shared between the control vq and data vq
part of the mr init/uninit. But if the control vq and data vq get placed
in different ASIDs, it can happen that initializing the control vq will
prevent the data vq mr from being initialized.

This patch consolidates the control and data vq init parts into their
own init functions. The mr->initialized will now be used for the data vq
only. The control vq currently doesn't need a flag.

The uninitializing part is also taken care of: mlx5_vdpa_destroy_mr got
split into data and control vq functions which are now also ASID aware.

Fixes: 8fcd20c30704 ("vdpa/mlx5: Support different address spaces for control and 
data")
Signed-off-by: Dragos Tatulea 
Reviewed-by: Eugenio Pérez 
Reviewed-by: Gal Pressman 
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  1 +
 drivers/vdpa/mlx5/core/mr.c| 97 +-
 2 files changed, 71 insertions(+), 27 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 25fc4120b618..a0420be5059f 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -31,6 +31,7 @@ struct mlx5_vdpa_mr {
struct list_head head;
unsigned long num_directs;
unsigned long num_klms;
+   /* state of dvq mr */
bool initialized;

/* serialize mkey creation and destruction */
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 03e543229791..4ae14a248a4b 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -489,60 +489,103 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, 
struct mlx5_vdpa_mr *mr
}
 }

-void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev)
+static void _mlx5_vdpa_destroy_cvq_mr(struct mlx5_vdpa_dev *mvdev, unsigned 
int asid)
+{
+   if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
+   return;
+
+   prune_iotlb(mvdev);
+}
+
+static void _mlx5_vdpa_destroy_dvq_mr(struct mlx5_vdpa_dev *mvdev, unsigned 
int asid)
 {
struct mlx5_vdpa_mr *mr = &mvdev->mr;

-   mutex_lock(>mkey_mtx);
+   if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] != asid)
+   return;
+
if (!mr->initialized)
-   goto out;
+   return;

-   prune_iotlb(mvdev);
if (mr->user_mr)
destroy_user_mr(mvdev, mr);
else
destroy_dma_mr(mvdev, mr);

mr->initialized = false;
-out:
+}
+
+static void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned 
int asid)
+{
+   struct mlx5_vdpa_mr *mr = &mvdev->mr;
+
+   mutex_lock(>mkey_mtx);
+
+   _mlx5_vdpa_destroy_dvq_mr(mvdev, asid);
+   _mlx5_vdpa_destroy_cvq_mr(mvdev, asid);
+
mutex_unlock(>mkey_mtx);
 }

-static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
-   struct vhost_iotlb *iotlb, unsigned int asid)
+void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev)
+{
+   mlx5_vdpa_destroy_mr_asid(mvdev, 
mvdev->group2asid[MLX5_VDPA_CVQ_GROUP]);
+   mlx5_vdpa_destroy_mr_asid(mvdev, 
mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]);
+}
+
+static int _mlx5_vdpa_create_cvq_mr(struct mlx5_vdpa_dev *mvdev,
+   struct vhost_iotlb *iotlb,
+   unsigned int asid)
+{
+   if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
+   return 0;
+
+   return dup_iotlb(mvdev, iotlb);

This worries me as conceptually, there should be no difference between
dvq mr and cvq mr. The virtqueue should be loosely coupled with mr.

One example is that, if we only do dup_iotlb() but not try to create
dma mr here, we will break virtio-vdpa:

For this case, I guess we may need another way to support virtio-vdpa
1:1 mapping rather than overloading virtio device reset semantics, see:

https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html

> Conceptually, the address mapping is not a part of the abstraction for
> a virtio device now. So resetting the memory mapping during virtio
> device reset seems wrong.

where we want to keep memory mapping intact across virtio device reset
for best live migration latency/downtime. I wonder would it work to
reset the mapping in vhost-vdpa life cycle out of virtio reset, say
introduce a .reset_map() op to restore 1:1 mapping within
vhost_vdpa_remove_as() right after vhost_vdpa_iotlb_unmap()? Then we can
move the iotlb reset logic to there without worry breaking 
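
A rough sketch of the .reset_map() idea floated above; no such op exists in
struct vdpa_config_ops today, so everything here is hypothetical:

#include <linux/vdpa.h>

/* Hypothetical extension, not part of the current vdpa API. */
struct vdpa_config_ops_sketch {
        /* restore the default 1:1 DMA mapping for one address space,
         * e.g. from vhost_vdpa_remove_as() right after the iotlb unmap
         */
        int (*reset_map)(struct vdpa_device *vdev, unsigned int asid);
        /* ...existing ops unchanged... */
};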

Re: [PATCH v3 0/3] vduse: add support for networking devices

2023-08-10 Thread Michael S. Tsirkin
On Thu, Aug 10, 2023 at 02:29:49PM -0700, Jakub Kicinski wrote:
> On Thu, 10 Aug 2023 15:04:27 -0400 Michael S. Tsirkin wrote:
> > Another question is that with this, userspace can inject
> > packets directly into the net stack. Should we check CAP_NET_ADMIN
> > or such?
> 
> Directly into the stack? I thought VDUSE is vDPA in user space,
> meaning to get to the kernel the packet has to first go thru 
> a virtio-net instance.

yes. is that a sufficient filter in your opinion?

> Or you mean directly into the network?



Re: [PATCH] virtio: a new vcpu watchdog driver

2023-08-10 Thread Michael S. Tsirkin
On Mon, Jul 31, 2023 at 09:25:12AM +0800, zhanghao1 wrote:
> A new virtio pci driver is added for listening to vcpus
> inside guest. Each vcpu creates a corresponding thread to
> periodically send data to qemu's back-end watchdog device.
> If a vCPU is in the stall state, data cannot be sent to
> back-end virtio device. As a result, the back-end device
> can detect that the guest is in the stall state.
> 
> The driver is mainly used with the back-end watchdog device of qemu.
> 
> The qemu backend watchdog device is implemented as follow:
> https://lore.kernel.org/qemu-devel/20230705081813.411526-1-zhangh...@kylinos.cn/
> 
> Signed-off-by: zhanghao1 
> ---
>  drivers/virtio/Kconfig  |   9 +
>  drivers/virtio/Makefile |   1 +
>  drivers/virtio/virtio_vcpu_stall_detector.c | 299 
>  3 files changed, 309 insertions(+)
>  create mode 100644 drivers/virtio/virtio_vcpu_stall_detector.c
> 
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 0a53a61231c2..869323e345a1 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -173,4 +173,13 @@ config VIRTIO_DMA_SHARED_BUFFER
>This option adds a flavor of dma buffers that are backed by
>virtio resources.
>  
> +config VIRTIO_VCPU_WATCHDOG
> + tristate "Virtio vcpu watchdog driver"
> + depends on VIRTIO_PCI
> + help
> +  When this driver is bound inside a KVM guest, it will
> +  periodically "pet" a PCI virtio watchdog device from each vCPU
> +  and allow the host to detect vCPU stalls.
> +
> +  If you do not intend to run this kernel as a guest, say N.
>  endif # VIRTIO_MENU
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 8e98d24917cc..c7341f078a34 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -12,3 +12,4 @@ obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
>  obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
>  obj-$(CONFIG_VIRTIO_MEM) += virtio_mem.o
>  obj-$(CONFIG_VIRTIO_DMA_SHARED_BUFFER) += virtio_dma_buf.o
> +obj-$(CONFIG_VIRTIO_VCPU_WATCHDOG) += virtio_vcpu_stall_detector.o
> diff --git a/drivers/virtio/virtio_vcpu_stall_detector.c 
> b/drivers/virtio/virtio_vcpu_stall_detector.c
> new file mode 100644
> index ..58344ca528be
> --- /dev/null
> +++ b/drivers/virtio/virtio_vcpu_stall_detector.c
> @@ -0,0 +1,299 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +//
> +// VCPU stall detector.
> +// Copyright (C) Kylin Software, 2023
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define VCPU_STALL_REG_STATUS  (0x00)
> +#define VCPU_STALL_REG_LOAD_CNT  (0x04)
> +#define VCPU_STALL_REG_CURRENT_CNT   (0x08)
> +#define VCPU_STALL_REG_CLOCK_FREQ_HZ (0x0C)
> +#define VCPU_STALL_REG_LEN   (0x10)
> +#define VCPU_STALL_REG_TIMEOUT_SEC   (0x14)
> +
> +#define VCPU_STALL_DEFAULT_CLOCK_HZ  (10)
> +#define VCPU_STALL_MAX_CLOCK_HZ  (100)
> +#define VCPU_STALL_DEFAULT_TIMEOUT_SEC   (8)
> +#define VCPU_STALL_MAX_TIMEOUT_SEC   (600)
> +
> +struct vcpu_stall_detect_config {
> + u32 clock_freq_hz;
> + u32 stall_timeout_sec;
> +
> + enum cpuhp_state hp_online;
> +};
> +
> +struct vcpu_stall_priv {
> + struct hrtimer vcpu_hrtimer;
> + struct virtio_device *vdev;
> + u32 cpu_id;
> +};
> +
> +struct vcpu_stall {
> + struct vcpu_stall_priv *priv;
> + struct virtqueue *vq;
> + spinlock_t lock;
> + struct pet_event {
> + u32 cpu_id;
> + bool is_initialized;
> + u32 ticks;
> + } pet_event;


should all be LE. and formatting here is very compiler dependent.
also put this in a header under uapi/
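A sketch of the fixed-layout, little-endian variant being asked for; field
names follow the struct quoted above, but this is not the actual uapi header:

#include <linux/types.h>

struct virtio_watchdog_pet_event {
        __le32 cpu_id;
        __u8   is_initialized;
        __u8   reserved[3];     /* explicit padding, no compiler-chosen holes */
        __le32 ticks;
};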

> +};
> +
> +static const struct virtio_device_id vcpu_stall_id_table[] = {
> + { VIRTIO_ID_WATCHDOG, VIRTIO_DEV_ANY_ID },
> + { 0, },
> +};
> +
> +/* The vcpu stall configuration structure which applies to all the CPUs */
> +static struct vcpu_stall_detect_config vcpu_stall_config;
> +static struct vcpu_stall *vcpu_stall;
> +
> +static struct vcpu_stall_priv __percpu *vcpu_stall_detectors;
> +
> +static enum hrtimer_restart
> +vcpu_stall_detect_timer_fn(struct hrtimer *hrtimer)
> +{
> + u32 ticks, ping_timeout_ms;
> + struct scatterlist sg;
> + int unused, err = 0;
> +
> + struct vcpu_stall_priv *vcpu_stall_detector =
> + this_cpu_ptr(vcpu_stall->priv);
> +
> + /* Reload the stall detector counter register every
> +  * `ping_timeout_ms` to prevent the virtual device
> +  * from decrementing it to 0. The virtual device decrements this
> +  * register at 'clock_freq_hz' frequency.
> +  */
> + ticks = vcpu_stall_config.clock_freq_hz *
> + vcpu_stall_config.stall_timeout_sec;
> +
> + spin_lock(&vcpu_stall->lock);
> + while (virtqueue_get_buf(vcpu_stall->vq, &unused))

Re: [PATCH] A new virtio pci driver is added for listening to vcpus inside guest. Each vcpu creates a corresponding thread to periodically send data to qemu's back-end watchdog device.

2023-08-10 Thread kernel test robot
Hi zhanghao1,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.5-rc5 next-20230809]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/zhanghao1/A-new-virtio-pci-driver-is-added-for-listening-to-vcpus-inside-guest-Each-vcpu-creates-a-corresponding-thread-to-periodi/20230731-092546
base:   linus/master
patch link:
https://lore.kernel.org/r/20230731012405.234611-1-zhanghao1%40kylinos.cn
patch subject: [PATCH] A new virtio pci driver is added for listening to vcpus 
inside guest. Each vcpu creates a corresponding thread to periodically send 
data to qemu's back-end watchdog device.
config: sparc64-randconfig-r071-20230811 
(https://download.01.org/0day-ci/archive/20230811/202308110442.0txlneks-...@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 12.3.0
reproduce: 
(https://download.01.org/0day-ci/archive/20230811/202308110442.0txlneks-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202308110442.0txlneks-...@intel.com/

sparse warnings: (new ones prefixed by >>)
>> drivers/virtio/virtio_vcpu_stall_detector.c:76:17: sparse: sparse: incorrect 
>> type in initializer (different address spaces) @@ expected void const 
>> [noderef] __percpu *__vpp_verify @@ got struct vcpu_stall_priv * @@
   drivers/virtio/virtio_vcpu_stall_detector.c:76:17: sparse: expected void 
const [noderef] __percpu *__vpp_verify
   drivers/virtio/virtio_vcpu_stall_detector.c:76:17: sparse: got struct 
vcpu_stall_priv *
>> drivers/virtio/virtio_vcpu_stall_detector.c:89:37: sparse: sparse: incorrect 
>> type in assignment (different base types) @@ expected unsigned int 
>> [usertype] ticks @@ got restricted __virtio32 @@
   drivers/virtio/virtio_vcpu_stall_detector.c:89:37: sparse: expected 
unsigned int [usertype] ticks
   drivers/virtio/virtio_vcpu_stall_detector.c:89:37: sparse: got 
restricted __virtio32
   drivers/virtio/virtio_vcpu_stall_detector.c:117:17: sparse: sparse: 
incorrect type in initializer (different address spaces) @@ expected void 
const [noderef] __percpu *__vpp_verify @@ got struct vcpu_stall_priv * @@
   drivers/virtio/virtio_vcpu_stall_detector.c:117:17: sparse: expected 
void const [noderef] __percpu *__vpp_verify
   drivers/virtio/virtio_vcpu_stall_detector.c:117:17: sparse: got struct 
vcpu_stall_priv *
   drivers/virtio/virtio_vcpu_stall_detector.c:129:37: sparse: sparse: 
incorrect type in assignment (different base types) @@ expected unsigned 
int [usertype] ticks @@ got restricted __virtio32 @@
   drivers/virtio/virtio_vcpu_stall_detector.c:129:37: sparse: expected 
unsigned int [usertype] ticks
   drivers/virtio/virtio_vcpu_stall_detector.c:129:37: sparse: got 
restricted __virtio32
>> drivers/virtio/virtio_vcpu_stall_detector.c:193:26: sparse: sparse: 
>> incorrect type in assignment (different address spaces) @@ expected 
>> struct vcpu_stall_priv *priv @@ got struct vcpu_stall_priv [noderef] 
>> __percpu * @@
   drivers/virtio/virtio_vcpu_stall_detector.c:193:26: sparse: expected 
struct vcpu_stall_priv *priv
   drivers/virtio/virtio_vcpu_stall_detector.c:193:26: sparse: got struct 
vcpu_stall_priv [noderef] __percpu *
   drivers/virtio/virtio_vcpu_stall_detector.c:203:24: sparse: sparse: 
incorrect type in initializer (different address spaces) @@ expected void 
const [noderef] __percpu *__vpp_verify @@ got struct vcpu_stall_priv * @@
   drivers/virtio/virtio_vcpu_stall_detector.c:203:24: sparse: expected 
void const [noderef] __percpu *__vpp_verify
   drivers/virtio/virtio_vcpu_stall_detector.c:203:24: sparse: got struct 
vcpu_stall_priv *
>> drivers/virtio/virtio_vcpu_stall_detector.c:207:15: sparse: sparse: no 
>> generic selection for 'unsigned int virtio_cread_v'
>> drivers/virtio/virtio_vcpu_stall_detector.c:207:15: sparse: sparse: 
>> incompatible types in comparison expression (different base types):
>> drivers/virtio/virtio_vcpu_stall_detector.c:207:15: sparse:bad type *
>> drivers/virtio/virtio_vcpu_stall_detector.c:207:15: sparse:unsigned int *
>> drivers/virtio/virtio_vcpu_stall_detector.c:207:15: sparse: sparse: no 
>> generic selection for 'unsigned int [addressable] virtio_cread_v'
   drivers/virtio/virtio_vcpu_stall_detector.c:217:15: sparse: sparse: no 
generic selection for 'unsigned int virtio_cread_v'
   drivers/virtio/virtio_vcpu_stall_detector.c:217:15: sparse: sparse: 
incompatible types in comparison expression (different base types):
   drivers/virtio/virtio_vcpu_stall_detector.c:217:15: 
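For readers decoding the address-space warnings above: they point at the
per-CPU pointer lacking the __percpu annotation, and the __virtio32 warnings
at a cpu-endian field being assigned a virtio-endian value. A sketch of the
annotation sparse expects, reusing the driver's own struct names (illustrative
only, not a tested fix):

struct vcpu_stall {
        struct vcpu_stall_priv __percpu *priv;  /* was: struct vcpu_stall_priv *priv */
        struct virtqueue *vq;
        spinlock_t lock;
        /* ... */
};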

Re: [PATCH] virtio_mmio: add suspend and resume calls for virtio_mmio devices

2023-08-10 Thread Michael S. Tsirkin
On Fri, Jul 28, 2023 at 12:31:27PM +0530, Anvesh Jain P wrote:
> Handle suspend and resume calls for virtio mmio devices from
> PM core. Expose these notifications to virtio drivers that can quiesce and
> resume vq operations. Update virtio pm ops to handle freeze & restore and
> suspend & resume callbacks separately.
> 
> Signed-off-by: Anvesh Jain P 
> Signed-off-by: Venkata Rao Kakani 

okay, but what is the motivation? does this addition of 200 LOC
achieve something new? what is the issue fixed here?


> ---
>  drivers/virtio/virtio.c  | 112 +++
>  drivers/virtio/virtio_mmio.c |  50 +++-
>  include/linux/virtio.h   |  12 
>  3 files changed, 173 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 3893dc29eb26..c6f25a267600 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -551,6 +551,118 @@ int virtio_device_restore(struct virtio_device *dev)
>   return ret;
>  }
>  EXPORT_SYMBOL_GPL(virtio_device_restore);
> +
> +int virtio_device_suspend(struct virtio_device *dev)
> +{
> + struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> +
> + virtio_config_disable(dev);
> +
> + dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
> +
> + if (drv && drv->suspend)
> + return drv->suspend(dev);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(virtio_device_suspend);
> +
> +int virtio_device_resume(struct virtio_device *dev)
> +{
> + struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> + int ret;
> +
> + if (drv && drv->resume) {
> + ret = drv->resume(dev);
> + if (ret)
> + goto err;
> +
> + if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
> + virtio_device_ready(dev);
> +
> + virtio_config_enable(dev);
> + }
> +
> + return 0;
> +err:
> + virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(virtio_device_resume);
> +
> +int virtio_device_suspend_late(struct virtio_device *dev)
> +{
> + struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> +
> + virtio_config_disable(dev);
> +
> + dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
> +
> + if (drv && drv->suspend_late)
> + return drv->suspend_late(dev);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(virtio_device_suspend_late);
> +
> +int virtio_device_resume_early(struct virtio_device *dev)
> +{
> + struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> + int ret;
> +
> + if (drv && drv->resume_early) {
> + ret = drv->resume_early(dev);
> + if (ret)
> + goto err;
> + if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
> + virtio_device_ready(dev);
> +
> + virtio_config_enable(dev);
> + }
> +
> + return 0;
> +err:
> + virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(virtio_device_resume_early);
> +
> +int virtio_device_suspend_noirq(struct virtio_device *dev)
> +{
> + struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> +
> + virtio_config_disable(dev);
> +
> + dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
> +
> + if (drv && drv->suspend_noirq)
> + return drv->suspend_noirq(dev);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(virtio_device_suspend_noirq);
> +
> +int virtio_device_resume_noirq(struct virtio_device *dev)
> +{
> + struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> + int ret;
> +
> + if (drv && drv->resume_noirq) {
> + ret = drv->resume_noirq(dev);
> + if (ret)
> + goto err;
> + if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
> + virtio_device_ready(dev);
> +
> + virtio_config_enable(dev);
> + }
> +
> + return 0;
> +err:
> + virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(virtio_device_resume_noirq);
>  #endif
>  
>  static int virtio_init(void)
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index a46a4a29e929..9385c7e65da9 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -596,9 +596,57 @@ static int virtio_mmio_restore(struct device *dev)
>  
> >   return virtio_device_restore(&vm_dev->vdev);
>  }
> +static int virtio_mmio_suspend(struct device *dev)
> +{
> +   struct virtio_mmio_device *vm_dev = dev_get_drvdata(dev);
> +
> > +   return virtio_device_suspend(&vm_dev->vdev);
> +}
> +
> +static int virtio_mmio_resume(struct device *dev)
> +{
> +   struct virtio_mmio_device *vm_dev = dev_get_drvdata(dev);
> +
> +   return 

Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-08-10 Thread Michael S. Tsirkin
On Tue, Aug 08, 2023 at 10:30:56AM +0800, Jason Wang wrote:
> On Mon, Jul 31, 2023 at 2:30 PM Jason Wang  wrote:
> >
> > On Thu, Jul 27, 2023 at 5:46 PM Michael S. Tsirkin  wrote:
> > >
> > > On Thu, Jul 27, 2023 at 04:59:33PM +0800, Jason Wang wrote:
> > > > > They really shouldn't - any NIC that takes forever to
> > > > > program will create issues in the networking stack.
> > > >
> > > > Unfortunately, it's not rare as the device/cvq could be implemented
> > > > via firmware or software.
> > >
> > > Currently that means one either has sane firmware with a scheduler that
> > > can meet deadlines, or loses the ability to report errors back.
> > >
> > > > > But if they do they can always set this flag too.
> > > >
> > > > This may have false negatives and may confuse the management.
> > > >
> > > > Maybe we can extend the networking core to allow some device specific
> > > > configurations to be done with device specific lock without rtnl. For
> > > > example, split the set_channels to
> > > >
> > > > pre_set_channels
> > > > set_channels
> > > > post_set_channels
> > > >
> > > > The device specific part could be done in pre and post without a rtnl 
> > > > lock?
> > > >
> > > > Thanks
> > >
> > >
> > > Would the benefit be that errors can be reported to userspace then?
> > > Then maybe.  I think you will have to show how this works for at least
> > > one card besides virtio.
> >
> > Even for virtio, this seems not easy, as e.g. the
> > virtnet_send_command() and netif_set_real_num_tx_queues() need to
> > appear to be atomic to the networking core.
> >
> > I wonder if we can re-consider the way of a timeout here and choose a
> > sane value as a start.
> 
> Michael, any more input on this?
> 
> Thanks

I think this is just mission creep. We are trying to fix
vduse - let's do that for starters.

Recovering from firmware timeouts is far from trivial and
just assuming that just because it timed out it will not
access memory is just as likely to cause memory corruption
with worse results than an infinite spin.

I propose we fix this for vduse and assume hardware/firmware
is well behaved. Or maybe not well behaved firmware will
set the flag losing error reporting ability.



> >
> > Thanks
> >
> > >
> > >
> > > --
> > > MST
> > >


Re: [PATCH V3 04/14] virtio-blk: limit max allowed submit queues

2023-08-10 Thread Michael S. Tsirkin
On Tue, Aug 08, 2023 at 06:42:29PM +0800, Ming Lei wrote:
> Take blk-mq's knowledge into account for calculating io queues.
> 
> Fix wrong queue mapping in case of kdump kernel.
> 
> On arm and ppc64, 'maxcpus=1' is passed to kdump command line, see
> `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus()
> still returns all CPUs because 'maxcpus=1' just brings up one single
> cpu core during booting.
> 
> blk-mq sees a single queue in the kdump kernel, while in the driver's viewpoint
> there are still multiple queues; this inconsistency causes the driver to apply
> the wrong queue mapping for handling IO, and an IO timeout is triggered.
> 
> Meanwhile, a single queue means much lower resource utilization and reduces
> the risk of kernel failure.
> 
> Cc: Michael S. Tsirkin 
> Cc: Jason Wang 
> Cc: virtualization@lists.linux-foundation.org
> Signed-off-by: Ming Lei 

superficially:

Acked-by: Michael S. Tsirkin 

but this patch only makes sense if the rest of patchset is merged.
feel free to merge directly.

> ---
>  drivers/block/virtio_blk.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 1fe011676d07..4ba79fe2a1b4 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -1047,7 +1047,8 @@ static int init_vq(struct virtio_blk *vblk)
>  
>   num_poll_vqs = min_t(unsigned int, poll_queues, num_vqs - 1);
>  
> - vblk->io_queues[HCTX_TYPE_DEFAULT] = num_vqs - num_poll_vqs;
> + vblk->io_queues[HCTX_TYPE_DEFAULT] = min_t(unsigned,
> + num_vqs - num_poll_vqs, blk_mq_max_nr_hw_queues());
>   vblk->io_queues[HCTX_TYPE_READ] = 0;
>   vblk->io_queues[HCTX_TYPE_POLL] = num_poll_vqs;
>  
> -- 
> 2.40.1



Re: [PATCH] virtio_ring: fix avail_wrap_counter in virtqueue_add_packed

2023-08-10 Thread Michael S. Tsirkin
On Tue, Aug 08, 2023 at 07:05:38PM +0900, Yuan Yao wrote:
> Sorry, but please ignore this thread.

ok, dropped.



Re: [PATCH v3 0/3] vduse: add support for networking devices

2023-08-10 Thread Michael S. Tsirkin
On Wed, Jul 05, 2023 at 12:04:27PM +0200, Maxime Coquelin wrote:
> This small series enables virtio-net device type in VDUSE.
> With it, basic operations have been tested, both with
> virtio-vdpa and vhost-vdpa using DPDK Vhost library series
> adding VDUSE support using split rings layout (merged in
> DPDK v23.07-rc1).
> 
> Control queue support (and so multiqueue) has also been
> tested, but requires a Kernel series from Jason Wang
> relaxing control queue polling [1] to function reliably,
> so while Jason rework is done, a patch is added to disable
> CVQ and features that depend on it (tested also with DPDK
> v23.07-rc1).


So I can put this in next, the issue I think is
that of security: currently selinux can if necessary block
access to creating virtio block devices.
But if we have more than one type we need a way for selinux to
block specific types. Can be a patch on top but pls work to
address.

Another question is that with this, userspace can inject
packets directly into the net stack. Should we check CAP_NET_ADMIN
or such?



> [1]: 
> https://lore.kernel.org/lkml/cacgkmetgrxn3ppwsdo4oosnssljfembez0wvjgrr3whu+qa...@mail.gmail.com/T/
> 
> v2 -> v3 changes:
> =
> - Use allow list instead of deny list (Michael)
> 
> v1 -> v2 changes:
> =
> - Add a patch to disable CVQ (Michael)
> 
> RFC -> v1 changes:
> ==
> - Fail device init if it does not support VERSION_1 (Jason)
> 
> Maxime Coquelin (3):
>   vduse: validate block features only with block devices
>   vduse: enable Virtio-net device type
>   vduse: Temporarily disable control queue features
> 
>  drivers/vdpa/vdpa_user/vduse_dev.c | 51 +++---
>  1 file changed, 47 insertions(+), 4 deletions(-)
> 
> -- 
> 2.41.0



Re: [PATCH v1 0/2] vduse: add support for networking devices

2023-08-10 Thread Michael S. Tsirkin
On Tue, Jun 27, 2023 at 01:36:50PM +0200, Maxime Coquelin wrote:
> This small series enables virtio-net device type in VDUSE.
> With it, basic operations have been tested, both with
> virtio-vdpa and vhost-vdpa using DPDK Vhost library series
> adding VDUSE support using split rings layout (merged in
> DPDK v23.07-rc1).
> 
> Control queue support (and so multiqueue) has also been
> tested, but requires a Kernel series from Jason Wang
> relaxing control queue polling [1] to function reliably.
> 
> [1]: 
> https://lore.kernel.org/lkml/cacgkmetgrxn3ppwsdo4oosnssljfembez0wvjgrr3whu+qa...@mail.gmail.com/T/
> 
> RFC -> v1 changes:
> ==
> - Fail device init if it does not support VERSION_1 (Jason)

So I can put this in next, the issue I think is
that of security: currently selinux can if necessary block
access to creating virtio block devices.
But if we have more than one type we need a way for selinux to
block specific types. Can be a patch on top but pls work to
address.

Another question is that with this, userspace can inject
packets directly into the net stack. Should we check CAP_NET_ADMIN
or such?


> Maxime Coquelin (2):
>   vduse: validate block features only with block devices
>   vduse: enable Virtio-net device type
> 
>  drivers/vdpa/vdpa_user/vduse_dev.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> -- 
> 2.41.0



Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads

2023-08-10 Thread Michael S. Tsirkin
On Sat, Jul 22, 2023 at 11:03:29PM -0500, michael.chris...@oracle.com wrote:
> On 7/20/23 8:06 AM, Michael S. Tsirkin wrote:
> > On Thu, Feb 02, 2023 at 05:25:17PM -0600, Mike Christie wrote:
> >> For vhost workers we use the kthread API which inherit's its values from
> >> and checks against the kthreadd thread. This results in the wrong RLIMITs
> >> being checked, so while tools like libvirt try to control the number of
> >> threads based on the nproc rlimit setting we can end up creating more
> >> threads than the user wanted.
> >>
> >> This patch has us use the vhost_task helpers which will inherit its
> >> values/checks from the thread that owns the device similar to if we did
> >> a clone in userspace. The vhost threads will now be counted in the nproc
> >> rlimits. And we get features like cgroups and mm sharing automatically,
> >> so we can remove those calls.
> >>
> >> Signed-off-by: Mike Christie 
> >> Acked-by: Michael S. Tsirkin 
> > 
> > 
> > Hi Mike,
> > So this seems to have caused a measureable regression in networking
> > performance (about 30%). Take a look here, and there's a zip file
> > with detailed measuraments attached:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=603
> > 
> > 
> > Could you take a look please?
> > You can also ask reporter questions there assuming you
> > have or can create a (free) account.
> > 
> 
> Sorry for the late reply. I just got home from vacation.
> 
> The account creation link seems to be down. I keep getting a
> "unable to establish SMTP connection to bz-exim-prod port 25 " error.
> 
> Can you give me Quan's email?
> 
> I think I can replicate the problem. I just need some extra info from Quan:
> 
> 1. Just double check that they are using RHEL 9 on the host running the VMs.
> 2. The kernel config
> 3. Any tuning that was done. Is tuned running in guest and/or host running the
> VMs and what profile is being used in each.
> 4. Number of vCPUs and virtqueues being used.
> 5. Can they dump the contents of:
> 
> /sys/kernel/debug/sched
> 
> and
> 
> sysctl  -a
> 
> on the host running the VMs.
> 
> 6. With the 6.4 kernel, can they also run a quick test and tell me if they set
> the scheduler to batch:
> 
> ps -T -o comm,pid,tid $QEMU_THREAD
> 
> then for each vhost thread do:
> 
> chrt -b -p 0 $VHOST_THREAD
> 
> Does that end up increasing perf? When I do this I see throughput go up by
> around 50% vs 6.3 when sessions was 16 or more (16 was the number of vCPUs
> and virtqueues per net device in the VM). Note that I'm not saying that is a 
> fix.
> It's just a difference I noticed when running some other tests.


Mike I'm unsure what to do at this point. Regressions are not nice
but if the kernel is released with the new userspace api we won't
be able to revert. So what's the plan?

-- 
MST



Re: Unbinding virtio_pci_modern does not release BAR4 in Linux 6.5.0-rc4

2023-08-10 Thread Stefan Hajnoczi
On Thu, Aug 10, 2023 at 11:08:52AM +0800, Jason Wang wrote:
> On Thu, Aug 3, 2023 at 10:37 PM Stefan Hajnoczi  wrote:
> >
> > Hi,
> > After running "driverctl --nosave set-override :01:00.0 vfio-pci" on
> > a virtio-blk-pci device, /proc/iomem shows that BAR4 is still owned by
> > virtio_pci_modern even though the vfio-pci driver is now bound to the
> > PCI device.
> >
> > This regression was introduced after 6.4.7 but I don't see the culprit
> > in the git logs.
> >
> > Unfortunately I don't have time to investigate further right now but
> > I've included instructions on how to reproduce this below.
> >
> > Can anyone else reproduce this and can we still fix it for the upcoming
> > Linux 6.5?
> 
> This seems to be fixed by:
> 
> https://lore.kernel.org/lkml/20230720131423-mutt-send-email-...@kernel.org/T/

Awesome, thanks for letting me know!

Stefan

> 
> Thanks
> 
> >
> > Thanks,
> > Stefan
> > ---
> > $ qemu-system-x86_64 \
> > -M q35,accel=kvm,kernel-irqchip=split \
> > -cpu host \
> > -m 1G \
> > -device intel-iommu,intremap=on,device-iotlb=on \
> > --blockdev file,filename=test.img,cache.direct=on,node-name=drive0 \
> > --device virtio-blk-pci,drive=drive0 \
> > -blockdev file,filename=test2.img,cache.direct=on,node-name=drive2 \
> > --device ioh3420,id=pcie.1,chassis=1 \
> > --device 
> > virtio-blk-pci,disable-legacy=on,disable-modern=off,drive=drive2,iommu_platform=on,ats=on,bus=pcie.1
> >
> > (guest)# driverctl --nosave set-override :01:00.0 vfio-pci
> > (guest)# cat /proc/iomem
> 



[PATCH vhost v13 10/12] virtio_ring: introduce dma map api for virtqueue

2023-08-10 Thread Xuan Zhuo
Added the virtqueue_dma_map_* APIs to map DMA addresses for virtual memory in
advance. The purpose is to keep memory mapped across multiple add/get
buf operations.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 69 
 include/linux/virtio.h   |  8 +
 2 files changed, 77 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 639c20b19e06..916479c9c72c 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -3106,4 +3106,73 @@ const struct vring *virtqueue_get_vring(const struct 
virtqueue *vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_vring);
 
+/**
+ * virtqueue_dma_map_single_attrs - map DMA for _vq
+ * @_vq: the struct virtqueue we're talking about.
+ * @ptr: the pointer of the buffer to do dma
+ * @size: the size of the buffer to do dma
+ * @dir: DMA direction
+ * @attrs: DMA Attrs
+ *
+ * The caller calls this to do dma mapping in advance. The DMA address can be
+ * passed to this _vq when it is in pre-mapped mode.
+ *
+ * return DMA address. Caller should check that by 
virtqueue_dma_mapping_error().
+ */
+dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr,
+ size_t size,
+ enum dma_data_direction dir,
+ unsigned long attrs)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   if (!vq->use_dma_api)
+   return (dma_addr_t)virt_to_phys(ptr);
+
+   return dma_map_single_attrs(vring_dma_dev(vq), ptr, size, dir, attrs);
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_map_single_attrs);
+
+/**
+ * virtqueue_dma_unmap_single_attrs - unmap DMA for _vq
+ * @_vq: the struct virtqueue we're talking about.
+ * @addr: the dma address to unmap
+ * @size: the size of the buffer
+ * @dir: DMA direction
+ * @attrs: DMA Attrs
+ *
+ * Unmap the address that is mapped by the virtqueue_dma_map_* APIs.
+ *
+ */
+void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   if (!vq->use_dma_api)
+   return;
+
+   dma_unmap_single_attrs(vring_dma_dev(vq), addr, size, dir, attrs);
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_unmap_single_attrs);
+
+/**
+ * virtqueue_dma_mapping_error - check dma address
+ * @_vq: the struct virtqueue we're talking about.
+ * @addr: DMA address
+ *
+ * Returns 0 means dma valid. Other means invalid dma address.
+ */
+int virtqueue_dma_mapping_error(struct virtqueue *_vq, dma_addr_t addr)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   if (!vq->use_dma_api)
+   return 0;
+
+   return dma_mapping_error(vring_dma_dev(vq), addr);
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_mapping_error);
+
 MODULE_LICENSE("GPL");
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 49a640e0a6f4..79e3c74391e0 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /**
  * struct virtqueue - a queue to register buffers for sending or receiving.
@@ -212,4 +213,11 @@ void unregister_virtio_driver(struct virtio_driver *drv);
 #define module_virtio_driver(__virtio_driver) \
module_driver(__virtio_driver, register_virtio_driver, \
unregister_virtio_driver)
+
+dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr, 
size_t size,
+ enum dma_data_direction dir, unsigned 
long attrs);
+void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs);
+int virtqueue_dma_mapping_error(struct virtqueue *_vq, dma_addr_t addr);
 #endif /* _LINUX_VIRTIO_H */
-- 
2.32.0.3.g01195cf9f
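A usage sketch for the API above, from a driver's point of view; the names
here are placeholders rather than code from an existing driver:

#include <linux/dma-mapping.h>
#include <linux/virtio.h>

/* Sketch only: 'vq' is a virtqueue in premapped mode, 'buf'/'len' a driver
 * buffer that should stay mapped across several add_buf/get_buf cycles.
 */
static int example_premap(struct virtqueue *vq, void *buf, size_t len,
                          dma_addr_t *addr)
{
        *addr = virtqueue_dma_map_single_attrs(vq, buf, len,
                                               DMA_FROM_DEVICE, 0);
        if (virtqueue_dma_mapping_error(vq, *addr))
                return -ENOMEM;

        /* *addr can now back sg entries for as many fills as needed;
         * release it later with:
         *   virtqueue_dma_unmap_single_attrs(vq, *addr, len,
         *                                    DMA_FROM_DEVICE, 0);
         */
        return 0;
}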



[PATCH vhost v13 12/12] virtio_net: merge dma operations when filling mergeable buffers

2023-08-10 Thread Xuan Zhuo
Currently, the virtio core will perform a dma operation for each
buffer, even though the same page may be operated on multiple times.

With this patch, the driver does the dma operation itself and manages the
dma address, based on the premapped feature of the virtio core.

This way, we can perform only one dma operation for the pages of the
alloc frag. This is beneficial for the iommu device.

kernel command line: intel_iommu=on iommu.passthrough=0

   |  strict=0  | strict=1
Before |  775496pps | 428614pps
After  | 1109316pps | 742853pps

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 228 ++-
 1 file changed, 202 insertions(+), 26 deletions(-)
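A rough picture of the layout the change below relies on (an illustration,
not copied from the code): the DMA bookkeeping sits at the head of each
page-frag page and every buffer handed to the ring follows it.

/*
 *   page_address(alloc_frag->page)
 *   +------------------------+----------------+----------------+----
 *   | struct virtnet_rq_dma  |  buffer #1     |  buffer #2     | ...
 *   | (addr, len, ref, sync) |  (rx payload)  |  (rx payload)  |
 *   +------------------------+----------------+----------------+----
 *
 * The region after the struct is mapped once; 'ref' counts the buffers
 * still outstanding and the mapping is released when it drops to zero.
 */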

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 486b5849033d..16adb5ef18f8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -126,6 +126,14 @@ static const struct virtnet_stat_desc 
virtnet_rq_stats_desc[] = {
 #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
 #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
 
+/* The dma information of pages allocated at a time. */
+struct virtnet_rq_dma {
+   dma_addr_t addr;
+   u32 ref;
+   u16 len;
+   u16 need_sync;
+};
+
 /* Internal representation of a send virtqueue */
 struct send_queue {
/* Virtqueue associated with this send _queue */
@@ -175,6 +183,12 @@ struct receive_queue {
char name[16];
 
struct xdp_rxq_info xdp_rxq;
+
+   /* Record the last dma info to free after new pages is allocated. */
+   struct virtnet_rq_dma *last_dma;
+
+   /* Do dma by self */
+   bool do_dma;
 };
 
 /* This structure can contain rss message with maximum settings for 
indirection table and keysize
@@ -549,6 +563,156 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
*vi,
return skb;
 }
 
+static void virtnet_rq_unmap(struct receive_queue *rq, void *buf, u32 len)
+{
+   struct page *page = virt_to_head_page(buf);
+   struct virtnet_rq_dma *dma;
+   void *head;
+   int offset;
+
+   head = page_address(page);
+
+   dma = head;
+
+   --dma->ref;
+
+   if (dma->ref) {
+   if (dma->need_sync && len) {
+   offset = buf - (head + sizeof(*dma));
+
+   virtqueue_dma_sync_single_range_for_cpu(rq->vq, 
dma->addr, offset,
+   len, 
DMA_FROM_DEVICE);
+   }
+
+   return;
+   }
+
+   virtqueue_dma_unmap_single_attrs(rq->vq, dma->addr, dma->len,
+DMA_FROM_DEVICE, 
DMA_ATTR_SKIP_CPU_SYNC);
+   put_page(page);
+}
+
+static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
+{
+   void *buf;
+
+   buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
+   if (buf && rq->do_dma)
+   virtnet_rq_unmap(rq, buf, *len);
+
+   return buf;
+}
+
+static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
+{
+   void *buf;
+
+   buf = virtqueue_detach_unused_buf(rq->vq);
+   if (buf && rq->do_dma)
+   virtnet_rq_unmap(rq, buf, 0);
+
+   return buf;
+}
+
+static void virtnet_rq_init_one_sg(struct receive_queue *rq, void *buf, u32 
len)
+{
+   struct virtnet_rq_dma *dma;
+   dma_addr_t addr;
+   u32 offset;
+   void *head;
+
+   if (!rq->do_dma) {
+   sg_init_one(rq->sg, buf, len);
+   return;
+   }
+
+   head = page_address(rq->alloc_frag.page);
+
+   offset = buf - head;
+
+   dma = head;
+
+   addr = dma->addr - sizeof(*dma) + offset;
+
+   sg_init_table(rq->sg, 1);
+   rq->sg[0].dma_address = addr;
+   rq->sg[0].length = len;
+}
+
+static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
+{
+   struct page_frag *alloc_frag = &rq->alloc_frag;
+   struct virtnet_rq_dma *dma;
+   void *buf, *head;
+   dma_addr_t addr;
+
+   if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
+   return NULL;
+
+   head = page_address(alloc_frag->page);
+
+   if (rq->do_dma) {
+   dma = head;
+
+   /* new pages */
+   if (!alloc_frag->offset) {
+   if (rq->last_dma) {
+   /* Now, the new page is allocated, the last dma
+* will not be used. So the dma can be unmapped
+* if the ref is 0.
+*/
+   virtnet_rq_unmap(rq, rq->last_dma, 0);
+   rq->last_dma = NULL;
+   }
+
+   dma->len = alloc_frag->size - sizeof(*dma);
+
+   addr = virtqueue_dma_map_single_attrs(rq->vq, dma + 1,
+ dma->len, 
DMA_FROM_DEVICE, 0);
+   if 

[PATCH vhost v13 11/12] virtio_ring: introduce dma sync api for virtqueue

2023-08-10 Thread Xuan Zhuo
These APIs have been introduced:

* virtqueue_dma_need_sync
* virtqueue_dma_sync_single_range_for_cpu
* virtqueue_dma_sync_single_range_for_device

These APIs can be used together with the premapped mechanism to sync the
DMA address.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 76 
 include/linux/virtio.h   |  8 
 2 files changed, 84 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 916479c9c72c..81ecb29c88f1 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -3175,4 +3175,80 @@ int virtqueue_dma_mapping_error(struct virtqueue *_vq, 
dma_addr_t addr)
 }
 EXPORT_SYMBOL_GPL(virtqueue_dma_mapping_error);
 
+/**
+ * virtqueue_dma_need_sync - check a dma address needs sync
+ * @_vq: the struct virtqueue we're talking about.
+ * @addr: DMA address
+ *
+ * Check if the dma address mapped by the virtqueue_dma_map_* APIs needs to be
+ * synchronized
+ *
+ * return bool
+ */
+bool virtqueue_dma_need_sync(struct virtqueue *_vq, dma_addr_t addr)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   if (!vq->use_dma_api)
+   return false;
+
+   return dma_need_sync(vring_dma_dev(vq), addr);
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_need_sync);
+
+/**
+ * virtqueue_dma_sync_single_range_for_cpu - dma sync for cpu
+ * @_vq: the struct virtqueue we're talking about.
+ * @addr: DMA address
+ * @offset: DMA address offset
+ * @size: buf size for sync
+ * @dir: DMA direction
+ *
+ * Before calling this function, use virtqueue_dma_need_sync() to confirm that
+ * the DMA address really needs to be synchronized
+ *
+ */
+void virtqueue_dma_sync_single_range_for_cpu(struct virtqueue *_vq,
+dma_addr_t addr,
+unsigned long offset, size_t size,
+enum dma_data_direction dir)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   struct device *dev = vring_dma_dev(vq);
+
+   if (!vq->use_dma_api)
+   return;
+
+	dma_sync_single_range_for_cpu(dev, addr, offset, size, dir);
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_sync_single_range_for_cpu);
+
+/**
+ * virtqueue_dma_sync_single_range_for_device - dma sync for device
+ * @_vq: the struct virtqueue we're talking about.
+ * @addr: DMA address
+ * @offset: DMA address offset
+ * @size: buf size for sync
+ * @dir: DMA direction
+ *
+ * Before calling this function, use virtqueue_dma_need_sync() to confirm that
+ * the DMA address really needs to be synchronized
+ */
+void virtqueue_dma_sync_single_range_for_device(struct virtqueue *_vq,
+   dma_addr_t addr,
+						unsigned long offset, size_t size,
+   enum dma_data_direction dir)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   struct device *dev = vring_dma_dev(vq);
+
+   if (!vq->use_dma_api)
+   return;
+
+	dma_sync_single_range_for_device(dev, addr, offset, size, dir);
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_sync_single_range_for_device);
+
 MODULE_LICENSE("GPL");
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 79e3c74391e0..1311a7fbe675 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -220,4 +220,12 @@ void virtqueue_dma_unmap_single_attrs(struct virtqueue 
*_vq, dma_addr_t addr,
  size_t size, enum dma_data_direction dir,
  unsigned long attrs);
 int virtqueue_dma_mapping_error(struct virtqueue *_vq, dma_addr_t addr);
+
+bool virtqueue_dma_need_sync(struct virtqueue *_vq, dma_addr_t addr);
+void virtqueue_dma_sync_single_range_for_cpu(struct virtqueue *_vq, dma_addr_t addr,
+unsigned long offset, size_t size,
+enum dma_data_direction dir);
+void virtqueue_dma_sync_single_range_for_device(struct virtqueue *_vq, dma_addr_t addr,
+						 unsigned long offset, size_t size,
+   enum dma_data_direction dir);
 #endif /* _LINUX_VIRTIO_H */
-- 
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH vhost v13 09/12] virtio_ring: introduce virtqueue_reset()

2023-08-10 Thread Xuan Zhuo
Introduce virtqueue_reset() to release all buffers inside the vq.
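
A minimal usage sketch (editor's illustration, not part of the patch): a driver
resets one of its queues and lets the core hand back every unused buffer
through the recycle callback. my_free_buf() and my_refill() are hypothetical
driver routines.

static void my_recycle(struct virtqueue *vq, void *buf)
{
	/* my_free_buf() is a hypothetical driver-specific release routine. */
	my_free_buf(buf);
}

static int my_reset_queue(struct virtqueue *vq)
{
	int err;

	/* Detach and recycle everything, then bring the vq back up. */
	err = virtqueue_reset(vq, my_recycle);
	if (err)
		return err;

	/* my_refill() is a hypothetical driver routine that re-adds buffers. */
	return my_refill(vq);
}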

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 33 +
 include/linux/virtio.h   |  2 ++
 2 files changed, 35 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 23172d98e48e..639c20b19e06 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2812,6 +2812,39 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_set_dma_premapped);
 
+/**
+ * virtqueue_reset - detach and recycle all unused buffers
+ * @_vq: the struct virtqueue we're talking about.
+ * @recycle: callback to recycle unused buffers
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error.
+ * 0: success.
+ * -EBUSY: Failed to sync with device, vq may not work properly
+ * -ENOENT: Transport or device not supported
+ * -EPERM: Operation not permitted
+ */
+int virtqueue_reset(struct virtqueue *_vq,
+   void (*recycle)(struct virtqueue *vq, void *buf))
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   int err;
+
+   err = virtqueue_disable_and_recycle(_vq, recycle);
+   if (err)
+   return err;
+
+   if (vq->packed_ring)
+   virtqueue_reinit_packed(vq);
+   else
+   virtqueue_reinit_split(vq);
+
+   return virtqueue_enable_after_reset(_vq);
+}
+EXPORT_SYMBOL_GPL(virtqueue_reset);
+
 /* Only available for split ring */
 struct virtqueue *vring_new_virtqueue(unsigned int index,
  unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index bd55a05eec04..49a640e0a6f4 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -99,6 +99,8 @@ dma_addr_t virtqueue_get_used_addr(const struct virtqueue 
*vq);
 
 int virtqueue_resize(struct virtqueue *vq, u32 num,
 void (*recycle)(struct virtqueue *vq, void *buf));
+int virtqueue_reset(struct virtqueue *vq,
+   void (*recycle)(struct virtqueue *vq, void *buf));
 
 /**
  * struct virtio_device - representation of a device using virtio
-- 
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH vhost v13 02/12] virtio_ring: put mapping error check in vring_map_one_sg

2023-08-10 Thread Xuan Zhuo
This patch puts the dma addr error check in vring_map_one_sg().

The benefits of doing this:

1. reduce one judgment of vq->use_dma_api.
2. make vring_map_one_sg() simpler: callers no longer need to call
   vring_mapping_error() to check the return value, which simplifies the
   subsequent code.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 37 +---
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index f8754f1d64d3..87d7ceeecdbd 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -355,9 +355,8 @@ static struct device *vring_dma_dev(const struct 
vring_virtqueue *vq)
 }
 
 /* Map one sg entry. */
-static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
-  struct scatterlist *sg,
-  enum dma_data_direction direction)
+static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
+   enum dma_data_direction direction, dma_addr_t *addr)
 {
if (!vq->use_dma_api) {
/*
@@ -366,7 +365,8 @@ static dma_addr_t vring_map_one_sg(const struct 
vring_virtqueue *vq,
 * depending on the direction.
 */
kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, 
direction);
-   return (dma_addr_t)sg_phys(sg);
+   *addr = (dma_addr_t)sg_phys(sg);
+   return 0;
}
 
/*
@@ -374,9 +374,14 @@ static dma_addr_t vring_map_one_sg(const struct 
vring_virtqueue *vq,
 * the way it expects (we don't guarantee that the scatterlist
 * will exist for the lifetime of the mapping).
 */
-   return dma_map_page(vring_dma_dev(vq),
+   *addr = dma_map_page(vring_dma_dev(vq),
sg_page(sg), sg->offset, sg->length,
direction);
+
+   if (dma_mapping_error(vring_dma_dev(vq), *addr))
+   return -ENOMEM;
+
+   return 0;
 }
 
 static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
@@ -588,8 +593,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 
for (n = 0; n < out_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
-			if (vring_mapping_error(vq, addr))
+			dma_addr_t addr;
+
+			if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
goto unmap_release;
 
prev = i;
@@ -603,8 +609,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
}
for (; n < (out_sgs + in_sgs); n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
-			if (vring_mapping_error(vq, addr))
+			dma_addr_t addr;
+
+			if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
goto unmap_release;
 
prev = i;
@@ -1281,9 +1288,8 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
 
for (n = 0; n < out_sgs + in_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-   DMA_TO_DEVICE : DMA_FROM_DEVICE);
-   if (vring_mapping_error(vq, addr))
+   if (vring_map_one_sg(vq, sg, n < out_sgs ?
+					     DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
goto unmap_release;
 
desc[i].flags = cpu_to_le16(n < out_sgs ?
@@ -1428,9 +1434,10 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
c = 0;
for (n = 0; n < out_sgs + in_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-   DMA_TO_DEVICE : DMA_FROM_DEVICE);
-   if (vring_mapping_error(vq, addr))
+   dma_addr_t addr;
+
+   if (vring_map_one_sg(vq, sg, n < out_sgs ?
+					     DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
goto unmap_release;
 
flags = cpu_to_le16(vq->packed.avail_used_flags |
-- 
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH vhost v13 06/12] virtio_ring: skip unmap for premapped

2023-08-10 Thread Xuan Zhuo
Now we add a case where we skip the dma unmap: when vq->premapped is true.

We can't just rely on use_dma_api to determine whether to skip the dma
operation. For convenience, I introduced "do_unmap". By default, it is
the same as use_dma_api. If the driver is configured as premapped, then
do_unmap is false.

So as long as do_unmap is false, we should skip the dma unmap operation
for the desc addr.
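
Put differently (editor's note, not part of the patch), the invariant
introduced here can be summarized as:

	vq->do_unmap = vq->use_dma_api && !vq->premapped;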

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 42 
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index bb3d73d221cd..7973814b6e31 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -175,6 +175,11 @@ struct vring_virtqueue {
/* Do DMA mapping by driver */
bool premapped;
 
+	/* Do unmap or not for desc. This is true only when premapped
+	 * is false and use_dma_api is true.
+	 */
+   bool do_unmap;
+
/* Head of free buffer list. */
unsigned int free_head;
/* Number we've added since last sync. */
@@ -440,7 +445,7 @@ static void vring_unmap_one_split_indirect(const struct 
vring_virtqueue *vq,
 {
u16 flags;
 
-   if (!vq->use_dma_api)
+   if (!vq->do_unmap)
return;
 
flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
@@ -458,18 +463,21 @@ static unsigned int vring_unmap_one_split(const struct 
vring_virtqueue *vq,
struct vring_desc_extra *extra = vq->split.desc_extra;
u16 flags;
 
-   if (!vq->use_dma_api)
-   goto out;
-
flags = extra[i].flags;
 
if (flags & VRING_DESC_F_INDIRECT) {
+   if (!vq->use_dma_api)
+   goto out;
+
dma_unmap_single(vring_dma_dev(vq),
 extra[i].addr,
 extra[i].len,
 (flags & VRING_DESC_F_WRITE) ?
 DMA_FROM_DEVICE : DMA_TO_DEVICE);
} else {
+   if (!vq->do_unmap)
+   goto out;
+
dma_unmap_page(vring_dma_dev(vq),
   extra[i].addr,
   extra[i].len,
@@ -635,7 +643,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
}
/* Last one doesn't continue. */
desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
-   if (!indirect && vq->use_dma_api)
+   if (!indirect && vq->do_unmap)
vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
~VRING_DESC_F_NEXT;
 
@@ -794,7 +802,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, 
unsigned int head,
VRING_DESC_F_INDIRECT));
BUG_ON(len == 0 || len % sizeof(struct vring_desc));
 
-   if (vq->use_dma_api) {
+   if (vq->do_unmap) {
for (j = 0; j < len / sizeof(struct vring_desc); j++)
+			vring_unmap_one_split_indirect(vq, &indir_desc[j]);
}
@@ -1217,17 +1225,20 @@ static void vring_unmap_extra_packed(const struct 
vring_virtqueue *vq,
 {
u16 flags;
 
-   if (!vq->use_dma_api)
-   return;
-
flags = extra->flags;
 
if (flags & VRING_DESC_F_INDIRECT) {
+   if (!vq->use_dma_api)
+   return;
+
dma_unmap_single(vring_dma_dev(vq),
 extra->addr, extra->len,
 (flags & VRING_DESC_F_WRITE) ?
 DMA_FROM_DEVICE : DMA_TO_DEVICE);
} else {
+   if (!vq->do_unmap)
+   return;
+
dma_unmap_page(vring_dma_dev(vq),
   extra->addr, extra->len,
   (flags & VRING_DESC_F_WRITE) ?
@@ -1240,7 +1251,7 @@ static void vring_unmap_desc_packed(const struct 
vring_virtqueue *vq,
 {
u16 flags;
 
-   if (!vq->use_dma_api)
+   if (!vq->do_unmap)
return;
 
flags = le16_to_cpu(desc->flags);
@@ -1329,7 +1340,7 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
sizeof(struct vring_packed_desc));
vq->packed.vring.desc[head].id = cpu_to_le16(id);
 
-   if (vq->use_dma_api) {
+   if (vq->do_unmap) {
vq->packed.desc_extra[id].addr = addr;
vq->packed.desc_extra[id].len = total_sg *
sizeof(struct vring_packed_desc);
@@ -1470,7 +1481,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
desc[i].len = cpu_to_le32(sg->length);
desc[i].id = cpu_to_le16(id);
 
-   if (unlikely(vq->use_dma_api)) {
+   if 

[PATCH vhost v13 03/12] virtio_ring: introduce virtqueue_set_dma_premapped()

2023-08-10 Thread Xuan Zhuo
This helper allows the driver to change the dma mode to premapped mode.
Under premapped mode, the virtio core does not do dma mapping
internally.

This only works when use_dma_api is true. If use_dma_api is false, the
dma operations do not go through the DMA APIs, which is not the standard
way in the linux kernel.
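
A minimal calling sketch (editor's illustration, not part of the patch):
premapped mode is switched on right after the vq is obtained and before any
buffer is added. The wrapper name is an assumption made for the example.

static bool my_try_premapped(struct virtio_device *vdev, struct virtqueue *vq)
{
	/* Hypothetical wrapper, not part of this series. */

	/*
	 * Must be called while the vq is still empty; returns -EINVAL when
	 * the ring does not use the DMA API.
	 */
	if (virtqueue_set_dma_premapped(vq)) {
		dev_info(&vdev->dev, "premapped mode unavailable, falling back\n");
		return false;
	}

	return true;
}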

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 53 
 include/linux/virtio.h   |  2 ++
 2 files changed, 55 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 87d7ceeecdbd..8e81b01e0735 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -172,6 +172,9 @@ struct vring_virtqueue {
/* Host publishes avail event idx */
bool event;
 
+   /* Do DMA mapping by driver */
+   bool premapped;
+
/* Head of free buffer list. */
unsigned int free_head;
/* Number we've added since last sync. */
@@ -2061,6 +2064,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
vq->packed_ring = true;
vq->dma_dev = dma_dev;
vq->use_dma_api = vring_use_dma_api(vdev);
+   vq->premapped = false;
 
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
!context;
@@ -2550,6 +2554,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned 
int index,
 #endif
vq->dma_dev = dma_dev;
vq->use_dma_api = vring_use_dma_api(vdev);
+   vq->premapped = false;
 
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
!context;
@@ -2693,6 +2698,54 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 }
 EXPORT_SYMBOL_GPL(virtqueue_resize);
 
+/**
+ * virtqueue_set_dma_premapped - set the vring premapped mode
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Enable the premapped mode of the vq.
+ *
+ * The vring in premapped mode does not do dma internally, so the driver must
+ * do the dma mapping in advance. The driver must pass the DMA address through
+ * the dma_address field of the scatterlist. When the driver gets a used buffer
+ * back from the vring, it has to unmap the dma address.
+ *
+ * This function must be called immediately after creating the vq, or after vq
+ * reset, and before adding any buffers to it.
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error.
+ * 0: success.
+ * -EINVAL: vring does not use the dma api, so we can not enable premapped 
mode.
+ */
+int virtqueue_set_dma_premapped(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   u32 num;
+
+   START_USE(vq);
+
+   num = vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num;
+
+   if (num != vq->vq.num_free) {
+   END_USE(vq);
+   return -EINVAL;
+   }
+
+   if (!vq->use_dma_api) {
+   END_USE(vq);
+   return -EINVAL;
+   }
+
+   vq->premapped = true;
+
+   END_USE(vq);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(virtqueue_set_dma_premapped);
+
 /* Only available for split ring */
 struct virtqueue *vring_new_virtqueue(unsigned int index,
  unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index de6041deee37..8add38038877 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -78,6 +78,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
 
 unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
 
+int virtqueue_set_dma_premapped(struct virtqueue *_vq);
+
 bool virtqueue_poll(struct virtqueue *vq, unsigned);
 
 bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
-- 
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH vhost v13 07/12] virtio_ring: correct the expression of the description of virtqueue_resize()

2023-08-10 Thread Xuan Zhuo
Modify the "useless" to a more accurate "unused".

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 7973814b6e31..fd9ae020e0a3 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2678,7 +2678,7 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
  * virtqueue_resize - resize the vring of vq
  * @_vq: the struct virtqueue we're talking about.
  * @num: new ring num
- * @recycle: callback for recycle the useless buffer
+ * @recycle: callback to recycle unused buffers
  *
  * When it is really necessary to create a new vring, it will set the current 
vq
  * into the reset state. Then call the passed callback to recycle the buffer
-- 
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH vhost v13 00/12] virtio core prepares for AF_XDP

2023-08-10 Thread Xuan Zhuo
## About DMA APIs

Now, virtio may not work with the DMA APIs when the virtio device does not
offer VIRTIO_F_ACCESS_PLATFORM.

1. I tried to let the DMA APIs return the physical address for the
   virtio device. But the DMA APIs only work with "real" devices.
2. I tried to let xsk support callbacks to get the physical address from the
   virtio-net driver as the dma address. But the xsk maintainers may want to
   use dma-buf to replace the DMA APIs. I think that would be a larger effort
   and we would wait too long.

So, rethinking this: first, we can support premapped DMA only for devices with
VIRTIO_F_ACCESS_PLATFORM. In the case of AF_XDP, users who want to use it have
to update the device to support VIRTIO_F_RING_RESET, and they can also enable
the device's VIRTIO_F_ACCESS_PLATFORM feature.

Thanks for the help from Christoph.

## For AF_XDP

XDP socket (AF_XDP) is an excellent kernel-bypass networking framework. The
zero-copy feature of xsk (XDP socket) needs to be supported by the driver. The
performance of zero copy is very good.

ENV: Qemu with vhost.

   vhost cpu | Guest APP CPU |Guest Softirq CPU | PPS
-|---|--|
xmit by sockperf: 90%|   100%|  |  318967
xmit by xsk:  100%   |   30% |   33%| 1192064
recv by sockperf: 100%   |   68% |   100%   |  692288
recv by xsk:  100%   |   33% |   43%|  771670

Before achieving the function of Virtio-Net, we also have to let virtio core
support these features:

1. virtio core support premapped
2. virtio core support reset per-queue

## VirtioNET rx dma merge

After introducing premapping, I added an example to virtio-net. virtio-net can
merge dma mappings through this feature. @Jason

kernel command line: intel_iommu=on iommu.passthrough=0

   |  strict=0  | strict=1
Before |  775496pps | 428614pps
After  | 1109316pps | 742853pps

Please review.

Thanks.

v13:
 1. virtio-net uses the virtqueue_dma_* APIs
 2. virtio-net unmap with the flags DMA_ATTR_SKIP_CPU_SYNC

v12:
 1. Alloc dma info from the alloc frag. Avoid alloc array to store the dma info.
 2. rename virtqueue_set_premapped() to virtqueue_set_dma_premapped()

v11
 1. virtio-net merges dma operates based on the feature premapped
 2. A better way to handle the map error with the premapped

v10:
 1. support to set vq to premapped mode, then the vq just handles the premapped 
request.
 2. virtio-net support to do dma mapping in advance

v9:
 1. use flag to distinguish the premapped operations. no do judgment by sg.

v8:
 1. vring_sg_address: check by sg_page(sg) not dma_address. Because 0 is a 
valid dma address
 2. remove unused code from vring_map_one_sg()

v7:
 1. virtqueue_dma_dev() return NULL when virtio is without DMA API.

v6:
 1. change the size of the flags to u32.

v5:
 1. fix for error handler
 2. add flags to record internal dma mapping

v4:
 1. rename map_inter to dma_map_internal
 2. fix: Excess function parameter 'vq' description in 'virtqueue_dma_dev'

v3:
 1. add map_inter to struct desc state to record whether the virtio core does the dma map

v2:
 1. based on sgs[0]->dma_address to judgment is premapped
 2. based on extra.addr to judgment to do unmap for no-indirect desc
 3. based on indir_desc to judgment to do unmap for indirect desc
 4. rename virtqueue_get_dma_dev to virtqueue_dma_dev

v1:
 1. expose dma device. NO introduce the api for dma and sync
 2. split some commit for review.







Xuan Zhuo (12):
  virtio_ring: check use_dma_api before unmap desc for indirect
  virtio_ring: put mapping error check in vring_map_one_sg
  virtio_ring: introduce virtqueue_set_dma_premapped()
  virtio_ring: support add premapped buf
  virtio_ring: introduce virtqueue_dma_dev()
  virtio_ring: skip unmap for premapped
  virtio_ring: correct the expression of the description of
virtqueue_resize()
  virtio_ring: separate the logic of reset/enable from virtqueue_resize
  virtio_ring: introduce virtqueue_reset()
  virtio_ring: introduce dma map api for virtqueue
  virtio_ring: introduce dma sync api for virtqueue
  virtio_net: merge dma operations when filling mergeable buffers

 drivers/net/virtio_net.c | 228 ---
 drivers/virtio/virtio_ring.c | 410 ++-
 include/linux/virtio.h   |  22 ++
 3 files changed, 582 insertions(+), 78 deletions(-)

--
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH vhost v13 08/12] virtio_ring: separate the logic of reset/enable from virtqueue_resize

2023-08-10 Thread Xuan Zhuo
The subsequent reset function will reuse this logic.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 58 
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index fd9ae020e0a3..23172d98e48e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2152,6 +2152,43 @@ static int virtqueue_resize_packed(struct virtqueue 
*_vq, u32 num)
return -ENOMEM;
 }
 
+static int virtqueue_disable_and_recycle(struct virtqueue *_vq,
+void (*recycle)(struct virtqueue *vq, 
void *buf))
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   struct virtio_device *vdev = vq->vq.vdev;
+   void *buf;
+   int err;
+
+   if (!vq->we_own_ring)
+   return -EPERM;
+
+   if (!vdev->config->disable_vq_and_reset)
+   return -ENOENT;
+
+   if (!vdev->config->enable_vq_after_reset)
+   return -ENOENT;
+
+   err = vdev->config->disable_vq_and_reset(_vq);
+   if (err)
+   return err;
+
+   while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
+   recycle(_vq, buf);
+
+   return 0;
+}
+
+static int virtqueue_enable_after_reset(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   struct virtio_device *vdev = vq->vq.vdev;
+
+   if (vdev->config->enable_vq_after_reset(_vq))
+   return -EBUSY;
+
+   return 0;
+}
 
 /*
  * Generic functions and exported symbols.
@@ -2702,13 +2739,8 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 void (*recycle)(struct virtqueue *vq, void *buf))
 {
struct vring_virtqueue *vq = to_vvq(_vq);
-   struct virtio_device *vdev = vq->vq.vdev;
-   void *buf;
int err;
 
-   if (!vq->we_own_ring)
-   return -EPERM;
-
if (num > vq->vq.num_max)
return -E2BIG;
 
@@ -2718,28 +2750,16 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
if ((vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num) == 
num)
return 0;
 
-   if (!vdev->config->disable_vq_and_reset)
-   return -ENOENT;
-
-   if (!vdev->config->enable_vq_after_reset)
-   return -ENOENT;
-
-   err = vdev->config->disable_vq_and_reset(_vq);
+   err = virtqueue_disable_and_recycle(_vq, recycle);
if (err)
return err;
 
-   while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
-   recycle(_vq, buf);
-
if (vq->packed_ring)
err = virtqueue_resize_packed(_vq, num);
else
err = virtqueue_resize_split(_vq, num);
 
-   if (vdev->config->enable_vq_after_reset(_vq))
-   return -EBUSY;
-
-   return err;
+   return virtqueue_enable_after_reset(_vq);
 }
 EXPORT_SYMBOL_GPL(virtqueue_resize);
 
-- 
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH vhost v13 04/12] virtio_ring: support add premapped buf

2023-08-10 Thread Xuan Zhuo
If the vq is in premapped mode, use sg_dma_address() directly.
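
A minimal sketch (editor's illustration, not part of the patch) of how a
premapped driver hands an already-mapped buffer to the ring: only the DMA
address and length in the scatterlist are consumed. The function name is an
assumption made for the example.

static int my_add_premapped_inbuf(struct virtqueue *vq, dma_addr_t addr,
				  u32 len, void *data)
{
	struct scatterlist sg;

	sg_init_table(&sg, 1);
	/* In premapped mode the core reads sg_dma_address(), not the page. */
	sg.dma_address = addr;
	sg.length = len;

	return virtqueue_add_inbuf(vq, &sg, 1, data, GFP_ATOMIC);
}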

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 8e81b01e0735..f9f772e85a38 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -361,6 +361,11 @@ static struct device *vring_dma_dev(const struct 
vring_virtqueue *vq)
 static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
enum dma_data_direction direction, dma_addr_t *addr)
 {
+   if (vq->premapped) {
+   *addr = sg_dma_address(sg);
+   return 0;
+   }
+
if (!vq->use_dma_api) {
/*
 * If DMA is not used, KMSAN doesn't know that the scatterlist
@@ -639,8 +644,12 @@ static inline int virtqueue_add_split(struct virtqueue 
*_vq,
dma_addr_t addr = vring_map_single(
vq, desc, total_sg * sizeof(struct vring_desc),
DMA_TO_DEVICE);
-   if (vring_mapping_error(vq, addr))
+   if (vring_mapping_error(vq, addr)) {
+   if (vq->premapped)
+   goto free_indirect;
+
goto unmap_release;
+   }
 
virtqueue_add_desc_split(_vq, vq->split.vring.desc,
 head, addr,
@@ -706,6 +715,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
i = vring_unmap_one_split(vq, i);
}
 
+free_indirect:
if (indirect)
kfree(desc);
 
@@ -1307,8 +1317,12 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
addr = vring_map_single(vq, desc,
total_sg * sizeof(struct vring_packed_desc),
DMA_TO_DEVICE);
-   if (vring_mapping_error(vq, addr))
+   if (vring_mapping_error(vq, addr)) {
+   if (vq->premapped)
+   goto free_desc;
+
goto unmap_release;
+   }
 
vq->packed.vring.desc[head].addr = cpu_to_le64(addr);
vq->packed.vring.desc[head].len = cpu_to_le32(total_sg *
@@ -1366,6 +1380,7 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
for (i = 0; i < err_idx; i++)
		vring_unmap_desc_packed(vq, &desc[i]);
 
+free_desc:
kfree(desc);
 
END_USE(vq);
-- 
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH vhost v13 01/12] virtio_ring: check use_dma_api before unmap desc for indirect

2023-08-10 Thread Xuan Zhuo
Inside detach_buf_split(), if use_dma_api is false,
vring_unmap_one_split_indirect() will be called many times, but actually
nothing is done. So this patch checks use_dma_api first.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c5310eaf8b46..f8754f1d64d3 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -774,8 +774,10 @@ static void detach_buf_split(struct vring_virtqueue *vq, 
unsigned int head,
VRING_DESC_F_INDIRECT));
BUG_ON(len == 0 || len % sizeof(struct vring_desc));
 
-   for (j = 0; j < len / sizeof(struct vring_desc); j++)
-		vring_unmap_one_split_indirect(vq, &indir_desc[j]);
+	if (vq->use_dma_api) {
+		for (j = 0; j < len / sizeof(struct vring_desc); j++)
+			vring_unmap_one_split_indirect(vq, &indir_desc[j]);
+   }
 
kfree(indir_desc);
vq->split.desc_state[head].indir_desc = NULL;
-- 
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH vhost v13 05/12] virtio_ring: introduce virtqueue_dma_dev()

2023-08-10 Thread Xuan Zhuo
Added virtqueue_dma_dev() to get the DMA device for virtio. Then the
caller can do dma operations in advance. The purpose is to keep memory
mapped across multiple add/get buf operations.
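
A minimal sketch (editor's illustration, not part of the patch): mapping a page
once against the vq's DMA device so the mapping can be kept across add/get
cycles. The helper name is an assumption; a real caller should also check
dma_mapping_error() on the returned address.

static dma_addr_t my_map_rx_page(struct virtqueue *vq, struct page *page)
{
	struct device *dev = virtqueue_dma_dev(vq);

	/* NULL means the core does not go through the DMA API. */
	if (!dev)
		return (dma_addr_t)page_to_phys(page);

	return dma_map_page(dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
}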

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 17 +
 include/linux/virtio.h   |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index f9f772e85a38..bb3d73d221cd 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2265,6 +2265,23 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
 
+/**
+ * virtqueue_dma_dev - get the dma dev
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Returns the dma dev. That can be used for the dma api.
+ */
+struct device *virtqueue_dma_dev(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   if (vq->use_dma_api)
+   return vring_dma_dev(vq);
+   else
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_dev);
+
 /**
  * virtqueue_kick_prepare - first half of split virtqueue_kick call.
  * @_vq: the struct virtqueue
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 8add38038877..bd55a05eec04 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -61,6 +61,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
  void *data,
  gfp_t gfp);
 
+struct device *virtqueue_dma_dev(struct virtqueue *vq);
+
 bool virtqueue_kick(struct virtqueue *vq);
 
 bool virtqueue_kick_prepare(struct virtqueue *vq);
-- 
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/2] vdpa/mlx5: Fixes for ASID handling

2023-08-10 Thread Dragos Tatulea via Virtualization
On Thu, 2023-08-10 at 04:54 -0400, Michael S. Tsirkin wrote:
> On Wed, Aug 02, 2023 at 08:12:16PM +0300, Dragos Tatulea wrote:
> > This patch series is based on Eugenio's fix for handling CVQs in
> > a different ASID [0].
> > 
> > The first patch is the actual fix.
> > 
> > The next 2 patches are fixing a possible issue that I found while
> > implementing patch 1. The patches are ordered like this for clarity.
> > 
> > [0]
> > https://lore.kernel.org/lkml/20230112142218.725622-1-epere...@redhat.com/
> 
> 
> So what are we doing with this patchset? If we are merging anything
> for this release it has to happen now.
> 
Jason mentioned that he wanted an additional cleanup patch to move the cvq
specific code to the net part of mlx5_vdpa. That's quite a refactoring though,
and I would like to take my time to do an RFC for that first.

It would be good if this got merged now as it fixes an actual problem ...

> > Dragos Tatulea (1):
> >   vdpa/mlx5: Fix mr->initialized semantics
> > 
> > Eugenio Pérez (1):
> >   vdpa/mlx5: Delete control vq iotlb in destroy_mr only when necessary
> > 
> >  drivers/vdpa/mlx5/core/mlx5_vdpa.h |  2 +
> >  drivers/vdpa/mlx5/core/mr.c    | 97 +-
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c  |  4 +-
> >  3 files changed, 74 insertions(+), 29 deletions(-)
> > 
> > -- 
> > 2.41.0
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 0/2] vdpa/mlx5: Fixes for ASID handling

2023-08-10 Thread Jason Wang
On Thu, Aug 10, 2023 at 4:54 PM Michael S. Tsirkin  wrote:
>
> On Wed, Aug 02, 2023 at 08:12:16PM +0300, Dragos Tatulea wrote:
> > This patch series is based on Eugenio's fix for handling CVQs in
> > a different ASID [0].
> >
> > The first patch is the actual fix.
> >
> > The next 2 patches are fixing a possible issue that I found while
> > implementing patch 1. The patches are ordered like this for clarity.
> >
> > [0] 
> > https://lore.kernel.org/lkml/20230112142218.725622-1-epere...@redhat.com/
>
>
> So what are we doing with this patchset? If we are merging anything
> for this release it has to happen now.

I think we can merge this and do optimization on top.

Acked-by: Jason Wang 

Thanks

>
> > Dragos Tatulea (1):
> >   vdpa/mlx5: Fix mr->initialized semantics
> >
> > Eugenio Pérez (1):
> >   vdpa/mlx5: Delete control vq iotlb in destroy_mr only when necessary
> >
> >  drivers/vdpa/mlx5/core/mlx5_vdpa.h |  2 +
> >  drivers/vdpa/mlx5/core/mr.c| 97 +-
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c  |  4 +-
> >  3 files changed, 74 insertions(+), 29 deletions(-)
> >
> > --
> > 2.41.0
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 0/2] vdpa/mlx5: Fixes for ASID handling

2023-08-10 Thread Michael S. Tsirkin
On Wed, Aug 02, 2023 at 08:12:16PM +0300, Dragos Tatulea wrote:
> This patch series is based on Eugenio's fix for handling CVQs in
> a different ASID [0].
> 
> The first patch is the actual fix.
> 
> The next 2 patches are fixing a possible issue that I found while
> implementing patch 1. The patches are ordered like this for clarity.
> 
> [0] https://lore.kernel.org/lkml/20230112142218.725622-1-epere...@redhat.com/


So what are we doing with this patchset? If we are merging anything
for this release it has to happen now.

> Dragos Tatulea (1):
>   vdpa/mlx5: Fix mr->initialized semantics
> 
> Eugenio Pérez (1):
>   vdpa/mlx5: Delete control vq iotlb in destroy_mr only when necessary
> 
>  drivers/vdpa/mlx5/core/mlx5_vdpa.h |  2 +
>  drivers/vdpa/mlx5/core/mr.c| 97 +-
>  drivers/vdpa/mlx5/net/mlx5_vnet.c  |  4 +-
>  3 files changed, 74 insertions(+), 29 deletions(-)
> 
> -- 
> 2.41.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio-net: Zero max_tx_vq field for VIRTIO_NET_CTRL_MQ_HASH_CONFIG case

2023-08-10 Thread Michael S. Tsirkin
On Thu, Aug 10, 2023 at 11:15:57AM +0800, Hawkins Jiawei wrote:
> Kernel uses `struct virtio_net_ctrl_rss` to save command-specific-data
> for both the VIRTIO_NET_CTRL_MQ_HASH_CONFIG and
> VIRTIO_NET_CTRL_MQ_RSS_CONFIG commands.
> 
> According to the VirtIO standard, "Field reserved MUST contain zeroes.
> It is defined to make the structure to match the layout of
> virtio_net_rss_config structure, defined in 5.1.6.5.7.".
> 
> Yet for the VIRTIO_NET_CTRL_MQ_HASH_CONFIG command case, the `max_tx_vq`
> field in struct virtio_net_ctrl_rss, which corresponds to the
> `reserved` field in struct virtio_net_hash_config, is not zeroed,
> thereby violating the VirtIO standard.
> 
> This patch solves this problem by zeroing this field in
> virtnet_init_default_rss().
> 
> Signed-off-by: Hawkins Jiawei 



Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
Cc: Andrew Melnychenko 
Acked-by: Michael S. Tsirkin 

And this is stable material I believe.



> ---
> 
> TestStep
> 
> 1. Boot QEMU with one virtio-net-pci net device with `mq` and `hash`
> feature on, command line like:
>   -netdev tap,vhost=off,...
>   -device virtio-net-pci,mq=on,hash=on,...
> 
> 2. Trigger VIRTIO_NET_CTRL_MQ_HASH_CONFIG command in guest, command
> line like:
>   ethtool -K eth0 rxhash on
> 
> Without this patch, in virtnet_commit_rss_command(), we can see the
> `max_tx_vq` field is 1 in gdb like below:
> 
>   pwndbg> p vi->ctrl->rss
>   $1 = {
> hash_types = 63,
> indirection_table_mask = 0,
> unclassified_queue = 0,
> indirection_table = {0 },
> max_tx_vq = 1,
> hash_key_length = 40 '(',
> ...
>   }
> 
> With this patch, in virtnet_commit_rss_command(), we can see the
> `max_tx_vq` field is 0 in gdb like below:
> 
>   pwndbg> p vi->ctrl->rss
>   $1 = {
> hash_types = 63,
> indirection_table_mask = 0,
> unclassified_queue = 0,
> indirection_table = {0 },
> max_tx_vq = 0,
> hash_key_length = 40 '(',
> ...
>   }
> 
>  drivers/net/virtio_net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 1270c8d23463..8db38634ae82 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2761,7 +2761,7 @@ static void virtnet_init_default_rss(struct 
> virtnet_info *vi)
>   vi->ctrl->rss.indirection_table[i] = indir_val;
>   }
>  
> - vi->ctrl->rss.max_tx_vq = vi->curr_queue_pairs;
> + vi->ctrl->rss.max_tx_vq = vi->has_rss ? vi->curr_queue_pairs : 0;
>   vi->ctrl->rss.hash_key_length = vi->rss_key_size;
>  
>   netdev_rss_key_fill(vi->ctrl->rss.key, vi->rss_key_size);
> -- 
> 2.34.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio-net: Zero max_tx_vq field for VIRTIO_NET_CTRL_MQ_HASH_CONFIG case

2023-08-10 Thread Michael S. Tsirkin
On Thu, Aug 10, 2023 at 11:15:57AM +0800, Hawkins Jiawei wrote:
> Kernel uses `struct virtio_net_ctrl_rss` to save command-specific-data
> for both the VIRTIO_NET_CTRL_MQ_HASH_CONFIG and
> VIRTIO_NET_CTRL_MQ_RSS_CONFIG commands.
> 
> According to the VirtIO standard, "Field reserved MUST contain zeroes.
> It is defined to make the structure to match the layout of
> virtio_net_rss_config structure, defined in 5.1.6.5.7.".
> 
> Yet for the VIRTIO_NET_CTRL_MQ_HASH_CONFIG command case, the `max_tx_vq`
> field in struct virtio_net_ctrl_rss, which corresponds to the
> `reserved` field in struct virtio_net_hash_config, is not zeroed,
> thereby violating the VirtIO standard.
> 
> This patch solves this problem by zeroing this field in
> virtnet_init_default_rss().
> 
> Signed-off-by: Hawkins Jiawei 
> ---
> 
> TestStep
> 
> 1. Boot QEMU with one virtio-net-pci net device with `mq` and `hash`
> feature on, command line like:
>   -netdev tap,vhost=off,...
>   -device virtio-net-pci,mq=on,hash=on,...
> 
> 2. Trigger VIRTIO_NET_CTRL_MQ_HASH_CONFIG command in guest, command
> line like:
>   ethtool -K eth0 rxhash on
> 
> Without this patch, in virtnet_commit_rss_command(), we can see the
> `max_tx_vq` field is 1 in gdb like below:
> 
>   pwndbg> p vi->ctrl->rss
>   $1 = {
> hash_types = 63,
> indirection_table_mask = 0,
> unclassified_queue = 0,
> indirection_table = {0 },
> max_tx_vq = 1,
> hash_key_length = 40 '(',
> ...
>   }
> 
> With this patch, in virtnet_commit_rss_command(), we can see the
> `max_tx_vq` field is 0 in gdb like below:
> 
>   pwndbg> p vi->ctrl->rss
>   $1 = {
> hash_types = 63,
> indirection_table_mask = 0,
> unclassified_queue = 0,
> indirection_table = {0 },
> max_tx_vq = 0,
> hash_key_length = 40 '(',
> ...
>   }
> 
>  drivers/net/virtio_net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Fixes tag pls?

> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 1270c8d23463..8db38634ae82 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2761,7 +2761,7 @@ static void virtnet_init_default_rss(struct 
> virtnet_info *vi)
>   vi->ctrl->rss.indirection_table[i] = indir_val;
>   }
>  
> - vi->ctrl->rss.max_tx_vq = vi->curr_queue_pairs;
> + vi->ctrl->rss.max_tx_vq = vi->has_rss ? vi->curr_queue_pairs : 0;
>   vi->ctrl->rss.hash_key_length = vi->rss_key_size;
>  
>   netdev_rss_key_fill(vi->ctrl->rss.key, vi->rss_key_size);
> -- 
> 2.34.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-08-10 Thread Xuan Zhuo
On Thu, 10 Aug 2023 02:39:47 -0400, "Michael S. Tsirkin"  
wrote:
> On Thu, Aug 10, 2023 at 09:56:54AM +0800, Xuan Zhuo wrote:
> >
> > Ping!!
> >
> > Could we push this to the next linux version?
> >
> > Thanks.
>
> You sent v12, so not this one for sure.
> v12 triggered kbuild warnings, you need to fix them and repost.
> Note that I'm on vacation from monday, so if you want this
> merged this needs to be addressed ASAP.

I will post a new version today. The driver will use the wrappers.

Thanks.


>
> --
> MST
>
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-08-10 Thread Michael S. Tsirkin
On Thu, Aug 10, 2023 at 02:37:20PM +0800, Jason Wang wrote:
> On Thu, Aug 10, 2023 at 9:59 AM Xuan Zhuo  wrote:
> >
> >
> > Ping!!
> >
> > Could we push this to the next linux version?
> 
> How about implementing the wrappers along with virtqueue_dma_dev() to
> see if Christoph is happy?
> 
> Thanks

That, too.

> >
> > Thanks.
> >

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-08-10 Thread Michael S. Tsirkin
On Thu, Aug 10, 2023 at 09:56:54AM +0800, Xuan Zhuo wrote:
> 
> Ping!!
> 
> Could we push this to the next linux version?
> 
> Thanks.

You sent v12, so not this one for sure.
v12 triggered kbuild warnings, you need to fix them and repost.
Note that I'm on vacation from monday, so if you want this
merged this needs to be addressed ASAP.

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-08-10 Thread Jason Wang
On Thu, Aug 10, 2023 at 9:59 AM Xuan Zhuo  wrote:
>
>
> Ping!!
>
> Could we push this to the next linux version?

How about implementing the wrappers along with virtqueue_dma_dev() to
see if Christoph is happy?

Thanks

>
> Thanks.
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization