Re: [PATCH 2/4] vhost-vdpa: reset vendor specific mapping to initial state in .release

2023-10-17 Thread Jason Wang
On Wed, Oct 18, 2023 at 12:36 PM Si-Wei Liu  wrote:
>
>
>
> On 10/16/2023 7:35 PM, Jason Wang wrote:
> > On Tue, Oct 17, 2023 at 4:30 AM Si-Wei Liu  wrote:
> >>
> >>
> >> On 10/16/2023 4:28 AM, Eugenio Perez Martin wrote:
> >>> On Mon, Oct 16, 2023 at 8:33 AM Jason Wang  wrote:
>  On Fri, Oct 13, 2023 at 3:36 PM Si-Wei Liu  wrote:
> >
> > On 10/12/2023 8:01 PM, Jason Wang wrote:
> >> On Tue, Oct 10, 2023 at 5:05 PM Si-Wei Liu  
> >> wrote:
> >>> Devices with on-chip IOMMU or vendor specific IOTLB implementation
> >>> may need to restore iotlb mapping to the initial or default state
> >>> using the .reset_map op, as it's desirable for some parent devices
> >>> to solely manipulate mappings on their own, independent of virtio device
> >>> state. For instance, device reset does not cause the mapping to go away on
> >>> such an IOTLB model that needs persistent mapping. Before the vhost-vdpa
> >>> instance goes away, give them a chance to reset the iotlb back to the initial
> >>> state in vhost_vdpa_cleanup().
> >>>
> >>> Signed-off-by: Si-Wei Liu 
> >>> ---
> >>> drivers/vhost/vdpa.c | 16 ++++++++++++++++
> >>> 1 file changed, 16 insertions(+)
> >>>
> >>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> >>> index 851535f..a3f8160 100644
> >>> --- a/drivers/vhost/vdpa.c
> >>> +++ b/drivers/vhost/vdpa.c
> >>> @@ -131,6 +131,15 @@ static struct vhost_vdpa_as 
> >>> *vhost_vdpa_find_alloc_as(struct vhost_vdpa *v,
> >>>return vhost_vdpa_alloc_as(v, asid);
> >>> }
> >>>
> >>> +static void vhost_vdpa_reset_map(struct vhost_vdpa *v, u32 asid)
> >>> +{
> >>> +   struct vdpa_device *vdpa = v->vdpa;
> >>> +   const struct vdpa_config_ops *ops = vdpa->config;
> >>> +
> >>> +   if (ops->reset_map)
> >>> +   ops->reset_map(vdpa, asid);
> >>> +}
> >>> +
> >>> static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
> >>> {
> >>>struct vhost_vdpa_as *as = asid_to_as(v, asid);
> >>> @@ -140,6 +149,13 @@ static int vhost_vdpa_remove_as(struct 
> >>> vhost_vdpa *v, u32 asid)
> >>>
> >>>hlist_del(&as->hash_link);
> >>>vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, 
> >>> asid);
> >>> +   /*
> >>> +* Devices with vendor specific IOMMU may need to restore
> >>> +* iotlb to the initial or default state which is not done
> >>> +* through device reset, as the IOTLB mapping manipulation
> >>> +* could be decoupled from the virtio device life cycle.
> >>> +*/
> >> Should we do this according to whether IOTLB_PERSIST is set?
> > Well, in theory it seems so, but it's an unnecessary code change
> > actually, as that is how a vDPA parent behind a platform IOMMU works
> > today, and userspace doesn't break as of today. :)
>  Well, this is a question I have asked before. You have explained
>  that one of the reasons we don't break userspace is that it may
>  couple IOTLB reset with vDPA reset as well. One example is QEMU.
> 
> > As explained in previous threads [1][2], when IOTLB_PERSIST is not set
> > it doesn't necessarily mean the iotlb will definitely be destroyed
> > across reset (think about the platform IOMMU case), so userspace today
> > is already tolerating enough with either good or bad IOMMU.
> > I'm confused, how to define tolerating here?
>
> Tolerating, as in: QEMU has to proactively unmap before reset just to
> work around the driver bug (on-chip maps going out of sync), unconditionally
> for both platform and on-chip IOMMU. While we all know it doesn't have to
> do so for a platform IOMMU, userspace has no means to distinguish the two.
> That said, userspace is sacrificing reset-time performance on platform IOMMU
> setups just to work around the buggy implementation in the other setup.

Ok, so what you actually mean is that userspace can tolerate the "bug"
with the performance penalty.


>
> > For example, if it has tolerance, why bother?
> I'm not sure I get the question. But I think that userspace compromising
> because of a buggy implementation in a few drivers doesn't mean we should
> uniformly enforce such behavior for all set_map/dma_map implementations.

This is not my point. I meant, we can fix it, but we need a negotiation in
order to let some "buggy" old userspace survive the changes.

>
> >
>  This code of
> > not checking whether IOTLB_PERSIST is set is intentional; there's no point in
> > emulating bad IOMMU behavior even for older userspace (if improper
> > emulation were done, it would result in even worse performance).
> > I can easily imagine a case:
> >
> > The old Qemu that works only with a setup like mlx5_vdpa.
> Noted, seems to me there's no such case of a userspace implementation
> that only works with mlx5_v

Re: [RFC PATCH] vdpa_sim: implement .reset_map support

2023-10-17 Thread Si-Wei Liu

Hi Stefano,

On 10/17/2023 6:44 AM, Stefano Garzarella wrote:

On Fri, Oct 13, 2023 at 10:29:26AM -0700, Si-Wei Liu wrote:

Hi Stefano,

On 10/13/2023 2:22 AM, Stefano Garzarella wrote:

Hi Si-Wei,

On Fri, Oct 13, 2023 at 01:23:40AM -0700, Si-Wei Liu wrote:

RFC only. Not tested on vdpa-sim-blk with user virtual address.


I can test it, but what should I stress?
Great, thank you! As you see, my patch moved vhost_iotlb_reset out of 
vdpasim_reset for the sake of decoupling mapping from vdpa device 
reset. For hardware devices this decoupling makes sense as platform 
IOMMU already did it. But I'm not sure if there's something in the 
software device (esp. with vdpa-blk and the userspace library stack) 
that may have to rely on the current .reset behavior that clears the 
vhost_iotlb. So perhaps you can try to exercise every possible case 
involving blk device reset, and see if anything (related to mapping) 
breaks?


I just tried these steps without using a VM and the host kernel hangs
after adding the device:

[root@f38-vm-build ~]# modprobe virtio-vdpa
[root@f38-vm-build ~]# modprobe vdpa-sim-blk
[root@f38-vm-build ~]# vdpa dev add mgmtdev vdpasim_blk name blk0
[   35.284575][  T563] virtio_blk virtio6: 1/0/0 default/read/poll queues
[   35.286372][  T563] virtio_blk virtio6: [vdb] 262144 512-byte 
logical blocks (134 MB/128 MiB)

[   35.295271][  T564] vringh:

Reverting this patch (so building "vdpa/mlx5: implement .reset_map 
driver op") worked here.
I'm sorry, the previous RFC patch was incomplete - please see the v2 I 
just posted. Tested both use_va and !use_va on vdpa-sim-blk, and raw 
disk copy to the vdpa block simulator using dd seems fine. Just let me 
know how it goes on your side this time.


Thanks,
-Siwei








Works fine with vdpa-sim-net which uses physical address to map.


Can you share your tests? so I'll try to do the same with blk.
Basically everything involving virtio device reset in the guest, e.g. 
reboot the VM, remove/unbind then reprobe/bind the virtio-net 
module/driver, then see if device I/O (which needs proper mapping) is 
still flowing as expected. And then everything else that could trigger 
QEMU's vhost_dev_start/stop paths and end up as a passive vhost-vdpa 
backend reset, e.g. link status change, suspend/hibernate, SVQ switch 
and live migration. I am not sure if vdpa-blk supports live migration 
through SVQ or not; if not, you don't need to worry about it.






This patch is based on top of [1].

[1] 
https://lore.kernel.org/virtualization/1696928580-7520-1-git-send-email-si-wei@oracle.com/


The series does not apply well on master or vhost tree.
Where should I apply it?

Sent the link through another email offline.


Received thanks!

Stefano




[RFC v2 PATCH] vdpa_sim: implement .reset_map support

2023-10-17 Thread Si-Wei Liu
RFC only. Not tested on vdpa-sim-blk with user virtual address.
Works fine with vdpa-sim-net which uses physical address to map.

This patch is based on top of [1].

[1] 
https://lore.kernel.org/virtualization/1696928580-7520-1-git-send-email-si-wei@oracle.com/

Signed-off-by: Si-Wei Liu 

---
RFC v2:
  - initialize iotlb to passthrough mode in device add
---
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 76d41058add9..2a0a6042d61d 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -151,13 +151,6 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
 &vdpasim->iommu_lock);
}
 
-   for (i = 0; i < vdpasim->dev_attr.nas; i++) {
-   vhost_iotlb_reset(&vdpasim->iommu[i]);
-   vhost_iotlb_add_range(&vdpasim->iommu[i], 0, ULONG_MAX,
- 0, VHOST_MAP_RW);
-   vdpasim->iommu_pt[i] = true;
-   }
-
vdpasim->running = true;
spin_unlock(&vdpasim->iommu_lock);
 
@@ -259,8 +252,12 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr 
*dev_attr,
if (!vdpasim->iommu_pt)
goto err_iommu;
 
-   for (i = 0; i < vdpasim->dev_attr.nas; i++)
+   for (i = 0; i < vdpasim->dev_attr.nas; i++) {
vhost_iotlb_init(&vdpasim->iommu[i], max_iotlb_entries, 0);
+   vhost_iotlb_add_range(&vdpasim->iommu[i], 0, ULONG_MAX, 0,
+ VHOST_MAP_RW);
+   vdpasim->iommu_pt[i] = true;
+   }
 
for (i = 0; i < dev_attr->nvqs; i++)
vringh_set_iotlb(&vdpasim->vqs[i].vring, &vdpasim->iommu[0],
@@ -637,6 +634,25 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, 
unsigned int asid,
return ret;
 }
 
+static int vdpasim_reset_map(struct vdpa_device *vdpa, unsigned int asid)
+{
+   struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+   if (asid >= vdpasim->dev_attr.nas)
+   return -EINVAL;
+
+   spin_lock(&vdpasim->iommu_lock);
+   if (vdpasim->iommu_pt[asid])
+   goto out;
+   vhost_iotlb_reset(&vdpasim->iommu[asid]);
+   vhost_iotlb_add_range(&vdpasim->iommu[asid], 0, ULONG_MAX,
+ 0, VHOST_MAP_RW);
+   vdpasim->iommu_pt[asid] = true;
+out:
+   spin_unlock(&vdpasim->iommu_lock);
+   return 0;
+}
+
 static int vdpasim_bind_mm(struct vdpa_device *vdpa, struct mm_struct *mm)
 {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
@@ -759,6 +775,7 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
.set_group_asid = vdpasim_set_group_asid,
.dma_map= vdpasim_dma_map,
.dma_unmap  = vdpasim_dma_unmap,
+   .reset_map  = vdpasim_reset_map,
.bind_mm= vdpasim_bind_mm,
.unbind_mm  = vdpasim_unbind_mm,
.free   = vdpasim_free,
@@ -796,6 +813,7 @@ static const struct vdpa_config_ops 
vdpasim_batch_config_ops = {
.get_iova_range = vdpasim_get_iova_range,
.set_group_asid = vdpasim_set_group_asid,
.set_map= vdpasim_set_map,
+   .reset_map  = vdpasim_reset_map,
.bind_mm= vdpasim_bind_mm,
.unbind_mm  = vdpasim_unbind_mm,
.free   = vdpasim_free,
-- 
2.39.3



Re: [PATCH 2/4] vhost-vdpa: reset vendor specific mapping to initial state in .release

2023-10-17 Thread Si-Wei Liu



On 10/16/2023 7:35 PM, Jason Wang wrote:

On Tue, Oct 17, 2023 at 4:30 AM Si-Wei Liu  wrote:



On 10/16/2023 4:28 AM, Eugenio Perez Martin wrote:

On Mon, Oct 16, 2023 at 8:33 AM Jason Wang  wrote:

On Fri, Oct 13, 2023 at 3:36 PM Si-Wei Liu  wrote:


On 10/12/2023 8:01 PM, Jason Wang wrote:

On Tue, Oct 10, 2023 at 5:05 PM Si-Wei Liu  wrote:

Devices with on-chip IOMMU or vendor specific IOTLB implementation
may need to restore iotlb mapping to the initial or default state
using the .reset_map op, as it's desirable for some parent devices
to solely manipulate mappings on their own, independent of virtio device
state. For instance, device reset does not cause the mapping to go away on
such an IOTLB model that needs persistent mapping. Before the vhost-vdpa
instance goes away, give them a chance to reset the iotlb back to the initial
state in vhost_vdpa_cleanup().

Signed-off-by: Si-Wei Liu 
---
drivers/vhost/vdpa.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 851535f..a3f8160 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -131,6 +131,15 @@ static struct vhost_vdpa_as 
*vhost_vdpa_find_alloc_as(struct vhost_vdpa *v,
   return vhost_vdpa_alloc_as(v, asid);
}

+static void vhost_vdpa_reset_map(struct vhost_vdpa *v, u32 asid)
+{
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
+
+   if (ops->reset_map)
+   ops->reset_map(vdpa, asid);
+}
+
static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
{
   struct vhost_vdpa_as *as = asid_to_as(v, asid);
@@ -140,6 +149,13 @@ static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 
asid)

   hlist_del(&as->hash_link);
   vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, asid);
+   /*
+* Devices with vendor specific IOMMU may need to restore
+* iotlb to the initial or default state which is not done
+* through device reset, as the IOTLB mapping manipulation
+* could be decoupled from the virtio device life cycle.
+*/

Should we do this according to whether IOTLB_PERSIST is set?

Well, in theory it seems so, but it's an unnecessary code change
actually, as that is how a vDPA parent behind a platform IOMMU works
today, and userspace doesn't break as of today. :)

Well, this is a question I have asked before. You have explained
that one of the reasons we don't break userspace is that it may
couple IOTLB reset with vDPA reset as well. One example is QEMU.


As explained in previous threads [1][2], when IOTLB_PERSIST is not set
it doesn't necessarily mean the iotlb will definitely be destroyed
across reset (think about the platform IOMMU case), so userspace today
is already tolerating enough with either good or bad IOMMU.

I'm confused, how to define tolerating here?


Tolerating, as in: QEMU has to proactively unmap before reset just to 
work around the driver bug (on-chip maps going out of sync), unconditionally 
for both platform and on-chip IOMMU. While we all know it doesn't have to 
do so for a platform IOMMU, userspace has no means to distinguish the two. 
That said, userspace is sacrificing reset-time performance on platform 
IOMMU setups just to work around the buggy implementation in the other setup.
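
Concretely, the workaround being described boils down to something like the
sketch below on the userspace side: send a whole-range IOTLB invalidate on
the vhost-vdpa device fd before issuing the reset. This is only an
illustrative sketch assuming the v2 IOTLB message format from
<linux/vhost_types.h>; fd setup, ASID selection and error handling are
omitted.

#include <linux/vhost_types.h>
#include <unistd.h>

/* Illustrative only: invalidate the whole IOVA range before reset, i.e.
 * the "unmap before reset" workaround described above. Real code would
 * track the exact ranges it mapped and check the write() result. */
static void unmap_all_before_reset(int vhost_vdpa_fd)
{
        struct vhost_msg_v2 msg = {
                .type = VHOST_IOTLB_MSG_V2,
                .iotlb = {
                        .iova = 0,
                        .size = ~0ULL,
                        .type = VHOST_IOTLB_INVALIDATE,
                },
        };

        /* IOTLB updates are sent by writing messages to the device fd. */
        (void)write(vhost_vdpa_fd, &msg, sizeof(msg));
}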



For example, if it has tolerance, why bother?
I'm not sure I get the question. But I think that userspace compromising 
because of a buggy implementation in a few drivers doesn't mean we should 
uniformly enforce such behavior for all set_map/dma_map implementations.





This code of

not checking whether IOTLB_PERSIST is set is intentional; there's no point in
emulating bad IOMMU behavior even for older userspace (if improper
emulation were done, it would result in even worse performance).

I can easily imagine a case:

The old Qemu that works only with a setup like mlx5_vdpa.
Noted, it seems to me there's no such case of a userspace implementation 
that only works with mlx5_vdpa or its friends, but doesn't work with the 
others, e.g. platform IOMMU or well-behaving on-chip IOMMU 
implementations. The unmap+remap trick around vdpa reset works totally 
fine for platform IOMMU, except with sub-optimal performance. Other than 
this trick, I cannot easily think of another means or iotlb message 
sequence for userspace to recover the bogus state and make the iotlb 
work again after reset. Are we talking about a hypothetical that has no 
real basis in the real world?



  If we do
this without a negotiation, the IOTLB will not be cleared but Qemu will
try to re-program the IOTLB after reset. Which will break?

1) stick the exact old behaviour with just one line of check
It's not just a one-line check here; the old behavior emulation has to 
be done as Eugenio illustrated in the other email. In addition, the 
emulation has to be limited to those buggy drivers, as I don't feel this 
emulation should apply uniformly to all future set_map/dma

Re: [PATCH net-next v1 00/19] virtio-net: support AF_XDP zero copy

2023-10-17 Thread Jason Wang
On Wed, Oct 18, 2023 at 11:38 AM Xuan Zhuo  wrote:
>
> On Tue, 17 Oct 2023 19:19:41 +0800, Xuan Zhuo  
> wrote:
> > On Tue, 17 Oct 2023 14:43:33 +0800, Xuan Zhuo  
> > wrote:
> > > On Tue, 17 Oct 2023 14:26:01 +0800, Jason Wang  
> > > wrote:
> > > > On Tue, Oct 17, 2023 at 2:17 PM Xuan Zhuo  
> > > > wrote:
> > > > >
> > > > > On Tue, 17 Oct 2023 13:27:47 +0800, Jason Wang  
> > > > > wrote:
> > > > > > On Tue, Oct 17, 2023 at 11:28 AM Jason Wang  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Tue, Oct 17, 2023 at 11:26 AM Xuan Zhuo 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, 17 Oct 2023 11:20:41 +0800, Jason Wang 
> > > > > > > >  wrote:
> > > > > > > > > On Tue, Oct 17, 2023 at 11:11 AM Xuan Zhuo 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, 17 Oct 2023 10:53:44 +0800, Jason Wang 
> > > > > > > > > >  wrote:
> > > > > > > > > > > On Mon, Oct 16, 2023 at 8:00 PM Xuan Zhuo 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > ## AF_XDP
> > > > > > > > > > > >
> > > > > > > > > > > > XDP socket(AF_XDP) is an excellent bypass kernel 
> > > > > > > > > > > > network framework. The zero
> > > > > > > > > > > > copy feature of xsk (XDP socket) needs to be supported 
> > > > > > > > > > > > by the driver. The
> > > > > > > > > > > > performance of zero copy is very good. mlx5 and intel 
> > > > > > > > > > > > ixgbe already support
> > > > > > > > > > > > this feature, This patch set allows virtio-net to 
> > > > > > > > > > > > support xsk's zerocopy xmit
> > > > > > > > > > > > feature.
> > > > > > > > > > > >
> > > > > > > > > > > > At present, we have completed some preparation:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. vq-reset (virtio spec and kernel code)
> > > > > > > > > > > > 2. virtio-core premapped dma
> > > > > > > > > > > > 3. virtio-net xdp refactor
> > > > > > > > > > > >
> > > > > > > > > > > > So it is time for Virtio-Net to complete the support 
> > > > > > > > > > > > for the XDP Socket
> > > > > > > > > > > > Zerocopy.
> > > > > > > > > > > >
> > > > > > > > > > > > Virtio-net can not increase the queue num at will, so 
> > > > > > > > > > > > xsk shares the queue with
> > > > > > > > > > > > kernel.
> > > > > > > > > > > >
> > > > > > > > > > > > On the other hand, Virtio-Net does not support generate 
> > > > > > > > > > > > interrupt from driver
> > > > > > > > > > > > manually, so when we wakeup tx xmit, we used some tips. 
> > > > > > > > > > > > If the CPU run by TX
> > > > > > > > > > > > NAPI last time is other CPUs, use IPI to wake up NAPI 
> > > > > > > > > > > > on the remote CPU. If it
> > > > > > > > > > > > is also the local CPU, then we wake up napi directly.
> > > > > > > > > > > >
> > > > > > > > > > > > This patch set includes some refactor to the virtio-net 
> > > > > > > > > > > > to let that to support
> > > > > > > > > > > > AF_XDP.
> > > > > > > > > > > >
> > > > > > > > > > > > ## performance
> > > > > > > > > > > >
> > > > > > > > > > > > ENV: Qemu with vhost-user(polling mode).
> > > > > > > > > > > >
> > > > > > > > > > > > Sockperf: https://github.com/Mellanox/sockperf
> > > > > > > > > > > > I use this tool to send udp packet by kernel syscall.
> > > > > > > > > > > >
> > > > > > > > > > > > xmit command: sockperf tp -i 10.0.3.1 -t 1000
> > > > > > > > > > > >
> > > > > > > > > > > > I write a tool that sends udp packets or recvs udp 
> > > > > > > > > > > > packets by AF_XDP.
> > > > > > > > > > > >
> > > > > > > > > > > >   | Guest APP CPU |Guest Softirq CPU | 
> > > > > > > > > > > > UDP PPS
> > > > > > > > > > > > --|---|--|
> > > > > > > > > > > > xmit by syscall   |   100%|  |  
> > > > > > > > > > > >  676,915
> > > > > > > > > > > > xmit by xsk   |   59.1%   |   100%   | 
> > > > > > > > > > > > 5,447,168
> > > > > > > > > > > > recv by syscall   |   60% |   100%   |  
> > > > > > > > > > > >  932,288
> > > > > > > > > > > > recv by xsk   |   35.7%   |   100%   | 
> > > > > > > > > > > > 3,343,168
> > > > > > > > > > >
> > > > > > > > > > > Any chance we can get a testpmd result (which I guess 
> > > > > > > > > > > should be better
> > > > > > > > > > > than PPS above)?
> > > > > > > > > >
> > > > > > > > > > Do you mean testpmd + DPDK + AF_XDP?
> > > > > > > > >
> > > > > > > > > Yes.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Yes. This is probably better because my tool does more 
> > > > > > > > > > work. That is not a
> > > > > > > > > > complete testing tool used by our business.
> > > > > > > > >
> > > > > > > > > Probably, but it would be appealing for others. Especially 
> > > > > > > > > considering
> > > > > > > > > DPDK supports AF_XDP PMD now.
> > > > > > > >
> > > > > > > > OK.
> > > > > > > >
> > > > > > > > Let me try.
> > > > > > > >
> > > > > > > > But could you start to review firstly?
> > > > >

Re: [PATCH net-next v1 00/19] virtio-net: support AF_XDP zero copy

2023-10-17 Thread Xuan Zhuo
On Tue, 17 Oct 2023 19:19:41 +0800, Xuan Zhuo  
wrote:
> On Tue, 17 Oct 2023 14:43:33 +0800, Xuan Zhuo  
> wrote:
> > On Tue, 17 Oct 2023 14:26:01 +0800, Jason Wang  wrote:
> > > On Tue, Oct 17, 2023 at 2:17 PM Xuan Zhuo  
> > > wrote:
> > > >
> > > > On Tue, 17 Oct 2023 13:27:47 +0800, Jason Wang  
> > > > wrote:
> > > > > On Tue, Oct 17, 2023 at 11:28 AM Jason Wang  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Oct 17, 2023 at 11:26 AM Xuan Zhuo 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, 17 Oct 2023 11:20:41 +0800, Jason Wang 
> > > > > > >  wrote:
> > > > > > > > On Tue, Oct 17, 2023 at 11:11 AM Xuan Zhuo 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 17 Oct 2023 10:53:44 +0800, Jason Wang 
> > > > > > > > >  wrote:
> > > > > > > > > > On Mon, Oct 16, 2023 at 8:00 PM Xuan Zhuo 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > ## AF_XDP
> > > > > > > > > > >
> > > > > > > > > > > XDP socket(AF_XDP) is an excellent bypass kernel network 
> > > > > > > > > > > framework. The zero
> > > > > > > > > > > copy feature of xsk (XDP socket) needs to be supported by 
> > > > > > > > > > > the driver. The
> > > > > > > > > > > performance of zero copy is very good. mlx5 and intel 
> > > > > > > > > > > ixgbe already support
> > > > > > > > > > > this feature, This patch set allows virtio-net to support 
> > > > > > > > > > > xsk's zerocopy xmit
> > > > > > > > > > > feature.
> > > > > > > > > > >
> > > > > > > > > > > At present, we have completed some preparation:
> > > > > > > > > > >
> > > > > > > > > > > 1. vq-reset (virtio spec and kernel code)
> > > > > > > > > > > 2. virtio-core premapped dma
> > > > > > > > > > > 3. virtio-net xdp refactor
> > > > > > > > > > >
> > > > > > > > > > > So it is time for Virtio-Net to complete the support for 
> > > > > > > > > > > the XDP Socket
> > > > > > > > > > > Zerocopy.
> > > > > > > > > > >
> > > > > > > > > > > Virtio-net can not increase the queue num at will, so xsk 
> > > > > > > > > > > shares the queue with
> > > > > > > > > > > kernel.
> > > > > > > > > > >
> > > > > > > > > > > On the other hand, Virtio-Net does not support generate 
> > > > > > > > > > > interrupt from driver
> > > > > > > > > > > manually, so when we wakeup tx xmit, we used some tips. 
> > > > > > > > > > > If the CPU run by TX
> > > > > > > > > > > NAPI last time is other CPUs, use IPI to wake up NAPI on 
> > > > > > > > > > > the remote CPU. If it
> > > > > > > > > > > is also the local CPU, then we wake up napi directly.
> > > > > > > > > > >
> > > > > > > > > > > This patch set includes some refactor to the virtio-net 
> > > > > > > > > > > to let that to support
> > > > > > > > > > > AF_XDP.
> > > > > > > > > > >
> > > > > > > > > > > ## performance
> > > > > > > > > > >
> > > > > > > > > > > ENV: Qemu with vhost-user(polling mode).
> > > > > > > > > > >
> > > > > > > > > > > Sockperf: https://github.com/Mellanox/sockperf
> > > > > > > > > > > I use this tool to send udp packet by kernel syscall.
> > > > > > > > > > >
> > > > > > > > > > > xmit command: sockperf tp -i 10.0.3.1 -t 1000
> > > > > > > > > > >
> > > > > > > > > > > I write a tool that sends udp packets or recvs udp 
> > > > > > > > > > > packets by AF_XDP.
> > > > > > > > > > >
> > > > > > > > > > >   | Guest APP CPU |Guest Softirq CPU | 
> > > > > > > > > > > UDP PPS
> > > > > > > > > > > --|---|--|
> > > > > > > > > > > xmit by syscall   |   100%|  |   
> > > > > > > > > > > 676,915
> > > > > > > > > > > xmit by xsk   |   59.1%   |   100%   | 
> > > > > > > > > > > 5,447,168
> > > > > > > > > > > recv by syscall   |   60% |   100%   |   
> > > > > > > > > > > 932,288
> > > > > > > > > > > recv by xsk   |   35.7%   |   100%   | 
> > > > > > > > > > > 3,343,168
> > > > > > > > > >
> > > > > > > > > > Any chance we can get a testpmd result (which I guess 
> > > > > > > > > > should be better
> > > > > > > > > > than PPS above)?
> > > > > > > > >
> > > > > > > > > Do you mean testpmd + DPDK + AF_XDP?
> > > > > > > >
> > > > > > > > Yes.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes. This is probably better because my tool does more work. 
> > > > > > > > > That is not a
> > > > > > > > > complete testing tool used by our business.
> > > > > > > >
> > > > > > > > Probably, but it would be appealing for others. Especially 
> > > > > > > > considering
> > > > > > > > DPDK supports AF_XDP PMD now.
> > > > > > >
> > > > > > > OK.
> > > > > > >
> > > > > > > Let me try.
> > > > > > >
> > > > > > > But could you start to review firstly?
> > > > > >
> > > > > > Yes, it's in my todo list.
> > > > >
> > > > > Speaking too fast, I think if it doesn't take too long time, I would
> > > > > wait for the result first as netdim series. One reason is that I
> > > > > remember claims to be only 10% to 20% loss comparing to wire spe

Re: [PATCH net-next v1 00/19] virtio-net: support AF_XDP zero copy

2023-10-17 Thread Xuan Zhuo
On Wed, 18 Oct 2023 10:46:38 +0800, Jason Wang  wrote:
> On Tue, Oct 17, 2023 at 7:28 PM Xuan Zhuo  wrote:
> >
> > On Tue, 17 Oct 2023 14:43:33 +0800, Xuan Zhuo  
> > wrote:
> > > On Tue, 17 Oct 2023 14:26:01 +0800, Jason Wang  
> > > wrote:
> > > > On Tue, Oct 17, 2023 at 2:17 PM Xuan Zhuo  
> > > > wrote:
> > > > >
> > > > > On Tue, 17 Oct 2023 13:27:47 +0800, Jason Wang  
> > > > > wrote:
> > > > > > On Tue, Oct 17, 2023 at 11:28 AM Jason Wang  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Tue, Oct 17, 2023 at 11:26 AM Xuan Zhuo 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, 17 Oct 2023 11:20:41 +0800, Jason Wang 
> > > > > > > >  wrote:
> > > > > > > > > On Tue, Oct 17, 2023 at 11:11 AM Xuan Zhuo 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, 17 Oct 2023 10:53:44 +0800, Jason Wang 
> > > > > > > > > >  wrote:
> > > > > > > > > > > On Mon, Oct 16, 2023 at 8:00 PM Xuan Zhuo 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > ## AF_XDP
> > > > > > > > > > > >
> > > > > > > > > > > > XDP socket(AF_XDP) is an excellent bypass kernel 
> > > > > > > > > > > > network framework. The zero
> > > > > > > > > > > > copy feature of xsk (XDP socket) needs to be supported 
> > > > > > > > > > > > by the driver. The
> > > > > > > > > > > > performance of zero copy is very good. mlx5 and intel 
> > > > > > > > > > > > ixgbe already support
> > > > > > > > > > > > this feature, This patch set allows virtio-net to 
> > > > > > > > > > > > support xsk's zerocopy xmit
> > > > > > > > > > > > feature.
> > > > > > > > > > > >
> > > > > > > > > > > > At present, we have completed some preparation:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. vq-reset (virtio spec and kernel code)
> > > > > > > > > > > > 2. virtio-core premapped dma
> > > > > > > > > > > > 3. virtio-net xdp refactor
> > > > > > > > > > > >
> > > > > > > > > > > > So it is time for Virtio-Net to complete the support 
> > > > > > > > > > > > for the XDP Socket
> > > > > > > > > > > > Zerocopy.
> > > > > > > > > > > >
> > > > > > > > > > > > Virtio-net can not increase the queue num at will, so 
> > > > > > > > > > > > xsk shares the queue with
> > > > > > > > > > > > kernel.
> > > > > > > > > > > >
> > > > > > > > > > > > On the other hand, Virtio-Net does not support generate 
> > > > > > > > > > > > interrupt from driver
> > > > > > > > > > > > manually, so when we wakeup tx xmit, we used some tips. 
> > > > > > > > > > > > If the CPU run by TX
> > > > > > > > > > > > NAPI last time is other CPUs, use IPI to wake up NAPI 
> > > > > > > > > > > > on the remote CPU. If it
> > > > > > > > > > > > is also the local CPU, then we wake up napi directly.
> > > > > > > > > > > >
> > > > > > > > > > > > This patch set includes some refactor to the virtio-net 
> > > > > > > > > > > > to let that to support
> > > > > > > > > > > > AF_XDP.
> > > > > > > > > > > >
> > > > > > > > > > > > ## performance
> > > > > > > > > > > >
> > > > > > > > > > > > ENV: Qemu with vhost-user(polling mode).
> > > > > > > > > > > >
> > > > > > > > > > > > Sockperf: https://github.com/Mellanox/sockperf
> > > > > > > > > > > > I use this tool to send udp packet by kernel syscall.
> > > > > > > > > > > >
> > > > > > > > > > > > xmit command: sockperf tp -i 10.0.3.1 -t 1000
> > > > > > > > > > > >
> > > > > > > > > > > > I write a tool that sends udp packets or recvs udp 
> > > > > > > > > > > > packets by AF_XDP.
> > > > > > > > > > > >
> > > > > > > > > > > >   | Guest APP CPU |Guest Softirq CPU | 
> > > > > > > > > > > > UDP PPS
> > > > > > > > > > > > --|---|--|
> > > > > > > > > > > > xmit by syscall   |   100%|  |  
> > > > > > > > > > > >  676,915
> > > > > > > > > > > > xmit by xsk   |   59.1%   |   100%   | 
> > > > > > > > > > > > 5,447,168
> > > > > > > > > > > > recv by syscall   |   60% |   100%   |  
> > > > > > > > > > > >  932,288
> > > > > > > > > > > > recv by xsk   |   35.7%   |   100%   | 
> > > > > > > > > > > > 3,343,168
> > > > > > > > > > >
> > > > > > > > > > > Any chance we can get a testpmd result (which I guess 
> > > > > > > > > > > should be better
> > > > > > > > > > > than PPS above)?
> > > > > > > > > >
> > > > > > > > > > Do you mean testpmd + DPDK + AF_XDP?
> > > > > > > > >
> > > > > > > > > Yes.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Yes. This is probably better because my tool does more 
> > > > > > > > > > work. That is not a
> > > > > > > > > > complete testing tool used by our business.
> > > > > > > > >
> > > > > > > > > Probably, but it would be appealing for others. Especially 
> > > > > > > > > considering
> > > > > > > > > DPDK supports AF_XDP PMD now.
> > > > > > > >
> > > > > > > > OK.
> > > > > > > >
> > > > > > > > Let me try.
> > > > > > > >
> > > > > > > > But could you start to review firstly?
> > > > > 

Re: [PATCH net-next v1 00/19] virtio-net: support AF_XDP zero copy

2023-10-17 Thread Jason Wang
On Tue, Oct 17, 2023 at 7:28 PM Xuan Zhuo  wrote:
>
> On Tue, 17 Oct 2023 14:43:33 +0800, Xuan Zhuo  
> wrote:
> > On Tue, 17 Oct 2023 14:26:01 +0800, Jason Wang  wrote:
> > > On Tue, Oct 17, 2023 at 2:17 PM Xuan Zhuo  
> > > wrote:
> > > >
> > > > On Tue, 17 Oct 2023 13:27:47 +0800, Jason Wang  
> > > > wrote:
> > > > > On Tue, Oct 17, 2023 at 11:28 AM Jason Wang  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Oct 17, 2023 at 11:26 AM Xuan Zhuo 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, 17 Oct 2023 11:20:41 +0800, Jason Wang 
> > > > > > >  wrote:
> > > > > > > > On Tue, Oct 17, 2023 at 11:11 AM Xuan Zhuo 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 17 Oct 2023 10:53:44 +0800, Jason Wang 
> > > > > > > > >  wrote:
> > > > > > > > > > On Mon, Oct 16, 2023 at 8:00 PM Xuan Zhuo 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > ## AF_XDP
> > > > > > > > > > >
> > > > > > > > > > > XDP socket(AF_XDP) is an excellent bypass kernel network 
> > > > > > > > > > > framework. The zero
> > > > > > > > > > > copy feature of xsk (XDP socket) needs to be supported by 
> > > > > > > > > > > the driver. The
> > > > > > > > > > > performance of zero copy is very good. mlx5 and intel 
> > > > > > > > > > > ixgbe already support
> > > > > > > > > > > this feature, This patch set allows virtio-net to support 
> > > > > > > > > > > xsk's zerocopy xmit
> > > > > > > > > > > feature.
> > > > > > > > > > >
> > > > > > > > > > > At present, we have completed some preparation:
> > > > > > > > > > >
> > > > > > > > > > > 1. vq-reset (virtio spec and kernel code)
> > > > > > > > > > > 2. virtio-core premapped dma
> > > > > > > > > > > 3. virtio-net xdp refactor
> > > > > > > > > > >
> > > > > > > > > > > So it is time for Virtio-Net to complete the support for 
> > > > > > > > > > > the XDP Socket
> > > > > > > > > > > Zerocopy.
> > > > > > > > > > >
> > > > > > > > > > > Virtio-net can not increase the queue num at will, so xsk 
> > > > > > > > > > > shares the queue with
> > > > > > > > > > > kernel.
> > > > > > > > > > >
> > > > > > > > > > > On the other hand, Virtio-Net does not support generate 
> > > > > > > > > > > interrupt from driver
> > > > > > > > > > > manually, so when we wakeup tx xmit, we used some tips. 
> > > > > > > > > > > If the CPU run by TX
> > > > > > > > > > > NAPI last time is other CPUs, use IPI to wake up NAPI on 
> > > > > > > > > > > the remote CPU. If it
> > > > > > > > > > > is also the local CPU, then we wake up napi directly.
> > > > > > > > > > >
> > > > > > > > > > > This patch set includes some refactor to the virtio-net 
> > > > > > > > > > > to let that to support
> > > > > > > > > > > AF_XDP.
> > > > > > > > > > >
> > > > > > > > > > > ## performance
> > > > > > > > > > >
> > > > > > > > > > > ENV: Qemu with vhost-user(polling mode).
> > > > > > > > > > >
> > > > > > > > > > > Sockperf: https://github.com/Mellanox/sockperf
> > > > > > > > > > > I use this tool to send udp packet by kernel syscall.
> > > > > > > > > > >
> > > > > > > > > > > xmit command: sockperf tp -i 10.0.3.1 -t 1000
> > > > > > > > > > >
> > > > > > > > > > > I write a tool that sends udp packets or recvs udp 
> > > > > > > > > > > packets by AF_XDP.
> > > > > > > > > > >
> > > > > > > > > > >   | Guest APP CPU |Guest Softirq CPU | 
> > > > > > > > > > > UDP PPS
> > > > > > > > > > > --|---|--|
> > > > > > > > > > > xmit by syscall   |   100%|  |   
> > > > > > > > > > > 676,915
> > > > > > > > > > > xmit by xsk   |   59.1%   |   100%   | 
> > > > > > > > > > > 5,447,168
> > > > > > > > > > > recv by syscall   |   60% |   100%   |   
> > > > > > > > > > > 932,288
> > > > > > > > > > > recv by xsk   |   35.7%   |   100%   | 
> > > > > > > > > > > 3,343,168
> > > > > > > > > >
> > > > > > > > > > Any chance we can get a testpmd result (which I guess 
> > > > > > > > > > should be better
> > > > > > > > > > than PPS above)?
> > > > > > > > >
> > > > > > > > > Do you mean testpmd + DPDK + AF_XDP?
> > > > > > > >
> > > > > > > > Yes.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes. This is probably better because my tool does more work. 
> > > > > > > > > That is not a
> > > > > > > > > complete testing tool used by our business.
> > > > > > > >
> > > > > > > > Probably, but it would be appealing for others. Especially 
> > > > > > > > considering
> > > > > > > > DPDK supports AF_XDP PMD now.
> > > > > > >
> > > > > > > OK.
> > > > > > >
> > > > > > > Let me try.
> > > > > > >
> > > > > > > But could you start to review firstly?
> > > > > >
> > > > > > Yes, it's in my todo list.
> > > > >
> > > > > Speaking too fast, I think if it doesn't take too long time, I would
> > > > > wait for the result first as netdim series. One reason is that I
> > > > > remember claims to be only 10% to 20% loss comparing to wire speed,

Re: virtio-sound: release control request clarification

2023-10-17 Thread Anton Yakovlev via Virtualization

Hi Matias,


On 18.10.2023 00:19, Matias Ezequiel Vara Larsen wrote:

Hello,

This email is to clarify the VirtIO specification regarding the RELEASE
control request. Section 5.14.6.6.5.1 [1] states the following device
requirements for the RELEASE control request:
1. The device MUST complete all pending I/O messages for the specified
stream ID.
2. The device MUST NOT complete the control request while there are
pending I/O messages for the specified stream ID.

The 1) requirement does not indicate what "complete" means. Does it mean
that the pending I/O messages in the tx queue shall be outputted in the
host, i.e., consumed by the audio backend? Or, completion means simply
to put the requests in the used-ring without consuming them?


Here "to complete" means moving the buffers to the used list in vring.
Technically, the specification only requires that the device "return" all
referenced DMA memory to the guest before completing the RELEASE control
request. What the device actually does with these I/O messages is
implementation dependent and is not within the scope of the specification.
Thus...



Regarding 2), I interpret it as "the device shall wait until all I/O
messages are processed to complete the RELEASE control request".


...you can do it this way if you really need to.
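
In split-virtqueue terms, "completing" an I/O message as described above just
means putting its head descriptor id (and the number of bytes written) on the
used ring and advancing the used index, regardless of whether the payload was
ever rendered. A simplified device-side sketch using the layout from
<linux/virtio_ring.h>, ignoring the endianness accessors and the used-buffer
notification a real device also has to take care of:

#include <linux/virtio_ring.h>

/* Simplified sketch: return one I/O message to the driver via the used
 * ring, which is what "completing" it means at the vring level. */
static void complete_io_msg(struct vring *vr, u32 head, u32 written_len)
{
        u16 idx = vr->used->idx;

        vr->used->ring[idx % vr->num].id = head;
        vr->used->ring[idx % vr->num].len = written_len;
        /* Make the used element visible before publishing the new index. */
        smp_wmb();
        vr->used->idx = idx + 1;
}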



Currently, the kernel driver does not seem to expect such a delay when the
RELEASE command is sent. If I understand correctly, the kernel driver
first sends the RELEASE command and waits a fixed amount of time until
the device can process it. Then, the driver waits a fixed amount of time
until all pending IO messages are completed. If the device follows the
specification and waits until all messages IO are completed to issue the
completion of the RELEASE command, the kernel driver may timeout. The
time to complete N IO messages in the TX queue could be proportional
to the number of pending messages.


The default timeout for control requests in the ALSA driver is 1 second. In
theory, this time should be enough to completely reproduce/fill the 500ms
buffer, and complete all requests, including the RELEASE control request. If
the device fails to do this, then most likely there are some problems with the
implementation.



In our device implementation [2], RELEASE is handled as follows:
- Drop all messages in the TX queue without outputting in the host.
- Complete the RELEASE control request.

This seems to be working, however, I can observe that sometimes there
are still requests in the TX queue when we get RELEASE. Those requests
are never reproduced in the host.

My questions are:
- In the specification, should we modify it to clarify that all pending
   IO messages in the device are discarded during RELEASE, that is, not
   output to the host, but signaled to the guest as completed?


No, we shouldn't. See comment above.



- According to the specification, should the driver wait in RELEASE an
   amount of time proportional to the number of periods yet to be
   reproduced?


This is purely a matter of driver implementation. It is possible to implement
the driver without timeouts, but this would be a bad idea, because bugs in the
device could lead to an infinite wait in the kernel.
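
For instance, the usual pattern is to bound the wait on the control request
completion, roughly like the snippet below (the completion object and the
timeout value are placeholders for illustration, not the actual virtio-snd
driver code):

        /* Give up instead of blocking forever if the device never answers. */
        if (!wait_for_completion_timeout(&ctl_done, msecs_to_jiffies(1000)))
                return -ETIMEDOUT;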


Best regards,



Thanks, Matias.

[1]
https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html
[2]
https://github.com/rust-vmm/vhost-device/tree/main/staging/vhost-device-sound


--
Anton Yakovlev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin


Re: [PATCH net-next v1 00/19] virtio-net: support AF_XDP zero copy

2023-10-17 Thread Jason Wang
On Tue, Oct 17, 2023 at 3:00 PM Xuan Zhuo  wrote:
>
> On Tue, 17 Oct 2023 14:26:01 +0800, Jason Wang  wrote:
> > On Tue, Oct 17, 2023 at 2:17 PM Xuan Zhuo  
> > wrote:
> > >
> > > On Tue, 17 Oct 2023 13:27:47 +0800, Jason Wang  
> > > wrote:
> > > > On Tue, Oct 17, 2023 at 11:28 AM Jason Wang  wrote:
> > > > >
> > > > > On Tue, Oct 17, 2023 at 11:26 AM Xuan Zhuo 
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, 17 Oct 2023 11:20:41 +0800, Jason Wang 
> > > > > >  wrote:
> > > > > > > On Tue, Oct 17, 2023 at 11:11 AM Xuan Zhuo 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, 17 Oct 2023 10:53:44 +0800, Jason Wang 
> > > > > > > >  wrote:
> > > > > > > > > On Mon, Oct 16, 2023 at 8:00 PM Xuan Zhuo 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > ## AF_XDP
> > > > > > > > > >
> > > > > > > > > > XDP socket(AF_XDP) is an excellent bypass kernel network 
> > > > > > > > > > framework. The zero
> > > > > > > > > > copy feature of xsk (XDP socket) needs to be supported by 
> > > > > > > > > > the driver. The
> > > > > > > > > > performance of zero copy is very good. mlx5 and intel ixgbe 
> > > > > > > > > > already support
> > > > > > > > > > this feature, This patch set allows virtio-net to support 
> > > > > > > > > > xsk's zerocopy xmit
> > > > > > > > > > feature.
> > > > > > > > > >
> > > > > > > > > > At present, we have completed some preparation:
> > > > > > > > > >
> > > > > > > > > > 1. vq-reset (virtio spec and kernel code)
> > > > > > > > > > 2. virtio-core premapped dma
> > > > > > > > > > 3. virtio-net xdp refactor
> > > > > > > > > >
> > > > > > > > > > So it is time for Virtio-Net to complete the support for 
> > > > > > > > > > the XDP Socket
> > > > > > > > > > Zerocopy.
> > > > > > > > > >
> > > > > > > > > > Virtio-net can not increase the queue num at will, so xsk 
> > > > > > > > > > shares the queue with
> > > > > > > > > > kernel.
> > > > > > > > > >
> > > > > > > > > > On the other hand, Virtio-Net does not support generate 
> > > > > > > > > > interrupt from driver
> > > > > > > > > > manually, so when we wakeup tx xmit, we used some tips. If 
> > > > > > > > > > the CPU run by TX
> > > > > > > > > > NAPI last time is other CPUs, use IPI to wake up NAPI on 
> > > > > > > > > > the remote CPU. If it
> > > > > > > > > > is also the local CPU, then we wake up napi directly.
> > > > > > > > > >
> > > > > > > > > > This patch set includes some refactor to the virtio-net to 
> > > > > > > > > > let that to support
> > > > > > > > > > AF_XDP.
> > > > > > > > > >
> > > > > > > > > > ## performance
> > > > > > > > > >
> > > > > > > > > > ENV: Qemu with vhost-user(polling mode).
> > > > > > > > > >
> > > > > > > > > > Sockperf: https://github.com/Mellanox/sockperf
> > > > > > > > > > I use this tool to send udp packet by kernel syscall.
> > > > > > > > > >
> > > > > > > > > > xmit command: sockperf tp -i 10.0.3.1 -t 1000
> > > > > > > > > >
> > > > > > > > > > I write a tool that sends udp packets or recvs udp packets 
> > > > > > > > > > by AF_XDP.
> > > > > > > > > >
> > > > > > > > > >   | Guest APP CPU |Guest Softirq CPU | UDP 
> > > > > > > > > > PPS
> > > > > > > > > > --|---|--|
> > > > > > > > > > xmit by syscall   |   100%|  |   
> > > > > > > > > > 676,915
> > > > > > > > > > xmit by xsk   |   59.1%   |   100%   | 
> > > > > > > > > > 5,447,168
> > > > > > > > > > recv by syscall   |   60% |   100%   |   
> > > > > > > > > > 932,288
> > > > > > > > > > recv by xsk   |   35.7%   |   100%   | 
> > > > > > > > > > 3,343,168
> > > > > > > > >
> > > > > > > > > Any chance we can get a testpmd result (which I guess should 
> > > > > > > > > be better
> > > > > > > > > than PPS above)?
> > > > > > > >
> > > > > > > > Do you mean testpmd + DPDK + AF_XDP?
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > >
> > > > > > > > Yes. This is probably better because my tool does more work. 
> > > > > > > > That is not a
> > > > > > > > complete testing tool used by our business.
> > > > > > >
> > > > > > > Probably, but it would be appealing for others. Especially 
> > > > > > > considering
> > > > > > > DPDK supports AF_XDP PMD now.
> > > > > >
> > > > > > OK.
> > > > > >
> > > > > > Let me try.
> > > > > >
> > > > > > But could you start to review firstly?
> > > > >
> > > > > Yes, it's in my todo list.
> > > >
> > > > Speaking too fast, I think if it doesn't take too long time, I would
> > > > wait for the result first as netdim series. One reason is that I
> > > > remember claims to be only 10% to 20% loss comparing to wire speed, so
> > > > I'd expect it should be much faster. I vaguely remember, even a vhost
> > > > can gives us more than 3M PPS if we disable SMAP, so the numbers here
> > > > are not as impressive as expected.
> > >
> > >
> > > What is SMAP? Cloud you give me more info?
> >
> > Supervisor Mode Acce

Re: [RFC PATCH] ALSA: virtio: use copy and fill_silence callbacks

2023-10-17 Thread Anton Yakovlev via Virtualization

Hi Matias,

On 17.10.2023 21:54, Matias Ezequiel Vara Larsen wrote:

Hello Anton,

Thanks for your help! I am going to send a second version of the patch
with your changes. Is it OK if I add you with the "Co-developed-by"
tag?


Yes, I'm fine with that. :)


Best regards,

--
Anton Yakovlev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin


Re: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands

2023-10-17 Thread kernel test robot
Hi Yishai,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.6-rc6 next-20231017]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Yishai-Hadas/virtio-pci-Fix-common-config-map-for-modern-device/20231017-214450
base:   linus/master
patch link:
https://lore.kernel.org/r/20231017134217.82497-7-yishaih%40nvidia.com
patch subject: [PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy 
IO admin commands
config: alpha-allyesconfig 
(https://download.01.org/0day-ci/archive/20231018/202310180437.jo2csm6u-...@intel.com/config)
compiler: alpha-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20231018/202310180437.jo2csm6u-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202310180437.jo2csm6u-...@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/virtio/virtio_pci_modern.c:731:5: warning: no previous prototype for 
>> 'virtio_pci_admin_list_query' [-Wmissing-prototypes]
 731 | int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int 
buf_size)
 | ^~~
>> drivers/virtio/virtio_pci_modern.c:758:5: warning: no previous prototype for 
>> 'virtio_pci_admin_list_use' [-Wmissing-prototypes]
 758 | int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int 
buf_size)
 | ^
>> drivers/virtio/virtio_pci_modern.c:786:5: warning: no previous prototype for 
>> 'virtio_pci_admin_legacy_io_write' [-Wmissing-prototypes]
 786 | int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 
opcode,
 | ^~~~
>> drivers/virtio/virtio_pci_modern.c:831:5: warning: no previous prototype for 
>> 'virtio_pci_admin_legacy_io_read' [-Wmissing-prototypes]
 831 | int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode,
 | ^~~
>> drivers/virtio/virtio_pci_modern.c:877:5: warning: no previous prototype for 
>> 'virtio_pci_admin_legacy_io_notify_info' [-Wmissing-prototypes]
 877 | int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
 | ^~
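
These -Wmissing-prototypes warnings typically just mean the new exported
helpers are defined in virtio_pci_modern.c without a declaration visible in
an included header. A minimal sketch of the kind of declarations that would
silence the first two is below; the header name is only an example, and the
remaining virtio_pci_admin_legacy_io_* helpers would need matching prototypes
as well.

#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
#define _LINUX_VIRTIO_PCI_ADMIN_H

#include <linux/types.h>

struct pci_dev;

int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size);
int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size);

#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */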


vim +/virtio_pci_admin_list_query +731 drivers/virtio/virtio_pci_modern.c

   721  
   722  /*
   723   * virtio_pci_admin_list_query - Provides to driver list of commands
   724   * supported for the PCI VF.
   725   * @dev: VF pci_dev
   726   * @buf: buffer to hold the returned list
   727   * @buf_size: size of the given buffer
   728   *
   729   * Returns 0 on success, or negative on failure.
   730   */
 > 731  int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int 
 > buf_size)
   732  {
   733  struct virtio_device *virtio_dev = 
virtio_pci_vf_get_pf_dev(pdev);
   734  struct virtio_admin_cmd cmd = {};
   735  struct scatterlist result_sg;
   736  
   737  if (!virtio_dev)
   738  return -ENODEV;
   739  
   740  sg_init_one(&result_sg, buf, buf_size);
   741  cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
   742  cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
   743  cmd.result_sg = &result_sg;
   744  
   745  return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
   746  }
   747  EXPORT_SYMBOL_GPL(virtio_pci_admin_list_query);
   748  
   749  /*
   750   * virtio_pci_admin_list_use - Provides to device list of commands
   751   * used for the PCI VF.
   752   * @dev: VF pci_dev
   753   * @buf: buffer which holds the list
   754   * @buf_size: size of the given buffer
   755   *
   756   * Returns 0 on success, or negative on failure.
   757   */
 > 758  int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int 
 > buf_size)
   759  {
   760  struct virtio_device *virtio_dev = 
virtio_pci_vf_get_pf_dev(pdev);
   761  struct virtio_admin_cmd cmd = {};
   762  struct scatterlist data_sg;
   763  
   764  if (!virtio_dev)
   765  return -ENODEV;
   766  
   767  sg_init_one(&data_sg, buf, buf_size);
   768  cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
   769  cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
   770  cmd.data_sg = &data_sg;
   771  
   772  return vp_modern_admin_cmd_exec(virtio_dev, &cmd);

Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-17 Thread Alex Williamson
On Tue, 17 Oct 2023 16:42:17 +0300
Yishai Hadas  wrote:
> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> +   const struct pci_device_id *id)
> +{
> + const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> + struct virtiovf_pci_core_device *virtvdev;
> + int ret;
> +
> + if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> + !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> + ops = &virtiovf_acc_vfio_pci_tran_ops;


This is still an issue for me, it's a very narrow use case where we
have a modern device and want to enable legacy support.  Implementing an
IO BAR and mangling the device ID seems like it should be an opt-in,
not standard behavior for any compatible device.  Users should
generally expect that the device they see in the host is the device
they see in the guest.  They might even rely on that principle.

We can't use the argument that users wanting the default device should
use vfio-pci rather than virtio-vfio-pci because we've already defined
the algorithm by which libvirt should choose a variant driver for a
device.  libvirt will choose this driver for all virtio-net devices.

This driver effectively has the option to expose two different profiles
for the device, native or transitional.  We've discussed profile
support for variant drivers previously as an equivalent functionality
to mdev types, but the only use case for this currently is out-of-tree.
I think this might be the opportunity to define how device profiles are
exposed and selected in a variant driver.

Jason had previously suggested a devlink interface for this, but I
understand that path had been shot down by devlink developers.  Another
obvious option is sysfs, where we might imagine an optional "profiles"
directory, perhaps under vfio-dev.  Attributes of "available" and
"current" could allow discovery and selection of a profile similar to
mdev types.

Is this where we should head with this or are there other options to
confine this transitional behavior?
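
For illustration only, per-device profile attributes on the variant driver
side might look roughly like the sketch below, with hypothetical
"available_profiles"/"current_profile" attribute names (nothing here is
implemented; it is just to make the sysfs idea above concrete):

static ssize_t available_profiles_show(struct device *dev,
                                       struct device_attribute *attr,
                                       char *buf)
{
        return sysfs_emit(buf, "native transitional\n");
}
static DEVICE_ATTR_RO(available_profiles);

static ssize_t current_profile_show(struct device *dev,
                                    struct device_attribute *attr, char *buf)
{
        /* Report the currently selected profile. */
        return sysfs_emit(buf, "transitional\n");
}

static ssize_t current_profile_store(struct device *dev,
                                     struct device_attribute *attr,
                                     const char *buf, size_t count)
{
        /* Parse and switch the profile, only while the device is not open. */
        return count;
}
static DEVICE_ATTR_RW(current_profile);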

BTW, what is "acc" in virtiovf_acc_vfio_pci_ops?

> +
> + virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev,
> +  &pdev->dev, ops);
> + if (IS_ERR(virtvdev))
> + return PTR_ERR(virtvdev);
> +
> + dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
> + ret = vfio_pci_core_register_device(&virtvdev->core_device);
> + if (ret)
> + goto out;
> + return 0;
> +out:
> + vfio_put_device(&virtvdev->core_device.vdev);
> + return ret;
> +}
> +
> +static void virtiovf_pci_remove(struct pci_dev *pdev)
> +{
> + struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev);
> +
> + vfio_pci_core_unregister_device(&virtvdev->core_device);
> + vfio_put_device(&virtvdev->core_device.vdev);
> +}
> +
> +static const struct pci_device_id virtiovf_pci_table[] = {
> + /* Only virtio-net is supported/tested so far */
> + { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 
> 0x1041) },
> + {}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> +
> +static struct pci_driver virtiovf_pci_driver = {
> + .name = KBUILD_MODNAME,
> + .id_table = virtiovf_pci_table,
> + .probe = virtiovf_pci_probe,
> + .remove = virtiovf_pci_remove,
> + .err_handler = &vfio_pci_core_err_handlers,
> + .driver_managed_dma = true,
> +};
> +
> +module_pci_driver(virtiovf_pci_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Yishai Hadas ");
> +MODULE_DESCRIPTION(
> + "VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device family");

Not yet "family" per the device table.  Thanks,

Alex



Re: [RFC PATCH v2 1/7] bpf: Introduce BPF_PROG_TYPE_VNET_HASH

2023-10-17 Thread Alexei Starovoitov
On Mon, Oct 16, 2023 at 7:38 PM Jason Wang  wrote:
>
> On Tue, Oct 17, 2023 at 7:53 AM Alexei Starovoitov
>  wrote:
> >
> > On Sun, Oct 15, 2023 at 10:10 AM Akihiko Odaki  
> > wrote:
> > >
> > > On 2023/10/16 1:07, Alexei Starovoitov wrote:
> > > > On Sun, Oct 15, 2023 at 7:17 AM Akihiko Odaki 
> > > >  wrote:
> > > >>
> > > >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > >> index 0448700890f7..298634556fab 100644
> > > >> --- a/include/uapi/linux/bpf.h
> > > >> +++ b/include/uapi/linux/bpf.h
> > > >> @@ -988,6 +988,7 @@ enum bpf_prog_type {
> > > >>  BPF_PROG_TYPE_SK_LOOKUP,
> > > >>  BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls 
> > > >> */
> > > >>  BPF_PROG_TYPE_NETFILTER,
> > > >> +   BPF_PROG_TYPE_VNET_HASH,
> > > >
> > > > Sorry, we do not add new stable program types anymore.
> > > >
> > > >> @@ -6111,6 +6112,10 @@ struct __sk_buff {
> > > >>  __u8  tstamp_type;
> > > >>  __u32 :24;  /* Padding, future use. */
> > > >>  __u64 hwtstamp;
> > > >> +
> > > >> +   __u32 vnet_hash_value;
> > > >> +   __u16 vnet_hash_report;
> > > >> +   __u16 vnet_rss_queue;
> > > >>   };
> > > >
> > > > we also do not add anything to uapi __sk_buff.
> > > >
> > > >> +const struct bpf_verifier_ops vnet_hash_verifier_ops = {
> > > >> +   .get_func_proto = sk_filter_func_proto,
> > > >> +   .is_valid_access= sk_filter_is_valid_access,
> > > >> +   .convert_ctx_access = bpf_convert_ctx_access,
> > > >> +   .gen_ld_abs = bpf_gen_ld_abs,
> > > >> +};
> > > >
> > > > and we don't do ctx rewrites like this either.
> > > >
> > > > Please see how hid-bpf and cgroup rstat are hooking up bpf
> > > > in _unstable_ way.
> > >
> > > Can you describe what "stable" and "unstable" mean here? I'm new to BPF
> > > and I'm worried if it may mean the interface stability.
> > >
> > > Let me describe the context. QEMU bundles an eBPF program that is used
> > > for the "eBPF steering program" feature of tun. Now I'm proposing to
> > > extend the feature to allow to return some values to the userspace and
> > > vhost_net. As such, the extension needs to be done in a way that ensures
> > > interface stability.
> >
> > bpf is not an option then.
> > we do not add stable bpf program types or hooks any more.
>
> Does this mean eBPF could not be used for any new use cases other than
> the existing ones?

It means that any new use of bpf has to be unstable for the time being.

> > If a kernel subsystem wants to use bpf it needs to accept the fact
> > that such bpf extensibility will be unstable and subsystem maintainers
> > can decide to remove such bpf support in the future.
>
> I don't see how it is different from the existing ones.

Can we remove BPF_CGROUP_RUN_PROG_INET_INGRESS hook along
with BPF_PROG_TYPE_CGROUP_SKB program type?
Obviously not.
We can refactor it. We can move it around, but not remove.
That's the difference in stable vs unstable.

Re: virtio-sound: release control request clarification

2023-10-17 Thread Manos Pitsidianakis
On Tue, 17 Oct 2023 at 18:19, Matias Ezequiel Vara Larsen
 wrote:
>
> Hello,
>
> This email is to clarify the VirtIO specification regarding the RELEASE
> control request. Section 5.14.6.6.5.1 [1] states the following device
> requirements for the RELEASE control request:
> 1. The device MUST complete all pending I/O messages for the specified
> stream ID.
> 2. The device MUST NOT complete the control request while there are
> pending I/O messages for the specified stream ID.
>
> Requirement 1) does not specify what "complete" means. Does it mean
> that the pending I/O messages in the tx queue shall be output on the
> host, i.e., consumed by the audio backend? Or does completion simply mean
> putting the requests in the used ring without consuming them?

It means the latter. At no point is the host's consumption of audio
data mentioned except for implicit or explicit period notifications.

>
> Regarding 2), I interpret it as "the device shall wait until all I/O
> messages are processed before completing the RELEASE control request".

The possible transitions to the RELEASE state are from PREPARE and STOP,
neither of which is associated with active I/O on the stream.
The correct interpretation is "Do not reply to the control request while
you still have pending I/O messages for this stream ID".
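
A minimal device-side sketch of that reading (illustrative pseudo-C; the helper names are assumptions, not taken from the spec or from any implementation):

  /* On RELEASE: first complete -- i.e. place on the used ring -- every
   * pending I/O message of the stream, without the payload necessarily
   * reaching the audio backend, and only then answer the control request. */
  static void handle_release(struct stream *s)
  {
          struct io_msg *msg;

          while ((msg = pop_pending_io(s)) != NULL)
                  put_on_used_ring(msg, /* latency_bytes = */ 0);

          complete_ctl_request(s, VIRTIO_SND_S_OK);
  }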

> Currently, the kernel driver does not seem to expect such a delay when the
> RELEASE command is sent. If I understand correctly, the kernel driver
> first sends the RELEASE command and waits a fixed amount of time until
> the device can process it. Then, the driver waits a fixed amount of time
> until all pending IO messages are completed. If the device follows the
> specification and waits until all IO messages are completed before issuing
> the completion of the RELEASE command, the kernel driver may time out. The
> time to complete N IO messages in the TX queue could be proportional
> to the number of pending messages.
>
> In our device implementation [2], RELEASE is handled as follows:
> - Drop all messages in the TX queue without outputting in the host.
> - Complete the RELEASE control request.
>
> This seems to be working; however, I can observe that sometimes there
> are still requests in the TX queue when we get RELEASE. Those requests
> are never played back on the host.

My guess is that this is because the guest ALSA does prebuffering, not
that the host is supposed to handle those I/O messages.

--
Manos


virtio-sound: release control request clarification

2023-10-17 Thread Matias Ezequiel Vara Larsen
Hello,

This email is to clarify the VirtIO specification regarding the RELEASE
control request. Section 5.14.6.6.5.1 [1] states the following device
requirements for the RELEASE control request: 
1. The device MUST complete all pending I/O messages for the specified
stream ID.
2. The device MUST NOT complete the control request while there are
pending I/O messages for the specified stream ID.

Requirement 1) does not specify what "complete" means. Does it mean
that the pending I/O messages in the tx queue shall be output on the
host, i.e., consumed by the audio backend? Or does completion simply mean
putting the requests in the used ring without consuming them?

Regarding 2), I interpret it as "the device shall wait until all I/O
messages are processed before completing the RELEASE control request".

Currently, the kernel driver does not seem to expect such a delay when the
RELEASE command is sent. If I understand correctly, the kernel driver
first sends the RELEASE command and waits a fixed amount of time until
the device can process it. Then, the driver waits a fixed amount of time
until all pending IO messages are completed. If the device follows the
specification and waits until all IO messages are completed before issuing
the completion of the RELEASE command, the kernel driver may time out. The
time to complete N IO messages in the TX queue could be proportional
to the number of pending messages.

In our device implementation [2], RELEASE is handled as follows:
- Drop all messages in the TX queue without outputting in the host.
- Complete the RELEASE control request.

This seems to be working; however, I can observe that sometimes there
are still requests in the TX queue when we get RELEASE. Those requests
are never played back on the host.

My questions are:
- Should the specification be amended to clarify that all pending
  IO messages in the device are discarded during RELEASE, that is, not
  output to the host but signaled to the guest as completed?
- According to the specification, should the driver wait in RELEASE an
  amount of time proportional to the number of periods yet to be
  played back?

Thanks, Matias.

[1]
https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html
[2]
https://github.com/rust-vmm/vhost-device/tree/main/staging/vhost-device-sound



Re: [RFC PATCH] vdpa_sim: implement .reset_map support

2023-10-17 Thread Stefano Garzarella

On Fri, Oct 13, 2023 at 10:29:26AM -0700, Si-Wei Liu wrote:

Hi Stefano,

On 10/13/2023 2:22 AM, Stefano Garzarella wrote:

Hi Si-Wei,

On Fri, Oct 13, 2023 at 01:23:40AM -0700, Si-Wei Liu wrote:

RFC only. Not tested on vdpa-sim-blk with user virtual address.


I can test it, but what I should stress?
Great, thank you! As you see, my patch moved vhost_iotlb_reset out of 
vdpasim_reset for the sake of decoupling mapping from vdpa device 
reset. For hardware devices this decoupling makes sense as platform 
IOMMU already did it. But I'm not sure if there's something in the 
software device (esp. with vdpa-blk and the userspace library stack) 
that may have to rely on the current .reset behavior that clears the 
vhost_iotlb. So perhaps you can try to exercise every possible case 
involving blk device reset, and see if anything (related to mapping) 
breaks?
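
For illustration, the decoupling amounts to something like the sketch below (not the actual patch; helper and field names follow the in-tree simulator but locking and other details are omitted, so treat it as an assumption):

  /* Device reset no longer clears the iotlb ... */
  static int vdpasim_reset(struct vdpa_device *vdpa)
  {
          struct vdpasim *vdpasim = vdpa_to_sim(vdpa);

          vdpasim_do_reset(vdpasim);      /* rings, features, status only */
          return 0;
  }

  /* ... the new .reset_map op restores the initial/default mapping instead */
  static int vdpasim_reset_map(struct vdpa_device *vdpa, unsigned int asid)
  {
          struct vdpasim *vdpasim = vdpa_to_sim(vdpa);

          vhost_iotlb_reset(&vdpasim->iommu[asid]);
          return 0;
  }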


I just tried these steps without using a VM and the host kernel hangs
after adding the device:

[root@f38-vm-build ~]# modprobe virtio-vdpa
[root@f38-vm-build ~]# modprobe vdpa-sim-blk
[root@f38-vm-build ~]# vdpa dev add mgmtdev vdpasim_blk name blk0
[   35.284575][  T563] virtio_blk virtio6: 1/0/0 default/read/poll queues
[   35.286372][  T563] virtio_blk virtio6: [vdb] 262144 512-byte logical blocks 
(134 MB/128 MiB)
[   35.295271][  T564] vringh:

Reverting this patch (so building "vdpa/mlx5: implement .reset_map 
driver op") worked here.







Works fine with vdpa-sim-net which uses physical address to map.


Can you share your tests? Then I'll try to do the same with blk.
Basically everything involving virtio device reset in the guest, e.g.
reboot the VM, remove/unbind then reprobe/bind the virtio-net
module/driver, then see if device I/O (which needs proper mapping) is
still flowing as expected. And then everything else that could trigger
QEMU's vhost_dev_start/stop paths ending up as a passive vhost-vdpa
backend reset, e.g. link status change, suspend/hibernate, SVQ
switch and live migration. I am not sure whether vdpa-blk supports live
migration through SVQ; if not, you don't need to worry about it.






This patch is based on top of [1].

[1] 
https://lore.kernel.org/virtualization/1696928580-7520-1-git-send-email-si-wei@oracle.com/


The series does not apply well on master or vhost tree.
Where should I apply it?

Sent the link through another email offline.


Received thanks!

Stefano



[PATCH V1 vfio 8/9] vfio/pci: Expose vfio_pci_iowrite/read##size()

2023-10-17 Thread Yishai Hadas via Virtualization
Expose vfio_pci_iowrite/read##size() so they can be used by other drivers.

This functionality is needed to enable direct access to a physical
BAR of the device with the proper locks/checks in place.

The next patches in this series will use this functionality on a data
path flow when direct access to the BAR is needed.

Signed-off-by: Yishai Hadas 
---
 drivers/vfio/pci/vfio_pci_rdwr.c | 10 ++
 include/linux/vfio_pci_core.h| 19 +++
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 6f08b3ecbb89..817ec9a89123 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -38,7 +38,7 @@
 #define vfio_iowrite8  iowrite8
 
 #define VFIO_IOWRITE(size) \
-static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,   
\
+int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,  \
bool test_mem, u##size val, void __iomem *io)   \
 {  \
if (test_mem) { \
@@ -55,7 +55,8 @@ static int vfio_pci_iowrite##size(struct vfio_pci_core_device 
*vdev,  \
up_read(&vdev->memory_lock);\
\
return 0;   \
-}
+}  \
+EXPORT_SYMBOL_GPL(vfio_pci_iowrite##size);
 
 VFIO_IOWRITE(8)
 VFIO_IOWRITE(16)
@@ -65,7 +66,7 @@ VFIO_IOWRITE(64)
 #endif
 
 #define VFIO_IOREAD(size) \
-static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,
\
+int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,   \
bool test_mem, u##size *val, void __iomem *io)  \
 {  \
if (test_mem) { \
@@ -82,7 +83,8 @@ static int vfio_pci_ioread##size(struct vfio_pci_core_device 
*vdev,   \
up_read(&vdev->memory_lock);\
\
return 0;   \
-}
+}  \
+EXPORT_SYMBOL_GPL(vfio_pci_ioread##size);
 
 VFIO_IOREAD(8)
 VFIO_IOREAD(16)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 67ac58e20e1d..22c915317788 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -131,4 +131,23 @@ int vfio_pci_core_setup_barmap(struct vfio_pci_core_device 
*vdev, int bar);
 pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
pci_channel_state_t state);
 
+#define VFIO_IOWRITE_DECLATION(size) \
+int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,  \
+   bool test_mem, u##size val, void __iomem *io);
+
+VFIO_IOWRITE_DECLATION(8)
+VFIO_IOWRITE_DECLATION(16)
+VFIO_IOWRITE_DECLATION(32)
+#ifdef iowrite64
+VFIO_IOWRITE_DECLATION(64)
+#endif
+
+#define VFIO_IOREAD_DECLATION(size) \
+int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,   \
+   bool test_mem, u##size *val, void __iomem *io);
+
+VFIO_IOREAD_DECLATION(8)
+VFIO_IOREAD_DECLATION(16)
+VFIO_IOREAD_DECLATION(32)
+
 #endif /* VFIO_PCI_CORE_H */
-- 
2.27.0
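
As a usage illustration of the exported helpers together with vfio_pci_core_setup_barmap() from the companion patch (a sketch only; the caller, BAR and offset are made up and not part of this series):

  /* Write a 16-bit doorbell into a physical BAR from a vendor vfio driver. */
  static int example_notify(struct vfio_pci_core_device *vdev, int bar,
                            u64 offset, u16 queue_index)
  {
          bool test_mem = !!(pci_resource_flags(vdev->pdev, bar) & IORESOURCE_MEM);
          int ret;

          ret = vfio_pci_core_setup_barmap(vdev, bar);
          if (ret)
                  return ret;

          return vfio_pci_iowrite16(vdev, test_mem, queue_index,
                                    vdev->barmap[bar] + offset);
  }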



[PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-17 Thread Yishai Hadas via Virtualization
Introduce a vfio driver over virtio devices to support the legacy
interface functionality for VFs.

Background, from the virtio spec [1].

In some systems, there is a need to support a virtio legacy driver with
a device that does not directly support the legacy interface. In such
scenarios, a group owner device can provide the legacy interface
functionality for the group member devices. The driver of the owner
device can then access the legacy interface of a member device on behalf
of the legacy member device driver.

For example, with the SR-IOV group type, group members (VFs) can not
present the legacy interface in an I/O BAR in BAR0 as expected by the
legacy pci driver. If the legacy driver is running inside a virtual
machine, the hypervisor executing the virtual machine can present a
virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
legacy driver accesses to this I/O BAR and forwards them to the group
owner device (PF) using group administration commands.


Specifically, this driver adds support for a virtio-net VF to be exposed
as a transitional device to a guest driver and allows the legacy IO BAR
functionality on top.

This allows a VM that uses a legacy virtio-net driver in the guest to
work transparently over a VF whose host-side driver is this new
driver.

The driver can easily be extended to support other types of virtio
devices (e.g. virtio-blk) by adding the type-specific properties in a
few places, as was done for virtio-net.

For now, only the virtio-net use case has been tested, so support is
introduced only for that device.

Practically, upon probing a VF of a virtio-net device, if its PF supports
legacy access over the virtio admin commands and the VF doesn't have BAR
0, we set specific 'vfio_device_ops' to simulate in software a
transitional device with an I/O BAR in BAR 0.

The existence of the simulated I/O BAR is reported later on by
overriding the VFIO_DEVICE_GET_REGION_INFO command, and the device
exposes itself as a transitional device by overriding some properties
when its config space is read.

Once we report the existence of the I/O BAR as BAR 0, a legacy driver in
the guest may use it via read/write calls according to the virtio
specification.

Any read/write towards the control parts of the BAR will be captured by
the new driver and will be translated into admin commands towards the
device.

Any data path read/write access (i.e. virtio driver notifications) will
be forwarded to the physical BAR whose properties were supplied by
the admin command VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO during the
probing/init flow.

With that code in place, a legacy driver in the guest has the look and
feel of a transitional device, with legacy support for both its
control and data path flows.

[1]
https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c

Signed-off-by: Yishai Hadas 
---
 MAINTAINERS  |   7 +
 drivers/vfio/pci/Kconfig |   2 +
 drivers/vfio/pci/Makefile|   2 +
 drivers/vfio/pci/virtio/Kconfig  |  15 +
 drivers/vfio/pci/virtio/Makefile |   4 +
 drivers/vfio/pci/virtio/main.c   | 577 +++
 6 files changed, 607 insertions(+)
 create mode 100644 drivers/vfio/pci/virtio/Kconfig
 create mode 100644 drivers/vfio/pci/virtio/Makefile
 create mode 100644 drivers/vfio/pci/virtio/main.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7a7bd8bd80e9..680a70063775 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22620,6 +22620,13 @@ L: k...@vger.kernel.org
 S: Maintained
 F: drivers/vfio/pci/mlx5/
 
+VFIO VIRTIO PCI DRIVER
+M: Yishai Hadas 
+L: k...@vger.kernel.org
+L: virtualization@lists.linux-foundation.org
+S: Maintained
+F: drivers/vfio/pci/virtio
+
 VFIO PCI DEVICE SPECIFIC DRIVERS
 R: Jason Gunthorpe 
 R: Yishai Hadas 
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 8125e5f37832..18c397df566d 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
 
 source "drivers/vfio/pci/pds/Kconfig"
 
+source "drivers/vfio/pci/virtio/Kconfig"
+
 endmenu
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 45167be462d8..046139a4eca5 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)   += mlx5/
 obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
 
 obj-$(CONFIG_PDS_VFIO_PCI) += pds/
+
+obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
new file mode 100644
index ..89eddce8b1bd
--- /dev/null
+++ b/drivers/vfio/pci/virtio/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config VIRTIO_VFIO_PCI
+trista

[PATCH V1 vfio 7/9] vfio/pci: Expose vfio_pci_core_setup_barmap()

2023-10-17 Thread Yishai Hadas via Virtualization
Expose vfio_pci_core_setup_barmap() so it can be used by other drivers.

This will let drivers mmap a BAR and re-use the mapping from both vfio
and the driver where applicable.

This API will be used in the next patches by the upcoming vfio/virtio
driver.

Signed-off-by: Yishai Hadas 
---
 drivers/vfio/pci/vfio_pci_core.c | 25 +
 drivers/vfio/pci/vfio_pci_rdwr.c | 28 ++--
 include/linux/vfio_pci_core.h|  1 +
 3 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1929103ee59a..ebea39836dd9 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -684,6 +684,31 @@ void vfio_pci_core_disable(struct vfio_pci_core_device 
*vdev)
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_disable);
 
+int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
+{
+   struct pci_dev *pdev = vdev->pdev;
+   void __iomem *io;
+   int ret;
+
+   if (vdev->barmap[bar])
+   return 0;
+
+   ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
+   if (ret)
+   return ret;
+
+   io = pci_iomap(pdev, bar, 0);
+   if (!io) {
+   pci_release_selected_regions(pdev, 1 << bar);
+   return -ENOMEM;
+   }
+
+   vdev->barmap[bar] = io;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap);
+
 void vfio_pci_core_close_device(struct vfio_device *core_vdev)
 {
struct vfio_pci_core_device *vdev =
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index e27de61ac9fe..6f08b3ecbb89 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -200,30 +200,6 @@ static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, 
bool test_mem,
return done;
 }
 
-static int vfio_pci_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
-{
-   struct pci_dev *pdev = vdev->pdev;
-   int ret;
-   void __iomem *io;
-
-   if (vdev->barmap[bar])
-   return 0;
-
-   ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
-   if (ret)
-   return ret;
-
-   io = pci_iomap(pdev, bar, 0);
-   if (!io) {
-   pci_release_selected_regions(pdev, 1 << bar);
-   return -ENOMEM;
-   }
-
-   vdev->barmap[bar] = io;
-
-   return 0;
-}
-
 ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite)
 {
@@ -262,7 +238,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, 
char __user *buf,
}
x_end = end;
} else {
-   int ret = vfio_pci_setup_barmap(vdev, bar);
+   int ret = vfio_pci_core_setup_barmap(vdev, bar);
if (ret) {
done = ret;
goto out;
@@ -438,7 +414,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, 
loff_t offset,
return -EINVAL;
 #endif
 
-   ret = vfio_pci_setup_barmap(vdev, bar);
+   ret = vfio_pci_core_setup_barmap(vdev, bar);
if (ret)
return ret;
 
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 562e8754869d..67ac58e20e1d 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -127,6 +127,7 @@ int vfio_pci_core_match(struct vfio_device *core_vdev, char 
*buf);
 int vfio_pci_core_enable(struct vfio_pci_core_device *vdev);
 void vfio_pci_core_disable(struct vfio_pci_core_device *vdev);
 void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev);
+int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
 pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
pci_channel_state_t state);
 
-- 
2.27.0



[PATCH V1 vfio 6/9] virtio-pci: Introduce APIs to execute legacy IO admin commands

2023-10-17 Thread Yishai Hadas via Virtualization
Introduce APIs to execute legacy IO admin commands.

It includes: list_query/use, io_legacy_read/write,
io_legacy_notify_info.

Those APIs will be used by the next patches from this series.

Signed-off-by: Yishai Hadas 
---
 drivers/virtio/virtio_pci_common.c |  11 ++
 drivers/virtio/virtio_pci_common.h |   2 +
 drivers/virtio/virtio_pci_modern.c | 206 +
 include/linux/virtio_pci_admin.h   |  18 +++
 4 files changed, 237 insertions(+)
 create mode 100644 include/linux/virtio_pci_admin.h

diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index 6b4766d5abe6..212d68401d2c 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -645,6 +645,17 @@ static struct pci_driver virtio_pci_driver = {
.sriov_configure = virtio_pci_sriov_configure,
 };
 
+struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
+{
+   struct virtio_pci_device *pf_vp_dev;
+
+   pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
+   if (IS_ERR(pf_vp_dev))
+   return NULL;
+
+   return &pf_vp_dev->vdev;
+}
+
 module_pci_driver(virtio_pci_driver);
 
 MODULE_AUTHOR("Anthony Liguori ");
diff --git a/drivers/virtio/virtio_pci_common.h 
b/drivers/virtio/virtio_pci_common.h
index a21b9ba01a60..2785e61ed668 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -155,4 +155,6 @@ static inline void virtio_pci_legacy_remove(struct 
virtio_pci_device *vp_dev)
 int virtio_pci_modern_probe(struct virtio_pci_device *);
 void virtio_pci_modern_remove(struct virtio_pci_device *);
 
+struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
+
 #endif
diff --git a/drivers/virtio/virtio_pci_modern.c 
b/drivers/virtio/virtio_pci_modern.c
index cc159a8e6c70..00b65e20b2f5 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -719,6 +719,212 @@ static void vp_modern_destroy_avq(struct virtio_device 
*vdev)
vp_dev->del_vq(&vp_dev->admin_vq.info);
 }
 
+/*
+ * virtio_pci_admin_list_query - Provides to driver list of commands
+ * supported for the PCI VF.
+ * @dev: VF pci_dev
+ * @buf: buffer to hold the returned list
+ * @buf_size: size of the given buffer
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int virtio_pci_admin_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
+{
+   struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+   struct virtio_admin_cmd cmd = {};
+   struct scatterlist result_sg;
+
+   if (!virtio_dev)
+   return -ENODEV;
+
+   sg_init_one(&result_sg, buf, buf_size);
+   cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
+   cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
+   cmd.result_sg = &result_sg;
+
+   return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
+}
+EXPORT_SYMBOL_GPL(virtio_pci_admin_list_query);
+
+/*
+ * virtio_pci_admin_list_use - Provides to device list of commands
+ * used for the PCI VF.
+ * @dev: VF pci_dev
+ * @buf: buffer which holds the list
+ * @buf_size: size of the given buffer
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int virtio_pci_admin_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
+{
+   struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+   struct virtio_admin_cmd cmd = {};
+   struct scatterlist data_sg;
+
+   if (!virtio_dev)
+   return -ENODEV;
+
+   sg_init_one(&data_sg, buf, buf_size);
+   cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
+   cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
+   cmd.data_sg = &data_sg;
+
+   return vp_modern_admin_cmd_exec(virtio_dev, &cmd);
+}
+EXPORT_SYMBOL_GPL(virtio_pci_admin_list_use);
+
+/*
+ * virtio_pci_admin_legacy_io_write - Write legacy registers of a member device
+ * @dev: VF pci_dev
+ * @opcode: op code of the io write command
+ * @offset: starting byte offset within the registers to write to
+ * @size: size of the data to write
+ * @buf: buffer which holds the data
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode,
+u8 offset, u8 size, u8 *buf)
+{
+   struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+   struct virtio_admin_cmd_legacy_wr_data *data;
+   struct virtio_admin_cmd cmd = {};
+   struct scatterlist data_sg;
+   int vf_id;
+   int ret;
+
+   if (!virtio_dev)
+   return -ENODEV;
+
+   vf_id = pci_iov_vf_id(pdev);
+   if (vf_id < 0)
+   return vf_id;
+
+   data = kzalloc(sizeof(*data) + size, GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
+   data->offset = offset;
+   memcpy(data->registers, buf, size);
+   sg_init_one(&data_sg, data, sizeof(*data) + size);
+   cmd.opcode = c

[PATCH V1 vfio 5/9] virtio-pci: Introduce admin commands

2023-10-17 Thread Yishai Hadas via Virtualization
From: Feng Liu 

Introduces admin commands, as follow:

The "list query" command can be used by the driver to query the
set of admin commands supported by the virtio device.
The "list use" command is used to inform the virtio device which
admin commands the driver will use.
The "legacy common cfg rd/wr" commands are used to read from/write
into the legacy common configuration structure.
The "legacy dev cfg rd/wr" commands are used to read from/write
into the legacy device configuration structure.
The "notify info" command is used to query the notification region
information.

Signed-off-by: Feng Liu 
Reviewed-by: Parav Pandit 
Reviewed-by: Jiri Pirko 
Signed-off-by: Yishai Hadas 
---
 include/uapi/linux/virtio_pci.h | 44 +
 1 file changed, 44 insertions(+)

diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index 68eacc9676dc..6e42c211fc08 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -210,6 +210,23 @@ struct virtio_pci_cfg_cap {
 /* Admin command status. */
 #define VIRTIO_ADMIN_STATUS_OK 0
 
+/* Admin command opcode. */
+#define VIRTIO_ADMIN_CMD_LIST_QUERY0x0
+#define VIRTIO_ADMIN_CMD_LIST_USE  0x1
+
+/* Admin command group type. */
+#define VIRTIO_ADMIN_GROUP_TYPE_SRIOV  0x1
+
+/* Transitional device admin command. */
+#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE   0x2
+#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ0x3
+#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE  0x4
+#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ   0x5
+#define VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO0x6
+
+/* Increment MAX_OPCODE to next value when new opcode is added */
+#define VIRTIO_ADMIN_MAX_CMD_OPCODE0x6
+
 struct __packed virtio_admin_cmd_hdr {
__le16 opcode;
/*
@@ -229,4 +246,31 @@ struct __packed virtio_admin_cmd_status {
__u8 reserved2[4];
 };
 
+struct __packed virtio_admin_cmd_legacy_wr_data {
+   __u8 offset; /* Starting offset of the register(s) to write. */
+   __u8 reserved[7];
+   __u8 registers[];
+};
+
+struct __packed virtio_admin_cmd_legacy_rd_data {
+   __u8 offset; /* Starting offset of the register(s) to read. */
+};
+
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END 0
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_DEV 0x1
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM 0x2
+
+#define VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO 4
+
+struct __packed virtio_admin_cmd_notify_info_data {
+   __u8 flags; /* 0 = end of list, 1 = owner device, 2 = member device */
+   __u8 bar; /* BAR of the member or the owner device */
+   __u8 padding[6];
+   __le64 offset; /* Offset within bar. */
+};
+
+struct virtio_admin_cmd_notify_info_result {
+   struct virtio_admin_cmd_notify_info_data 
entries[VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO];
+};
+
 #endif
-- 
2.27.0
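
As a hedged sketch of consuming the notify_info result defined above (illustrative only, not part of this patch):

  struct virtio_admin_cmd_notify_info_result res = {};
  int i;

  /* the list ends at a FLAGS_END entry or after MAX_NOTIFY_INFO entries */
  for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) {
          const struct virtio_admin_cmd_notify_info_data *e = &res.entries[i];

          if (e->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END)
                  break;
          /* OWNER_DEV: the BAR belongs to the owner (PF);
           * OWNER_MEM: the BAR belongs to the member (VF) itself. */
          pr_info("notify area: flags %u bar %u offset %llu\n",
                  e->flags, e->bar,
                  (unsigned long long)le64_to_cpu(e->offset));
  }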



[PATCH V1 vfio 3/9] virtio-pci: Introduce admin virtqueue

2023-10-17 Thread Yishai Hadas via Virtualization
From: Feng Liu 

Introduce support for the admin virtqueue. By negotiating the
VIRTIO_F_ADMIN_VQ feature, the driver detects the capability and creates
one administration virtqueue. Implementing the administration virtqueue
in the generic virtio-pci layer enables multiple types of upper-layer
drivers, such as vfio, net and blk, to utilize it.

Signed-off-by: Feng Liu 
Reviewed-by: Parav Pandit 
Reviewed-by: Jiri Pirko 
Signed-off-by: Yishai Hadas 
---
 drivers/virtio/virtio.c| 37 ++--
 drivers/virtio/virtio_pci_common.c |  3 ++
 drivers/virtio/virtio_pci_common.h | 15 ++-
 drivers/virtio/virtio_pci_modern.c | 61 +-
 drivers/virtio/virtio_pci_modern_dev.c | 18 
 include/linux/virtio_config.h  |  4 ++
 include/linux/virtio_pci_modern.h  |  5 +++
 7 files changed, 137 insertions(+), 6 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 3893dc29eb26..f4080692b351 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -302,9 +302,15 @@ static int virtio_dev_probe(struct device *_d)
if (err)
goto err;
 
+   if (dev->config->create_avq) {
+   err = dev->config->create_avq(dev);
+   if (err)
+   goto err;
+   }
+
err = drv->probe(dev);
if (err)
-   goto err;
+   goto err_probe;
 
/* If probe didn't do it, mark device DRIVER_OK ourselves. */
if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
@@ -316,6 +322,10 @@ static int virtio_dev_probe(struct device *_d)
virtio_config_enable(dev);
 
return 0;
+
+err_probe:
+   if (dev->config->destroy_avq)
+   dev->config->destroy_avq(dev);
 err:
virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
return err;
@@ -331,6 +341,9 @@ static void virtio_dev_remove(struct device *_d)
 
drv->remove(dev);
 
+   if (dev->config->destroy_avq)
+   dev->config->destroy_avq(dev);
+
/* Driver should have reset device. */
WARN_ON_ONCE(dev->config->get_status(dev));
 
@@ -489,13 +502,20 @@ EXPORT_SYMBOL_GPL(unregister_virtio_device);
 int virtio_device_freeze(struct virtio_device *dev)
 {
struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
+   int ret;
 
virtio_config_disable(dev);
 
dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
 
-   if (drv && drv->freeze)
-   return drv->freeze(dev);
+   if (drv && drv->freeze) {
+   ret = drv->freeze(dev);
+   if (ret)
+   return ret;
+   }
+
+   if (dev->config->destroy_avq)
+   dev->config->destroy_avq(dev);
 
return 0;
 }
@@ -532,10 +552,16 @@ int virtio_device_restore(struct virtio_device *dev)
if (ret)
goto err;
 
+   if (dev->config->create_avq) {
+   ret = dev->config->create_avq(dev);
+   if (ret)
+   goto err;
+   }
+
if (drv->restore) {
ret = drv->restore(dev);
if (ret)
-   goto err;
+   goto err_restore;
}
 
/* If restore didn't do it, mark device DRIVER_OK ourselves. */
@@ -546,6 +572,9 @@ int virtio_device_restore(struct virtio_device *dev)
 
return 0;
 
+err_restore:
+   if (dev->config->destroy_avq)
+   dev->config->destroy_avq(dev);
 err:
virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
return ret;
diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index c2524a7207cf..6b4766d5abe6 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -236,6 +236,9 @@ void vp_del_vqs(struct virtio_device *vdev)
int i;
 
list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
+   if (vp_dev->is_avq(vdev, vq->index))
+   continue;
+
if (vp_dev->per_vq_vectors) {
int v = vp_dev->vqs[vq->index]->msix_vector;
 
diff --git a/drivers/virtio/virtio_pci_common.h 
b/drivers/virtio/virtio_pci_common.h
index 4b773bd7c58c..e03af0966a4b 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -41,6 +41,14 @@ struct virtio_pci_vq_info {
unsigned int msix_vector;
 };
 
+struct virtio_pci_admin_vq {
+   /* Virtqueue info associated with this admin queue. */
+   struct virtio_pci_vq_info info;
+   /* Name of the admin queue: avq.$index. */
+   char name[10];
+   u16 vq_index;
+};
+
 /* Our device structure */
 struct virtio_pci_device {
struct virtio_device vdev;
@@ -58,9 +66,13 @@ struct virtio_pci_device {
spinlock_t lock;
struct list_head virtqueues;
 
-   /* array of all queues for house-keeping */
+   /* Array of all virtqueues report

[PATCH V1 vfio 4/9] virtio-pci: Introduce admin command sending function

2023-10-17 Thread Yishai Hadas via Virtualization
From: Feng Liu 

Add support for sending admin commands through the admin virtqueue
interface. Abort any inflight admin commands once device reset completes.

To enforce the statement from the specification quoted below [1], the
admin queue is activated for upper-layer users only after the status has
been set to DRIVER_OK.

[1] The driver MUST NOT send any buffer available notifications to the
device before setting DRIVER_OK.

Signed-off-by: Feng Liu 
Reviewed-by: Parav Pandit 
Signed-off-by: Yishai Hadas 
---
 drivers/virtio/virtio_pci_common.h |   3 +
 drivers/virtio/virtio_pci_modern.c | 174 +
 include/linux/virtio.h |   8 ++
 include/uapi/linux/virtio_pci.h|  22 
 4 files changed, 207 insertions(+)

diff --git a/drivers/virtio/virtio_pci_common.h 
b/drivers/virtio/virtio_pci_common.h
index e03af0966a4b..a21b9ba01a60 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -44,9 +44,12 @@ struct virtio_pci_vq_info {
 struct virtio_pci_admin_vq {
/* Virtqueue info associated with this admin queue. */
struct virtio_pci_vq_info info;
+   struct completion flush_done;
+   refcount_t refcount;
/* Name of the admin queue: avq.$index. */
char name[10];
u16 vq_index;
+   bool abort;
 };
 
 /* Our device structure */
diff --git a/drivers/virtio/virtio_pci_modern.c 
b/drivers/virtio/virtio_pci_modern.c
index 01c5ba346471..cc159a8e6c70 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -36,6 +36,58 @@ static bool vp_is_avq(struct virtio_device *vdev, unsigned 
int index)
return index == vp_dev->admin_vq.vq_index;
 }
 
+static bool vp_modern_avq_get(struct virtio_pci_admin_vq *admin_vq)
+{
+   return refcount_inc_not_zero(&admin_vq->refcount);
+}
+
+static void vp_modern_avq_put(struct virtio_pci_admin_vq *admin_vq)
+{
+   if (refcount_dec_and_test(&admin_vq->refcount))
+   complete(&admin_vq->flush_done);
+}
+
+static bool vp_modern_avq_is_abort(const struct virtio_pci_admin_vq *admin_vq)
+{
+   return READ_ONCE(admin_vq->abort);
+}
+
+static void
+vp_modern_avq_set_abort(struct virtio_pci_admin_vq *admin_vq, bool abort)
+{
+   /* Mark the AVQ to abort, so that inflight commands can be aborted. */
+   WRITE_ONCE(admin_vq->abort, abort);
+}
+
+static void vp_modern_avq_activate(struct virtio_device *vdev)
+{
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   struct virtio_pci_admin_vq *admin_vq = &vp_dev->admin_vq;
+
+   if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+   return;
+
+   init_completion(&admin_vq->flush_done);
+   refcount_set(&admin_vq->refcount, 1);
+   vp_modern_avq_set_abort(admin_vq, false);
+}
+
+static void vp_modern_avq_deactivate(struct virtio_device *vdev)
+{
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   struct virtio_pci_admin_vq *admin_vq = &vp_dev->admin_vq;
+
+   if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+   return;
+
+   vp_modern_avq_set_abort(admin_vq, true);
+   /* Balance with refcount_set() during vp_modern_avq_activate */
+   vp_modern_avq_put(admin_vq);
+
+   /* Wait for all the inflight admin commands to be aborted */
+   wait_for_completion(&vp_dev->admin_vq.flush_done);
+}
+
 static void vp_transport_features(struct virtio_device *vdev, u64 features)
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
@@ -172,6 +224,8 @@ static void vp_set_status(struct virtio_device *vdev, u8 
status)
/* We should never be setting status to 0. */
BUG_ON(status == 0);
vp_modern_set_status(&vp_dev->mdev, status);
+   if (status & VIRTIO_CONFIG_S_DRIVER_OK)
+   vp_modern_avq_activate(vdev);
 }
 
 static void vp_reset(struct virtio_device *vdev)
@@ -188,6 +242,9 @@ static void vp_reset(struct virtio_device *vdev)
 */
while (vp_modern_get_status(mdev))
msleep(1);
+
+   vp_modern_avq_deactivate(vdev);
+
/* Flush pending VQ/configuration callbacks. */
vp_synchronize_vectors(vdev);
 }
@@ -505,6 +562,121 @@ static bool vp_get_shm_region(struct virtio_device *vdev,
return true;
 }
 
+static int virtqueue_exec_admin_cmd(struct virtio_pci_admin_vq *admin_vq,
+   struct scatterlist **sgs,
+   unsigned int out_num,
+   unsigned int in_num,
+   void *data,
+   gfp_t gfp)
+{
+   struct virtqueue *vq;
+   int ret, len;
+
+   if (!vp_modern_avq_get(admin_vq))
+   return -EIO;
+
+   vq = admin_vq->info.vq;
+
+   ret = virtqueue_add_sgs(vq, sgs, out_num, in_num, data, gfp);
+   if (ret < 0)
+   goto out;
+
+   if (unlikely(!virtqueue_kick(vq))) {
+   ret = -EIO;
+  

[PATCH V1 vfio 1/9] virtio-pci: Fix common config map for modern device

2023-10-17 Thread Yishai Hadas via Virtualization
From: Feng Liu 

Currently vp_modern_probe() fails to map the part of the common config
structure starting at the notify_data offset. Because of this, accessing
those structure members can result in an error.

Fix it by taking into account both the minimum size the device has
offered and the size the driver will access.

Fixes: ea024594b1dc ("virtio_pci: struct virtio_pci_common_cfg add 
queue_notify_data")
Fixes: 0cdd450e7051 ("virtio_pci: struct virtio_pci_common_cfg add queue_reset")
Signed-off-by: Feng Liu 
Reported-by: Michael S . Tsirkin 
Closes: 
https://lkml.kernel.org/kvm/20230927172553-mutt-send-email-...@kernel.org/
Reviewed-by: Parav Pandit 
Signed-off-by: Yishai Hadas 
---
 drivers/virtio/virtio_pci_modern_dev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_pci_modern_dev.c 
b/drivers/virtio/virtio_pci_modern_dev.c
index aad7d9296e77..7fa70d7c8146 100644
--- a/drivers/virtio/virtio_pci_modern_dev.c
+++ b/drivers/virtio/virtio_pci_modern_dev.c
@@ -290,9 +290,9 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev)
 
err = -EINVAL;
mdev->common = vp_modern_map_capability(mdev, common,
- sizeof(struct virtio_pci_common_cfg), 4,
- 0, sizeof(struct virtio_pci_common_cfg),
- NULL, NULL);
+   sizeof(struct virtio_pci_common_cfg), 4,
+   0, sizeof(struct virtio_pci_modern_common_cfg),
+   NULL, NULL);
if (!mdev->common)
goto err_map_common;
mdev->isr = vp_modern_map_capability(mdev, isr, sizeof(u8), 1,
-- 
2.27.0
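
For context, the structure the driver may access extends the base layout with two extra fields (as defined in include/linux/virtio_pci_modern.h), which is why the mapping length must account for the modern structure while the device is only required to offer at least the base one:

  struct virtio_pci_modern_common_cfg {
          struct virtio_pci_common_cfg cfg;

          __le16 queue_notify_data;       /* read-write */
          __le16 queue_reset;             /* read-write */
  };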



[PATCH V1 vfio 2/9] virtio: Define feature bit for administration virtqueue

2023-10-17 Thread Yishai Hadas via Virtualization
From: Feng Liu 

Introduce VIRTIO_F_ADMIN_VQ which is used for administration virtqueue
support.

Signed-off-by: Feng Liu 
Reviewed-by: Parav Pandit 
Reviewed-by: Jiri Pirko 
Signed-off-by: Yishai Hadas 
---
 include/uapi/linux/virtio_config.h | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_config.h 
b/include/uapi/linux/virtio_config.h
index 2c712c654165..09d694968b14 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -52,7 +52,7 @@
  * rest are per-device feature bits.
  */
 #define VIRTIO_TRANSPORT_F_START   28
-#define VIRTIO_TRANSPORT_F_END 41
+#define VIRTIO_TRANSPORT_F_END 42
 
 #ifndef VIRTIO_CONFIG_NO_LEGACY
 /* Do we get callbacks when the ring is completely used, even if we've
@@ -109,4 +109,10 @@
  * This feature indicates that the driver can reset a queue individually.
  */
 #define VIRTIO_F_RING_RESET40
+
+/*
+ * This feature indicates that the device support administration virtqueues.
+ */
+#define VIRTIO_F_ADMIN_VQ  41
+
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
-- 
2.27.0



[PATCH V1 vfio 0/9] Introduce a vfio driver over virtio devices

2023-10-17 Thread Yishai Hadas via Virtualization
This series introduces a vfio driver over virtio devices to support the
legacy interface functionality for VFs.

Background, from the virtio spec [1].

In some systems, there is a need to support a virtio legacy driver with
a device that does not directly support the legacy interface. In such
scenarios, a group owner device can provide the legacy interface
functionality for the group member devices. The driver of the owner
device can then access the legacy interface of a member device on behalf
of the legacy member device driver.

For example, with the SR-IOV group type, group members (VFs) can not
present the legacy interface in an I/O BAR in BAR0 as expected by the
legacy pci driver. If the legacy driver is running inside a virtual
machine, the hypervisor executing the virtual machine can present a
virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
legacy driver accesses to this I/O BAR and forwards them to the group
owner device (PF) using group administration commands.


The first 6 patches are in the virtio area and handle the following:
- Fix the common config map for modern devices, as was reported by Michael Tsirkin.
- Introduce the admin virtqueue infrastructure.
- Expose the layout of the commands that should be used for
  supporting the legacy access.
- Expose APIs to enable upper layers such as vfio, net, etc.
  to execute admin commands.

The above follows the virtio spec that was recently accepted in that area
[1].

The last 3 patches are in the vfio area and handle the below:
- Expose some APIs from vfio/pci to be used by the vfio/virtio driver.
- Introduce a vfio driver over virtio devices to support the legacy
  interface functionality for VFs. 

The series was tested successfully over virtio-net VFs in the host,
while running in the guest both modern and legacy drivers.

[1]
https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c

Changes from V0: 
https://www.spinics.net/lists/linux-virtualization/msg63802.html

Virtio:
- Fix the common config map size issue that was reported by Michael
  Tsirkin.
- Do not use vp_dev->vqs[] array upon vp_del_vqs() as was asked by
  Michael, instead skip the AQ specifically.
- Move admin vq implementation into virtio_pci_modern.c as was asked by
  Michael.
- Rename structure virtio_avq to virtio_pci_admin_vq and some extra
  corresponding renames.
- Remove exported symbols virtio_pci_vf_get_pf_dev(),
  virtio_admin_cmd_exec() as now callers are local to the module.
- Handle inflight commands as part of the device reset flow.
- Introduce APIs per admin command in virtio-pci as was asked by Michael.

Vfio:
- Change to use EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL for
  vfio_pci_core_setup_barmap() and vfio_pci_iowrite#xxx() as pointed by
  Alex.
- Drop the intermediate patch which prepares the commands and calls the
  generic virtio admin command API (i.e. virtio_admin_cmd_exec()).
- Instead, call directly to the new APIs per admin command that are
  exported from Virtio - based on Michael's request.
- Enable only virtio-net as part of the pci_device_id table to enforce
  upon binding only what is supported as suggested by Alex.
- Add support for byte-wise access (read/write) over the device config
  region as was asked by Alex.
- Consider whether MSIX is practically enabled/disabled to choose the
  right opcode upon issuing read/write admin command, as mentioned
  by Michael.
- Move to use VIRTIO_PCI_CONFIG_OFF instead of adding some new defines
  as was suggested by Michael.
- Set the '.close_device' op to vfio_pci_core_close_device() as was
  pointed by Alex.
- Adapt to Vfio multi-line comment style in a few places.
- Add virtualization@lists.linux-foundation.org in the MAINTAINERS file
  to be CCed for the new driver as was suggested by Jason.

Yishai

Feng Liu (5):
  virtio-pci: Fix common config map for modern device
  virtio: Define feature bit for administration virtqueue
  virtio-pci: Introduce admin virtqueue
  virtio-pci: Introduce admin command sending function
  virtio-pci: Introduce admin commands

Yishai Hadas (4):
  virtio-pci: Introduce APIs to execute legacy IO admin commands
  vfio/pci: Expose vfio_pci_core_setup_barmap()
  vfio/pci: Expose vfio_pci_iowrite/read##size()
  vfio/virtio: Introduce a vfio driver over virtio devices

 MAINTAINERS|   7 +
 drivers/vfio/pci/Kconfig   |   2 +
 drivers/vfio/pci/Makefile  |   2 +
 drivers/vfio/pci/vfio_pci_core.c   |  25 ++
 drivers/vfio/pci/vfio_pci_rdwr.c   |  38 +-
 drivers/vfio/pci/virtio/Kconfig|  15 +
 drivers/vfio/pci/virtio/Makefile   |   4 +
 drivers/vfio/pci/virtio/main.c | 577 +
 drivers/virtio/virtio.c|  37 +-
 drivers/virtio/virtio_pci_common.c |  14 +
 drivers/virtio/virtio_pci_common.h |  20

Re: [RFC PATCH] ALSA: virtio: use copy and fill_silence callbacks

2023-10-17 Thread Stefano Garzarella

On Thu, Oct 12, 2023 at 11:16:54AM -0400, Michael S. Tsirkin wrote:

On Thu, Oct 12, 2023 at 05:10:50PM +0200, Matias Ezequiel Vara Larsen wrote:

This commit replaces the mmap mechanism with the copy() and
fill_silence() callbacks for both capturing and playback for the
virtio-sound driver. This change is required to prevent the updating of
the content of a buffer that is already in the available ring.

The current mechanism splits a dma buffer into descriptors that are
exposed to the device. This dma buffer is shared with the user
application. When the device consumes a buffer, the driver moves the
request from the used ring to available ring.

The driver exposes the buffer to the device without knowing if the
content has been updated from the user. The section 2.8.21.1 of the
virtio spec states that: "The device MAY access the descriptor chains
the driver created and the memory they refer to immediately". If the
device picks up buffers from the available ring just after it is
notified, it happens that the content may be old.

By providing the copy() callback, the driver first updates the content
of the buffer, and then, exposes the buffer to the device by enqueuing
it in the available ring. Thus, device always picks up a buffer that is
updated.

For capturing, the driver starts by exposing all the available buffers
to device. After device updates the content of a buffer, it enqueues it
in the used ring. It is only after the copy() for capturing is issued
that the driver re-enqueues the buffer in the available ring.

Note that the copy() function assumes that user is always writing a
period. Testing shows that this is true but I may be wrong. This RFC
aims at clarifying this.

Signed-off-by: Matias Ezequiel Vara Larsen 



Thank you for working on this!


Yep, +1!

@Michael do you think we should cc stable and add a Fixes tag since
the driver is not following the virtio spec?

Or is it too risky?

IIUC snd_pcm_ops has changed a bit from previous versions, so we may have
to adapt the patch for stable branches.

Stefano




---
 sound/virtio/virtio_pcm.c | 11 ++--
 sound/virtio/virtio_pcm.h |  9 +++-
 sound/virtio/virtio_pcm_msg.c | 50 ---
 sound/virtio/virtio_pcm_ops.c | 94 +++
 4 files changed, 137 insertions(+), 27 deletions(-)

diff --git a/sound/virtio/virtio_pcm.c b/sound/virtio/virtio_pcm.c
index c10d91fff2fb..bfe982952303 100644
--- a/sound/virtio/virtio_pcm.c
+++ b/sound/virtio/virtio_pcm.c
@@ -104,8 +104,6 @@ static int virtsnd_pcm_build_hw(struct virtio_pcm_substream 
*vss,
 * only message-based transport.
 */
vss->hw.info =
-   SNDRV_PCM_INFO_MMAP |
-   SNDRV_PCM_INFO_MMAP_VALID |
SNDRV_PCM_INFO_BATCH |
SNDRV_PCM_INFO_BLOCK_TRANSFER |
SNDRV_PCM_INFO_INTERLEAVED |
@@ -471,12 +469,11 @@ int virtsnd_pcm_build_devs(struct virtio_snd *snd)
for (kss = ks->substream; kss; kss = kss->next)
vs->substreams[kss->number]->substream = kss;

-   snd_pcm_set_ops(vpcm->pcm, i, &virtsnd_pcm_ops);
+   if (i == SNDRV_PCM_STREAM_CAPTURE)
+   snd_pcm_set_ops(vpcm->pcm, i, 
&virtsnd_pcm_capture_ops);
+   else
+   snd_pcm_set_ops(vpcm->pcm, i, 
&virtsnd_pcm_playback_ops);
}
-
-   snd_pcm_set_managed_buffer_all(vpcm->pcm,
-  SNDRV_DMA_TYPE_VMALLOC, NULL,
-  0, 0);
}

return 0;
diff --git a/sound/virtio/virtio_pcm.h b/sound/virtio/virtio_pcm.h
index 062eb8e8f2cf..1c1106ec971f 100644
--- a/sound/virtio/virtio_pcm.h
+++ b/sound/virtio/virtio_pcm.h
@@ -50,6 +50,8 @@ struct virtio_pcm_substream {
struct work_struct elapsed_period;
spinlock_t lock;
size_t buffer_bytes;
+   u8 *buffer;
+   size_t buffer_sz;
size_t hw_ptr;
bool xfer_enabled;
bool xfer_xrun;
@@ -90,7 +92,8 @@ struct virtio_pcm {
struct virtio_pcm_stream streams[SNDRV_PCM_STREAM_LAST + 1];
 };

-extern const struct snd_pcm_ops virtsnd_pcm_ops;
+extern const struct snd_pcm_ops virtsnd_pcm_playback_ops;
+extern const struct snd_pcm_ops virtsnd_pcm_capture_ops;

 int virtsnd_pcm_validate(struct virtio_device *vdev);

@@ -117,7 +120,9 @@ int virtsnd_pcm_msg_alloc(struct virtio_pcm_substream *vss,

 void virtsnd_pcm_msg_free(struct virtio_pcm_substream *vss);

-int virtsnd_pcm_msg_send(struct virtio_pcm_substream *vss);
+int virtsnd_pcm_msg_send(struct virtio_pcm_substream *vss, bool single);
+
+int virtsnd_pcm_msg_send_locked(struct virtio_pcm_substream *vss, bool single);

 unsigned int virtsnd_pcm_msg_pending_num(struct virtio_pcm_substream *vss);

diff --git a/sound/virtio/virtio_pcm_msg.c b/sound/virtio/virtio_pcm_msg.c
index aca2dc1989ba..9a5f9814cb6

Re: [RFC PATCH] ALSA: virtio: use copy and fill_silence callbacks

2023-10-17 Thread Matias Ezequiel Vara Larsen
Hello Anton, 

Thanks for your help! I am going to send a second version of the patch
with your changes. Is it OK if I add you with the "Co-developed-by"
tag?

Matias

On Tue, Oct 17, 2023 at 05:11:30PM +0900, Anton Yakovlev wrote:
> Hi Matias,
> 
> Thanks for your help! I updated and corrected your patch a little bit (see
> attachment). All changes were tested, there were no problems on my side.
> 
> See also a few inline comments.
> 
> 
> 
> On 13.10.2023 00:10, Matias Ezequiel Vara Larsen wrote:
> > This commit replaces the mmap mechanism with the copy() and
> > fill_silence() callbacks for both capturing and playback for the
> > virtio-sound driver. This change is required to prevent the updating of
> > the content of a buffer that is already in the available ring.
> > 
> > The current mechanism splits a dma buffer into descriptors that are
> > exposed to the device. This dma buffer is shared with the user
> > application. When the device consumes a buffer, the driver moves the
> > request from the used ring to available ring.
> > 
> > The driver exposes the buffer to the device without knowing if the
> > content has been updated from the user. The section 2.8.21.1 of the
> > virtio spec states that: "The device MAY access the descriptor chains
> > the driver created and the memory they refer to immediately". If the
> > device picks up buffers from the available ring just after it is
> > notified, it happens that the content may be old.
> > 
> > By providing the copy() callback, the driver first updates the content
> > of the buffer, and then, exposes the buffer to the device by enqueuing
> > it in the available ring. Thus, device always picks up a buffer that is
> > updated.
> > 
> > For capturing, the driver starts by exposing all the available buffers
> > to device. After device updates the content of a buffer, it enqueues it
> > in the used ring. It is only after the copy() for capturing is issued
> > that the driver re-enqueues the buffer in the available ring.
> > 
> > Note that the copy() function assumes that user is always writing a
> > period. Testing shows that this is true but I may be wrong. This RFC
> > aims at clarifying this.
> > 
> > Signed-off-by: Matias Ezequiel Vara Larsen 
> > ---
> >   sound/virtio/virtio_pcm.c | 11 ++--
> >   sound/virtio/virtio_pcm.h |  9 +++-
> >   sound/virtio/virtio_pcm_msg.c | 50 ---
> >   sound/virtio/virtio_pcm_ops.c | 94 +++
> >   4 files changed, 137 insertions(+), 27 deletions(-)
> > 
> > diff --git a/sound/virtio/virtio_pcm.c b/sound/virtio/virtio_pcm.c
> > index c10d91fff2fb..bfe982952303 100644
> > --- a/sound/virtio/virtio_pcm.c
> > +++ b/sound/virtio/virtio_pcm.c
> > @@ -104,8 +104,6 @@ static int virtsnd_pcm_build_hw(struct 
> > virtio_pcm_substream *vss,
> >  * only message-based transport.
> >  */
> > vss->hw.info =
> > -   SNDRV_PCM_INFO_MMAP |
> > -   SNDRV_PCM_INFO_MMAP_VALID |
> > SNDRV_PCM_INFO_BATCH |
> > SNDRV_PCM_INFO_BLOCK_TRANSFER |
> > SNDRV_PCM_INFO_INTERLEAVED |
> 
> We also need to disable rewinds, since now only sequential
> reading/writing of frames is supported.
> 
> 
> > @@ -471,12 +469,11 @@ int virtsnd_pcm_build_devs(struct virtio_snd *snd)
> > for (kss = ks->substream; kss; kss = kss->next)
> > vs->substreams[kss->number]->substream = kss;
> > -   snd_pcm_set_ops(vpcm->pcm, i, &virtsnd_pcm_ops);
> > +   if (i == SNDRV_PCM_STREAM_CAPTURE)
> > +   snd_pcm_set_ops(vpcm->pcm, i, 
> > &virtsnd_pcm_capture_ops);
> > +   else
> > +   snd_pcm_set_ops(vpcm->pcm, i, 
> > &virtsnd_pcm_playback_ops);
> > }
> > -
> > -   snd_pcm_set_managed_buffer_all(vpcm->pcm,
> > -  SNDRV_DMA_TYPE_VMALLOC, NULL,
> > -  0, 0);
> 
> This is not right. Buffer allocation/freeing is controlled by the kernel
> subsystem, so the driver doesn't have to worry about it.
> 
> 
> > }
> > return 0;
> > diff --git a/sound/virtio/virtio_pcm.h b/sound/virtio/virtio_pcm.h
> > index 062eb8e8f2cf..1c1106ec971f 100644
> > --- a/sound/virtio/virtio_pcm.h
> > +++ b/sound/virtio/virtio_pcm.h
> > @@ -50,6 +50,8 @@ struct virtio_pcm_substream {
> > struct work_struct elapsed_period;
> > spinlock_t lock;
> > size_t buffer_bytes;
> > +   u8 *buffer;
> > +   size_t buffer_sz;
> > size_t hw_ptr;
> > bool xfer_enabled;
> > bool xfer_xrun;
> > @@ -90,7 +92,8 @@ struct virtio_pcm {
> > struct virtio_pcm_stream streams[SNDRV_PCM_STREAM_LAST + 1];
> >   };
> > -extern const struct snd_pcm_ops virtsnd_pcm_ops;
> > +extern const struct snd_pcm_ops virtsnd_pcm_playback_ops;
> > +extern const struct snd_pcm_ops virtsnd_pcm_capture_ops;
> >   int virtsnd_pcm_validate(struct virtio_device 

Re: [RFC PATCH] ALSA: virtio: use copy and fill_silence callbacks

2023-10-17 Thread Matias Ezequiel Vara Larsen
On Tue, Oct 17, 2023 at 09:44:04AM +0200, Takashi Iwai wrote:
> On Mon, 16 Oct 2023 17:10:00 +0200,
> Stefan Hajnoczi wrote:
> > 
> > On Thu, Oct 12, 2023 at 05:10:50PM +0200, Matias Ezequiel Vara Larsen wrote:
> > > This commit replaces the mmap mechanism with the copy() and
> > > fill_silence() callbacks for both capturing and playback for the
> > > virtio-sound driver. This change is required to prevent the updating of
> > > the content of a buffer that is already in the available ring.
> > > 
> > > The current mechanism splits a dma buffer into descriptors that are
> > > exposed to the device. This dma buffer is shared with the user
> > > application. When the device consumes a buffer, the driver moves the
> > > request from the used ring to available ring.
> > > 
> > > The driver exposes the buffer to the device without knowing if the
> > > content has been updated from the user. The section 2.8.21.1 of the
> > > virtio spec states that: "The device MAY access the descriptor chains
> > > the driver created and the memory they refer to immediately". If the
> > > device picks up buffers from the available ring just after it is
> > > notified, it happens that the content may be old.
> > > 
> > > By providing the copy() callback, the driver first updates the content
> > > of the buffer, and then, exposes the buffer to the device by enqueuing
> > > it in the available ring. Thus, device always picks up a buffer that is
> > > updated.
> > > 
> > > For capturing, the driver starts by exposing all the available buffers
> > > to device. After device updates the content of a buffer, it enqueues it
> > > in the used ring. It is only after the copy() for capturing is issued
> > > that the driver re-enqueues the buffer in the available ring.
> > > 
> > > Note that the copy() function assumes that user is always writing a
> > > period. Testing shows that this is true but I may be wrong. This RFC
> > > aims at clarifying this.
> > 
> > This sounds like an ALSA driver API question.
> > 
> > Jaroslav and Takashi: Any thoughts?
> 
> The size of the transfer is determined solely by PCM appl_ptr and
> hw_ptr values.  The former is what applications update via the mmap
> commit or the normal read/write operations, while the latter is
> updated via PCM position() op.  So basically the granularity depends
> on the values returned from PCM position op.
> 
> 

Thanks for your comment, I'll send a second version without the
assumption that applications always update a single period.
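
For illustration, a rough sketch of a playback copy callback that honours arbitrary transfer sizes instead of whole periods (the accessor, fields and exact snd_pcm_ops signature are assumptions and differ across kernel versions):

  static int virtsnd_pcm_playback_copy(struct snd_pcm_substream *substream,
                                       int channel, unsigned long pos,
                                       void __user *src, unsigned long count)
  {
          /* hypothetical accessor to the driver's per-substream context */
          struct virtio_pcm_substream *vss = virtsnd_pcm_substream(substream);

          /* 'count' is whatever appl_ptr advanced by, not necessarily a period */
          if (copy_from_user(vss->buffer + pos, src, count))
                  return -EFAULT;

          /* enqueue only the bytes actually written before kicking the device */
          return virtsnd_pcm_msg_send(vss, false);
  }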

Matias

> Takashi
> 
> > 
> > Stefan
> > 
> > > 
> > > Signed-off-by: Matias Ezequiel Vara Larsen 
> > > ---
> > >  sound/virtio/virtio_pcm.c | 11 ++--
> > >  sound/virtio/virtio_pcm.h |  9 +++-
> > >  sound/virtio/virtio_pcm_msg.c | 50 ---
> > >  sound/virtio/virtio_pcm_ops.c | 94 +++
> > >  4 files changed, 137 insertions(+), 27 deletions(-)
> > > 
> > > diff --git a/sound/virtio/virtio_pcm.c b/sound/virtio/virtio_pcm.c
> > > index c10d91fff2fb..bfe982952303 100644
> > > --- a/sound/virtio/virtio_pcm.c
> > > +++ b/sound/virtio/virtio_pcm.c
> > > @@ -104,8 +104,6 @@ static int virtsnd_pcm_build_hw(struct 
> > > virtio_pcm_substream *vss,
> > >* only message-based transport.
> > >*/
> > >   vss->hw.info =
> > > - SNDRV_PCM_INFO_MMAP |
> > > - SNDRV_PCM_INFO_MMAP_VALID |
> > >   SNDRV_PCM_INFO_BATCH |
> > >   SNDRV_PCM_INFO_BLOCK_TRANSFER |
> > >   SNDRV_PCM_INFO_INTERLEAVED |
> > > @@ -471,12 +469,11 @@ int virtsnd_pcm_build_devs(struct virtio_snd *snd)
> > >   for (kss = ks->substream; kss; kss = kss->next)
> > >   vs->substreams[kss->number]->substream = kss;
> > >  
> > > - snd_pcm_set_ops(vpcm->pcm, i, &virtsnd_pcm_ops);
> > > + if (i == SNDRV_PCM_STREAM_CAPTURE)
> > > + snd_pcm_set_ops(vpcm->pcm, i, &virtsnd_pcm_capture_ops);
> > > + else
> > > + snd_pcm_set_ops(vpcm->pcm, i, &virtsnd_pcm_playback_ops);
> > >   }
> > > -
> > > - snd_pcm_set_managed_buffer_all(vpcm->pcm,
> > > -SNDRV_DMA_TYPE_VMALLOC, NULL,
> > > -0, 0);
> > >   }
> > >  
> > >   return 0;
> > > diff --git a/sound/virtio/virtio_pcm.h b/sound/virtio/virtio_pcm.h
> > > index 062eb8e8f2cf..1c1106ec971f 100644
> > > --- a/sound/virtio/virtio_pcm.h
> > > +++ b/sound/virtio/virtio_pcm.h
> > > @@ -50,6 +50,8 @@ struct virtio_pcm_substream {
> > >   struct work_struct elapsed_period;
> > >   spinlock_t lock;
> > >   size_t buffer_bytes;
> > > + u8 *buffer;
> > > + size_t buffer_sz;
> > >   size_t hw_ptr;
> > >   bool xfer_enabled;
> > >   bool xfer_xrun;
> > > @@ -90,7 +92,8 @@ struct virtio_pcm {
> > >   struct virtio_pcm_stream streams[SNDRV_PCM_STREAM_LAST + 1];
> > >  };
> > >  
> > > -extern const struct snd_pcm_ops virtsnd_pcm_ops;
> >

Re: [PATCH net-next v1 00/19] virtio-net: support AF_XDP zero copy

2023-10-17 Thread Xuan Zhuo
On Tue, 17 Oct 2023 14:43:33 +0800, Xuan Zhuo  
wrote:
> On Tue, 17 Oct 2023 14:26:01 +0800, Jason Wang  wrote:
> > On Tue, Oct 17, 2023 at 2:17 PM Xuan Zhuo  
> > wrote:
> > >
> > > On Tue, 17 Oct 2023 13:27:47 +0800, Jason Wang  
> > > wrote:
> > > > On Tue, Oct 17, 2023 at 11:28 AM Jason Wang  wrote:
> > > > >
> > > > > On Tue, Oct 17, 2023 at 11:26 AM Xuan Zhuo 
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, 17 Oct 2023 11:20:41 +0800, Jason Wang 
> > > > > >  wrote:
> > > > > > > On Tue, Oct 17, 2023 at 11:11 AM Xuan Zhuo 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, 17 Oct 2023 10:53:44 +0800, Jason Wang 
> > > > > > > >  wrote:
> > > > > > > > > On Mon, Oct 16, 2023 at 8:00 PM Xuan Zhuo 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > ## AF_XDP
> > > > > > > > > >
> > > > > > > > > > XDP socket (AF_XDP) is an excellent kernel-bypass networking
> > > > > > > > > > framework. The zero-copy feature of xsk (XDP socket) needs to be
> > > > > > > > > > supported by the driver, and its performance is very good. mlx5 and
> > > > > > > > > > Intel ixgbe already support this feature. This patch set allows
> > > > > > > > > > virtio-net to support xsk's zero-copy xmit feature.
> > > > > > > > > >
> > > > > > > > > > At present, we have completed some preparation:
> > > > > > > > > >
> > > > > > > > > > 1. vq-reset (virtio spec and kernel code)
> > > > > > > > > > 2. virtio-core premapped dma
> > > > > > > > > > 3. virtio-net xdp refactor
> > > > > > > > > >
> > > > > > > > > > So it is time for virtio-net to complete the support for XDP socket
> > > > > > > > > > zero-copy.
> > > > > > > > > >
> > > > > > > > > > Virtio-net cannot increase the queue number at will, so xsk shares
> > > > > > > > > > queues with the kernel.
> > > > > > > > > >
> > > > > > > > > > On the other hand, virtio-net does not support generating interrupts
> > > > > > > > > > from the driver manually, so some tricks are used to wake up TX xmit:
> > > > > > > > > > if TX NAPI last ran on another CPU, an IPI wakes up NAPI on that
> > > > > > > > > > remote CPU; if it last ran on the local CPU, NAPI is woken up
> > > > > > > > > > directly.
> > > > > > > > > >
> > > > > > > > > > This patch set includes some refactoring of virtio-net to support
> > > > > > > > > > AF_XDP.
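
For illustration only, here is a minimal sketch of the IPI-vs-local wakeup
policy described in the cover letter above. All struct, field, and function
names are stand-ins (not virtio-net's real identifiers), and the
call_single_data is assumed to have been initialized elsewhere to a callback
that schedules the TX NAPI:

#include <linux/netdevice.h>
#include <linux/smp.h>

struct xsk_tx_queue_stub {
	struct napi_struct napi;   /* TX NAPI for this send queue */
	call_single_data_t csd;    /* pre-initialized NAPI-scheduling callback */
	int last_napi_cpu;         /* CPU that ran the TX NAPI last time */
};

static void xsk_wakeup_tx_napi(struct xsk_tx_queue_stub *sq)
{
	if (sq->last_napi_cpu == smp_processor_id()) {
		/* TX NAPI last ran on this CPU: schedule it directly. */
		napi_schedule(&sq->napi);
	} else {
		/* TX NAPI last ran elsewhere: kick that CPU with an IPI. */
		smp_call_function_single_async(sq->last_napi_cpu, &sq->csd);
	}
}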
> > > > > > > > > >
> > > > > > > > > > ## performance
> > > > > > > > > >
> > > > > > > > > > ENV: Qemu with vhost-user(polling mode).
> > > > > > > > > >
> > > > > > > > > > Sockperf: https://github.com/Mellanox/sockperf
> > > > > > > > > > I use this tool to send UDP packets via kernel syscalls.
> > > > > > > > > >
> > > > > > > > > > xmit command: sockperf tp -i 10.0.3.1 -t 1000
> > > > > > > > > >
> > > > > > > > > > I wrote a tool that sends or receives UDP packets via
> > > > > > > > > > AF_XDP.
> > > > > > > > > >
> > > > > > > > > >                 | Guest APP CPU | Guest Softirq CPU | UDP PPS
> > > > > > > > > > ----------------|---------------|-------------------|----------
> > > > > > > > > > xmit by syscall |     100%      |                   |   676,915
> > > > > > > > > > xmit by xsk     |     59.1%     |       100%        | 5,447,168
> > > > > > > > > > recv by syscall |     60%       |       100%        |   932,288
> > > > > > > > > > recv by xsk     |     35.7%     |       100%        | 3,343,168
> > > > > > > > >
> > > > > > > > > Any chance we can get a testpmd result (which I guess should 
> > > > > > > > > be better
> > > > > > > > > than PPS above)?
> > > > > > > >
> > > > > > > > Do you mean testpmd + DPDK + AF_XDP?
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > >
> > > > > > > > Yes. This is probably better, because my tool does more
> > > > > > > > work; it is not a complete testing tool used by our business.
> > > > > > >
> > > > > > > Probably, but it would be appealing to others, especially
> > > > > > > considering that DPDK supports an AF_XDP PMD now.
> > > > > >
> > > > > > OK.
> > > > > >
> > > > > > Let me try.
> > > > > >
> > > > > > But could you start to review firstly?
> > > > >
> > > > > Yes, it's in my todo list.
> > > >
> > > > Speaking too fast: if it doesn't take too long, I would rather wait
> > > > for the result first, as with the netdim series. One reason is that I
> > > > remember the claim is only a 10% to 20% loss compared to wire speed,
> > > > so I'd expect it to be much faster. I vaguely remember that even vhost
> > > > can give us more than 3M PPS if we disable SMAP, so the numbers here
> > > > are not as impressive as expected.
> > >
> > >
> > > What is SMAP? Could you give me more info?
> >
> > Supervisor Mode Access Prevention

Re: [RFC PATCH] ALSA: virtio: use copy and fill_silence callbacks

2023-10-17 Thread Anton Yakovlev via Virtualization

Hi Matias,

Thanks for your help! I updated and corrected your patch a little bit (see the
attachment). All changes were tested; there were no problems on my side.

See also a few inline comments.



On 13.10.2023 00:10, Matias Ezequiel Vara Larsen wrote:

This commit replaces the mmap mechanism with the copy() and
fill_silence() callbacks for both capture and playback in the
virtio-sound driver. This change is required to prevent updating the
content of a buffer that is already in the available ring.

The current mechanism splits a dma buffer into descriptors that are
exposed to the device. This dma buffer is shared with the user
application. When the device consumes a buffer, the driver moves the
request from the used ring to the available ring.

The driver exposes the buffer to the device without knowing whether the
content has been updated by the user. Section 2.8.21.1 of the virtio
spec states: "The device MAY access the descriptor chains the driver
created and the memory they refer to immediately". If the device picks
up buffers from the available ring just after it is notified, the
content may still be stale.

By providing the copy() callback, the driver first updates the content
of the buffer and only then exposes it to the device by enqueuing it in
the available ring. Thus, the device always picks up a buffer whose
content is up to date.

For capture, the driver starts by exposing all the available buffers to
the device. After the device updates the content of a buffer, it
enqueues it in the used ring. Only after the copy() for capture is
issued does the driver re-enqueue the buffer in the available ring.

Note that the copy() function assumes the user always writes a full
period. Testing suggests this is true, but I may be wrong; this RFC
aims to clarify that point.
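
As a companion illustration for the capture direction described above (with
the same caveats: virtsnd_pcm_substream_get() is a hypothetical helper, and
the two-argument virtsnd_pcm_msg_send() follows the header change in this
patch), a minimal sketch could look like:

static int virtsnd_pcm_capture_copy(struct snd_pcm_substream *substream,
				    int channel, unsigned long pos,
				    struct iov_iter *iter, unsigned long bytes)
{
	struct virtio_pcm_substream *vss = virtsnd_pcm_substream_get(substream);

	/* Step 1: hand the already-captured bytes to the application. */
	if (copy_to_iter(vss->buffer + pos, bytes, iter) != bytes)
		return -EFAULT;

	/* Step 2: the buffer region is free again, so re-enqueue a message
	 * for the device to fill.
	 */
	return virtsnd_pcm_msg_send(vss, true);
}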

Signed-off-by: Matias Ezequiel Vara Larsen 
---
  sound/virtio/virtio_pcm.c | 11 ++--
  sound/virtio/virtio_pcm.h |  9 +++-
  sound/virtio/virtio_pcm_msg.c | 50 ---
  sound/virtio/virtio_pcm_ops.c | 94 +++
  4 files changed, 137 insertions(+), 27 deletions(-)

diff --git a/sound/virtio/virtio_pcm.c b/sound/virtio/virtio_pcm.c
index c10d91fff2fb..bfe982952303 100644
--- a/sound/virtio/virtio_pcm.c
+++ b/sound/virtio/virtio_pcm.c
@@ -104,8 +104,6 @@ static int virtsnd_pcm_build_hw(struct virtio_pcm_substream *vss,
 * only message-based transport.
 */
vss->hw.info =
-   SNDRV_PCM_INFO_MMAP |
-   SNDRV_PCM_INFO_MMAP_VALID |
SNDRV_PCM_INFO_BATCH |
SNDRV_PCM_INFO_BLOCK_TRANSFER |
SNDRV_PCM_INFO_INTERLEAVED |


We also need to disable rewinds, since now only sequential
reading/writing of frames is supported.
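
One way to express that, assuming the target kernel provides the
SNDRV_PCM_INFO_NO_REWINDS hardware-info flag (available in recent kernels),
is to advertise forward-only access when building the hardware descriptor:

	/* In virtsnd_pcm_build_hw(), alongside the existing flags: */
	vss->hw.info |= SNDRV_PCM_INFO_NO_REWINDS;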



@@ -471,12 +469,11 @@ int virtsnd_pcm_build_devs(struct virtio_snd *snd)
for (kss = ks->substream; kss; kss = kss->next)
vs->substreams[kss->number]->substream = kss;
  
-			snd_pcm_set_ops(vpcm->pcm, i, &virtsnd_pcm_ops);

+   if (i == SNDRV_PCM_STREAM_CAPTURE)
+   snd_pcm_set_ops(vpcm->pcm, i, &virtsnd_pcm_capture_ops);
+   else
+   snd_pcm_set_ops(vpcm->pcm, i, &virtsnd_pcm_playback_ops);
}
-
-   snd_pcm_set_managed_buffer_all(vpcm->pcm,
-  SNDRV_DMA_TYPE_VMALLOC, NULL,
-  0, 0);


This is not right: buffer allocation/freeing is controlled by the kernel
subsystem, so the driver doesn't have to worry about it.
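
In other words, one reading of this comment is that the managed-buffer call
removed by this hunk can simply stay as in the original driver, so the ALSA
core keeps allocating and freeing the vmalloc'ed buffer while
copy()/fill_silence() only move data between that buffer and the virtio
messages:

	snd_pcm_set_managed_buffer_all(vpcm->pcm,
				       SNDRV_DMA_TYPE_VMALLOC, NULL,
				       0, 0);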



}
  
  	return 0;

diff --git a/sound/virtio/virtio_pcm.h b/sound/virtio/virtio_pcm.h
index 062eb8e8f2cf..1c1106ec971f 100644
--- a/sound/virtio/virtio_pcm.h
+++ b/sound/virtio/virtio_pcm.h
@@ -50,6 +50,8 @@ struct virtio_pcm_substream {
struct work_struct elapsed_period;
spinlock_t lock;
size_t buffer_bytes;
+   u8 *buffer;
+   size_t buffer_sz;
size_t hw_ptr;
bool xfer_enabled;
bool xfer_xrun;
@@ -90,7 +92,8 @@ struct virtio_pcm {
struct virtio_pcm_stream streams[SNDRV_PCM_STREAM_LAST + 1];
  };
  
-extern const struct snd_pcm_ops virtsnd_pcm_ops;

+extern const struct snd_pcm_ops virtsnd_pcm_playback_ops;
+extern const struct snd_pcm_ops virtsnd_pcm_capture_ops;
  
  int virtsnd_pcm_validate(struct virtio_device *vdev);
  
@@ -117,7 +120,9 @@ int virtsnd_pcm_msg_alloc(struct virtio_pcm_substream *vss,
  
  void virtsnd_pcm_msg_free(struct virtio_pcm_substream *vss);
  
-int virtsnd_pcm_msg_send(struct virtio_pcm_substream *vss);

+int virtsnd_pcm_msg_send(struct virtio_pcm_substream *vss, bool single);
+
+int virtsnd_pcm_msg_send_locked(struct virtio_pcm_substream *vss, bool single);
  
  unsigned int virtsnd_pcm_msg_pending_num(struct virtio_pcm_substream *vss);
  
diff --git a/sound/virtio/virtio_pcm_msg.c b/sound/vir

Re: [PATCH net-next v1 00/19] virtio-net: support AF_XDP zero copy

2023-10-17 Thread Xuan Zhuo
On Tue, 17 Oct 2023 14:26:01 +0800, Jason Wang  wrote:
> On Tue, Oct 17, 2023 at 2:17 PM Xuan Zhuo  wrote:
> >
> > On Tue, 17 Oct 2023 13:27:47 +0800, Jason Wang  wrote:
> > > On Tue, Oct 17, 2023 at 11:28 AM Jason Wang  wrote:
> > > >
> > > > On Tue, Oct 17, 2023 at 11:26 AM Xuan Zhuo  
> > > > wrote:
> > > > >
> > > > > On Tue, 17 Oct 2023 11:20:41 +0800, Jason Wang  
> > > > > wrote:
> > > > > > On Tue, Oct 17, 2023 at 11:11 AM Xuan Zhuo 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, 17 Oct 2023 10:53:44 +0800, Jason Wang 
> > > > > > >  wrote:
> > > > > > > > On Mon, Oct 16, 2023 at 8:00 PM Xuan Zhuo 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > ## AF_XDP
> > > > > > > > >
> > > > > > > > > XDP socket (AF_XDP) is an excellent kernel-bypass networking
> > > > > > > > > framework. The zero-copy feature of xsk (XDP socket) needs to be
> > > > > > > > > supported by the driver, and its performance is very good. mlx5 and
> > > > > > > > > Intel ixgbe already support this feature. This patch set allows
> > > > > > > > > virtio-net to support xsk's zero-copy xmit feature.
> > > > > > > > >
> > > > > > > > > At present, we have completed some preparation:
> > > > > > > > >
> > > > > > > > > 1. vq-reset (virtio spec and kernel code)
> > > > > > > > > 2. virtio-core premapped dma
> > > > > > > > > 3. virtio-net xdp refactor
> > > > > > > > >
> > > > > > > > > So it is time for virtio-net to complete the support for XDP socket
> > > > > > > > > zero-copy.
> > > > > > > > >
> > > > > > > > > Virtio-net cannot increase the queue number at will, so xsk shares
> > > > > > > > > queues with the kernel.
> > > > > > > > >
> > > > > > > > > On the other hand, virtio-net does not support generating interrupts
> > > > > > > > > from the driver manually, so some tricks are used to wake up TX xmit:
> > > > > > > > > if TX NAPI last ran on another CPU, an IPI wakes up NAPI on that
> > > > > > > > > remote CPU; if it last ran on the local CPU, NAPI is woken up
> > > > > > > > > directly.
> > > > > > > > >
> > > > > > > > > This patch set includes some refactoring of virtio-net to support
> > > > > > > > > AF_XDP.
> > > > > > > > >
> > > > > > > > > ## performance
> > > > > > > > >
> > > > > > > > > ENV: Qemu with vhost-user(polling mode).
> > > > > > > > >
> > > > > > > > > Sockperf: https://github.com/Mellanox/sockperf
> > > > > > > > > I use this tool to send UDP packets via kernel syscalls.
> > > > > > > > >
> > > > > > > > > xmit command: sockperf tp -i 10.0.3.1 -t 1000
> > > > > > > > >
> > > > > > > > > I wrote a tool that sends or receives UDP packets via
> > > > > > > > > AF_XDP.
> > > > > > > > >
> > > > > > > > >                 | Guest APP CPU | Guest Softirq CPU | UDP PPS
> > > > > > > > > ----------------|---------------|-------------------|----------
> > > > > > > > > xmit by syscall |     100%      |                   |   676,915
> > > > > > > > > xmit by xsk     |     59.1%     |       100%        | 5,447,168
> > > > > > > > > recv by syscall |     60%       |       100%        |   932,288
> > > > > > > > > recv by xsk     |     35.7%     |       100%        | 3,343,168
> > > > > > > >
> > > > > > > > Any chance we can get a testpmd result (which I guess should be 
> > > > > > > > better
> > > > > > > > than PPS above)?
> > > > > > >
> > > > > > > Do you mean testpmd + DPDK + AF_XDP?
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > >
> > > > > > > Yes. This is probably better, because my tool does more
> > > > > > > work; it is not a complete testing tool used by our business.
> > > > > >
> > > > > > Probably, but it would be appealing to others, especially
> > > > > > considering that DPDK supports an AF_XDP PMD now.
> > > > >
> > > > > OK.
> > > > >
> > > > > Let me try.
> > > > >
> > > > > But could you start to review firstly?
> > > >
> > > > Yes, it's in my todo list.
> > >
> > > Speaking too fast: if it doesn't take too long, I would rather wait
> > > for the result first, as with the netdim series. One reason is that I
> > > remember the claim is only a 10% to 20% loss compared to wire speed,
> > > so I'd expect it to be much faster. I vaguely remember that even vhost
> > > can give us more than 3M PPS if we disable SMAP, so the numbers here
> > > are not as impressive as expected.
> >
> >
> > What is SMAP? Could you give me more info?
>
> Supervisor Mode Access Prevention
>
> Vhost suffers from this.
>
> >
> > So if we regard 3M as the wire speed, you expect the result
> > can reach 2.8M PPS/core, right?
>
> It's AF_XDP that claims 80% of wire speed, if my memory is correct. So a
> correct AF_XDP implementation should not fall behind that by too much.
>
> > Now the recv result is