Re: RE: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace

2021-06-29 Thread Yongji Xie
On Tue, Jun 29, 2021 at 3:56 PM Liu, Xiaodong  wrote:
>
>
>
> >-Original Message-
> >From: Jason Wang 
> >Sent: Tuesday, June 29, 2021 12:11 PM
> >To: Liu, Xiaodong ; Xie Yongji
> >; m...@redhat.com; stefa...@redhat.com;
> >sgarz...@redhat.com; pa...@nvidia.com; h...@infradead.org;
> >christian.brau...@canonical.com; rdun...@infradead.org; wi...@infradead.org;
> >v...@zeniv.linux.org.uk; ax...@kernel.dk; b...@kvack.org; cor...@lwn.net;
> >mika.pentt...@nextfour.com; dan.carpen...@oracle.com; j...@8bytes.org;
> >gre...@linuxfoundation.org
> >Cc: songmuc...@bytedance.com; virtualizat...@lists.linux-foundation.org;
> >net...@vger.kernel.org; k...@vger.kernel.org; linux-fsde...@vger.kernel.org;
> >iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> >Subject: Re: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace
> >
> >
> >On 2021/6/28 1:54 PM, Liu, Xiaodong wrote:
> >>> Several issues:
> >>>
> >>> - VDUSE needs to limit the total size of the bounce buffers (64M if I
> >>> was not wrong). Does it work for SPDK?
> >> Yes, Jason. It is enough and works for SPDK.
> >> Since it's a bounce buffer mainly for in-flight IO, a limited size like
> >> 64MB is enough.
> >
> >
> >Ok.
> >
> >
> >>
> >>> - VDUSE can use hugepages but I'm not sure we can mandate hugepages (or
> >>> we need to introduce new flags to support this)
> >> I share your worry; I'm also afraid it is hard for a kernel module
> >> to directly preallocate hugepages internally.
> >> What I tried is:
> >> 1. A simple agent daemon (representing one device) preallocates and maps
> >>    dozens of 2MB hugepages (e.g. 64MB) for that device.
> >> 2. The daemon passes its mapping address and hugepage fd to the kernel
> >>    module through a new IOCTL.
> >> 3. The kernel module remaps the hugepages inside the kernel.
> >
> >
> >Such a model should work, but the main "issue" is that it introduces
> >overhead in the case of vhost-vDPA.
> >
> >Note that in the case of vhost-vDPA, we don't use a bounce buffer; the
> >userspace pages are shared directly.
> >
> >And since DMA is not done per page, it prevents us from using tricks
> >like vm_insert_page() in those cases.
> >
>
> Yes, handling the vhost-vDPA case is indeed a problem.
> But there are already several solutions to serve VMs, like vhost-user and
> vfio-user, so at least SPDK won't serve VMs through VDUSE. If a user
> still wants to do that, then the user should tolerate the introduced overhead.
>
> In other words, a software backend like SPDK would use the virtio
> datapath of VDUSE to serve the local host instead of a VM. That's why I also
> drafted a "virtio-local" to bridge a vhost-user target and the local host's
> kernel virtio-blk.
>
> >
> >> 4. The vhost-user target gets and maps the hugepage fd from the kernel
> >>    module via a vhost-user message through a Unix domain socket cmsg.
> >> Then the kernel module and the target map the same hugepage-based
> >> bounce buffer for in-flight IO.
> >>
> >> If there is an option in VDUSE to map userspace-preallocated memory, then
> >> VDUSE should be able to mandate it even if it is hugepage based.
> >>
> >
> >As above, this requires some kind of re-design since VDUSE depends on
> >the model of mmap(MAP_SHARED) instead of umem registering.
>
> Got it, Jason; this may be hard for the current version of VDUSE.
> Maybe we can consider these options after VDUSE is merged.
>
> If the VDUSE datapath could be directly leveraged by a vhost-user target,
> its value would propagate immediately.
>

Agreed!

Thanks,
Yongji

RE: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace

2021-06-29 Thread Liu, Xiaodong



>-Original Message-
>From: Jason Wang 
>Sent: Tuesday, June 29, 2021 12:11 PM
>To: Liu, Xiaodong ; Xie Yongji
>; m...@redhat.com; stefa...@redhat.com;
>sgarz...@redhat.com; pa...@nvidia.com; h...@infradead.org;
>christian.brau...@canonical.com; rdun...@infradead.org; wi...@infradead.org;
>v...@zeniv.linux.org.uk; ax...@kernel.dk; b...@kvack.org; cor...@lwn.net;
>mika.pentt...@nextfour.com; dan.carpen...@oracle.com; j...@8bytes.org;
>gre...@linuxfoundation.org
>Cc: songmuc...@bytedance.com; virtualizat...@lists.linux-foundation.org;
>net...@vger.kernel.org; k...@vger.kernel.org; linux-fsde...@vger.kernel.org;
>iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org
>Subject: Re: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace
>
>
>On 2021/6/28 1:54 PM, Liu, Xiaodong wrote:
>>> Several issues:
>>>
>>> - VDUSE needs to limit the total size of the bounce buffers (64M if I
>>> was not wrong). Does it work for SPDK?
>> Yes, Jason. It is enough and works for SPDK.
>> Since it's a bounce buffer mainly for in-flight IO, a limited size like
>> 64MB is enough.
>
>
>Ok.
>
>
>>
>>> - VDUSE can use hugepages but I'm not sure we can mandate hugepages (or
>>> we need to introduce new flags to support this)
>> I share your worry; I'm also afraid it is hard for a kernel module
>> to directly preallocate hugepages internally.
>> What I tried is:
>> 1. A simple agent daemon (representing one device) preallocates and maps
>>    dozens of 2MB hugepages (e.g. 64MB) for that device.
>> 2. The daemon passes its mapping address and hugepage fd to the kernel
>>    module through a new IOCTL.
>> 3. The kernel module remaps the hugepages inside the kernel.
>
>
>Such a model should work, but the main "issue" is that it introduces
>overhead in the case of vhost-vDPA.
>
>Note that in the case of vhost-vDPA, we don't use a bounce buffer; the
>userspace pages are shared directly.
>
>And since DMA is not done per page, it prevents us from using tricks
>like vm_insert_page() in those cases.
>

Yes, handling the vhost-vDPA case is indeed a problem.
But there are already several solutions to serve VMs, like vhost-user and
vfio-user, so at least SPDK won't serve VMs through VDUSE. If a user
still wants to do that, then the user should tolerate the introduced overhead.

In other words, a software backend like SPDK would use the virtio
datapath of VDUSE to serve the local host instead of a VM. That's why I also
drafted a "virtio-local" to bridge a vhost-user target and the local host's
kernel virtio-blk.

>
>> 4. The vhost-user target gets and maps the hugepage fd from the kernel
>>    module via a vhost-user message through a Unix domain socket cmsg.
>> Then the kernel module and the target map the same hugepage-based
>> bounce buffer for in-flight IO.
>>
>> If there is an option in VDUSE to map userspace-preallocated memory, then
>> VDUSE should be able to mandate it even if it is hugepage based.
>>
>
>As above, this requires some kind of re-design since VDUSE depends on
>the model of mmap(MAP_SHARED) instead of umem registering.

Got it, Jason; this may be hard for the current version of VDUSE.
Maybe we can consider these options after VDUSE is merged.

If the VDUSE datapath could be directly leveraged by a vhost-user target,
its value would propagate immediately.

>
>Thanks



Re: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace

2021-06-29 Thread Jason Wang


On 2021/6/29 2:40 PM, Yongji Xie wrote:

On Tue, Jun 29, 2021 at 12:13 PM Jason Wang  wrote:


On 2021/6/28 6:32 PM, Yongji Xie wrote:

The large barrier is bounce-buffer mapping: SPDK requires hugepages
for NVMe over PCIe and RDMA, so taking some preallocated hugepages to
map as the bounce buffer is necessary. Otherwise it's hard to avoid an
extra memcpy from the bounce buffer to hugepages.
If you can add an option to map hugepages as the bounce buffer,
then SPDK could also be a potential user of VDUSE.


I think we can support registering user space memory for bounce-buffer
use like XDP does. But this needs to pin the pages, so I didn't
consider it in this initial version.


Note that userspace should be unaware of the existence of the bounce buffer.


If so, it might be hard to use umem, because we can't use umem for
coherent mappings, which need physically contiguous space.

Thanks,
Yongji



We can probably use umem for memory other than the virtqueue (still via
mmap()).


Thanks



Re: Re: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace

2021-06-29 Thread Yongji Xie
On Tue, Jun 29, 2021 at 12:13 PM Jason Wang  wrote:
>
>
> On 2021/6/28 6:32 PM, Yongji Xie wrote:
> >> The large barrier is bounce-buffer mapping: SPDK requires hugepages
> >> for NVMe over PCIe and RDMA, so taking some preallocated hugepages to
> >> map as the bounce buffer is necessary. Otherwise it's hard to avoid an
> >> extra memcpy from the bounce buffer to hugepages.
> >> If you can add an option to map hugepages as the bounce buffer,
> >> then SPDK could also be a potential user of VDUSE.
> >>
> > I think we can support registering user space memory for bounce-buffer
> > use like XDP does. But this needs to pin the pages, so I didn't
> > consider it in this initial version.
> >
>
> Note that userspace should be unaware of the existence of the bounce buffer.
>

If so, it might be hard to use umem, because we can't use umem for
coherent mappings, which need physically contiguous space.
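
To make the constraint concrete: the virtqueue ring is backed by
dma_alloc_coherent(), which hands back one physically contiguous,
cache-coherent block, something scattered pinned user pages cannot provide.
A minimal kernel-style sketch (illustrative only, not code from this series):

/*
 * Illustrative-only sketch: the virtqueue ring comes from a coherent DMA
 * allocation, which must be one physically contiguous, cache-coherent block.
 * A umem built from pinned, scattered user pages generally cannot back it.
 */
#include <linux/device.h>
#include <linux/dma-mapping.h>

static void *alloc_vring_coherent(struct device *dev, size_t ring_size,
                                  dma_addr_t *dma)
{
        /* One contiguous block, visible to both the CPU and the device. */
        return dma_alloc_coherent(dev, ring_size, dma, GFP_KERNEL);
}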

Thanks,
Yongji

Re: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace

2021-06-28 Thread Jason Wang


On 2021/6/28 6:32 PM, Yongji Xie wrote:

The large barrier is bounce-buffer mapping: SPDK requires hugepages
for NVMe over PCIe and RDMA, so taking some preallocated hugepages to
map as the bounce buffer is necessary. Otherwise it's hard to avoid an
extra memcpy from the bounce buffer to hugepages.
If you can add an option to map hugepages as the bounce buffer,
then SPDK could also be a potential user of VDUSE.


I think we can support registering user space memory for bounce-buffer
use like XDP does. But this needs to pin the pages, so I didn't
consider it in this initial version.



Note that userspace should be unaware of the existence of the bounce buffer.

So we need to think carefully about mmap() vs. umem registering.
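
For comparison, a minimal sketch of what the umem-registering model looks
like in AF_XDP, the example used earlier in the thread: userspace allocates
the memory and registers its address and length, and the kernel pins those
pages, in contrast to the mmap(MAP_SHARED) bounce buffer that VDUSE currently
hands out. Sizes below are arbitrary.

/* Sketch only: AF_XDP-style umem registration.  Userspace allocates the
 * buffer and registers it; the kernel pins the pages and DMAs to/from them
 * directly (the pinning cost mentioned earlier in the thread). */
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>

int main(void)
{
        size_t len = 4UL << 20;                 /* 4 MB of frame memory */

        void *umem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (umem == MAP_FAILED) { perror("mmap"); return 1; }

        int xsk = socket(AF_XDP, SOCK_RAW, 0);
        if (xsk < 0) { perror("socket(AF_XDP)"); return 1; }

        struct xdp_umem_reg reg = {
                .addr = (uintptr_t)umem,        /* userspace VA to register */
                .len = len,
                .chunk_size = 4096,
                .headroom = 0,
        };
        if (setsockopt(xsk, SOL_XDP, XDP_UMEM_REG, &reg, sizeof(reg)) < 0) {
                perror("XDP_UMEM_REG");
                return 1;
        }
        return 0;
}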

Thanks


Re: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace

2021-06-28 Thread Jason Wang



On 2021/6/28 1:54 PM, Liu, Xiaodong wrote:

Several issues:

- VDUSE needs to limit the total size of the bounce buffers (64M if I was not
wrong). Does it work for SPDK?

Yes, Jason. It is enough and works for SPDK.
Since it's a bounce buffer mainly for in-flight IO, a limited size like
64MB is enough.



Ok.





- VDUSE can use hugepages but I'm not sure we can mandate hugepages (or we
need to introduce new flags to support this)

I share your worry; I'm also afraid it is hard for a kernel module
to directly preallocate hugepages internally.
What I tried is:
1. A simple agent daemon (representing one device) preallocates and maps
   dozens of 2MB hugepages (e.g. 64MB) for that device.
2. The daemon passes its mapping address and hugepage fd to the kernel
   module through a new IOCTL.
3. The kernel module remaps the hugepages inside the kernel.



Such a model should work, but the main "issue" is that it introduces
overhead in the case of vhost-vDPA.


Note that in the case of vhost-vDPA, we don't use a bounce buffer; the
userspace pages are shared directly.


And since DMA is not done per page, it prevents us from using tricks
like vm_insert_page() in those cases.




4. The vhost-user target gets and maps the hugepage fd from the kernel
   module via a vhost-user message through a Unix domain socket cmsg.
Then the kernel module and the target map the same hugepage-based
bounce buffer for in-flight IO.

If there is an option in VDUSE to map userspace-preallocated memory, then
VDUSE should be able to mandate it even if it is hugepage based.
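
For illustration, a minimal userspace-side sketch of the agent-daemon flow in
steps 1-4 above. The control device path, ioctl number and payload struct are
assumptions invented for the sketch (no such interface exists in VDUSE); only
the hugepage preallocation and fd passing reflect the steps as described.

/* Illustrative-only sketch of the agent daemon (steps 1-4 above).  The
 * control device path, ioctl number and payload struct are invented for
 * this sketch and are not part of VDUSE. */
#include <stdint.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/memfd.h>

#define BOUNCE_SIZE (64UL << 20)                /* 64 MB of 2 MB hugepages */

struct bounce_reg { uint64_t uaddr; uint64_t size; int32_t fd; }; /* invented */
#define ASSUMED_REG_BOUNCE _IOW('V', 0x80, struct bounce_reg)     /* invented */

int main(void)
{
        /* Step 1: preallocate and map dozens of 2 MB hugepages. */
        int hfd = memfd_create("bounce", MFD_HUGETLB);
        if (hfd < 0 || ftruncate(hfd, BOUNCE_SIZE) < 0) {
                perror("hugepages");
                return 1;
        }

        void *va = mmap(NULL, BOUNCE_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_POPULATE, hfd, 0);
        if (va == MAP_FAILED) { perror("mmap"); return 1; }

        /* Step 2: pass the mapping address and hugepage fd to the kernel
         * module; step 3 (remapping inside the kernel) happens there. */
        struct bounce_reg reg = {
                .uaddr = (uintptr_t)va, .size = BOUNCE_SIZE, .fd = hfd,
        };
        int cfd = open("/dev/vduse-agent", O_RDWR);     /* invented path */
        if (cfd < 0 || ioctl(cfd, ASSUMED_REG_BOUNCE, &reg) < 0) {
                perror("register bounce buffer");
                return 1;
        }

        /* Step 4 (not shown): the vhost-user target receives hfd over a Unix
         * domain socket SCM_RIGHTS cmsg and mmap()s the same hugepages. */
        return 0;
}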



As above, this requires some kind of re-design since VDUSE depends on  
the model of mmap(MAP_SHARED) instead of umem registering.


Thanks



Re: Re: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace

2021-06-28 Thread Yongji Xie
On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi  wrote:
>
> On Tue, Jun 15, 2021 at 10:13:21PM +0800, Xie Yongji wrote:
> > This series introduces a framework that makes it possible to implement
> > software-emulated vDPA devices in userspace. To keep it simple, the
> > emulated vDPA device's control path is handled in the kernel and only the
> > data path is implemented in userspace.
>
> This looks interesting. Unfortunately I don't have enough time to do a
> full review, but I looked at the documentation and uapi header file to
> give feedback on the userspace ABI.
>

OK. Thanks for your comments. They're helpful!

Thanks,
Yongji


Re: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace

2021-06-28 Thread Stefan Hajnoczi
On Tue, Jun 15, 2021 at 10:13:21PM +0800, Xie Yongji wrote:
> This series introduces a framework that makes it possible to implement
> software-emulated vDPA devices in userspace. To keep it simple, the
> emulated vDPA device's control path is handled in the kernel and only the
> data path is implemented in userspace.

This looks interesting. Unfortunately I don't have enough time to do a
full review, but I looked at the documentation and uapi header file to
give feedback on the userspace ABI.

Stefan



Re: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace

2021-06-28 Thread Yongji Xie
On Mon, 28 Jun 2021 at 10:55, Liu Xiaodong  wrote:
>
> On Tue, Jun 15, 2021 at 10:13:21PM +0800, Xie Yongji wrote:
> >
> > This series introduces a framework that makes it possible to implement
> > software-emulated vDPA devices in userspace. To keep it simple, the
> > emulated vDPA device's control path is handled in the kernel and only the
> > data path is implemented in userspace.
> >
> > Since the emulated vDPA device's control path is handled in the kernel,
> > a message mechanism is introduced to make userspace aware of data path
> > related changes. Userspace can use read()/write() to receive and reply to
> > the control messages.
> >
> > In the data path, the core maps DMA buffers into the VDUSE daemon's
> > address space, which can be implemented in different ways depending on
> > the vdpa bus to which the vDPA device is attached.
> >
> > In the virtio-vdpa case, we implement an MMU-based on-chip IOMMU driver
> > with a bounce-buffering mechanism to achieve that. In the vhost-vdpa case,
> > the DMA buffer resides in a userspace memory region which can be shared
> > with the VDUSE userspace process via transferring the shmfd.
> >
> > The details and our use case are shown below:
> >
> > [ASCII architecture diagram: a container (/dev/vdx -> virtio-blk driver ->
> > virtio bus -> virtio-blk device -> virtio-vdpa driver) and a QEMU VM
> > (/dev/vhost-vdpa-x -> vhost device -> vhost-vdpa driver -> vdpa device)
> > both attach to the vdpa bus served by the vduse driver; the VDUSE daemon
> > (vDPA device emulation + block driver) connects over TCP/IP through the
> > NIC to remote storages.]
> >
> > We make use of it to implement a block device connecting to
> > our distributed storage, which can be used both 

RE: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace

2021-06-27 Thread Liu, Xiaodong



>-Original Message-
>From: Jason Wang 
>Sent: Monday, June 28, 2021 12:35 PM
>To: Liu, Xiaodong ; Xie Yongji
>; m...@redhat.com; stefa...@redhat.com;
>sgarz...@redhat.com; pa...@nvidia.com; h...@infradead.org;
>christian.brau...@canonical.com; rdun...@infradead.org; wi...@infradead.org;
>v...@zeniv.linux.org.uk; ax...@kernel.dk; b...@kvack.org; cor...@lwn.net;
>mika.pentt...@nextfour.com; dan.carpen...@oracle.com; j...@8bytes.org;
>gre...@linuxfoundation.org
>Cc: songmuc...@bytedance.com; virtualizat...@lists.linux-foundation.org;
>net...@vger.kernel.org; k...@vger.kernel.org; linux-fsde...@vger.kernel.org;
>iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org
>Subject: Re: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace
>
>
>On 2021/6/28 6:33 PM, Liu Xiaodong wrote:
>> On Tue, Jun 15, 2021 at 10:13:21PM +0800, Xie Yongji wrote:
>>> This series introduces a framework that makes it possible to
>>> implement software-emulated vDPA devices in userspace. To keep it
>>> simple, the emulated vDPA device's control path is handled in the
>>> kernel and only the data path is implemented in userspace.
>>>
>>> Since the emulated vDPA device's control path is handled in the
>>> kernel, a message mechanism is introduced to make userspace aware
>>> of data path related changes. Userspace can use read()/write() to
>>> receive and reply to the control messages.
>>>
>>> In the data path, the core maps DMA buffers into the VDUSE daemon's
>>> address space, which can be implemented in different ways depending
>>> on the vdpa bus to which the vDPA device is attached.
>>>
>>> In the virtio-vdpa case, we implement an MMU-based on-chip IOMMU driver
>>> with a bounce-buffering mechanism to achieve that. In the vhost-vdpa
>>> case, the DMA buffer resides in a userspace memory region which can
>>> be shared with the VDUSE userspace process via transferring the shmfd.
>>>
>>> The details and our use case are shown below:
>>>
>>> [ASCII architecture diagram: a container (/dev/vdx -> virtio-blk driver ->
>>> virtio bus -> virtio-blk device -> virtio-vdpa driver) and a QEMU VM
>>> (/dev/vhost-vdpa-x -> vhost device -> vhost-vdpa driver -> vdpa device)
>>> both attach to the vdpa bus served by the vduse driver; the VDUSE daemon
>>> (vDPA device emulation + block driver) connects over TCP/IP through the
>>> NIC to remote storages.]

Re: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace

2021-06-27 Thread Jason Wang


On 2021/6/28 6:33 PM, Liu Xiaodong wrote:

On Tue, Jun 15, 2021 at 10:13:21PM +0800, Xie Yongji wrote:

This series introduces a framework that makes it possible to implement
software-emulated vDPA devices in userspace. To keep it simple, the
emulated vDPA device's control path is handled in the kernel and only the
data path is implemented in userspace.

Since the emulated vDPA device's control path is handled in the kernel,
a message mechanism is introduced to make userspace aware of data path
related changes. Userspace can use read()/write() to receive and reply to
the control messages.

In the data path, the core maps DMA buffers into the VDUSE daemon's
address space, which can be implemented in different ways depending on
the vdpa bus to which the vDPA device is attached.

In the virtio-vdpa case, we implement an MMU-based on-chip IOMMU driver with
a bounce-buffering mechanism to achieve that. In the vhost-vdpa case, the DMA
buffer resides in a userspace memory region which can be shared with the
VDUSE userspace process via transferring the shmfd.

The details and our use case are shown below:

[ASCII architecture diagram: a container (/dev/vdx -> virtio-blk driver ->
virtio bus -> virtio-blk device -> virtio-vdpa driver) and a QEMU VM
(/dev/vhost-vdpa-x -> vhost device -> vhost-vdpa driver -> vdpa device)
both attach to the vdpa bus served by the vduse driver; the VDUSE daemon
(vDPA device emulation + block driver) connects over TCP/IP through the
NIC to remote storages.]

We make use of it to implement a block device connecting to
our distributed storage, which can be used both in containers and
in VMs. Thus, we can have a unified technology stack in these two cases.

To test it with null-blk:

   $ qemu-storage-daemon \
       --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
       --monitor chardev=charmonitor \
       --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \

Re: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace

2021-06-27 Thread Liu Xiaodong
On Tue, Jun 15, 2021 at 10:13:21PM +0800, Xie Yongji wrote:
> 
> This series introduces a framework that makes it possible to implement
> software-emulated vDPA devices in userspace. To keep it simple, the
> emulated vDPA device's control path is handled in the kernel and only the
> data path is implemented in userspace.
> 
> Since the emulated vDPA device's control path is handled in the kernel,
> a message mechanism is introduced to make userspace aware of data path
> related changes. Userspace can use read()/write() to receive and reply to
> the control messages.
> 
> In the data path, the core maps DMA buffers into the VDUSE daemon's
> address space, which can be implemented in different ways depending on
> the vdpa bus to which the vDPA device is attached.
> 
> In the virtio-vdpa case, we implement an MMU-based on-chip IOMMU driver with
> a bounce-buffering mechanism to achieve that. In the vhost-vdpa case, the DMA
> buffer resides in a userspace memory region which can be shared with the
> VDUSE userspace process via transferring the shmfd.
> 
> The details and our use case are shown below:
> 
> [ASCII architecture diagram: a container (/dev/vdx -> virtio-blk driver ->
> virtio bus -> virtio-blk device -> virtio-vdpa driver) and a QEMU VM
> (/dev/vhost-vdpa-x -> vhost device -> vhost-vdpa driver -> vdpa device)
> both attach to the vdpa bus served by the vduse driver; the VDUSE daemon
> (vDPA device emulation + block driver) connects over TCP/IP through the
> NIC to remote storages.]
> 
> We make use of it to implement a block device connecting to
> our distributed storage, which can be used both in containers and
> in VMs. Thus, we can have a unified technology stack in these two cases.
> 
> To test it with null-blk:
> 
>   $ qemu-storage-daemon \
>   --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
>

[PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace

2021-06-15 Thread Xie Yongji
This series introduces a framework that makes it possible to implement
software-emulated vDPA devices in userspace. To keep it simple, the
emulated vDPA device's control path is handled in the kernel and only the
data path is implemented in userspace.

Since the emulated vDPA device's control path is handled in the kernel,
a message mechanism is introduced to make userspace aware of data path
related changes. Userspace can use read()/write() to receive and reply to
the control messages.
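
As a rough sketch of this read()/write() flow, a VDUSE daemon's control loop
might look like the following. The struct layout, field names and device path
are assumptions for illustration; the real definitions are in the series'
UAPI header and may differ.

/* Sketch only: the read()/write() control-message loop described above.
 * The struct layout, constants and device path are assumptions; the real
 * definitions are in the series' UAPI header. */
#include <stdint.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

struct vduse_msg {                      /* illustrative layout only */
        uint32_t type;                  /* which data-path change occurred */
        uint32_t request_id;
        uint32_t result;                /* filled in by the daemon         */
        uint8_t  payload[52];
};

int main(void)
{
        int dev_fd = open("/dev/vduse/vduse-null", O_RDWR);  /* assumed path */
        if (dev_fd < 0) { perror("open"); return 1; }

        struct vduse_msg msg;
        /* Receive each control message with read() and answer it with
         * write(), as described above. */
        while (read(dev_fd, &msg, sizeof(msg)) == (ssize_t)sizeof(msg)) {
                /* ...handle the data-path change (e.g. virtqueue setup)... */
                msg.result = 0;                     /* report success */
                if (write(dev_fd, &msg, sizeof(msg)) != (ssize_t)sizeof(msg))
                        break;
        }
        return 0;
}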

In the data path, the core maps DMA buffers into the VDUSE daemon's
address space, which can be implemented in different ways depending on
the vdpa bus to which the vDPA device is attached.

In the virtio-vdpa case, we implement an MMU-based on-chip IOMMU driver with
a bounce-buffering mechanism to achieve that. In the vhost-vdpa case, the DMA
buffer resides in a userspace memory region which can be shared with the
VDUSE userspace process via transferring the shmfd.
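
On the virtio-vdpa (bounce-buffer) side, the daemon maps the kernel-provided
region into its own address space rather than registering its own memory. A
minimal sketch under assumed names follows; the ioctl, struct fields and
fd-returning convention are assumptions for illustration rather than the
series' actual UAPI.

/* Sketch only: mapping an IOVA region (e.g. the bounce buffer) into the
 * daemon's address space.  The ioctl, struct fields and fd-returning
 * convention are assumptions for illustration. */
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

struct iotlb_entry {                    /* illustrative layout only */
        uint64_t offset;                /* mmap offset for the region   */
        uint64_t start;                 /* first IOVA covered           */
        uint64_t last;                  /* last IOVA covered            */
        uint8_t  perm;                  /* access permission            */
};
#define ASSUMED_IOTLB_GET_FD _IOWR(0x81, 0x10, struct iotlb_entry)

static void *map_iova_region(int dev_fd, uint64_t iova, struct iotlb_entry *e)
{
        e->start = iova;
        /* Assumed to return an fd backing the region that contains 'iova'. */
        int fd = ioctl(dev_fd, ASSUMED_IOTLB_GET_FD, e);
        if (fd < 0) { perror("iotlb get fd"); return MAP_FAILED; }

        /* Map it shared so the daemon and the kernel-side bounce buffer see
         * the same pages (the mmap(MAP_SHARED) model noted in the thread). */
        size_t len = e->last - e->start + 1;
        return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                    e->offset);
}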

The details and our use case are shown below:

[ASCII architecture diagram: a container (/dev/vdx -> virtio-blk driver ->
virtio bus -> virtio-blk device -> virtio-vdpa driver) and a QEMU VM
(/dev/vhost-vdpa-x -> vhost device -> vhost-vdpa driver -> vdpa device)
both attach to the vdpa bus served by the vduse driver; the VDUSE daemon
(vDPA device emulation + block driver) connects over TCP/IP through the
NIC to remote storages.]

We make use of it to implement a block device connecting to
our distributed storage, which can be used both in containers and
in VMs. Thus, we can have a unified technology stack in these two cases.

To test it with null-blk:

  $ qemu-storage-daemon \
      --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
      --monitor chardev=charmonitor \
      --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
      --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128

The