On 2018-04-12 11:41, Michael S. Tsirkin wrote:
On Thu, Apr 12, 2018 at 11:37:35AM +0800, Jason Wang wrote:
On 2018-04-12 09:57, Michael S. Tsirkin wrote:
On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote:
On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote:
On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote:
On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote:
On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
This patch introduces the VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
feature for vhost-user. By default, the vhost-user backend needs
to query the IOTLBs from QEMU when it meets an unknown IOVA.
With this protocol feature negotiated, QEMU will provide all
the IOTLBs to the vhost-user backend without waiting for
queries from the backend. This is helpful when using a hardware
accelerator which is not able to handle unknown IOVAs at the
vhost-user backend.
Signed-off-by: Tiwei Bie <tiwei....@intel.com>
This is potentially a large amount of data to be sent
on a socket.
If we take the hardware accelerator out of this picture, we
will find that it's actually a question of "pre-loading" vs
"lazy-loading". I think neither of them is perfect.
For "pre-loading", as you said, we may have a tough start.
But with "lazy-loading", we can't have predictable performance.
A sudden, unexpected performance drop may happen at any time,
because we may meet an unknown IOVA at any time in this case.
That's how hardware behaves too though. So we can expect guests
to try to optimize locality.
The difference is that the software implementation needs to
query the mappings via a socket. And that's much slower.
If you are proposing this new feature as an optimization,
then I'd like to see numbers showing the performance gains.
It's definitely possible to optimize things out. Pre-loading isn't
where I would start optimizing though. For example, DPDK could have its
own VTD emulation, then it could access guest memory directly.
Having VT-d emulation in DPDK has many disadvantages:
- vendor locked, can only work for intel
I don't see what would prevent other vendors from doing the same.
Technically it could; two questions here:
- Shouldn't we keep vhost-user vendor/transport independent?
- Do we really prefer the split device model here? It means implementing
the datapath in at least two places. Personally I prefer to keep all virt
stuff inside QEMU.
- duplication of codes and bugs
- a huge number of new message types need to be invented
Oh, just the flush I'd wager.
Not only flush, but also error reporting, context entry programming and
even PRS in the future. And we'd need a feature negotiation between them,
like vhost has, to keep compatibility with future features. This sounds
So I tend to go the reverse way: link DPDK into QEMU.
Won't really help as people want to build software using dpdk.
Well, I believe the main use case is vDPA, which is hardware virtio
offload. Building software using DPDK, like OVS-DPDK, is another
interesting topic. We can seek a solution other than linking DPDK into QEMU,
e.g. we can do all the virtio and packet-copy work inside a QEMU IOThread
and use another inter-process channel to communicate with OVS-DPDK (or
another virtio-user here). The key is to hide all virtualization details
Once we meet an unknown IOVA, the backend's data path will need
to stop, query the mapping of the IOVA via the socket, and
wait for the reply. And the latency is not negligible (sometimes
it's even unacceptable), especially in the high-performance
networking case. So maybe it's better to make both of them
available to the vhost backend.
I had an impression that a hardware accelerator was using
VFIO anyway. Given this, can't we have QEMU program
the shadow IOMMU tables into VFIO directly?
I think it's a good idea! Currently, my concern about it is
that the hardware device may not use the IOMMU; it may have
its own builtin address translation unit. And it would be a pain
for device vendors to teach VFIO to work with the
builtin address translation unit.
I think such drivers would have to integrate with VFIO somehow.
Otherwise, what is the plan for assigning such devices then?
Such devices are just for vhost data path acceleration.
That's not true I think. E.g. RDMA devices have an on-card MMU.
They have many available queue pairs, the switch logic
will be done among those queue pairs. And different queue
pairs will serve different VMs directly.
The way I would do it is attach different PASID values to
different queues. This way you can use the standard IOMMU
to enforce protection.
So that's just shared virtual memory on the host, which can share an IOVA
address space between a specific queue pair and a process. I'm not sure how
hard it is for existing vhost-user backends to support this.
That would be VFIO's job, nothing to do with vhost-user besides
sharing the VFIO descriptor.
At least DPDK needs to offload the DMA mapping setup to QEMU.