Re: rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Jan Kiszka
On 2015-08-31 16:11, Michael S. Tsirkin wrote:
> Hello!
> During the KVM forum, we discussed supporting virtio on top
> of ivshmem.

No, not on top of ivshmem. On top of shared memory. Our model is
different from the simplistic ivshmem.

> I have considered it, and came up with an alternative
> that has several advantages over that - please see below.
> Comments welcome.
> 
> -
> 
> Existing solutions to userspace switching between VMs on the
> same host are vhost-user and ivshmem.
> 
> vhost-user works by mapping memory of all VMs being bridged into the
> switch memory space.
> 
> By comparison, ivshmem works by exposing a shared region of memory to all VMs.
> VMs are required to use this region to store packets. The switch only
> needs access to this region.
> 
> Another difference between vhost-user and ivshmem surfaces when polling
> is used. With vhost-user, the switch is required to handle
> data movement between VMs; if polling is used, this means that one host CPU
> needs to be sacrificed for this task.
> 
> This is easiest to understand when one of the VMs is
> used with VF pass-through. This can be schematically shown below:
> 
> +-- VM1 ------------+            +---VM2------------+
> | virtio-pci        +-vhost-user-+ virtio-pci -- VF | -- VFIO -- IOMMU -- NIC
> +-------------------+            +------------------+
> 
> 
> With ivshmem in theory communication can happen directly, with two VMs
> polling the shared memory region.
> 
> 
> I won't spend time listing advantages of vhost-user over ivshmem.
> Instead, having identified two advantages of ivshmem over vhost-user,
> below is a proposal to extend vhost-user to gain the advantages
> of ivshmem.
> 
> 
> 1: virtio in guest can be extended to allow support
> for IOMMUs. This provides the guest with full flexibility
> about which memory is readable or writable by each device.
> By setting up a virtio device for each other VM we need to
> communicate to, guest gets full control of its security, from
> mapping all memory (like with current vhost-user) to only
> mapping buffers used for networking (like ivshmem) to
> transient mappings for the duration of data transfer only.
> This also allows use of VFIO within guests, for improved
> security.
> 
> vhost user would need to be extended to send the
> mappings programmed by guest IOMMU.
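
One way to picture such an extension is a per-mapping message carrying the
bus address chosen by the guest IOMMU, the size, the permissions, and an
offset into an already-shared region. A minimal sketch - the name and exact
layout below are purely illustrative, not an existing vhost-user message:

#include <stdint.h>

/* Hypothetical vhost-user payload describing one guest-IOMMU mapping,
 * sent by the frontend QEMU whenever its guest programs the IOMMU. */
struct vhost_iommu_map {
    uint64_t iova;     /* bus address programmed by the guest IOMMU */
    uint64_t size;     /* length of the mapping in bytes */
    uint64_t offset;   /* offset into the already-shared memory/fd */
    uint8_t  perm;     /* bit 0: read allowed, bit 1: write allowed */
    uint8_t  pad[7];
};

/* A matching "unmap" message would only need iova and size. */
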
> 
> 2. qemu can be extended to serve as a vhost-user client:
> receive remote VM mappings over the vhost-user protocol, and
> map them into another VM's memory.
> This mapping can take, for example, the form of
> a BAR of a pci device, which I'll call here vhost-pci - 
> with bus address allowed
> by VM1's IOMMU mappings being translated into
> offsets within this BAR within VM2's physical
> memory space.
> 
> Since the translation can be a simple one, VM2
> can perform it within its vhost-pci device driver.
> 
> While this setup would be the most useful with polling,
> VM1's ioeventfd can also be mapped to
> another VM2's irqfd, and vice versa, such that VMs
> can trigger interrupts to each other without need
> for a helper thread on the host.
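
At the KVM level such a pairing can be built from the existing KVM_IOEVENTFD
and KVM_IRQFD ioctls: the same eventfd is registered as an ioeventfd on a
doorbell address in VM1 and as an irqfd in VM2. A minimal sketch (the
doorbell address and GSI are made-up example parameters):

#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int wire_doorbell(int vm1_fd, int vm2_fd,
                         uint64_t vm1_doorbell_gpa, uint32_t vm2_gsi)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    if (efd < 0)
        return -1;

    struct kvm_ioeventfd ioev;
    memset(&ioev, 0, sizeof(ioev));
    ioev.addr = vm1_doorbell_gpa;   /* 4-byte MMIO write here signals efd */
    ioev.len  = 4;
    ioev.fd   = efd;
    if (ioctl(vm1_fd, KVM_IOEVENTFD, &ioev) < 0)
        return -1;

    struct kvm_irqfd irq;
    memset(&irq, 0, sizeof(irq));
    irq.fd  = efd;                  /* the same eventfd injects an IRQ   */
    irq.gsi = vm2_gsi;              /* into VM2 whenever it is signalled */
    return ioctl(vm2_fd, KVM_IRQFD, &irq);
}

(The mirror image, with the roles of the two VMs swapped, gives the
"vice versa" direction.)
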
> 
> 
> The resulting channel might look something like the following:
> 
> +-- VM1 --------------+  +---VM2-----------+
> | virtio-pci -- iommu +--+ vhost-pci -- VF | -- VFIO -- IOMMU -- NIC
> +---------------------+  +-----------------+
> 
> Comparing the two diagrams, a vhost-user thread on the host is
> no longer required, reducing the host CPU utilization when
> polling is active.  At the same time, VM2 cannot access all of VM1's
> memory - it is limited by the IOMMU configuration set up by VM1.
> 
> 
> Advantages over ivshmem:
> 
> - more flexibility: endpoint VMs do not have to place data at any
>   specific location to use the device; in practice this likely
>   means fewer data copies.
> - better standardization/code reuse:
>   virtio changes within guests would be fairly easy to implement
>   and would also benefit other backends besides vhost-user.
>   Standard hotplug interfaces can be used to add and remove these
>   channels as VMs are added or removed.
> - migration support:
>   it's easy to implement since ownership of memory is well defined.
>   For example, during migration VM2 can notify the hypervisor of VM1
>   by updating the dirty bitmap each time it writes into VM1's memory.
> 
> Thanks,
> 

This sounds like a different interface to a concept very similar to
Xen's grant table, no? Well, there might be benefits for some use cases;
for ours this is too dynamic, in fact. We'd like to avoid runtime
remappings controlled by guest activity, which this model clearly
requires.

Another shortcoming: if VM1 does not trust (security- or safety-wise) VM2
while preparing a message for it, it has to keep the buffer invisible
to VM2 until it is completed and signed, hashed, etc. That means it has
to reprogram the IOMMU frequently. With the concept we discussed at KVM
Forum, there would be shared memory mapped read-only to VM2 while being
R/W for VM1. That would 

Re: rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Nakajima, Jun
My previous email was bounced by virtio-...@lists.oasis-open.org.
I tried to subscribe to it, but to no avail...

On Tue, Sep 1, 2015 at 1:17 AM, Michael S. Tsirkin  wrote:
> On Mon, Aug 31, 2015 at 11:35:55AM -0700, Nakajima, Jun wrote:
>> On Mon, Aug 31, 2015 at 7:11 AM, Michael S. Tsirkin  wrote:

>> > 1: virtio in guest can be extended to allow support
>> > for IOMMUs. This provides the guest with full flexibility
>> > about which memory is readable or writable by each device.
>>
>> I assume that you meant VFIO only for virtio by "use of VFIO".  To get
>> VFIO working for general direct-I/O (including VFs) in guests, as you
>> know, we need to virtualize IOMMU (e.g. VT-d) and the interrupt
>> remapping table on x86 (i.e. nested VT-d).
>
> Not necessarily: if pmd is used, mappings stay mostly static,
> and there are no interrupts, so existing IOMMU emulation in qemu
> will do the job.

OK. It would work, although we would need to engage additional and more
complex code in the guests even though these are just memory operations
under the hood.

>> > By setting up a virtio device for each other VM we need to
>> > communicate to, guest gets full control of its security, from
>> > mapping all memory (like with current vhost-user) to only
>> > mapping buffers used for networking (like ivshmem) to
>> > transient mappings for the duration of data transfer only.
>>
>> And I think that we can use VMFUNC to have such transient mappings.
>
> Interesting. There are two points to make here:
>
>
> 1. To create transient mappings, VMFUNC isn't strictly required.
> Instead, mappings can be created when the first access by VM2
> within the BAR triggers a page fault.
> I guess VMFUNC could remove this first page fault by the hypervisor
> mapping the host PTE into the alternative view, then VMFUNC making the
> VM2 PTE valid - this might be important if mappings are very dynamic
> so there are many page faults.

I agree that VMFUNC isn't strictly required. It would provide a
performance optimization.
And I think it can add some level of protection as well, because you
might want to keep guest physical memory (part or all of VM1's memory)
mapped at VM2's BAR all the time. The IOMMU on VM1 can limit the address
ranges accessed by VM2, but such restrictions become loose as you want
the mappings static and thus large enough.

>
> 2. To invalidate mappings, VMFUNC isn't sufficient since
> the translation caches of other CPUs need to be invalidated.
> I don't think VMFUNC can do this.

I don't think we need to invalidate mappings often. And if we do, we
need to invalidate EPT anyway.

>>
>> Also, the ivshmem functionality could be implemented by this proposal:
>> - vswitch (or some VM) allocates memory regions in its address space, and
>> - it sets things up so that IOMMU mappings on the VMs are translated into those regions
>
> I agree it's possible, but that's not something that exists on real
> hardware. It's not clear to me what the security implications are
> of having VM2 control VM1's IOMMU. Having each VM control its own IOMMU
> seems more straightforward.

I meant the vswitch's IOMMU. It can be a bare-metal (or host) process or
a VM. For a bare-metal process, it's basically VFIO, where the virtual
address is used as the bus address. Each VM accesses the shared memory
using the vhost-pci BAR + bus (i.e. virtual) address.


-- 
Jun
Intel Open Source Technology Center


Re: rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Nakajima, Jun
On Tue, Sep 1, 2015 at 9:28 AM, Jan Kiszka  wrote:
> On 2015-09-01 18:02, Michael S. Tsirkin wrote:
...
>> You don't need to be able to map all guest memory if you know
>> guest won't try to allow device access to all of it.
>> It's a question of how good is the bus address allocator.
>
> But those BARs need to allocate a guest-physical address range as large
> as the other guest's RAM is, possibly even larger if that RAM is not
> contiguous, and you can't put other resources into potential holes
> because VM2 does not know where those holes will be.
>

I think you can allocate such guest-physical address ranges
efficiently if each BAR sets the base of each memory region reported
by VHOST_SET_MEM_TABLE, for example.  The issue is that we would need
8 of them (VHOST_MEMORY_MAX_NREGIONS) vs. the 6 BARs defined by the PCI-SIG.
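
One way around the 8-regions-vs-6-BARs mismatch would be to pack all regions
into a single 64-bit BAR at accumulated offsets; a rough sketch of the
bookkeeping (the packing scheme and names are only an illustration, the
region layout mirrors linux/vhost.h):

#include <stdint.h>

#define VHOST_MEMORY_MAX_NREGIONS 8   /* vhost-user limit mentioned above */

struct vhost_memory_region {
    uint64_t guest_phys_addr;         /* GPA of the region in VM1 */
    uint64_t memory_size;
    uint64_t userspace_addr;
    uint64_t flags_padding;
};

struct vhost_pci_region {
    uint64_t vm1_gpa;                 /* base GPA of the region in VM1 */
    uint64_t size;
    uint64_t bar_offset;              /* where it lands inside the BAR */
};

static uint64_t layout_bar(const struct vhost_memory_region *in, int n,
                           struct vhost_pci_region *out)
{
    uint64_t off = 0;

    for (int i = 0; i < n; i++) {
        out[i].vm1_gpa    = in[i].guest_phys_addr;
        out[i].size       = in[i].memory_size;
        out[i].bar_offset = off;
        off += in[i].memory_size;     /* alignment/padding ignored here */
    }
    return off;                       /* required BAR size */
}
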

-- 
Jun
Intel Open Source Technology Center


Re: rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Michael S. Tsirkin
On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote:
> On 2015-08-31 16:11, Michael S. Tsirkin wrote:
> > Hello!
> > During the KVM forum, we discussed supporting virtio on top
> > of ivshmem.
> 
> No, not on top of ivshmem. On top of shared memory. Our model is
> different from the simplistic ivshmem.
> 
> > I have considered it, and came up with an alternative
> > that has several advantages over that - please see below.
> > Comments welcome.
> > 
> > -
> > 
> > Existing solutions to userspace switching between VMs on the
> > same host are vhost-user and ivshmem.
> > 
> > vhost-user works by mapping memory of all VMs being bridged into the
> > switch memory space.
> > 
> > By comparison, ivshmem works by exposing a shared region of memory to all 
> > VMs.
> > VMs are required to use this region to store packets. The switch only
> > needs access to this region.
> > 
> > Another difference between vhost-user and ivshmem surfaces when polling
> > is used. With vhost-user, the switch is required to handle
> > data movement between VMs; if polling is used, this means that one host CPU
> > needs to be sacrificed for this task.
> > 
> > This is easiest to understand when one of the VMs is
> > used with VF pass-through. This can be schematically shown below:
> > 
> > +-- VM1 ------------+            +---VM2------------+
> > | virtio-pci        +-vhost-user-+ virtio-pci -- VF | -- VFIO -- IOMMU -- NIC
> > +-------------------+            +------------------+
> > 
> > 
> > With ivshmem in theory communication can happen directly, with two VMs
> > polling the shared memory region.
> > 
> > 
> > I won't spend time listing advantages of vhost-user over ivshmem.
> > Instead, having identified two advantages of ivshmem over vhost-user,
> > below is a proposal to extend vhost-user to gain the advantages
> > of ivshmem.
> > 
> > 
> > 1: virtio in guest can be extended to allow support
> > for IOMMUs. This provides the guest with full flexibility
> > about which memory is readable or writable by each device.
> > By setting up a virtio device for each other VM we need to
> > communicate to, guest gets full control of its security, from
> > mapping all memory (like with current vhost-user) to only
> > mapping buffers used for networking (like ivshmem) to
> > transient mappings for the duration of data transfer only.
> > This also allows use of VFIO within guests, for improved
> > security.
> > 
> > vhost user would need to be extended to send the
> > mappings programmed by guest IOMMU.
> > 
> > 2. qemu can be extended to serve as a vhost-user client:
> > receive remote VM mappings over the vhost-user protocol, and
> > map them into another VM's memory.
> > This mapping can take, for example, the form of
> > a BAR of a pci device, which I'll call here vhost-pci - 
> > with bus address allowed
> > by VM1's IOMMU mappings being translated into
> > offsets within this BAR within VM2's physical
> > memory space.
> > 
> > Since the translation can be a simple one, VM2
> > can perform it within its vhost-pci device driver.
> > 
> > While this setup would be the most useful with polling,
> > VM1's ioeventfd can also be mapped to
> > another VM2's irqfd, and vice versa, such that VMs
> > can trigger interrupts to each other without need
> > for a helper thread on the host.
> > 
> > 
> > The resulting channel might look something like the following:
> > 
> > +-- VM1 --------------+  +---VM2-----------+
> > | virtio-pci -- iommu +--+ vhost-pci -- VF | -- VFIO -- IOMMU -- NIC
> > +---------------------+  +-----------------+
> > 
> > Comparing the two diagrams, a vhost-user thread on the host is
> > no longer required, reducing the host CPU utilization when
> > polling is active.  At the same time, VM2 cannot access all of VM1's
> > memory - it is limited by the IOMMU configuration set up by VM1.
> > 
> > 
> > Advantages over ivshmem:
> > 
> > - more flexibility: endpoint VMs do not have to place data at any
> >   specific location to use the device; in practice this likely
> >   means fewer data copies.
> > - better standardization/code reuse:
> >   virtio changes within guests would be fairly easy to implement
> >   and would also benefit other backends besides vhost-user.
> >   Standard hotplug interfaces can be used to add and remove these
> >   channels as VMs are added or removed.
> > - migration support:
> >   it's easy to implement since ownership of memory is well defined.
> >   For example, during migration VM2 can notify the hypervisor of VM1
> >   by updating the dirty bitmap each time it writes into VM1's memory.
> > 
> > Thanks,
> > 
> 
> This sounds like a different interface to a concept very similar to
> Xen's grant table, no?

Yes, in the sense that grant tables are also about memory sharing and
include permissions.
But we are emulating an IOMMU, and keeping the PV part
as simple as possible (e.g. an offset within a BAR)
without attaching any policy to it.
Xen's is fundamentally a PV interface.

> Well, there might be benefits for some use 

Re: rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Michael S. Tsirkin
On Mon, Aug 31, 2015 at 11:35:55AM -0700, Nakajima, Jun wrote:
> On Mon, Aug 31, 2015 at 7:11 AM, Michael S. Tsirkin  wrote:
> > Hello!
> > During the KVM forum, we discussed supporting virtio on top
> > of ivshmem. I have considered it, and came up with an alternative
> > that has several advantages over that - please see below.
> > Comments welcome.
> 
> Hi Michael,
> 
> I like this, and it should be able to achieve what I presented at KVM
> Forum (vhost-user-shmem).
> Comments below.
> 
> >
> > -
> >
> > Existing solutions to userspace switching between VMs on the
> > same host are vhost-user and ivshmem.
> >
> > vhost-user works by mapping memory of all VMs being bridged into the
> > switch memory space.
> >
> > By comparison, ivshmem works by exposing a shared region of memory to all 
> > VMs.
> > VMs are required to use this region to store packets. The switch only
> > needs access to this region.
> >
> > Another difference between vhost-user and ivshmem surfaces when polling
> > is used. With vhost-user, the switch is required to handle
> > data movement between VMs; if polling is used, this means that one host CPU
> > needs to be sacrificed for this task.
> >
> > This is easiest to understand when one of the VMs is
> > used with VF pass-through. This can be schematically shown below:
> >
> > +-- VM1 ------------+            +---VM2------------+
> > | virtio-pci        +-vhost-user-+ virtio-pci -- VF | -- VFIO -- IOMMU -- NIC
> > +-------------------+            +------------------+
> >
> >
> > With ivshmem in theory communication can happen directly, with two VMs
> > polling the shared memory region.
> >
> >
> > I won't spend time listing advantages of vhost-user over ivshmem.
> > Instead, having identified two advantages of ivshmem over vhost-user,
> > below is a proposal to extend vhost-user to gain the advantages
> > of ivshmem.
> >
> >
> > 1: virtio in guest can be extended to allow support
> > for IOMMUs. This provides the guest with full flexibility
> > about which memory is readable or writable by each device.
> 
> I assume that you meant VFIO only for virtio by "use of VFIO".  To get
> VFIO working for general direct-I/O (including VFs) in guests, as you
> know, we need to virtualize IOMMU (e.g. VT-d) and the interrupt
> remapping table on x86 (i.e. nested VT-d).

Not necessarily: if pmd is used, mappings stay mostly static,
and there are no interrupts, so existing IOMMU emulation in qemu
will do the job.
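
For reference, QEMU's emulated VT-d can already be enabled on the q35
machine type, roughly like this (exact flags depend on the QEMU version,
and the virtio device would still need the extension from point 1 above
to honor the IOMMU):

  qemu-system-x86_64 -machine q35 -device intel-iommu ...
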


> > By setting up a virtio device for each other VM we need to
> > communicate to, guest gets full control of its security, from
> > mapping all memory (like with current vhost-user) to only
> > mapping buffers used for networking (like ivshmem) to
> > transient mappings for the duration of data transfer only.
> 
> And I think that we can use VMFUNC to have such transient mappings.

Interesting. There are two points to make here:


1. To create transient mappings, VMFUNC isn't strictly required.
Instead, mappings can be created when the first access by VM2
within the BAR triggers a page fault.
I guess VMFUNC could remove this first page fault by the hypervisor
mapping the host PTE into the alternative view, then VMFUNC making the
VM2 PTE valid - this might be important if mappings are very dynamic
so there are many page faults.

2. To invalidate mappings, VMFUNC isn't sufficient since
the translation caches of other CPUs need to be invalidated.
I don't think VMFUNC can do this.




> > This also allows use of VFIO within guests, for improved
> > security.
> >
> > vhost user would need to be extended to send the
> > mappings programmed by guest IOMMU.
> 
> Right. We need to think about cases where other VMs (VM3, etc.) join
> the group or some existing VM leaves.
> PCI hot-plug should work there (as you point out at "Advantages over
> ivshmem" below).
> 
> >
> > 2. qemu can be extended to serve as a vhost-user client:
> > receive remote VM mappings over the vhost-user protocol, and
> > map them into another VM's memory.
> > This mapping can take, for example, the form of
> > a BAR of a pci device, which I'll call here vhost-pci -
> > with bus address allowed
> > by VM1's IOMMU mappings being translated into
> > offsets within this BAR within VM2's physical
> > memory space.
> 
> I think it's sensible.
> 
> >
> > Since the translation can be a simple one, VM2
> > can perform it within its vhost-pci device driver.
> >
> > While this setup would be the most useful with polling,
> > VM1's ioeventfd can also be mapped to
> > another VM2's irqfd, and vice versa, such that VMs
> > can trigger interrupts to each other without need
> > for a helper thread on the host.
> >
> >
> > The resulting channel might look something like the following:
> >
> > +-- VM1 --------------+  +---VM2-----------+
> > | virtio-pci -- iommu +--+ vhost-pci -- VF | -- VFIO -- IOMMU -- NIC
> > +---------------------+  +-----------------+
> >
> > comparing the two diagrams, a vhost-user thread on the host is
> > no longer required, 

Re: [Qemu-devel] rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Michael S. Tsirkin
On Tue, Sep 01, 2015 at 03:03:12AM +, Varun Sethi wrote:
> Hi Michael,
> When you talk about VFIO in guest, is it with a purely emulated IOMMU in Qemu?

This can use the emulated IOMMU in Qemu.
That's probably fast enough if mappings are mostly static.
We can also add a PV-IOMMU if necessary.

> Also, I am not clear on the following points:
> 1. How transient memory would be mapped using BAR in the backend VM

The simplest way is that each update sends a vhost-user message; the
backend gets it, mmaps it into the backend QEMU, and makes it part of a
RAM memory slot.

Or - the backend QEMU could detect a page fault on access and get the
IOMMU mapping from the frontend QEMU - using vhost-user messages or
shared memory.
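
A rough sketch of the first option on the backend-QEMU side, assuming the
message carries an fd plus offset/size/BAR-offset fields (the message layout
and the VhostPciDev type are hypothetical; the MemoryRegion calls are QEMU's
existing API):

#include <sys/mman.h>

typedef struct VhostPciDev {
    PCIDevice    parent_obj;
    MemoryRegion bar_mr;        /* container exposed as a 64-bit BAR */
} VhostPciDev;

static int vhost_pci_map_region(VhostPciDev *dev, int fd,
                                uint64_t fd_offset, uint64_t size,
                                uint64_t bar_offset)
{
    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, fd_offset);
    if (ptr == MAP_FAILED) {
        return -1;
    }

    MemoryRegion *mr = g_new0(MemoryRegion, 1);
    memory_region_init_ram_ptr(mr, OBJECT(dev), "vhost-pci-region",
                               size, ptr);
    /* Make the region guest-visible to VM2 inside the vhost-pci BAR. */
    memory_region_add_subregion(&dev->bar_mr, bar_offset, mr);
    return 0;
}
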




> 2. How would the backend VM update the dirty page bitmap for the frontend VM
> 
> Regards
> Varun

The easiest way to implement this is probably for the backend QEMU to set up
dirty tracking for the relevant slot (upon getting a vhost-user message
from the frontend), then retrieve the dirty map
from KVM and record it in a shared memory region
(when to do it? We could have an eventfd and/or a vhost-user message to
trigger this from the frontend QEMU, or just use a timer).
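
A minimal sketch of the retrieve-from-KVM step, using the existing
KVM_GET_DIRTY_LOG ioctl and assuming the slot was registered with
KVM_MEM_LOG_DIRTY_PAGES (how the slot id and the shared area are negotiated
is left out):

#include <string.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* shared_bitmap must hold one bit per page of the slot, rounded up to a
 * multiple of 64 bits, and lives in memory the frontend can read. */
static int sync_dirty_log(int vm_fd, uint32_t slot, void *shared_bitmap)
{
    struct kvm_dirty_log log;

    memset(&log, 0, sizeof(log));
    log.slot = slot;
    log.dirty_bitmap = shared_bitmap;

    /* KVM copies out and clears the accumulated dirty bits for this slot. */
    return ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
}
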

An alternative is for backend VM to get access to dirty log
(e.g. map it within BAR) and update it directly in shared memory.
Seems like more work.

Marc-André Lureau recently sent patches to support passing the
dirty log around; these would be useful.


> > -Original Message-
> > From: qemu-devel-bounces+varun.sethi=freescale@nongnu.org
> > [mailto:qemu-devel-bounces+varun.sethi=freescale@nongnu.org] On
> > Behalf Of Nakajima, Jun
> > Sent: Monday, August 31, 2015 1:36 PM
> > To: Michael S. Tsirkin
> > Cc: virtio-...@lists.oasis-open.org; Jan Kiszka;
> > claudio.font...@huawei.com; qemu-de...@nongnu.org; Linux
> > Virtualization; opnfv-tech-disc...@lists.opnfv.org
> > Subject: Re: [Qemu-devel] rfc: vhost user enhancements for vm2vm
> > communication
> > 
> > On Mon, Aug 31, 2015 at 7:11 AM, Michael S. Tsirkin 
> > wrote:
> > > Hello!
> > > During the KVM forum, we discussed supporting virtio on top of
> > > ivshmem. I have considered it, and came up with an alternative that
> > > has several advantages over that - please see below.
> > > Comments welcome.
> > 
> > Hi Michael,
> > 
> > I like this, and it should be able to achieve what I presented at KVM Forum
> > (vhost-user-shmem).
> > Comments below.
> > 
> > >
> > > -
> > >
> > > Existing solutions to userspace switching between VMs on the same host
> > > are vhost-user and ivshmem.
> > >
> > > vhost-user works by mapping memory of all VMs being bridged into the
> > > switch memory space.
> > >
> > > By comparison, ivshmem works by exposing a shared region of memory to
> > all VMs.
> > > VMs are required to use this region to store packets. The switch only
> > > needs access to this region.
> > >
> > > Another difference between vhost-user and ivshmem surfaces when
> > > polling is used. With vhost-user, the switch is required to handle
> > > data movement between VMs; if polling is used, this means that one host
> > > CPU needs to be sacrificed for this task.
> > >
> > > This is easiest to understand when one of the VMs is used with VF
> > > pass-through. This can be schematically shown below:
> > >
> > > +-- VM1 ------------+            +---VM2------------+
> > > | virtio-pci        +-vhost-user-+ virtio-pci -- VF | -- VFIO -- IOMMU -- NIC
> > > +-------------------+            +------------------+
> > >
> > >
> > > With ivshmem in theory communication can happen directly, with two VMs
> > > polling the shared memory region.
> > >
> > >
> > > I won't spend time listing advantages of vhost-user over ivshmem.
> > > Instead, having identified two advantages of ivshmem over vhost-user,
> > > below is a proposal to extend vhost-user to gain the advantages of
> > > ivshmem.
> > >
> > >
> > > 1: virtio in guest can be extended to allow support for IOMMUs. This
> > > provides the guest with full flexibility about which memory is readable or
> > > writable by each device.
> > 
> > I assume that you meant VFIO only for virtio by "use of VFIO".  To get VFIO
> > working for general direct-I/O (including VFs) in guests, as you know, we
> > need to virtualize IOMMU (e.g. VT-d) and the interrupt remapping table on
> > x86 (i.e. nested VT-d).
> > 
> > > By setting up a virtio device for each other VM we need to communicate
> > > to, guest gets full control of its security, from mapping all memory
> > > (like with current vhost-user) to only mapping buffers used for
> > > networking (like ivshmem) to transient mappings for the duration of
> > > data transfer only.
> > 
> > And I think that we can use VMFUNC to have such transient mappings.
> > 
> > > This also allows use of VFIO within guests, for improved security.
> > >
> > > vhost user would need to be extended to send the mappings programmed
> > > by guest IOMMU.
> > 
> > Right. We need to think about cases where other 

Re: rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Jan Kiszka
On 2015-09-01 10:01, Michael S. Tsirkin wrote:
> On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote:
>> Leaving all the implementation and interface details aside, this
>> discussion is first of all about two fundamentally different approaches:
>> static shared memory windows vs. dynamically remapped shared windows (a
>> third one would be copying in the hypervisor, but I suppose we all agree
>> that the whole exercise is about avoiding that). Which way do we want or
>> have to go?
>>
>> Jan
> 
> Dynamic is a superset of static: you can always make it static if you
> wish. Static has the advantage of simplicity, but that's lost once you
> realize you need to invent interfaces to make it work.  Since we can use
> existing IOMMU interfaces for the dynamic one, what's the disadvantage?

Complexity. Having to emulate even more of an IOMMU in the hypervisor
(we already have to do a bit for VT-d IR in Jailhouse) and doing this
per platform (AMD IOMMU, ARM SMMU, ...) is out of scope for us. In that
sense, generic grant tables would be more appealing. But what we would
actually need is an interface that is only *optionally* configured by a
guest for dynamic scenarios, otherwise preconfigured by the hypervisor
for static setups. And we need guests that support both. That's the
challenge.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Michael S. Tsirkin
On Tue, Sep 01, 2015 at 11:11:52AM +0200, Jan Kiszka wrote:
> On 2015-09-01 10:01, Michael S. Tsirkin wrote:
> > On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote:
> >> Leaving all the implementation and interface details aside, this
> >> discussion is first of all about two fundamentally different approaches:
> >> static shared memory windows vs. dynamically remapped shared windows (a
> >> third one would be copying in the hypervisor, but I suppose we all agree
> >> that the whole exercise is about avoiding that). Which way do we want or
> >> have to go?
> >>
> >> Jan
> > 
> > Dynamic is a superset of static: you can always make it static if you
> > wish. Static has the advantage of simplicity, but that's lost once you
> > realize you need to invent interfaces to make it work.  Since we can use
> > existing IOMMU interfaces for the dynamic one, what's the disadvantage?
> 
> Complexity. Having to emulate even more of an IOMMU in the hypervisor
> (we already have to do a bit for VT-d IR in Jailhouse) and doing this
> per platform (AMD IOMMU, ARM SMMU, ...) is out of scope for us. In that
> sense, generic grant tables would be more appealing.

That's not how we do things for KVM, PV features need to be
modular and interchangeable with emulation.

If you just want something that's cross-platform and easy to
implement, just build a PV IOMMU. Maybe use virtio for this.

> But what we would
> actually need is an interface that is only *optionally* configured by a
> guest for dynamic scenarios, otherwise preconfigured by the hypervisor
> for static setups. And we need guests that support both. That's the
> challenge.
> 
> Jan

That's already there for IOMMUs: vfio does the static setup by default,
enabling iommu by guests is optional.

> -- 
> Siemens AG, Corporate Technology, CT RTC ITP SES-DE
> Corporate Competence Center Embedded Linux


Re: rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Jan Kiszka
On 2015-09-01 18:02, Michael S. Tsirkin wrote:
> On Tue, Sep 01, 2015 at 05:34:37PM +0200, Jan Kiszka wrote:
>> On 2015-09-01 16:34, Michael S. Tsirkin wrote:
>>> On Tue, Sep 01, 2015 at 04:09:44PM +0200, Jan Kiszka wrote:
 On 2015-09-01 11:24, Michael S. Tsirkin wrote:
> On Tue, Sep 01, 2015 at 11:11:52AM +0200, Jan Kiszka wrote:
>> On 2015-09-01 10:01, Michael S. Tsirkin wrote:
>>> On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote:
 Leaving all the implementation and interface details aside, this
 discussion is first of all about two fundamentally different 
 approaches:
 static shared memory windows vs. dynamically remapped shared windows (a
 third one would be copying in the hypervisor, but I suppose we all 
 agree
 that the whole exercise is about avoiding that). Which way do we want 
 or
 have to go?

 Jan
>>>
>>> Dynamic is a superset of static: you can always make it static if you
>>> wish. Static has the advantage of simplicity, but that's lost once you
>>> realize you need to invent interfaces to make it work.  Since we can use
>>> existing IOMMU interfaces for the dynamic one, what's the disadvantage?
>>
>> Complexity. Having to emulate even more of an IOMMU in the hypervisor
>> (we already have to do a bit for VT-d IR in Jailhouse) and doing this
>> per platform (AMD IOMMU, ARM SMMU, ...) is out of scope for us. In that
>> sense, generic grant tables would be more appealing.
>
> That's not how we do things for KVM, PV features need to be
> modular and interchangeable with emulation.

 I know, and we may have to make some compromise for Jailhouse if that
 brings us valuable standardization and broad guest support. But we will
 surely not support an arbitrary amount of IOMMU models for that reason.

>
> If you just want something that's cross-platform and easy to
> implement, just build a PV IOMMU. Maybe use virtio for this.

 That is likely required to keep the complexity manageable and to allow
 static preconfiguration.
>>>
>>> Real IOMMU allow static configuration just fine. This is exactly
>>> what VFIO uses.
>>
>> Please specify more precisely which feature in which IOMMU you are
>> referring to. Also, given that you refer to VFIO, I suspect we have
>> different things in mind. I'm talking about an IOMMU device model, like
>> the one we have in QEMU now for VT-d. That one is not at all
>> preconfigured by the host for VFIO.
> 
> I really just mean that VFIO creates a mostly static IOMMU configuration.
> 
> It's configured by the guest, not the host.

OK, that resolves my confusion.

> 
> I don't see host control over configuration as being particularly important.

We do, see below.

> 
> 
>>>
 Well, we could declare our virtio-shmem device to be an IOMMU device
 that controls access of a remote VM to RAM of the one that owns the
 device. In the static case, this access may at most be enabled/disabled
 but not moved around. The static regions would have to be discoverable
 for the VM (register read-back), and the guest's firmware will likely
 have to declare those ranges reserved to the guest OS.
 In the dynamic case, the guest would be able to create an alternative
 mapping.
>>>
>>>
>>> I don't think we want a special device just to support the
>>> static case. It might be a bit less code to write, but
>>> eventually it should be up to the guest.
>>> Fundamentally, it's policy that host has no business
>>> dictating.
>>
>> "A bit less" is to be validated, and I doubt its just "a bit". But if
>> KVM and its guests will also support some PV-IOMMU that we can reuse for
>> our scenarios, than that is fine. KVM would not have to mandate support
>> for it while we would, that's all.
> 
> Someone will have to do this work.
> 
>>>
 We would probably have to define a generic page table structure
 for that. Or do you rather have some MPU-like control structure in mind,
 more similar to the memory region descriptions vhost is already using?
>>>
>>> I don't care much. Page tables use less memory if a lot of memory needs
>>> to be covered. OTOH if you want to use virtio (e.g. to allow command
>>> batching) that likely means commands to manipulate the IOMMU, and
>>> maintaining it all on the host. You decide.
>>
>> I don't care very much about the dynamic case as we won't support it
>> anyway. However, if the configuration concept used for it is applicable
>> to static mode as well, then we could reuse it. But preconfiguration
>> will require a register-based region description, I suspect.
> 
> I don't know what you mean by preconfiguration exactly.
> 
> Do you want the host to configure the IOMMU? Why not let the
> guest do this?

We simply freeze GPA-to-HPA mappings during runtime. Avoids having to
validate and synchronize guest-triggered changes.

>>>

Re: rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Michael S. Tsirkin
On Tue, Sep 01, 2015 at 04:09:44PM +0200, Jan Kiszka wrote:
> On 2015-09-01 11:24, Michael S. Tsirkin wrote:
> > On Tue, Sep 01, 2015 at 11:11:52AM +0200, Jan Kiszka wrote:
> >> On 2015-09-01 10:01, Michael S. Tsirkin wrote:
> >>> On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote:
>  Leaving all the implementation and interface details aside, this
>  discussion is first of all about two fundamentally different approaches:
>  static shared memory windows vs. dynamically remapped shared windows (a
>  third one would be copying in the hypervisor, but I suppose we all agree
>  that the whole exercise is about avoiding that). Which way do we want or
>  have to go?
> 
>  Jan
> >>>
> >>> Dynamic is a superset of static: you can always make it static if you
> >>> wish. Static has the advantage of simplicity, but that's lost once you
> >>> realize you need to invent interfaces to make it work.  Since we can use
> >>> existing IOMMU interfaces for the dynamic one, what's the disadvantage?
> >>
> >> Complexity. Having to emulate even more of an IOMMU in the hypervisor
> >> (we already have to do a bit for VT-d IR in Jailhouse) and doing this
> >> per platform (AMD IOMMU, ARM SMMU, ...) is out of scope for us. In that
> >> sense, generic grant tables would be more appealing.
> > 
> > That's not how we do things for KVM, PV features need to be
> > modular and interchangeable with emulation.
> 
> I know, and we may have to make some compromise for Jailhouse if that
> brings us valuable standardization and broad guest support. But we will
> surely not support an arbitrary amount of IOMMU models for that reason.
> 
> > 
> > If you just want something that's cross-platform and easy to
> > implement, just build a PV IOMMU. Maybe use virtio for this.
> 
> That is likely required to keep the complexity manageable and to allow
> static preconfiguration.

Real IOMMU allow static configuration just fine. This is exactly
what VFIO uses.

> Well, we could declare our virtio-shmem device to be an IOMMU device
> that controls access of a remote VM to RAM of the one that owns the
> device. In the static case, this access may at most be enabled/disabled
> but not moved around. The static regions would have to be discoverable
> for the VM (register read-back), and the guest's firmware will likely
> have to declare those ranges reserved to the guest OS.
> In the dynamic case, the guest would be able to create an alternative
> mapping.


I don't think we want a special device just to support the
static case. It might be a bit less code to write, but
eventually it should be up to the guest.
Fundamentally, it's policy that host has no business
dictating.

> We would probably have to define a generic page table structure
> for that. Or do you rather have some MPU-like control structure in mind,
> more similar to the memory region descriptions vhost is already using?

I don't care much. Page tables use less memory if a lot of memory needs
to be covered. OTOH if you want to use virtio (e.g. to allow command
batching) that likely means commands to manipulate the IOMMU, and
maintaining it all on the host. You decide.


> Also not yet clear to me are how the vhost-pci device and the
> translations it will have to do should look like for VM2.

I think we can use vhost-pci BAR + VM1 bus address as the
VM2 physical address. In other words, all memory exposed to
virtio-pci by VM1 through its IOMMU is mapped into the BAR of
vhost-pci.

Bus addresses can be validated to make sure they fit
in the BAR.
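
The translation VM2's vhost-pci driver would do is then just an offset plus
a bounds check; a minimal sketch (bar_base/bar_size are illustrative names):

#include <stdint.h>

static inline void *vm1_bus_to_vm2_ptr(void *bar_base, uint64_t bar_size,
                                       uint64_t vm1_bus_addr, uint64_t len)
{
    /* Reject anything that does not fit entirely inside the BAR window. */
    if (vm1_bus_addr >= bar_size || len > bar_size - vm1_bus_addr)
        return NULL;
    return (uint8_t *)bar_base + vm1_bus_addr;
}
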


One issue to consider is that VM1 can trick VM2 into writing
into bus address that isn't mapped in the IOMMU, or
is mapped read-only.
We probably would have to teach KVM to handle this somehow,
e.g. exit to QEMU, or even just ignore. Maybe notify guest
e.g. by setting a bit in the config space of the device,
to avoid easy DOS.



> > 
> >> But what we would
> >> actually need is an interface that is only *optionally* configured by a
> >> guest for dynamic scenarios, otherwise preconfigured by the hypervisor
> >> for static setups. And we need guests that support both. That's the
> >> challenge.
> >>
> >> Jan
> > 
> > That's already there for IOMMUs: vfio does the static setup by default,
> > enabling iommu by guests is optional.
> 
> Cannot follow yet how vfio comes into play regarding some preconfigured
> virtual IOMMU.
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT RTC ITP SES-DE
> Corporate Competence Center Embedded Linux


Re: rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Jan Kiszka
On 2015-09-01 11:24, Michael S. Tsirkin wrote:
> On Tue, Sep 01, 2015 at 11:11:52AM +0200, Jan Kiszka wrote:
>> On 2015-09-01 10:01, Michael S. Tsirkin wrote:
>>> On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote:
 Leaving all the implementation and interface details aside, this
 discussion is first of all about two fundamentally different approaches:
 static shared memory windows vs. dynamically remapped shared windows (a
 third one would be copying in the hypervisor, but I suppose we all agree
 that the whole exercise is about avoiding that). Which way do we want or
 have to go?

 Jan
>>>
>>> Dynamic is a superset of static: you can always make it static if you
>>> wish. Static has the advantage of simplicity, but that's lost once you
>>> realize you need to invent interfaces to make it work.  Since we can use
>>> existing IOMMU interfaces for the dynamic one, what's the disadvantage?
>>
>> Complexity. Having to emulate even more of an IOMMU in the hypervisor
>> (we already have to do a bit for VT-d IR in Jailhouse) and doing this
>> per platform (AMD IOMMU, ARM SMMU, ...) is out of scope for us. In that
>> sense, generic grant tables would be more appealing.
> 
> That's not how we do things for KVM, PV features need to be
> modular and interchangeable with emulation.

I know, and we may have to make some compromise for Jailhouse if that
brings us valuable standardization and broad guest support. But we will
surely not support an arbitrary amount of IOMMU models for that reason.

> 
> If you just want something that's cross-platform and easy to
> implement, just build a PV IOMMU. Maybe use virtio for this.

That is likely required to keep the complexity manageable and to allow
static preconfiguration.

Well, we could declare our virtio-shmem device to be an IOMMU device
that controls access of a remote VM to RAM of the one that owns the
device. In the static case, this access may at most be enabled/disabled
but not moved around. The static regions would have to be discoverable
for the VM (register read-back), and the guest's firmware will likely
have to declare those ranges reserved to the guest OS.
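
As a purely illustrative example of such register read-back, the static
windows could be described by a small read-only descriptor table plus one
enable bit per window - nothing below is an existing device interface:

#include <stdint.h>

struct shmem_window_desc {
    uint64_t base;      /* guest-physical base, fixed by the hypervisor */
    uint64_t size;      /* window length, fixed by the hypervisor */
    uint32_t flags;     /* bit 0: remote may read, bit 1: remote may write */
    uint32_t enable;    /* the only field the guest may toggle at runtime */
};

struct shmem_device_regs {
    uint32_t num_windows;           /* read-only */
    uint32_t reserved;
    struct shmem_window_desc win[]; /* read-back of the static setup */
};
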

In the dynamic case, the guest would be able to create an alternative
mapping. We would probably have to define a generic page table structure
for that. Or do you rather have some MPU-like control structure in mind,
more similar to the memory region descriptions vhost is already using?
Also not yet clear to me are how the vhost-pci device and the
translations it will have to do should look like for VM2.

> 
>> But what we would
>> actually need is an interface that is only *optionally* configured by a
>> guest for dynamic scenarios, otherwise preconfigured by the hypervisor
>> for static setups. And we need guests that support both. That's the
>> challenge.
>>
>> Jan
> 
> That's already there for IOMMUs: vfio does the static setup by default,
> enabling iommu by guests is optional.

Cannot follow yet how vfio comes into play regarding some preconfigured
virtual IOMMU.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Michael S. Tsirkin
On Tue, Sep 01, 2015 at 05:34:37PM +0200, Jan Kiszka wrote:
> On 2015-09-01 16:34, Michael S. Tsirkin wrote:
> > On Tue, Sep 01, 2015 at 04:09:44PM +0200, Jan Kiszka wrote:
> >> On 2015-09-01 11:24, Michael S. Tsirkin wrote:
> >>> On Tue, Sep 01, 2015 at 11:11:52AM +0200, Jan Kiszka wrote:
>  On 2015-09-01 10:01, Michael S. Tsirkin wrote:
> > On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote:
> >> Leaving all the implementation and interface details aside, this
> >> discussion is first of all about two fundamentally different 
> >> approaches:
> >> static shared memory windows vs. dynamically remapped shared windows (a
> >> third one would be copying in the hypervisor, but I suppose we all 
> >> agree
> >> that the whole exercise is about avoiding that). Which way do we want 
> >> or
> >> have to go?
> >>
> >> Jan
> >
> > Dynamic is a superset of static: you can always make it static if you
> > wish. Static has the advantage of simplicity, but that's lost once you
> > realize you need to invent interfaces to make it work.  Since we can use
> > existing IOMMU interfaces for the dynamic one, what's the disadvantage?
> 
>  Complexity. Having to emulate even more of an IOMMU in the hypervisor
>  (we already have to do a bit for VT-d IR in Jailhouse) and doing this
>  per platform (AMD IOMMU, ARM SMMU, ...) is out of scope for us. In that
>  sense, generic grant tables would be more appealing.
> >>>
> >>> That's not how we do things for KVM, PV features need to be
> >>> modular and interchangeable with emulation.
> >>
> >> I know, and we may have to make some compromise for Jailhouse if that
> >> brings us valuable standardization and broad guest support. But we will
> >> surely not support an arbitrary amount of IOMMU models for that reason.
> >>
> >>>
> >>> If you just want something that's cross-platform and easy to
> >>> implement, just build a PV IOMMU. Maybe use virtio for this.
> >>
> >> That is likely required to keep the complexity manageable and to allow
> >> static preconfiguration.
> > 
> > Real IOMMU allow static configuration just fine. This is exactly
> > what VFIO uses.
> 
> Please specify more precisely which feature in which IOMMU you are
> referring to. Also, given that you refer to VFIO, I suspect we have
> different things in mind. I'm talking about an IOMMU device model, like
> the one we have in QEMU now for VT-d. That one is not at all
> preconfigured by the host for VFIO.

I really just mean that VFIO creates a mostly static IOMMU configuration.

It's configured by the guest, not the host.

I don't see host control over configuration as being particularly important.


> > 
> >> Well, we could declare our virtio-shmem device to be an IOMMU device
> >> that controls access of a remote VM to RAM of the one that owns the
> >> device. In the static case, this access may at most be enabled/disabled
> >> but not moved around. The static regions would have to be discoverable
> >> for the VM (register read-back), and the guest's firmware will likely
> >> have to declare those ranges reserved to the guest OS.
> >> In the dynamic case, the guest would be able to create an alternative
> >> mapping.
> > 
> > 
> > I don't think we want a special device just to support the
> > static case. It might be a bit less code to write, but
> > eventually it should be up to the guest.
> > Fundamentally, it's policy that host has no business
> > dictating.
> 
> "A bit less" is to be validated, and I doubt its just "a bit". But if
> KVM and its guests will also support some PV-IOMMU that we can reuse for
> our scenarios, than that is fine. KVM would not have to mandate support
> for it while we would, that's all.

Someone will have to do this work.

> > 
> >> We would probably have to define a generic page table structure
> >> for that. Or do you rather have some MPU-like control structure in mind,
> >> more similar to the memory region descriptions vhost is already using?
> > 
> > I don't care much. Page tables use less memory if a lot of memory needs
> > to be covered. OTOH if you want to use virtio (e.g. to allow command
> > batching) that likely means commands to manipulate the IOMMU, and
> > maintaining it all on the host. You decide.
> 
> I don't care very much about the dynamic case as we won't support it
> anyway. However, if the configuration concept used for it is applicable
> to static mode as well, then we could reuse it. But preconfiguration
> will require a register-based region description, I suspect.

I don't know what you mean by preconfiguration exactly.

Do you want the host to configure the IOMMU? Why not let the
guest do this?


> > 
> >> Also not yet clear to me are how the vhost-pci device and the
> >> translations it will have to do should look like for VM2.
> > 
> > I think we can use vhost-pci BAR + VM1 bus address as the
> > VM2 physical address. In other words, all memory exposed to
> > 

Re: rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Jan Kiszka
On 2015-09-01 16:34, Michael S. Tsirkin wrote:
> On Tue, Sep 01, 2015 at 04:09:44PM +0200, Jan Kiszka wrote:
>> On 2015-09-01 11:24, Michael S. Tsirkin wrote:
>>> On Tue, Sep 01, 2015 at 11:11:52AM +0200, Jan Kiszka wrote:
 On 2015-09-01 10:01, Michael S. Tsirkin wrote:
> On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote:
>> Leaving all the implementation and interface details aside, this
>> discussion is first of all about two fundamentally different approaches:
>> static shared memory windows vs. dynamically remapped shared windows (a
>> third one would be copying in the hypervisor, but I suppose we all agree
>> that the whole exercise is about avoiding that). Which way do we want or
>> have to go?
>>
>> Jan
>
> Dynamic is a superset of static: you can always make it static if you
> wish. Static has the advantage of simplicity, but that's lost once you
> realize you need to invent interfaces to make it work.  Since we can use
> existing IOMMU interfaces for the dynamic one, what's the disadvantage?

 Complexity. Having to emulate even more of an IOMMU in the hypervisor
 (we already have to do a bit for VT-d IR in Jailhouse) and doing this
 per platform (AMD IOMMU, ARM SMMU, ...) is out of scope for us. In that
 sense, generic grant tables would be more appealing.
>>>
>>> That's not how we do things for KVM, PV features need to be
>>> modular and interchangeable with emulation.
>>
>> I know, and we may have to make some compromise for Jailhouse if that
>> brings us valuable standardization and broad guest support. But we will
>> surely not support an arbitrary amount of IOMMU models for that reason.
>>
>>>
>>> If you just want something that's cross-platform and easy to
>>> implement, just build a PV IOMMU. Maybe use virtio for this.
>>
>> That is likely required to keep the complexity manageable and to allow
>> static preconfiguration.
> 
> Real IOMMU allow static configuration just fine. This is exactly
> what VFIO uses.

Please specify more precisely which feature in which IOMMU you are
referring to. Also, given that you refer to VFIO, I suspect we have
different things in mind. I'm talking about an IOMMU device model, like
the one we have in QEMU now for VT-d. That one is not at all
preconfigured by the host for VFIO.

> 
>> Well, we could declare our virtio-shmem device to be an IOMMU device
>> that controls access of a remote VM to RAM of the one that owns the
>> device. In the static case, this access may at most be enabled/disabled
>> but not moved around. The static regions would have to be discoverable
>> for the VM (register read-back), and the guest's firmware will likely
>> have to declare those ranges reserved to the guest OS.
>> In the dynamic case, the guest would be able to create an alternative
>> mapping.
> 
> 
> I don't think we want a special device just to support the
> static case. It might be a bit less code to write, but
> eventually it should be up to the guest.
> Fundamentally, it's policy that host has no business
> dictating.

"A bit less" is to be validated, and I doubt its just "a bit". But if
KVM and its guests will also support some PV-IOMMU that we can reuse for
our scenarios, than that is fine. KVM would not have to mandate support
for it while we would, that's all.

> 
>> We would probably have to define a generic page table structure
>> for that. Or do you rather have some MPU-like control structure in mind,
>> more similar to the memory region descriptions vhost is already using?
> 
> I don't care much. Page tables use less memory if a lot of memory needs
> to be covered. OTOH if you want to use virtio (e.g. to allow command
> batching) that likely means commands to manipulate the IOMMU, and
> maintaining it all on the host. You decide.

I don't care very much about the dynamic case as we won't support it
anyway. However, if the configuration concept used for it is applicable
to static mode as well, then we could reuse it. But preconfiguration
will require a register-based region description, I suspect.

> 
>> Also not yet clear to me are how the vhost-pci device and the
>> translations it will have to do should look like for VM2.
> 
> I think we can use vhost-pci BAR + VM1 bus address as the
> VM2 physical address. In other words, all memory exposed to
> virtio-pci by VM1 through its IOMMU is mapped into the BAR of
> vhost-pci.
> 
> Bus addresses can be validated to make sure they fit
> in the BAR.

Sounds simple but may become challenging for VMs that have many such
devices (in order to connect to many possibly large VMs).

> 
> 
> One issue to consider is that VM1 can trick VM2 into writing
> into bus address that isn't mapped in the IOMMU, or
> is mapped read-only.
> We probably would have to teach KVM to handle this somehow,
> e.g. exit to QEMU, or even just ignore. Maybe notify guest
> e.g. by setting a bit in the config space of the device,
> to avoid easy DOS.

Well, that