Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-09 Thread Paolo Bonzini


On 09/11/2015 13:15, Michael S. Tsirkin wrote:
> Well that's not exactly true. I think we would like to make
> it possible to put virtio devices behind an IOMMU on x86,
> but if this means existing guests break, then many people won't be able
> to use this option: having to find out which kernel version your guest
> is running is a significant burden.
> 
> So on the host side, we need to detect guests that
> don't program the IOMMU and make QEMU ignore it.
> I think we need to figure out a way to do this
> before we commit to the guest change.

What is the usecase for putting virtio devices behind an IOMMU, apart from:

1) "because you can"

2) using VFIO within the guest

?  Case 1 can be ignored, and in case 2 the guest will do the right thing.

> Additionally, IOMMU overhead is very high when running within the VM.
> So for uses such as VFIO, we'd like a way to make something like
> iommu-pt the default.

That's not something that the kernel cares about.  It's just a
configuration issue.

Paolo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-09 Thread Michael S. Tsirkin
On Thu, Oct 29, 2015 at 06:09:45PM -0700, Andy Lutomirski wrote:
> This switches virtio to use the DMA API unconditionally.  I'm sure
> it breaks things, but it seems to work on x86 using virtio-pci, with
> and without Xen, and using both the modern 1.0 variant and the
> legacy variant.
> 
> This appears to work on native and Xen x86_64 using both modern and
> legacy virtio-pci.  It also appears to work on arm and arm64.
> 
> It definitely won't work as-is on s390x, and I haven't been able to
> test Christian's patches because I can't get virtio-ccw to work in
> QEMU at all.  I don't know what I'm doing wrong.
> 
> It doesn't work on ppc64.  Ben, consider yourself pinged to send me
> a patch :)
> 
> It doesn't work on sparc64.  I didn't realize at Kernel Summit that
> sparc64 has the same problem as ppc64.
> 
> DaveM, for background, we're trying to fix virtio to use the DMA
> API.  That will require that every platform that uses virtio
> supplies valid DMA operations on devices that use virtio_ring.
> Unfortunately, QEMU historically ignores the IOMMU on virtio
> devices.
> 
> On x86, this isn't really a problem.  x86 has a nice way for the
> platform to describe which devices are behind an IOMMU, and QEMU
> will be adjusted accordingly.  The only thing that will break is a
> recently-added experimental mode.

Well that's not exactly true. I think we would like to make
it possible to put virtio devices behind an IOMMU on x86,
but if this means existing guests break, then many people won't be able
to use this option: having to find out which kernel version your guest
is running is a significant burden.


So on the host side, we need to detect guests that
don't program the IOMMU and make QEMU ignore it.
I think we need to figure out a way to do this
before we commit to the guest change.

Additionally, IOMMU overhead is very high when running within the VM.
So for uses such as VFIO, we'd like a way to make something like
iommu-pt the default.



> Ben's plan for powerpc is to add a quirk for existing virtio-pci
> devices and to eventually update the devicetree stuff to allow QEMU
> to tell the guest which devices use the IOMMU.
> 
> AFAICT sparc has a similar problem to powerpc.  DaveM, can you come
> up with a straightforward way to get sparc's DMA API to work
> correctly for virtio-pci devices?
> 
> NB: Sadly, the platforms I've successfully tested on don't include any
> big-endian platforms, so there could still be lurking endian problems.
> 
> Changes from v3:
>  - More big-endian fixes.
>  - Added better virtio-ring APIs that handle allocation and use them in
>virtio-mmio and virtio-pci.
>  - Switch to Michael's virtio-net patch.
> 
> Changes from v2:
>  - Fix vring_mapping_error incorrect argument
> 
> Changes from v1:
>  - Fix an endian conversion error causing a BUG to hit.
>  - Fix a DMA ordering issue (swiotlb=force works now).
>  - Minor cleanups.
> 
> Andy Lutomirski (5):
>   virtio_ring: Support DMA APIs
>   virtio_pci: Use the DMA API
>   virtio: Add improved queue allocation API
>   virtio_mmio: Use the DMA API
>   virtio_pci: Use the DMA API
> 
> Michael S. Tsirkin (1):
>   virtio-net: Stop doing DMA from the stack
> 
>  drivers/net/virtio_net.c   |  34 ++--
>  drivers/virtio/Kconfig |   2 +-
>  drivers/virtio/virtio_mmio.c   |  67 ++-
>  drivers/virtio/virtio_pci_common.h |   6 -
>  drivers/virtio/virtio_pci_legacy.c |  42 ++---
>  drivers/virtio/virtio_pci_modern.c |  61 ++-
>  drivers/virtio/virtio_ring.c   | 348 
> ++---
>  include/linux/virtio.h |  23 ++-
>  include/linux/virtio_ring.h|  35 
>  tools/virtio/linux/dma-mapping.h   |  17 ++
>  10 files changed, 426 insertions(+), 209 deletions(-)
>  create mode 100644 tools/virtio/linux/dma-mapping.h
> 
> -- 
> 2.4.3
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-09 Thread Benjamin Herrenschmidt
On Mon, 2015-11-09 at 18:18 -0800, Andy Lutomirski wrote:
> 
> /* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF.
> */
> static const struct pci_device_id virtio_pci_id_table[] = {
>     { PCI_DEVICE(0x1af4, PCI_ANY_ID) },
>     { 0 }
> };
> 
> Can we match on that range?

We can, but the problem remains, how do we differenciate an existing
device that does bypass vs. a newer one that needs the IOMMU and thus
doesn't have the new "bypass" property in the device-tree.

Cheers,
Ben.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-09 Thread Benjamin Herrenschmidt
On Mon, 2015-11-09 at 16:46 -0800, Andy Lutomirski wrote:
> The problem here is that in some of the problematic cases the virtio
> driver may not even be loaded.  If someone runs an L1 guest with an
> IOMMU-bypassing virtio device and assigns it to L2 using vfio, then
> *boom* L1 crashes.  (Same if, say, DPDK gets used, I think.)
> 
> >
> > The only way out of this while keeping the "platform" stuff would be to
> > also bump some kind of version in the virtio config (or PCI header). I
> > have no other way to differenciate between "this is an old qemu that
> > doesn't do the 'bypass property' yet" from "this is a virtio device
> > that doesn't bypass".
> >
> > Any better idea ?
> 
> I'd suggest that, in the absence of the new DT binding, we assume that
> any PCI device with the virtio vendor ID is passthrough on powerpc.  I
> can do this in the virtio driver, but if it's in the platform code
> then vfio gets it right too (i.e. fails to load).

The problem is there isn't *a* virtio vendor ID. It's the RedHat vendor
ID which will be used by more than just virtio, so we need to
specifically list the devices.

Additionally, that still means that once we have a virtio device that
actually uses the iommu, powerpc will not work since the "workaround"
above will kick in.

The "in absence of the new DT binding" doesn't make that much sense.

Those platforms use device-trees defined since the dawn of ages by
actual open firmware implementations, they either have no iommu
representation in there (Macs, the platform code hooks it all up) or
have various properties related to the iommu but no concept of "bypass"
in there.

We can *add* a new property under some circumstances that indicates a
bypass on a per-device basis, however that doesn't completely solve it:

  - As I said above, what does the absence of that property mean ? An
old qemu that does bypass on all virtio or a new qemu trying to tell
you that the virtio device actually does use the iommu (or some other
environment that isn't qemu) ?

  - On things like macs, the device-tree is generated by openbios, it
would have to have some added logic to try to figure that out, which
means it needs to know *via different means* that some or all virtio
devices bypass the iommu.

I thus go back to my original statement, it's a LOT easier to handle if
the device itself is self describing, indicating whether it is set to
bypass a host iommu or not. For L1->L2, well, that wouldn't be the
first time qemu/VFIO plays tricks with the passed through device
configuration space...

Note that the above can be solved via some kind of compromise: The
device self describes the ability to honor the iommu, along with the
property (or ACPI table entry) that indicates whether or not it does.

IE. We could use the revision or ProgIf field of the config space for
example. Or something in virtio config. If it's an "old" device, we
know it always bypass. If it's a new device, we know it only bypasses
if the corresponding property is in. I still would have to sort out the
openbios case for mac among others but it's at least a workable
direction.

BTW. Don't you have a similar problem on x86 that today qemu claims
that everything honors the iommu in ACPI ?

Unless somebody can come up with a better idea...

Cheers,
Ben.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-09 Thread Benjamin Herrenschmidt
On Mon, 2015-11-09 at 18:18 -0800, Andy Lutomirski wrote:
> 
> Which leaves the special case of Xen, where even preexisting devices
> don't bypass the IOMMU.  Can we keep this specific to powerpc and
> sparc?  On x86, this problem is basically nonexistent, since the IOMMU
> is properly self-describing.
> 
> IOW, I think that on x86 we should assume that all virtio devices
> honor the IOMMU.

You don't like performances ? :-)

Cheers,
Ben.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-09 Thread Andy Lutomirski
On Mon, Nov 9, 2015 at 9:28 PM, Benjamin Herrenschmidt
 wrote:
> On Mon, 2015-11-09 at 18:18 -0800, Andy Lutomirski wrote:
>>
>> /* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF.
>> */
>> static const struct pci_device_id virtio_pci_id_table[] = {
>> { PCI_DEVICE(0x1af4, PCI_ANY_ID) },
>> { 0 }
>> };
>>
>> Can we match on that range?
>
> We can, but the problem remains, how do we differenciate an existing
> device that does bypass vs. a newer one that needs the IOMMU and thus
> doesn't have the new "bypass" property in the device-tree.
>

We could do it the other way around: on powerpc, if a PCI device is in
that range and doesn't have the "bypass" property at all, then it's
assumed to bypass the IOMMU.  This means that everything that
currently works continues working.  If someone builds a physical
virtio device or uses another system in PCIe target mode speaking
virtio, then it won't work until they upgrade their firmware to set
bypass=0.  Meanwhile everyone using hypothetical new QEMU also gets
bypass=0 and no ambiguity.

vfio will presumably notice the bypass and correctly refuse to map any
current virtio devices.

Would that work?

--Andy
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-09 Thread Andy Lutomirski
On Mon, Nov 9, 2015 at 9:26 PM, Benjamin Herrenschmidt
 wrote:
> On Mon, 2015-11-09 at 18:18 -0800, Andy Lutomirski wrote:
>>
>> Which leaves the special case of Xen, where even preexisting devices
>> don't bypass the IOMMU.  Can we keep this specific to powerpc and
>> sparc?  On x86, this problem is basically nonexistent, since the IOMMU
>> is properly self-describing.
>>
>> IOW, I think that on x86 we should assume that all virtio devices
>> honor the IOMMU.
>
> You don't like performances ? :-)

This should have basically no effect.  Every non-experimental x86
virtio setup in existence either doesn't work at all (Xen) or has DMA
ops that are no-ops.

--Andy
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-09 Thread Andy Lutomirski
On Mon, Nov 9, 2015 at 6:04 PM, Benjamin Herrenschmidt
 wrote:
> On Mon, 2015-11-09 at 16:46 -0800, Andy Lutomirski wrote:
>> The problem here is that in some of the problematic cases the virtio
>> driver may not even be loaded.  If someone runs an L1 guest with an
>> IOMMU-bypassing virtio device and assigns it to L2 using vfio, then
>> *boom* L1 crashes.  (Same if, say, DPDK gets used, I think.)
>>
>> >
>> > The only way out of this while keeping the "platform" stuff would be to
>> > also bump some kind of version in the virtio config (or PCI header). I
>> > have no other way to differenciate between "this is an old qemu that
>> > doesn't do the 'bypass property' yet" from "this is a virtio device
>> > that doesn't bypass".
>> >
>> > Any better idea ?
>>
>> I'd suggest that, in the absence of the new DT binding, we assume that
>> any PCI device with the virtio vendor ID is passthrough on powerpc.  I
>> can do this in the virtio driver, but if it's in the platform code
>> then vfio gets it right too (i.e. fails to load).
>
> The problem is there isn't *a* virtio vendor ID. It's the RedHat vendor
> ID which will be used by more than just virtio, so we need to
> specifically list the devices.

Really?

/* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
static const struct pci_device_id virtio_pci_id_table[] = {
{ PCI_DEVICE(0x1af4, PCI_ANY_ID) },
{ 0 }
};

Can we match on that range?

>
> Additionally, that still means that once we have a virtio device that
> actually uses the iommu, powerpc will not work since the "workaround"
> above will kick in.

I don't know how to solve that problem, though, especially since the
vendor of such a device (especially if it's real hardware) might not
set any new bit.

>
> The "in absence of the new DT binding" doesn't make that much sense.
>
> Those platforms use device-trees defined since the dawn of ages by
> actual open firmware implementations, they either have no iommu
> representation in there (Macs, the platform code hooks it all up) or
> have various properties related to the iommu but no concept of "bypass"
> in there.
>
> We can *add* a new property under some circumstances that indicates a
> bypass on a per-device basis, however that doesn't completely solve it:
>
>   - As I said above, what does the absence of that property mean ? An
> old qemu that does bypass on all virtio or a new qemu trying to tell
> you that the virtio device actually does use the iommu (or some other
> environment that isn't qemu) ?
>
>   - On things like macs, the device-tree is generated by openbios, it
> would have to have some added logic to try to figure that out, which
> means it needs to know *via different means* that some or all virtio
> devices bypass the iommu.
>
> I thus go back to my original statement, it's a LOT easier to handle if
> the device itself is self describing, indicating whether it is set to
> bypass a host iommu or not. For L1->L2, well, that wouldn't be the
> first time qemu/VFIO plays tricks with the passed through device
> configuration space...

Which leaves the special case of Xen, where even preexisting devices
don't bypass the IOMMU.  Can we keep this specific to powerpc and
sparc?  On x86, this problem is basically nonexistent, since the IOMMU
is properly self-describing.

IOW, I think that on x86 we should assume that all virtio devices
honor the IOMMU.

>
> Note that the above can be solved via some kind of compromise: The
> device self describes the ability to honor the iommu, along with the
> property (or ACPI table entry) that indicates whether or not it does.
>
> IE. We could use the revision or ProgIf field of the config space for
> example. Or something in virtio config. If it's an "old" device, we
> know it always bypass. If it's a new device, we know it only bypasses
> if the corresponding property is in. I still would have to sort out the
> openbios case for mac among others but it's at least a workable
> direction.
>
> BTW. Don't you have a similar problem on x86 that today qemu claims
> that everything honors the iommu in ACPI ?

Only on a single experimental configuration, and that can apparently
just be fixed going forward without any real problems being caused.

--Andy
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-09 Thread Jan Kiszka
On 2015-11-10 03:18, Andy Lutomirski wrote:
> On Mon, Nov 9, 2015 at 6:04 PM, Benjamin Herrenschmidt
>> I thus go back to my original statement, it's a LOT easier to handle if
>> the device itself is self describing, indicating whether it is set to
>> bypass a host iommu or not. For L1->L2, well, that wouldn't be the
>> first time qemu/VFIO plays tricks with the passed through device
>> configuration space...
> 
> Which leaves the special case of Xen, where even preexisting devices
> don't bypass the IOMMU.  Can we keep this specific to powerpc and
> sparc?  On x86, this problem is basically nonexistent, since the IOMMU
> is properly self-describing.
> 
> IOW, I think that on x86 we should assume that all virtio devices
> honor the IOMMU.

>From the guest driver POV, that is OK because either there is no IOMMU
to program (the current situation with qemu), there can be one that
doesn't need it (the current situation with qemu and iommu=on) or there
is (Xen) or will be (future qemu) one that requires it.

> 
>>
>> Note that the above can be solved via some kind of compromise: The
>> device self describes the ability to honor the iommu, along with the
>> property (or ACPI table entry) that indicates whether or not it does.
>>
>> IE. We could use the revision or ProgIf field of the config space for
>> example. Or something in virtio config. If it's an "old" device, we
>> know it always bypass. If it's a new device, we know it only bypasses
>> if the corresponding property is in. I still would have to sort out the
>> openbios case for mac among others but it's at least a workable
>> direction.
>>
>> BTW. Don't you have a similar problem on x86 that today qemu claims
>> that everything honors the iommu in ACPI ?
> 
> Only on a single experimental configuration, and that can apparently
> just be fixed going forward without any real problems being caused.

BTW, I once tried to describe the current situation on QEMU x86 with
IOMMU enabled via ACPI. While you can easily add IOMMU device exceptions
to the static tables, the fun starts when considering device hotplug for
virtio. Unless I missed some trick, ACPI doesn't seem like being
designed for that level of flexibility.

You would have to reserve a complete PCI bus, declare that one as not
being IOMMU-governed, and then only add new virtio devices to that bus.
Possible, but a lot of restrictions that existing management software
would have to be aware of as well.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-09 Thread Benjamin Herrenschmidt
So ...

I've finally tried to sort that out for powerpc and I can't find a way
to make that work that isn't a complete pile of stinking shit.

I'm very tempted to go back to my original idea: virtio itself should
indicate it's "bypassing ability" via the virtio config space or some
other bit (like the ProgIf of the PCI header etc...).

I don't see how I can make it work otherwise.

The problem with the statement "it's a platform matter" is that:

  - It's not entirely correct. It's actually a feature of a specific
virtio implementation (qemu's) that it bypasses all the platform IOMMU
facilities.

  - The platforms (on powerpc there's at least 3 or 4 that have qemu
emulation and support some form of PCI) don't have an existing way to
convey the information that a device bypasses the IOMMU (if any).

  - Even if we add such a mechanism (new property in the device-tree),
we end up with a big turd: Because we need to be compatible with older
qemus, we essentially need a quirk that will make all virtio devices
assume that property is present. That will of course break whenever we
try to use another implementation of virtio on powerpc which doesn't
bypass the iommu.

We don't have a way to differenciate a virtio device that does the
bypass from one that doesn't and the backward compatibility requirement
forces us to treat all existing virtio devices as doing bypass.

The only way out of this while keeping the "platform" stuff would be to
also bump some kind of version in the virtio config (or PCI header). I
have no other way to differenciate between "this is an old qemu that
doesn't do the 'bypass property' yet" from "this is a virtio device
that doesn't bypass".

Any better idea ?

Cheers,
Ben.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-09 Thread Andy Lutomirski
On Mon, Nov 9, 2015 at 2:58 PM, Benjamin Herrenschmidt
 wrote:
> So ...
>
> I've finally tried to sort that out for powerpc and I can't find a way
> to make that work that isn't a complete pile of stinking shit.
>
> I'm very tempted to go back to my original idea: virtio itself should
> indicate it's "bypassing ability" via the virtio config space or some
> other bit (like the ProgIf of the PCI header etc...).
>
> I don't see how I can make it work otherwise.
>
> The problem with the statement "it's a platform matter" is that:
>
>   - It's not entirely correct. It's actually a feature of a specific
> virtio implementation (qemu's) that it bypasses all the platform IOMMU
> facilities.
>
>   - The platforms (on powerpc there's at least 3 or 4 that have qemu
> emulation and support some form of PCI) don't have an existing way to
> convey the information that a device bypasses the IOMMU (if any).
>
>   - Even if we add such a mechanism (new property in the device-tree),
> we end up with a big turd: Because we need to be compatible with older
> qemus, we essentially need a quirk that will make all virtio devices
> assume that property is present. That will of course break whenever we
> try to use another implementation of virtio on powerpc which doesn't
> bypass the iommu.
>
> We don't have a way to differenciate a virtio device that does the
> bypass from one that doesn't and the backward compatibility requirement
> forces us to treat all existing virtio devices as doing bypass.

The problem here is that in some of the problematic cases the virtio
driver may not even be loaded.  If someone runs an L1 guest with an
IOMMU-bypassing virtio device and assigns it to L2 using vfio, then
*boom* L1 crashes.  (Same if, say, DPDK gets used, I think.)

>
> The only way out of this while keeping the "platform" stuff would be to
> also bump some kind of version in the virtio config (or PCI header). I
> have no other way to differenciate between "this is an old qemu that
> doesn't do the 'bypass property' yet" from "this is a virtio device
> that doesn't bypass".
>
> Any better idea ?

I'd suggest that, in the absence of the new DT binding, we assume that
any PCI device with the virtio vendor ID is passthrough on powerpc.  I
can do this in the virtio driver, but if it's in the platform code
then vfio gets it right too (i.e. fails to load).

--Andy
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization