Re: [PATCH v14 00/16] KVM PCIe/MSI passthrough on ARM/ARM64

2016-10-21 Thread Auger Eric
Hi Will,

On 20/10/2016 19:32, Will Deacon wrote:
> Hi Eric,
> 
> Thanks for posting this.
> 
> On Wed, Oct 12, 2016 at 01:22:08PM +, Eric Auger wrote:
>> This is the second respin on top of Robin's series [1], addressing Alex' 
>> comments.
>>
>> Major changes are:
>> - MSI-doorbell API now is moved to DMA IOMMU API following Alex suggestion
>>   to put all API pieces at the same place (so eventually in the IOMMU
>>   subsystem)
>> - new iommu_domain_msi_resv struct and accessor through DOMAIN_ATTR_MSI_RESV
>>   domain with mirror VFIO capability
>> - more robustness I think in the VFIO layer
>> - added "iommu/iova: fix __alloc_and_insert_iova_range" since with the 
>> current
>>   code I failed allocating an IOVA page in a single page domain with upper 
>> part
>>   reserved
>>
>> IOVA range exclusion will be handled in a separate series
>>
>> The priority really is to discuss and freeze the API and especially the MSI
>> doorbell's handling. Do we agree to put that in DMA IOMMU?
>>
>> Note: the size computation does not take into account possible page overlaps
>> between doorbells but it would add quite a lot of complexity i think.
>>
>> Tested on AMD Overdrive (single GICv2m frame) with I350 VF assignment.
> 
> Marc, Robin and I sat down and had a look at the series and, whilst it's
> certainly addressing a problem that we desperately want to see fixed, we
> think that it's slightly over-engineering in places and could probably
> be simplified in the interest of getting something upstream that can be
> used as a base, on which the ABI can be extended as concrete use-cases
> become clear.
> 
> Stepping back a minute, we're trying to reserve some of the VFIO virtual
> address space so that it can be used by devices to map their MSI doorbells
> using the SMMU. With your patches, this requires that (a) the kernel
> tells userspace about the size and alignment of the doorbell region
> (MSI_RESV) and (b) userspace tells the kernel the VA-range that can be
> used (RESERVED_MSI_IOVA).
> 
> However, this is all special-cased for MSI doorbells and there are
> potentially other regions of the VFIO address space that are reserved
> and need to be communicated to userspace as well. We already know of
> hardware where the PCI RC intercepts p2p accesses before they make it
> to the SMMU, and other hardware where the MSI doorbell is at a fixed
> address. This means that we need a mechanism to communicate *fixed*
> regions of virtual address space that are reserved by VFIO. I don't
> even particularly care if VFIO_MAP_DMA enforces that, but we do need
> a way to tell userspace "hey, you don't want to put memory here because
> it won't work well with devices".

I think we all agree on this. Exposing an API to the user space
reporting *fixed* reserved IOVA ranges is a requirement anyway. The
problem was quite clearly stated by Alex in
http://lkml.iu.edu/hypermail/linux/kernel/1610.0/03308.html
(VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE)

I started working on this VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
capability but to me and I think according to Alex, it was a different
API from MSI_RESV.

> 
> In that case, we end up with something like your MSI_RESV capability,
> but actually specifying a virtual address range that is simply not to
> be used by MAP_DMA -- we don't say anything about MSIs. Now, taking this
> to its logical conclusion, we no longer need to distinguish between
> remappable reserved regions and fixed reserved regions in the ABI.
> Instead, we can have the kernel allocate the virtual address space for
> the remappable reserved regions (probably somewhere in the bottom 4GB)
> and expose them via the capability.


If I understand correctly you want the host to arbitrarily choose where
it puts the iovas reserved for MSI and not ask the userspace.

Well so we are back to the discussions we had in Dec 2015 (see Marc's
answer in http://thread.gmane.org/gmane.comp.emulators.kvm.arm.devel/3858).

- So I guess you will init an iova_domain seomewhere below the 4GB to
allocate the MSIs. what size are you going to choose. Don't you have the
same need to dimension the iova range.
- we still need to assess the MSI assignment safety. How will we compute
safety for VFIO?

 This simplifies things in the
> following ways:
> 
>   * You don't need to keep track of MSI vs DMA addresses in the VFIO rbtree
right: I guess you rely on iommu_map to return an error in case the iova
is already mapped somewhere else.
>   * You don't need to try collapsing doorbells into a single region
why? at host level I guess you will init a single iova domain?
>   * You don't need a special MAP flavour to map MSI doorbells
right
>   * The ABI is reusable for PCI p2p and fixed doorbells
right

Aren't we moving the issue at user-space? Currently QEMU mach-virt
address space is fully static. Adapting mach-virt to adjust to host
constraints is not straightforward. It is simple to reject the
assignment in case of collision but more difficult to react 

Re: [PATCH v14 00/16] KVM PCIe/MSI passthrough on ARM/ARM64

2016-10-17 Thread Punit Agrawal
Auger Eric  writes:

> Hi Punit,
>
> On 14/10/2016 13:24, Punit Agrawal wrote:
>> Hi Eric,
>> 
>> I am a bit late in joining, but I've tried to familiarise
>> myself with earlier discussions on the series.
>> 
>> Eric Auger  writes:
>> 
>>> This is the second respin on top of Robin's series [1], addressing Alex' 
>>> comments.
>>>
>>> Major changes are:
>>> - MSI-doorbell API now is moved to DMA IOMMU API following Alex suggestion
>>>   to put all API pieces at the same place (so eventually in the IOMMU
>>>   subsystem)
>> 
>> IMHO, this is headed in the opposite direction, i.e., away from the
>> owner of the information - the doorbells are the property of the MSI
>> controller. The MSI controllers know the location, size and interrupt
>> remapping capability as well. On the consumer side, VFIO needs access to
>> the doorbells to allow userspace to carve out a region in the IOVA.
>> 
>> I quite liked what you had in v13, though I think you can go further
>> though. Instead of adding new doorbell API [un]registration calls, how
>> about adding a callback to the irq_domain_ops? The callback will be
>> populated for irqdomains registered by MSI controllers.
>
> Thank you for jumping into that thread. Any help/feedback is greatly
> appreciated.
>
> Regarding your suggestion, the irq_domain looks dedicated to the
> translation between linux irq and HW irq. I tend to think adding an ops
> to retrieve the MSI doorbell info at that level looks far from the
> original goal of the infrastructure. Obviously the fact there is a list
> of such domain is tempting but I preferred to add a separate struct and
> separate list.

Guilty as charged - the suggestion was definitely borne out of
convenience. I originally looked at adding this functionality to the MSI
domains but couldn't find an obvious way to enumerate all of them.

Also, the way the domain hierarchy is setup for MSI controllers, the
actual doorbell is not available to the MSI domains.

>
> In the v14 release I moved the "doorbell API" in the dma-iommu API since
> Alex recommended to offer a unified API where all pieces would be at the
> same place.
>
> Anyway I will follow the guidance of maintainers.

Ok, makes sense.

>
>
>> 
>> From VFIO, we can calculate the required aperture reservation by
>> iterating over the irqdomains (something like irq_domain_for_each). The
>> same callback can also provide information about support for interrupt
>> remapping.
>> 
>> For systems where there are no separate MSI controllers, i.e., the IOMMU
>> has a fixed reservation, no MSI callbacks will be populated - which
>> tells userspace that no separate MSI reservation is required. IIUC, this
>> was one of Alex' concerns on the prior version.
>
> I'am working on a separate series to report to the user-space the usable
> IOVA range(s).

Is that the alternative to reporting the aperture size that needs to be
reserved?

>
> Thanks
>
> Eric
>> 
>> Thoughts, opinions?
>> 
>> Punit
>> 
>>> - new iommu_domain_msi_resv struct and accessor through DOMAIN_ATTR_MSI_RESV
>>>   domain with mirror VFIO capability
>>> - more robustness I think in the VFIO layer
>>> - added "iommu/iova: fix __alloc_and_insert_iova_range" since with the 
>>> current
>>>   code I failed allocating an IOVA page in a single page domain with upper 
>>> part
>>>   reserved
>>>
>>> IOVA range exclusion will be handled in a separate series
>>>
>>> The priority really is to discuss and freeze the API and especially the MSI
>>> doorbell's handling. Do we agree to put that in DMA IOMMU?
>>>
>>> Note: the size computation does not take into account possible page overlaps
>>> between doorbells but it would add quite a lot of complexity i think.
>>>
>>> Tested on AMD Overdrive (single GICv2m frame) with I350 VF assignment.
>>>
>>> dependency:
>>> the series depends on Robin's generic-v7 branch:
>>> [1] [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU
>>> http://www.spinics.net/lists/arm-kernel/msg531110.html
>>>
>>> Best Regards
>>>
>>> Eric
>>>
>>> Git: complete series available at
>>> https://github.com/eauger/linux/tree/generic-v7-pcie-passthru-v14
>>>
>>> the above branch includes a temporary patch to work around a ThunderX pci
>>> bus reset crash (which I think unrelated to this series):
>>> "vfio: pci: HACK! workaround thunderx pci_try_reset_bus crash"
>>> Do not take this one for other platforms.
>>>
>>>
>>> Eric Auger (15):
>>>   iommu/iova: fix __alloc_and_insert_iova_range
>>>   iommu: Introduce DOMAIN_ATTR_MSI_RESV
>>>   iommu/dma: MSI doorbell alloc/free
>>>   iommu/dma: Introduce iommu_calc_msi_resv
>>>   iommu/arm-smmu: Implement domain_get_attr for DOMAIN_ATTR_MSI_RESV
>>>   irqchip/gic-v2m: Register the MSI doorbell
>>>   irqchip/gicv3-its: Register the MSI doorbell
>>>   vfio: Introduce a vfio_dma type field
>>>   vfio/type1: vfio_find_dma accepting a type argument
>>>   vfio/type1: Implement recursive vfio_find_dma_from_node
>>>   vfio/type1: 

Re: [PATCH v14 00/16] KVM PCIe/MSI passthrough on ARM/ARM64

2016-10-17 Thread Auger Eric
Hi Punit,

On 14/10/2016 13:24, Punit Agrawal wrote:
> Hi Eric,
> 
> I am a bit late in joining, but I've tried to familiarise
> myself with earlier discussions on the series.
> 
> Eric Auger  writes:
> 
>> This is the second respin on top of Robin's series [1], addressing Alex' 
>> comments.
>>
>> Major changes are:
>> - MSI-doorbell API now is moved to DMA IOMMU API following Alex suggestion
>>   to put all API pieces at the same place (so eventually in the IOMMU
>>   subsystem)
> 
> IMHO, this is headed in the opposite direction, i.e., away from the
> owner of the information - the doorbells are the property of the MSI
> controller. The MSI controllers know the location, size and interrupt
> remapping capability as well. On the consumer side, VFIO needs access to
> the doorbells to allow userspace to carve out a region in the IOVA.
> 
> I quite liked what you had in v13, though I think you can go further
> though. Instead of adding new doorbell API [un]registration calls, how
> about adding a callback to the irq_domain_ops? The callback will be
> populated for irqdomains registered by MSI controllers.

Thank you for jumping into that thread. Any help/feedback is greatly
appreciated.

Regarding your suggestion, the irq_domain looks dedicated to the
translation between linux irq and HW irq. I tend to think adding an ops
to retrieve the MSI doorbell info at that level looks far from the
original goal of the infrastructure. Obviously the fact there is a list
of such domain is tempting but I preferred to add a separate struct and
separate list.

In the v14 release I moved the "doorbell API" in the dma-iommu API since
Alex recommended to offer a unified API where all pieces would be at the
same place.

Anyway I will follow the guidance of maintainers.


> 
> From VFIO, we can calculate the required aperture reservation by
> iterating over the irqdomains (something like irq_domain_for_each). The
> same callback can also provide information about support for interrupt
> remapping.
> 
> For systems where there are no separate MSI controllers, i.e., the IOMMU
> has a fixed reservation, no MSI callbacks will be populated - which
> tells userspace that no separate MSI reservation is required. IIUC, this
> was one of Alex' concerns on the prior version.

I'am working on a separate series to report to the user-space the usable
IOVA range(s).

Thanks

Eric
> 
> Thoughts, opinions?
> 
> Punit
> 
>> - new iommu_domain_msi_resv struct and accessor through DOMAIN_ATTR_MSI_RESV
>>   domain with mirror VFIO capability
>> - more robustness I think in the VFIO layer
>> - added "iommu/iova: fix __alloc_and_insert_iova_range" since with the 
>> current
>>   code I failed allocating an IOVA page in a single page domain with upper 
>> part
>>   reserved
>>
>> IOVA range exclusion will be handled in a separate series
>>
>> The priority really is to discuss and freeze the API and especially the MSI
>> doorbell's handling. Do we agree to put that in DMA IOMMU?
>>
>> Note: the size computation does not take into account possible page overlaps
>> between doorbells but it would add quite a lot of complexity i think.
>>
>> Tested on AMD Overdrive (single GICv2m frame) with I350 VF assignment.
>>
>> dependency:
>> the series depends on Robin's generic-v7 branch:
>> [1] [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU
>> http://www.spinics.net/lists/arm-kernel/msg531110.html
>>
>> Best Regards
>>
>> Eric
>>
>> Git: complete series available at
>> https://github.com/eauger/linux/tree/generic-v7-pcie-passthru-v14
>>
>> the above branch includes a temporary patch to work around a ThunderX pci
>> bus reset crash (which I think unrelated to this series):
>> "vfio: pci: HACK! workaround thunderx pci_try_reset_bus crash"
>> Do not take this one for other platforms.
>>
>>
>> Eric Auger (15):
>>   iommu/iova: fix __alloc_and_insert_iova_range
>>   iommu: Introduce DOMAIN_ATTR_MSI_RESV
>>   iommu/dma: MSI doorbell alloc/free
>>   iommu/dma: Introduce iommu_calc_msi_resv
>>   iommu/arm-smmu: Implement domain_get_attr for DOMAIN_ATTR_MSI_RESV
>>   irqchip/gic-v2m: Register the MSI doorbell
>>   irqchip/gicv3-its: Register the MSI doorbell
>>   vfio: Introduce a vfio_dma type field
>>   vfio/type1: vfio_find_dma accepting a type argument
>>   vfio/type1: Implement recursive vfio_find_dma_from_node
>>   vfio/type1: Handle unmap/unpin and replay for VFIO_IOVA_RESERVED slots
>>   vfio: Allow reserved msi iova registration
>>   vfio/type1: Check doorbell safety
>>   iommu/arm-smmu: Do not advertise IOMMU_CAP_INTR_REMAP
>>   vfio/type1: Introduce MSI_RESV capability
>>
>> Robin Murphy (1):
>>   iommu/dma: Allow MSI-only cookies
>>
>>  drivers/iommu/Kconfig|   4 +-
>>  drivers/iommu/arm-smmu-v3.c  |  10 +-
>>  drivers/iommu/arm-smmu.c |  10 +-
>>  drivers/iommu/dma-iommu.c| 184 ++
>>  drivers/iommu/iova.c |   2 +-
>>  drivers/irqchip/irq-gic-v2m.c

Re: [PATCH v14 00/16] KVM PCIe/MSI passthrough on ARM/ARM64

2016-10-14 Thread Punit Agrawal
Hi Eric,

I am a bit late in joining, but I've tried to familiarise
myself with earlier discussions on the series.

Eric Auger  writes:

> This is the second respin on top of Robin's series [1], addressing Alex' 
> comments.
>
> Major changes are:
> - MSI-doorbell API now is moved to DMA IOMMU API following Alex suggestion
>   to put all API pieces at the same place (so eventually in the IOMMU
>   subsystem)

IMHO, this is headed in the opposite direction, i.e., away from the
owner of the information - the doorbells are the property of the MSI
controller. The MSI controllers know the location, size and interrupt
remapping capability as well. On the consumer side, VFIO needs access to
the doorbells to allow userspace to carve out a region in the IOVA.

I quite liked what you had in v13, though I think you can go further
though. Instead of adding new doorbell API [un]registration calls, how
about adding a callback to the irq_domain_ops? The callback will be
populated for irqdomains registered by MSI controllers.

>From VFIO, we can calculate the required aperture reservation by
iterating over the irqdomains (something like irq_domain_for_each). The
same callback can also provide information about support for interrupt
remapping.

For systems where there are no separate MSI controllers, i.e., the IOMMU
has a fixed reservation, no MSI callbacks will be populated - which
tells userspace that no separate MSI reservation is required. IIUC, this
was one of Alex' concerns on the prior version.

Thoughts, opinions?

Punit

> - new iommu_domain_msi_resv struct and accessor through DOMAIN_ATTR_MSI_RESV
>   domain with mirror VFIO capability
> - more robustness I think in the VFIO layer
> - added "iommu/iova: fix __alloc_and_insert_iova_range" since with the current
>   code I failed allocating an IOVA page in a single page domain with upper 
> part
>   reserved
>
> IOVA range exclusion will be handled in a separate series
>
> The priority really is to discuss and freeze the API and especially the MSI
> doorbell's handling. Do we agree to put that in DMA IOMMU?
>
> Note: the size computation does not take into account possible page overlaps
> between doorbells but it would add quite a lot of complexity i think.
>
> Tested on AMD Overdrive (single GICv2m frame) with I350 VF assignment.
>
> dependency:
> the series depends on Robin's generic-v7 branch:
> [1] [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU
> http://www.spinics.net/lists/arm-kernel/msg531110.html
>
> Best Regards
>
> Eric
>
> Git: complete series available at
> https://github.com/eauger/linux/tree/generic-v7-pcie-passthru-v14
>
> the above branch includes a temporary patch to work around a ThunderX pci
> bus reset crash (which I think unrelated to this series):
> "vfio: pci: HACK! workaround thunderx pci_try_reset_bus crash"
> Do not take this one for other platforms.
>
>
> Eric Auger (15):
>   iommu/iova: fix __alloc_and_insert_iova_range
>   iommu: Introduce DOMAIN_ATTR_MSI_RESV
>   iommu/dma: MSI doorbell alloc/free
>   iommu/dma: Introduce iommu_calc_msi_resv
>   iommu/arm-smmu: Implement domain_get_attr for DOMAIN_ATTR_MSI_RESV
>   irqchip/gic-v2m: Register the MSI doorbell
>   irqchip/gicv3-its: Register the MSI doorbell
>   vfio: Introduce a vfio_dma type field
>   vfio/type1: vfio_find_dma accepting a type argument
>   vfio/type1: Implement recursive vfio_find_dma_from_node
>   vfio/type1: Handle unmap/unpin and replay for VFIO_IOVA_RESERVED slots
>   vfio: Allow reserved msi iova registration
>   vfio/type1: Check doorbell safety
>   iommu/arm-smmu: Do not advertise IOMMU_CAP_INTR_REMAP
>   vfio/type1: Introduce MSI_RESV capability
>
> Robin Murphy (1):
>   iommu/dma: Allow MSI-only cookies
>
>  drivers/iommu/Kconfig|   4 +-
>  drivers/iommu/arm-smmu-v3.c  |  10 +-
>  drivers/iommu/arm-smmu.c |  10 +-
>  drivers/iommu/dma-iommu.c| 184 ++
>  drivers/iommu/iova.c |   2 +-
>  drivers/irqchip/irq-gic-v2m.c|  10 +-
>  drivers/irqchip/irq-gic-v3-its.c |  13 ++
>  drivers/vfio/vfio_iommu_type1.c  | 279 
> +--
>  include/linux/dma-iommu.h|  59 +
>  include/linux/iommu.h|   8 ++
>  include/uapi/linux/vfio.h|  30 -
>  11 files changed, 587 insertions(+), 22 deletions(-)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v14 00/16] KVM PCIe/MSI passthrough on ARM/ARM64

2016-10-12 Thread Eric Auger
This is the second respin on top of Robin's series [1], addressing Alex' 
comments.

Major changes are:
- MSI-doorbell API now is moved to DMA IOMMU API following Alex suggestion
  to put all API pieces at the same place (so eventually in the IOMMU
  subsystem)
- new iommu_domain_msi_resv struct and accessor through DOMAIN_ATTR_MSI_RESV
  domain with mirror VFIO capability
- more robustness I think in the VFIO layer
- added "iommu/iova: fix __alloc_and_insert_iova_range" since with the current
  code I failed allocating an IOVA page in a single page domain with upper part
  reserved

IOVA range exclusion will be handled in a separate series

The priority really is to discuss and freeze the API and especially the MSI
doorbell's handling. Do we agree to put that in DMA IOMMU?

Note: the size computation does not take into account possible page overlaps
between doorbells but it would add quite a lot of complexity i think.

Tested on AMD Overdrive (single GICv2m frame) with I350 VF assignment.

dependency:
the series depends on Robin's generic-v7 branch:
[1] [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU
http://www.spinics.net/lists/arm-kernel/msg531110.html

Best Regards

Eric

Git: complete series available at
https://github.com/eauger/linux/tree/generic-v7-pcie-passthru-v14

the above branch includes a temporary patch to work around a ThunderX pci
bus reset crash (which I think unrelated to this series):
"vfio: pci: HACK! workaround thunderx pci_try_reset_bus crash"
Do not take this one for other platforms.


Eric Auger (15):
  iommu/iova: fix __alloc_and_insert_iova_range
  iommu: Introduce DOMAIN_ATTR_MSI_RESV
  iommu/dma: MSI doorbell alloc/free
  iommu/dma: Introduce iommu_calc_msi_resv
  iommu/arm-smmu: Implement domain_get_attr for DOMAIN_ATTR_MSI_RESV
  irqchip/gic-v2m: Register the MSI doorbell
  irqchip/gicv3-its: Register the MSI doorbell
  vfio: Introduce a vfio_dma type field
  vfio/type1: vfio_find_dma accepting a type argument
  vfio/type1: Implement recursive vfio_find_dma_from_node
  vfio/type1: Handle unmap/unpin and replay for VFIO_IOVA_RESERVED slots
  vfio: Allow reserved msi iova registration
  vfio/type1: Check doorbell safety
  iommu/arm-smmu: Do not advertise IOMMU_CAP_INTR_REMAP
  vfio/type1: Introduce MSI_RESV capability

Robin Murphy (1):
  iommu/dma: Allow MSI-only cookies

 drivers/iommu/Kconfig|   4 +-
 drivers/iommu/arm-smmu-v3.c  |  10 +-
 drivers/iommu/arm-smmu.c |  10 +-
 drivers/iommu/dma-iommu.c| 184 ++
 drivers/iommu/iova.c |   2 +-
 drivers/irqchip/irq-gic-v2m.c|  10 +-
 drivers/irqchip/irq-gic-v3-its.c |  13 ++
 drivers/vfio/vfio_iommu_type1.c  | 279 +--
 include/linux/dma-iommu.h|  59 +
 include/linux/iommu.h|   8 ++
 include/uapi/linux/vfio.h|  30 -
 11 files changed, 587 insertions(+), 22 deletions(-)

-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu