Re: [PATCH v9 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes

2016-06-09 Thread Auger Eric
Alex,
> On Wed, 8 Jun 2016 10:29:35 +0200
> Auger Eric <eric.au...@linaro.org> wrote:
> 
>> Dear all,
>> Le 20/05/2016 à 18:01, Eric Auger a écrit :
>>> Alex, Robin,
>>>
>>> While my 3-part series primarily addresses the problem of mapping
>>> MSI doorbells into arm-smmu, it falls short in:
>>>
>>> 1) determining whether the MSI controller is downstream or upstream to
>>> the IOMMU,  
>>> => indicates whether the MSI doorbell must be mapped
>>> => participates in the decision about 2)  
>>>
>>> 2) determining whether it is safe to assign a PCIe device.
>>>
>>> I think we share this understanding with Robin. All above of course
>>> stands for ARM.
>>>
>>> I am stuck on those 2 issues and I have a few questions about iommu
>>> group setup, PCIe, and iommu DT/ACPI description. I would be grateful
>>> if you could answer some of those questions and advise on the
>>> strategy to fix them.
>>
>> gentle reminder about the questions below; I hope I did not miss any reply.
>> If anybody has some time to spend on this topic...
>>
>>>
>>> Best Regards
>>>
>>> Eric
>>>
>>> QUESTIONS:
>>>
>>> 1) Robin, you pointed some host controllers which also are MSI
>>> controllers
>>> (http://thread.gmane.org/gmane.linux.kernel.pci/47174/focus=47268). In
>>> that case MSIs never reach the IOMMU. I failed to find anything about
>>> MSIs in the PCIe ACS spec. What should the iommu groups be in that
>>> situation? Isn't the upstream code able to see that some DMA transfers
>>> are not properly isolated and alias the devices into the same group?
>>> According to your security warning, Alex, I would think the code does
>>> not recognize it; can you confirm please?
>> my current understanding is that endpoints would be in separate groups
>> (assuming ACS support) although the MSI controller frame is not properly protected.
> 
> We don't currently consider MSI differently from other DMA and we don't
> currently have any sort of concept of a device within the intermediate
> fabric as being a DMA target.  We expect fabric devices to only be
> transaction routers.  We use ACS to determine whether there's any
> possibility of DMA being redirected before it reaches the IOMMU, but it
> seems that a DMA being consumed by an interrupt controller before it
> reaches the IOMMU would be another cause for an isolation breach.
>  
OK thank you for the confirmation
>>> 2) can other PCIe components be MSI controllers?
> 
> I'm not even entirely sure what this means.  Would a DMA write from an
> endpoint target the MMIO space of an intermediate, fabric device?
With the example provided by Robin we have a host controller acting as
an MSI controller. I wondered whether we could have other fabric
devices (downstream of the host controller in PCIe terminology) also
acting as MSI controllers.
>  
>>> 3) Am I obliged to consider arbitrary topologies where an MSI controller
>>> stands between the PCIe host and the iommu? in the PCIe space or
>>> platform space? If this only relates to PCIe couldn' I check if an MSI
>>> controller exists in the PCIe tree?  
>> In my last series, I consider the assignment of platform devices unsafe as
>> soon as there is a GICv2m. This is a change in the user experience compared
>> to what we had before.
> 
> If the MSI controller is downstream of our DMA translation, it doesn't
> seem like we have much choice but to mark it unsafe.  The endpoint is
> fully able to attempt to exploit it.
OK, the original question was related to non-PCIe topologies:

- we know some PCIe fabric topologies where the PCIe host controller
implements the MSI controller.
- Shall we be prepared to address the same kind of issues with platform
MSI controllers? Are there some SoCs where we would put an unsafe platform
MSI controller before IOMMU translation? Or do we consider that a
platform topology we don't support for assignment?

>  
>>> 4) Robin suggested in a private thread to enumerate through a list of
>>> "registered" doorbells and, if any belongs to an unsafe MSI controller,
>>> to consider the assignment unsafe. This would be a first step before
>>> doing something more complex. Alex, would that be acceptable to you for
>>> issue #2?
>> I implemented this technique in my last series while waiting for more
>> discussion on 4, 5.
> 
> Seems sufficient.  I don't mind taking a broad swing versus all the
> extra complexity of defining which devices are safe vs unsafe.

Re: [RESEND PATCH v2 0/6] vfio-pci: Add support for mmapping MSI-X table

2016-06-08 Thread Auger Eric
Hi Yongji,

Le 02/06/2016 à 08:09, Yongji Xie a écrit :
> The current vfio-pci implementation disallows mmapping the page
> containing the MSI-X table, since users could write directly
> to the MSI-X table and generate incorrect MSIs.
> 
> However, this causes a performance issue when there
> are critical device registers in the same page as the
> MSI-X table. We have to handle MMIO accesses to these
> registers in QEMU emulation rather than in the guest.
> 
> To solve this issue, this series allows the MSI-X table to be exposed
> to userspace when the hardware provides interrupt
> remapping, which ensures that a given PCI device can only
> trigger the MSIs assigned to it. We introduce a new bus flag,
> PCI_BUS_FLAGS_MSI_REMAP, to test this capability on the PCI side
> for different archs.
> 
> Patch 3 is based on the proposed patchset[1].
You may have noticed I sent a respin of [1] yesterday:
http://www.gossamer-threads.com/lists/linux/kernel/2455187.

Unfortunately you will see I removed the patch defining the new
msi_domain_info MSI_FLAG_IRQ_REMAPPING flag you rely on in this series.
I did so because I was not using it anymore. At the beginning it was
used to detect whether the MSI assignment was safe, but this
method also covered cases where the MSI controller was
upstream of the IOMMU. So now I rely on a mechanism where MSI controllers
are supposed to register their MSI doorbells and tag whether they are safe.
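As a rough sketch, the registration mechanism described above could be modeled as below. The names (msi_doorbell_register(), msi_doorbell_safe(), struct msi_doorbell) and the fixed-size array are assumptions for illustration only, not the actual API of the series, which keeps doorbells on a mutex-protected list:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative doorbell descriptor: names and layout are assumptions. */
struct msi_doorbell {
	unsigned long long base;	/* doorbell physical address */
	size_t size;			/* size of the doorbell frame */
	bool irq_remapping;		/* does the controller isolate MSIs? */
};

#define MAX_DOORBELLS 8
static struct msi_doorbell doorbells[MAX_DOORBELLS];
static int nr_doorbells;

/* Each MSI controller registers its doorbell and tags whether it is safe. */
static int msi_doorbell_register(unsigned long long base, size_t size,
				 bool irq_remapping)
{
	if (nr_doorbells >= MAX_DOORBELLS)
		return -1;
	doorbells[nr_doorbells].base = base;
	doorbells[nr_doorbells].size = size;
	doorbells[nr_doorbells].irq_remapping = irq_remapping;
	nr_doorbells++;
	return 0;
}

/* MSI assignment is only considered safe if every registered doorbell is. */
static bool msi_doorbell_safe(void)
{
	for (int i = 0; i < nr_doorbells; i++)
		if (!doorbells[i].irq_remapping)
			return false;
	return true;
}
```

A single unsafe doorbell (e.g. a GICv2m frame with no IRQ remapping) is enough for msi_doorbell_safe() to report the whole assignment as unsafe.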

I don't know yet how this change will be welcomed though. Depending
on reviews/discussions, we might revert to the previous flag.

If you need the feature you can embed the patches you use in your series and
follow the review process separately. Sorry for the setback.

Best Regards

Eric
> 
> Changelog v2: 
> - Make the commit log more clear
> - Replace pci_bus_check_msi_remapping() with pci_bus_msi_isolated()
>   so that we could clearly know what the function does
> - Set PCI_BUS_FLAGS_MSI_REMAP in pci_create_root_bus() instead
>   of iommu_bus_notifier()
> - Reserve VFIO_REGION_INFO_FLAG_CAPS when we allow to mmap MSI-X
>   table so that we can know whether we allow to mmap MSI-X table
>   in QEMU
> 
> [1] 
> https://www.mail-archive.com/linux-kernel%40vger.kernel.org/msg1138820.html
> 
> Yongji Xie (6):
>   PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag
>   PCI: Set PCI_BUS_FLAGS_MSI_REMAP if MSI controller enables IRQ remapping
>   PCI: Set PCI_BUS_FLAGS_MSI_REMAP if IOMMU have capability of IRQ remapping
>   iommu: Set PCI_BUS_FLAGS_MSI_REMAP on iommu driver initialization
>   pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge
>   vfio-pci: Allow to expose MSI-X table to userspace if interrupt remapping is enabled
> 
>  arch/powerpc/platforms/powernv/pci-ioda.c |8 
>  drivers/iommu/iommu.c |8 
>  drivers/pci/msi.c |   15 +++
>  drivers/pci/probe.c   |7 +++
>  drivers/vfio/pci/vfio_pci.c   |   17 ++---
>  drivers/vfio/pci/vfio_pci_rdwr.c  |3 ++-
>  include/linux/msi.h   |5 -
>  include/linux/pci.h   |1 +
>  8 files changed, 59 insertions(+), 5 deletions(-)
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 4/8] vfio/type1: handle unmap/unpin and replay for VFIO_IOVA_RESERVED slots

2016-06-07 Thread Auger Eric
Hello,
Le 07/06/2016 à 18:44, kbuild test robot a écrit :
> Hi,
> 
> [auto build test ERROR on vfio/next]
> [also build test ERROR on v4.7-rc2 next-20160607]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Eric-Auger/KVM-PCIe-MSI-passthrough-on-ARM-ARM64-kernel-part-3-3-vfio-changes/20160608-001148
> base:   https://github.com/awilliam/linux-vfio.git next
> config: x86_64-rhel (attached as .config)
> compiler: gcc-4.9 (Debian 4.9.3-14) 4.9.3
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64 
> 
> All errors (new ones prefixed by >>):
> 
>>> drivers/vfio/vfio_iommu_type1.c:39:29: fatal error: linux/msi-iommu.h: No 
>>> such file or directory
> #include <linux/msi-iommu.h>
This is due to the dependency on part 2 (MSI layer changes).

Best Regards

Eric
> ^
>compilation terminated.
> 
> vim +39 drivers/vfio/vfio_iommu_type1.c
> 
> 33#include 
> 34#include 
> 35#include 
> 36#include 
> 37#include 
> 38#include 
>   > 39#include <linux/msi-iommu.h>
> 40
> 41#define DRIVER_VERSION  "0.2"
> 42#define DRIVER_AUTHOR   "Alex Williamson 
> "
> 
> ---
> 0-DAY kernel test infrastructure            Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
> 


Re: [PATCH] iommu/arm-smmu: request pcie devices to enable ACS

2016-06-13 Thread Auger Eric
Wei,
Le 13/06/2016 à 13:18, Robin Murphy a écrit :
> On 13/06/16 10:20, Wei Chen wrote:
>> The PCIe ACS capability affects the layout of iommu groups.
>> Generally speaking, if the path from the root port to a PCIe device
>> is ACS enabled, the iommu will create a single iommu group for that
>> PCIe device. If all PCIe devices on the path are ACS enabled then
>> Linux can determine that the path is ACS enabled.
>>
>> Linux uses two PCIe configuration registers to determine the ACS
>> status of PCIe devices:
>> the ACS Capability Register and the ACS Control Register.
>>
>> The first register is used to check whether a PCIe device implements
>> the ACS function; the second is used to check whether the ACS function
>> is enabled. If a PCIe device has both implemented and enabled the ACS
>> function, Linux considers ACS enabled for that device.
>>
>> From Chapter 6.12 of the PCI Express Base Specification Revision 3.1a,
>> we can see that when a PCIe device implements the ACS function, the enable
>> status is disabled by default and can be enabled by ACS-aware
>> software.
>>
>> ACS affects the iommu group topology, so the iommu driver is
>> ACS-aware software. This patch adds a call to pci_request_acs() to the
>> arm-smmu driver to enable the ACS function in PCIe devices that support
>> it.
nit: I would add ", when they get probed."

Besides Reviewed-by: Eric Auger 

Best Regards

Eric
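As an aside, the per-path rule in the quoted commit message (a device is only isolated if every device from the root port down both implements and enables ACS) can be sketched as follows; struct pci_dev_model is purely illustrative and not the kernel's struct pci_dev:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative device node: only the two ACS register facts we need. */
struct pci_dev_model {
	bool acs_implemented;		/* ACS Capability Register: supported */
	bool acs_enabled;		/* ACS Control Register: switched on */
	struct pci_dev_model *parent;	/* upstream device, NULL at the root */
};

/*
 * A device's path is ACS-isolated only if every hop up to the root
 * both implements ACS and has had it enabled by ACS-aware software.
 */
static bool path_is_acs_enabled(const struct pci_dev_model *dev)
{
	for (; dev; dev = dev->parent)
		if (!dev->acs_implemented || !dev->acs_enabled)
			return false;
	return true;
}
```

Since the spec leaves ACS disabled by default, calling pci_request_acs() before devices get probed is what flips the enable bit for capable devices, and with it the resulting iommu group layout.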
>>
>> Signed-off-by: Wei Chen 
> 
> Makes sense to me:
> 
> Reviewed-by: Robin Murphy 
> 
> p.s. The confidential disclaimer is a good way to get patches ignored
> here on the lists - please check with Steve about getting set up on the
> appropriate SMTP server.
> 
> Robin.
> 
>> ---
>>   drivers/iommu/arm-smmu-v3.c | 2 ++
>>   drivers/iommu/arm-smmu.c| 4 +++-
>>   2 files changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 94b6821..30ea899 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -2686,6 +2686,8 @@ static int __init arm_smmu_init(void)
>>  if (ret)
>>  return ret;
>>
>> +   pci_request_acs();
>> +
>> +   return bus_set_iommu(&pci_bus_type, &arm_smmu_ops);
>>   }
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 9345a3f..ab365ec 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -2096,8 +2096,10 @@ static int __init arm_smmu_init(void)
>>   #endif
>>
>>   #ifdef CONFIG_PCI
>> -   if (!iommu_present(&pci_bus_type))
>> +   if (!iommu_present(&pci_bus_type)) {
>> +   pci_request_acs();
>>  bus_set_iommu(&pci_bus_type, &arm_smmu_ops);
>> +   }
>>   #endif
>>
>>  return 0;
>> -- 
>> 2.7.4
>>
>> IMPORTANT NOTICE: The contents of this email and any attachments are
>> confidential and may also be privileged. If you are not the intended
>> recipient, please notify the sender immediately and do not disclose
>> the contents to any other person, use it for any purpose, or store or
>> copy the information in any medium. Thank you.
>>


Re: [PATCH v11 06/10] genirq/msi-doorbell: msi_doorbell_safe

2016-07-22 Thread Auger Eric
Hi Thomas,
On 22/07/2016 14:44, Thomas Gleixner wrote:
> On Thu, 21 Jul 2016, Auger Eric wrote:
>> On 20/07/2016 10:12, Thomas Gleixner wrote:
>>> On Tue, 19 Jul 2016, Eric Auger wrote:
>>>> +bool msi_doorbell_safe(void)
>>>> +{
>>>> +  struct irqchip_doorbell *db;
>>>> +  bool irq_remapping = true;
>>>> +
>>>> +  mutex_lock(_doorbell_mutex);
>>>> +  list_for_each_entry(db, _doorbell_list, next) {
>>>> +  irq_remapping &= db->info.irq_remapping;
>>>
>>> db->info.irq_remapping is set in msi_doorbell_register(). So you can keep 
>>> book
>>> about that there. No need to iterate here.
>> Yes, it makes sense to store the info at registration time. Currently this
>> function is not in any fast path, but that's cleaner from a general
>> perspective. I will need to do such an iteration at unregistration though.
> 
> Two simple counters should be sufficient.
> 
>   nr_registered_bells;
>   nr_remapped_bells;

Yes, definitely smarter to use counters! Mental viscosity.

Thanks

Eric
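The two-counter bookkeeping Thomas suggests could look roughly like this (the counter names follow his sketch; the rest is an illustrative userspace model, not the series' code):

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int nr_registered_bells;
static unsigned int nr_remapped_bells;

/* Keep book at registration time instead of iterating later. */
static void doorbell_register(bool irq_remapping)
{
	nr_registered_bells++;
	if (irq_remapping)
		nr_remapped_bells++;
}

/* Unregistration just undoes the accounting: no list walk needed. */
static void doorbell_unregister(bool irq_remapping)
{
	nr_registered_bells--;
	if (irq_remapping)
		nr_remapped_bells--;
}

/* O(1): safe iff every registered doorbell supports IRQ remapping. */
static bool msi_doorbell_safe(void)
{
	return nr_registered_bells == nr_remapped_bells;
}
```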
> 
> 
> 
> Thanks,
> 
>   tglx
> 


Re: [PATCH] iommu/iova: validate iova_domain input to put_iova_domain

2016-07-14 Thread Auger Eric

Hi Robin, Nate,
On 14/07/2016 12:36, Robin Murphy wrote:
> On 14/07/16 09:34, Joerg Roedel wrote:
>> On Wed, Jul 13, 2016 at 02:49:32PM -0400, Nate Watterson wrote:
>>> Passing a NULL or uninitialized iova_domain into put_iova_domain
>>> will currently crash the kernel when the unconfigured iova_domain
>>> data members are accessed. To prevent this from occurring, this patch
>>> adds a check to make sure that the domain is non-NULL and that the
>>> domain granule is non-zero. The granule can be used to check if the
>>> domain was properly initialized because calling init_iova_domain
>>> with a granule of zero would have already triggered a BUG statement
>>> crashing the kernel.
>>
>> Have you seen real crashes happening because of this?

I also saw the crash happening with my PCIe passthrough series (not
upstreamed):
[PATCH v10 0/8] KVM PCIe/MSI passthrough on ARM/ARM64:
kernel part 1/3: iommu changes  https://lkml.org/lkml/2016/6/7/676

Patch [PATCH v10 8/8] iommu/arm-smmu: get/put the msi cookie
also uses iommu_put_dma_cookie,

and the uninitialised lock crash happens if the group gets destroyed
before iommu_dma_init_domain is called, which can also happen for me.

> 
> It _can_ happen via the iommu-dma code if something goes wrong
> initialising a group - the IOVA domain gets allocated at the same time
> as the default IOMMU domain, but isn't initialised until later once the
> device in question gets its DMA ops set up. If adding the device to the
> group fails, everything gets torn down again and iommu_put_dma_cookie()
> ends up trying to take an uninitialised lock.
Can't we also allow the granule check with the UNMANAGED type?

Thanks

Eric

> 
> However, I think the appropriate fix for that particular situation would
> be more like this:
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index ea5a9ebf0f78..d00d22930a6b 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -65,10 +65,11 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
>  {
> struct iova_domain *iovad = domain->iova_cookie;
> 
> -   if (!iovad)
> +   if (domain->type != IOMMU_DOMAIN_DMA || !iovad)
> return;
> 
> -   put_iova_domain(iovad);
> +   if (iovad->granule)
> +   put_iova_domain(iovad);
> kfree(iovad);
> domain->iova_cookie = NULL;
>  }
> 
> (It probably should have been that way from the start; mea culpa)
> 
> Robin.


Re: [PATCH v11 10/10] genirq/msi: use the MSI doorbell's IOVA when requested

2016-07-26 Thread Auger Eric
Hi Thomas,

On 26/07/2016 11:04, Thomas Gleixner wrote:
> Eric,
> 
> On Mon, 25 Jul 2016, Auger Eric wrote:
>> On 20/07/2016 11:09, Thomas Gleixner wrote:
>>> On Tue, 19 Jul 2016, Eric Auger wrote:
>>>> @@ -63,10 +63,18 @@ static int msi_compose(struct irq_data *irq_data,
>>>>  {
>>>>int ret = 0;
>>>>  
>>>> -  if (erase)
>>>> +  if (erase) {
>>>>memset(msg, 0, sizeof(*msg));
>>>> -  else
>>>> +  } else {
>>>> +  struct device *dev;
>>>> +
>>>>ret = irq_chip_compose_msi_msg(irq_data, msg);
>>>> +  if (ret)
>>>> +  return ret;
>>>> +
>>>> +  dev = msi_desc_to_dev(irq_data_get_msi_desc(irq_data));
>>>> +  WARN_ON(iommu_msi_msg_pa_to_va(dev, msg));
>>>
>>> What the heck is this call doing? And why is there only a WARN_ON and not a
>>> proper error return code handling?
>>
>> iommu_msi_msg_pa_to_va is part of the new iommu-msi API introduced in PART I
>> of this series. This helper function detects whether the physical address found
>> in the MSI message has a corresponding allocated IOVA. This happens if the MSI
>> doorbell is accessed through an IOMMU and this IOMMU does not bypass the MSI
>> addresses (ARM case). Allocation of this IOVA was performed in the previous
>> patch.
>>
>> If this is the case, the physical address is swapped with the IOVA
>> address. That way the PCIe device will send the MSI with this IOVA and
>> the address will be translated by the IOMMU into the target MSI doorbell PA.
>>
>> Hope this clarifies
> 
> No, it does not. You are explaining in great length what that function is
> doing, but you are not explaining WHY your don't do a proper return code
> handling and just do a WARN_ON() and happily proceed. If that function fails
> then the interrupt will not be functional, so WHY on earth are you continuing?
Oh sorry, I focused on the function's goal. Originally I could not return an
error since there is a BUG_ON(ret) afterwards, and typically userspace can
willingly omit to pass the IPA range that maps the MSIs. But now that we have
the two phases, where we first map the MSIs on pci_enable_msi_range and use the
IOVA at compose time, I need to analyze again whether userspace can induce a
BUG_ON.

Thanks

Eric
> 
> Thanks,
> 
>   tglx
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


Re: [PATCH v11 09/10] genirq/msi: map/unmap the MSI doorbells on msi_domain_alloc/free_irqs

2016-07-26 Thread Auger Eric
Hi Thomas,

On 26/07/2016 11:00, Thomas Gleixner wrote:
> Eric,
> 
> On Mon, 25 Jul 2016, Auger Eric wrote:
>> On 20/07/2016 11:04, Thomas Gleixner wrote:
>>> On Tue, 19 Jul 2016, Eric Auger wrote:
>>>> +  if (ret) {
>>>> +  for (; i >= 0; i--) {
>>>> +  struct irq_data *d = irq_get_irq_data(virq + i);
>>>> +
>>>> +  msi_handle_doorbell_mappings(d, false);
>>>> +  }
>>>> +  irq_domain_free_irqs(virq, desc->nvec_used);
>>>> +  desc->irq = 0;
>>>> +  goto error;
>>>
>>> How is that supposed to work? You clear desc->irq and then you call
>>> ops->handle_error.
>>
>> If I don't clear desc->irq I enter an infinite loop in
>> pci_enable_msix_range.
>>
>> This happens because msix_capability_init and pcie_enable_msix return 1.
>> In msix_capability_init, at out_avail: we enumerate the msi_descs which have
>> a non-zero irq, hence the return value of 1.
>>
>> Currently the only handle_error ops I found, pci_msi_domain_handle_error,
>> does not use the irq field, so this works, although it is questionable.
> 
> The logic here is: If the allocation does not succeed for the requested number
> of interrupts, we tell the caller how many interrupts we were able to set up.
> So the caller can decide what to do.
> 
> In your case you don't want to have a partial allocation, so instead of
> playing silly games with desc->irq you should add a flag which tells the PCI
> code that you are not interested in a partial allocation and that it should
> return an error code instead.

In that case can we consider that we even succeeded in allocating 1 MSI? If the
IOMMU mapping fails, the MSI transaction will never reach the target MSI frame,
so it is not usable. So by "partial" I understand we did not succeed
in allocating maxvec IRQs, correct? Here we succeeded in allocating 0 IRQs and
msi_capability_init still returns 1.

msi_capability_init's doc-comment says "a positive return value indicates the
number of interrupts which could have been allocated."

I understand allocation success currently only depends on the fact that virq was
allocated and set to desc->irq. But with this IOMMU stuff, doesn't the criterion
change?


> Something like PCI_DEV_FLAGS_MSI_NO_PARTIAL_ALLOC should do the trick.
> 
>> As for the irq_domain_free_irqs I think I can remove it since handled later.
> 
> Not only the free_irqs(). You should let the teardown function handle
> everything including your doorbell mapping teardown. It's nothing special and
> free_msi_irqs() at the end of msix_capability_init() will take care of it.
Yep, I was forced to call free_irqs myself since free_msi_irqs was doing nothing
due to the fact that I reset the irq field. Wrong-thing loop ;-)

Thanks

Eric
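The partial-allocation contract discussed in this exchange can be reduced to a toy model. PCI_DEV_FLAGS_MSI_NO_PARTIAL_ALLOC is only the flag name Thomas proposed, and the function below sketches the intended semantics, not the kernel's implementation:

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_ENOSPC 28	/* stand-in for the kernel's -ENOSPC */

/*
 * Toy model: the caller asks for 'requested' vectors but only 'usable'
 * can actually be set up (e.g. because an IOMMU doorbell mapping
 * failed). With partial allocation allowed, a positive return tells
 * the caller how many vectors it could retry with; with the proposed
 * no-partial flag the caller gets a hard error instead.
 */
static int alloc_msi_vectors(int requested, int usable, bool no_partial)
{
	if (usable >= requested)
		return requested;	/* full success */
	if (no_partial)
		return -MODEL_ENOSPC;	/* all-or-nothing caller */
	return usable;			/* positive: retry hint */
}
```

The corner case Eric raises is when no vector is actually usable: with the proposed flag the caller gets -ENOSPC rather than a misleading positive count.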

> 
> Thanks,
> 
>   tglx
> 


Re: [PATCH v11 09/10] genirq/msi: map/unmap the MSI doorbells on msi_domain_alloc/free_irqs

2016-07-25 Thread Auger Eric
Hi Thomas,

On 20/07/2016 11:04, Thomas Gleixner wrote:
> On Tue, 19 Jul 2016, Eric Auger wrote:
>>  /**
>> + * msi_handle_doorbell_mappings: in case the irq data corresponds to an
>> + * MSI that requires iommu mapping, traverse the irq domain hierarchy
>> + * to retrieve the doorbells to handle and iommu_map/unmap them according
>> + * to @map boolean.
>> + *
>> + * @data: irq data handle
>> + * @map: mapping if true, unmapping if false
>> + */
> 
> 
> Please run that through the kernel doc generator. It does not work that way.
> 
> The format is:
> 
> /**
>  * function_name - Short function description
>  * @arg1: Description of arg1
>  * @argument2:Description of argument2
>  *
>  * Long explanation including documentation of the return values.
>  */
> 
>> +static int msi_handle_doorbell_mappings(struct irq_data *data, bool map)
>> +{
>> +const struct irq_chip_msi_doorbell_info *dbinfo;
>> +struct iommu_domain *domain;
>> +struct irq_chip *chip;
>> +struct device *dev;
>> +dma_addr_t iova;
>> +int ret = 0, cpu;
>> +
>> +while (data) {
>> +dev = msi_desc_to_dev(irq_data_get_msi_desc(data));
>> +domain = iommu_msi_domain(dev);
>> +if (domain) {
>> +chip = irq_data_get_irq_chip(data);
>> +if (chip->msi_doorbell_info)
>> +break;
>> +}
>> +data = data->parent_data;
>> +}
> 
> Please split that out into a seperate function
> 
> struct irq_data *msi_get_doorbell_info(data)
> {
>   .
>   if (chip->msi_doorbell_info)
>   return chip->msi_get_doorbell_info(data);
>   }
>   return NULL;
> }
> 
>info = msi_get_doorbell_info(data);
>.
> 
>> +if (!data)
>> +return 0;
>> +
>> +dbinfo = chip->msi_doorbell_info(data);
>> +if (!dbinfo)
>> +return -EINVAL;
>> +
>> +if (!dbinfo->doorbell_is_percpu) {
>> +if (!map) {
>> +iommu_msi_put_doorbell_iova(domain,
>> +dbinfo->global_doorbell);
>> +return 0;
>> +}
>> +return iommu_msi_get_doorbell_iova(domain,
>> +   dbinfo->global_doorbell,
>> +   dbinfo->size, dbinfo->prot,
>> +   );
>> +}
> 
> You can spare an indentation level with a helper function
> 
>   if (!dbinfo->doorbell_is_percpu)
>   return msi_map_global_doorbell(domain, dbinfo);
> 
>> +
>> +/* percpu doorbells */
>> +for_each_possible_cpu(cpu) {
>> +phys_addr_t __percpu *db_addr =
>> +per_cpu_ptr(dbinfo->percpu_doorbells, cpu);
>> +
>> +if (!map) {
>> +iommu_msi_put_doorbell_iova(domain, *db_addr);
>> +} else {
>> +
>> +ret = iommu_msi_get_doorbell_iova(domain, *db_addr,
>> +  dbinfo->size,
>> +  dbinfo->prot, );
>> +if (ret)
>> +return ret;
>> +}
>> +}
> 
> Same here:
> 
>   for_each_possible_cpu(cpu) {
>   ret = msi_map_percpu_doorbell(domain, cpu);
>   if (ret)
>   return ret;
>   }
>   return 0;
>  
> Hmm?
> 
>> +
>> +return 0;
>> +}
>> +
>> +/**
>>   * msi_domain_alloc_irqs - Allocate interrupts from a MSI interrupt domain
>>   * @domain: The domain to allocate from
>>   * @dev:Pointer to device struct of the device for which the interrupts
>> @@ -352,17 +423,29 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, 
>> struct device *dev,
>>  
>>  virq = __irq_domain_alloc_irqs(domain, virq, desc->nvec_used,
>> dev_to_node(dev), , false);
>> -if (virq < 0) {
>> -ret = -ENOSPC;
>> -if (ops->handle_error)
>> -ret = ops->handle_error(domain, desc, ret);
>> -if (ops->msi_finish)
>> -ops->msi_finish(, ret);
>> -return ret;
>> -}
>> +if (virq < 0)
>> +goto error;
>>  
>>  for (i = 0; i < desc->nvec_used; i++)
>>  irq_set_msi_desc_off(virq, i, desc);
>> +
>> +for (i = 0; i < desc->nvec_used; i++) {
>> +struct irq_data *d = irq_get_irq_data(virq + i);
>> +
>> +ret = msi_handle_doorbell_mappings(d, true);
>> +if (ret)
>> +break;
>> +}
>> +if (ret) {
>> +for (; i >= 0; i--) {
>> +struct 

Re: [PATCH v11 10/10] genirq/msi: use the MSI doorbell's IOVA when requested

2016-07-25 Thread Auger Eric
Hi Thomas,

On 20/07/2016 11:09, Thomas Gleixner wrote:
> On Tue, 19 Jul 2016, Eric Auger wrote:
> 
> First of all - valid for all patches:
> 
> Subject: sys/subsys: Sentence starts with an uppercase letter
OK understood.
> 
> Now for this particular one:
> 
> genirq/msi: use the MSI doorbell's IOVA when requested
> 
>> On MSI message composition we now use the MSI doorbell's IOVA in
>> place of the doorbell's PA in case the device is upstream to an
>> IOMMU that requires MSI addresses to be mapped. The doorbell's
>> allocation and mapping happened on an early stage (pci_enable_msi).
> 
> This changelog is completely useless. At least I cannot figure out what that
> patch actually does. And the implementation is not self explaining either.

>  
>> @@ -63,10 +63,18 @@ static int msi_compose(struct irq_data *irq_data,
>>  {
>>  int ret = 0;
>>  
>> -if (erase)
>> +if (erase) {
>>  memset(msg, 0, sizeof(*msg));
>> -else
>> +} else {
>> +struct device *dev;
>> +
>>  ret = irq_chip_compose_msi_msg(irq_data, msg);
>> +if (ret)
>> +return ret;
>> +
>> +dev = msi_desc_to_dev(irq_data_get_msi_desc(irq_data));
>> +WARN_ON(iommu_msi_msg_pa_to_va(dev, msg));
> 
> What the heck is this call doing? And why is there only a WARN_ON and not a
> proper error return code handling?

iommu_msi_msg_pa_to_va is part of the new iommu-msi API introduced in PART I of
this series. This helper function detects whether the physical address found in
the MSI message has a corresponding allocated IOVA. This happens if the MSI
doorbell is accessed through an IOMMU and this IOMMU does not bypass the MSI
addresses (ARM case). Allocation of this IOVA was performed in the previous patch.

If this is the case, the physical address is swapped with the IOVA
address. That way the PCIe device will send the MSI with this IOVA and
the address will be translated by the IOMMU into the target MSI doorbell PA.

Hope this clarifies

Thanks

Eric 
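The swap described above can be modeled with a small lookup: the table stands in for the real per-domain IOVA bookkeeping, and the names are illustrative, not the series' actual API:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified MSI message: 64-bit doorbell address plus payload. */
struct msi_msg_model {
	uint64_t address;
	uint32_t data;
};

/* Stand-in for the per-domain doorbell PA -> IOVA bookkeeping. */
struct msi_binding {
	uint64_t pa;
	uint64_t iova;
};

/*
 * If the doorbell PA found in the message has a registered IOVA,
 * rewrite the message so the device emits its write at the IOVA;
 * the IOMMU then translates it back to the doorbell PA.
 * Returns 0 on success, -1 if no mapping exists.
 */
static int msi_msg_pa_to_va(struct msi_msg_model *msg,
			    const struct msi_binding *map, int nr)
{
	for (int i = 0; i < nr; i++) {
		if (map[i].pa == msg->address) {
			msg->address = map[i].iova;
			return 0;
		}
	}
	return -1;
}
```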
> 
> Thanks,
> 
>   tglx
> 


Re: [PATCH v11 4/8] vfio/type1: handle unmap/unpin and replay for VFIO_IOVA_RESERVED slots

2016-07-25 Thread Auger Eric
Hi,

On 24/07/2016 03:41, kbuild test robot wrote:
> Hi,
> 
> [auto build test ERROR on vfio/next]
> [also build test ERROR on v4.7-rc7 next-20160722]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Eric-Auger/KVM-PCIe-MSI-passthrough-on-ARM-ARM64-kernel-part-3-3-vfio-changes/20160724-082318
> base:   https://github.com/awilliam/linux-vfio.git next
> config: x86_64-rhel (attached as .config)
> compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64 
> 
> All errors (new ones prefixed by >>):
> 
>>> drivers/vfio/vfio_iommu_type1.c:39:29: fatal error: linux/msi-iommu.h: No 
>>> such file or directory
> #include <linux/msi-iommu.h>
> ^
>compilation terminated.
> 
> vim +39 drivers/vfio/vfio_iommu_type1.c
> 
> 33#include 
> 34#include 
> 35#include 
> 36#include 
> 37#include 
> 38#include 
>   > 39#include <linux/msi-iommu.h>
Dependency on part I of this series.

Thanks

Eric
> 40
> 41#define DRIVER_VERSION  "0.2"
> 42#define DRIVER_AUTHOR   "Alex Williamson 
> "
> 
> 


Re: [PATCH] iommu/dma: Don't put uninitialised IOVA domains

2016-07-27 Thread Auger Eric
Hi,
On 27/07/2016 17:46, Robin Murphy wrote:
> Due to the limitations of having to wait until we see a device's DMA
> restrictions before we know how we want an IOVA domain initialised,
> there is a window for error if a DMA ops domain is allocated but later
> freed without ever being used. In that case, init_iova_domain() was
> never called, so calling put_iova_domain() from iommu_put_dma_cookie()
> ends up trying to take an uninitialised lock and crashing.
> 
> Make things robust by skipping the call unless the IOVA domain actually
> has been initialised, as we probably should have done from the start.
> 
> Reported-by: Nate Watterson 
> Signed-off-by: Robin Murphy 
> ---
> 
> I'm not sure this warrants a cc stable, as with the code currently in
> mainline it's only at all likely if other things have already failed
> elsewhere in a manner they should not be expected to.
> 
>  drivers/iommu/dma-iommu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index ea5a9ebf0f78..97a23082e18a 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -68,7 +68,8 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
>   if (!iovad)
>   return;
>  
> - put_iova_domain(iovad);
> + if (iovad->granule)
> + put_iova_domain(iovad);
>   kfree(iovad);
>   domain->iova_cookie = NULL;
>  }
> 
Reviewed-by: Eric Auger 
Tested-by: Eric Auger 

Thanks

Eric



Re: [PATCH v12 02/11] genirq/msi: msi_compose wrapper

2016-08-10 Thread Auger Eric
Hi Thomas,

On 09/08/2016 11:19, Thomas Gleixner wrote:
> On Tue, 2 Aug 2016, Eric Auger wrote:
> 
>> Currently the MSI message is composed by directly calling
>> irq_chip_compose_msi_msg and erased by setting the memory to zero.
>>
>> On some platforms, we will need to complexify this composition to
>> properly handle MSI emission through IOMMU. Also we will need to track
>> when the MSI message is erased.
> 
> I just can't find how you do that. After applying the series the
> 
>> +if (erase)
>> +memset(msg, 0, sizeof(*msg));
> 
> branch is still just a memset(). The wrapper is fine for the compose side, but
> having the extra argument just to wrap the memset() for no gain is silly.

Yes, you're right: this was true in the first versions of the series,
where the IOMMU mapping/unmapping was done at composition and erase time.
Now that the mapping/unmapping is done in msi_domain_alloc/free_irqs, this
is no longer needed. I will keep the wrapper for the compose side, remove
the rest, and update the commit message accordingly.

Thank you for your time.

Eric
> 
> Thanks,
> 
>   tglx
> 


Re: [PATCH v11 05/10] genirq/msi-doorbell: msi_doorbell_pages

2016-07-21 Thread Auger Eric
Hi Thomas,

On 19/07/2016 16:38, Thomas Gleixner wrote:
> On Tue, 19 Jul 2016, Eric Auger wrote:
>> msi_doorbell_pages sum up the number of iommu pages of a given order
> 
> adding () to the function name would make it immediately clear that
> msi_doorbell_pages is a function.
> 
>> +/**
>> + * msi_doorbell_pages: compute the number of iommu pages of size 1 << order
>> + * requested to map all the registered doorbells
>> + *
>> + * @order: iommu page order
>> + */
> 
> Why are you adding the kernel doc to the header and not to the implementation?

I am confused by this comment. I was told in the past that it was better
to put the comments in the API header. Would you prefer that I move all
the function kernel-doc comments to the implementation?

Looking at kernel-doc-nano-HOWTO.txt, I was not able to find any
indication about the best choice.

I will now run the kernel-doc script to check the conformance of my
comments.

Thank you for your patience!

Best Regards

Eric
> 
>> +int msi_doorbell_pages(unsigned int order);
>> +
>>  #else
>>  
>>  static inline struct irq_chip_msi_doorbell_info *
>> @@ -47,6 +55,12 @@ msi_doorbell_register_global(phys_addr_t base, size_t size,
>>  static inline void
>>  msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *db) {}
>>  
>> +static inline int
>> +msi_doorbell_pages(unsigned int order)
> 
> What's the point of this line break? 
> 
>> +{
>> +return 0;
>> +}
>> +
>>  #endif /* CONFIG_MSI_DOORBELL */
>>  
>>  #endif
>> diff --git a/kernel/irq/msi-doorbell.c b/kernel/irq/msi-doorbell.c
>> index 0ff541e..a5bde37 100644
>> --- a/kernel/irq/msi-doorbell.c
>> +++ b/kernel/irq/msi-doorbell.c
>> @@ -60,3 +60,55 @@ void msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *dbinfo)
>>  mutex_unlock(&irqchip_doorbell_mutex);
>>  }
>>  EXPORT_SYMBOL_GPL(msi_doorbell_unregister_global);
>> +
>> +static int compute_db_mapping_requirements(phys_addr_t addr, size_t size,
>> +   unsigned int order)
>> +{
>> +phys_addr_t offset, granule;
>> +unsigned int nb_pages;
>> +
>> +granule = (uint64_t)(1 << order);
>> +offset = addr & (granule - 1);
>> +size = ALIGN(size + offset, granule);
>> +nb_pages = size >> order;
>> +
>> +return nb_pages;
>> +}
>> +
>> +static int
>> +compute_dbinfo_mapping_requirements(struct irq_chip_msi_doorbell_info *dbinfo,
>> +unsigned int order)
> 
> I'm sure you can find even longer function names which require more line
> breaks.
> 
>> +{
>> +int ret = 0;
>> +
>> +if (!dbinfo->doorbell_is_percpu) {
>> +ret = compute_db_mapping_requirements(dbinfo->global_doorbell,
>> +  dbinfo->size, order);
>> +} else {
>> +phys_addr_t __percpu *pbase;
>> +int cpu;
>> +
>> +for_each_possible_cpu(cpu) {
>> +pbase = per_cpu_ptr(dbinfo->percpu_doorbells, cpu);
>> +ret += compute_db_mapping_requirements(*pbase,
>> +   dbinfo->size,
>> +   order);
>> +}
>> +}
>> +return ret;
>> +}
>> +
>> +int msi_doorbell_pages(unsigned int order)
>> +{
>> +struct irqchip_doorbell *db;
>> +int ret = 0;
>> +
>> +mutex_lock(&irqchip_doorbell_mutex);
>> +list_for_each_entry(db, &irqchip_doorbell_list, next) {
> 
> Pointless braces
> 
>> +ret += compute_dbinfo_mapping_requirements(&db->info, order);
>> +}
>> +mutex_unlock(&irqchip_doorbell_mutex);
>> +
>> +return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(msi_doorbell_pages);
> 
> So here is a general rant about your naming choices.
> 
>struct irqchip_doorbell
>struct irq_chip_msi_doorbell_info
> 
>struct irq_chip {
> *(*msi_doorbell_info);
>}
> 
>irqchip_doorbell_mutex
> 
>msi_doorbell_register_global
>msi_doorbell_unregister_global
> 
>msi_doorbell_pages
> 
> This really sucks. Your public functions start sensibly with msi_doorbell.
> 
> Though what is the _global postfix for the register/unregister functions for?
> Are there _private functions in the pipeline?
> 
> msi_doorbell_pages() is not telling me what it does. msi_calc_doorbell_pages()
> would describe it right away.
> 
> You doorbell info structure can really do with:
> 
> struct msi_doorbell_info;
> 
> And the wrapper struct around it is fine with:
> 
> struct msi_doorbell;
> 
> Thanks,
> 
>   tglx
> 


Re: [PATCH v11 06/10] genirq/msi-doorbell: msi_doorbell_safe

2016-07-21 Thread Auger Eric
Hi,

On 20/07/2016 10:12, Thomas Gleixner wrote:
> On Tue, 19 Jul 2016, Eric Auger wrote:
>> +bool msi_doorbell_safe(void)
>> +{
>> +struct irqchip_doorbell *db;
>> +bool irq_remapping = true;
>> +
>> +mutex_lock(&irqchip_doorbell_mutex);
>> +list_for_each_entry(db, &irqchip_doorbell_list, next) {
>> +irq_remapping &= db->info.irq_remapping;
> 
> db->info.irq_remapping is set in msi_doorbell_register(). So you can keep book
> about that there. No need to iterate here.
Yes, it makes sense to store the info at registration time. Currently this
function is not on any fast path, but that's cleaner from a general
perspective. I will still need such an iteration at un-registration, though.
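
One possible shape for that bookkeeping (a non-compiled fragment; the
counter name is illustrative, not from the actual series):

```c
/* Count of registered doorbells that do not implement IRQ remapping,
 * updated under irqchip_doorbell_mutex at (un)registration time. */
static unsigned int irqchip_doorbell_unsafe_count;

/* registration path */
if (!db->info.irq_remapping)
	irqchip_doorbell_unsafe_count++;

/* un-registration path */
if (!db->info.irq_remapping)
	irqchip_doorbell_unsafe_count--;

/* msi_doorbell_safe() then becomes O(1) */
bool msi_doorbell_safe(void)
{
	return !irqchip_doorbell_unsafe_count;
}
```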

Thanks

Eric
> 
> Thanks,
> 
>   tglx
> 
> 


Re: [PATCH v11 0/8] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 1/3: iommu changes

2016-07-20 Thread Auger Eric
Hi Dennis,
On 20/07/2016 11:56, Dennis Chen wrote:
> Hi Eric,
> 
> On Tue, Jul 19, 2016 at 12:55:03PM +, Eric Auger wrote:
>> This series introduces the msi-iommu api used to:
>>
>> - allocate/free resources for MSI IOMMU mapping
>> - set the MSI iova window aperture
>> - map/unmap physical addresses onto MSI IOVAs.
>> - determine whether an msi needs to be iommu mapped
>> - overwrite an msi_msg PA address with its pre-allocated/mapped IOVA
>>
>> Also a new iommu domain attribute, DOMAIN_ATTR_MSI_GEOMETRY is introduced
>> to report the MSI iova window geometry (aperture and iommu-msi API support).
>>
>> Currently:
>> - iommu driver is supposed to allocate/free MSI mapping resources
>> - VFIO subsystem is supposed to set the MSI IOVA aperture.
>> - The MSI layer is supposed to allocate/free iova mappings and overwrite
>>   msi_msg with IOVA at composition time
>>
>> More details & context can be found at:
>> http://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/
>>
>> Best Regards
>>
>> Eric
>>
>> Git: complete series available at
>> https://github.com/eauger/linux/tree/v4.7-rc7-passthrough-v11
>>
> Why can't I find this new series on your git tree:
> https://git.linaro.org/people/eric.auger/linux.git
You are not looking at the right git repo: see the GitHub one above.
> ?
> Also, do I need to download all the 3-part patches to test the PCIe NIC 
> passthru
> as I did on your v9 series?
Yes, you need to take all 3 parts. You should have everything that is
needed on the above branch. Unless you are working on Cavium hardware,
do not cherry-pick
"vfio: pci: HACK! workaround thunderx pci_try_reset_bus crash"

Thanks

Eric
> 
> Thanks,
> Dennis 
>>
>> see part III for wrap-up details.
>>
>> History:
>> v10 -> v11:
>> - no change in the series, just incremented for consistency
>> - added a temporary patch in the branch:
>>   "iommu/iova: FIXUP! validate iova_domain input to put_iova_domain"
>>   originally sent by Nate and adapted for this use case. This is currently
>>   under discussion on the ML. The crash typically occurs in case unsafe
>>   interrupts are discovered while allow_unsafe_interrupts is not set.
>>
>> v9 -> v10:
>> - split error management in iommu_msi_set_aperture
>>
>> v8 -> v9:
>> - rename iommu_domain_msi_geometry programmable flag into iommu_msi_supported
>> - introduce msi_apperture_valid helper and use this instead of 
>> is_aperture_set
>>
>> v7 -> v8:
>> - The API is retargetted for MSI: renamed msi-iommu
>>   all "dma-reserved" namings removed
>> - now implemented upon dma-iommu (get, put, init), ie. reuse iova_cookie,
>>   and iova API
>> - msi mapping resources now are guaranteed to exist during the whole iommu
>>   domain's lifetime. No need to lock to garantee the cookie integrity
>> - removed alloc/free_reserved_reserved_iova_domain. We now have a single
>>   function that sets the aperture, looking like iommu_dma_init_domain.
>> - we now use a list instead of an RB-tree
>> - prot is not propagated anymore at domain creation due to the retargetting
>>   for MSI
>> - iommu_domain pointer removed from doorbell_mapping struct
>> - replaced DOMAIN_ATTR_MSI_MAPPING by DOMAIN_ATTR_MSI_GEOMETRY
>>
>> v6 -> v7:
>> - fixed known lock bugs and multiple page sized slots matching
>>   (I only have a single MSI frame made of a single page)
>> - reserved_iova_cookie now pointing to a struct that encapsulates the
>>   iova domain handle + protection attribute passed from VFIO (Alex' req)
>> - 2 new functions exposed: iommu_msi_mapping_translate_msg,
>>   iommu_msi_mapping_desc_to_domain: not sure this is the right location/proto
>>   though
>> - iommu_put_reserved_iova now takes a phys_addr_t
>> - everything now is cleanup on iommu_domain destruction
>>
>> RFC v5 -> patch v6:
>> - split to ease the review process
>> - in dma-reserved-api use a spin lock instead of a mutex (reported by
>>   Jean-Philippe)
>> - revisit iommu_get_reserved_iova API to pass a size parameter upon
>>   Marc's request
>> - Consistently use the page order passed when creating the iova domain.
>> - init reserved_binding_list (reported by Julien)
>>
>> RFC v4 -> RFC v5:
>> - take into account Thomas' comments on MSI related patches
>>   - split "msi: IOMMU map the doorbell address when needed"
>>   - increase readability and add comments
>>   - fix style issues
>>  - split "iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute"
>>  - platform ITS now advertises IOMMU_CAP_INTR_REMAP
>>  - fix compilation issue with CONFIG_IOMMU API unset
>>  - arm-smmu-v3 now advertises DOMAIN_ATTR_MSI_MAPPING
>>
>> RFC v3 -> v4:
>> - Move doorbell mapping/unmapping in msi.c
>> - fix ref count issue on set_affinity: in case of a change in the address
>>   the previous address is decremented
>> - doorbell map/unmap now is done on msi composition. Should allow the use
>>   case for platform MSI controllers
>> - create dma-reserved-iommu.h/c exposing/implementing a new API dedicated
>>   to reserved IOVA management (looking like 

Re: [PATCH v11 04/10] genirq/msi-doorbell: allow MSI doorbell (un)registration

2016-07-20 Thread Auger Eric
Hi Thomas,
On 19/07/2016 16:22, Thomas Gleixner wrote:
> On Tue, 19 Jul 2016, Eric Auger wrote:
>> +
>> +#include 
>> +#include 
>> +#include 
>> +
>> +struct irqchip_doorbell {
>> +struct irq_chip_msi_doorbell_info info;
>> +struct list_head next;
> 
> Again, please align the struct members.
> 
>> +};
>> +
>> +static LIST_HEAD(irqchip_doorbell_list);
>> +static DEFINE_MUTEX(irqchip_doorbell_mutex);
>> +
>> +struct irq_chip_msi_doorbell_info *
>> +msi_doorbell_register_global(phys_addr_t base, size_t size,
>> + int prot, bool irq_remapping)
>> +{
>> +struct irqchip_doorbell *db;
>> +
>> +db = kmalloc(sizeof(*db), GFP_KERNEL);
>> +if (!db)
>> +return ERR_PTR(-ENOMEM);
>> +
>> +db->info.doorbell_is_percpu = false;
> 
> Please use kzalloc and get rid of zero initialization. If you add stuff to the
> struct then initialization will be automatically 0.
OK
> 
>> +void msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *dbinfo)
>> +{
>> +struct irqchip_doorbell *db, *tmp;
>> +
>> +mutex_lock(&irqchip_doorbell_mutex);
>> +list_for_each_entry_safe(db, tmp, &irqchip_doorbell_list, next) {
> 
> Why do you need that iterator? 
> 
> db = container_of(dbinfo, struct ., info);
> 
> Hmm?

Definitely.
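
That is, something along these lines (a non-compiled fragment sketch based
on the quoted code; it also moves the kfree() out of the locked region, as
requested below):

```c
void msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *dbinfo)
{
	struct irqchip_doorbell *db =
		container_of(dbinfo, struct irqchip_doorbell, info);

	mutex_lock(&irqchip_doorbell_mutex);
	list_del(&db->next);
	mutex_unlock(&irqchip_doorbell_mutex);
	kfree(db);
}
```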
> 
>> +if (dbinfo == &db->info) {
>> +list_del(&db->next);
>> +kfree(db);
> 
> Please move the kfree() outside of the lock region. It does not matter much
> here, but we really should stop doing random crap in locked regions.
OK

Thanks

Eric
> 
> Thanks,
> 
>   tglx
> 


Re: [PATCH v11 05/10] genirq/msi-doorbell: msi_doorbell_pages

2016-07-20 Thread Auger Eric
Hi Thomas,
On 19/07/2016 16:38, Thomas Gleixner wrote:
> On Tue, 19 Jul 2016, Eric Auger wrote:
>> msi_doorbell_pages sum up the number of iommu pages of a given order
> 
> adding () to the function name would make it immediately clear that
> msi_doorbell_pages is a function.
> 
>> +/**
>> + * msi_doorbell_pages: compute the number of iommu pages of size 1 << order
>> + * requested to map all the registered doorbells
>> + *
>> + * @order: iommu page order
>> + */
> 
> Why are you adding the kernel doc to the header and not to the implementation?
> 
>> +int msi_doorbell_pages(unsigned int order);
>> +
>>  #else
>>  
>>  static inline struct irq_chip_msi_doorbell_info *
>> @@ -47,6 +55,12 @@ msi_doorbell_register_global(phys_addr_t base, size_t size,
>>  static inline void
>>  msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *db) {}
>>  
>> +static inline int
>> +msi_doorbell_pages(unsigned int order)
> 
> What's the point of this line break?

> 
>> +{
>> +return 0;
>> +}
>> +
>>  #endif /* CONFIG_MSI_DOORBELL */
>>  
>>  #endif
>> diff --git a/kernel/irq/msi-doorbell.c b/kernel/irq/msi-doorbell.c
>> index 0ff541e..a5bde37 100644
>> --- a/kernel/irq/msi-doorbell.c
>> +++ b/kernel/irq/msi-doorbell.c
>> @@ -60,3 +60,55 @@ void msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *dbinfo)
>>  mutex_unlock(&irqchip_doorbell_mutex);
>>  }
>>  EXPORT_SYMBOL_GPL(msi_doorbell_unregister_global);
>> +
>> +static int compute_db_mapping_requirements(phys_addr_t addr, size_t size,
>> +   unsigned int order)
>> +{
>> +phys_addr_t offset, granule;
>> +unsigned int nb_pages;
>> +
>> +granule = (uint64_t)(1 << order);
>> +offset = addr & (granule - 1);
>> +size = ALIGN(size + offset, granule);
>> +nb_pages = size >> order;
>> +
>> +return nb_pages;
>> +}
>> +
>> +static int
>> +compute_dbinfo_mapping_requirements(struct irq_chip_msi_doorbell_info *dbinfo,
>> +unsigned int order)
> 
> I'm sure you can find even longer function names which require more line
> breaks.
> 
>> +{
>> +int ret = 0;
>> +
>> +if (!dbinfo->doorbell_is_percpu) {
>> +ret = compute_db_mapping_requirements(dbinfo->global_doorbell,
>> +  dbinfo->size, order);
>> +} else {
>> +phys_addr_t __percpu *pbase;
>> +int cpu;
>> +
>> +for_each_possible_cpu(cpu) {
>> +pbase = per_cpu_ptr(dbinfo->percpu_doorbells, cpu);
>> +ret += compute_db_mapping_requirements(*pbase,
>> +   dbinfo->size,
>> +   order);
>> +}
>> +}
>> +return ret;
>> +}
>> +
>> +int msi_doorbell_pages(unsigned int order)
>> +{
>> +struct irqchip_doorbell *db;
>> +int ret = 0;
>> +
>> +mutex_lock(&irqchip_doorbell_mutex);
>> +list_for_each_entry(db, &irqchip_doorbell_list, next) {
> 
> Pointless braces
> 
>> +ret += compute_dbinfo_mapping_requirements(&db->info, order);
>> +}
>> +mutex_unlock(&irqchip_doorbell_mutex);
>> +
>> +return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(msi_doorbell_pages);
> 
> So here is a general rant about your naming choices.
> 
>struct irqchip_doorbell
>struct irq_chip_msi_doorbell_info
> 
>struct irq_chip {
> *(*msi_doorbell_info);
>}
> 
>irqchip_doorbell_mutex
> 
>msi_doorbell_register_global
>msi_doorbell_unregister_global
> 
>msi_doorbell_pages
> 
> This really sucks. Your public functions start sensibly with msi_doorbell.
> 
> Though what is the _global postfix for the register/unregister functions for?
> Are there _private functions in the pipeline?
"global" is opposed to per-cpu doorbells. Currently gicv2m and
gicv3-its expose a single "global" doorbell, and I have not yet handled
irqchips exposing per-cpu doorbells.
> 
> msi_doorbell_pages() is not telling me what it does. msi_calc_doorbell_pages()
> would describe it right away.
> 
> You doorbell info structure can really do with:
> 
> struct msi_doorbell_info;
> 
> And the wrapper struct around it is fine with:
> 
> struct msi_doorbell;
Yes, you're right. I will revisit the names and fix all the style issues
you reported.

Thank you for your time

Eric
> 
> Thanks,
> 
>   tglx
> 


Re: [PATCH v12 0/8] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 3/3: vfio changes

2016-08-07 Thread Auger Eric
Hi Diana,

On 05/08/2016 16:46, Diana Madalina Craciun wrote:
> Hi Eric,
> 
> I have tested these patches in a VFIO PCI scenario (using the ITS
> emulation) on a NXP LS2080 board. It worked fine with one e1000 card
> assigned to the guest. However, when I tried to assign two cards to the
> guest I got a crash. I narrowed down the problem to this code:
> 
> drivers/vfio/vfio_iommu_type1.c, vfio_iommu_type1_attach_group function
> 
>/*
>  * Try to match an existing compatible domain.  We don't want to
>  * preclude an IOMMU driver supporting multiple bus_types and being
>  * able to include different bus_types in the same IOMMU domain, so
>  * we test whether the domains use the same iommu_ops rather than
>  * testing if they're on the same bus_type.
>  */
> list_for_each_entry(d, &iommu->domain_list, next) {
> if (d->domain->ops == domain->domain->ops &&
> d->prot == domain->prot) {
> iommu_detach_group(domain->domain, iommu_group);
> if (!iommu_attach_group(d->domain, iommu_group)) {
> list_add(&group->next, &d->group_list);
> iommu_domain_free(domain->domain);
> kfree(domain);
> mutex_unlock(&iommu->lock);
> return 0;
> }
> 
> ret = iommu_attach_group(domain->domain, iommu_group);
> if (ret)
> goto out_domain;
> }
> }
> 
> The iommu_domain_free function eventually calls iommu_put_dma_cookie
> which calls put_iova_domain. This function (put_iova_domain) tries to
> acquire some spinlocks which were not yet initialized. They will be
> initialized later when the first DMA mapping will be performed. Because
> the spinlock was not yet initialized, it crashes when attempting to
> acquire it.
> 
> With the following fix I was able to successfully assign two cards to
> the guest:
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index ba764a0..b43e34f 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -457,6 +457,9 @@ void put_iova_domain(struct iova_domain *iovad)
> struct rb_node *node;
> unsigned long flags;
>  
> +   if (!iovad->start_pfn)
> +   return;
> +
> free_iova_rcaches(iovad);
> spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> node = rb_first(&iovad->rbroot);
> 
> However, I am not sure if this is the right fix.

Thanks for testing, and sorry for the time you spent narrowing down this
issue. The problem is known and is fixed by Robin's patch, sent to the
iommu list. The patch can be found on my branch (a fix similar to the one
you did ;-)):

iommu/dma: Don't put uninitialised IOVA domains

Thanks!

Eric
> 
> Thanks,
> 
> Diana
> 
> 
> On 08/02/2016 08:30 PM, Eric Auger wrote:
>> This series allows the user-space to register a reserved IOVA domain.
>> This completes the kernel integration of the whole functionality on top
>> of the 2 previous parts (v11).
>>
>> We reuse the VFIO DMA MAP ioctl with a new flag to bridge to the
>> msi-iommu API. The need for provisioning such MSI IOVA range is reported
>> through capability chain, using VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY.
>>
>> vfio_iommu_type1 checks if the MSI mapping is safe when attaching the
>> vfio group to the container (allow_unsafe_interrupts modality). This is
>> done in a very coarse way, looking at all the registered doorbells and
>> returning assignment is safe wrt IRQ if all the doorbells are safe.
>>
>> More details & context can be found at:
>> http://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/
>>
>> Best Regards
>>
>> Eric
>>
>> Testing:
>> - functional on ARM64 AMD Overdrive HW (single GICv2m frame) with
>>   Intel I350T2 (SR-IOV capable, igb/igbvf) and Intel 82574L (e1000e)
>> - functional on Cavium ThunderX (ARM GICv3 ITS) with Intel 82574L (e1000e)
>>
>> References:
>> [1] [RFC 0/2] VFIO: Add virtual MSI doorbell support
>> (https://lkml.org/lkml/2015/7/24/135)
>> [2] [RFC PATCH 0/6] vfio: Add interface to map MSI pages
>> 
>> (https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016607.html)
>> [3] [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO
>> (http://permalink.gmane.org/gmane.comp.emulators.kvm.arm.devel/3858)
>>
>> Git: complete series available at
>> https://github.com/eauger/linux/tree/v4.7-rc7-passthrough-v12
>> previous: https://github.com/eauger/linux/tree/v4.7-rc7-passthrough-v11
>>
>> the above branch includes a temporary patch to work around a ThunderX pci
>> bus reset crash (which I think unrelated to this series):
>> "vfio: pci: HACK! workaround thunderx pci_try_reset_bus crash"
>> Do not take this one for other platforms.
>>
>> History:
>> v11 -> v12:
>> - no functional change. Only adapt to renamings done in PART II
>>
>> v10 -> v11:
>> no change to this series, just incremented for consistency
>>
>> v9 -> v10:
>> Took into account Alex' comments:
>> - split "vfio/type1: vfio_find_dma accepting a type 

Re: [PATCH v12 09/11] genirq/msi: Introduce msi_desc flags

2016-08-09 Thread Auger Eric
Hi,

On 02/08/2016 19:23, Eric Auger wrote:
> This new flags member is meant to store additional information about
> the msi descriptor, starting with allocation status information.
> 
> MSI_DESC_FLAG_ALLOCATED bit tells the associated base IRQ is allocated.
> This information is currently used at deallocation time. We also
> introduce MSI_DESC_FLAG_FUNCTIONAL telling the MSIs are functional.
> 
> For the time being ALLOCATED and FUNCTIONAL are set at the same time
> but this is going to change in subsequent patch. Indeed in some situations
> some additional tasks need to be carried out for the MSI to be functional.
> For instance the MSI doorbell may need to be mapped in an IOMMU.
> 
> FUNCTIONAL value already gets used when enumerating the usable MSIs in
> msix_capability_init.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v12: new
> ---
>  drivers/pci/msi.c   |  2 +-
>  include/linux/msi.h | 14 ++
>  kernel/irq/msi.c|  7 ++-
>  3 files changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index a080f44..d7733ea 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -793,7 +793,7 @@ out_avail:
>   int avail = 0;
>  
>   for_each_pci_msi_entry(entry, dev) {
> - if (entry->irq != 0)
> + if (entry->flags & MSI_DESC_FLAG_FUNCTIONAL)
>   avail++;
>   }
>   if (avail != 0)
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 8b425c6..18f894f 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -47,6 +47,7 @@ struct fsl_mc_msi_desc {
>   * @nvec_used:   The number of vectors used
>   * @dev: Pointer to the device which uses this descriptor
>   * @msg: The last set MSI message cached for reuse
> + * @flags:   flags to describe the MSI descriptor status or features
>   *
>   * @masked:  [PCI MSI/X] Mask bits
>   * @is_msix: [PCI MSI/X] True if MSI-X
> @@ -67,6 +68,7 @@ struct msi_desc {
>   unsigned intnvec_used;
>   struct device   *dev;
>   struct msi_msg  msg;
> + u32 flags;
I will fix this bad alignment in the next version.

>  
>   union {
>   /* PCI MSI/X specific data */
> @@ -99,6 +101,18 @@ struct msi_desc {
>   };
>  };
>  
> +/* Flags for msi_desc */
> +enum {
> + /* the base IRQ is allocated */
> + MSI_DESC_FLAG_ALLOCATED =   (1 << 0),
> + /**
> +  * the MSI is functional; in some cases the fact the base IRQ is
> +  * allocated is not sufficient for the MSIs to be functional: for
> +  * example the MSI doorbell(s) may need to be IOMMU mapped.
> +  */
> + MSI_DESC_FLAG_FUNCTIONAL =  (1 << 1),
> +};
> +
>  /* Helpers to hide struct msi_desc implementation details */
>  #define msi_desc_to_dev(desc)((desc)->dev)
>  #define dev_to_msi_list(dev) (&(dev)->msi_list)
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index 72bf4d6..9b93766 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -361,6 +361,9 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
>   return ret;
>   }
>  
> + desc->flags |= MSI_DESC_FLAG_ALLOCATED;
> + desc->flags |= MSI_DESC_FLAG_FUNCTIONAL;
> +
>   for (i = 0; i < desc->nvec_used; i++)
>   irq_set_msi_desc_off(virq, i, desc);
>   }
> @@ -395,9 +398,11 @@ void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
>* enough that there is no IRQ associated to this
>* entry. If that's the case, don't do anything.
>*/
> - if (desc->irq) {
> + if (desc->flags & MSI_DESC_FLAG_ALLOCATED) {
>   irq_domain_free_irqs(desc->irq, desc->nvec_used);
>   desc->irq = 0;
> + desc->flags &= ~MSI_DESC_FLAG_ALLOCATED;
> + desc->flags &= ~MSI_DESC_FLAG_FUNCTIONAL;
Also, I will combine these two flag clears into a single statement.
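
That is, a single clear (fragment):

```c
desc->flags &= ~(MSI_DESC_FLAG_ALLOCATED | MSI_DESC_FLAG_FUNCTIONAL);
```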

Thanks

Eric
>   }
>   }
>  }
> 


Re: [bug report] iommu: iommu_get_group_resv_regions

2017-02-03 Thread Auger Eric
Hi Dan,

On 03/02/2017 10:14, Dan Carpenter wrote:
> Hello Eric Auger,
> 
> The patch 6c65fb318e8b: "iommu: iommu_get_group_resv_regions" from
> Jan 19, 2017, leads to the following static checker warning:
> 
>   drivers/iommu/iommu.c:215 iommu_insert_device_resv_regions()
>   error: uninitialized symbol 'ret'.
> 
> drivers/iommu/iommu.c
>203  static int
>204  iommu_insert_device_resv_regions(struct list_head *dev_resv_regions,
>205   struct list_head *group_resv_regions)
>206  {
>207  struct iommu_resv_region *entry;
>208  int ret;
>209  
>210  list_for_each_entry(entry, dev_resv_regions, list) {
>211  ret = iommu_insert_resv_region(entry, group_resv_regions);
>212  if (ret)
>213  break;
>214  }
>215  return ret;
> 
> On the one hand, it probably doesn't make sense that the dev_resv_regions
> would ever be empty, but on the other hand, there is some code that assumes
> it is possible.  What I mean is that iommu_get_resv_regions() can
> basically do nothing if ->get_resv_regions() isn't implemented.
> 
> I guess we should probably set ret = -EINVAL here?

We should rather initialize ret to 0. The dev_resv_regions list can be
empty if a device has no reserved region or if the IOMMU does not
implement the ops. We only return an -ENOMEM error if an allocation failed.
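
For reference, the corrected function would simply be (a sketch based on
the code quoted in the report above):

```c
static int
iommu_insert_device_resv_regions(struct list_head *dev_resv_regions,
				 struct list_head *group_resv_regions)
{
	struct iommu_resv_region *entry;
	int ret = 0;	/* an empty list is a success */

	list_for_each_entry(entry, dev_resv_regions, list) {
		ret = iommu_insert_resv_region(entry, group_resv_regions);
		if (ret)
			break;
	}
	return ret;
}
```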

Do you want to send the fix, or shall I?

Thanks for reporting

Eric
> 
>216  }
>217  
>218  int iommu_get_group_resv_regions(struct iommu_group *group,
>219   struct list_head *head)
>220  {
>221  struct iommu_device *device;
>222  int ret = 0;
>223  
>224  mutex_lock(&group->mutex);
>225  list_for_each_entry(device, &group->devices, list) {
>226  struct list_head dev_resv_regions;
>227  
>228  INIT_LIST_HEAD(&dev_resv_regions);
>229  iommu_get_resv_regions(device->dev, &dev_resv_regions);
>230  ret = iommu_insert_device_resv_regions(&dev_resv_regions, head);
>231  iommu_put_resv_regions(device->dev, &dev_resv_regions);
>232  if (ret)
>233  break;
>234  }
>235  mutex_unlock(&group->mutex);
>236  return ret;
>237  }
>238  EXPORT_SYMBOL_GPL(iommu_get_group_resv_regions);
> 
> 
> regards,
> dan carpenter
> 


Re: [PATCH v9 08/18] iommu/vt-d: Implement reserved region get/put callbacks

2017-01-23 Thread Auger Eric
Hi Will,

On 23/01/2017 12:46, Will Deacon wrote:
> [adding David Woodhouse, since he maintains this driver]

Thank you for adding David to the list.

Whoever is likely to pull this, please let me know whether I need to
respin to add Will's missed Acked-by.

Thanks

Eric
> 
> Will
> 
> On Thu, Jan 19, 2017 at 08:57:53PM +, Eric Auger wrote:
>> This patch registers the [FEE0_h - FEF0_000h] 1MB MSI
>> range as a reserved region and RMRR regions as direct regions.
>>
>> This will allow to report those reserved regions in the
>> iommu-group sysfs.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v6 -> v7:
>> - report RMRR regions as direct regions
>> - Due to the usage of rcu_read_lock, the rmrr reserved region
>>   allocation is done on rmrr allocation.
>> - use IOMMU_RESV_RESERVED
>>
>> RFCv2 -> RFCv3:
>> - use get/put_resv_region callbacks.
>>
>> RFC v1 -> RFC v2:
>> - fix intel_iommu_add_reserved_regions name
>> - use IOAPIC_RANGE_START and IOAPIC_RANGE_END defines
>> - return if the MSI region is already registered;
>> ---
>>  drivers/iommu/intel-iommu.c | 92 
>> -
>>  1 file changed, 74 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 8a18525..bce59a5 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -440,6 +440,7 @@ struct dmar_rmrr_unit {
>>  u64 end_address;/* reserved end address */
>>  struct dmar_dev_scope *devices; /* target devices */
>>  int devices_cnt;/* target device count */
>> +struct iommu_resv_region *resv; /* reserved region handle */
>>  };
>>  
>>  struct dmar_atsr_unit {
>> @@ -4246,27 +4247,40 @@ static inline void init_iommu_pm_ops(void) {}
>>  int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)
>>  {
>>  struct acpi_dmar_reserved_memory *rmrr;
>> +int prot = DMA_PTE_READ|DMA_PTE_WRITE;
>>  struct dmar_rmrr_unit *rmrru;
>> +size_t length;
>>  
>>  rmrru = kzalloc(sizeof(*rmrru), GFP_KERNEL);
>>  if (!rmrru)
>> -return -ENOMEM;
>> +goto out;
>>  
>>  rmrru->hdr = header;
>>  rmrr = (struct acpi_dmar_reserved_memory *)header;
>>  rmrru->base_address = rmrr->base_address;
>>  rmrru->end_address = rmrr->end_address;
>> +
>> +length = rmrr->end_address - rmrr->base_address + 1;
>> +rmrru->resv = iommu_alloc_resv_region(rmrr->base_address, length, prot,
>> +  IOMMU_RESV_DIRECT);
>> +if (!rmrru->resv)
>> +goto free_rmrru;
>> +
>>  rmrru->devices = dmar_alloc_dev_scope((void *)(rmrr + 1),
>>  ((void *)rmrr) + rmrr->header.length,
>> +&rmrru->devices_cnt);
>> -if (rmrru->devices_cnt && rmrru->devices == NULL) {
>> -kfree(rmrru);
>> -return -ENOMEM;
>> -}
>> +if (rmrru->devices_cnt && rmrru->devices == NULL)
>> +goto free_all;
>>  
>> +list_add(&rmrru->list, &dmar_rmrr_units);
>>  
>>  return 0;
>> +free_all:
>> +kfree(rmrru->resv);
>> +free_rmrru:
>> +kfree(rmrru);
>> +out:
>> +return -ENOMEM;
>>  }
>>  
>>  static struct dmar_atsr_unit *dmar_find_atsr(struct acpi_dmar_atsr *atsr)
>> @@ -4480,6 +4494,7 @@ static void intel_iommu_free_dmars(void)
>>  list_for_each_entry_safe(rmrru, rmrr_n, &dmar_rmrr_units, list) {
>>  list_del(>list);
>>  dmar_free_dev_scope(&rmrru->devices, &rmrru->devices_cnt);
>> +kfree(rmrru->resv);
>>  kfree(rmrru);
>>  }
>>  
>> @@ -5203,6 +5218,45 @@ static void intel_iommu_remove_device(struct device *dev)
>>  iommu_device_unlink(iommu->iommu_dev, dev);
>>  }
>>  
>> +static void intel_iommu_get_resv_regions(struct device *device,
>> + struct list_head *head)
>> +{
>> +struct iommu_resv_region *reg;
>> +struct dmar_rmrr_unit *rmrr;
>> +struct device *i_dev;
>> +int i;
>> +
>> +rcu_read_lock();
>> +for_each_rmrr_units(rmrr) {
>> +for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
>> +  i, i_dev) {
>> +if (i_dev != device)
>> +continue;
>> +
>> +list_add_tail(&rmrr->resv->list, head);
>> +}
>> +}
>> +rcu_read_unlock();
>> +
>> +reg = iommu_alloc_resv_region(IOAPIC_RANGE_START,
>> +  IOAPIC_RANGE_END - IOAPIC_RANGE_START + 1,
>> +  0, IOMMU_RESV_RESERVED);
>> +if (!reg)
>> +return;
>> +list_add_tail(&reg->list, head);
>> +}
>> +
>> +static void intel_iommu_put_resv_regions(struct device *dev,
>> + struct list_head *head)
>> +{
>> +struct iommu_resv_region *entry, *next;
>> +
>> +list_for_each_entry_safe(entry, next, head, list) {
>> + 

Re: [PATCH v8 00/18] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions

2017-01-18 Thread Auger Eric
Hi Tomasz,

On 13/01/2017 14:59, Tomasz Nowicki wrote:
> Hello Eric,
> 
> On 11.01.2017 10:41, Eric Auger wrote:
>> Following LPC discussions, we now report reserved regions through
>> the iommu-group sysfs reserved_regions attribute file.
>>
>> Reserved regions are populated through the IOMMU get_resv_region
>> callback (former get_dm_regions), now implemented by amd-iommu,
>> intel-iommu and arm-smmu:
>> - the intel-iommu reports the [0xfee00000 - 0xfeefffff] MSI window
>>   as a reserved region and RMRR regions as direct-mapped regions.
>> - the amd-iommu reports device direct mapped regions, the MSI region
>>   and HT regions.
>> - the arm-smmu reports the MSI window (arbitrarily located at
>>   0x8000000 and 1MB large).
>>
>> Unsafe interrupt assignment is tested by enumerating all MSI irq
>> domains and checking MSI remapping is supported in the above hierarchy.
>> This check is done in case we detect the iommu translates MSI
>> (an IOMMU_RESV_MSI window exists). Otherwise the IRQ remapping
>> capability is checked at IOMMU level. Obviously this is a defensive
>> IRQ safety assessment: Assuming there are several MSI controllers
>> in the system and at least one does not implement IRQ remapping,
>> the assignment will be considered as unsafe (even if this controller
>> is not accessible from the assigned devices).
>>
>> The series first patch stems from Robin's branch:
>> http://linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/misc
>>
>>
>> Best Regards
>>
>> Eric
>>
>> Git: complete series available at
>> https://github.com/eauger/linux/tree/v4.10-rc3-reserved-v8
> 
> I tested the series on ThunderX with internal 10G VNIC and Intel IXGBE
> NIC. Please feel free to add my:
> Tested-by: Tomasz Nowicki 

Thank you for your review. I will respin tomorrow adding the comment we
discussed in [PATCH v8 14/18] irqdomain: irq_domain_check_msi_remap.

I will also apply Bharat's Tested-by, your T-b and your R-b's.

Thanks

Eric
> 
> Thanks,
> Tomasz
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 00/18] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions

2017-01-16 Thread Auger Eric
Hi Tomasz,

On 13/01/2017 14:59, Tomasz Nowicki wrote:
> Hello Eric,
> 
> On 11.01.2017 10:41, Eric Auger wrote:
>> Following LPC discussions, we now report reserved regions through
>> the iommu-group sysfs reserved_regions attribute file.
>>
>> Reserved regions are populated through the IOMMU get_resv_region
>> callback (former get_dm_regions), now implemented by amd-iommu,
>> intel-iommu and arm-smmu:
>> - the intel-iommu reports the [0xfee00000 - 0xfeefffff] MSI window
>>   as a reserved region and RMRR regions as direct-mapped regions.
>> - the amd-iommu reports device direct mapped regions, the MSI region
>>   and HT regions.
>> - the arm-smmu reports the MSI window (arbitrarily located at
>>   0x8000000 and 1MB large).
>>
>> Unsafe interrupt assignment is tested by enumerating all MSI irq
>> domains and checking MSI remapping is supported in the above hierarchy.
>> This check is done in case we detect the iommu translates MSI
>> (an IOMMU_RESV_MSI window exists). Otherwise the IRQ remapping
>> capability is checked at IOMMU level. Obviously this is a defensive
>> IRQ safety assessment: Assuming there are several MSI controllers
>> in the system and at least one does not implement IRQ remapping,
>> the assignment will be considered as unsafe (even if this controller
>> is not accessible from the assigned devices).
>>
>> The series first patch stems from Robin's branch:
>> http://linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/misc
>>
>>
>> Best Regards
>>
>> Eric
>>
>> Git: complete series available at
>> https://github.com/eauger/linux/tree/v4.10-rc3-reserved-v8
> 
> I tested the series on ThunderX with internal 10G VNIC and Intel IXGBE
> NIC. Please feel free to add my:
> Tested-by: Tomasz Nowicki 
Many thanks!

Eric
> 
> Thanks,
> Tomasz
> 
> 


Re: [PATCH v8 14/18] irqdomain: irq_domain_check_msi_remap

2017-01-17 Thread Auger Eric
Hi Tomasz,

On 17/01/2017 14:40, Tomasz Nowicki wrote:
> On 11.01.2017 10:41, Eric Auger wrote:
>> This new function checks whether all MSI irq domains
>> implement IRQ remapping. This is useful to understand
>> whether VFIO passthrough is safe with respect to interrupts.
>>
>> On ARM typically an MSI controller can sit downstream
>> to the IOMMU without preventing VFIO passthrough.
>> As such any assigned device can write into the MSI doorbell.
>> In case the MSI controller implements IRQ remapping, assigned
>> devices will not be able to trigger interrupts towards the
>> host. On the contrary, the assignment must be emphasized as
>> unsafe with respect to interrupts.
>>
>> Signed-off-by: Eric Auger 
>> Reviewed-by: Marc Zyngier 
>>
>> ---
>> v7 -> v8:
>> - remove goto in irq_domain_check_msi_remap
>> - Added Marc's R-b
>>
>> v5 -> v6:
>> - use irq_domain_hierarchical_is_msi_remap()
>> - comment rewording
>>
>> v4 -> v5:
>> - Handle DOMAIN_BUS_FSL_MC_MSI domains
>> - Check parents
>> ---
>>  include/linux/irqdomain.h |  1 +
>>  kernel/irq/irqdomain.c| 22 ++
>>  2 files changed, 23 insertions(+)
>>
>> diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
>> index bc2f571..188eced 100644
>> --- a/include/linux/irqdomain.h
>> +++ b/include/linux/irqdomain.h
>> @@ -222,6 +222,7 @@ struct irq_domain *irq_domain_add_legacy(struct
>> device_node *of_node,
>>   void *host_data);
>>  extern struct irq_domain *irq_find_matching_fwspec(struct irq_fwspec
>> *fwspec,
>> enum irq_domain_bus_token bus_token);
>> +extern bool irq_domain_check_msi_remap(void);
>>  extern void irq_set_default_host(struct irq_domain *host);
>>  extern int irq_domain_alloc_descs(int virq, unsigned int nr_irqs,
>>irq_hw_number_t hwirq, int node,
>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>> index 876e131..d889751 100644
>> --- a/kernel/irq/irqdomain.c
>> +++ b/kernel/irq/irqdomain.c
>> @@ -278,6 +278,28 @@ struct irq_domain
>> *irq_find_matching_fwspec(struct irq_fwspec *fwspec,
>>  EXPORT_SYMBOL_GPL(irq_find_matching_fwspec);
>>
>>  /**
>> + * irq_domain_check_msi_remap - Check whether all MSI
>> + * irq domains implement IRQ remapping
>> + */
>> +bool irq_domain_check_msi_remap(void)
>> +{
>> +struct irq_domain *h;
>> +bool ret = true;
>> +
>> +mutex_lock(&irq_domain_mutex);
>> +list_for_each_entry(h, &irq_domain_list, link) {
>> +if (irq_domain_is_msi(h) &&
>> +!irq_domain_hierarchical_is_msi_remap(h)) {
>> +ret = false;
>> +break;
>> +}
>> +}
>> +mutex_unlock(&irq_domain_mutex);
>> +return ret;
>> +}
> 
> Above function returns true, even though there is no MSI irq domains. Is
> it intentional ?
From the VFIO integration point of view this is what we want. If there
is no MSI controller in the system, we have no vulnerability with
respect to IRQ assignment and we consider the system as safe. If
requested I can add a comment?

Thanks

Eric
> 
> Thanks,
> Tomasz


Re: [PATCH v5 19/19] iommu/dma: Add support for mapping MSIs

2016-08-31 Thread Auger Eric
Hi Robin,

On 26/08/2016 03:17, Robin Murphy wrote:
> Hi Eric,
> 
> On Fri, 26 Aug 2016 00:25:34 +0200
> Auger Eric <eric.au...@redhat.com> wrote:
> 
>> Hi Robin,
>>
>> On 23/08/2016 21:05, Robin Murphy wrote:
>>> When an MSI doorbell is located downstream of an IOMMU, attaching
>>> devices to a DMA ops domain and switching on translation leads to a
>>> rude shock when their attempt to write to the physical address
>>> returned by the irqchip driver faults (or worse, writes into some
>>> already-mapped buffer) and no interrupt is forthcoming.
>>>
>>> Address this by adding a hook for relevant irqchip drivers to call
>>> from their compose_msi_msg() callback, to swizzle the physical
>>> address with an appropriately-mapped IOVA for any device attached to
>>> one of our DMA ops domains.
>>>
>>> CC: Thomas Gleixner <t...@linutronix.de>
>>> CC: Jason Cooper <ja...@lakedaemon.net>
>>> CC: Marc Zyngier <marc.zyng...@arm.com>
>>> CC: linux-ker...@vger.kernel.org
>>> Signed-off-by: Robin Murphy <robin.mur...@arm.com>
>>> ---
>>>  drivers/iommu/dma-iommu.c| 141 ++-
>>>  drivers/irqchip/irq-gic-v2m.c|   3 +
>>>  drivers/irqchip/irq-gic-v3-its.c |   3 +
>>>  include/linux/dma-iommu.h|   9 +++
>>>  4 files changed, 141 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>> index 00c8a08d56e7..330cce60cad9 100644
>>> --- a/drivers/iommu/dma-iommu.c
>>> +++ b/drivers/iommu/dma-iommu.c
>>> @@ -25,10 +25,29 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  #include 
>>>  #include 
>>>  #include 
>>>  
>>> +struct iommu_dma_msi_page {
>>> +   struct list_headlist;
>>> +   dma_addr_t  iova;
The iova address here corresponds to the page iova address and not to
the iova address mapped onto the phys_hi/phys_lo address. Might be worth
a comment since it is not obvious or populate with the right iova?
>>> +   u32 phys_lo;
>>> +   u32 phys_hi;
>>> +};
>>> +
>>> +struct iommu_dma_cookie {
>>> +   struct iova_domain  iovad;
>>> +   struct list_headmsi_page_list;
>>> +   spinlock_t  msi_lock;
>>> +};
>>> +
>>> +static inline struct iova_domain *cookie_iovad(struct iommu_domain
>>> *domain) +{
>>> +   return &((struct iommu_dma_cookie
>>> *)domain->iova_cookie)->iovad; +}
>>> +
>>>  int iommu_dma_init(void)
>>>  {
>>> return iova_cache_get();
>>> @@ -43,15 +62,19 @@ int iommu_dma_init(void)
>>>   */
>>>  int iommu_get_dma_cookie(struct iommu_domain *domain)
>>>  {
>>> -   struct iova_domain *iovad;
>>> +   struct iommu_dma_cookie *cookie;
>>>  
>>> if (domain->iova_cookie)
>>> return -EEXIST;
>>>  
>>> -   iovad = kzalloc(sizeof(*iovad), GFP_KERNEL);
>>> -   domain->iova_cookie = iovad;
>>> +   cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
>>> +   if (!cookie)
>>> +   return -ENOMEM;
>>>  
>>> -   return iovad ? 0 : -ENOMEM;
>>> +   spin_lock_init(&cookie->msi_lock);
>>> +   INIT_LIST_HEAD(&cookie->msi_page_list);
>>> +   domain->iova_cookie = cookie;
>>> +   return 0;
>>>  }
>>>  EXPORT_SYMBOL(iommu_get_dma_cookie);
>>>  
>>> @@ -63,14 +86,20 @@ EXPORT_SYMBOL(iommu_get_dma_cookie);
>>>   */
>>>  void iommu_put_dma_cookie(struct iommu_domain *domain)
>>>  {
>>> -   struct iova_domain *iovad = domain->iova_cookie;
>>> +   struct iommu_dma_cookie *cookie = domain->iova_cookie;
>>> +   struct iommu_dma_msi_page *msi, *tmp;
>>>  
>>> -   if (!iovad)
>>> +   if (!cookie)
>>> return;
>>>  
>>> -   if (iovad->granule)
>>> -   put_iova_domain(iovad);
>>> -   kfree(iovad);
>>> +   if (cookie->iovad.granule)
>>> +   put_iova_domain(&cookie->iovad);
>>> +
>>> +   list_for_each_entry_safe(msi, tmp, &cookie->msi_page_list,
>>> list) {
>>> +   list_del(&msi->list);
>>> +   kfree(msi);
>>> +   }
>>> +   kfree(cookie);
>>> domain->iova_cookie = NULL;

Re: [PATCH v5 18/19] iommu/arm-smmu: Set domain geometry

2016-08-31 Thread Auger Eric
Hi,

On 23/08/2016 21:05, Robin Murphy wrote:
> For non-aperture-based IOMMUs, the domain geometry seems to have become
> the de-facto way of indicating the input address space size. That is
> quite a useful thing from the users' perspective, so let's do the same.
> 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/arm-smmu-v3.c | 2 ++
>  drivers/iommu/arm-smmu.c| 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 72b996aa7460..9c56bd194dc2 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -1574,6 +1574,8 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
> *domain)
>   return -ENOMEM;
>  
>   domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
> + domain->geometry.aperture_end = (1UL << ias) - 1;
> + domain->geometry.force_aperture = true;
>   smmu_domain->pgtbl_ops = pgtbl_ops;
>  
>   ret = finalise_stage_fn(smmu_domain, &pgtbl_cfg);
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 85bc74d8fca0..112918d787eb 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -913,6 +913,8 @@ static int arm_smmu_init_domain_context(struct 
> iommu_domain *domain,
>  
>   /* Update the domain's page sizes to reflect the page table format */
>   domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
> + domain->geometry.aperture_end = (1UL << ias) - 1;
> + domain->geometry.force_aperture = true;
>  
>   /* Initialise the context bank with our page table cfg */
>   arm_smmu_init_context_bank(smmu_domain, &pgtbl_cfg);
> 
Reviewed-by: Eric Auger 


Eric


Re: [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU

2016-09-14 Thread Auger Eric
Hi,
On 14/09/2016 12:35, Robin Murphy wrote:
> On 14/09/16 09:41, Auger Eric wrote:
>> Hi,
>>
>> On 12/09/2016 18:13, Robin Murphy wrote:
>>> Hi all,
>>>
>>> To avoid any more confusing fixups and crazily numbered extra patches, here's
>>> a quick v7 with everything rebased into the right order. The significant
>>> change this time is to implement iommu_fwspec properly from the start,
>>> which ends up being far simpler and more robust than faffing about
>>> introducing it somewhere 'less intrusive' to move toward core code later.
>>>
>>> New branch in the logical place:
>>>
>>> git://linux-arm.org/linux-rm iommu/generic-v7
>>
>> For information, as discussed privately with Robin I experience some
>> regressions with the former and now deprecated dt description.
>>
>> on my AMD Overdrive board and my old dt description I now only see a
>> single group:
>>
>> /sys/kernel/iommu_groups/
>> /sys/kernel/iommu_groups/0
>> /sys/kernel/iommu_groups/0/devices
>> /sys/kernel/iommu_groups/0/devices/e0700000.xgmac
>>
>> whereas I formerly see
>>
>> /sys/kernel/iommu_groups/
>> /sys/kernel/iommu_groups/3
>> /sys/kernel/iommu_groups/3/devices
>> /sys/kernel/iommu_groups/3/devices/0000:00:00.0
>> /sys/kernel/iommu_groups/1
>> /sys/kernel/iommu_groups/1/devices
>> /sys/kernel/iommu_groups/1/devices/e0700000.xgmac
>> /sys/kernel/iommu_groups/4
>> /sys/kernel/iommu_groups/4/devices
>> /sys/kernel/iommu_groups/4/devices/0000:00:02.2
>> /sys/kernel/iommu_groups/4/devices/0000:01:00.1
>> /sys/kernel/iommu_groups/4/devices/0000:00:02.0
>> /sys/kernel/iommu_groups/4/devices/0000:01:00.0
>> /sys/kernel/iommu_groups/2
>> /sys/kernel/iommu_groups/2/devices
>> /sys/kernel/iommu_groups/2/devices/e0900000.xgmac
>> /sys/kernel/iommu_groups/0
>> /sys/kernel/iommu_groups/0/devices
>> /sys/kernel/iommu_groups/0/devices/f0000000.pcie
>>
>> This is the group topology without ACS override. Applying the non
>> upstreamed "pci: Enable overrides for missing ACS capabilities" I used
>> to see separate groups for each PCIe components. Now I don't see any
>> difference with and without ACS override.
> 
> OK, having reproduced on my Juno, the problem looks to be that
> of_for_each_phandle() leaves err set to -ENOENT after successfully
> walking a phandle list, which makes __find_legacy_master_phandle()
> always bail out after the first SMMU.
> 
> Can you confirm that the following diff fixes things for you?

Well it improves but there are still differences in the group topology.
The PFs now are in group 0.

root@trusty:~# lspci -nk
00:00.0 0600: 1022:1a00
Subsystem: 1022:1a00
00:02.0 0600: 1022:1a01
00:02.2 0604: 1022:1a02
Kernel driver in use: pcieport
01:00.0 0200: 8086:1521 (rev 01)
Subsystem: 8086:0002
Kernel driver in use: igb
01:00.1 0200: 8086:1521 (rev 01)
Subsystem: 8086:0002
Kernel driver in use: igb


with your series + fix:
/sys/kernel/iommu_groups/
/sys/kernel/iommu_groups/3
/sys/kernel/iommu_groups/3/devices
/sys/kernel/iommu_groups/3/devices/0000:00:00.0
/sys/kernel/iommu_groups/1
/sys/kernel/iommu_groups/1/devices
/sys/kernel/iommu_groups/1/devices/e0700000.xgmac
/sys/kernel/iommu_groups/4
/sys/kernel/iommu_groups/4/devices
/sys/kernel/iommu_groups/4/devices/0000:00:02.2
/sys/kernel/iommu_groups/4/devices/0000:00:02.0
/sys/kernel/iommu_groups/2
/sys/kernel/iommu_groups/2/devices
/sys/kernel/iommu_groups/2/devices/e0900000.xgmac
/sys/kernel/iommu_groups/0
/sys/kernel/iommu_groups/0/devices
/sys/kernel/iommu_groups/0/devices/0000:01:00.1
/sys/kernel/iommu_groups/0/devices/f0000000.pcie
/sys/kernel/iommu_groups/0/devices/0000:01:00.0

Before (4.8-rc5):

/sys/kernel/iommu_groups/
/sys/kernel/iommu_groups/3
/sys/kernel/iommu_groups/3/devices
/sys/kernel/iommu_groups/3/devices/0000:00:00.0
/sys/kernel/iommu_groups/1
/sys/kernel/iommu_groups/1/devices
/sys/kernel/iommu_groups/1/devices/e0700000.xgmac
/sys/kernel/iommu_groups/4
/sys/kernel/iommu_groups/4/devices
/sys/kernel/iommu_groups/4/devices/0000:00:02.2
/sys/kernel/iommu_groups/4/devices/0000:01:00.1
/sys/kernel/iommu_groups/4/devices/0000:00:02.0
/sys/kernel/iommu_groups/4/devices/0000:01:00.0
/sys/kernel/iommu_groups/2
/sys/kernel/iommu_groups/2/devices
/sys/kernel/iommu_groups/2/devices/e0900000.xgmac
/sys/kernel/iommu_groups/0
/sys/kernel/iommu_groups/0/devices
/sys/kernel/iommu_groups/0/devices/f0000000.pcie

Thanks

Eric

> 
> Robin
> 
> --->8---
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index fa892d25004d..ac4aab97c93a 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -477,7 +4

Re: [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU

2016-09-14 Thread Auger Eric
Hi Will

On 14/09/2016 11:20, Will Deacon wrote:
> On Wed, Sep 14, 2016 at 10:41:57AM +0200, Auger Eric wrote:
>> Hi,
> 
> Hi Eric,
> 
>> On 12/09/2016 18:13, Robin Murphy wrote:
>>> To avoid any more confusing fixups and crazily numbered extra patches, here's
>>> a quick v7 with everything rebased into the right order. The significant
>>> change this time is to implement iommu_fwspec properly from the start,
>>> which ends up being far simpler and more robust than faffing about
>>> introducing it somewhere 'less intrusive' to move toward core code later.
>>>
>>> New branch in the logical place:
>>>
>>> git://linux-arm.org/linux-rm iommu/generic-v7
>>
>> For information, as discussed privately with Robin I experience some
>> regressions with the former and now deprecated dt description.
> 
> Please can you share the DT you're using so we can reproduce this locally?

Done already

Thanks

Eric
> 
> Will
> 


Re: [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU

2016-09-14 Thread Auger Eric
Hi,

On 12/09/2016 18:13, Robin Murphy wrote:
> Hi all,
> 
> To avoid any more confusing fixups and crazily numbered extra patches, here's
> a quick v7 with everything rebased into the right order. The significant
> change this time is to implement iommu_fwspec properly from the start,
> which ends up being far simpler and more robust than faffing about
> introducing it somewhere 'less intrusive' to move toward core code later.
> 
> New branch in the logical place:
> 
> git://linux-arm.org/linux-rm iommu/generic-v7

For information, as discussed privately with Robin I experience some
regressions with the former and now deprecated dt description.

on my AMD Overdrive board and my old dt description I now only see a
single group:

/sys/kernel/iommu_groups/
/sys/kernel/iommu_groups/0
/sys/kernel/iommu_groups/0/devices
/sys/kernel/iommu_groups/0/devices/e0700000.xgmac

whereas I formerly see

/sys/kernel/iommu_groups/
/sys/kernel/iommu_groups/3
/sys/kernel/iommu_groups/3/devices
/sys/kernel/iommu_groups/3/devices/0000:00:00.0
/sys/kernel/iommu_groups/1
/sys/kernel/iommu_groups/1/devices
/sys/kernel/iommu_groups/1/devices/e0700000.xgmac
/sys/kernel/iommu_groups/4
/sys/kernel/iommu_groups/4/devices
/sys/kernel/iommu_groups/4/devices/0000:00:02.2
/sys/kernel/iommu_groups/4/devices/0000:01:00.1
/sys/kernel/iommu_groups/4/devices/0000:00:02.0
/sys/kernel/iommu_groups/4/devices/0000:01:00.0
/sys/kernel/iommu_groups/2
/sys/kernel/iommu_groups/2/devices
/sys/kernel/iommu_groups/2/devices/e0900000.xgmac
/sys/kernel/iommu_groups/0
/sys/kernel/iommu_groups/0/devices
/sys/kernel/iommu_groups/0/devices/f0000000.pcie

This is the group topology without ACS override. Applying the non
upstreamed "pci: Enable overrides for missing ACS capabilities" I used
to see separate groups for each PCIe components. Now I don't see any
difference with and without ACS override.

Thanks

Eric
> 
> Robin.
> 
> Mark Rutland (1):
>   Docs: dt: add PCI IOMMU map bindings
> 
> Robin Murphy (21):
>   of/irq: Break out msi-map lookup (again)
>   iommu/of: Handle iommu-map property for PCI
>   iommu: Introduce iommu_fwspec
>   Docs: dt: document ARM SMMUv3 generic binding usage
>   iommu/arm-smmu: Fall back to global bypass
>   iommu/arm-smmu: Implement of_xlate() for SMMUv3
>   iommu/arm-smmu: Support non-PCI devices with SMMUv3
>   iommu/arm-smmu: Set PRIVCFG in stage 1 STEs
>   iommu/arm-smmu: Handle stream IDs more dynamically
>   iommu/arm-smmu: Consolidate stream map entry state
>   iommu/arm-smmu: Keep track of S2CR state
>   iommu/arm-smmu: Refactor mmu-masters handling
>   iommu/arm-smmu: Streamline SMMU data lookups
>   iommu/arm-smmu: Add a stream map entry iterator
>   iommu/arm-smmu: Intelligent SMR allocation
>   iommu/arm-smmu: Convert to iommu_fwspec
>   Docs: dt: document ARM SMMU generic binding usage
>   iommu/arm-smmu: Wire up generic configuration support
>   iommu/arm-smmu: Set domain geometry
>   iommu/dma: Add support for mapping MSIs
>   iommu/dma: Avoid PCI host bridge windows
> 
>  .../devicetree/bindings/iommu/arm,smmu-v3.txt  |   8 +-
>  .../devicetree/bindings/iommu/arm,smmu.txt |  63 +-
>  .../devicetree/bindings/pci/pci-iommu.txt  | 171 
>  arch/arm64/mm/dma-mapping.c|   2 +-
>  drivers/gpu/drm/exynos/exynos_drm_iommu.h  |   2 +-
>  drivers/iommu/Kconfig  |   2 +-
>  drivers/iommu/arm-smmu-v3.c| 386 +
>  drivers/iommu/arm-smmu.c   | 962 ++---
>  drivers/iommu/dma-iommu.c  | 161 +++-
>  drivers/iommu/iommu.c  |  56 ++
>  drivers/iommu/of_iommu.c   |  52 +-
>  drivers/irqchip/irq-gic-v2m.c  |   3 +
>  drivers/irqchip/irq-gic-v3-its.c   |   3 +
>  drivers/of/irq.c   |  78 +-
>  drivers/of/of_pci.c| 102 +++
>  include/linux/device.h |   3 +
>  include/linux/dma-iommu.h  |  12 +-
>  include/linux/iommu.h  |  38 +
>  include/linux/of_pci.h |  10 +
>  19 files changed, 1323 insertions(+), 791 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/pci/pci-iommu.txt
> 


Re: [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU

2016-09-13 Thread Auger Eric
Hi Robin

On 12/09/2016 18:13, Robin Murphy wrote:
> Hi all,
> 
> To avoid any more confusing fixups and crazily numbered extra patches, here's
> a quick v7 with everything rebased into the right order. The significant
> change this time is to implement iommu_fwspec properly from the start,
> which ends up being far simpler and more robust than faffing about
> introducing it somewhere 'less intrusive' to move toward core code later.
> 
> New branch in the logical place:
> 
> git://linux-arm.org/linux-rm iommu/generic-v7
I just tested your branch on AMD overdrive *without* updating the device
tree description according to the new syntax and I get a kernel oops.
See logs attached. Continuing my investigations ...

Best Regards

Eric
> 
> Robin.
> 
> Mark Rutland (1):
>   Docs: dt: add PCI IOMMU map bindings
> 
> Robin Murphy (21):
>   of/irq: Break out msi-map lookup (again)
>   iommu/of: Handle iommu-map property for PCI
>   iommu: Introduce iommu_fwspec
>   Docs: dt: document ARM SMMUv3 generic binding usage
>   iommu/arm-smmu: Fall back to global bypass
>   iommu/arm-smmu: Implement of_xlate() for SMMUv3
>   iommu/arm-smmu: Support non-PCI devices with SMMUv3
>   iommu/arm-smmu: Set PRIVCFG in stage 1 STEs
>   iommu/arm-smmu: Handle stream IDs more dynamically
>   iommu/arm-smmu: Consolidate stream map entry state
>   iommu/arm-smmu: Keep track of S2CR state
>   iommu/arm-smmu: Refactor mmu-masters handling
>   iommu/arm-smmu: Streamline SMMU data lookups
>   iommu/arm-smmu: Add a stream map entry iterator
>   iommu/arm-smmu: Intelligent SMR allocation
>   iommu/arm-smmu: Convert to iommu_fwspec
>   Docs: dt: document ARM SMMU generic binding usage
>   iommu/arm-smmu: Wire up generic configuration support
>   iommu/arm-smmu: Set domain geometry
>   iommu/dma: Add support for mapping MSIs
>   iommu/dma: Avoid PCI host bridge windows
> 
>  .../devicetree/bindings/iommu/arm,smmu-v3.txt  |   8 +-
>  .../devicetree/bindings/iommu/arm,smmu.txt |  63 +-
>  .../devicetree/bindings/pci/pci-iommu.txt  | 171 
>  arch/arm64/mm/dma-mapping.c|   2 +-
>  drivers/gpu/drm/exynos/exynos_drm_iommu.h  |   2 +-
>  drivers/iommu/Kconfig  |   2 +-
>  drivers/iommu/arm-smmu-v3.c| 386 +
>  drivers/iommu/arm-smmu.c   | 962 ++---
>  drivers/iommu/dma-iommu.c  | 161 +++-
>  drivers/iommu/iommu.c  |  56 ++
>  drivers/iommu/of_iommu.c   |  52 +-
>  drivers/irqchip/irq-gic-v2m.c  |   3 +
>  drivers/irqchip/irq-gic-v3-its.c   |   3 +
>  drivers/of/irq.c   |  78 +-
>  drivers/of/of_pci.c| 102 +++
>  include/linux/device.h |   3 +
>  include/linux/dma-iommu.h  |  12 +-
>  include/linux/iommu.h  |  38 +
>  include/linux/of_pci.h |  10 +
>  19 files changed, 1323 insertions(+), 791 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/pci/pci-iommu.txt
> 



[3.185447] CPU features: detected feature: 32-bit EL0 Support
[3.185455] CPU: All CPU(s) started at EL2
[3.185475] alternatives: patching kernel code
[3.188549] devtmpfs: initialized
[3.189484] SMBIOS 3.0.0 present.
[3.189570] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, 
max_idle_ns: 764504178510 ns
[3.189750] pinctrl core: initialized pinctrl subsystem
[3.190113] NET: Registered protocol family 16
[3.205134] cpuidle: using governor menu
[3.205292] arm-smmu: deprecated "mmu-masters" DT property in use; DMA API 
support unavailable
[3.205319] arm-smmu e0600000.smmu: probing hardware configuration...
[3.205326] arm-smmu e0600000.smmu: SMMUv1 with:
[3.205334] arm-smmu e0600000.smmu:  stage 2 translation
[3.205341] arm-smmu e0600000.smmu:  non-coherent table walk
[3.205348] arm-smmu e0600000.smmu:  (IDR0.CTTW overridden by dma-coherent 
property)
[3.205358] arm-smmu e0600000.smmu:  stream matching with 32 register 
groups, mask 0x7fff
[3.205369] arm-smmu e0600000.smmu:  8 context banks (8 stage-2 only)
[3.205377] arm-smmu e0600000.smmu:  Supported page sizes: 0x60211000
[3.205384] arm-smmu e0600000.smmu:  Stage-2: 40-bit IPA -> 40-bit PA
[3.205489] arm-smmu: deprecated "mmu-masters" DT property in use; DMA API 
support unavailable
[3.205512] arm-smmu e0800000.smmu: probing hardware configuration...
[3.205519] arm-smmu e0800000.smmu: SMMUv1 with:
[3.205525] arm-smmu e0800000.smmu:  stage 2 translation
[3.205533] arm-smmu e0800000.smmu:  non-coherent table walk
[3.205539] arm-smmu e0800000.smmu:  (IDR0.CTTW overridden by dma-coherent 
property)
[3.205550] arm-smmu e0800000.smmu:  stream matching with 32 register 
groups, 

Re: [PATCH] iommu/io-pgtable-arm: Check for v7s-incapable systems

2016-09-13 Thread Auger Eric
Hi,
On 13/09/2016 15:26, Robin Murphy wrote:
> On machines with no 32-bit addressable RAM whatsoever, we shouldn't
> even touch the v7s format as it's never going to work.
> 
> Fixes: e5fc9753b1a8 ("iommu/io-pgtable: Add ARMv7 short descriptor support")
> Reported-by: Eric Auger 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/io-pgtable-arm-v7s.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
> b/drivers/iommu/io-pgtable-arm-v7s.c
> index def8ca1c982d..b7759a48f4ed 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -633,6 +633,9 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct 
> io_pgtable_cfg *cfg,
>  {
>   struct arm_v7s_io_pgtable *data;
>  
> + if (upper_32_bits(PHYS_OFFSET))
> + return NULL;
> +
>   if (cfg->ias > ARM_V7S_ADDR_BITS || cfg->oas > ARM_V7S_ADDR_BITS)
>   return NULL;
>  
> 
Fixes the oops on AMD Overdrive
(CONFIG_IOMMU_IO_PGTABLE_ARMV7S_SELFTEST=y and no DMA_API)

Tested-by: Eric Auger 

Thanks

Eric



Re: [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU

2016-09-13 Thread Auger Eric
Hi Robin,

On 13/09/2016 14:40, Robin Murphy wrote:
> Hi Eric,
> 
> On 13/09/16 13:14, Auger Eric wrote:
>> Hi Robin
>>
>> On 12/09/2016 18:13, Robin Murphy wrote:
>>> Hi all,
>>>
>>> To avoid any more confusing fixups and crazily numbered extra patches, here's
>>> a quick v7 with everything rebased into the right order. The significant
>>> change this time is to implement iommu_fwspec properly from the start,
>>> which ends up being far simpler and more robust than faffing about
>>> introducing it somewhere 'less intrusive' to move toward core code later.
>>>
>>> New branch in the logical place:
>>>
>>> git://linux-arm.org/linux-rm iommu/generic-v7
>> I just tested your branch on AMD overdrive *without* updating the device
>> tree description according to the new syntax and I get a kernel oops.
>> See logs attached. Continuing my investigations ...
> 
> Looking at that backtrace, it seems the offending commit is actually in
> Will's devel branch _underneath_ this series; what's blowing up there is
> the short-descriptor io-pgtable selftests, which you should be able to
> reproduce on anything back to 4.6-rc1 with
> CONFIG_IOMMU_IO_PGTABLE_ARMV7S_SELFTEST=y.
I confirm that when disabling the option, I don't get the oops anymore.

Thanks!

Eric
> 
> The short-descriptor code is never going to work on Seattle due to the
> lack of 32-bit addressable memory - in normal use it would fail
> gracefully because it couldn't allocate anything, but since the
> selftests bypass the DMA API and corresponding checks, you end up with
> nastiness happening via truncated addresses. A while back I did start
> looking into generalising the selftests to remove all the "if
> (!selftest_running)" special-casing; might be time to pick that up again.
> 
> Robin.
> 
>>
>> Best Regards
>>
>> Eric
>>>
>>> Robin.
>>>
>>> Mark Rutland (1):
>>>   Docs: dt: add PCI IOMMU map bindings
>>>
>>> Robin Murphy (21):
>>>   of/irq: Break out msi-map lookup (again)
>>>   iommu/of: Handle iommu-map property for PCI
>>>   iommu: Introduce iommu_fwspec
>>>   Docs: dt: document ARM SMMUv3 generic binding usage
>>>   iommu/arm-smmu: Fall back to global bypass
>>>   iommu/arm-smmu: Implement of_xlate() for SMMUv3
>>>   iommu/arm-smmu: Support non-PCI devices with SMMUv3
>>>   iommu/arm-smmu: Set PRIVCFG in stage 1 STEs
>>>   iommu/arm-smmu: Handle stream IDs more dynamically
>>>   iommu/arm-smmu: Consolidate stream map entry state
>>>   iommu/arm-smmu: Keep track of S2CR state
>>>   iommu/arm-smmu: Refactor mmu-masters handling
>>>   iommu/arm-smmu: Streamline SMMU data lookups
>>>   iommu/arm-smmu: Add a stream map entry iterator
>>>   iommu/arm-smmu: Intelligent SMR allocation
>>>   iommu/arm-smmu: Convert to iommu_fwspec
>>>   Docs: dt: document ARM SMMU generic binding usage
>>>   iommu/arm-smmu: Wire up generic configuration support
>>>   iommu/arm-smmu: Set domain geometry
>>>   iommu/dma: Add support for mapping MSIs
>>>   iommu/dma: Avoid PCI host bridge windows
>>>
>>>  .../devicetree/bindings/iommu/arm,smmu-v3.txt  |   8 +-
>>>  .../devicetree/bindings/iommu/arm,smmu.txt |  63 +-
>>>  .../devicetree/bindings/pci/pci-iommu.txt  | 171 
>>>  arch/arm64/mm/dma-mapping.c|   2 +-
>>>  drivers/gpu/drm/exynos/exynos_drm_iommu.h  |   2 +-
>>>  drivers/iommu/Kconfig  |   2 +-
>>>  drivers/iommu/arm-smmu-v3.c| 386 +
>>>  drivers/iommu/arm-smmu.c   | 962 
>>> ++---
>>>  drivers/iommu/dma-iommu.c  | 161 +++-
>>>  drivers/iommu/iommu.c  |  56 ++
>>>  drivers/iommu/of_iommu.c   |  52 +-
>>>  drivers/irqchip/irq-gic-v2m.c  |   3 +
>>>  drivers/irqchip/irq-gic-v3-its.c   |   3 +
>>>  drivers/of/irq.c   |  78 +-
>>>  drivers/of/of_pci.c| 102 +++
>>>  include/linux/device.h |   3 +
>>>  include/linux/dma-iommu.h  |  12 +-
>>>  include/linux/iommu.h  |  38 +
>>>  include/linux/of_pci.h |  10 +
>>>  19 files changed, 1323 insertions(+), 791 deletions(-)
>>>  create mode 100644 Documentation/devicetree/bindings/pci/pci-iommu.txt
>>>
>>
>>
>>
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU

2016-09-15 Thread Auger Eric
Hi Robin,
On 15/09/2016 12:15, Robin Murphy wrote:
> On 15/09/16 10:29, Auger Eric wrote:
>> Hi Robin,
>>
>> On 14/09/2016 14:53, Robin Murphy wrote:
>>> On 14/09/16 13:32, Auger Eric wrote:
>>>> Hi,
>>>> On 14/09/2016 12:35, Robin Murphy wrote:
>>>>> On 14/09/16 09:41, Auger Eric wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 12/09/2016 18:13, Robin Murphy wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> To avoid any more confusing fixups and crazily numbered extra patches, here's
>>>>>>> a quick v7 with everything rebased into the right order. The significant
>>>>>>> change this time is to implement iommu_fwspec properly from the start,
>>>>>>> which ends up being far simpler and more robust than faffing about
>>>>>>> introducing it somewhere 'less intrusive' to move toward core code 
>>>>>>> later.
>>>>>>>
>>>>>>> New branch in the logical place:
>>>>>>>
>>>>>>> git://linux-arm.org/linux-rm iommu/generic-v7
>>>>>>
>>>>>> For information, as discussed privately with Robin I experience some
>>>>>> regressions with the former and now deprecated dt description.
>>>>>>
>>>>>> on my AMD Overdrive board and my old dt description I now only see a
>>>>>> single group:
>>>>>>
>>>>>> /sys/kernel/iommu_groups/
>>>>>> /sys/kernel/iommu_groups/0
>>>>>> /sys/kernel/iommu_groups/0/devices
>>>>>> /sys/kernel/iommu_groups/0/devices/e070.xgmac
>>>>>>
>>>>>> whereas I formerly see
>>>>>>
>>>>>> /sys/kernel/iommu_groups/
>>>>>> /sys/kernel/iommu_groups/3
>>>>>> /sys/kernel/iommu_groups/3/devices
>>>>>> /sys/kernel/iommu_groups/3/devices/:00:00.0
>>>>>> /sys/kernel/iommu_groups/1
>>>>>> /sys/kernel/iommu_groups/1/devices
>>>>>> /sys/kernel/iommu_groups/1/devices/e070.xgmac
>>>>>> /sys/kernel/iommu_groups/4
>>>>>> /sys/kernel/iommu_groups/4/devices
>>>>>> /sys/kernel/iommu_groups/4/devices/:00:02.2
>>>>>> /sys/kernel/iommu_groups/4/devices/:01:00.1
>>>>>> /sys/kernel/iommu_groups/4/devices/:00:02.0
>>>>>> /sys/kernel/iommu_groups/4/devices/:01:00.0
>>>>>> /sys/kernel/iommu_groups/2
>>>>>> /sys/kernel/iommu_groups/2/devices
>>>>>> /sys/kernel/iommu_groups/2/devices/e090.xgmac
>>>>>> /sys/kernel/iommu_groups/0
>>>>>> /sys/kernel/iommu_groups/0/devices
>>>>>> /sys/kernel/iommu_groups/0/devices/f000.pcie
>>>>>>
>>>>>> This is the group topology without ACS override. Applying the non
>>>>>> upstreamed "pci: Enable overrides for missing ACS capabilities" I used
>>>>>> to see separate groups for each PCIe components. Now I don't see any
>>>>>> difference with and without ACS override.
>>>>>
>>>>> OK, having reproduced on my Juno, the problem looks to be that
>>>>> of_for_each_phandle() leaves err set to -ENOENT after successfully
>>>>> walking a phandle list, which makes __find_legacy_master_phandle()
>>>>> always bail out after the first SMMU.
>>>>>
>>>>> Can you confirm that the following diff fixes things for you?
>>>>
>>>> Well it improves but there are still differences in the group topology.
>>>> The PFs now are in group 0.
>>>>
>>>> root@trusty:~# lspci -nk
>>>> 00:00.0 0600: 1022:1a00
>>>> Subsystem: 1022:1a00
>>>> 00:02.0 0600: 1022:1a01
>>>> 00:02.2 0604: 1022:1a02
>>>> Kernel driver in use: pcieport
>>>> 01:00.0 0200: 8086:1521 (rev 01)
>>>> Subsystem: 8086:0002
>>>> Kernel driver in use: igb
>>>> 01:00.1 0200: 8086:1521 (rev 01)
>>>> Subsystem: 8086:0002
>>>> Kernel driver in use: igb
>>>>
>>>>
>>>> with your series + fix:
>>>> /sys/kernel/iommu_groups/
>>>> /sys/kernel/iommu_groups/3
>>>> /sys/kernel/iommu_groups/3/devices
>>>> /sys/

Re: [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed

2016-10-04 Thread Auger Eric
Hi Robin,

On 04/10/2016 19:18, Robin Murphy wrote:
> On 02/10/16 10:56, Christoffer Dall wrote:
>> On Fri, Sep 30, 2016 at 02:24:40PM +0100, Robin Murphy wrote:
>>> Hi Eric,
>>>
>>> On 27/09/16 21:48, Eric Auger wrote:
 iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
 IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
 they reserve the IOVA window to prevent the iova allocator to
 allocate in those areas.

 Signed-off-by: Eric Auger 
 ---
  drivers/iommu/dma-iommu.c | 48 
 +++
  include/linux/dma-iommu.h | 18 ++
  2 files changed, 66 insertions(+)

 diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
 index 04bbc85..db21143 100644
 --- a/drivers/iommu/dma-iommu.c
 +++ b/drivers/iommu/dma-iommu.c
 @@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct 
 iommu_domain *domain,
return 0;
  }
  EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
 +
 +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
 +  phys_addr_t paddr, size_t size, int prot)
 +{
 +  struct iova_domain *iovad;
 +  unsigned long lo, hi;
 +  int ret;
 +
 +  if (domain->type != IOMMU_DOMAIN_MIXED)
 +  return -EINVAL;
 +
 +  if (!domain->iova_cookie)
 +  return -EINVAL;
 +
 +  iovad = cookie_iovad(domain);
 +
 +  lo = iova_pfn(iovad, iova);
 +  hi = iova_pfn(iovad, iova + size - 1);
 +  reserve_iova(iovad, lo, hi);
>>>
>>> This can't work reliably - reserve_iova() will (for good reason) merge
>>> any adjacent or overlapping entries, so any unmap is liable to free more
>>> IOVA space than actually gets unmapped, and things will get subtly out
>>> of sync and go wrong later.
>>>
>>> The more general issue with this whole approach, though, is that it
>>> effectively rules out userspace doing guest memory hotplug or similar,
>>> and I'm not we want to paint ourselves into that corner. Basically, as
>>> soon as a device is attached to a guest, the entirety of the unallocated
>>> IPA space becomes reserved, and userspace can never add anything further
>>> to it, because any given address *might* be in use for an MSI mapping.
>>
>> Ah, we didn't think of that when discussing this design at KVM Forum,
>> because the idea was that the IOVA allocator was in charge of that
>> resource, and the IOVA was a separate concept from the IPA space.
>>
>> I think what tripped us up, is that while the above is true for the MSI
>> configuration where we trap the bar and do the allocation at VFIO init
>> time, the guest device driver can program DMA to any address without
>> trapping, and therefore there's an inherent relationship between the
>> IOVA and the IPA space.  Is that right?
> 
> Yes, for anything the guest knows about and/or can touch directly, IOVA
> must equal IPA, or DMA is going to go horribly wrong. It's only direct
> interactions between device and host behind the guest's back where we
> (may) have some freedom with IOVA assignment.
> 
>>> I think it still makes most sense to stick with the original approach of
>>> cooperating with userspace to reserve a bounded area - it's just that we
>>> can then let automatic mapping take care of itself within that area.
>>
>> I was thinking that it's also possible to do it the other way around: To
>> let userspace say wherever memory may be hotplugged and do the
>> allocation within the remaining area, but I suppose that's pretty much
>> the same thing, and it should just depend on what's easiest to implement
>> and what userspace can best predict.
> 
> Indeed, if userspace *is* able to pre-emptively claim everything it
> might ever want, that does kind of implicitly solve the "tell me where I
> can put this" problem (assuming it doesn't simply claim the whole
> address space, of course), but I'm not so sure it works well if there
> are any specific restrictions (e.g. if some device is going to require
> the MSI range to be 32-bit addressable). It also fails to address the
> issue below...
> 
>>> Speaking of which, I've realised the same fundamental reservation
>>> problem already applies to PCI without ACS, regardless of MSIs. I just
>>> tried on my Juno with guest memory placed at 0x40, (i.e.
>>> matching the host PA of the 64-bit PCI window), and sure enough when the
>>> guest kicks off some DMA on the passed-through NIC, the root complex
>>> interprets the guest IPA as (unsupported) peer-to-peer DMA to a BAR
>>> claimed by the video card, and it fails. I guess this doesn't get hit in
>>> practice on x86 because the guest memory map is unlikely to be much
>>> different from the host's.
>>>
>>> It seems like we basically need a general way of communicating fixed and
>>> movable host reservations to userspace :/
>>>
>>
>> Yes, this makes sense to me.   Do we 

Re: [PATCH v13 03/15] iommu/dma: Allow MSI-only cookies

2016-10-07 Thread Auger Eric
Hi Alex,

On 06/10/2016 22:17, Alex Williamson wrote:
> On Thu,  6 Oct 2016 08:45:19 +
> Eric Auger  wrote:
> 
>> From: Robin Murphy 
>>
>> IOMMU domain users such as VFIO face a similar problem to DMA API ops
>> with regard to mapping MSI messages in systems where the MSI write is
>> subject to IOMMU translation. With the relevant infrastructure now in
>> place for managed DMA domains, it's actually really simple for other
>> users to piggyback off that and reap the benefits without giving up
>> their own IOVA management, and without having to reinvent their own
>> wheel in the MSI layer.
>>
>> Allow such users to opt into automatic MSI remapping by dedicating a
>> region of their IOVA space to a managed cookie.
>>
>> Signed-off-by: Robin Murphy 
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v1 -> v2:
>> - compared to Robin's version
>> - add NULL last param to iommu_dma_init_domain
>> - set the msi_geometry aperture
>> - I removed
>>   if (base < U64_MAX - size)
>>  reserve_iova(iovad, iova_pfn(iovad, base + size), ULONG_MAX);
>>   don't get why we would reserve something out of the scope of the iova 
>> domain?
>>   what do I miss?
>> ---
>>  drivers/iommu/dma-iommu.c | 40 
>>  include/linux/dma-iommu.h |  9 +
>>  2 files changed, 49 insertions(+)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index c5ab866..11da1a0 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -716,3 +716,43 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
>>  msg->address_lo += lower_32_bits(msi_page->iova);
>>  }
>>  }
>> +
>> +/**
>> + * iommu_get_dma_msi_region_cookie - Configure a domain for MSI remapping 
>> only
> 
> Should this perhaps be iommu_setup_dma_msi_region_cookie, or something
> along those lines.  I'm not sure what we're get'ing.  Thanks,
This was chosen by analogy with legacy iommu_get_dma_cookie/
iommu_put_dma_cookie. But in practice it does both get &
iommu_dma_init_domain.

I plan to rename into iommu_setup_dma_msi_region if no objection

Thanks

Eric

> 
> Alex
> 
>> + * @domain: IOMMU domain to prepare
>> + * @base: Base address of IOVA region to use as the MSI remapping aperture
>> + * @size: Size of the desired MSI aperture
>> + *
>> + * Users who manage their own IOVA allocation and do not want DMA API 
>> support,
>> + * but would still like to take advantage of automatic MSI remapping, can 
>> use
>> + * this to initialise their own domain appropriately.
>> + */
>> +int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>> +dma_addr_t base, u64 size)
>> +{
>> +struct iommu_dma_cookie *cookie;
>> +struct iova_domain *iovad;
>> +int ret;
>> +
>> +if (domain->type == IOMMU_DOMAIN_DMA)
>> +return -EINVAL;
>> +
>> +ret = iommu_get_dma_cookie(domain);
>> +if (ret)
>> +return ret;
>> +
>> +ret = iommu_dma_init_domain(domain, base, size, NULL);
>> +if (ret) {
>> +iommu_put_dma_cookie(domain);
>> +return ret;
>> +}
>> +
>> +domain->msi_geometry.aperture_start = base;
>> +domain->msi_geometry.aperture_end = base + size - 1;
>> +
>> +cookie = domain->iova_cookie;
>> +iovad = &cookie->iovad;
>> +
>> +return 0;
>> +}
>> +EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
>> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
>> index 32c5890..1c55413 100644
>> --- a/include/linux/dma-iommu.h
>> +++ b/include/linux/dma-iommu.h
>> @@ -67,6 +67,9 @@ int iommu_dma_mapping_error(struct device *dev, dma_addr_t 
>> dma_addr);
>>  /* The DMA API isn't _quite_ the whole story, though... */
>>  void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
>>  
>> +int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>> +dma_addr_t base, u64 size);
>> +
>>  #else
>>  
>>  struct iommu_domain;
>> @@ -90,6 +93,12 @@ static inline void iommu_dma_map_msi_msg(int irq, struct 
>> msi_msg *msg)
>>  {
>>  }
>>  
>> +static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain 
>> *domain,
>> +dma_addr_t base, u64 size)
>> +{
>> +return -ENODEV;
>> +}
>> +
>>  #endif  /* CONFIG_IOMMU_DMA */
>>  #endif  /* __KERNEL__ */
>>  #endif  /* __DMA_IOMMU_H */
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 


Re: [PATCH v13 11/15] vfio/type1: Handle unmap/unpin and replay for VFIO_IOVA_RESERVED slots

2016-10-07 Thread Auger Eric
Hi Alex,

On 06/10/2016 22:19, Alex Williamson wrote:
> On Thu,  6 Oct 2016 08:45:27 +
> Eric Auger  wrote:
> 
>> Before allowing the end-user to create VFIO_IOVA_RESERVED dma slots,
>> let's implement the expected behavior for removal and replay.
>>
>> As opposed to user dma slots, reserved IOVAs are not systematically bound
>> to PAs and PAs are not pinned. VFIO just initializes the IOVA "aperture".
>> IOVAs are allocated outside of the VFIO framework, by the MSI layer which
>> is responsible to free and unmap them. The MSI mapping resources are freeed
> 
> nit, extra 'e', "freed"
> 
>> by the IOMMU driver on domain destruction.
>>
>> On the creation of a new domain, the "replay" of a reserved slot simply
>> needs to set the MSI aperture on the new domain.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v12 -> v13:
>> - use dma-iommu iommu_get_dma_msi_region_cookie
>>
>> v9 -> v10:
>> - replay of a reserved slot sets the MSI aperture on the new domain
>> - use VFIO_IOVA_RESERVED_MSI enum value instead of VFIO_IOVA_RESERVED
>>
>> v7 -> v8:
>> - do no destroy anything anymore, just bypass unmap/unpin and iommu_map
>>   on replay
>> ---
>>  drivers/vfio/Kconfig|  1 +
>>  drivers/vfio/vfio_iommu_type1.c | 10 +-
>>  2 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
>> index da6e2ce..673ec79 100644
>> --- a/drivers/vfio/Kconfig
>> +++ b/drivers/vfio/Kconfig
>> @@ -1,6 +1,7 @@
>>  config VFIO_IOMMU_TYPE1
>>  tristate
>>  depends on VFIO
>> +select IOMMU_DMA
>>  default n
>>  
>>  config VFIO_IOMMU_SPAPR_TCE
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 65a4038..5bc5fc9 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -36,6 +36,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #define DRIVER_VERSION  "0.2"
>>  #define DRIVER_AUTHOR   "Alex Williamson "
>> @@ -387,7 +388,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, 
>> struct vfio_dma *dma)
>>  struct vfio_domain *domain, *d;
>>  long unlocked = 0;
>>  
>> -if (!dma->size)
>> +if (!dma->size || dma->type != VFIO_IOVA_USER)
>>  return;
>>  /*
>>   * We use the IOMMU to track the physical addresses, otherwise we'd
>> @@ -724,6 +725,13 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
>>  dma = rb_entry(n, struct vfio_dma, node);
>>  iova = dma->iova;
>>  
>> +if (dma->type == VFIO_IOVA_RESERVED_MSI) {
>> +ret = iommu_get_dma_msi_region_cookie(domain->domain,
>> + dma->iova, dma->size);
>> +WARN_ON(ret);
>> +continue;
>> +}
> 
> Why is this a passable error?  We consider an iommu_map() error on any
> entry a failure.
Yes I agree.

Thanks

Eric
> 
>> +
>>  while (iova < dma->iova + dma->size) {
>>  phys_addr_t phys = iommu_iova_to_phys(d->domain, iova);
>>  size_t size;
> 
> 


Re: [PATCH v13 15/15] vfio/type1: Return the MSI geometry through VFIO_IOMMU_GET_INFO capability chains

2016-10-07 Thread Auger Eric
Hi Alex,

On 06/10/2016 22:42, Alex Williamson wrote:
> On Thu, 6 Oct 2016 14:20:40 -0600
> Alex Williamson  wrote:
> 
>> On Thu,  6 Oct 2016 08:45:31 +
>> Eric Auger  wrote:
>>
>>> This patch allows the user-space to retrieve the MSI geometry. The
>>> implementation is based on capability chains, now also added to
>>> VFIO_IOMMU_GET_INFO.
>>>
>>> The returned info comprise:
>>> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
>>>   in the positive, the start/end of the aperture,
>>> - or whether the IOVA aperture need to be set by the userspace. In that
>>>   case, the size and alignment of the IOVA window to be provided are
>>>   returned.
>>>
>>> In case the userspace must provide the IOVA aperture, we currently report
>>> a size/alignment based on all the doorbells registered by the host kernel.
>>> This may exceed the actual needs.
>>>
>>> Signed-off-by: Eric Auger 
>>>
>>> ---
>>> v11 -> v11:
>>> - msi_doorbell_pages was renamed msi_doorbell_calc_pages
>>>
>>> v9 -> v10:
>>> - move cap_offset after iova_pgsizes
>>> - replace __u64 alignment by __u32 order
>>> - introduce __u32 flags in vfio_iommu_type1_info_cap_msi_geometry and
>>>   fix alignment
>>> - call msi-doorbell API to compute the size/alignment
>>>
>>> v8 -> v9:
>>> - use iommu_msi_supported flag instead of programmable
>>> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
>>>   capability chain, reporting the MSI geometry
>>>
>>> v7 -> v8:
>>> - use iommu_domain_msi_geometry
>>>
>>> v6 -> v7:
>>> - remove the computation of the number of IOVA pages to be provisioned.
>>>   This number depends on the domain/group/device topology which can
>>>   dynamically change. Let's instead rely on an arbitrary max depending
>>>   on the system
>>>
>>> v4 -> v5:
>>> - move msi_info and ret declaration within the conditional code
>>>
>>> v3 -> v4:
>>> - replace former vfio_domains_require_msi_mapping by
>>>   more complex computation of MSI mapping requirements, especially the
>>>   number of pages to be provided by the user-space.
>>> - reword patch title
>>>
>>> RFC v1 -> v1:
>>> - derived from
>>>   [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
>>> - renamed allow_msi_reconfig into require_msi_mapping
>>> - fixed VFIO_IOMMU_GET_INFO
>>> ---
>>>  drivers/vfio/vfio_iommu_type1.c | 78 
>>> -
>>>  include/uapi/linux/vfio.h   | 32 -
>>>  2 files changed, 108 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>>> b/drivers/vfio/vfio_iommu_type1.c
>>> index dc3ee5d..ce5e7eb 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -38,6 +38,8 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>> +#include 
>>>  
>>>  #define DRIVER_VERSION  "0.2"
>>>  #define DRIVER_AUTHOR   "Alex Williamson "
>>> @@ -1101,6 +1103,55 @@ static int vfio_domains_have_iommu_cache(struct 
>>> vfio_iommu *iommu)
>>> return ret;
>>>  }
>>>  
>>> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
>>> +struct vfio_info_cap *caps)
>>> +{
>>> +   struct vfio_iommu_type1_info_cap_msi_geometry *vfio_msi_geometry;
>>> +   unsigned long order = __ffs(vfio_pgsize_bitmap(iommu));
>>> +   struct iommu_domain_msi_geometry msi_geometry;
>>> +   struct vfio_info_cap_header *header;
>>> +   struct vfio_domain *d;
>>> +   bool reserved;
>>> +   size_t size;
>>> +
>>> +   mutex_lock(&iommu->lock);
>>> +   /* All domains have same require_msi_map property, pick first */
>>> +   d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
>>> +   iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_GEOMETRY,
>>> + &msi_geometry);
>>> +   reserved = !msi_geometry.iommu_msi_supported;
>>> +
>>> +   mutex_unlock(&iommu->lock);
>>> +
>>> +   size = sizeof(*vfio_msi_geometry);
>>> +   header = vfio_info_cap_add(caps, size,
>>> +  VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY, 1);
>>> +
>>> +   if (IS_ERR(header))
>>> +   return PTR_ERR(header);
>>> +
>>> +   vfio_msi_geometry = container_of(header,
>>> +   struct vfio_iommu_type1_info_cap_msi_geometry,
>>> +   header);
>>> +
>>> +   vfio_msi_geometry->flags = reserved;  
>>
>> Use the bit flag VFIO_IOMMU_MSI_GEOMETRY_RESERVED
>>
>>> +   if (reserved) {
>>> +   vfio_msi_geometry->aperture_start = msi_geometry.aperture_start;
>>> +   vfio_msi_geometry->aperture_end = msi_geometry.aperture_end;  
>>
>> But maybe nobody has set these, did you intend to use
>> iommu_domain_msi_aperture_valid(), which you defined early on but never
>> used?
>>
>>> +   return 0;
>>> +   }
>>> +
>>> +   vfio_msi_geometry->order = order;  
>>
>> I'm tempted to suggest that a user could do the same math on their 

Re: [PATCH v13 12/15] vfio: Allow reserved msi iova registration

2016-10-07 Thread Auger Eric
Hi Alex,

On 06/10/2016 22:19, Alex Williamson wrote:
> On Thu,  6 Oct 2016 08:45:28 +
> Eric Auger  wrote:
> 
>> The user is allowed to register a reserved MSI IOVA range by using the
>> DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>> This region is stored in the vfio_dma rb tree. At that point the iova
>> range is not mapped to any target address yet. The host kernel will use
>> those iova when needed, typically when MSIs are allocated.
>>
>> Signed-off-by: Eric Auger 
>> Signed-off-by: Bharat Bhushan 
>>
>> ---
>> v12 -> v13:
>> - use iommu_get_dma_msi_region_cookie
>>
>> v9 -> v10
>> - use VFIO_IOVA_RESERVED_MSI enum value
>>
>> v7 -> v8:
>> - use iommu_msi_set_aperture function. There is no notion of
>>   unregistration anymore since the reserved msi slot remains
>>   until the container gets closed.
>>
>> v6 -> v7:
>> - use iommu_free_reserved_iova_domain
>> - convey prot attributes downto dma-reserved-iommu iova domain creation
>> - reserved bindings teardown now performed on iommu domain destruction
>> - rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
>>  VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
>> - change title
>> - pass the protection attribute to dma-reserved-iommu API
>>
>> v3 -> v4:
>> - use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
>> - protect vfio_register_reserved_iova_range implementation with
>>   CONFIG_IOMMU_DMA_RESERVED
>> - handle unregistration by user-space and on vfio_iommu_type1 release
>>
>> v1 -> v2:
>> - set returned value according to alloc_reserved_iova_domain result
>> - free the iova domains in case any error occurs
>>
>> RFC v1 -> v1:
>> - takes into account Alex comments, based on
>>   [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
>> - use the existing dma map/unmap ioctl interface with a flag to register
>>   a reserved IOVA range. A single reserved iova region is allowed.
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 77 
>> -
>>  include/uapi/linux/vfio.h   | 10 +-
>>  2 files changed, 85 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 5bc5fc9..c2f8bd9 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -442,6 +442,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, 
>> struct vfio_dma *dma)
>>  vfio_lock_acct(-unlocked);
>>  }
>>  
>> +static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
>> +dma_addr_t iova, size_t size)
>> +{
>> +struct vfio_domain *d;
>> +int ret = 0;
>> +
>> +list_for_each_entry(d, &iommu->domain_list, next) {
>> +ret = iommu_get_dma_msi_region_cookie(d->domain, iova, size);
>> +if (ret)
>> +break;
>> +}
>> +return ret;
> 
> Doesn't this need an unwind on failure loop?
At the moment the de-allocation is done by the SMMU driver, in the
domain_free ops, which calls iommu_put_dma_cookie. In case
iommu_get_dma_msi_region_cookie fails on a given VFIO domain, there is
currently no other way but to destroy all VFIO domains and redo everything.

So yes I plan to unwind everything, i.e. call iommu_put_dma_cookie for
each domain.
> 
>> +}
>> +
>>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  {
>>  vfio_unmap_unpin(iommu, dma);
>> @@ -691,6 +705,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  return ret;
>>  }
>>  
>> +static int vfio_register_msi_range(struct vfio_iommu *iommu,
>> +   struct vfio_iommu_type1_dma_map *map)
>> +{
>> +dma_addr_t iova = map->iova;
>> +size_t size = map->size;
>> +int ret = 0;
>> +struct vfio_dma *dma;
>> +unsigned long order;
>> +uint64_t mask;
>> +
>> +/* Verify that none of our __u64 fields overflow */
>> +if (map->size != size || map->iova != iova)
>> +return -EINVAL;
>> +
>> +order =  __ffs(vfio_pgsize_bitmap(iommu));
>> +mask = ((uint64_t)1 << order) - 1;
>> +
>> +WARN_ON(mask & PAGE_MASK);
>> +
>> +if (!size || (size | iova) & mask)
>> +return -EINVAL;
>> +
>> +/* Don't allow IOVA address wrap */
>> +if (iova + size - 1 < iova)
>> +return -EINVAL;
>> +
>> +mutex_lock(>lock);
>> +
>> +if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>> +ret =  -EEXIST;
>> +goto unlock;
>> +}
>> +
>> +dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>> +if (!dma) {
>> +ret = -ENOMEM;
>> +goto unlock;
>> +}
>> +
>> +dma->iova = iova;
>> +dma->size = size;
>> +dma->type = VFIO_IOVA_RESERVED_MSI;
>> +
>> +ret = vfio_set_msi_aperture(iommu, iova, size);
>> +if (ret)
>> +goto free_unlock;
>> +
>> +vfio_link_dma(iommu, dma);
>> +goto 

Re: [PATCH v13 04/15] genirq/msi: Introduce the MSI doorbell API

2016-10-07 Thread Auger Eric
Hi Alex,

On 06/10/2016 22:17, Alex Williamson wrote:
> On Thu,  6 Oct 2016 08:45:20 +
> Eric Auger  wrote:
> 
>> We introduce a new msi-doorbell API that allows msi controllers
>> to allocate and register their doorbells. This is useful when
>> those doorbells are likely to be iommu mapped (typically on ARM).
>> The VFIO layer will need to gather information about those doorbells:
>> whether they are safe (ie. they implement irq remapping) and how
>> many IOMMU pages are requested to map all of them.
>>
>> This patch first introduces the dedicated msi_doorbell_info struct
>> and the registration/unregistration functions.
>>
>> A doorbell region is characterized by its physical address base, size,
>> and whether it its safe (ie. it implements IRQ remapping). A doorbell
>> can be per-cpu of global. We currently only care about global doorbells.
>  ^^ s/of/or/
OK
> 
>>
>> A function returns whether all doorbells are safe.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v12 -> v13:
>> - directly select MSI_DOORBELL in ARM_SMMU and ARM_SMMU_V3 configs
>> - remove prot attribute
>> - move msi_doorbell_info struct definition in msi-doorbell.c
>> - change the commit title
>> - change proto of the registration function
>> - msi_doorbell_safe now in this patch
>>
>> v11 -> v12:
>> - rename irqchip_doorbell into msi_doorbell, irqchip_doorbell_list
>>   into msi_doorbell_list and irqchip_doorbell_mutex into
>>   msi_doorbell_mutex
>> - fix style issues: align msi_doorbell struct members, kernel-doc comments
>> - use kzalloc
>> - use container_of in msi_doorbell_unregister_global
>> - compute nb_unsafe_doorbells on registration/unregistration
>> - registration simply returns NULL if allocation failed
>>
>> v10 -> v11:
>> - remove void *chip_data argument from register/unregister function
>> - remove lookup functions since we restored the struct irq_chip
>>   msi_doorbell_info ops to realize this function
>> - reword commit message and title
>>
>> Conflicts:
>>  kernel/irq/Makefile
>>
>> Conflicts:
>>  drivers/iommu/Kconfig
>> ---
>>  drivers/iommu/Kconfig|  2 +
>>  include/linux/msi-doorbell.h | 77 ++
>>  kernel/irq/Kconfig   |  4 ++
>>  kernel/irq/Makefile  |  1 +
>>  kernel/irq/msi-doorbell.c| 98 
>> 
>>  5 files changed, 182 insertions(+)
>>  create mode 100644 include/linux/msi-doorbell.h
>>  create mode 100644 kernel/irq/msi-doorbell.c
>>
>> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
>> index 8ee54d7..0cc7fac 100644
>> --- a/drivers/iommu/Kconfig
>> +++ b/drivers/iommu/Kconfig
>> @@ -297,6 +297,7 @@ config SPAPR_TCE_IOMMU
>>  config ARM_SMMU
>>  bool "ARM Ltd. System MMU (SMMU) Support"
>>  depends on (ARM64 || ARM) && MMU
>> +select MSI_DOORBELL
>>  select IOMMU_API
>>  select IOMMU_IO_PGTABLE_LPAE
>>  select ARM_DMA_USE_IOMMU if ARM
>> @@ -310,6 +311,7 @@ config ARM_SMMU
>>  config ARM_SMMU_V3
>>  bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>>  depends on ARM64
>> +select MSI_DOORBELL
>>  select IOMMU_API
>>  select IOMMU_IO_PGTABLE_LPAE
>>  select GENERIC_MSI_IRQ_DOMAIN
>> diff --git a/include/linux/msi-doorbell.h b/include/linux/msi-doorbell.h
>> new file mode 100644
>> index 000..c18a382
>> --- /dev/null
>> +++ b/include/linux/msi-doorbell.h
>> @@ -0,0 +1,77 @@
>> +/*
>> + * API to register/query MSI doorbells likely to be IOMMU mapped
>> + *
>> + * Copyright (C) 2016 Red Hat, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef _LINUX_MSI_DOORBELL_H
>> +#define _LINUX_MSI_DOORBELL_H
>> +
>> +struct msi_doorbell_info;
>> +
>> +#ifdef CONFIG_MSI_DOORBELL
>> +
>> +/**
>> + * msi_doorbell_register - allocate and register a global doorbell
>> + * @base: physical base address of the global doorbell
>> + * @size: size of the global doorbell
>> + * @prot: protection/memory attributes
>> + * @safe: true if irq_remapping is implemented for this doorbell
>> + * @dbinfo: returned doorbell info
>> + *
>> + * Return: 0 on success, -ENOMEM on allocation failure
>> + */
>> +int msi_doorbell_register_global(phys_addr_t base, size_t size,
>> + bool safe,
>> + struct msi_doorbell_info **dbinfo);
>> +
> 
> Seems like alloc/free 

Re: [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU

2016-09-15 Thread Auger Eric
Hi Robin,

On 14/09/2016 14:53, Robin Murphy wrote:
> On 14/09/16 13:32, Auger Eric wrote:
>> Hi,
>> On 14/09/2016 12:35, Robin Murphy wrote:
>>> On 14/09/16 09:41, Auger Eric wrote:
>>>> Hi,
>>>>
>>>> On 12/09/2016 18:13, Robin Murphy wrote:
>>>>> Hi all,
>>>>>
>>>>> To avoid any more confusing fixups and crazily numbered extra patches, here's
>>>>> a quick v7 with everything rebased into the right order. The significant
>>>>> change this time is to implement iommu_fwspec properly from the start,
>>>>> which ends up being far simpler and more robust than faffing about
>>>>> introducing it somewhere 'less intrusive' to move toward core code later.
>>>>>
>>>>> New branch in the logical place:
>>>>>
>>>>> git://linux-arm.org/linux-rm iommu/generic-v7
>>>>
>>>> For information, as discussed privately with Robin I experience some
>>>> regressions with the former and now deprecated dt description.
>>>>
>>>> on my AMD Overdrive board and my old dt description I now only see a
>>>> single group:
>>>>
>>>> /sys/kernel/iommu_groups/
>>>> /sys/kernel/iommu_groups/0
>>>> /sys/kernel/iommu_groups/0/devices
>>>> /sys/kernel/iommu_groups/0/devices/e070.xgmac
>>>>
>>>> whereas I formerly see
>>>>
>>>> /sys/kernel/iommu_groups/
>>>> /sys/kernel/iommu_groups/3
>>>> /sys/kernel/iommu_groups/3/devices
>>>> /sys/kernel/iommu_groups/3/devices/:00:00.0
>>>> /sys/kernel/iommu_groups/1
>>>> /sys/kernel/iommu_groups/1/devices
>>>> /sys/kernel/iommu_groups/1/devices/e070.xgmac
>>>> /sys/kernel/iommu_groups/4
>>>> /sys/kernel/iommu_groups/4/devices
>>>> /sys/kernel/iommu_groups/4/devices/:00:02.2
>>>> /sys/kernel/iommu_groups/4/devices/:01:00.1
>>>> /sys/kernel/iommu_groups/4/devices/:00:02.0
>>>> /sys/kernel/iommu_groups/4/devices/:01:00.0
>>>> /sys/kernel/iommu_groups/2
>>>> /sys/kernel/iommu_groups/2/devices
>>>> /sys/kernel/iommu_groups/2/devices/e090.xgmac
>>>> /sys/kernel/iommu_groups/0
>>>> /sys/kernel/iommu_groups/0/devices
>>>> /sys/kernel/iommu_groups/0/devices/f000.pcie
>>>>
>>>> This is the group topology without ACS override. Applying the non
>>>> upstreamed "pci: Enable overrides for missing ACS capabilities" I used
>>>> to see separate groups for each PCIe components. Now I don't see any
>>>> difference with and without ACS override.
>>>
>>> OK, having reproduced on my Juno, the problem looks to be that
>>> of_for_each_phandle() leaves err set to -ENOENT after successfully
>>> walking a phandle list, which makes __find_legacy_master_phandle()
>>> always bail out after the first SMMU.
>>>
>>> Can you confirm that the following diff fixes things for you?
>>
>> Well it improves but there are still differences in the group topology.
>> The PFs now are in group 0.
>>
>> root@trusty:~# lspci -nk
>> 00:00.0 0600: 1022:1a00
>> Subsystem: 1022:1a00
>> 00:02.0 0600: 1022:1a01
>> 00:02.2 0604: 1022:1a02
>> Kernel driver in use: pcieport
>> 01:00.0 0200: 8086:1521 (rev 01)
>> Subsystem: 8086:0002
>> Kernel driver in use: igb
>> 01:00.1 0200: 8086:1521 (rev 01)
>> Subsystem: 8086:0002
>> Kernel driver in use: igb
>>
>>
>> with your series + fix:
>> /sys/kernel/iommu_groups/
>> /sys/kernel/iommu_groups/3
>> /sys/kernel/iommu_groups/3/devices
>> /sys/kernel/iommu_groups/3/devices/:00:00.0
>> /sys/kernel/iommu_groups/1
>> /sys/kernel/iommu_groups/1/devices
>> /sys/kernel/iommu_groups/1/devices/e070.xgmac
>> /sys/kernel/iommu_groups/4
>> /sys/kernel/iommu_groups/4/devices
>> /sys/kernel/iommu_groups/4/devices/:00:02.2
>> /sys/kernel/iommu_groups/4/devices/:00:02.0
>> /sys/kernel/iommu_groups/2
>> /sys/kernel/iommu_groups/2/devices
>> /sys/kernel/iommu_groups/2/devices/e090.xgmac
>> /sys/kernel/iommu_groups/0
>> /sys/kernel/iommu_groups/0/devices
>> /sys/kernel/iommu_groups/0/devices/:01:00.1
>> /sys/kernel/iommu_groups/0/devices/f000.pcie
>> /sys/kernel/iommu_groups/0/devices/:01:00.0
>>
>> Before (4.8-rc5):
>>
>> /sys/kernel/iommu_groups/

Re: [PATCH v5 19/19] iommu/dma: Add support for mapping MSIs

2016-08-25 Thread Auger Eric
Hi Robin,

On 23/08/2016 21:05, Robin Murphy wrote:
> When an MSI doorbell is located downstream of an IOMMU, attaching
> devices to a DMA ops domain and switching on translation leads to a rude
> shock when their attempt to write to the physical address returned by
> the irqchip driver faults (or worse, writes into some already-mapped
> buffer) and no interrupt is forthcoming.
> 
> Address this by adding a hook for relevant irqchip drivers to call from
> their compose_msi_msg() callback, to swizzle the physical address with
> an appropriately-mapped IOVA for any device attached to one of our DMA
> ops domains.
> 
> CC: Thomas Gleixner 
> CC: Jason Cooper 
> CC: Marc Zyngier 
> CC: linux-ker...@vger.kernel.org
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/dma-iommu.c| 141 ++-
>  drivers/irqchip/irq-gic-v2m.c|   3 +
>  drivers/irqchip/irq-gic-v3-its.c |   3 +
>  include/linux/dma-iommu.h|   9 +++
>  4 files changed, 141 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 00c8a08d56e7..330cce60cad9 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -25,10 +25,29 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  
> +struct iommu_dma_msi_page {
> + struct list_headlist;
> + dma_addr_t  iova;
> + u32 phys_lo;
> + u32 phys_hi;
> +};
> +
> +struct iommu_dma_cookie {
> + struct iova_domain  iovad;
> + struct list_headmsi_page_list;
> + spinlock_t  msi_lock;
> +};
> +
> +static inline struct iova_domain *cookie_iovad(struct iommu_domain *domain)
> +{
> + return &((struct iommu_dma_cookie *)domain->iova_cookie)->iovad;
> +}
> +
>  int iommu_dma_init(void)
>  {
>   return iova_cache_get();
> @@ -43,15 +62,19 @@ int iommu_dma_init(void)
>   */
>  int iommu_get_dma_cookie(struct iommu_domain *domain)
>  {
> - struct iova_domain *iovad;
> + struct iommu_dma_cookie *cookie;
>  
>   if (domain->iova_cookie)
>   return -EEXIST;
>  
> - iovad = kzalloc(sizeof(*iovad), GFP_KERNEL);
> - domain->iova_cookie = iovad;
> + cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
> + if (!cookie)
> + return -ENOMEM;
>  
> - return iovad ? 0 : -ENOMEM;
> + spin_lock_init(&cookie->msi_lock);
> + INIT_LIST_HEAD(&cookie->msi_page_list);
> + domain->iova_cookie = cookie;
> + return 0;
>  }
>  EXPORT_SYMBOL(iommu_get_dma_cookie);
>  
> @@ -63,14 +86,20 @@ EXPORT_SYMBOL(iommu_get_dma_cookie);
>   */
>  void iommu_put_dma_cookie(struct iommu_domain *domain)
>  {
> - struct iova_domain *iovad = domain->iova_cookie;
> + struct iommu_dma_cookie *cookie = domain->iova_cookie;
> + struct iommu_dma_msi_page *msi, *tmp;
>  
> - if (!iovad)
> + if (!cookie)
>   return;
>  
> - if (iovad->granule)
> - put_iova_domain(iovad);
> - kfree(iovad);
> + if (cookie->iovad.granule)
> + put_iova_domain(&cookie->iovad);
> +
> + list_for_each_entry_safe(msi, tmp, &cookie->msi_page_list, list) {
> + list_del(&msi->list);
> + kfree(msi);
> + }
> + kfree(cookie);
>   domain->iova_cookie = NULL;
>  }
>  EXPORT_SYMBOL(iommu_put_dma_cookie);
> @@ -88,7 +117,7 @@ EXPORT_SYMBOL(iommu_put_dma_cookie);
>   */
>  int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 size)
>  {
> - struct iova_domain *iovad = domain->iova_cookie;
> + struct iova_domain *iovad = cookie_iovad(domain);
>   unsigned long order, base_pfn, end_pfn;
>  
>   if (!iovad)
> @@ -155,7 +184,7 @@ int dma_direction_to_prot(enum dma_data_direction dir, bool coherent)
>  static struct iova *__alloc_iova(struct iommu_domain *domain, size_t size,
>   dma_addr_t dma_limit)
>  {
> - struct iova_domain *iovad = domain->iova_cookie;
> + struct iova_domain *iovad = cookie_iovad(domain);
>   unsigned long shift = iova_shift(iovad);
>   unsigned long length = iova_align(iovad, size) >> shift;
>  
> @@ -171,7 +200,7 @@ static struct iova *__alloc_iova(struct iommu_domain *domain, size_t size,
>  /* The IOVA allocator knows what we mapped, so just unmap whatever that was */
>  static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
>  {
> - struct iova_domain *iovad = domain->iova_cookie;
> + struct iova_domain *iovad = cookie_iovad(domain);
>   unsigned long shift = iova_shift(iovad);
>   unsigned long pfn = dma_addr >> shift;
>   struct iova *iova = find_iova(iovad, pfn);
> @@ -294,7 +323,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
>   void (*flush_page)(struct device *, const void *, phys_addr_t))
>  {
>  

Re: [RFC 05/11] iommu/dma: iommu_dma_(un)map_mixed

2016-10-03 Thread Auger Eric
Hi Robin,

On 30/09/2016 15:24, Robin Murphy wrote:
> Hi Eric,
> 
> On 27/09/16 21:48, Eric Auger wrote:
>> iommu_dma_map_mixed and iommu_dma_unmap_mixed operate on
>> IOMMU_DOMAIN_MIXED typed domains. On top of standard iommu_map/unmap
>> they reserve the IOVA window to prevent the iova allocator to
>> allocate in those areas.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  drivers/iommu/dma-iommu.c | 48 +++
>>  include/linux/dma-iommu.h | 18 ++
>>  2 files changed, 66 insertions(+)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 04bbc85..db21143 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -759,3 +759,51 @@ int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>  return 0;
>>  }
>>  EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
>> +
>> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
>> +phys_addr_t paddr, size_t size, int prot)
>> +{
>> +struct iova_domain *iovad;
>> +unsigned long lo, hi;
>> +int ret;
>> +
>> +if (domain->type != IOMMU_DOMAIN_MIXED)
>> +return -EINVAL;
>> +
>> +if (!domain->iova_cookie)
>> +return -EINVAL;
>> +
>> +iovad = cookie_iovad(domain);
>> +
>> +lo = iova_pfn(iovad, iova);
>> +hi = iova_pfn(iovad, iova + size - 1);
>> +reserve_iova(iovad, lo, hi);
> 
> This can't work reliably - reserve_iova() will (for good reason) merge
> any adjacent or overlapping entries, so any unmap is liable to free more
> IOVA space than actually gets unmapped, and things will get subtly out
> of sync and go wrong later.
OK. I did not notice that.
> 
> The more general issue with this whole approach, though, is that it
> effectively rules out userspace doing guest memory hotplug or similar,
> and I'm not sure we want to paint ourselves into that corner. Basically, as
> soon as a device is attached to a guest, the entirety of the unallocated
> IPA space becomes reserved, and userspace can never add anything further
> to it, because any given address *might* be in use for an MSI mapping.
I fully agree. My bad, I was mistaken about how/when the PCI MMIO space
was iommu mapped. So we have no other solution than having the guest
provide unused and non-reserved GPAs. Back to the original approach then.
> 
> I think it still makes most sense to stick with the original approach of
> cooperating with userspace to reserve a bounded area - it's just that we
> can then let automatic mapping take care of itself within that area.
OK will respin asap.
> 
> Speaking of which, I've realised the same fundamental reservation
> problem already applies to PCI without ACS, regardless of MSIs. I just
> tried on my Juno with guest memory placed at 0x40, (i.e.
> matching the host PA of the 64-bit PCI window), and sure enough when the
> guest kicks off some DMA on the passed-through NIC, the root complex
> interprets the guest IPA as (unsupported) peer-to-peer DMA to a BAR
> claimed by the video card, and it fails. I guess this doesn't get hit in
> practice on x86 because the guest memory map is unlikely to be much
> different from the host's.
> 
> It seems like we basically need a general way of communicating fixed and
> movable host reservations to userspace :/

Yes I saw "iommu/dma: Avoid PCI host bridge windows". Well this looks
like a generalisation of the MSI geometry issue (they also face this one
on x86 with a non-x86 guest). This will also hit the fact that on QEMU
the ARM guest memory map is static.

Thank you for your time

Best Regards

Eric
> 
> Robin.
> 
>> +ret = iommu_map(domain, iova, paddr, size, prot);
>> +if (ret)
>> +free_iova(iovad, lo);
>> +return ret;
>> +}
>> +EXPORT_SYMBOL(iommu_dma_map_mixed);
>> +
>> +size_t iommu_dma_unmap_mixed(struct iommu_domain *domain, unsigned long iova,
>> + size_t size)
>> +{
>> +struct iova_domain *iovad;
>> +unsigned long lo;
>> +size_t ret;
>> +
>> +if (domain->type != IOMMU_DOMAIN_MIXED)
>> +return -EINVAL;
>> +
>> +if (!domain->iova_cookie)
>> +return -EINVAL;
>> +
>> +iovad = cookie_iovad(domain);
>> +lo = iova_pfn(iovad, iova);
>> +
>> +ret = iommu_unmap(domain, iova, size);
>> +if (ret == size)
>> +free_iova(iovad, lo);
>> +return ret;
>> +}
>> +EXPORT_SYMBOL(iommu_dma_unmap_mixed);
>> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
>> index 1c55413..f2aa855 100644
>> --- a/include/linux/dma-iommu.h
>> +++ b/include/linux/dma-iommu.h
>> @@ -70,6 +70,12 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
>>  int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>  dma_addr_t base, u64 size);
>>  
>> +int iommu_dma_map_mixed(struct iommu_domain *domain, unsigned long iova,
>> +phys_addr_t 

Re: Summary of LPC guest MSI discussion in Santa Fe

2016-11-08 Thread Auger Eric
Hi Will,

On 08/11/2016 03:45, Will Deacon wrote:
> Hi all,
> 
> I figured this was a reasonable post to piggy-back on for the LPC minutes
> relating to guest MSIs on arm64.
> 
> On Thu, Nov 03, 2016 at 10:02:05PM -0600, Alex Williamson wrote:
>> We can always have QEMU reject hot-adding the device if the reserved
>> region overlaps existing guest RAM, but I don't even really see how we
>> advise users to give them a reasonable chance of avoiding that
>> possibility.  Apparently there are also ARM platforms where MSI pages
>> cannot be remapped to support the previous programmable user/VM
>> address, is it even worthwhile to support those platforms?  Does that
>> decision influence whether user programmable MSI reserved regions are
>> really a second class citizen to fixed reserved regions?  I expect
>> we'll be talking about this tomorrow morning, but I certainly haven't
>> come up with any viable solutions to this.  Thanks,
> 
> At LPC last week, we discussed guest MSIs on arm64 as part of the PCI
> microconference. I presented some slides to illustrate some of the issues
> we're trying to solve:
> 
>   http://www.willdeacon.ukfsn.org/bitbucket/lpc-16/msi-in-guest-arm64.pdf
> 
> Punit took some notes (thanks!) on the etherpad here:
> 
>   https://etherpad.openstack.org/p/LPC2016_PCI

Thanks to both of you for the minutes and slides. Unfortunately I could
not travel but my ears were burning ;-)
> 
> although the discussion was pretty lively and jumped about, so I've had
> to go from memory where the notes didn't capture everything that was
> said.
> 
> To summarise, arm64 platforms differ in their handling of MSIs when compared
> to x86:
> 
>   1. The physical memory map is not standardised (Jon pointed out that
>  this is something that was realised late on)
>   2. MSIs are usually treated the same as DMA writes, in that they must be
>  mapped by the SMMU page tables so that they target a physical MSI
>  doorbell
>   3. On some platforms, MSIs bypass the SMMU entirely (e.g. due to an MSI
>  doorbell built into the PCI RC)
>   4. Platforms typically have some set of addresses that abort before
>  reaching the SMMU (e.g. because the PCI identifies them as P2P).
> 
> All of this means that userspace (QEMU) needs to identify the memory
> regions corresponding to points (3) and (4) and ensure that they are
> not allocated in the guest physical (IPA) space. For platforms that can
> remap the MSI doorbell as in (2), then some space also needs to be
> allocated for that.
> 
> Rather than treat these as separate problems, a better interface is to
> tell userspace about a set of reserved regions, and have this include
> the MSI doorbell, irrespective of whether or not it can be remapped.
> Don suggested that we statically pick an address for the doorbell in a
> similar way to x86, and have the kernel map it there. We could even pick
> 0xfee00000. If it conflicts with a reserved region on the platform (due
> to (4)), then we'd obviously have to (deterministically?) allocate it
> somewhere else, but probably within the bottom 4G.

This is tentatively achieved now with
[1] [RFC v2 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 - Alt II
(http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264506.html)
> 
> The next question is how to tell userspace about all of the reserved
> regions. Initially, the idea was to extend VFIO, however Alex pointed
> out a horrible scenario:
> 
>   1. QEMU spawns a VM on system 0
>   2. VM is migrated to system 1
>   3. QEMU attempts to passthrough a device using PCI hotplug
> 
> In this scenario, the guest memory map is chosen at step (1), yet there
> is no VFIO fd available to determine the reserved regions. Furthermore,
> the reserved regions may vary between system 0 and system 1. This pretty
> much rules out using VFIO to determine the reserved regions. Alex suggested
> that the SMMU driver can advertise the regions via /sys/class/iommu/. This
> would solve part of the problem, but migration between systems with
> different memory maps can still cause problems if the reserved regions
> of the new system conflict with the guest memory map chosen by QEMU.


OK so I understand we no longer want the VFIO capability chain API
(patch 5 of above series) but prefer a sysfs approach instead.

I understand the sysfs approach, which allows userspace to get the
info earlier and independently of VFIO. Keep in mind that current QEMU
virt - which is not the only userspace - will not do much with this info
until we bring upheavals into virt address space management. So if I am
not wrong, at the moment the main action to be undertaken is the
rejection of the PCI hotplug in case we detect a collision.

I can respin [1]
- studying and taking into account Robin's comments about dm_regions
similarities
- removing the VFIO capability chain and replacing this by a sysfs API

Would that be OK?

What about Alex comments who wanted to report the usable memory ranges
instead of 

Re: Summary of LPC guest MSI discussion in Santa Fe

2016-11-08 Thread Auger Eric
Hi Will,
On 08/11/2016 20:02, Don Dutile wrote:
> On 11/08/2016 12:54 PM, Will Deacon wrote:
>> On Tue, Nov 08, 2016 at 03:27:23PM +0100, Auger Eric wrote:
>>> On 08/11/2016 03:45, Will Deacon wrote:
>>>> Rather than treat these as separate problems, a better interface is to
>>>> tell userspace about a set of reserved regions, and have this include
>>>> the MSI doorbell, irrespective of whether or not it can be remapped.
>>>> Don suggested that we statically pick an address for the doorbell in a
>>>> similar way to x86, and have the kernel map it there. We could even
>>>> pick
>>>> 0xfee00000. If it conflicts with a reserved region on the platform (due
>>>> to (4)), then we'd obviously have to (deterministically?) allocate it
>>>> somewhere else, but probably within the bottom 4G.
>>> This is tentatively achieved now with
>>> [1] [RFC v2 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 - Alt II
>>> (http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1264506.html)
>>>
>> Yup, I saw that fly by. Hopefully some of the internals can be reused
>> with the current thinking on user ABI.
>>
>>>> The next question is how to tell userspace about all of the reserved
>>>> regions. Initially, the idea was to extend VFIO, however Alex pointed
>>>> out a horrible scenario:
>>>>
>>>>1. QEMU spawns a VM on system 0
>>>>2. VM is migrated to system 1
>>>>3. QEMU attempts to passthrough a device using PCI hotplug
>>>>
>>>> In this scenario, the guest memory map is chosen at step (1), yet there
>>>> is no VFIO fd available to determine the reserved regions. Furthermore,
>>>> the reserved regions may vary between system 0 and system 1. This
>>>> pretty
>>>> much rules out using VFIO to determine the reserved regions. Alex
>>>> suggested
>>>> that the SMMU driver can advertise the regions via
>>>> /sys/class/iommu/. This
>>>> would solve part of the problem, but migration between systems with
>>>> different memory maps can still cause problems if the reserved regions
>>>> of the new system conflict with the guest memory map chosen by QEMU.
>>>
>>> OK so I understand we no longer want the VFIO capability chain API
>>> (patch 5 of above series) but prefer a sysfs approach instead.
>> Right.
>>
>>> I understand the sysfs approach, which allows userspace to get the
>>> info earlier and independently of VFIO. Keep in mind that current QEMU
>>> virt - which is not the only userspace - will not do much with this info
>>> until we bring upheavals into virt address space management. So if I am
>>> not wrong, at the moment the main action to be undertaken is the
>>> rejection of the PCI hotplug in case we detect a collision.
>> I don't think so; it should be up to userspace to reject the hotplug.
>> If userspace doesn't have support for the regions, then that's fine --
>> you just end up in a situation where the CPU page table maps memory
>> somewhere that the device can't see. In other words, you'll end up with
>> spurious DMA failures, but that's exactly what happens with current
>> systems
>> if you passthrough an overlapping region (Robin demonstrated this on
>> Juno).
>>
>> Additionally, you can imagine some future support where you can tell the
>> guest not to use certain regions of its memory for DMA. In this case, you
>> wouldn't want to refuse the hotplug in the case of overlapping regions.
>>
>> Really, I think the kernel side just needs to enumerate the fixed
>> reserved
>> regions, place the doorbell at a fixed address and then advertise these
>> via sysfs.
>>
>>> I can respin [1]
>>> - studying and taking into account Robin's comments about dm_regions
>>> similarities
>>> - removing the VFIO capability chain and replacing this by a sysfs API
>> Ideally, this would be reusable between different SMMU drivers so the
>> sysfs
>> entries have the same format etc.
>>
>>> Would that be OK?
>> Sounds good to me. Are you in a position to prototype something on the
>> qemu
>> side once we've got kernel-side agreement?
yes sure.
>>
>>> What about Alex comments who wanted to report the usable memory ranges
>>> instead of unusable memory ranges?
>>>
>>> Also did you have a chance to discuss the following items:
>>> 1) the VFIO irq safety assessment
>> The discussion

Re: Summary of LPC guest MSI discussion in Santa Fe

2016-11-09 Thread Auger Eric
Hi,

On 10/11/2016 00:59, Alex Williamson wrote:
> On Wed, 9 Nov 2016 23:38:50 +
> Will Deacon  wrote:
> 
>> On Wed, Nov 09, 2016 at 04:24:58PM -0700, Alex Williamson wrote:
>>> On Wed, 9 Nov 2016 22:25:22 +
>>> Will Deacon  wrote:
>>>   
 On Wed, Nov 09, 2016 at 03:17:09PM -0700, Alex Williamson wrote:  
> On Wed, 9 Nov 2016 20:31:45 +
> Will Deacon  wrote:
>> On Wed, Nov 09, 2016 at 08:23:03PM +0100, Christoffer Dall wrote:
>>>
>>> (I suppose it's technically possible to get around this issue by letting
>>> QEMU place RAM wherever it wants but tell the guest to never use a
>>> particular subset of its RAM for DMA, because that would conflict with
>>> the doorbell IOVA or be seen as p2p transactions.  But I think we all
>>> probably agree that it's a disgusting idea.)  
>>
>> Disgusting, yes, but Ben's idea of hotplugging on the host controller 
>> with
>> firmware tables describing the reserved regions is something that we 
>> could
>> do in the distant future. In the meantime, I don't think that VFIO should
>> explicitly reject overlapping mappings if userspace asks for them.
>
> I'm confused by the last sentence here, rejecting user mappings that
> overlap reserved ranges, such as MSI doorbell pages, is exactly how
> we'd reject hot-adding a device when we meet such a conflict.  If we
> don't reject such a mapping, we're knowingly creating a situation that
> potentially leads to data loss.  Minimally, QEMU would need to know
> about the reserved region, map around it through VFIO, and take
> responsibility (somehow) for making sure that region is never used for
> DMA.  Thanks,

 Yes, but my point is that it should be up to QEMU to abort the hotplug, not
 the host kernel, since there may be ways in which a guest can tolerate the
 overlapping region (e.g. by avoiding that range of memory for DMA).  
>>>
>>> The VFIO_IOMMU_MAP_DMA ioctl is a contract, the user ask to map a range
>>> of IOVAs to a range of virtual addresses for a given device.  If VFIO
>>> cannot reasonably fulfill that contract, it must fail.  It's up to QEMU
>>> how to manage the hotplug and what memory regions it asks VFIO to map
>>> for a device, but VFIO must reject mappings that it (or the SMMU by
>>> virtue of using the IOMMU API) know to overlap reserved ranges.  So I
>>> still disagree with the referenced statement.  Thanks,  
>>
>> I think that's a pity. Not only does it mean that both QEMU and the kernel
>> have more work to do (the former has to carve up its mapping requests,
>> whilst the latter has to check that it is indeed doing this), but it also
>> precludes the use of hugepage mappings on the IOMMU because of reserved
>> regions. For example, a 4k hole someplace may mean we can't put down 1GB
>> table entries for the guest memory in the SMMU.
>>
>> All this seems to do is add complexity and decrease performance. For what?
>> QEMU has to go read the reserved regions from someplace anyway. It's also
>> the way that VFIO works *today* on arm64 wrt reserved regions, it just has
>> no way to identify those holes at present.
> 
> Sure, that sucks, but how is the alternative even an option?  The user
> asked to map something, we can't, if we allow that to happen now it's a
> bug.  Put the MSI doorbells somewhere that this won't be an issue.  If
> the platform has it fixed somewhere that this is an issue, don't use
> that platform.  The correctness of the interface is more important than
> catering to a poorly designed system layout IMO.  Thanks,

Besides above problematic, I started to prototype the sysfs API. A first
issue I face is the reserved regions become global to the iommu instead
of characterizing the iommu_domain, ie. the "reserved_regions" attribute
file sits below an iommu instance (~
/sys/class/iommu/dmar0/intel-iommu/reserved_regions ||
/sys/class/iommu/arm-smmu0/arm-smmu/reserved_regions).

The MSI reserved window can be considered global to the IOMMU. However,
PCIe host bridge P2P regions are rather per iommu-domain.

Do you confirm the attribute file should contain both global reserved
regions and all per iommu_domain reserved regions?

Thoughts?

Thanks

Eric
> 
> Alex
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC v2 4/8] iommu: Add a list of iommu_reserved_region in iommu_domain

2016-11-10 Thread Auger Eric
Hi Robin,

On 10/11/2016 12:54, Robin Murphy wrote:
> Hi Eric,
> 
> On 10/11/16 11:22, Auger Eric wrote:
>> Hi Robin,
>>
>> On 04/11/2016 15:00, Robin Murphy wrote:
>>> Hi Eric,
>>>
>>> Thanks for posting this new series - the bottom-up approach is a lot
>>> easier to reason about :)
>>>
>>> On 04/11/16 11:24, Eric Auger wrote:
>>>> Introduce a new iommu_reserved_region struct. This embodies
>>>> an IOVA reserved region that cannot be used along with the IOMMU
>>>> API. The list is protected by a dedicated mutex.
>>>
>>> In the light of these patches, I think I'm settling into agreement that
>>> the iommu_domain is the sweet spot for accessing this information - the
>>> underlying magic address ranges might be properties of various bits of
>>> hardware many of which aren't the IOMMU itself, but they only start to
>>> matter at the point you start wanting to use an IOMMU domain at the
>>> higher level. Therefore, having a callback in the domain ops to pull
>>> everything together fits rather neatly.
>> Using get_dm_regions could have made sense, but this approach is now
>> ruled out by the sysfs API approach. If the attribute file is bound to be used
>> before iommu domains are created, we cannot rely on any iommu_domain
>> based callback. Back to square 1?
> 
> I think it's still OK. The thing about these reserved regions is that as
> a property of the underlying hardware they must be common to any domain
> for a given group, therefore without loss of generality we can simply
> query group->domain->ops->get_dm_regions(), and can expect the reserved
> ones will be the same regardless of what domain that points to
> (identity-mapped IVMD/RMRR/etc.
Are they really? P2P reserved regions depend on iommu_domain right?

Now I did not consider default_domain usability, I acknowledge. I will
send a POC anyway.

> regions may not be, but we'd be
> filtering those out anyway). The default DMA domains need this
> information too, and since those are allocated at group creation,
> group->domain should always be non-NULL and interrogable.
> 
> Plus, the groups are already there in sysfs, and, being representative
> of device topology, would seem to be an ideal place to expose the
> addressing limitations relevant to the devices within them. This really
> feels like it's all falling into place (on the kernel end, at least, I'm
> sticking to the sidelines on the userspace discussion ;)).

Thanks

Eric
> 
> Robin.
> 
>>
>> Thanks
>>
>> Eric
>>>
>>>>
>>>> An iommu domain now owns a list of those.
>>>>
>>>> Signed-off-by: Eric Auger <eric.au...@redhat.com>
>>>>
>>>> ---
>>>> ---
>>>>  drivers/iommu/iommu.c |  2 ++
>>>>  include/linux/iommu.h | 17 +
>>>>  2 files changed, 19 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>>> index 9a2f196..0af07492 100644
>>>> --- a/drivers/iommu/iommu.c
>>>> +++ b/drivers/iommu/iommu.c
>>>> @@ -1061,6 +1061,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
>>>>  
>>>>domain->ops  = bus->iommu_ops;
>>>>domain->type = type;
>>>> +  INIT_LIST_HEAD(&domain->reserved_regions);
>>>> +  mutex_init(&domain->resv_mutex);
>>>>/* Assume all sizes by default; the driver may override this later */
>>>>domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
>>>>  
>>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>>>> index 436dc21..0f2eb64 100644
>>>> --- a/include/linux/iommu.h
>>>> +++ b/include/linux/iommu.h
>>>> @@ -84,6 +84,8 @@ struct iommu_domain {
>>>>void *handler_token;
>>>>struct iommu_domain_geometry geometry;
>>>>void *iova_cookie;
>>>> +  struct list_head reserved_regions;
>>>> +  struct mutex resv_mutex; /* protects the reserved region list */
>>>>  };
>>>>  
>>>>  enum iommu_cap {
>>>> @@ -131,6 +133,21 @@ struct iommu_dm_region {
>>>>int prot;
>>>>  };
>>>>  
>>>> +/**
>>>> + * struct iommu_reserved_region - descriptor for a reserved iova region
>>>> + * @list: Linked list pointers
>>>> + * @start: IOVA base address of the region
>>>> + * @length: Length of the region in 

Re: [RFC v2 4/8] iommu: Add a list of iommu_reserved_region in iommu_domain

2016-11-10 Thread Auger Eric
Hi Robin,

On 04/11/2016 15:00, Robin Murphy wrote:
> Hi Eric,
> 
> Thanks for posting this new series - the bottom-up approach is a lot
> easier to reason about :)
> 
> On 04/11/16 11:24, Eric Auger wrote:
>> Introduce a new iommu_reserved_region struct. This embodies
>> an IOVA reserved region that cannot be used along with the IOMMU
>> API. The list is protected by a dedicated mutex.
> 
> In the light of these patches, I think I'm settling into agreement that
> the iommu_domain is the sweet spot for accessing this information - the
> underlying magic address ranges might be properties of various bits of
> hardware many of which aren't the IOMMU itself, but they only start to
> matter at the point you start wanting to use an IOMMU domain at the
> higher level. Therefore, having a callback in the domain ops to pull
> everything together fits rather neatly.
Using get_dm_regions could have made sense, but this approach is now
ruled out by the sysfs API approach. If the attribute file is bound to be used
before iommu domains are created, we cannot rely on any iommu_domain
based callback. Back to square 1?

Thanks

Eric
> 
>>
>> An iommu domain now owns a list of those.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> ---
>>  drivers/iommu/iommu.c |  2 ++
>>  include/linux/iommu.h | 17 +
>>  2 files changed, 19 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 9a2f196..0af07492 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1061,6 +1061,8 @@ static struct iommu_domain 
>> *__iommu_domain_alloc(struct bus_type *bus,
>>  
>>  domain->ops  = bus->iommu_ops;
>>  domain->type = type;
>> +INIT_LIST_HEAD(&domain->reserved_regions);
>> +mutex_init(&domain->resv_mutex);
>>  /* Assume all sizes by default; the driver may override this later */
>>  domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
>>  
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 436dc21..0f2eb64 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -84,6 +84,8 @@ struct iommu_domain {
>>  void *handler_token;
>>  struct iommu_domain_geometry geometry;
>>  void *iova_cookie;
>> +struct list_head reserved_regions;
>> +struct mutex resv_mutex; /* protects the reserved region list */
>>  };
>>  
>>  enum iommu_cap {
>> @@ -131,6 +133,21 @@ struct iommu_dm_region {
>>  int prot;
>>  };
>>  
>> +/**
>> + * struct iommu_reserved_region - descriptor for a reserved iova region
>> + * @list: Linked list pointers
>> + * @start: IOVA base address of the region
>> + * @length: Length of the region in bytes
>> + */
>> +struct iommu_reserved_region {
>> +struct list_headlist;
>> +dma_addr_t  start;
>> +size_t  length;
>> +};
> 
> Looking at this in context with the dm_region above, though, I come to
> the surprising realisation that these *are* dm_regions, even at the
> fundamental level - on the one hand you've got physical addresses which
> can't be remapped (because something is already using them), while on
> the other you've got physical addresses which can't be remapped (because
> the IOMMU is incapable). In fact for reserved regions *other* than our
> faked-up MSI region there's no harm if the IOMMU were to actually
> identity-map them.
> 
> Let's just add this to the existing infrastructure, either with some
> kind of IOMMU_NOMAP flag or simply prot = 0. That way it automatically
> gets shared between the VFIO and DMA cases for free!
> 
> Robin.
> 
>> +
>> +#define iommu_reserved_region_for_each(resv, d) \
>> +list_for_each_entry(resv, &(d)->reserved_regions, list)
>> +
>>  #ifdef CONFIG_IOMMU_API
>>  
>>  /**
>>
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Summary of LPC guest MSI discussion in Santa Fe

2016-11-10 Thread Auger Eric
Hi Will, Alex,

On 10/11/2016 03:01, Will Deacon wrote:
> On Wed, Nov 09, 2016 at 05:55:17PM -0700, Alex Williamson wrote:
>> On Thu, 10 Nov 2016 01:14:42 +0100
>> Auger Eric <eric.au...@redhat.com> wrote:
>>> On 10/11/2016 00:59, Alex Williamson wrote:
>>>> On Wed, 9 Nov 2016 23:38:50 +
>>>> Will Deacon <will.dea...@arm.com> wrote:
>>>>> On Wed, Nov 09, 2016 at 04:24:58PM -0700, Alex Williamson wrote:  
>>>>>> The VFIO_IOMMU_MAP_DMA ioctl is a contract, the user ask to map a range
>>>>>> of IOVAs to a range of virtual addresses for a given device.  If VFIO
>>>>>> cannot reasonably fulfill that contract, it must fail.  It's up to QEMU
>>>>>> how to manage the hotplug and what memory regions it asks VFIO to map
>>>>>> for a device, but VFIO must reject mappings that it (or the SMMU by
>>>>>> virtue of using the IOMMU API) know to overlap reserved ranges.  So I
>>>>>> still disagree with the referenced statement.  Thanks,
>>>>>
>>>>> I think that's a pity. Not only does it mean that both QEMU and the kernel
>>>>> have more work to do (the former has to carve up its mapping requests,
>>>>> whilst the latter has to check that it is indeed doing this), but it also
>>>>> precludes the use of hugepage mappings on the IOMMU because of reserved
>>>>> regions. For example, a 4k hole someplace may mean we can't put down 1GB
>>>>> table entries for the guest memory in the SMMU.
>>>>>
>>>>> All this seems to do is add complexity and decrease performance. For what?
>>>>> QEMU has to go read the reserved regions from someplace anyway. It's also
>>>>> the way that VFIO works *today* on arm64 wrt reserved regions, it just has
>>>>> no way to identify those holes at present.  
>>>>
>>>> Sure, that sucks, but how is the alternative even an option?  The user
>>>> asked to map something, we can't, if we allow that to happen now it's a
>>>> bug.  Put the MSI doorbells somewhere that this won't be an issue.  If
>>>> the platform has it fixed somewhere that this is an issue, don't use
>>>> that platform.  The correctness of the interface is more important than
>>>> catering to a poorly designed system layout IMO.  Thanks,  
>>>
>>> Besides the above problem, I started to prototype the sysfs API. A first
>>> issue I face is that the reserved regions become global to the iommu instead
>>> of characterizing the iommu_domain, i.e. the "reserved_regions" attribute
>>> file sits below an iommu instance (~
>>> /sys/class/iommu/dmar0/intel-iommu/reserved_regions ||
>>> /sys/class/iommu/arm-smmu0/arm-smmu/reserved_regions).
>>>
>>> The MSI reserved window can be considered global to the IOMMU. However,
>>> PCIe host bridge P2P regions are rather per iommu-domain.
> 
> I don't think we can treat them as per-domain, given that we want to
> enumerate this stuff before we've decided to do a hotplug (and therefore
> don't have a domain).
That's the issue indeed. We need to wait for the PCIe device to be
connected to the iommu. Only on the VFIO SET_IOMMU do we get the
comprehensive list of P2P regions that can impact IOVA mapping for this
iommu. This removes any advantage of the sysfs API over the previous VFIO
capability chain API for early-stage P2P IOVA range enumeration.

> 
>>>
>>> Do you confirm the attribute file should contain both global reserved
>>> regions and all per iommu_domain reserved regions?
>>>
>>> Thoughts?
>>
>> I don't think we have any business describing IOVA addresses consumed
>> by peer devices in an IOMMU sysfs file.  If it's a separate device it
>> should be exposed by examining the rest of the topology.  Regions
>> consumed by PCI endpoints and interconnects are already exposed in
>> sysfs.  In fact, is this perhaps a more accurate model for these MSI
>> controllers too?  Perhaps they should be exposed in the bus topology
>> somewhere as consuming the IOVA range.
Currently on x86 the P2P regions are not checked when allowing
passthrough. Aren't we being more papist than the pope? As Don mentioned,
shouldn't we simply consider that a platform that does not support
proper ACS is not a candidate for safe passthrough, like Juno?

At least we can state the feature also is missing on x86, and it would be
nice to report the risk to userspace and urge them to opt in.

To me, taking those P2P regions into account is still controversial and
induces the bulk of the complexity. Consider

Re: [RFC v2 2/8] iommu/iova: fix __alloc_and_insert_iova_range

2016-11-10 Thread Auger Eric
Hi Joerg,

On 10/11/2016 16:22, Joerg Roedel wrote:
> On Fri, Nov 04, 2016 at 11:24:00AM +, Eric Auger wrote:
>> Fix the size check within start_pfn and limit_pfn.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> the issue was observed when playing with a 1-page iova domain with the
>> higher iova reserved.
>> ---
>>  drivers/iommu/iova.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index e23001b..ee29dbf 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -147,7 +147,7 @@ static int __alloc_and_insert_iova_range(struct 
>> iova_domain *iovad,
>>  if (!curr) {
>>  if (size_aligned)
>>  pad_size = iova_get_pad_size(size, limit_pfn);
>> -if ((iovad->start_pfn + size + pad_size) > limit_pfn) {
>> +if ((iovad->start_pfn + size + pad_size - 1) > limit_pfn) {
> 
> A >= check is more readable here.

Sure

Thanks

Eric
> 
> 
>   Joerg
> 
> 


Re: [RFC v2 4/8] iommu: Add a list of iommu_reserved_region in iommu_domain

2016-11-10 Thread Auger Eric
Hi Joerg, Robin,

On 10/11/2016 16:37, Joerg Roedel wrote:
> On Fri, Nov 04, 2016 at 11:24:02AM +, Eric Auger wrote:
>> Introduce a new iommu_reserved_region struct. This embodies
>> an IOVA reserved region that cannot be used along with the IOMMU
>> API. The list is protected by a dedicated mutex.
>>
>> An iommu domain now owns a list of those.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> ---
>>  drivers/iommu/iommu.c |  2 ++
>>  include/linux/iommu.h | 17 +
>>  2 files changed, 19 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 9a2f196..0af07492 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1061,6 +1061,8 @@ static struct iommu_domain 
>> *__iommu_domain_alloc(struct bus_type *bus,
>>  
>>  domain->ops  = bus->iommu_ops;
>>  domain->type = type;
>> +INIT_LIST_HEAD(&domain->reserved_regions);
>> +mutex_init(&domain->resv_mutex);
> 
> These regions are a property of the iommu-group, they are specific to a
> device or a group of devices, not to a particular domain where devics
> (iommu-groups) can come and go.
> 
> Further I agree with Robin that this is similar to the
> get_dm_regions/set_dm_regions approach, which should be changed/extended
> for this instead of adding something new.
OK, I am currently respinning, taking this into account. I will put the
reserved-region attribute file in the iommu-group sysfs dir.

Thanks

Eric
> 
> 
>   Joerg
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


Re: [RFC v2 8/8] iommu/arm-smmu: implement add_reserved_regions callback

2016-11-10 Thread Auger Eric
Hi Joerg,

On 10/11/2016 16:46, Joerg Roedel wrote:
> On Fri, Nov 04, 2016 at 11:24:06AM +, Eric Auger wrote:
>> The function populates the list of reserved regions with the
>> PCI host bridge windows and the MSI IOVA range.
>>
>> At the moment an arbitrary MSI IOVA window is set at 0x8000000
>> of size 1MB.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> RFC v1 -> v2: use defines for MSI IOVA base and length
>> ---
>>  drivers/iommu/arm-smmu.c | 66 
>> 
>>  1 file changed, 66 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index c841eb7..c07ea41 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -278,6 +278,9 @@ enum arm_smmu_s2cr_privcfg {
>>  
>>  #define FSYNR0_WNR  (1 << 4)
>>  
>> +#define MSI_IOVA_BASE   0x8000000
>> +#define MSI_IOVA_LENGTH 0x100000
>> +
>>  static int force_stage;
>>  module_param(force_stage, int, S_IRUGO);
>>  MODULE_PARM_DESC(force_stage,
>> @@ -1533,6 +1536,68 @@ static int arm_smmu_of_xlate(struct device *dev, 
>> struct of_phandle_args *args)
>>  return iommu_fwspec_add_ids(dev, &fwid, 1);
>>  }
>>  
>> +static int add_pci_window_reserved_regions(struct iommu_domain *domain,
>> +   struct pci_dev *dev)
>> +{
>> +struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
>> +struct iommu_reserved_region *region;
>> +struct resource_entry *window;
>> +phys_addr_t start;
>> +size_t length;
>> +
>> +resource_list_for_each_entry(window, >windows) {
>> +if (resource_type(window->res) != IORESOURCE_MEM &&
>> +resource_type(window->res) != IORESOURCE_IO)
>> +continue;
> 
> Why do you care about IO resources?
Effectively that's a draft implementation inspired by "iommu/dma:
Avoid PCI host bridge windows". Also, not all PCI host bridge windows
induce issues; my understanding is that only those not supporting ACS are
a problem.
> 
>> +
>> +start = window->res->start - window->offset;
>> +length = window->res->end - window->res->start + 1;
>> +
>> +iommu_reserved_region_for_each(region, domain) {
>> +if (region->start == start && region->length == length)
>> +continue;
>> +}
>> +region = kzalloc(sizeof(*region), GFP_KERNEL);
>> +if (!region)
>> +return -ENOMEM;
>> +
>> +region->start = start;
>> +region->length = length;
>> +
>> +list_add_tail(&region->list, &domain->reserved_regions);
>> +}
>> +return 0;
>> +}
>> +
>> +static int arm_smmu_add_reserved_regions(struct iommu_domain *domain,
>> + struct device *device)
>> +{
>> +struct iommu_reserved_region *region;
>> +int ret = 0;
>> +
>> +/* An arbitrary 1MB region starting at 0x8000000 is reserved for MSIs */
>> +if (!domain->iova_cookie) {
>> +
>> +region = kzalloc(sizeof(*region), GFP_KERNEL);
>> +if (!region)
>> +return -ENOMEM;
>> +
>> +region->start = MSI_IOVA_BASE;
>> +region->length = MSI_IOVA_LENGTH;
>> +list_add_tail(&region->list, &domain->reserved_regions);
>> +
>> +ret = iommu_get_dma_msi_region_cookie(domain,
>> +region->start, region->length);
>> +if (ret)
>> +return ret;
> 
> Gah, I hate this. Is there no other and simpler way to get the MSI
> region than allocating an iova-domain? In that regard, I also _hate_ the
> patch before introducing this weird iommu_get_dma_msi_region_cookie()
> function.
> 
> Allocation an iova-domain is pretty expensive, as it also includes
> per-cpu data-structures and all, so please don't do this just for the
> purpose of compiling a list of reserved regions.
It does not only serve the purpose of registering the MSI IOVA region. We
also need to allocate an iova_domain from which MSI IOVAs will be
allocated upon request from the relevant MSI controllers. Do you mean you
don't want us to use the iova allocator for this purpose?

Thanks

Eric
> 
> 
> 
>   Joerg
> 


Re: [RFC v2 8/8] iommu/arm-smmu: implement add_reserved_regions callback

2016-11-11 Thread Auger Eric
Hi Joerg,

On 11/11/2016 12:42, Joerg Roedel wrote:
> On Thu, Nov 10, 2016 at 07:00:52PM +0100, Auger Eric wrote:
>> GICv2m and GICV3 ITS use dma-mapping iommu_dma_map_msi_msg to allocate
>> an MSI IOVA on-demand.
> 
> Yes, and it the right thing to do there because as a DMA-API
> implementation the dma-iommu code cares about the address space
> allocation.
> 
> As I understand it this is different in your case, as someone else is
> defining the address space layout. So why do you need to allocate it
> yourself?
Effectively in the passthrough use case, userspace defines the address
space layout and maps guest RAM (IOVA = guest PA) to host PAs (using
VFIO_IOMMU_MAP_DMA). But this address space does not comprise the MSI
IOVAs. Userspace does not care about the MSI IOMMU mapping. So the MSI IOVA
region must be allocated by either the VFIO driver or the IOMMU driver, I
think. Who else could initialize the IOVA allocator domain?

It's true that we have a mix of unmanaged and "managed" addresses, which
is not neat. But how else can we manage it?

Thanks

Eric
> 
> 
>   Joerg
> 


Re: [RFC v2 8/8] iommu/arm-smmu: implement add_reserved_regions callback

2016-11-14 Thread Auger Eric
Hi Joerg,

On 14/11/2016 16:31, Joerg Roedel wrote:
> Hi Eric,
> 
> On Fri, Nov 11, 2016 at 05:45:19PM +0100, Auger Eric wrote:
>> On 11/11/2016 17:22, Joerg Roedel wrote:
>>> So I think we need a way to tell userspace about the reserved regions
>>> (per iommu-group) so that userspace knows where it can not map anything,
> 
>> Current plan is to expose that info through an iommu-group sysfs
>> attribute, as you and Robin advised.
> 
> Great.
> 
>>> and VFIO can enforce that. But the right struct here is not an
>>> iova-allocator rb-tree, a ordered linked list should be sufficient.
>> I plan to use a linked list to store the reserved regions (P2P regions, MSI
>> region, ...). get_dm_regions() is called with a list local to a function
>> for that. Might it be needed to move that list head into the iommu_group to
>> avoid calling get_dm_regions() again in the attribute show function?
> 
> You can re-use the get_dm_regions() call-back available in the iommu-ops
> already. Just rename it and add a flag to it which tells the iommu-core
> whether that region needs to be mapped or not.
> 
>> But to allocate the IOVAs within the MSI reserved region, I understand
>> you don't want us to use the iova.c allocator, is that correct? We need
>> an allocator though, even a very basic one based on a bitmap or whatever.
>> There are potentially several different physical MSI frame pages to map.
> 
> I don't get this, what do you need and address-allocator for?

There are potentially several MSI doorbell physical pages in the SoC
that are accessed through the IOMMU (translated). Each of those must
have a corresponding IOVA and an IOVA/PA mapping programmed in the IOMMU.
Otherwise MSIs will fault.

- Step 1 was to define a usable IOVA range for MSI mapping. So now we
have decided the base address and size would be hardcoded for ARM. The
get_dm_region callback can be used to retrieve that hardcoded region.
- Step 2 is to allocate IOVAs within that range and map them for each of
those MSI doorbells. This is done in the MSI controller's compose() callback.

I hope I succeeded in clarifying this time.

Robin sent a new version of his cookie patch today, using a dummy
allocator. I am currently integrating it.

Thanks

Eric
> 
> 
> 
>   Joerg
> 
> 


Re: [RFC v2 8/8] iommu/arm-smmu: implement add_reserved_regions callback

2016-11-14 Thread Auger Eric
Hi Joerg,

On 14/11/2016 17:20, Joerg Roedel wrote:
> On Mon, Nov 14, 2016 at 05:08:16PM +0100, Auger Eric wrote:
>> There are potentially several MSI doorbell physical pages in the SoC
>> that are accessed through the IOMMU (translated). Each of those must
>> have a corresponding IOVA and an IOVA/PA mapping programmed in the IOMMU.
>> Otherwise MSIs will fault.
>>
>> - Step 1 was to define a usable IOVA range for MSI mapping. So now we
>> have decided the base address and size would be hardcoded for ARM. The
>> get_dm_region callback can be used to retrieve that hardcoded region.
>> - Step 2 is to allocate IOVAs within that range and map them for each of
>> those MSI doorbells. This is done in the MSI controller's compose() callback.
>>
>> I hope I succeeded in clarifying this time.
>>
>> Robin sent a new version of his cookie patch today, using a dummy
>> allocator. I am currently integrating it.
> 
> Okay, I understand. A simple bitmap-allocator would probably be
> sufficient, and doesn't have the overhead the iova allocator has. About
> how many pages are we talking here?
Very few, actually. In the systems I have access to I only have a single
page. In more advanced systems we could imagine per-CPU doorbells, but
this does not exist yet as far as I know.

Thanks

Eric
> 
> 
>   Joerg
> 


Re: [RFC v3 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions

2016-11-18 Thread Auger Eric
Hi Bharat,

On 18/11/2016 06:34, Bharat Bhushan wrote:
> Hi Eric,
> 
> Have you sent out QEMU side patches based on this new approach? In case I
> missed them, please point me to the patches.
Upstream QEMU works fine for PCIe/MSI passthrough on ARM since the mach-virt
address space does not collide with the fixed MSI region.

Thanks

Eric
> 
> Thanks
> -Bharat
> 
>> -Original Message-
>> From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
>> boun...@lists.linux-foundation.org] On Behalf Of Eric Auger
>> Sent: Tuesday, November 15, 2016 6:39 PM
>> To: eric.au...@redhat.com; eric.auger@gmail.com;
>> christoffer.d...@linaro.org; marc.zyng...@arm.com;
>> robin.mur...@arm.com; alex.william...@redhat.com;
>> will.dea...@arm.com; j...@8bytes.org; t...@linutronix.de;
>> ja...@lakedaemon.net; linux-arm-ker...@lists.infradead.org
>> Cc: drjo...@redhat.com; k...@vger.kernel.org; punit.agra...@arm.com;
>> linux-ker...@vger.kernel.org; iommu@lists.linux-foundation.org;
>> pranav.sawargaon...@gmail.com
>> Subject: [RFC v3 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 and
>> IOVA reserved regions
>>
>> Following LPC discussions, we now report reserved regions through iommu-
>> group sysfs reserved_regions attribute file.
>>
>> Reserved regions are populated through the IOMMU get_resv_region
>> callback (former get_dm_regions), now implemented by amd-iommu, intel-
>> iommu and arm-smmu.
>>
>> The intel-iommu reports the [FEE0_0000h - FEF0_0000h] MSI window as an
>> IOMMU_RESV_NOMAP reserved region.
>>
>> arm-smmu reports the MSI window (arbitrarily located at 0x8000000 and 1MB
>> large) and the PCI host bridge windows.
>>
>> The series integrates a not officially posted patch from Robin:
>> "iommu/dma: Allow MSI-only cookies".
>>
>> This series currently does not address IRQ safety assessment.
>>
>> Best Regards
>>
>> Eric
>>
>> Git: complete series available at
>> https://github.com/eauger/linux/tree/v4.9-rc5-reserved-rfc-v3
>>
>> History:
>> RFC v2 -> v3:
>> - switch to an iommu-group sysfs API
>> - use new dummy allocator provided by Robin
>> - dummy allocator initialized by vfio-iommu-type1 after enumerating
>>   the reserved regions
>> - at the moment ARM MSI base address/size is left unchanged compared
>>   to v2
>> - we currently report reserved regions and not usable IOVA regions as
>>   requested by Alex
>>
>> RFC v1 -> v2:
>> - fix intel_add_reserved_regions
>> - add mutex lock/unlock in vfio_iommu_type1
>>
>>
>> Eric Auger (10):
>>   iommu/dma: Allow MSI-only cookies
>>   iommu: Rename iommu_dm_regions into iommu_resv_regions
>>   iommu: Add new reserved IOMMU attributes
>>   iommu: iommu_alloc_resv_region
>>   iommu: Do not map reserved regions
>>   iommu: iommu_get_group_resv_regions
>>   iommu: Implement reserved_regions iommu-group sysfs file
>>   iommu/vt-d: Implement reserved region get/put callbacks
>>   iommu/arm-smmu: Implement reserved region get/put callbacks
>>   vfio/type1: Get MSI cookie
>>
>>  drivers/iommu/amd_iommu.c   |  20 +++---
>>  drivers/iommu/arm-smmu.c|  52 +++
>>  drivers/iommu/dma-iommu.c   | 116 ++
>> ---
>>  drivers/iommu/intel-iommu.c |  50 ++
>>  drivers/iommu/iommu.c   | 141
>> 
>>  drivers/vfio/vfio_iommu_type1.c |  26 
>>  include/linux/dma-iommu.h   |   7 ++
>>  include/linux/iommu.h   |  49 ++
>>  8 files changed, 391 insertions(+), 70 deletions(-)
>>
>> --
>> 1.9.1
>>


Re: [RFC v2 3/8] iommu/dma: Allow MSI-only cookies

2016-11-14 Thread Auger Eric
Hi Robin,

On 14/11/2016 13:36, Robin Murphy wrote:
> On 04/11/16 11:24, Eric Auger wrote:
>> From: Robin Murphy 
>>
>> IOMMU domain users such as VFIO face a similar problem to DMA API ops
>> with regard to mapping MSI messages in systems where the MSI write is
>> subject to IOMMU translation. With the relevant infrastructure now in
>> place for managed DMA domains, it's actually really simple for other
>> users to piggyback off that and reap the benefits without giving up
>> their own IOVA management, and without having to reinvent their own
>> wheel in the MSI layer.
>>
>> Allow such users to opt into automatic MSI remapping by dedicating a
>> region of their IOVA space to a managed cookie.
>>
>> Signed-off-by: Robin Murphy 
>> Signed-off-by: Eric Auger 
> 
> OK, following the discussion elsewhere I've had a go at the less stupid,
> but more involved, version. Thoughts?

Conceptually I don't have any major objection to the minimalist
allocation scheme, all the more so as it follows Joerg's guidance. Maybe
the only thing is that we do not check that we don't overshoot the
reserved MSI region.

Besides, there are 2 issues reported below.

> 
> Robin.
> 
> ->8-
> From: Robin Murphy 
> Subject: [RFC PATCH] iommu/dma: Allow MSI-only cookies
> 
> IOMMU domain users such as VFIO face a similar problem to DMA API ops
> with regard to mapping MSI messages in systems where the MSI write is
> subject to IOMMU translation. With the relevant infrastructure now in
> place for managed DMA domains, it's actually really simple for other
> users to piggyback off that and reap the benefits without giving up
> their own IOVA management, and without having to reinvent their own
> wheel in the MSI layer.
> 
> Allow such users to opt into automatic MSI remapping by dedicating a
> region of their IOVA space to a managed cookie, and extend the mapping
> routine to implement a trivial linear allocator in such cases, to avoid
> the needless overhead of a full-blown IOVA domain.
> 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/dma-iommu.c | 118 
> --
>  include/linux/dma-iommu.h |   6 +++
>  2 files changed, 100 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index c5ab8667e6f2..33d66a8273c6 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -37,10 +37,19 @@ struct iommu_dma_msi_page {
>   phys_addr_t phys;
>  };
>  
> +enum iommu_dma_cookie_type {
> + IOMMU_DMA_IOVA_COOKIE,
> + IOMMU_DMA_MSI_COOKIE,
> +};
> +
>  struct iommu_dma_cookie {
> - struct iova_domain  iovad;
> - struct list_headmsi_page_list;
> - spinlock_t  msi_lock;
> + union {
> + struct iova_domain  iovad;
> + dma_addr_t  msi_iova;
> + };
> + struct list_headmsi_page_list;
> + spinlock_t  msi_lock;
> + enum iommu_dma_cookie_type  type;
>  };
>  
>  static inline struct iova_domain *cookie_iovad(struct iommu_domain *domain)
> @@ -53,6 +62,19 @@ int iommu_dma_init(void)
>   return iova_cache_get();
>  }
>  
> +static struct iommu_dma_cookie *__cookie_alloc(enum iommu_dma_cookie_type 
> type)
> +{
> + struct iommu_dma_cookie *cookie;
> +
> + cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
> + if (cookie) {
> + spin_lock_init(&cookie->msi_lock);
> + INIT_LIST_HEAD(&cookie->msi_page_list);
> + cookie->type = type;
> + }
> + return cookie;
> +}
> +
>  /**
>   * iommu_get_dma_cookie - Acquire DMA-API resources for a domain
>   * @domain: IOMMU domain to prepare for DMA-API usage
> @@ -62,25 +84,53 @@ int iommu_dma_init(void)
>   */
>  int iommu_get_dma_cookie(struct iommu_domain *domain)
>  {
> - struct iommu_dma_cookie *cookie;
> -
>   if (domain->iova_cookie)
>   return -EEXIST;
>  
> - cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
> - if (!cookie)
> + domain->iova_cookie = __cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
> + if (!domain->iova_cookie)
>   return -ENOMEM;
>  
> - spin_lock_init(&cookie->msi_lock);
> - INIT_LIST_HEAD(&cookie->msi_page_list);
> - domain->iova_cookie = cookie;
>   return 0;
>  }
>  EXPORT_SYMBOL(iommu_get_dma_cookie);
>  
>  /**
> + * iommu_get_msi_cookie - Acquire just MSI remapping resources
> + * @domain: IOMMU domain to prepare
> + * @base: Start address of IOVA region for MSI mappings
> + *
> + * Users who manage their own IOVA allocation and do not want DMA API 
> support,
> + * but would still like to take advantage of automatic MSI remapping, can use
> + * this to initialise their own domain appropriately. Users should reserve a
> + * contiguous IOVA region, starting at @base, large enough to accommodate the
> + * number of PAGE_SIZE mappings necessary to 

Re: [PATCH v14 00/16] KVM PCIe/MSI passthrough on ARM/ARM64

2016-10-21 Thread Auger Eric
Hi Will,

On 20/10/2016 19:32, Will Deacon wrote:
> Hi Eric,
> 
> Thanks for posting this.
> 
> On Wed, Oct 12, 2016 at 01:22:08PM +, Eric Auger wrote:
>> This is the second respin on top of Robin's series [1], addressing Alex' 
>> comments.
>>
>> Major changes are:
>> - MSI-doorbell API now is moved to DMA IOMMU API following Alex suggestion
>>   to put all API pieces at the same place (so eventually in the IOMMU
>>   subsystem)
>> - new iommu_domain_msi_resv struct and accessor through DOMAIN_ATTR_MSI_RESV
>>   domain with mirror VFIO capability
>> - more robustness I think in the VFIO layer
>> - added "iommu/iova: fix __alloc_and_insert_iova_range" since with the
>>   current code I failed to allocate an IOVA page in a single-page domain
>>   with the upper part reserved
>>
>> IOVA range exclusion will be handled in a separate series
>>
>> The priority really is to discuss and freeze the API and especially the MSI
>> doorbell's handling. Do we agree to put that in DMA IOMMU?
>>
>> Note: the size computation does not take into account possible page overlaps
>> between doorbells but it would add quite a lot of complexity i think.
>>
>> Tested on AMD Overdrive (single GICv2m frame) with I350 VF assignment.
> 
> Marc, Robin and I sat down and had a look at the series and, whilst it's
> certainly addressing a problem that we desperately want to see fixed, we
> think that it's slightly over-engineering in places and could probably
> be simplified in the interest of getting something upstream that can be
> used as a base, on which the ABI can be extended as concrete use-cases
> become clear.
> 
> Stepping back a minute, we're trying to reserve some of the VFIO virtual
> address space so that it can be used by devices to map their MSI doorbells
> using the SMMU. With your patches, this requires that (a) the kernel
> tells userspace about the size and alignment of the doorbell region
> (MSI_RESV) and (b) userspace tells the kernel the VA-range that can be
> used (RESERVED_MSI_IOVA).
> 
> However, this is all special-cased for MSI doorbells and there are
> potentially other regions of the VFIO address space that are reserved
> and need to be communicated to userspace as well. We already know of
> hardware where the PCI RC intercepts p2p accesses before they make it
> to the SMMU, and other hardware where the MSI doorbell is at a fixed
> address. This means that we need a mechanism to communicate *fixed*
> regions of virtual address space that are reserved by VFIO. I don't
> even particularly care if VFIO_MAP_DMA enforces that, but we do need
> a way to tell userspace "hey, you don't want to put memory here because
> it won't work well with devices".

I think we all agree on this. Exposing an API to the user space
reporting *fixed* reserved IOVA ranges is a requirement anyway. The
problem was quite clearly stated by Alex in
http://lkml.iu.edu/hypermail/linux/kernel/1610.0/03308.html
(VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE)

I started working on this VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
capability, but to me, and I think according to Alex, it was a different
API from MSI_RESV.

> 
> In that case, we end up with something like your MSI_RESV capability,
> but actually specifying a virtual address range that is simply not to
> be used by MAP_DMA -- we don't say anything about MSIs. Now, taking this
> to its logical conclusion, we no longer need to distinguish between
> remappable reserved regions and fixed reserved regions in the ABI.
> Instead, we can have the kernel allocate the virtual address space for
> the remappable reserved regions (probably somewhere in the bottom 4GB)
> and expose them via the capability.


If I understand correctly, you want the host to arbitrarily choose where
it puts the IOVAs reserved for MSIs, rather than asking userspace.

Well so we are back to the discussions we had in Dec 2015 (see Marc's
answer in http://thread.gmane.org/gmane.comp.emulators.kvm.arm.devel/3858).

- So I guess you will init an iova_domain seomewhere below the 4GB to
allocate the MSIs. what size are you going to choose. Don't you have the
same need to dimension the iova range.
- we still need to assess the MSI assignment safety. How will we compute
safety for VFIO?

> This simplifies things in the
> following ways:
> 
>   * You don't need to keep track of MSI vs DMA addresses in the VFIO rbtree
Right: I guess you rely on iommu_map() to return an error in case the
IOVA is already mapped somewhere else.
>   * You don't need to try collapsing doorbells into a single region
Why? At host level I guess you will init a single IOVA domain?
>   * You don't need a special MAP flavour to map MSI doorbells
Right.
>   * The ABI is reusable for PCI p2p and fixed doorbells
Right.

Aren't we just moving the issue to user space? Currently the QEMU
mach-virt address space is fully static. Adapting mach-virt to adjust to
host constraints is not straightforward. It is simple to reject the
assignment in case of collision but more difficult to react 

Re: [PATCH v14 14/16] vfio/type1: Check doorbell safety

2016-11-03 Thread Auger Eric
Hi Diana,

On 03/11/2016 14:45, Diana Madalina Craciun wrote:
> Hi Eric,
> 
> On 10/12/2016 04:23 PM, Eric Auger wrote:
>> On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>> by the msi controller.
>>
>> Since we currently have no way to detect whether the MSI controller is
>> upstream or downstream to the IOMMU we rely on the MSI doorbell information
>> registered by the interrupt controllers. In case at least one doorbell
>> does not implement proper isolation, we state the assignment is unsafe
>> with regard to interrupts. This is a coarse assessment but should allow us
>> to wait for a better system description.
>>
>> At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>> removed in next patch.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v13 -> v15:
>> - check vfio_msi_resv before checking whether msi doorbell is safe
>>
>> v9 -> v10:
>> - coarse safety assessment based on MSI doorbell info
>>
>> v3 -> v4:
>> - rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>>   and irq_remapping into safe_irq_domains
>>
>> v2 -> v3:
>> - protect vfio_msi_parent_irq_remapping_capable with
>>   CONFIG_GENERIC_MSI_IRQ_DOMAIN
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 30 +-
>>  1 file changed, 29 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index e0c97ef..c18ba9d 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -442,6 +442,29 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, 
>> struct vfio_dma *dma)
>>  }
>>  
>>  /**
>> + * vfio_msi_resv - Return whether any VFIO iommu domain requires
>> + * MSI mapping
>> + *
>> + * @iommu: vfio iommu handle
>> + *
>> + * Return: true if MSI mapping is needed, false otherwise
>> + */
>> +static bool vfio_msi_resv(struct vfio_iommu *iommu)
>> +{
>> +struct iommu_domain_msi_resv msi_resv;
>> +struct vfio_domain *d;
>> +int ret;
>> +
>> +list_for_each_entry(d, &iommu->domain_list, next) {
>> +ret = iommu_domain_get_attr(d->domain, DOMAIN_ATTR_MSI_RESV,
>> +&msi_resv);
>> +if (!ret)
>> +return true;
>> +}
>> +return false;
>> +}
>> +
>> +/**
>>   * vfio_set_msi_aperture - Sets the msi aperture on all domains
>>   * requesting MSI mapping
>>   *
>> @@ -945,8 +968,13 @@ static int vfio_iommu_type1_attach_group(void 
>> *iommu_data,
>>  INIT_LIST_HEAD(>group_list);
>>  list_add(>next, >group_list);
>>  
>> +/*
>> + * to advertise safe interrupts either the IOMMU or the MSI controllers
>> + * must support IRQ remapping (aka. interrupt translation)
>> + */
>>  if (!allow_unsafe_interrupts &&
>> -!iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>> +(!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
>> +!(vfio_msi_resv(iommu) && iommu_msi_doorbell_safe()))) {
>>  pr_warn("%s: No interrupt remapping support.  Use the module 
>> param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this 
>> platform\n",
>> __func__);
>>  ret = -EPERM;
> 
> I understand from the other discussions that you will respin these
> series, but anyway I have tested this version with GICV3 + ITS and it
> stops here. As I have a GICv3 I am not supposed to enable allow unsafe
> interrupts. What I see is that vfio_msi_resv returns false just because
> the iommu->domain_list list is empty. The newly created domain is
> actually added to the domain_list at the end of this function, so it
> seems normal for the list to be empty at this point.

Thanks for reporting the issue. You are fully right; I must have missed
that test. I should just check the attribute of the current iommu_domain
instead, I think.

While waiting for a fix, please probe the vfio_iommu_type1 module with
allow_unsafe_interrupts=1.

Thanks

Eric
> 
> Thanks,
> 
> Diana
> 
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v13 03/15] iommu/dma: Allow MSI-only cookies

2016-10-10 Thread Auger Eric
Hi Robin,

On 10/10/2016 16:26, Robin Murphy wrote:
> Hi Alex, Eric,
> 
> On 06/10/16 21:17, Alex Williamson wrote:
>> On Thu,  6 Oct 2016 08:45:19 +
>> Eric Auger  wrote:
>>
>>> From: Robin Murphy 
>>>
>>> IOMMU domain users such as VFIO face a similar problem to DMA API ops
>>> with regard to mapping MSI messages in systems where the MSI write is
>>> subject to IOMMU translation. With the relevant infrastructure now in
>>> place for managed DMA domains, it's actually really simple for other
>>> users to piggyback off that and reap the benefits without giving up
>>> their own IOVA management, and without having to reinvent their own
>>> wheel in the MSI layer.
>>>
>>> Allow such users to opt into automatic MSI remapping by dedicating a
>>> region of their IOVA space to a managed cookie.
>>>
>>> Signed-off-by: Robin Murphy 
>>> Signed-off-by: Eric Auger 
>>>
>>> ---
>>>
>>> v1 -> v2:
>>> - compared to Robin's version
>>> - add NULL last param to iommu_dma_init_domain
>>> - set the msi_geometry aperture
>>> - I removed
>>>   if (base < U64_MAX - size)
>>>  reserve_iova(iovad, iova_pfn(iovad, base + size), ULONG_MAX);
>>>   don't get why we would reserve something out of the scope of the iova 
>>> domain?
>>>   what do I miss?
>>> ---
>>>  drivers/iommu/dma-iommu.c | 40 
>>>  include/linux/dma-iommu.h |  9 +
>>>  2 files changed, 49 insertions(+)
>>>
>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>>> index c5ab866..11da1a0 100644
>>> --- a/drivers/iommu/dma-iommu.c
>>> +++ b/drivers/iommu/dma-iommu.c
>>> @@ -716,3 +716,43 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg 
>>> *msg)
>>> msg->address_lo += lower_32_bits(msi_page->iova);
>>> }
>>>  }
>>> +
>>> +/**
>>> + * iommu_get_dma_msi_region_cookie - Configure a domain for MSI remapping 
>>> only
>>
>> Should this perhaps be iommu_setup_dma_msi_region_cookie, or something
>> along those lines.  I'm not sure what we're get'ing.  Thanks,
> 
> What we're getting is private third-party resources for the iommu_domain
> given in the argument. It's a get/put rather than alloc/free model since
> we operate opaquely on the domain as a container, rather than on the
> actual resource in question (an IOVA allocator).
> 
> Since this particular use case is slightly different from the normal
> flow and has special initialisation requirements, it seemed a lot
> cleaner to simply combine that initialisation operation with the
> prerequisite "get" into a single call. Especially as it helps emphasise
> that this is not 'normal' DMA cookie usage.

I renamed iommu_get_dma_msi_region_cookie to
iommu_setup_dma_msi_region. Is that a problem for you?
> 
>>
>> Alex
>>
>>> + * @domain: IOMMU domain to prepare
>>> + * @base: Base address of IOVA region to use as the MSI remapping aperture
>>> + * @size: Size of the desired MSI aperture
>>> + *
>>> + * Users who manage their own IOVA allocation and do not want DMA API 
>>> support,
>>> + * but would still like to take advantage of automatic MSI remapping, can 
>>> use
>>> + * this to initialise their own domain appropriately.
>>> + */
>>> +int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
>>> +   dma_addr_t base, u64 size)
>>> +{
>>> +   struct iommu_dma_cookie *cookie;
>>> +   struct iova_domain *iovad;
>>> +   int ret;
>>> +
>>> +   if (domain->type == IOMMU_DOMAIN_DMA)
>>> +   return -EINVAL;
>>> +
>>> +   ret = iommu_get_dma_cookie(domain);
>>> +   if (ret)
>>> +   return ret;
>>> +
>>> +   ret = iommu_dma_init_domain(domain, base, size, NULL);
>>> +   if (ret) {
>>> +   iommu_put_dma_cookie(domain);
>>> +   return ret;
>>> +   }
> 
> It *is* necessary to explicitly reserve the upper part of the IOVA
> domain here - the aforementioned "special initialisation" - because
> dma_32bit_pfn is only an optimisation hint to prevent the allocator
> walking down from the very top of the the tree every time when devices
> with different DMA masks share a domain (I'm in two minds as to whether
> to tweak the way the iommu-dma code uses it in this respect, now that I
> fully understand things). The only actual upper limit to allocation is
> the DMA mask passed into each alloc_iova() call, so if we want to ensure
> IOVAs are really allocated within this specific region, we have to carve
> out everything above it.

Thank you for the explanation. I will restore the reservation then.

Thanks

Eric
> 
> Robin.
> 
>>> +
>>> +   domain->msi_geometry.aperture_start = base;
>>> +   domain->msi_geometry.aperture_end = base + size - 1;
>>> +
>>> +   cookie = domain->iova_cookie;
>>> +   iovad = &cookie->iovad;
>>> +
>>> +   return 0;
>>> +}
>>> +EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
>>> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
>>> index 32c5890..1c55413 100644
>>> --- 

Re: [PATCH v13 15/15] vfio/type1: Return the MSI geometry through VFIO_IOMMU_GET_INFO capability chains

2016-10-10 Thread Auger Eric
Hi Alex,
On 07/10/2016 22:38, Alex Williamson wrote:
> On Fri, 7 Oct 2016 19:10:27 +0200
> Auger Eric <eric.au...@redhat.com> wrote:
> 
>> Hi Alex,
>>
>> On 06/10/2016 22:42, Alex Williamson wrote:
>>> On Thu, 6 Oct 2016 14:20:40 -0600
>>> Alex Williamson <alex.william...@redhat.com> wrote:
>>>   
>>>> On Thu,  6 Oct 2016 08:45:31 +
>>>> Eric Auger <eric.au...@redhat.com> wrote:
>>>>  
>>>>> This patch allows the user-space to retrieve the MSI geometry. The
>>>>> implementation is based on capability chains, now also added to
>>>>> VFIO_IOMMU_GET_INFO.
>>>>>
>>>>> The returned info comprise:
>>>>> - whether the MSI IOVA are constrained to a reserved range (x86 case) and
>>>>>   in the positive, the start/end of the aperture,
>>>>> - or whether the IOVA aperture need to be set by the userspace. In that
>>>>>   case, the size and alignment of the IOVA window to be provided are
>>>>>   returned.
>>>>>
>>>>> In case the userspace must provide the IOVA aperture, we currently report
>>>>> a size/alignment based on all the doorbells registered by the host kernel.
>>>>> This may exceed the actual needs.
>>>>>
>>>>> Signed-off-by: Eric Auger <eric.au...@redhat.com>
>>>>>
>>>>> ---
>>>>> v11 -> v11:
>>>>> - msi_doorbell_pages was renamed msi_doorbell_calc_pages
>>>>>
>>>>> v9 -> v10:
>>>>> - move cap_offset after iova_pgsizes
>>>>> - replace __u64 alignment by __u32 order
>>>>> - introduce __u32 flags in vfio_iommu_type1_info_cap_msi_geometry and
>>>>>   fix alignment
>>>>> - call msi-doorbell API to compute the size/alignment
>>>>>
>>>>> v8 -> v9:
>>>>> - use iommu_msi_supported flag instead of programmable
>>>>> - replace IOMMU_INFO_REQUIRE_MSI_MAP flag by a more sophisticated
>>>>>   capability chain, reporting the MSI geometry
>>>>>
>>>>> v7 -> v8:
>>>>> - use iommu_domain_msi_geometry
>>>>>
>>>>> v6 -> v7:
>>>>> - remove the computation of the number of IOVA pages to be provisionned.
>>>>>   This number depends on the domain/group/device topology which can
>>>>>   dynamically change. Let's instead rely on an arbitrary max depending
>>>>>   on the system
>>>>>
>>>>> v4 -> v5:
>>>>> - move msi_info and ret declaration within the conditional code
>>>>>
>>>>> v3 -> v4:
>>>>> - replace former vfio_domains_require_msi_mapping by
>>>>>   more complex computation of MSI mapping requirements, especially the
>>>>>   number of pages to be provided by the user-space.
>>>>> - reword patch title
>>>>>
>>>>> RFC v1 -> v1:
>>>>> - derived from
>>>>>   [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state
>>>>> - renamed allow_msi_reconfig into require_msi_mapping
>>>>> - fixed VFIO_IOMMU_GET_INFO
>>>>> ---
>>>>>  drivers/vfio/vfio_iommu_type1.c | 78 
>>>>> -
>>>>>  include/uapi/linux/vfio.h   | 32 -
>>>>>  2 files changed, 108 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>>> index dc3ee5d..ce5e7eb 100644
>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>> @@ -38,6 +38,8 @@
>>>>>  #include 
>>>>>  #include 
>>>>>  #include 
>>>>> +#include 
>>>>> +#include 
>>>>>  
>>>>>  #define DRIVER_VERSION  "0.2"
>>>>>  #define DRIVER_AUTHOR   "Alex Williamson <alex.william...@redhat.com>"
>>>>> @@ -1101,6 +1103,55 @@ static int vfio_domains_have_iommu_cache(struct 
>>>>> vfio_iommu *iommu)
>>>>>   return ret;
>>>>>  }
>>>>>  
>>>>> +static int compute_msi_geometry_caps(struct vfio_iommu *iommu,
>>>>> +  

Re: [PATCH v14 04/16] iommu/dma: MSI doorbell alloc/free

2016-10-17 Thread Auger Eric
Hi Punit,

On 14/10/2016 13:25, Punit Agrawal wrote:
> Hi Eric,
> 
> One query and a comment below.
> 
> Eric Auger  writes:
> 
>> We introduce the capability to (un)register MSI doorbells.
>>
>> A doorbell region is characterized by its physical address base, size,
>> and whether it is safe (i.e. it implements IRQ remapping). A doorbell
>> can be per-cpu or global. We currently only care about global doorbells.
>>
>> A function returns whether all registered doorbells are safe.
>>
>> MSI controllers likely to work along with an IOMMU that translates MSI
>> transactions must register their doorbells to allow device assignment
>> with MSI support. Otherwise the MSI transactions will cause IOMMU faults.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v13 -> v14:
>> - previously in msi-doorbell.h/c
>> ---
>>  drivers/iommu/dma-iommu.c | 75 
>> +++
>>  include/linux/dma-iommu.h | 41 ++
>>  2 files changed, 116 insertions(+)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index d45f9a0..d8a7d86 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -43,6 +43,38 @@ struct iommu_dma_cookie {
>>  spinlock_t  msi_lock;
>>  };
>>  
>> +/**
>> + * struct iommu_msi_doorbell_info - MSI doorbell region descriptor
>> + * @percpu_doorbells: per cpu doorbell base address
>> + * @global_doorbell: base address of the doorbell
>> + * @doorbell_is_percpu: is the doorbell per cpu or global?
>> + * @safe: true if irq remapping is implemented
>> + * @size: size of the doorbell
>> + */
>> +struct iommu_msi_doorbell_info {
>> +union {
>> +phys_addr_t __percpu*percpu_doorbells;
> 
> Out of curiosity, have you come across systems that have per-cpu
> doorbells? I couldn't find a system that'd help solidify my
> understanding on it's usage.

This came up in a discussion with Marc. However, at the moment I am not
aware of any MSI controller featuring per-cpu doorbells. I am not sure
whether it remains relevant to keep this notion at this stage.

> 
>> +phys_addr_t global_doorbell;
>> +};
>> +booldoorbell_is_percpu;
>> +boolsafe;
> 
> Although you've got the comment above, 'safe' doesn't quite convey it's
> purpose. Can this be renamed to something more descriptive -
> 'intr_remapping' or 'intr_isolation' perhaps?

Yes, definitely.

Thanks

Eric
> 
> Thanks,
> Punit
> 
> 
> [...]
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 


Re: [PATCH v14 00/16] KVM PCIe/MSI passthrough on ARM/ARM64

2016-10-17 Thread Auger Eric
Hi Punit,

On 14/10/2016 13:24, Punit Agrawal wrote:
> Hi Eric,
> 
> I am a bit late in joining, but I've tried to familiarise
> myself with earlier discussions on the series.
> 
> Eric Auger  writes:
> 
>> This is the second respin on top of Robin's series [1], addressing Alex' 
>> comments.
>>
>> Major changes are:
>> - MSI-doorbell API now is moved to DMA IOMMU API following Alex suggestion
>>   to put all API pieces at the same place (so eventually in the IOMMU
>>   subsystem)
> 
> IMHO, this is headed in the opposite direction, i.e., away from the
> owner of the information - the doorbells are the property of the MSI
> controller. The MSI controllers know the location, size and interrupt
> remapping capability as well. On the consumer side, VFIO needs access to
> the doorbells to allow userspace to carve out a region in the IOVA.
> 
> I quite liked what you had in v13, though I think you can go further
> though. Instead of adding new doorbell API [un]registration calls, how
> about adding a callback to the irq_domain_ops? The callback will be
> populated for irqdomains registered by MSI controllers.

Thank you for jumping into this thread. Any help/feedback is greatly
appreciated.

Regarding your suggestion, the irq_domain infrastructure looks dedicated
to the translation between Linux IRQs and HW IRQs. I tend to think that
adding an ops to retrieve the MSI doorbell info at that level is far
from the original goal of that infrastructure. Obviously the fact that
there already is a list of such domains is tempting, but I preferred to
add a separate struct and a separate list.

In the v14 release I moved the "doorbell API" in the dma-iommu API since
Alex recommended to offer a unified API where all pieces would be at the
same place.

Anyway I will follow the guidance of maintainers.


> 
> From VFIO, we can calculate the required aperture reservation by
> iterating over the irqdomains (something like irq_domain_for_each). The
> same callback can also provide information about support for interrupt
> remapping.
> 
> For systems where there are no separate MSI controllers, i.e., the IOMMU
> has a fixed reservation, no MSI callbacks will be populated - which
> tells userspace that no separate MSI reservation is required. IIUC, this
> was one of Alex' concerns on the prior version.

I am working on a separate series to report the usable IOVA range(s) to
userspace.

Thanks

Eric
> 
> Thoughts, opinions?
> 
> Punit
> 
>> - new iommu_domain_msi_resv struct and accessor through DOMAIN_ATTR_MSI_RESV
>>   domain with mirror VFIO capability
>> - more robustness I think in the VFIO layer
>> - added "iommu/iova: fix __alloc_and_insert_iova_range" since with the 
>> current
>>   code I failed allocating an IOVA page in a single page domain with upper 
>> part
>>   reserved
>>
>> IOVA range exclusion will be handled in a separate series
>>
>> The priority really is to discuss and freeze the API and especially the MSI
>> doorbell's handling. Do we agree to put that in DMA IOMMU?
>>
>> Note: the size computation does not take into account possible page overlaps
>> between doorbells but it would add quite a lot of complexity i think.
>>
>> Tested on AMD Overdrive (single GICv2m frame) with I350 VF assignment.
>>
>> dependency:
>> the series depends on Robin's generic-v7 branch:
>> [1] [PATCH v7 00/22] Generic DT bindings for PCI IOMMUs and ARM SMMU
>> http://www.spinics.net/lists/arm-kernel/msg531110.html
>>
>> Best Regards
>>
>> Eric
>>
>> Git: complete series available at
>> https://github.com/eauger/linux/tree/generic-v7-pcie-passthru-v14
>>
>> the above branch includes a temporary patch to work around a ThunderX pci
>> bus reset crash (which I think unrelated to this series):
>> "vfio: pci: HACK! workaround thunderx pci_try_reset_bus crash"
>> Do not take this one for other platforms.
>>
>>
>> Eric Auger (15):
>>   iommu/iova: fix __alloc_and_insert_iova_range
>>   iommu: Introduce DOMAIN_ATTR_MSI_RESV
>>   iommu/dma: MSI doorbell alloc/free
>>   iommu/dma: Introduce iommu_calc_msi_resv
>>   iommu/arm-smmu: Implement domain_get_attr for DOMAIN_ATTR_MSI_RESV
>>   irqchip/gic-v2m: Register the MSI doorbell
>>   irqchip/gicv3-its: Register the MSI doorbell
>>   vfio: Introduce a vfio_dma type field
>>   vfio/type1: vfio_find_dma accepting a type argument
>>   vfio/type1: Implement recursive vfio_find_dma_from_node
>>   vfio/type1: Handle unmap/unpin and replay for VFIO_IOVA_RESERVED slots
>>   vfio: Allow reserved msi iova registration
>>   vfio/type1: Check doorbell safety
>>   iommu/arm-smmu: Do not advertise IOMMU_CAP_INTR_REMAP
>>   vfio/type1: Introduce MSI_RESV capability
>>
>> Robin Murphy (1):
>>   iommu/dma: Allow MSI-only cookies
>>
>>  drivers/iommu/Kconfig|   4 +-
>>  drivers/iommu/arm-smmu-v3.c  |  10 +-
>>  drivers/iommu/arm-smmu.c |  10 +-
>>  drivers/iommu/dma-iommu.c| 184 ++
>>  drivers/iommu/iova.c |   2 +-
>>  drivers/irqchip/irq-gic-v2m.c

Re: [RFC v3 09/10] iommu/arm-smmu: Implement reserved region get/put callbacks

2016-12-07 Thread Auger Eric
Hi Robin,

On 07/12/2016 19:24, Robin Murphy wrote:
> On 07/12/16 15:02, Auger Eric wrote:
>> Hi Robin,
>> On 06/12/2016 19:55, Robin Murphy wrote:
>>> On 15/11/16 13:09, Eric Auger wrote:
>>>> The get() populates the list with the PCI host bridge windows
>>>> and the MSI IOVA range.
>>>>
>>>> At the moment an arbitrary MSI IOVA window is set at 0x800
>>>> of size 1MB. This will allow to report those info in iommu-group
>>>> sysfs?
>>
>>
>> First, thank you for reviewing the series. This is definitely helpful!
>>>>
>>>> Signed-off-by: Eric Auger <eric.au...@redhat.com>
>>>>
>>>> ---
>>>>
>>>> RFC v2 -> v3:
>>>> - use existing get/put_resv_regions
>>>>
>>>> RFC v1 -> v2:
>>>> - use defines for MSI IOVA base and length
>>>> ---
>>>>  drivers/iommu/arm-smmu.c | 52 
>>>> 
>>>>  1 file changed, 52 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>> index 8f72814..81f1a83 100644
>>>> --- a/drivers/iommu/arm-smmu.c
>>>> +++ b/drivers/iommu/arm-smmu.c
>>>> @@ -278,6 +278,9 @@ enum arm_smmu_s2cr_privcfg {
>>>>  
>>>>  #define FSYNR0_WNR(1 << 4)
>>>>  
>>>> +#define MSI_IOVA_BASE 0x800
>>>> +#define MSI_IOVA_LENGTH   0x10
>>>> +
>>>>  static int force_stage;
>>>>  module_param(force_stage, int, S_IRUGO);
>>>>  MODULE_PARM_DESC(force_stage,
>>>> @@ -1545,6 +1548,53 @@ static int arm_smmu_of_xlate(struct device *dev, 
>>>> struct of_phandle_args *args)
>>>>	return iommu_fwspec_add_ids(dev, &fwid, 1);
>>>>  }
>>>>  
>>>> +static void arm_smmu_get_resv_regions(struct device *dev,
>>>> +struct list_head *head)
>>>> +{
>>>> +  struct iommu_resv_region *region;
>>>> +  struct pci_host_bridge *bridge;
>>>> +  struct resource_entry *window;
>>>> +
>>>> +  /* MSI region */
>>>> +  region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
>>>> +   IOMMU_RESV_MSI);
>>>> +  if (!region)
>>>> +  return;
>>>> +
>>>> +  list_add_tail(&region->list, head);
>>>> +
>>>> +  if (!dev_is_pci(dev))
>>>> +  return;
>>>> +
>>>> +  bridge = pci_find_host_bridge(to_pci_dev(dev)->bus);
>>>> +
>>>> +  resource_list_for_each_entry(window, &bridge->windows) {
>>>> +  phys_addr_t start;
>>>> +  size_t length;
>>>> +
>>>> +  if (resource_type(window->res) != IORESOURCE_MEM &&
>>>> +  resource_type(window->res) != IORESOURCE_IO)
>>>
>>> As Joerg commented elsewhere, considering anything other than memory
>>> resources isn't right (I appreciate you've merely copied my own mistake
>>> here). We need some other way to handle root complexes where the CPU
>>> MMIO views of PCI windows appear in PCI memory space - using the I/O
>>> address of I/O resources only works by chance on Juno, and it still
>>> doesn't account for config space. I suggest we just leave that out for
>>> the time being to make life easier (does it even apply to anything other
>>> than Juno?) and figure it out later.
>> OK, so I understand I should remove the IORESOURCE_IO check.
>>>
>>>> +  continue;
>>>> +
>>>> +  start = window->res->start - window->offset;
>>>> +  length = window->res->end - window->res->start + 1;
>>>> +  region = iommu_alloc_resv_region(start, length,
>>>> +   IOMMU_RESV_NOMAP);
>>>> +  if (!region)
>>>> +  return;
>>>> +  list_add_tail(&region->list, head);
>>>> +  }
>>>> +}
>>>
>>> Either way, there's nothing SMMU-specific about PCI windows. The fact
>>> that we'd have to copy-paste all of this into the SMMUv3 driver
>>> unchanged suggests it should go somewhere common (although I would be
>>> inclined to leave the insertion of the fake MSI region 

Re: [RFC v3 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions

2016-12-08 Thread Auger Eric
Hi Robin,

On 08/12/2016 14:14, Robin Murphy wrote:
> On 08/12/16 09:36, Auger Eric wrote:
>> Hi,
>>
>> On 15/11/2016 14:09, Eric Auger wrote:
>>> Following LPC discussions, we now report reserved regions through
>>> iommu-group sysfs reserved_regions attribute file.
>>
>>
>> While I am respinning this series into v4, here is a tentative summary
>> of technical topics for which no consensus was reached at this point.
>>
>> 1) Shall we report the usable IOVA range instead of reserved IOVA
>>ranges. Not discussed at/after LPC.
>>x I currently report reserved regions. Alex expressed the need to
>>  report the full usable IOVA range instead (x86 min-max range
>>  minus MSI APIC window). I think this is meaningful for ARM
>>  too where arm-smmu might not support the full 64b range.
>>x Any objection we report the usable IOVA regions instead?
> 
> The issue with that is that we can't actually report "the usable
> regions" at the moment, as that involves pulling together disjoint
> properties of arbitrary hardware unrelated to the IOMMU. We'd be
> reporting "the not-definitely-unusable regions, which may have some
> unusable holes in them still". That seems like an ABI nightmare - I'd
> still much rather say "here are some, but not necessarily all, regions
> you definitely can't use", because saying "here are some regions which
> you might be able to use most of, probably" is what we're already doing
> today, via a single implicit region from 0 to ULONG_MAX ;)
> 
> The address space limits are definitely useful to know, but I think it
> would be better to expose them separately to avoid the ambiguity. At
> worst, I guess it would be reasonable to express the limits via an
> "out-of-range" reserved region type for 0 to $base and $top to
> ULONG-MAX. To *safely* expose usable regions, we'd have to start out
> with a very conservative assumption (e.g. only IOVAs matching physical
> RAM), and only expand them once we're sure we can detect every possible
> bit of problematic hardware in the system - that's just too limiting to
> be useful. And if we expose something knowingly inaccurate, we risk
> having another "bogoMIPS in /proc/cpuinfo" ABI burden on our hands, and
> nobody wants that...
Makes sense to me. An "out-of-range" reserved region type for 0 to $base
and $top to ULONG_MAX can be an alternative to fulfill the requirement.
> 
>> 2) Shall the kernel check collision with MSI window* when userspace
>>calls VFIO_IOMMU_MAP_DMA?
>>Joerg/Will No; Alex yes
>>*for IOVA regions consumed downstream to the IOMMU: everyone says NO
> 
> If we're starting off by having the SMMU drivers expose it as a fake
> fixed region, I don't think we need to worry about this yet. We all seem
> to agree that as long as we communicate the fixed regions to userspace,
> it's then userspace's job to work around them. Let's come back to this
> one once we actually get to the point of dynamically sizing and
> allocating 'real' MSI remapping region(s).
> 
> Ultimately, the kernel *will* police collisions either way, because an
> underlying iommu_map() is going to fail if overlapping IOVAs are ever
> actually used, so it's really just a question of whether to have a more
> user-friendly failure mode.
That's true on ARM but not on x86, where the APIC MSI region is not
mapped, I think.
> 
>> 3) RMRR reporting in the iommu group sysfs? Joerg: yes; Don: no
>>My current series does not expose them in iommu group sysfs.
>>I understand we can expose the RMRR regions in the iomm group sysfs
>>without necessarily supporting RMRR requiring device assignment.
>>We can also add this support later.
> 
> As you say, reporting them doesn't necessitate allowing device
> assignment, and it's information which can already be easily grovelled
> out of dmesg (for intel-iommu at least) - there doesn't seem to be any
> need to hide them, but the x86 folks can have the final word on that.
Agreed.

Thanks

Eric
> 
> Robin.
> 
>> Thanks
>>
>> Eric
>>
>>
>>>
>>> Reserved regions are populated through the IOMMU get_resv_region callback
>>> (former get_dm_regions), now implemented by amd-iommu, intel-iommu and
>>> arm-smmu.
>>>
>>> The intel-iommu reports the [FEE0_h - FEF0_000h] MSI window as an
>>> IOMMU_RESV_NOMAP reserved region.
>>>
>>> arm-smmu reports the MSI window (arbitrarily located at 0x800 and
>>> 1MB large) and the PCI host bridge windows.
>>>
>>> The series integrates a not officially posted patch from Robin

Re: [RFC v3 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions

2016-12-08 Thread Auger Eric
Hi,

On 15/11/2016 14:09, Eric Auger wrote:
> Following LPC discussions, we now report reserved regions through
> iommu-group sysfs reserved_regions attribute file.


While I am respinning this series into v4, here is a tentative summary
of technical topics for which no consensus was reached at this point.

1) Shall we report the usable IOVA range instead of reserved IOVA
   ranges. Not discussed at/after LPC.
   x I currently report reserved regions. Alex expressed the need to
 report the full usable IOVA range instead (x86 min-max range
 minus MSI APIC window). I think this is meaningful for ARM
 too where arm-smmu might not support the full 64b range.
   x Any objection we report the usable IOVA regions instead?

2) Shall the kernel check collision with MSI window* when userspace
   calls VFIO_IOMMU_MAP_DMA?
   Joerg/Will No; Alex yes
   *for IOVA regions consumed downstream to the IOMMU: everyone says NO

3) RMRR reporting in the iommu group sysfs? Joerg: yes; Don: no
   My current series does not expose them in iommu group sysfs.
   I understand we can expose the RMRR regions in the iomm group sysfs
   without necessarily supporting RMRR requiring device assignment.
   We can also add this support later.

Thanks

Eric


> 
> Reserved regions are populated through the IOMMU get_resv_region callback
> (former get_dm_regions), now implemented by amd-iommu, intel-iommu and
> arm-smmu.
> 
> The intel-iommu reports the [FEE0_h - FEF0_000h] MSI window as an
> IOMMU_RESV_NOMAP reserved region.
> 
> arm-smmu reports the MSI window (arbitrarily located at 0x800 and
> 1MB large) and the PCI host bridge windows.
> 
> The series integrates a not officially posted patch from Robin:
> "iommu/dma: Allow MSI-only cookies".
> 
> This series currently does not address IRQ safety assessment.
> 
> Best Regards
> 
> Eric
> 
> Git: complete series available at
> https://github.com/eauger/linux/tree/v4.9-rc5-reserved-rfc-v3
> 
> History:
> RFC v2 -> v3:
> - switch to an iommu-group sysfs API
> - use new dummy allocator provided by Robin
> - dummy allocator initialized by vfio-iommu-type1 after enumerating
>   the reserved regions
> - at the moment ARM MSI base address/size is left unchanged compared
>   to v2
> - we currently report reserved regions and not usable IOVA regions as
>   requested by Alex
> 
> RFC v1 -> v2:
> - fix intel_add_reserved_regions
> - add mutex lock/unlock in vfio_iommu_type1
> 
> 
> Eric Auger (10):
>   iommu/dma: Allow MSI-only cookies
>   iommu: Rename iommu_dm_regions into iommu_resv_regions
>   iommu: Add new reserved IOMMU attributes
>   iommu: iommu_alloc_resv_region
>   iommu: Do not map reserved regions
>   iommu: iommu_get_group_resv_regions
>   iommu: Implement reserved_regions iommu-group sysfs file
>   iommu/vt-d: Implement reserved region get/put callbacks
>   iommu/arm-smmu: Implement reserved region get/put callbacks
>   vfio/type1: Get MSI cookie
> 
>  drivers/iommu/amd_iommu.c   |  20 +++---
>  drivers/iommu/arm-smmu.c|  52 +++
>  drivers/iommu/dma-iommu.c   | 116 ++---
>  drivers/iommu/intel-iommu.c |  50 ++
>  drivers/iommu/iommu.c   | 141 
> 
>  drivers/vfio/vfio_iommu_type1.c |  26 
>  include/linux/dma-iommu.h   |   7 ++
>  include/linux/iommu.h   |  49 ++
>  8 files changed, 391 insertions(+), 70 deletions(-)
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC v3 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions

2016-12-07 Thread Auger Eric
Hi Shanker,

On 07/12/2016 19:52, Shanker Donthineni wrote:
> Hi Eric,
> 
> Is there any reason why you are not supporting SMMUv3 driver? Qualcomm
> hardware doesn't not support SMMUv2 hardware, please add support for
> SMMUv3 in next patch set. I've ported ' RFC,v3,09/10] iommu/arm-smmu:
> Implement reserved region get/put callbacks' to SMMUv3 driver and tested
> device-pass-through feature on Qualcomm server platform without any issue.
> 
> Tested-by: Shanker Donthineni 
Thanks!

No reason for not supporting SMMUv3 other than I don't have any HW to test on.

I will add this support in the next version.

Thanks

Eric


Re: [RFC v3 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions

2016-12-12 Thread Auger Eric
Hi Don,

On 11/12/2016 03:05, Don Dutile wrote:
> On 12/08/2016 04:36 AM, Auger Eric wrote:
>> Hi,
>>
>> On 15/11/2016 14:09, Eric Auger wrote:
>>> Following LPC discussions, we now report reserved regions through
>>> iommu-group sysfs reserved_regions attribute file.
>>
>>
>> While I am respinning this series into v4, here is a tentative summary
>> of technical topics for which no consensus was reached at this point.
>>
>> 1) Shall we report the usable IOVA range instead of reserved IOVA
>> ranges. Not discussed at/after LPC.
>> x I currently report reserved regions. Alex expressed the need to
>>   report the full usable IOVA range instead (x86 min-max range
>>   minus MSI APIC window). I think this is meaningful for ARM
>>   too where arm-smmu might not support the full 64b range.
>> x Any objection we report the usable IOVA regions instead?
>>
>> 2) Shall the kernel check collision with MSI window* when userspace
>> calls VFIO_IOMMU_MAP_DMA?
>> Joerg/Will No; Alex yes
>> *for IOVA regions consumed downstream to the IOMMU: everyone says NO
>>
>> 3) RMRR reporting in the iommu group sysfs? Joerg: yes; Don: no
> Um, I'm missing context, but the only thing I recall saying no to wrt RMRR
> is that _any_ device that has an RMRR cannot be assigned to a guest.
Yes that was my understanding
> Or, are you saying RMRRs should be exposed in the guest OS? If so, then
> you have my 'no' there.
> 
>> My current series does not expose them in iommu group sysfs.
>> I understand we can expose the RMRR regions in the iommu group sysfs
>> without necessarily supporting RMRR requiring device assignment.
> This sentence doesn't make sense to me.
> Can you try re-wording it?
> I can't tell what RMRR has to do w/device assignment, other than what I
> said above.
> Exposing RMRR's in sysfs is not an issue in general.
Sorry for the confusion. I meant we can expose RMRR regions as part of
the reserved regions through the iommu group sysfs API without
supporting device assignment of devices that have RMRRs.

Hope it clarifies

Eric
> 
>> We can also add this support later.
>>
>> Thanks
>>
>> Eric
>>
>>
>>>
>>> Reserved regions are populated through the IOMMU get_resv_region
>>> callback
>>> (former get_dm_regions), now implemented by amd-iommu, intel-iommu and
>>> arm-smmu.
>>>
>>> The intel-iommu reports the [FEE0_0000h - FEF0_0000h] MSI window as an
>>> IOMMU_RESV_NOMAP reserved region.
>>>
>>> arm-smmu reports the MSI window (arbitrarily located at 0x8000000 and
>>> 1MB large) and the PCI host bridge windows.
>>>
>>> The series integrates a not officially posted patch from Robin:
>>> "iommu/dma: Allow MSI-only cookies".
>>>
>>> This series currently does not address IRQ safety assessment.
>>>
>>> Best Regards
>>>
>>> Eric
>>>
>>> Git: complete series available at
>>> https://github.com/eauger/linux/tree/v4.9-rc5-reserved-rfc-v3
>>>
>>> History:
>>> RFC v2 -> v3:
>>> - switch to an iommu-group sysfs API
>>> - use new dummy allocator provided by Robin
>>> - dummy allocator initialized by vfio-iommu-type1 after enumerating
>>>the reserved regions
>>> - at the moment ARM MSI base address/size is left unchanged compared
>>>to v2
>>> - we currently report reserved regions and not usable IOVA regions as
>>>requested by Alex
>>>
>>> RFC v1 -> v2:
>>> - fix intel_add_reserved_regions
>>> - add mutex lock/unlock in vfio_iommu_type1
>>>
>>>
>>> Eric Auger (10):
>>>iommu/dma: Allow MSI-only cookies
>>>iommu: Rename iommu_dm_regions into iommu_resv_regions
>>>iommu: Add new reserved IOMMU attributes
>>>iommu: iommu_alloc_resv_region
>>>iommu: Do not map reserved regions
>>>iommu: iommu_get_group_resv_regions
>>>iommu: Implement reserved_regions iommu-group sysfs file
>>>iommu/vt-d: Implement reserved region get/put callbacks
>>>iommu/arm-smmu: Implement reserved region get/put callbacks
>>>vfio/type1: Get MSI cookie
>>>
>>>   drivers/iommu/amd_iommu.c   |  20 +++---
>>>   drivers/iommu/arm-smmu.c|  52 +++
>>>   drivers/iommu/dma-iommu.c   | 116
>>> ++---
>>>   drivers/iommu/intel-iommu.c |  50 ++
>>>   drivers/iommu/iommu.c   | 141
>>> 
>>>   drivers/vfio/vfio_iommu_type1.c |  26 
>>>   include/linux/dma-iommu.h   |   7 ++
>>>   include/linux/iommu.h   |  49 ++
>>>   8 files changed, 391 insertions(+), 70 deletions(-)
>>>
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 06/10] iommu: iommu_get_group_resv_regions

2016-12-07 Thread Auger Eric
Hi Robin,

On 06/12/2016 19:13, Robin Murphy wrote:
> On 15/11/16 13:09, Eric Auger wrote:
>> Introduce iommu_get_group_resv_regions whose role consists in
>> enumerating all devices from the group and collecting their
>> reserved regions. It checks duplicates.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> - we do not move list elements from device to group list since
>>   the iommu_put_resv_regions() could not be called.
>> - at the moment I did not introduce any iommu_put_group_resv_regions
>>   since it simply consists in voiding/freeing the list
>> ---
>>  drivers/iommu/iommu.c | 53 
>> +++
>>  include/linux/iommu.h |  8 
>>  2 files changed, 61 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index a4530ad..e0fbcc5 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -133,6 +133,59 @@ static ssize_t iommu_group_show_name(struct iommu_group 
>> *group, char *buf)
>>  return sprintf(buf, "%s\n", group->name);
>>  }
>>  
>> +static bool iommu_resv_region_present(struct iommu_resv_region *region,
>> +  struct list_head *head)
>> +{
>> +struct iommu_resv_region *entry;
>> +
>> +list_for_each_entry(entry, head, list) {
>> +if ((region->start == entry->start) &&
>> +(region->length == entry->length) &&
>> +(region->prot == entry->prot))
>> +return true;
>> +}
>> +return false;
>> +}
>> +
>> +static int
>> +iommu_insert_device_resv_regions(struct list_head *dev_resv_regions,
>> + struct list_head *group_resv_regions)
>> +{
>> +struct iommu_resv_region *entry, *region;
>> +
>> +list_for_each_entry(entry, dev_resv_regions, list) {
>> +if (iommu_resv_region_present(entry, group_resv_regions))
>> +continue;
> 
> In the case of overlapping regions which _aren't_ an exact match, would
> it be better to expand the existing one rather than leave the caller to
> sort it out? It seems a bit inconsistent to handle only the one case here.

Well this is mostly here to avoid inserting several times the same PCIe
host bridge windows (retrieved from several PCIe EP attached to the same
bridge). I don't know if it is worth making things over-complicated. Do
you have another situation in mind?
> 
>> +region = iommu_alloc_resv_region(entry->start, entry->length,
>> +   entry->prot);
>> +if (!region)
>> +return -ENOMEM;
>> +
>> +list_add_tail(&region->list, group_resv_regions);
>> +}
>> +return 0;
>> +}
>> +
>> +int iommu_get_group_resv_regions(struct iommu_group *group,
>> + struct list_head *head)
>> +{
>> +struct iommu_device *device;
>> +int ret = 0;
>> +
>> +list_for_each_entry(device, &group->devices, list) {
> 
> Should we not be taking the group mutex around this?
Yes you're right.

Thanks

Eric
> 
> Robin.
> 
>> +struct list_head dev_resv_regions;
>> +
>> +INIT_LIST_HEAD(&dev_resv_regions);
>> +iommu_get_resv_regions(device->dev, &dev_resv_regions);
>> +ret = iommu_insert_device_resv_regions(&dev_resv_regions, head);
>> +iommu_put_resv_regions(device->dev, &dev_resv_regions);
>> +if (ret)
>> +break;
>> +}
>> +return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_get_group_resv_regions);
>> +
>>  static IOMMU_GROUP_ATTR(name, S_IRUGO, iommu_group_show_name, NULL);
>>  
>>  static void iommu_group_release(struct kobject *kobj)
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 0aea877..0f7ae2c 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -243,6 +243,8 @@ extern void iommu_set_fault_handler(struct iommu_domain 
>> *domain,
>>  extern int iommu_request_dm_for_dev(struct device *dev);
>>  extern struct iommu_resv_region *
>>  iommu_alloc_resv_region(phys_addr_t start, size_t length, unsigned int 
>> prot);
>> +extern int iommu_get_group_resv_regions(struct iommu_group *group,
>> +struct list_head *head);
>>  
>>  extern int iommu_attach_group(struct iommu_domain *domain,
>>struct iommu_group *group);
>> @@ -462,6 +464,12 @@ static inline void iommu_put_resv_regions(struct device 
>> *dev,
>>  return NULL;
>>  }
>>  
>> +static inline int iommu_get_group_resv_regions(struct iommu_group *group,
>> +   struct list_head *head)
>> +{
>> +return -ENODEV;
>> +}
>> +
>>  static inline int iommu_request_dm_for_dev(struct device *dev)
>>  {
>>  return -ENODEV;
>>
> 

Re: [PATCH v7 08/19] iommu: Implement reserved_regions iommu-group sysfs file

2017-01-10 Thread Auger Eric
Hi Joerg,

On 09/01/2017 14:45, Eric Auger wrote:
> A new iommu-group sysfs attribute file is introduced. It contains
> the list of reserved regions for the iommu-group. Each reserved
> region is described on a separate line:
> - first field is the start IOVA address,
> - second is the end IOVA address,
> - third is the type.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> v6 -> v7:
> - also report the type of the reserved region as a string
> - updated ABI documentation
> 
> v3 -> v4:
> - add cast to long long int when printing to avoid warning on
>   i386
> - change S_IRUGO into 0444
> - remove sort. The list is natively sorted now.
> 
> The file layout is inspired by /sys/bus/pci/devices/BDF/resource.
> I also read Documentation/filesystems/sysfs.txt so I expect this
> to be frowned upon.
> ---
>  .../ABI/testing/sysfs-kernel-iommu_groups  | 12 +++
>  drivers/iommu/iommu.c  | 38 
> ++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-kernel-iommu_groups 
> b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
> index 9b31556..35c64e0 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-iommu_groups
> +++ b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
> @@ -12,3 +12,15 @@ Description:   /sys/kernel/iommu_groups/ contains a 
> number of sub-
>   file if the IOMMU driver has chosen to register a more
>   common name for the group.
>  Users:
> +
> +What:/sys/kernel/iommu_groups/reserved_regions
> +Date:January 2017
> +KernelVersion:  v4.11
> +Contact: Eric Auger 
> +Description:/sys/kernel/iommu_groups/reserved_regions lists IOVA
> + regions that are reserved. Not necessarily all
> + reserved regions are listed. This is typically used to
> + output direct-mapped, MSI, non-mappable regions. Each
> + region is described on a single line: the 1st field is
> + the base IOVA, the second is the end IOVA and the third
> + field describes the type of the region.
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 640056b..0123daa 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -68,6 +68,12 @@ struct iommu_group_attribute {
>const char *buf, size_t count);
>  };
>  
> +static const char * const iommu_group_resv_type_string[] = {
> + [IOMMU_RESV_DIRECT] = "direct",
> + [IOMMU_RESV_RESERVED]   = "reserved",
> + [IOMMU_RESV_MSI]= "msi",
> +};
> +
>  #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)\
>  struct iommu_group_attribute iommu_group_attr_##_name =  \
>   __ATTR(_name, _mode, _show, _store)
> @@ -231,8 +237,33 @@ int iommu_get_group_resv_regions(struct iommu_group 
> *group,
>  }
>  EXPORT_SYMBOL_GPL(iommu_get_group_resv_regions);
>  
> +static ssize_t iommu_group_show_resv_regions(struct iommu_group *group,
> +  char *buf)
> +{
> + struct iommu_resv_region *region, *next;
> + struct list_head group_resv_regions;
> + char *str = buf;
> +
> + INIT_LIST_HEAD(&group_resv_regions);
> + iommu_get_group_resv_regions(group, &group_resv_regions);
> +
> + list_for_each_entry_safe(region, next, &group_resv_regions, list) {
> + str += sprintf(str, "0x%016llx 0x%016llx %s\n",
> +(long long int)region->start,
> +(long long int)(region->start +
> + region->length - 1),
> +iommu_group_resv_type_string[region->type]);
> + kfree(region);
> + }
> +
> + return (str - buf);
> +}
> +
>  static IOMMU_GROUP_ATTR(name, S_IRUGO, iommu_group_show_name, NULL);
>  
> +static IOMMU_GROUP_ATTR(reserved_regions, 0444,
> + iommu_group_show_resv_regions, NULL);
> +
>  static void iommu_group_release(struct kobject *kobj)
>  {
>   struct iommu_group *group = to_iommu_group(kobj);
> @@ -247,6 +278,8 @@ static void iommu_group_release(struct kobject *kobj)
>   if (group->default_domain)
>   iommu_domain_free(group->default_domain);
>  
> + iommu_group_remove_file(group, &iommu_group_attr_reserved_regions);

The /sys/kernel/iommu_groups/n directory seems to be removed before this
gets called and this may produce a WARNING when devices get removed from
the group. I intend to remove the call since I have the feeling
everything gets cleaned up properly.

Do you see any issue?

Thanks

Eric

[  350.753618] iommu: Removing device :01:10.0 from group 7
[  350.759331] kernfs: can not remove 'reserved_regions', no directory
[  350.765603] [ cut here ]
[  350.770216] WARNING: CPU: 3 PID: 2617 at fs/kernfs/dir.c:1406
kernfs_remove_by_name_ns+0x8c/0x98
../..
[  351.028154] [] 

Re: [PATCH v7 00/19] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions

2017-01-10 Thread Auger Eric
Hi all,

On 10/01/2017 15:09, Joerg Roedel wrote:
> On Mon, Jan 09, 2017 at 01:45:51PM +, Eric Auger wrote:
>> Eric Auger (17):
>>   iommu: Rename iommu_dm_regions into iommu_resv_regions
>>   iommu: Add a new type field in iommu_resv_region
>>   iommu: iommu_alloc_resv_region
>>   iommu: Only map direct mapped regions
>>   iommu: iommu_get_group_resv_regions
>>   iommu: Implement reserved_regions iommu-group sysfs file
>>   iommu/vt-d: Implement reserved region get/put callbacks
>>   iommu/amd: Declare MSI and HT regions as reserved IOVA regions
>>   iommu/arm-smmu: Implement reserved region get/put callbacks
>>   iommu/arm-smmu-v3: Implement reserved region get/put callbacks
> 
> IOMMU patches look good, what is the plan to merge this? I'd like to
> take the IOMMU patches and can provide a branch for someone else to base
> the rest on.

I will respin asap taking into account Marc's nits, Robin's update. Also
I would like to do one change in

[PATCH v7 08/19] iommu: Implement reserved_regions iommu-group sysfs file

I will send a separate email.

Thanks

Eric

> 
> 
>   Joerg
> 


Re: [PATCH v8 00/18] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions

2017-01-11 Thread Auger Eric
Hi Bharat,
On 12/01/2017 04:59, Bharat Bhushan wrote:
> 
> 
>> -Original Message-
>> From: Eric Auger [mailto:eric.au...@redhat.com]
>> Sent: Wednesday, January 11, 2017 3:12 PM
>> To: eric.au...@redhat.com; eric.auger@gmail.com;
>> christoffer.d...@linaro.org; marc.zyng...@arm.com;
>> robin.mur...@arm.com; alex.william...@redhat.com;
>> will.dea...@arm.com; j...@8bytes.org; t...@linutronix.de;
>> ja...@lakedaemon.net; linux-arm-ker...@lists.infradead.org
>> Cc: k...@vger.kernel.org; drjo...@redhat.com; linux-
>> ker...@vger.kernel.org; pranav.sawargaon...@gmail.com;
>> iommu@lists.linux-foundation.org; punit.agra...@arm.com; Diana Madalina
>> Craciun ; gpkulka...@gmail.com;
>> shank...@codeaurora.org; Bharat Bhushan ;
>> geethasowjanya.ak...@gmail.com
>> Subject: [PATCH v8 00/18] KVM PCIe/MSI passthrough on ARM/ARM64 and
>> IOVA reserved regions
>>
>> Following LPC discussions, we now report reserved regions through the
>> iommu-group sysfs reserved_regions attribute file.
>>
>> Reserved regions are populated through the IOMMU get_resv_region
>> callback (former get_dm_regions), now implemented by amd-iommu, intel-
>> iommu and arm-smmu:
>> - the intel-iommu reports the [0xfee00000 - 0xfeefffff] MSI window
>>   as a reserved region and RMRR regions as direct-mapped regions.
>> - the amd-iommu reports device direct mapped regions, the MSI region
>>   and HT regions.
>> - the arm-smmu reports the MSI window (arbitrarily located at
>>   0x8000000 and 1MB large).
>>
>> Unsafe interrupt assignment is tested by enumerating all MSI irq domains
>> and checking MSI remapping is supported in the above hierarchy.
>> This check is done in case we detect the iommu translates MSI (an
>> IOMMU_RESV_MSI window exists). Otherwise the IRQ remapping capability
>> is checked at IOMMU level. Obviously this is a defensive IRQ safety
>> assessment: Assuming there are several MSI controllers in the system and at
>> least one does not implement IRQ remapping, the assignment will be
>> considered as unsafe (even if this controller is not accessible from the
>> assigned devices).
>>
>> The series first patch stems from Robin's branch:
>> http://linux-arm.org/git?p=linux-
>> rm.git;a=shortlog;h=refs/heads/iommu/misc
>>
>> Best Regards
>>
>> Eric
>>
>> Git: complete series available at
>> https://github.com/eauger/linux/tree/v4.10-rc3-reserved-v8
> 
> This series is tested on an NXP platform; if you want you can add my
> Tested-by: Bharat Bhushan 
Thank you for this!

Best Regards

Eric
> 
> Thanks
> -Bharat
> 
>>
>> History:
>>
>> PATCHv7 -> PATCHv8
>> - take into account Marc's comments and apply his R-b
>> - remove iommu_group_remove_file call in iommu_group_release
>> - add Will's A-b
>> - removed [PATCH v7 01/19] iommu/dma: Implement PCI allocation
>>   optimisation and updated iommu/dma: Allow MSI-only cookies
>>   as per Robin's indications
>>
>> PATCHv6 -> PATCHv7:
>> - iommu/dma: Implement PCI allocation optimisation was added to apply
>>   iommu/dma: Allow MSI-only cookies
>> - report Intel RMRR as direct-mapped regions
>> - report the type in the iommu group sysfs reserved_regions file
>> - do not merge regions of different types when building the list
>>   of reserved regions
>> - integration of Robin's "iommu/dma: Allow MSI-only cookies" latest
>>   version
>> - update Documentation/ABI/testing/sysfs-kernel-iommu_groups
>> - rename IOMMU_RESV_NOMAP into IOMMU_RESV_RESERVED
>>
>> PATCHv5 -> PATCHv6
>> - Introduce IRQ_DOMAIN_FLAG_MSI as suggested by Marc
>> - irq_domain_is_msi, irq_domain_is_msi_remap,
>>   irq_domain_hierarchical_is_msi_remap,
>> - set IRQ_DOMAIN_FLAG_MSI in msi_create_irq_domain
>> - fix compil issue on i386
>> - rework test at VFIO level
>>
>> RFCv4 -> PATCHv5
>> - fix IRQ security assessment by looking at irq domain parents
>> - check DOMAIN_BUS_FSL_MC_MSI irq domains
>> - AMD MSI and HT regions are exposed in iommu group sysfs
>>
>> RFCv3 -> RFCv4:
>> - arm-smmu driver does not register PCI host bridge windows as
>>   reserved regions anymore
>> - Implement reserved region get/put callbacks also in arm-smmuv3
>> - take the iommu_group lock on iommu_get_group_resv_regions
>> - add a type field in iommu_resv_region instead of using prot
>> - init the region list_head in iommu_alloc_resv_region, also
>>   add type parameter
>> - iommu_insert_resv_region manage overlaps and sort reserved
>>   windows
>> - address IRQ safety assessment by enumerating all the MSI irq
>>   domains and checking the MSI_REMAP flag
>> - update Documentation/ABI/testing/sysfs-kernel-iommu_groups
>>
>> RFC v2 -> v3:
>> - switch to an iommu-group sysfs API
>> - use new dummy allocator provided by Robin
>> - dummy allocator initialized by vfio-iommu-type1 after enumerating
>>   the reserved regions
>> - at the moment ARM MSI base address/size is left unchanged compared
>>   to v2
>> - we currently report reserved regions and not usable IOVA regions 

Re: [PATCH v7 08/19] iommu: Implement reserved_regions iommu-group sysfs file

2017-01-10 Thread Auger Eric
Hi Joerg,
On 10/01/2017 18:14, Joerg Roedel wrote:
> On Tue, Jan 10, 2017 at 05:20:34PM +0100, Auger Eric wrote:
>> The /sys/kernel/iommu_groups/n directory seems to be removed before this
>> gets called and this may produce a WARNING when devices get removed from
>> the group. I intend to remove the call since I have the feeling
>> everything gets cleaned up properly.
> 
> A feeling is not enough, please check that in the code.

So my understanding is on group's kobject_release we have:
kobject_release
|_ kobject_cleanup
|_ kobject_del
|_ sysfs_remove_dir
|_ kernfs_remove
|_ _kernfs_remove
../..
|_ ktype release (iommu_group_release)

_kernfs_remove() calls kernfs_put() on all descendant nodes, leading to
the whole directory cleanup.

In iommu_group_release I called sysfs_remove_file on the
reserved_regions attribute file. My understanding is that its job is
identical to what was done previously, and the node was already
destroyed, hence the warning.

sysfs_remove_file
|_ sysfs_remove_file_ns
|_ kernfs_remove_by_name_ns
|_kernfs_remove

So my understanding is it is safe to remove it.

Thanks

Eric
> 
> 
>   Joerg
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 


Re: [RFC v3 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions

2016-11-30 Thread Auger Eric
Hi,

On 15/11/2016 14:09, Eric Auger wrote:
> Following LPC discussions, we now report reserved regions through
> iommu-group sysfs reserved_regions attribute file.
> 
> Reserved regions are populated through the IOMMU get_resv_region callback
> (former get_dm_regions), now implemented by amd-iommu, intel-iommu and
> arm-smmu.
> 
> The intel-iommu reports the [FEE0_0000h - FEF0_0000h] MSI window as an
> IOMMU_RESV_NOMAP reserved region.
> 
> arm-smmu reports the MSI window (arbitrarily located at 0x8000000 and
> 1MB large) and the PCI host bridge windows.
> 
> The series integrates a not officially posted patch from Robin:
> "iommu/dma: Allow MSI-only cookies".
> 
> This series currently does not address IRQ safety assessment.

I will respin this series taking into account Joerg's comment. Does
anyone have additional comments or want to put forward some conceptual
issues with the current direction and with this implementation?

As for the IRQ safety assessment, in a first step I would propose to
remove the IOMMU_CAP_INTR_REMAP from arm-smmus and consider the
assignment as unsafe. Any objection?

Thanks

Eric


> Best Regards
> 
> Eric
> 
> Git: complete series available at
> https://github.com/eauger/linux/tree/v4.9-rc5-reserved-rfc-v3
> 
> History:
> RFC v2 -> v3:
> - switch to an iommu-group sysfs API
> - use new dummy allocator provided by Robin
> - dummy allocator initialized by vfio-iommu-type1 after enumerating
>   the reserved regions
> - at the moment ARM MSI base address/size is left unchanged compared
>   to v2
> - we currently report reserved regions and not usable IOVA regions as
>   requested by Alex
> 
> RFC v1 -> v2:
> - fix intel_add_reserved_regions
> - add mutex lock/unlock in vfio_iommu_type1
> 
> 
> Eric Auger (10):
>   iommu/dma: Allow MSI-only cookies
>   iommu: Rename iommu_dm_regions into iommu_resv_regions
>   iommu: Add new reserved IOMMU attributes
>   iommu: iommu_alloc_resv_region
>   iommu: Do not map reserved regions
>   iommu: iommu_get_group_resv_regions
>   iommu: Implement reserved_regions iommu-group sysfs file
>   iommu/vt-d: Implement reserved region get/put callbacks
>   iommu/arm-smmu: Implement reserved region get/put callbacks
>   vfio/type1: Get MSI cookie
> 
>  drivers/iommu/amd_iommu.c   |  20 +++---
>  drivers/iommu/arm-smmu.c|  52 +++
>  drivers/iommu/dma-iommu.c   | 116 ++---
>  drivers/iommu/intel-iommu.c |  50 ++
>  drivers/iommu/iommu.c   | 141 
> 
>  drivers/vfio/vfio_iommu_type1.c |  26 
>  include/linux/dma-iommu.h   |   7 ++
>  include/linux/iommu.h   |  49 ++
>  8 files changed, 391 insertions(+), 70 deletions(-)
> 


Re: [RFC v3 04/10] iommu: iommu_alloc_resv_region

2016-11-30 Thread Auger Eric
Hi Joerg,

On 29/11/2016 17:11, Joerg Roedel wrote:
> On Tue, Nov 15, 2016 at 01:09:17PM +, Eric Auger wrote:
>> +static inline struct iommu_resv_region *
>> +iommu_alloc_resv_region(phys_addr_t start, size_t length, unsigned int prot)
>> +{
>> +return NULL;
>> +}
>> +
> 
> Will this function be called outside of iommu code?

No, the function is not meant to be called outside of the iommu code. I
will remove this.

Thanks

Eric
> 
> 
> 
>   Joerg
> 
> 


Re: [RFC v3 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions

2016-11-30 Thread Auger Eric
Hi Ganapat,

On 30/11/2016 11:04, Ganapatrao Kulkarni wrote:
> Hi Eric,
> 
> in your repo "https://github.com/eauger/linux/tree/v4.9-rc5-reserved-rfc-v3"
> there is an 11th patch "pci: Enable overrides for missing ACS capabilities".
> Is this patch part of some other series?

Actually this is a very old patch from Alex aimed at working around lack
of PCIe ACS support: https://lkml.org/lkml/2013/5/30/513

Thanks

Eric
> 
> thanks
> Ganapat
> 
> On Wed, Nov 30, 2016 at 3:19 PM, Auger Eric <eric.au...@redhat.com> wrote:
>> Hi,
>>
>> On 15/11/2016 14:09, Eric Auger wrote:
>>> Following LPC discussions, we now report reserved regions through
>>> iommu-group sysfs reserved_regions attribute file.
>>>
>>> Reserved regions are populated through the IOMMU get_resv_region callback
>>> (former get_dm_regions), now implemented by amd-iommu, intel-iommu and
>>> arm-smmu.
>>>
>>> The intel-iommu reports the [FEE0_0000h - FEF0_0000h] MSI window as an
>>> IOMMU_RESV_NOMAP reserved region.
>>>
>>> arm-smmu reports the MSI window (arbitrarily located at 0x8000000 and
>>> 1MB large) and the PCI host bridge windows.
>>>
>>> The series integrates a not officially posted patch from Robin:
>>> "iommu/dma: Allow MSI-only cookies".
>>>
>>> This series currently does not address IRQ safety assessment.
>>
>> I will respin this series taking into account Joerg's comment. Does
>> anyone have additional comments or want to put forward some conceptual
>> issues with the current direction and with this implementation?
>>
>> As for the IRQ safety assessment, in a first step I would propose to
>> remove the IOMMU_CAP_INTR_REMAP from arm-smmus and consider the
>> assignment as unsafe. Any objection?
>>
>> Thanks
>>
>> Eric
>>
>>
>>> Best Regards
>>>
>>> Eric
>>>
>>> Git: complete series available at
>>> https://github.com/eauger/linux/tree/v4.9-rc5-reserved-rfc-v3
>>>
>>> History:
>>> RFC v2 -> v3:
>>> - switch to an iommu-group sysfs API
>>> - use new dummy allocator provided by Robin
>>> - dummy allocator initialized by vfio-iommu-type1 after enumerating
>>>   the reserved regions
>>> - at the moment ARM MSI base address/size is left unchanged compared
>>>   to v2
>>> - we currently report reserved regions and not usable IOVA regions as
>>>   requested by Alex
>>>
>>> RFC v1 -> v2:
>>> - fix intel_add_reserved_regions
>>> - add mutex lock/unlock in vfio_iommu_type1
>>>
>>>
>>> Eric Auger (10):
>>>   iommu/dma: Allow MSI-only cookies
>>>   iommu: Rename iommu_dm_regions into iommu_resv_regions
>>>   iommu: Add new reserved IOMMU attributes
>>>   iommu: iommu_alloc_resv_region
>>>   iommu: Do not map reserved regions
>>>   iommu: iommu_get_group_resv_regions
>>>   iommu: Implement reserved_regions iommu-group sysfs file
>>>   iommu/vt-d: Implement reserved region get/put callbacks
>>>   iommu/arm-smmu: Implement reserved region get/put callbacks
>>>   vfio/type1: Get MSI cookie
>>>
>>>  drivers/iommu/amd_iommu.c   |  20 +++---
>>>  drivers/iommu/arm-smmu.c|  52 +++
>>>  drivers/iommu/dma-iommu.c   | 116 ++---
>>>  drivers/iommu/intel-iommu.c |  50 ++
>>>  drivers/iommu/iommu.c   | 141 
>>> 
>>>  drivers/vfio/vfio_iommu_type1.c |  26 
>>>  include/linux/dma-iommu.h   |   7 ++
>>>  include/linux/iommu.h   |  49 ++
>>>  8 files changed, 391 insertions(+), 70 deletions(-)
>>>
>>


Re: [RFC v3 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions

2016-11-30 Thread Auger Eric
Hi Will,

On 30/11/2016 11:37, Will Deacon wrote:
> On Wed, Nov 30, 2016 at 10:49:33AM +0100, Auger Eric wrote:
>> On 15/11/2016 14:09, Eric Auger wrote:
>>> Following LPC discussions, we now report reserved regions through
>>> iommu-group sysfs reserved_regions attribute file.
>>>
>>> Reserved regions are populated through the IOMMU get_resv_region callback
>>> (former get_dm_regions), now implemented by amd-iommu, intel-iommu and
>>> arm-smmu.
>>>
>>> The intel-iommu reports the [FEE0_0000h - FEF0_0000h] MSI window as an
>>> IOMMU_RESV_NOMAP reserved region.
>>>
>>> arm-smmu reports the MSI window (arbitrarily located at 0x8000000 and
>>> 1MB large) and the PCI host bridge windows.
>>>
>>> The series integrates a not officially posted patch from Robin:
>>> "iommu/dma: Allow MSI-only cookies".
>>>
>>> This series currently does not address IRQ safety assessment.
>>
>> I will respin this series taking into account Joerg's comment. Does
>> anyone have additional comments or want to put forward some conceptual
>> issues with the current direction and with this implementation?
>>
>> As for the IRQ safety assessment, in a first step I would propose to
>> remove the IOMMU_CAP_INTR_REMAP from arm-smmus and consider the
>> assignment as unsafe. Any objection?
> 
> Well, yeah, because it's perfectly safe with GICv3.

Well, except if you have an MSI controller in between the device and the
SMMU (typically embedded in the host bridge). Detecting this situation
is not straightforward; hence my proposal.

Thanks

Eric
> 
> Will
> 


Re: [PATCH v6 17/18] vfio/type1: Check MSI remapping at irq domain level

2017-01-06 Thread Auger Eric
Hi Bharat

On 06/01/2017 09:50, Bharat Bhushan wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Eric Auger [mailto:eric.au...@redhat.com]
>> Sent: Friday, January 06, 2017 12:35 AM
>> To: eric.au...@redhat.com; eric.auger@gmail.com;
>> christoffer.d...@linaro.org; marc.zyng...@arm.com;
>> robin.mur...@arm.com; alex.william...@redhat.com;
>> will.dea...@arm.com; j...@8bytes.org; t...@linutronix.de;
>> ja...@lakedaemon.net; linux-arm-ker...@lists.infradead.org
>> Cc: k...@vger.kernel.org; drjo...@redhat.com; linux-
>> ker...@vger.kernel.org; pranav.sawargaon...@gmail.com;
>> iommu@lists.linux-foundation.org; punit.agra...@arm.com; Diana Madalina
>> Craciun ; gpkulka...@gmail.com;
>> shank...@codeaurora.org; Bharat Bhushan ;
>> geethasowjanya.ak...@gmail.com
>> Subject: [PATCH v6 17/18] vfio/type1: Check MSI remapping at irq domain
>> level
>>
>> In case the IOMMU translates MSI transactions (typical case on ARM), we
>> check MSI remapping capability at IRQ domain level. Otherwise it is checked
>> at IOMMU level.
>>
>> At this stage the arm-smmu(-v3) drivers still advertise the
>> IOMMU_CAP_INTR_REMAP capability at IOMMU level. This will be removed
>> in subsequent patches.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v6: rewrite test
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 9 ++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index b473ef80..fa0b5c4 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -40,6 +40,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include <linux/irqdomain.h>
>>
>>  #define DRIVER_VERSION  "0.2"
>>  #define DRIVER_AUTHOR   "Alex Williamson
>> "
>> @@ -1208,7 +1209,7 @@ static int vfio_iommu_type1_attach_group(void
>> *iommu_data,
>>  struct vfio_domain *domain, *d;
>>  struct bus_type *bus = NULL, *mdev_bus;
>>  int ret;
>> -bool resv_msi;
>> +bool resv_msi, msi_remap;
>>  phys_addr_t resv_msi_base;
>>
>>  mutex_lock(&iommu->lock);
>> @@ -1284,8 +1285,10 @@ static int vfio_iommu_type1_attach_group(void
>> *iommu_data,
>>  INIT_LIST_HEAD(&domain->group_list);
>>  list_add(&group->next, &domain->group_list);
>>
>> -if (!allow_unsafe_interrupts &&
>> -!iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>> +msi_remap = resv_msi ? irq_domain_check_msi_remap() :
> 
> There can be multiple interrupt controllers; at least theoretically it is
> possible (though I am not sure it exists and is supported in practice) that
> not all of them support IRQ remapping. If that is the case, should we then
> check IRQ remapping on the irq domain of that device's bus?
> 
I mentioned in the cover letter that the approach was defensive and
rough today. As soon as we detect an MSI controller in the platform that
has no support for MSI remapping we flag the assignment as unsafe. I
think this approach was agreed on the ML. Such rough assessment was used
in the past on x86.

I am reluctant to add more complexity at this stage. This can be
improved later, I think, when such platforms show up.

Best Regards

Eric
> Thanks
> -Bharat
> 
>> +iommu_capable(bus, IOMMU_CAP_INTR_REMAP);
>> +
>> +if (!allow_unsafe_interrupts && !msi_remap) {
>>  pr_warn("%s: No interrupt remapping support.  Use the
>> module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support
>> on this platform\n",
>> __func__);
>>  ret = -EPERM;
>> --
>> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


Re: [PATCH v6 07/18] iommu: Implement reserved_regions iommu-group sysfs file

2017-01-06 Thread Auger Eric
Hi Joerg,

On 06/01/2017 12:00, Joerg Roedel wrote:
> On Thu, Jan 05, 2017 at 07:04:35PM +, Eric Auger wrote:
>> +	list_for_each_entry_safe(region, next, &group_resv_regions, list) {
>> +		str += sprintf(str, "0x%016llx 0x%016llx\n",
>> +			       (long long int)region->start,
>> +			       (long long int)(region->start +
>> +					       region->length - 1));
>> +		kfree(region);
>> +	}
> 
> I think it also makes sense to report the type of the reserved region.

What is the best practice in that case? Shall we put the type enum
values as strings such as:
- direct
- nomap
- msi

and document that in Documentation/ABI/testing/sysfs-kernel-iommu_groups

Thanks

Eric
> 
> 
> 
>   Joerg
> 


Re: [PATCH v6 08/18] iommu/vt-d: Implement reserved region get/put callbacks

2017-01-06 Thread Auger Eric
Hi Joerg,

On 06/01/2017 12:01, Joerg Roedel wrote:
> On Thu, Jan 05, 2017 at 07:04:36PM +, Eric Auger wrote:
>> +static void intel_iommu_get_resv_regions(struct device *device,
>> +					 struct list_head *head)
>> +{
>> +	struct iommu_resv_region *reg;
>> +
>> +	reg = iommu_alloc_resv_region(IOAPIC_RANGE_START,
>> +				      IOAPIC_RANGE_END - IOAPIC_RANGE_START + 1,
>> +				      0, IOMMU_RESV_NOMAP);
>> +	if (!reg)
>> +		return;
>> +	list_add_tail(&reg->list, head);
>> +}
> 
> That is different from what AMD does, can you also report the RMRR
> regions for the device here (as direct-map regions)?

if I return RMRR regions as direct mapped regions,
iommu_group_create_direct_mappings will perform the 1-1 mapping.

I am not familiar with the intel-iommu code but I guess this job
currently is done in the intel driver:
iommu_prepare_rmrr_dev -> iommu_prepare_identity_map
-> domain_prepare_identity_map -> iommu_domain_identity_map?

What is your feeling?

Thanks

Eric
> 
> 
> 
>   Joerg
> 


Re: [PATCH v6 01/18] iommu/dma: Allow MSI-only cookies

2017-01-06 Thread Auger Eric


On 06/01/2017 11:59, Joerg Roedel wrote:
> On Thu, Jan 05, 2017 at 07:04:29PM +, Eric Auger wrote:
>>  struct iommu_dma_cookie {
>> -	struct iova_domain	iovad;
>> -	struct list_head	msi_page_list;
>> -	spinlock_t		msi_lock;
>> +	union {
>> +		struct iova_domain	iovad;
>> +		dma_addr_t		msi_iova;
>> +	};
>> +	struct list_head	msi_page_list;
>> +	spinlock_t		msi_lock;
>> +	enum iommu_dma_cookie_type	type;
> 
> Please move the type to the beginning of the struct and add a comment
> how the type relates to the union.

Sure

Thank you for the review.

Best regards

Eric
> 
> 
> 
>   Joerg
> 


Re: [PATCH v6 08/18] iommu/vt-d: Implement reserved region get/put callbacks

2017-01-06 Thread Auger Eric
Hi Joerg,

On 06/01/2017 13:46, Joerg Roedel wrote:
> On Fri, Jan 06, 2017 at 12:45:54PM +0100, Auger Eric wrote:
>> On 06/01/2017 12:01, Joerg Roedel wrote:
>>> On Thu, Jan 05, 2017 at 07:04:36PM +, Eric Auger wrote:
> 
>>> That is different from what AMD does, can you also report the RMRR
>>> regions for the device here (as direct-map regions)?
>>
>> if I return RMRR regions as direct mapped regions,
>> iommu_group_create_direct_mappings will perform the 1-1 mapping.
> 
> No, this will not happen until the Intel IOMMU driver returns valid
> IOMMU_DOMAIN_DMA type domains.
Hum OK thanks!

Best Regards

Eric

> 
>> I am not familiar with the intel-iommu code but I guess this job
>> currently is done in the intel driver:
>> iommu_prepare_rmrr_dev -> iommu_prepare_identity_map
>> -> domain_prepare_identity_map -> iommu_domain_identity_map?
> 
> Right, this is done in the Intel driver atm.
> 
> 
> 
>   Joerg
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 07/18] iommu: Implement reserved_regions iommu-group sysfs file

2017-01-06 Thread Auger Eric
Hi,

On 06/01/2017 13:48, Joerg Roedel wrote:
> On Fri, Jan 06, 2017 at 12:46:05PM +0100, Auger Eric wrote:
>> On 06/01/2017 12:00, Joerg Roedel wrote:
> 
>>> I think it also makes sense to report the type of the reserved region.
>>
>> What is the best practice in that case? Shall we put the type enum
>> values as strings such as:
>> - direct
>> - nomap
>> - msi
>>
>> and document that in Documentation/ABI/testing/sysfs-kernel-iommu_groups
> 
> Yes, a string would be good. And probably 'reserved' is a better name
> than nomap?

OK, either name works for me.

Thanks

Eric
> 
> 
>   Joerg
> 


Re: [PATCH v6 01/18] iommu/dma: Allow MSI-only cookies

2017-01-06 Thread Auger Eric
Hi Robin,
On 06/01/2017 13:12, Robin Murphy wrote:
> On 06/01/17 11:46, Auger Eric wrote:
>>
>>
>> On 06/01/2017 11:59, Joerg Roedel wrote:
>>> On Thu, Jan 05, 2017 at 07:04:29PM +, Eric Auger wrote:
>>>>  struct iommu_dma_cookie {
>>>> -	struct iova_domain	iovad;
>>>> -	struct list_head	msi_page_list;
>>>> -	spinlock_t		msi_lock;
>>>> +	union {
>>>> +		struct iova_domain	iovad;
>>>> +		dma_addr_t		msi_iova;
>>>> +	};
>>>> +	struct list_head	msi_page_list;
>>>> +	spinlock_t		msi_lock;
>>>> +	enum iommu_dma_cookie_type	type;
>>>
>>> Please move the type to the beginning of the struct and add a comment
>>> how the type relates to the union.
>>
>> Sure
>>
>> Thank you for the review.
> 
> FWIW I already had a cleaned up version of this patch, I just hadn't
> mentioned it. I've pushed out an update with that change added too[1].
> 
> Robin.
> 
> [1]:http://linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/misc

Great, thanks!

Eric
> 
>>
>> Best regards
>>
>> Eric
>>>
>>>
>>>
>>> Joerg
>>>
> 


Re: [PATCH v6 07/18] iommu: Implement reserved_regions iommu-group sysfs file

2017-01-06 Thread Auger Eric
Hi Joerg, Robin,

On 06/01/2017 13:48, Joerg Roedel wrote:
> On Fri, Jan 06, 2017 at 12:46:05PM +0100, Auger Eric wrote:
>> On 06/01/2017 12:00, Joerg Roedel wrote:
> 
>>> I think it also makes sense to report the type of the reserved region.
>>
>> What is the best practice in that case? Shall we put the type enum
>> values as strings such as:
>> - direct
>> - nomap
>> - msi
>>
>> and document that in Documentation/ABI/testing/sysfs-kernel-iommu_groups
> 
> Yes, a string would be good. And probably 'reserved' is a better name
> than nomap?
the iommu_insert_resv_region() function that builds the group reserved
region list sorts all regions and handles the case where there is an
overlap between regions. Current code does not care about the type of
regions. So in case a NOMAP region overlaps with a direct-mapped region,
what is reported to the user space is the superset and the type depends
on the overlap. This was suggested by Robin at some point to handle
overlaps.

I guess I should merge regions only when the types are equal?

I remember that Alex thought that user-space should not care so much
about the type of the regions so I thought it was better for the
user-space to have a minimal view of the regions.

On the other hand, this issue of merging regions of different types
should not happen often but I prefer to highlight the potential issue.

What is your guidance?

Thanks

Eric
> 
> 
>   Joerg
> 


Re: [PATCH v5 13/17] irqdomain: irq_domain_check_msi_remap

2017-01-04 Thread Auger Eric
Hi Marc,

On 04/01/2017 14:46, Marc Zyngier wrote:
> Hi Eric,
> 
> On 04/01/17 13:32, Eric Auger wrote:
>> This new function checks whether all platform and PCI
>> MSI domains implement IRQ remapping. This is useful to
>> understand whether VFIO passthrough is safe with respect
>> to interrupts.
>>
>> On ARM an MSI controller can typically sit downstream of
>> the IOMMU without preventing VFIO passthrough.
>> As such any assigned device can write into the MSI doorbell.
>> In case the MSI controller implements IRQ remapping, assigned
>> devices will not be able to trigger interrupts towards the
>> host. Otherwise, the assignment must be flagged as
>> unsafe with respect to interrupts.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v4 -> v5:
>> - Handle DOMAIN_BUS_FSL_MC_MSI domains
>> - Check parents
>> ---
>>  include/linux/irqdomain.h |  1 +
>>  kernel/irq/irqdomain.c| 41 +
>>  2 files changed, 42 insertions(+)
>>
>> diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
>> index ab017b2..281a40f 100644
>> --- a/include/linux/irqdomain.h
>> +++ b/include/linux/irqdomain.h
>> @@ -219,6 +219,7 @@ struct irq_domain *irq_domain_add_legacy(struct 
>> device_node *of_node,
>>   void *host_data);
>>  extern struct irq_domain *irq_find_matching_fwspec(struct irq_fwspec 
>> *fwspec,
>> enum irq_domain_bus_token 
>> bus_token);
>> +extern bool irq_domain_check_msi_remap(void);
>>  extern void irq_set_default_host(struct irq_domain *host);
>>  extern int irq_domain_alloc_descs(int virq, unsigned int nr_irqs,
>>irq_hw_number_t hwirq, int node,
>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>> index 8c0a0ae..700caea 100644
>> --- a/kernel/irq/irqdomain.c
>> +++ b/kernel/irq/irqdomain.c
>> @@ -278,6 +278,47 @@ struct irq_domain *irq_find_matching_fwspec(struct 
>> irq_fwspec *fwspec,
>>  EXPORT_SYMBOL_GPL(irq_find_matching_fwspec);
>>  
>>  /**
>> + * irq_domain_is_msi_remap - Check if @domain or any parent
>> + * has MSI remapping support
>> + * @domain: domain pointer
>> + */
>> +static bool irq_domain_is_msi_remap(struct irq_domain *domain)
>> +{
>> +struct irq_domain *h = domain;
>> +
>> +for (; h; h = h->parent) {
>> +if (h->flags & IRQ_DOMAIN_FLAG_MSI_REMAP)
>> +return true;
>> +}
>> +return false;
>> +}
>> +
>> +/**
>> + * irq_domain_check_msi_remap() - Checks whether all MSI
>> + * irq domains implement IRQ remapping
>> + */
>> +bool irq_domain_check_msi_remap(void)
>> +{
>> +struct irq_domain *h;
>> +bool ret = true;
>> +
>> +	mutex_lock(&irq_domain_mutex);
>> +	list_for_each_entry(h, &irq_domain_list, link) {
>> +if (((h->bus_token & DOMAIN_BUS_PCI_MSI) ||
>> + (h->bus_token & DOMAIN_BUS_PLATFORM_MSI) ||
>> + (h->bus_token & DOMAIN_BUS_FSL_MC_MSI)) &&
>> + !irq_domain_is_msi_remap(h)) {
> 
> (h->bus_token & DOMAIN_BUS_PCI_MSI) and co looks quite wrong. bus_token
> is not a bitmap, and DOMAIN_BUS_* not a single bit value (see enum
> irq_domain_bus_token). Surely this should read
> (h->bus_token == DOMAIN_BUS_PCI_MSI).
Oh I did not notice that. Thanks.

Any other comments on the irqdomain side? Do you think the current
approach consisting in looking at those bus tokens and their parents
looks good?

Thanks

Eric
> 
> Thanks,
> 
>   M.
> 


Re: [PATCH v5 13/17] irqdomain: irq_domain_check_msi_remap

2017-01-04 Thread Auger Eric
Hi Marc,

On 04/01/2017 16:27, Marc Zyngier wrote:
> On 04/01/17 14:11, Auger Eric wrote:
>> Hi Marc,
>>
>> On 04/01/2017 14:46, Marc Zyngier wrote:
>>> Hi Eric,
>>>
>>> On 04/01/17 13:32, Eric Auger wrote:
>>>> This new function checks whether all platform and PCI
>>>> MSI domains implement IRQ remapping. This is useful to
>>>> understand whether VFIO passthrough is safe with respect
>>>> to interrupts.
>>>>
>>>> On ARM an MSI controller can typically sit downstream of
>>>> the IOMMU without preventing VFIO passthrough.
>>>> As such any assigned device can write into the MSI doorbell.
>>>> In case the MSI controller implements IRQ remapping, assigned
>>>> devices will not be able to trigger interrupts towards the
>>>> host. Otherwise, the assignment must be flagged as
>>>> unsafe with respect to interrupts.
>>>>
>>>> Signed-off-by: Eric Auger <eric.au...@redhat.com>
>>>>
>>>> ---
>>>>
>>>> v4 -> v5:
>>>> - Handle DOMAIN_BUS_FSL_MC_MSI domains
>>>> - Check parents
>>>> ---
>>>>  include/linux/irqdomain.h |  1 +
>>>>  kernel/irq/irqdomain.c| 41 +
>>>>  2 files changed, 42 insertions(+)
>>>>
>>>> diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
>>>> index ab017b2..281a40f 100644
>>>> --- a/include/linux/irqdomain.h
>>>> +++ b/include/linux/irqdomain.h
>>>> @@ -219,6 +219,7 @@ struct irq_domain *irq_domain_add_legacy(struct 
>>>> device_node *of_node,
>>>> void *host_data);
>>>>  extern struct irq_domain *irq_find_matching_fwspec(struct irq_fwspec 
>>>> *fwspec,
>>>>   enum irq_domain_bus_token 
>>>> bus_token);
>>>> +extern bool irq_domain_check_msi_remap(void);
>>>>  extern void irq_set_default_host(struct irq_domain *host);
>>>>  extern int irq_domain_alloc_descs(int virq, unsigned int nr_irqs,
>>>>  irq_hw_number_t hwirq, int node,
>>>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>>>> index 8c0a0ae..700caea 100644
>>>> --- a/kernel/irq/irqdomain.c
>>>> +++ b/kernel/irq/irqdomain.c
>>>> @@ -278,6 +278,47 @@ struct irq_domain *irq_find_matching_fwspec(struct 
>>>> irq_fwspec *fwspec,
>>>>  EXPORT_SYMBOL_GPL(irq_find_matching_fwspec);
>>>>  
>>>>  /**
>>>> + * irq_domain_is_msi_remap - Check if @domain or any parent
>>>> + * has MSI remapping support
>>>> + * @domain: domain pointer
>>>> + */
>>>> +static bool irq_domain_is_msi_remap(struct irq_domain *domain)
>>>> +{
>>>> +  struct irq_domain *h = domain;
>>>> +
>>>> +  for (; h; h = h->parent) {
>>>> +  if (h->flags & IRQ_DOMAIN_FLAG_MSI_REMAP)
>>>> +  return true;
>>>> +  }
>>>> +  return false;
>>>> +}
>>>> +
>>>> +/**
>>>> + * irq_domain_check_msi_remap() - Checks whether all MSI
>>>> + * irq domains implement IRQ remapping
>>>> + */
>>>> +bool irq_domain_check_msi_remap(void)
>>>> +{
>>>> +  struct irq_domain *h;
>>>> +  bool ret = true;
>>>> +
>>>> +  mutex_lock(&irq_domain_mutex);
>>>> +  list_for_each_entry(h, &irq_domain_list, link) {
>>>> +  if (((h->bus_token & DOMAIN_BUS_PCI_MSI) ||
>>>> +   (h->bus_token & DOMAIN_BUS_PLATFORM_MSI) ||
>>>> +   (h->bus_token & DOMAIN_BUS_FSL_MC_MSI)) &&
>>>> +   !irq_domain_is_msi_remap(h)) {
>>>
>>> (h->bus_token & DOMAIN_BUS_PCI_MSI) and co looks quite wrong. bus_token
>>> is not a bitmap, and DOMAIN_BUS_* not a single bit value (see enum
>>> irq_domain_bus_token). Surely this should read
>>> (h->bus_token == DOMAIN_BUS_PCI_MSI).
>> Oh I did not notice that. Thanks.
>>
>> Any other comments on the irqdomain side? Do you think the current
>> approach consisting in looking at those bus tokens and their parents
>> looks good?
> 
> To be completely honest, I don't like it much, as having to enumerate
> all the bus types can come u

Re: [PATCH v5 13/17] irqdomain: irq_domain_check_msi_remap

2017-01-05 Thread Auger Eric
Hi Marc,

On 05/01/2017 12:57, Marc Zyngier wrote:
> On 05/01/17 11:29, Auger Eric wrote:
>> Hi Marc,
>>
>> On 05/01/2017 12:25, Marc Zyngier wrote:
>>> On 05/01/17 10:45, Auger Eric wrote:
>>>> Hi Marc,
>>>>
>>>> On 04/01/2017 16:27, Marc Zyngier wrote:
>>>>> On 04/01/17 14:11, Auger Eric wrote:
>>>>>> Hi Marc,
>>>>>>
>>>>>> On 04/01/2017 14:46, Marc Zyngier wrote:
>>>>>>> Hi Eric,
>>>>>>>
>>>>>>> On 04/01/17 13:32, Eric Auger wrote:
>>>>>>>> This new function checks whether all platform and PCI
>>>>>>>> MSI domains implement IRQ remapping. This is useful to
>>>>>>>> understand whether VFIO passthrough is safe with respect
>>>>>>>> to interrupts.
>>>>>>>>
>>>>>>>> On ARM an MSI controller can typically sit downstream of
>>>>>>>> the IOMMU without preventing VFIO passthrough.
>>>>>>>> As such any assigned device can write into the MSI doorbell.
>>>>>>>> In case the MSI controller implements IRQ remapping, assigned
>>>>>>>> devices will not be able to trigger interrupts towards the
>>>>>>>> host. Otherwise, the assignment must be flagged as
>>>>>>>> unsafe with respect to interrupts.
>>>>>>>>
>>>>>>>> Signed-off-by: Eric Auger <eric.au...@redhat.com>
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> v4 -> v5:
>>>>>>>> - Handle DOMAIN_BUS_FSL_MC_MSI domains
>>>>>>>> - Check parents
>>>>>>>> ---
>>>>>>>>  include/linux/irqdomain.h |  1 +
>>>>>>>>  kernel/irq/irqdomain.c| 41 
>>>>>>>> +
>>>>>>>>  2 files changed, 42 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
>>>>>>>> index ab017b2..281a40f 100644
>>>>>>>> --- a/include/linux/irqdomain.h
>>>>>>>> +++ b/include/linux/irqdomain.h
>>>>>>>> @@ -219,6 +219,7 @@ struct irq_domain *irq_domain_add_legacy(struct 
>>>>>>>> device_node *of_node,
>>>>>>>> void *host_data);
>>>>>>>>  extern struct irq_domain *irq_find_matching_fwspec(struct irq_fwspec 
>>>>>>>> *fwspec,
>>>>>>>>   enum 
>>>>>>>> irq_domain_bus_token bus_token);
>>>>>>>> +extern bool irq_domain_check_msi_remap(void);
>>>>>>>>  extern void irq_set_default_host(struct irq_domain *host);
>>>>>>>>  extern int irq_domain_alloc_descs(int virq, unsigned int nr_irqs,
>>>>>>>>  irq_hw_number_t hwirq, int node,
>>>>>>>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>>>>>>>> index 8c0a0ae..700caea 100644
>>>>>>>> --- a/kernel/irq/irqdomain.c
>>>>>>>> +++ b/kernel/irq/irqdomain.c
>>>>>>>> @@ -278,6 +278,47 @@ struct irq_domain 
>>>>>>>> *irq_find_matching_fwspec(struct irq_fwspec *fwspec,
>>>>>>>>  EXPORT_SYMBOL_GPL(irq_find_matching_fwspec);
>>>>>>>>  
>>>>>>>>  /**
>>>>>>>> + * irq_domain_is_msi_remap - Check if @domain or any parent
>>>>>>>> + * has MSI remapping support
>>>>>>>> + * @domain: domain pointer
>>>>>>>> + */
>>>>>>>> +static bool irq_domain_is_msi_remap(struct irq_domain *domain)
>>>>>>>> +{
>>>>>>>> +  struct irq_domain *h = domain;
>>>>>>>> +
>>>>>>>> +  for (; h; h = h->parent) {
>>>>>>>> +  if (h->flags & IRQ_DOMAIN_FLAG_MSI_REMAP)
>>>>>>>> +  return true;
>>>>>>>> +  }
>>>>>>>> +  return false;
>>>>>&g

Re: [PATCH v5 13/17] irqdomain: irq_domain_check_msi_remap

2017-01-05 Thread Auger Eric
Hi Marc,

On 05/01/2017 12:25, Marc Zyngier wrote:
> On 05/01/17 10:45, Auger Eric wrote:
>> Hi Marc,
>>
>> On 04/01/2017 16:27, Marc Zyngier wrote:
>>> On 04/01/17 14:11, Auger Eric wrote:
>>>> Hi Marc,
>>>>
>>>> On 04/01/2017 14:46, Marc Zyngier wrote:
>>>>> Hi Eric,
>>>>>
>>>>> On 04/01/17 13:32, Eric Auger wrote:
>>>>>> This new function checks whether all platform and PCI
>>>>>> MSI domains implement IRQ remapping. This is useful to
>>>>>> understand whether VFIO passthrough is safe with respect
>>>>>> to interrupts.
>>>>>>
>>>>>> On ARM an MSI controller can typically sit downstream of
>>>>>> the IOMMU without preventing VFIO passthrough.
>>>>>> As such any assigned device can write into the MSI doorbell.
>>>>>> In case the MSI controller implements IRQ remapping, assigned
>>>>>> devices will not be able to trigger interrupts towards the
>>>>>> host. Otherwise, the assignment must be flagged as
>>>>>> unsafe with respect to interrupts.
>>>>>>
>>>>>> Signed-off-by: Eric Auger <eric.au...@redhat.com>
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> v4 -> v5:
>>>>>> - Handle DOMAIN_BUS_FSL_MC_MSI domains
>>>>>> - Check parents
>>>>>> ---
>>>>>>  include/linux/irqdomain.h |  1 +
>>>>>>  kernel/irq/irqdomain.c| 41 +
>>>>>>  2 files changed, 42 insertions(+)
>>>>>>
>>>>>> diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
>>>>>> index ab017b2..281a40f 100644
>>>>>> --- a/include/linux/irqdomain.h
>>>>>> +++ b/include/linux/irqdomain.h
>>>>>> @@ -219,6 +219,7 @@ struct irq_domain *irq_domain_add_legacy(struct 
>>>>>> device_node *of_node,
>>>>>>   void *host_data);
>>>>>>  extern struct irq_domain *irq_find_matching_fwspec(struct irq_fwspec 
>>>>>> *fwspec,
>>>>>> enum 
>>>>>> irq_domain_bus_token bus_token);
>>>>>> +extern bool irq_domain_check_msi_remap(void);
>>>>>>  extern void irq_set_default_host(struct irq_domain *host);
>>>>>>  extern int irq_domain_alloc_descs(int virq, unsigned int nr_irqs,
>>>>>>irq_hw_number_t hwirq, int node,
>>>>>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>>>>>> index 8c0a0ae..700caea 100644
>>>>>> --- a/kernel/irq/irqdomain.c
>>>>>> +++ b/kernel/irq/irqdomain.c
>>>>>> @@ -278,6 +278,47 @@ struct irq_domain *irq_find_matching_fwspec(struct 
>>>>>> irq_fwspec *fwspec,
>>>>>>  EXPORT_SYMBOL_GPL(irq_find_matching_fwspec);
>>>>>>  
>>>>>>  /**
>>>>>> + * irq_domain_is_msi_remap - Check if @domain or any parent
>>>>>> + * has MSI remapping support
>>>>>> + * @domain: domain pointer
>>>>>> + */
>>>>>> +static bool irq_domain_is_msi_remap(struct irq_domain *domain)
>>>>>> +{
>>>>>> +struct irq_domain *h = domain;
>>>>>> +
>>>>>> +for (; h; h = h->parent) {
>>>>>> +if (h->flags & IRQ_DOMAIN_FLAG_MSI_REMAP)
>>>>>> +return true;
>>>>>> +}
>>>>>> +return false;
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * irq_domain_check_msi_remap() - Checks whether all MSI
>>>>>> + * irq domains implement IRQ remapping
>>>>>> + */
>>>>>> +bool irq_domain_check_msi_remap(void)
>>>>>> +{
>>>>>> +struct irq_domain *h;
>>>>>> +bool ret = true;
>>>>>> +
>>>>>> +mutex_lock(&irq_domain_mutex);
>>>>>> +list_for_each_entry(h, &irq_domain_list, link) {
>>>>>> +if (((h->bus_token & DOMAIN_BUS_PCI_MSI) ||
>>>>>

Re: [PATCH v5 13/17] irqdomain: irq_domain_check_msi_remap

2017-01-05 Thread Auger Eric
Hi Marc,

On 04/01/2017 16:27, Marc Zyngier wrote:
> On 04/01/17 14:11, Auger Eric wrote:
>> Hi Marc,
>>
>> On 04/01/2017 14:46, Marc Zyngier wrote:
>>> Hi Eric,
>>>
>>> On 04/01/17 13:32, Eric Auger wrote:
>>>> This new function checks whether all platform and PCI
>>>> MSI domains implement IRQ remapping. This is useful to
>>>> understand whether VFIO passthrough is safe with respect
>>>> to interrupts.
>>>>
>>>> On ARM an MSI controller can typically sit downstream of
>>>> the IOMMU without preventing VFIO passthrough.
>>>> As such any assigned device can write into the MSI doorbell.
>>>> In case the MSI controller implements IRQ remapping, assigned
>>>> devices will not be able to trigger interrupts towards the
>>>> host. Otherwise, the assignment must be flagged as
>>>> unsafe with respect to interrupts.
>>>>
>>>> Signed-off-by: Eric Auger <eric.au...@redhat.com>
>>>>
>>>> ---
>>>>
>>>> v4 -> v5:
>>>> - Handle DOMAIN_BUS_FSL_MC_MSI domains
>>>> - Check parents
>>>> ---
>>>>  include/linux/irqdomain.h |  1 +
>>>>  kernel/irq/irqdomain.c| 41 +
>>>>  2 files changed, 42 insertions(+)
>>>>
>>>> diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
>>>> index ab017b2..281a40f 100644
>>>> --- a/include/linux/irqdomain.h
>>>> +++ b/include/linux/irqdomain.h
>>>> @@ -219,6 +219,7 @@ struct irq_domain *irq_domain_add_legacy(struct 
>>>> device_node *of_node,
>>>> void *host_data);
>>>>  extern struct irq_domain *irq_find_matching_fwspec(struct irq_fwspec 
>>>> *fwspec,
>>>>   enum irq_domain_bus_token 
>>>> bus_token);
>>>> +extern bool irq_domain_check_msi_remap(void);
>>>>  extern void irq_set_default_host(struct irq_domain *host);
>>>>  extern int irq_domain_alloc_descs(int virq, unsigned int nr_irqs,
>>>>  irq_hw_number_t hwirq, int node,
>>>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>>>> index 8c0a0ae..700caea 100644
>>>> --- a/kernel/irq/irqdomain.c
>>>> +++ b/kernel/irq/irqdomain.c
>>>> @@ -278,6 +278,47 @@ struct irq_domain *irq_find_matching_fwspec(struct 
>>>> irq_fwspec *fwspec,
>>>>  EXPORT_SYMBOL_GPL(irq_find_matching_fwspec);
>>>>  
>>>>  /**
>>>> + * irq_domain_is_msi_remap - Check if @domain or any parent
>>>> + * has MSI remapping support
>>>> + * @domain: domain pointer
>>>> + */
>>>> +static bool irq_domain_is_msi_remap(struct irq_domain *domain)
>>>> +{
>>>> +  struct irq_domain *h = domain;
>>>> +
>>>> +  for (; h; h = h->parent) {
>>>> +  if (h->flags & IRQ_DOMAIN_FLAG_MSI_REMAP)
>>>> +  return true;
>>>> +  }
>>>> +  return false;
>>>> +}
>>>> +
>>>> +/**
>>>> + * irq_domain_check_msi_remap() - Checks whether all MSI
>>>> + * irq domains implement IRQ remapping
>>>> + */
>>>> +bool irq_domain_check_msi_remap(void)
>>>> +{
>>>> +  struct irq_domain *h;
>>>> +  bool ret = true;
>>>> +
>>>> +  mutex_lock(&irq_domain_mutex);
>>>> +  list_for_each_entry(h, &irq_domain_list, link) {
>>>> +  if (((h->bus_token & DOMAIN_BUS_PCI_MSI) ||
>>>> +   (h->bus_token & DOMAIN_BUS_PLATFORM_MSI) ||
>>>> +   (h->bus_token & DOMAIN_BUS_FSL_MC_MSI)) &&
>>>> +   !irq_domain_is_msi_remap(h)) {
>>>
>>> (h->bus_token & DOMAIN_BUS_PCI_MSI) and co looks quite wrong. bus_token
>>> is not a bitmap, and DOMAIN_BUS_* not a single bit value (see enum
>>> irq_domain_bus_token). Surely this should read
>>> (h->bus_token == DOMAIN_BUS_PCI_MSI).
>> Oh I did not notice that. Thanks.
>>
>> Any other comments on the irqdomain side? Do you think the current
>> approach consisting in looking at those bus tokens and their parents
>> looks good?
> 
> To be completely honest, I don't like it much, as having to enumerate
> all the bus types can come u

Re: [PATCH v6 07/18] iommu: Implement reserved_regions iommu-group sysfs file

2017-01-08 Thread Auger Eric
Hi,

On 06/01/2017 18:18, Auger Eric wrote:
> Hi Joerg, Robin,
> 
> On 06/01/2017 13:48, Joerg Roedel wrote:
>> On Fri, Jan 06, 2017 at 12:46:05PM +0100, Auger Eric wrote:
>>> On 06/01/2017 12:00, Joerg Roedel wrote:
>>
>>>> I think it also makes sense to report the type of the reserved region.
>>>
>>> What is the best practice in that case? Shall we put the type enum
>>> values as strings such as:
>>> - direct
>>> - nomap
>>> - msi
>>>
>>> and document that in Documentation/ABI/testing/sysfs-kernel-iommu_groups
>>
>> Yes, a string would be good. And probably 'reserved' is a better name
>> than nomap?
> the iommu_insert_resv_region() function that builds the group reserved
> region list sorts all regions and handles the case where there is an
> overlap between regions. Current code does not care about the type of
> regions. So in case a NOMAP region overlaps with a direct-mapped region,
> what is reported to the user space is the superset and the type depends
> on the overlap. This was suggested by Robin at some point to handle
> overlaps.
> 
> I guess I should merge regions only when the types are equal?
> 
> I remember that Alex thought that user-space should not care so much
> about the type of the regions so I thought it was better for the
> user-space to have a minimal view of the regions.
> 
> On the other hand, this issue of merging regions of different types
> should not happen often but I prefer to highlight the potential issue.

> 
> What is your guidance?

Please forget the question. From an API point of view it does not make
sense for iommu_insert_resv_region() to merge regions of different
types, since the type field then becomes unreliable. I will fix this.

Thanks

Eric
> 
> Thanks
> 
> Eric
>>
>>
>>  Joerg
>>


Re: [PATCH v5 13/17] irqdomain: irq_domain_check_msi_remap

2017-01-06 Thread Auger Eric
Hi Bharat,

On 06/01/2017 05:27, Bharat Bhushan wrote:
> Hi Mark,
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.au...@redhat.com]
>> Sent: Thursday, January 05, 2017 5:39 PM
>> To: Marc Zyngier <marc.zyng...@arm.com>; eric.auger@gmail.com;
>> christoffer.d...@linaro.org; robin.mur...@arm.com;
>> alex.william...@redhat.com; will.dea...@arm.com; j...@8bytes.org;
>> t...@linutronix.de; ja...@lakedaemon.net; linux-arm-
>> ker...@lists.infradead.org
>> Cc: drjo...@redhat.com; k...@vger.kernel.org; punit.agra...@arm.com;
>> linux-ker...@vger.kernel.org; geethasowjanya.ak...@gmail.com; Diana
>> Madalina Craciun <diana.crac...@nxp.com>; iommu@lists.linux-
>> foundation.org; pranav.sawargaon...@gmail.com; Bharat Bhushan
>> <bharat.bhus...@nxp.com>; shank...@codeaurora.org;
>> gpkulka...@gmail.com
>> Subject: Re: [PATCH v5 13/17] irqdomain: irq_domain_check_msi_remap
>>
>> Hi Marc,
>>
>> On 05/01/2017 12:57, Marc Zyngier wrote:
>>> On 05/01/17 11:29, Auger Eric wrote:
>>>> Hi Marc,
>>>>
>>>> On 05/01/2017 12:25, Marc Zyngier wrote:
>>>>> On 05/01/17 10:45, Auger Eric wrote:
>>>>>> Hi Marc,
>>>>>>
>>>>>> On 04/01/2017 16:27, Marc Zyngier wrote:
>>>>>>> On 04/01/17 14:11, Auger Eric wrote:
>>>>>>>> Hi Marc,
>>>>>>>>
>>>>>>>> On 04/01/2017 14:46, Marc Zyngier wrote:
>>>>>>>>> Hi Eric,
>>>>>>>>>
>>>>>>>>> On 04/01/17 13:32, Eric Auger wrote:
>>>>>>>>>> This new function checks whether all platform and PCI MSI
>>>>>>>>>> domains implement IRQ remapping. This is useful to understand
>>>>>>>>>> whether VFIO passthrough is safe with respect to interrupts.
>>>>>>>>>>
>>>>>>>>>> On ARM typically an MSI controller can sit downstream to the
>>>>>>>>>> IOMMU without preventing VFIO passthrough.
>>>>>>>>>> As such any assigned device can write into the MSI doorbell.
>>>>>>>>>> In case the MSI controller implements IRQ remapping, assigned
>>>>>>>>>> devices will not be able to trigger interrupts towards the
>>>>>>>>>> host. Otherwise, the assignment must be flagged as unsafe
>>>>>>>>>> with respect to interrupts.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Eric Auger <eric.au...@redhat.com>
>>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>> v4 -> v5:
>>>>>>>>>> - Handle DOMAIN_BUS_FSL_MC_MSI domains
>>>>>>>>>> - Check parents
>>>>>>>>>> ---
>>>>>>>>>>  include/linux/irqdomain.h |  1 +
>>>>>>>>>>  kernel/irq/irqdomain.c| 41
>> +
>>>>>>>>>>  2 files changed, 42 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/include/linux/irqdomain.h
>>>>>>>>>> b/include/linux/irqdomain.h index ab017b2..281a40f 100644
>>>>>>>>>> --- a/include/linux/irqdomain.h
>>>>>>>>>> +++ b/include/linux/irqdomain.h
>>>>>>>>>> @@ -219,6 +219,7 @@ struct irq_domain
>> *irq_domain_add_legacy(struct device_node *of_node,
>>>>>>>>>>   void *host_data);
>>>>>>>>>>  extern struct irq_domain *irq_find_matching_fwspec(struct
>> irq_fwspec *fwspec,
>>>>>>>>>> enum
>> irq_domain_bus_token bus_token);
>>>>>>>>>> +extern bool irq_domain_check_msi_remap(void);
>>>>>>>>>>  extern void irq_set_default_host(struct irq_domain *host);
>>>>>>>>>> extern int irq_domain_alloc_descs(int virq, unsigned int nr_irqs,
>>>>>>>>>>irq_hw_number_t hwirq, int node,
>> diff --git
>>>>>>>>>> a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index
>>>>>

Re: [RFC v4 15/16] vfio/type1: Check MSI remapping at irq domain level

2016-12-23 Thread Auger Eric
Hi Geetha,

On 23/12/2016 14:33, Geetha Akula wrote:
> Hi Eric,
> 
> Seeing same issue reported by Diana on ThunderX with you
> v4.9-reserved-v4 branch.
> Vfio passthough work fine when allow_unsafe_interrupts is set.
Thank you for testing! I will fix the security assessment by studying
flag propagation in the domain hierarchy more closely.

Best Regards

Eric

> 
> 
> Thank you,
> Geetha.
> 
> On Thu, Dec 22, 2016 at 6:32 PM, Auger Eric <eric.au...@redhat.com
> <mailto:eric.au...@redhat.com>> wrote:
> 
> Hi Diana,
> 
> On 22/12/2016 13:41, Diana Madalina Craciun wrote:
> > Hi Eric,
> >
> > On 12/13/2016 10:32 PM, Eric Auger wrote:
> >> In case the IOMMU does not bypass MSI transactions (the typical
> >> case on ARM), we check that all MSI controllers are IRQ remapping
> >> capable. If not, the IRQ assignment may be unsafe.
> >>
> >> At this stage the arm-smmu(-v3) drivers still advertise the
> >> IOMMU_CAP_INTR_REMAP capability at IOMMU level. This will be
> >> removed in subsequent patches.
> >>
> >> Signed-off-by: Eric Auger <eric.au...@redhat.com
> <mailto:eric.au...@redhat.com>>
> >> ---
> >>  drivers/vfio/vfio_iommu_type1.c | 9 ++---
> >>  1 file changed, 6 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/vfio/vfio_iommu_type1.c
> b/drivers/vfio/vfio_iommu_type1.c
> >> index d07fe73..a05648b 100644
> >> --- a/drivers/vfio/vfio_iommu_type1.c
> >> +++ b/drivers/vfio/vfio_iommu_type1.c
> >> @@ -37,6 +37,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >>
> >>  #define DRIVER_VERSION  "0.2"
> >>  #define DRIVER_AUTHOR   "Alex Williamson
> <alex.william...@redhat.com <mailto:alex.william...@redhat.com>>"
> >> @@ -765,7 +766,7 @@ static int vfio_iommu_type1_attach_group(void
> *iommu_data,
> >>  struct vfio_domain *domain, *d;
> >>  struct bus_type *bus = NULL;
> >>  int ret;
> >> -bool resv_msi;
> >> +bool resv_msi, msi_remap;
> >>  phys_addr_t resv_msi_base;
> >>
> >>  mutex_lock(>lock);
> >> @@ -818,8 +819,10 @@ static int
> vfio_iommu_type1_attach_group(void *iommu_data,
> >>  INIT_LIST_HEAD(>group_list);
> >>  list_add(>next, >group_list);
> >>
> >> -if (!allow_unsafe_interrupts &&
> >> -!iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
> >> +msi_remap = resv_msi ? irq_domain_check_msi_remap() :
> >> +   iommu_capable(bus, IOMMU_CAP_INTR_REMAP);
> >> +
> >> +if (!allow_unsafe_interrupts && !msi_remap) {
> >>  pr_warn("%s: No interrupt remapping support.  Use
> the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
> support on this platform\n",
> >> __func__);
> >>  ret = -EPERM;
> >
> > I tested your v4.9-reserved-v4 branch on a ITS capable hardware (NXP
> > LS2080), so I did not set allow_unsafe_interrupts. It fails here
> > complaining that there is no interrupt remapping support. The
> > irq_domain_check_msi_remap function returns false as none of the
> checked
> > domains has the IRQ_DOMAIN_FLAG_MSI_REMAP flag set. I think the reason
> > is that the flags are not propagated through the domain hierarchy when
> > the domain is created.
> 
> Hum OK. Please accept my apologies for the inconvenience, all the more
> so as this is the second time you have reported the same issue, for a
> different cause :-( At the moment I can't test on a GICv3 ITS based
> system. I will try to fix that though.
> 
> I would like to get confirmation that introducing this flag is the
> right direction, though.
> 
> Thanks
> 
> Eric
> >
> > Thanks,
> >
> > Diana
> >
> >
> >
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> <mailto:linux-arm-ker...@lists.infradead.org>
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> <http://lists.infradead.org/mailman/listinfo/linux-arm-kernel>
> 
> 


Re: [RFC v3 09/10] iommu/arm-smmu: Implement reserved region get/put callbacks

2016-12-07 Thread Auger Eric
Hi Robin,
On 06/12/2016 19:55, Robin Murphy wrote:
> On 15/11/16 13:09, Eric Auger wrote:
>> The get() populates the list with the PCI host bridge windows
>> and the MSI IOVA range.
>>
>> At the moment an arbitrary MSI IOVA window is set at 0x800
>> of size 1MB. This will allow this info to be reported in iommu-group
>> sysfs?


First, thank you for reviewing the series. This is definitely helpful!
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> RFC v2 -> v3:
>> - use existing get/put_resv_regions
>>
>> RFC v1 -> v2:
>> - use defines for MSI IOVA base and length
>> ---
>>  drivers/iommu/arm-smmu.c | 52 
>> 
>>  1 file changed, 52 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 8f72814..81f1a83 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -278,6 +278,9 @@ enum arm_smmu_s2cr_privcfg {
>>  
>>  #define FSYNR0_WNR  (1 << 4)
>>  
>> +#define MSI_IOVA_BASE   0x800
>> +#define MSI_IOVA_LENGTH 0x10
>> +
>>  static int force_stage;
>>  module_param(force_stage, int, S_IRUGO);
>>  MODULE_PARM_DESC(force_stage,
>> @@ -1545,6 +1548,53 @@ static int arm_smmu_of_xlate(struct device *dev, 
>> struct of_phandle_args *args)
>>  return iommu_fwspec_add_ids(dev, , 1);
>>  }
>>  
>> +static void arm_smmu_get_resv_regions(struct device *dev,
>> +  struct list_head *head)
>> +{
>> +struct iommu_resv_region *region;
>> +struct pci_host_bridge *bridge;
>> +struct resource_entry *window;
>> +
>> +/* MSI region */
>> +region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
>> + IOMMU_RESV_MSI);
>> +if (!region)
>> +return;
>> +
>> +list_add_tail(>list, head);
>> +
>> +if (!dev_is_pci(dev))
>> +return;
>> +
>> +bridge = pci_find_host_bridge(to_pci_dev(dev)->bus);
>> +
>> +resource_list_for_each_entry(window, >windows) {
>> +phys_addr_t start;
>> +size_t length;
>> +
>> +if (resource_type(window->res) != IORESOURCE_MEM &&
>> +resource_type(window->res) != IORESOURCE_IO)
> 
> As Joerg commented elsewhere, considering anything other than memory
> resources isn't right (I appreciate you've merely copied my own mistake
> here). We need some other way to handle root complexes where the CPU
> MMIO views of PCI windows appear in PCI memory space - using the I/O
> address of I/O resources only works by chance on Juno, and it still
> doesn't account for config space. I suggest we just leave that out for
> the time being to make life easier (does it even apply to anything other
> than Juno?) and figure it out later.
OK, so I understand I should remove the IORESOURCE_IO check.
> 
>> +continue;
>> +
>> +start = window->res->start - window->offset;
>> +length = window->res->end - window->res->start + 1;
>> +region = iommu_alloc_resv_region(start, length,
>> + IOMMU_RESV_NOMAP);
>> +if (!region)
>> +return;
>> +list_add_tail(>list, head);
>> +}
>> +}
> 
> Either way, there's nothing SMMU-specific about PCI windows. The fact
> that we'd have to copy-paste all of this into the SMMUv3 driver
> unchanged suggests it should go somewhere common (although I would be
> inclined to leave the insertion of the fake MSI region to driver-private
> wrappers). As I said before, the current iova_reserve_pci_windows()
> simply wants splitting into appropriate public callbacks for
> get_resv_regions and apply_resv_regions.
Do you mean somewhere common in the arm-smmu subsystem (a new file) or
in another subsystem (PCI)?

More generally, the current implementation does not handle the case
where any of those PCIe host bridge windows collides with the MSI
window. To me this is a flaw.
1) Either we take into account the PCIe windows and prevent any
collision when allocating the MSI window.
2) or we do not care about PCIe host bridge windows at kernel level.

If 1), we are back to the original issue of where to put the MSI
window, obviously at a place which might no longer be QEMU friendly.
What allocation policy shall we use?

Second option, which I definitely prefer (sorry if I look stubborn) and
which was also advocated by Alex: we handle PCI host bridge windows at
user level. The MSI window is reported through the iommu group sysfs.
PCIe host bridge windows can be enumerated through /proc/iomem. Both the
x86 iommu and the arm smmu would report an MSI reserved window. The ARM
MSI window would become a de facto reserved window for guests.

Thoughts?

Eric
> 
> Robin.
> 
>> +static void arm_smmu_put_resv_regions(struct device *dev,
>> +  struct list_head *head)
>> +{
>> +struct iommu_resv_region 

Re: [RFC v3 05/10] iommu: Do not map reserved regions

2016-12-07 Thread Auger Eric
Hi Robin

On 06/12/2016 18:36, Robin Murphy wrote:
> On 15/11/16 13:09, Eric Auger wrote:
>> As we introduced IOMMU_RESV_NOMAP and IOMMU_RESV_MSI regions,
>> let's prevent those new regions from being mapped.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  drivers/iommu/iommu.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 6ee529f..a4530ad 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -343,6 +343,9 @@ static int iommu_group_create_direct_mappings(struct 
>> iommu_group *group,
>>  start = ALIGN(entry->start, pg_size);
>>  end   = ALIGN(entry->start + entry->length, pg_size);
>>  
>> +if (entry->prot & IOMMU_RESV_MASK)
> 
> This seems to be the only place that this mask is used, and frankly I
> think it's less clear than simply "(IOMMU_RESV_NOMAP | IOMMU_RESV_MSI)"
> would be, at which point we may as well drop the mask and special value
> trickery altogether. Plus, per my previous comment, if it were to be "if
> (entry->type != )" instead, that's about as obvious
> as it can get.
OK, I will add this new type entry to the reserved region struct.

thanks

Eric
> 
> Robin.
> 
>> +continue;
>> +
>>  for (addr = start; addr < end; addr += pg_size) {
>>  phys_addr_t phys_addr;
>>  
>>
> 
> 


Re: [RFC v4 15/16] vfio/type1: Check MSI remapping at irq domain level

2016-12-22 Thread Auger Eric
Hi Diana,

On 22/12/2016 13:41, Diana Madalina Craciun wrote:
> Hi Eric,
> 
> On 12/13/2016 10:32 PM, Eric Auger wrote:
>> In case the IOMMU does not bypass MSI transactions (the typical
>> case on ARM), we check that all MSI controllers are IRQ remapping
>> capable. If not, the IRQ assignment may be unsafe.
>>
>> At this stage the arm-smmu(-v3) drivers still advertise the
>> IOMMU_CAP_INTR_REMAP capability at IOMMU level. This will be
>> removed in subsequent patches.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 9 ++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index d07fe73..a05648b 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -37,6 +37,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #define DRIVER_VERSION  "0.2"
>>  #define DRIVER_AUTHOR   "Alex Williamson "
>> @@ -765,7 +766,7 @@ static int vfio_iommu_type1_attach_group(void 
>> *iommu_data,
>>  struct vfio_domain *domain, *d;
>>  struct bus_type *bus = NULL;
>>  int ret;
>> -bool resv_msi;
>> +bool resv_msi, msi_remap;
>>  phys_addr_t resv_msi_base;
>>  
>>  mutex_lock(>lock);
>> @@ -818,8 +819,10 @@ static int vfio_iommu_type1_attach_group(void 
>> *iommu_data,
>>  INIT_LIST_HEAD(>group_list);
>>  list_add(>next, >group_list);
>>  
>> -if (!allow_unsafe_interrupts &&
>> -!iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>> +msi_remap = resv_msi ? irq_domain_check_msi_remap() :
>> +   iommu_capable(bus, IOMMU_CAP_INTR_REMAP);
>> +
>> +if (!allow_unsafe_interrupts && !msi_remap) {
>>  pr_warn("%s: No interrupt remapping support.  Use the module 
>> param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this 
>> platform\n",
>> __func__);
>>  ret = -EPERM;
> 
> I tested your v4.9-reserved-v4 branch on a ITS capable hardware (NXP
> LS2080), so I did not set allow_unsafe_interrupts. It fails here
> complaining that there is no interrupt remapping support. The
> irq_domain_check_msi_remap function returns false as none of the checked
> domains has the IRQ_DOMAIN_FLAG_MSI_REMAP flag set. I think the reason
> is that the flags are not propagated through the domain hierarchy when
> the domain is created.

Hum OK. Please accept my apologies for the inconvenience, all the more
so as this is the second time you have reported the same issue, for a
different cause :-( At the moment I can't test on a GICv3 ITS based
system. I will try to fix that though.

I would like to get confirmation that introducing this flag is the
right direction, though.

Thanks

Eric
> 
> Thanks,
> 
> Diana
> 
> 
> 


Re: [PATCH 2/3] iommu/dma: Don't reserve PCI I/O windows

2017-03-13 Thread Auger Eric
Hi,

On 09/03/2017 20:50, Robin Murphy wrote:
> Even if a host controller's CPU-side MMIO windows into PCI I/O space do
> happen to leak into PCI memory space such that it might treat them as
> peer addresses, trying to reserve the corresponding I/O space addresses
> doesn't do anything to help solve that problem. Stop doing a silly thing.
> 
> Fixes: fade1ec055dc ("iommu/dma: Avoid PCI host bridge windows")
> Signed-off-by: Robin Murphy 
Reviewed-by: Eric Auger 

Regards

Eric
> ---
>  drivers/iommu/dma-iommu.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 48d36ce59efb..1e0983488a8d 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -175,8 +175,7 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
>   unsigned long lo, hi;
>  
>   resource_list_for_each_entry(window, >windows) {
> - if (resource_type(window->res) != IORESOURCE_MEM &&
> - resource_type(window->res) != IORESOURCE_IO)
> + if (resource_type(window->res) != IORESOURCE_MEM)
>   continue;
>  
>   lo = iova_pfn(iovad, window->res->start - window->offset);
> 


Re: [PATCH 3/3] iommu/dma: Handle IOMMU API reserved regions

2017-03-13 Thread Auger Eric
Hi Robin,

On 09/03/2017 20:50, Robin Murphy wrote:
> Now that it's simple to discover the necessary reservations for a given
> device/IOMMU combination, let's wire up the appropriate handling. Basic
> reserved regions and direct-mapped regions are obvious enough to handle;
> hardware MSI regions we can handle by pre-populating the appropriate
> msi_pages in the cookie. That way, irqchip drivers which normally assume
> MSIs to require mapping at the IOMMU can keep working without having
> to special-case their iommu_dma_map_msi_msg() hook, or indeed be aware
> at all of integration quirks preventing the IOMMU translating certain
> addresses.
> 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/dma-iommu.c | 65 
> +++
>  1 file changed, 65 insertions(+)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 1e0983488a8d..1082ebf8a415 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -167,6 +167,69 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
>  }
>  EXPORT_SYMBOL(iommu_put_dma_cookie);
>  
> +static int cookie_init_hw_msi_region(struct iommu_dma_cookie *cookie,
> + phys_addr_t start, phys_addr_t end)
> +{
> + struct iova_domain *iovad = >iovad;
> + struct iommu_dma_msi_page *msi_page;
> + int i, num_pages;
> +
> + start &= ~iova_mask(iovad);
> + end = iova_align(iovad, end);
Is it always safe if the second argument is a phys_addr_t?
> + num_pages = (end - start) >> iova_shift(iovad);
> +
> + msi_page = kcalloc(num_pages, sizeof(*msi_page), GFP_KERNEL);
> + if (!msi_page)
> + return -ENOMEM;
> +
> + for (i = 0; i < num_pages; i++) {
> + msi_page[i].phys = start;
> + msi_page[i].iova = start;
> + INIT_LIST_HEAD(_page[i].list);
> + list_add(_page[i].list, >msi_page_list);
> + start += iovad->granule;
> + }
> +
> + return 0;
> +}
> +
> +static int iova_reserve_iommu_regions(struct device *dev,
> + struct iommu_domain *domain)
> +{
> + struct iommu_dma_cookie *cookie = domain->iova_cookie;
> + struct iova_domain *iovad = >iovad;
> + struct iommu_resv_region *region;
> + struct list_head resv_regions;
> + unsigned long lo, hi;
> + int ret = 0;
> +
> + INIT_LIST_HEAD(_regions);
> + iommu_get_resv_regions(dev, _regions);
> + list_for_each_entry(region, _regions, list) {
> + /* We ARE the software that manages these! */
> + if (region->type & IOMMU_RESV_SW_MSI)
> + continue;
> +
> + lo = iova_pfn(iovad, region->start);
> + hi = iova_pfn(iovad, region->start + region->length);
> + reserve_iova(iovad, lo, hi);
> +
> + if (region->type & IOMMU_RESV_DIRECT) {
> + ret = iommu_map(domain, region->start, region->start,
> + region->length, region->prot);

In iommu.c, iommu_group_create_direct_mappings() also iommu_map()s
direct regions in some cases. Just making sure the two code paths don't
overlap here.


> + } else if (region->type & IOMMU_RESV_MSI) {
> + ret = cookie_init_hw_msi_region(cookie, region->start,
> + region->start + region->length);
> + }
> +
> + if (ret)
> + break;
> + }
> + iommu_put_resv_regions(dev, _regions);
> +
> + return ret;
> +}
> +
>  static void iova_reserve_pci_windows(struct pci_dev *dev,
>   struct iova_domain *iovad)
>  {
> @@ -251,6 +314,8 @@ int iommu_dma_init_domain(struct iommu_domain *domain, 
> dma_addr_t base,
>   init_iova_domain(iovad, 1UL << order, base_pfn, end_pfn);
>   if (pci)
>   iova_reserve_pci_windows(to_pci_dev(dev), iovad);
> + if (dev)
> + iova_reserve_iommu_regions(dev, domain);
Don't you want to propagate the returned value?

Besides
Reviewed-by: Eric Auger 

Thanks

Eric
>   }
>   return 0;
>  }
> 


Re: [PATCH 1/3] iommu: Disambiguate MSI region types

2017-03-13 Thread Auger Eric
Hi Robin,

On 09/03/2017 20:50, Robin Murphy wrote:
> Whilst it doesn't matter much to VFIO at the moment, when parsing
> reserved regions on the host side we really needs to be able to tell
s/needs/need
> the difference between the software-reserved region used to map MSIs
> translated by an IOMMU, and hardware regions for which the write might
> never even reach the IOMMU. In particular, ARM systems assume the former
> topology, but may need to cope with the latter as well, which will
> require rather different handling in the iommu-dma layer.
> 
> For clarity, rename the software-managed type to IOMMU_RESV_SW_MSI, use
> IOMMU_RESV_MSI to describe the hardware type, and document everything a
> little bit. Since the x86 MSI remapping hardware falls squarely under
> this meaning of IOMMU_RESV_MSI, apply that type to their regions as well,
> so that we tell a consistent story to userspace across platforms (and
> have future consistency if those drivers start migrating to iommu-dma).
> 
> Fixes: d30ddcaa7b02 ("iommu: Add a new type field in iommu_resv_region")
does it really fall under the category of fix here?
> CC: Eric Auger 
> CC: Alex Williamson 
> CC: David Woodhouse 
> CC: k...@vger.kernel.org
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/amd_iommu.c   | 2 +-
>  drivers/iommu/arm-smmu-v3.c | 2 +-
>  drivers/iommu/arm-smmu.c| 2 +-
>  drivers/iommu/intel-iommu.c | 2 +-
>  drivers/iommu/iommu.c   | 1 +
>  drivers/vfio/vfio_iommu_type1.c | 2 +-
>  include/linux/iommu.h   | 5 +
>  7 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 98940d1392cb..b17536d6e69b 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -3202,7 +3202,7 @@ static void amd_iommu_get_resv_regions(struct device 
> *dev,
>  
>   region = iommu_alloc_resv_region(MSI_RANGE_START,
>MSI_RANGE_END - MSI_RANGE_START + 1,
> -  0, IOMMU_RESV_RESERVED);
> +  0, IOMMU_RESV_MSI);
>   if (!region)
>   return;
>   list_add_tail(>list, head);
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 5806a6acc94e..591bb96047c9 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -1888,7 +1888,7 @@ static void arm_smmu_get_resv_regions(struct device 
> *dev,
>   int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
>  
>   region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
> -  prot, IOMMU_RESV_MSI);
> +  prot, IOMMU_RESV_SW_MSI);
>   if (!region)
>   return;
>  
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index abf6496843a6..b493c99e17f7 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1608,7 +1608,7 @@ static void arm_smmu_get_resv_regions(struct device 
> *dev,
>   int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
>  
>   region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
> -  prot, IOMMU_RESV_MSI);
> +  prot, IOMMU_RESV_SW_MSI);
>   if (!region)
>   return;
>  
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 238ad3447712..f1611fd6f5b0 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5249,7 +5249,7 @@ static void intel_iommu_get_resv_regions(struct device 
> *device,
>  
>   reg = iommu_alloc_resv_region(IOAPIC_RANGE_START,
> IOAPIC_RANGE_END - IOAPIC_RANGE_START + 1,
> -   0, IOMMU_RESV_RESERVED);
> +   0, IOMMU_RESV_MSI);
>   if (!reg)
>   return;
>   list_add_tail(>list, head);
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 8ea14f41a979..7dbc05f10d5a 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -72,6 +72,7 @@ static const char * const iommu_group_resv_type_string[] = {
>   [IOMMU_RESV_DIRECT] = "direct",
>   [IOMMU_RESV_RESERVED]   = "reserved",
>   [IOMMU_RESV_MSI]= "msi",
> + [IOMMU_RESV_SW_MSI] = "msi",
>  };
>  
>  #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)\
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index c26fa1f3ed86..e32abdebd2df 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1192,7 +1192,7 @@ static bool vfio_iommu_has_resv_msi(struct iommu_group 
> *group,
Maybe we should rename the function to vfio_iommu_has_resv_sw_msi?

Besides
Reviewed-by: 

Re: [PATCH 1/3] iommu: Disambiguate MSI region types

2017-03-13 Thread Auger Eric
Hi Robin,

On 13/03/2017 15:24, Robin Murphy wrote:
> On 13/03/17 13:08, Auger Eric wrote:
>> Hi Robin,
>>
>> On 09/03/2017 20:50, Robin Murphy wrote:
>>> Whilst it doesn't matter much to VFIO at the moment, when parsing
>>> reserved regions on the host side we really needs to be able to tell
>> s/needs/need
> 
> Oops!
> 
>>> the difference between the software-reserved region used to map MSIs
>>> translated by an IOMMU, and hardware regions for which the write might
>>> never even reach the IOMMU. In particular, ARM systems assume the former
>>> topology, but may need to cope with the latter as well, which will
>>> require rather different handling in the iommu-dma layer.
>>>
>>> For clarity, rename the software-managed type to IOMMU_RESV_SW_MSI, use
>>> IOMMU_RESV_MSI to describe the hardware type, and document everything a
>>> little bit. Since the x86 MSI remapping hardware falls squarely under
>>> this meaning of IOMMU_RESV_MSI, apply that type to their regions as well,
>>> so that we tell a consistent story to userspace across platforms (and
>>> have future consistency if those drivers start migrating to iommu-dma).
>>>
>>> Fixes: d30ddcaa7b02 ("iommu: Add a new type field in iommu_resv_region")
>> does it really fall under the category of fix here?
> 
> I was somewhat on the fence about that, as the rationale above does tend
> towards future new functionality, but the primary effect of this patch
> alone is an ABI-visible change for x86 in terms of "expose the MSI
> region as an MSI region". IMO it would be better to get that in before
> said ABI gets baked into a kernel release, hence leaning towards the
> "fix" side of things. I'm happy to rewrite the commit message in reverse
> order (i.e. "clean up this ABI inconsistency, with these additional
> benefits") if it would be clearer.
OK no worries. I understand what you meant.
> 
>>> CC: Eric Auger <eric.au...@redhat.com>
>>> CC: Alex Williamson <alex.william...@redhat.com>
>>> CC: David Woodhouse <dw...@infradead.org>
>>> CC: k...@vger.kernel.org
>>> Signed-off-by: Robin Murphy <robin.mur...@arm.com>
>>> ---
>>>  drivers/iommu/amd_iommu.c   | 2 +-
>>>  drivers/iommu/arm-smmu-v3.c | 2 +-
>>>  drivers/iommu/arm-smmu.c| 2 +-
>>>  drivers/iommu/intel-iommu.c | 2 +-
>>>  drivers/iommu/iommu.c   | 1 +
>>>  drivers/vfio/vfio_iommu_type1.c | 2 +-
>>>  include/linux/iommu.h   | 5 +
>>>  7 files changed, 11 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>>> index 98940d1392cb..b17536d6e69b 100644
>>> --- a/drivers/iommu/amd_iommu.c
>>> +++ b/drivers/iommu/amd_iommu.c
>>> @@ -3202,7 +3202,7 @@ static void amd_iommu_get_resv_regions(struct device 
>>> *dev,
>>>  
>>> region = iommu_alloc_resv_region(MSI_RANGE_START,
>>>  MSI_RANGE_END - MSI_RANGE_START + 1,
>>> -0, IOMMU_RESV_RESERVED);
>>> +0, IOMMU_RESV_MSI);
>>> if (!region)
>>> return;
>>> list_add_tail(>list, head);
>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>>> index 5806a6acc94e..591bb96047c9 100644
>>> --- a/drivers/iommu/arm-smmu-v3.c
>>> +++ b/drivers/iommu/arm-smmu-v3.c
>>> @@ -1888,7 +1888,7 @@ static void arm_smmu_get_resv_regions(struct device 
>>> *dev,
>>> int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
>>>  
>>> region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
>>> -prot, IOMMU_RESV_MSI);
>>> +prot, IOMMU_RESV_SW_MSI);
>>> if (!region)
>>> return;
>>>  
>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>> index abf6496843a6..b493c99e17f7 100644
>>> --- a/drivers/iommu/arm-smmu.c
>>> +++ b/drivers/iommu/arm-smmu.c
>>> @@ -1608,7 +1608,7 @@ static void arm_smmu_get_resv_regions(struct device 
>>> *dev,
>>> int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
>>>  
>>> region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
>>> -prot, IOMMU_RESV_MSI);
>>> +prot
