Re: Support SVM without PASID

2017-08-07 Thread Bob Liu
On 2017/8/7 20:52, Jean-Philippe Brucker wrote:
> Hi Bob,
> 
> On 07/08/17 13:18, Bob Liu wrote:
>> On 2017/8/7 18:31, Jean-Philippe Brucker wrote:
>>> On 05/08/17 06:14, valmiki wrote:
>>> [...]
>>>> Hi Jean, thanks a lot, now I understand the flow. From the VFIO kernel
>>>> documentation we fill vaddr and iova in struct vfio_iommu_type1_dma_map
>>>> and pass them to VFIO. But if we use dynamic allocation in the application
>>>> (say malloc), do we need to use the DMA API to get an iova and then call
>>>> the VFIO_IOMMU_MAP ioctl?
>>>> If the application needs multiple such dynamic allocations, does it need to
>>>> allocate a large chunk, program it via the VFIO_IOMMU_MAP ioctl, and then
>>>> manage the rest of the allocation requirements from this buffer?
>>>
>>> Yes, without SVM, the application allocates large buffers, allocates IOVAs
>>> itself, and maps them with VFIO_IOMMU_MAP. Userspace doesn't rely on the
>>> DMA API at all, it manages IOVAs as it wants. Sizes passed to
>>> VFIO_IOMMU_MAP have to be multiples of the MMU or IOMMU page granularity
>>> (that is at least 4kB), and both iova and vaddr have to be aligned on that
>>> granularity as well. So malloc isn't really suitable in this case, you'll
>>> need mmap. The application can then implement a small allocator to manage
>>> the DMA pool created with VFIO_IOMMU_MAP.
>>>
>>> With SVM the application binds its address space to the device, and then
>>> uses malloc for all DMA buffers, no need for VFIO_IOMMU_MAP.
>>>
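[Editor's note] The scheme above — one page-aligned mmap'd region registered once with VFIO, then a small user-space sub-allocator carving buffers out of it — can be sketched as follows. This is a hedged illustration: `struct dma_pool`, `dma_pool_alloc()` and the constants are invented for the example; only `struct vfio_iommu_type1_dma_map` and `VFIO_IOMMU_MAP_DMA` (mentioned in the comment) are real VFIO names.

```c
/* Minimal bump allocator over a region the application would register
 * once with VFIO. All names here except the VFIO ones in the comment
 * below are made up for this sketch. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL   /* IOMMU page granularity assumed here */

struct dma_pool {
	uint64_t iova_base;   /* IOVA chosen by the application itself */
	uint64_t vaddr_base;  /* page-aligned vaddr obtained from mmap() */
	size_t   size;        /* multiple of PAGE_SIZE */
	size_t   used;        /* bump pointer */
};

/* Round n up to the IOMMU page granularity. */
static size_t page_align(size_t n)
{
	return (n + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

/*
 * The one-time registration of the whole pool would look like
 * (fd being the VFIO container fd):
 *
 *   struct vfio_iommu_type1_dma_map map = {
 *       .argsz = sizeof(map),
 *       .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
 *       .vaddr = p->vaddr_base, .iova = p->iova_base, .size = p->size,
 *   };
 *   ioctl(fd, VFIO_IOMMU_MAP_DMA, &map);
 */

/* Sub-allocate n bytes; returns the buffer's IOVA (0 on failure) and
 * stores the matching vaddr. Sub-buffers stay page-aligned too. */
static uint64_t dma_pool_alloc(struct dma_pool *p, size_t n, uint64_t *vaddr)
{
	uint64_t iova;

	n = page_align(n);
	if (p->used + n > p->size)
		return 0;
	*vaddr = p->vaddr_base + p->used;
	iova = p->iova_base + p->used;
	p->used += n;
	return iova;
}
```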
>>
>> Hi Jean,
>>
>> I think there is another way to support SVM without PASID.
>>
>> Suppose there is a device in the same SoC; the device accesses memory
>> through the SMMU (using an internal bus instead of PCIe).
>> On a page fault, the device sends an event with (vaddr, substreamID) to the
>> SMMU, and the SMMU then triggers an event interrupt.
>>
>> In the event interrupt handler, we can implement the same logic as the PRI
>> interrupt handler in your patch.
>> What do you think about that?
> What you're describing is the SMMU stall model for platform devices. From
> the driver perspective it's the same as PRI and PASID (SubstreamID=PASID).
> 

Indeed!

> When a stall-capable device accesses unmapped memory, the SMMU parks the
> transaction and sends an event marked "stall" on the event queue, with a
> stall tag (STAG, roughly equivalent to PRG Index). The OS handles the
> fault and sends a CMD_RESUME command with the status and the STAG. Then
> the SMMU completes the access or terminates it.
> 
> In a prototype I have, the stall implementation reuses most of the

Glad to hear that.
Would you mind sharing the prototype patch?

> PASID/PRI code. The main difficulty is defining SSID and stall capability
> in firmware, as there is no standard capability probing mechanism for
> platform devices. Stall-capable devices must be able to wait an indefinite
> amount of time for their DMA transactions to return; therefore the stall
> model cannot work with PCI, only with some integrated devices.
> 

I happen to have a board with such devices and would like to run the test.
I will re-post a full version of the patch upstream once the verification is
complete.

Thanks,
Bob Liu

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 3/3] iommu: prevent VMD child devices from being remapping targets

2017-08-07 Thread Jon Derrick
VMD child devices must use the VMD endpoint's ID as the DMA source.
Because of this, there needs to be a way to link the parent VMD
endpoint's DMAR domain to the VMD child devices' DMAR domain such that
attaching and detaching child devices modify the endpoint's DMAR mapping
and prevent early detaching.

This is outside the scope of VMD, so disable binding child devices to
prevent unforeseen issues. This functionality may be implemented in the
future.

This patch prevents VMD child devices from returning an IOMMU, which
prevents them from exposing iommu_group sysfs directories and subsequent
binding by userspace-access drivers such as VFIO.

Signed-off-by: Jon Derrick 
---
 drivers/iommu/intel-iommu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 687f18f..651a6cd 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -905,6 +905,11 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
	 * the PF instead to find the IOMMU. */
	pf_pdev = pci_physfn(pdev);
	dev = &pf_pdev->dev;
+
+   /* VMD child devices currently cannot be handled individually */
+   if (pci_bus_is_vmd(pdev->bus))
+   return NULL;
+
segment = pci_domain_nr(pdev->bus);
} else if (has_acpi_companion(dev))
	dev = &ACPI_COMPANION(dev)->dev;
-- 
2.9.4



[PATCH 1/3] MAINTAINERS: Add Jonathan Derrick as VMD maintainer

2017-08-07 Thread Jon Derrick
Add myself as VMD maintainer

Signed-off-by: Jon Derrick 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f66488d..3ec39df 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10090,6 +10090,7 @@ F:  drivers/pci/dwc/*imx6*
 
 PCI DRIVER FOR INTEL VOLUME MANAGEMENT DEVICE (VMD)
 M: Keith Busch 
+M: Jonathan Derrick 
 L: linux-...@vger.kernel.org
 S: Supported
 F: drivers/pci/host/vmd.c
-- 
2.9.4



[PATCH 2/3] pci: Generalize is_vmd behavior

2017-08-07 Thread Jon Derrick
Generalize is_vmd behavior to remove dependency on domain number
checking in pci quirks.

Signed-off-by: Jon Derrick 
---
 arch/x86/include/asm/pci.h | 8 +++-
 arch/x86/pci/common.c  | 2 +-
 drivers/pci/quirks.c   | 2 +-
 include/linux/pci.h| 4 
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index 473a729..5c5d54a 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -60,16 +60,14 @@ static inline void *_pci_root_bus_fwnode(struct pci_bus *bus)
 #define pci_root_bus_fwnode	_pci_root_bus_fwnode
 #endif
 
-static inline bool is_vmd(struct pci_bus *bus)
-{
 #if IS_ENABLED(CONFIG_VMD)
+static inline bool pci_bus_is_vmd(struct pci_bus *bus)
+{
struct pci_sysdata *sd = bus->sysdata;
 
return sd->vmd_domain;
-#else
-   return false;
-#endif
 }
+#endif
 
 /* Can be used to override the logic in pci_scan_bus for skipping
already-configured bus numbers - to be used for buggy BIOSes
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index dbe2132..18b2277 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -662,7 +662,7 @@ static void set_dma_domain_ops(struct pci_dev *pdev) {}
 
 static void set_dev_domain_options(struct pci_dev *pdev)
 {
-   if (is_vmd(pdev->bus))
+   if (pci_bus_is_vmd(pdev->bus))
pdev->hotplug_user_indicators = 1;
 }
 
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6967c6b..ba47995 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4666,7 +4666,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
 static void quirk_no_aersid(struct pci_dev *pdev)
 {
/* VMD Domain */
-   if (pdev->bus->sysdata && pci_domain_nr(pdev->bus) >= 0x10000)
+   if (pci_bus_is_vmd(pdev->bus))
pdev->bus->bus_flags |= PCI_BUS_FLAGS_NO_AERSID;
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2030, quirk_no_aersid);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4869e66..0299d8b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1471,6 +1471,10 @@ static inline int pci_proc_domain(struct pci_bus *bus) { return 0; }
 static inline int pci_get_new_domain_nr(void) { return -ENOSYS; }
 #endif /* CONFIG_PCI_DOMAINS */
 
+#if !IS_ENABLED(CONFIG_VMD)
+static inline bool pci_bus_is_vmd(struct pci_bus *bus) { return false; }
+#endif
+
 /*
  * Generic implementation for PCI domain support. If your
  * architecture does not need custom management of PCI
-- 
2.9.4
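[Editor's note] The pci_bus_is_vmd() rework above follows a standard kernel idiom: the real static inline lives behind the config option, and a generic header supplies a stub returning false otherwise, so call sites need no #ifdef of their own. A minimal standalone illustration — `FEATURE_X` and `bus_is_x()` are made-up stand-ins for CONFIG_VMD and pci_bus_is_vmd():

```c
/* Compile-time stub pattern: callers always call bus_is_x();
 * the disabled configuration gets a constant-false inline. */
#include <assert.h>
#include <stdbool.h>

#define FEATURE_X 1   /* pretend this is IS_ENABLED(CONFIG_VMD) */

struct bus {
	bool x_domain;   /* stand-in for pci_sysdata.vmd_domain */
};

#if FEATURE_X
static inline bool bus_is_x(struct bus *b)
{
	return b->x_domain;      /* real check */
}
#else
static inline bool bus_is_x(struct bus *b)
{
	return false;            /* stub: feature compiled out */
}
#endif
```

With the stub in a generic header, code like `set_dev_domain_options()` or `quirk_no_aersid()` compiles identically whether or not the feature is built in.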



[CFP] VFIO/IOMMU/PCI microconference at LPC 2017

2017-08-07 Thread Lorenzo Pieralisi
Hi,

following the official LPC17 VFIO/IOMMU/PCI uconf acceptance notification:

https://www.linuxplumbersconf.org/2017/vfioiommupci-microconference-accepted-into-linux-plumbers-conference/

I am sending out a call for sessions proposals open to all developers
interested/involved in Linux kernel VFIO/IOMMU/PCI development.

The LPC17 uconf wiki provides a list of topics that we put forward for
the microconference submission:

http://wiki.linuxplumbersconf.org/2017:vfio_iommu_pci

The wiki provides a list of topics that we consider key, but it should
not be considered final; rather, it is a starting point for defining a
possible schedule structure.

Session proposals for the LPC17 VFIO/IOMMU/PCI microconference are warmly
encouraged and can be submitted here:

https://linuxplumbersconf.org/2017/ocw/events/LPC2017/tracks/636

Anyone involved in VFIO/IOMMU/PCI kernel development who wishes to add
sessions and attend the microconference should consider themselves welcome;
for any questions, just reply to this thread or drop me a line.

Looking forward to meeting you all in Los Angeles!

Lorenzo


Re: [PATCH v5 1/2] ACPI/IORT: Add ITS address regions reservation helper

2017-08-07 Thread Lorenzo Pieralisi
On Mon, Aug 07, 2017 at 08:21:40AM +, Shameerali Kolothum Thodi wrote:
> 
> 
> > -Original Message-
> > From: Robin Murphy [mailto:robin.mur...@arm.com]
> > Sent: Friday, August 04, 2017 5:57 PM
> > To: Shameerali Kolothum Thodi; lorenzo.pieral...@arm.com;
> > marc.zyng...@arm.com; sudeep.ho...@arm.com; will.dea...@arm.com;
> > hanjun@linaro.org
> > Cc: Gabriele Paoloni; John Garry; iommu@lists.linux-foundation.org; linux-
> > arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
> > de...@acpica.org; Linuxarm; Wangzhou (B); Guohanjun (Hanjun Guo)
> > Subject: Re: [PATCH v5 1/2] ACPI/IORT: Add ITS address regions reservation
> > helper
> > 
> > On 01/08/17 11:49, Shameer Kolothum wrote:
> > > On some platforms ITS address regions have to be excluded from normal
> > > IOVA allocation in that they are detected and decoded in a HW-specific
> > > way by system components, and so they cannot be considered normal
> > > IOVA address space.
> > >
> > > Add a helper function that retrieves ITS address regions through IORT
> > > device <-> ITS mappings and reserves them so that these regions will not
> > > be translated by the IOMMU and will be excluded from IOVA allocations.
> > 
> > I've just realised that we no longer seem to have a check that ensures
> > the regions are only reserved on platforms that need it - if not, then
> > we're going to break everything else that does have an ITS behind SMMU
> > translation as expected.
> 
> Right. I had this doubt, but then my thinking was that we will have the SW_MSI
> regions for those and will end up using those. But that doesn't seem
> to be the case now.
> 
> > It feels like IORT should know enough to be able to make that decision
> > internally, but if not (or if it would be hideous to do so), then I
> > guess my idea for patch #2 was a bust and we probably do need to go back
> > to calling directly from the SMMU driver based on the SMMU model.
> 
> It might be possible to do that check inside the IORT code, but then we have
> to find the SMMU node first and check the model number. I think it would be
> cleaner if the SMMU driver makes that check and calls
> iommu_dma_get_msi_resv_regions().

+1 on this one - we can do it in IORT but I think it is more logical
to have a flag in the SMMU driver (keeping the DT/ACPI fwnode switch in
generic IOMMU layer though).
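[Editor's note] The division of labour being agreed on here might look roughly like the sketch below. Every name in it (`smmu_model`, `needs_its_resv()`, the HiSilicon placeholder) is hypothetical, since the thread has not settled the final API; it only shows "driver keys the workaround off its model, then asks the firmware layer to do the reservation":

```c
/* Sketch: only SMMU implementations whose interconnect decodes ITS
 * doorbell addresses in hardware need the ITS region carve-out. */
#include <assert.h>
#include <stdbool.h>

enum smmu_model {
	SMMU_GENERIC,      /* ITS sits behind SMMU translation as usual */
	SMMU_HISI_QUIRK,   /* placeholder for an affected implementation */
};

struct smmu_dev {
	enum smmu_model model;
};

/* Decision the SMMU driver would make before calling something like
 * iommu_dma_get_msi_resv_regions() (per-arch/firmware dispatch stays
 * in the generic IOMMU layer). */
static bool needs_its_resv(const struct smmu_dev *s)
{
	return s->model == SMMU_HISI_QUIRK;
}
```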

Side note I: AFAICS iommu_dma_get_resv_regions() is only used on ARM; if
it is not, patch 2 would break miserably on arches that do not select
IORT - you should rework the return code for !CONFIG_ACPI_IORT configs.

Side note II(nit): why not call the function iort_iommu_get_resv_regions() ?

Lorenzo

> If you are Ok with that I will quickly rework and send out the next version.
> 
> Thanks,
> Shameer
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Re: Support SVM without PASID

2017-08-07 Thread Jean-Philippe Brucker
Hi Bob,

On 07/08/17 13:18, Bob Liu wrote:
> On 2017/8/7 18:31, Jean-Philippe Brucker wrote:
>> On 05/08/17 06:14, valmiki wrote:
>> [...]
>>> Hi Jean, thanks a lot, now I understand the flow. From the VFIO kernel
>>> documentation we fill vaddr and iova in struct vfio_iommu_type1_dma_map
>>> and pass them to VFIO. But if we use dynamic allocation in the application
>>> (say malloc), do we need to use the DMA API to get an iova and then call
>>> the VFIO_IOMMU_MAP ioctl?
>>> If the application needs multiple such dynamic allocations, does it need to
>>> allocate a large chunk, program it via the VFIO_IOMMU_MAP ioctl, and then
>>> manage the rest of the allocation requirements from this buffer?
>>
>> Yes, without SVM, the application allocates large buffers, allocates IOVAs
>> itself, and maps them with VFIO_IOMMU_MAP. Userspace doesn't rely on the
>> DMA API at all, it manages IOVAs as it wants. Sizes passed to
>> VFIO_IOMMU_MAP have to be multiples of the MMU or IOMMU page granularity
>> (that is at least 4kB), and both iova and vaddr have to be aligned on that
>> granularity as well. So malloc isn't really suitable in this case, you'll
>> need mmap. The application can then implement a small allocator to manage
>> the DMA pool created with VFIO_IOMMU_MAP.
>>
>> With SVM the application binds its address space to the device, and then
>> uses malloc for all DMA buffers, no need for VFIO_IOMMU_MAP.
>>
> 
> Hi Jean,
> 
> I think there is another way to support SVM without PASID.
> 
> Suppose there is a device in the same SoC; the device accesses memory
> through the SMMU (using an internal bus instead of PCIe).
> On a page fault, the device sends an event with (vaddr, substreamID) to the
> SMMU, and the SMMU then triggers an event interrupt.
> 
> In the event interrupt handler, we can implement the same logic as the PRI
> interrupt handler in your patch.
> What do you think about that?
What you're describing is the SMMU stall model for platform devices. From
the driver perspective it's the same as PRI and PASID (SubstreamID=PASID).

When a stall-capable device accesses unmapped memory, the SMMU parks the
transaction and sends an event marked "stall" on the event queue, with a
stall tag (STAG, roughly equivalent to PRG Index). The OS handles the
fault and sends a CMD_RESUME command with the status and the STAG. Then
the SMMU completes the access or terminates it.
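[Editor's note] A hedged sketch of that stall round-trip, reduced to its decision logic. The struct fields and names below are illustrative only (the real SMMUv3 event record and CMD_RESUME encoding differ); it just shows the event → fault handling → resume/terminate shape:

```c
/* Stall handling sketch: all names here are invented for the example. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum resume_action { STALL_RESUME, STALL_TERMINATE };

struct stall_event {
	uint32_t sid;    /* StreamID of the faulting device */
	uint32_t ssid;   /* SubstreamID, playing the role of a PASID */
	uint16_t stag;   /* stall tag, roughly equivalent to a PRG Index */
	uint64_t vaddr;  /* faulting virtual address */
};

typedef bool (*fault_fn)(uint64_t vaddr);

/* Stand-in for handle_mm_fault() on the bound address space: pretend
 * any non-NULL address can be faulted in. */
static bool fault_fixed(uint64_t vaddr)
{
	return vaddr != 0;
}

/* Decide which CMD_RESUME the event-queue handler would send back,
 * keyed by the STAG it received in the stall event. */
static enum resume_action handle_stall(const struct stall_event *evt,
				       fault_fn handle_fault)
{
	if (handle_fault(evt->vaddr))
		return STALL_RESUME;     /* CMD_RESUME{sid, stag, resume} */
	return STALL_TERMINATE;          /* CMD_RESUME{sid, stag, abort}  */
}
```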

In a prototype I have, the stall implementation reuses most of the
PASID/PRI code. The main difficulty is defining SSID and stall capability
in firmware, as there is no standard capability probing mechanism for
platform devices. Stall-capable devices must be able to wait an indefinite
amount of time for their DMA transactions to return; therefore the stall
model cannot work with PCI, only with some integrated devices.

Thanks,
Jean


Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2017-08-07 Thread Rob Clark
On Mon, Aug 7, 2017 at 4:27 AM, Vivek Gautam wrote:
> On Thu, Jul 13, 2017 at 5:20 PM, Rob Clark  wrote:
>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R  
>> wrote:
>>> Hi Vivek,
>>>
>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
 Hi Stephen,


 On 07/13/2017 04:24 AM, Stephen Boyd wrote:
> On 07/06, Vivek Gautam wrote:
>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain 
>> *domain, unsigned long iova,
>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned 
>> long iova,
>>size_t size)
>>   {
>> -struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> +struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>> +size_t ret;
>> if (!ops)
>>   return 0;
>>   -return ops->unmap(ops, iova, size);
>> +pm_runtime_get_sync(smmu_domain->smmu->dev);
> Can these map/unmap ops be called from an atomic context? I seem
> to recall that being a problem before.

 That's something which was dropped in the following patch merged in master:
 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock

 Looks like we don't  need locks here anymore?
>>>
>>>  Apart from the locking, I wonder why an explicit pm_runtime is needed
>>>  from unmap. Somehow it looks like some path in the master using that
>>>  should have enabled the pm?
>>>
>>
>> Yes, there are a bunch of scenarios where unmap can happen with
>> disabled master (but not in atomic context).
>
> I would like to understand whether there is a situation where an unmap is
> called in atomic context without an enabled master?
>
> Let's say we have the case where all the unmap calls in atomic context happen
> only from the master's context (in which case the device link should take
> care of the pm state of the smmu), and the only unmap that happens in
> non-atomic context is the one with the master disabled. In such a case,
> doesn't it make sense to distinguish the atomic/non-atomic context and add
> pm_runtime_get_sync()/put_sync() only for the non-atomic context, since that
> would be the one with the master disabled?
>

At least drm/msm needs to hold obj->lock (a mutex) in unmap, so it
won't unmap anything in atomic ctx (but it can unmap with the master
disabled). I can't really comment about other non-GPU drivers. It
seems like a reasonable constraint that either the master is enabled
or we are not in atomic ctx.

Currently we actually wrap unmap with pm_runtime_get/put_sync(), but I'd
like to drop that to avoid powering up the GPU.

BR,
-R


Re: Support SVM without PASID

2017-08-07 Thread Bob Liu
On 2017/8/7 18:31, Jean-Philippe Brucker wrote:
> On 05/08/17 06:14, valmiki wrote:
> [...]
>> Hi Jean, thanks a lot, now I understand the flow. From the VFIO kernel
>> documentation we fill vaddr and iova in struct vfio_iommu_type1_dma_map
>> and pass them to VFIO. But if we use dynamic allocation in the application
>> (say malloc), do we need to use the DMA API to get an iova and then call
>> the VFIO_IOMMU_MAP ioctl?
>> If the application needs multiple such dynamic allocations, does it need to
>> allocate a large chunk, program it via the VFIO_IOMMU_MAP ioctl, and then
>> manage the rest of the allocation requirements from this buffer?
> 
> Yes, without SVM, the application allocates large buffers, allocates IOVAs
> itself, and maps them with VFIO_IOMMU_MAP. Userspace doesn't rely on the
> DMA API at all, it manages IOVAs as it wants. Sizes passed to
> VFIO_IOMMU_MAP have to be multiples of the MMU or IOMMU page granularity
> (that is at least 4kB), and both iova and vaddr have to be aligned on that
> granularity as well. So malloc isn't really suitable in this case, you'll
> need mmap. The application can then implement a small allocator to manage
> the DMA pool created with VFIO_IOMMU_MAP.
> 
> With SVM the application binds its address space to the device, and then
> uses malloc for all DMA buffers, no need for VFIO_IOMMU_MAP.
>

Hi Jean,

I think there is another way to support SVM without PASID.

Suppose there is a device in the same SoC; the device accesses memory
through the SMMU (using an internal bus instead of PCIe).
On a page fault, the device sends an event with (vaddr, substreamID) to the
SMMU, and the SMMU then triggers an event interrupt.

In the event interrupt handler, we can implement the same logic as the PRI
interrupt handler in your patch.
What do you think about that?

Thanks,
Bob Liu




Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2017-08-07 Thread Vivek Gautam
On Thu, Jul 13, 2017 at 5:20 PM, Rob Clark  wrote:
> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R  wrote:
>> Hi Vivek,
>>
>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>> Hi Stephen,
>>>
>>>
>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
 On 07/06, Vivek Gautam wrote:
> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain 
> *domain, unsigned long iova,
>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long 
> iova,
>size_t size)
>   {
> -struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> +struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
> +size_t ret;
> if (!ops)
>   return 0;
>   -return ops->unmap(ops, iova, size);
> +pm_runtime_get_sync(smmu_domain->smmu->dev);
 Can these map/unmap ops be called from an atomic context? I seem
 to recall that being a problem before.
>>>
>>> That's something which was dropped in the following patch merged in master:
>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>
>>> Looks like we don't  need locks here anymore?
>>
>>  Apart from the locking, I wonder why an explicit pm_runtime is needed
>>  from unmap. Somehow it looks like some path in the master using that
>>  should have enabled the pm?
>>
>
> Yes, there are a bunch of scenarios where unmap can happen with
> disabled master (but not in atomic context).

I would like to understand whether there is a situation where an unmap is
called in atomic context without an enabled master?

Let's say we have the case where all the unmap calls in atomic context happen
only from the master's context (in which case the device link should take
care of the pm state of the smmu), and the only unmap that happens in
non-atomic context is the one with the master disabled. In such a case,
doesn't it make sense to distinguish the atomic/non-atomic context and add
pm_runtime_get_sync()/put_sync() only for the non-atomic context, since that
would be the one with the master disabled?


Thanks
Vivek

> On the gpu side we
> opportunistically keep a buffer mapping until the buffer is freed
> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
> an exported dmabuf while some other driver holds a reference to it
> (which can be dropped when the v4l2 device is suspended).
>
> Since unmap triggers a TLB flush, which touches IOMMU regs, the IOMMU
> driver *definitely* needs a pm_runtime_get_sync().
>
> BR,
> -R
> --
> To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


RE: [PATCH v5 1/2] ACPI/IORT: Add ITS address regions reservation helper

2017-08-07 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: Friday, August 04, 2017 5:57 PM
> To: Shameerali Kolothum Thodi; lorenzo.pieral...@arm.com;
> marc.zyng...@arm.com; sudeep.ho...@arm.com; will.dea...@arm.com;
> hanjun@linaro.org
> Cc: Gabriele Paoloni; John Garry; iommu@lists.linux-foundation.org; linux-
> arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
> de...@acpica.org; Linuxarm; Wangzhou (B); Guohanjun (Hanjun Guo)
> Subject: Re: [PATCH v5 1/2] ACPI/IORT: Add ITS address regions reservation
> helper
> 
> On 01/08/17 11:49, Shameer Kolothum wrote:
> > On some platforms ITS address regions have to be excluded from normal
> > IOVA allocation in that they are detected and decoded in a HW-specific
> > way by system components, and so they cannot be considered normal
> > IOVA address space.
> >
> > Add a helper function that retrieves ITS address regions through IORT
> > device <-> ITS mappings and reserves them so that these regions will not
> > be translated by the IOMMU and will be excluded from IOVA allocations.
> 
> I've just realised that we no longer seem to have a check that ensures
> the regions are only reserved on platforms that need it - if not, then
> we're going to break everything else that does have an ITS behind SMMU
> translation as expected.

Right. I had this doubt, but then my thinking was that we will have the SW_MSI
regions for those and will end up using those. But that doesn't seem
to be the case now.

> It feels like IORT should know enough to be able to make that decision
> internally, but if not (or if it would be hideous to do so), then I
> guess my idea for patch #2 was a bust and we probably do need to go back
> to calling directly from the SMMU driver based on the SMMU model.

It might be possible to do that check inside the IORT code, but then we have
to find the SMMU node first and check the model number. I think it would be
cleaner if the SMMU driver makes that check and calls
iommu_dma_get_msi_resv_regions().

If you are Ok with that I will quickly rework and send out the next version.

Thanks,
Shameer


[RFC PATCH v5 5/5] vfio-pci: Allow to expose MSI-X table to userspace when safe

2017-08-07 Thread Alexey Kardashevskiy
Some devices have an MSI-X BAR that is not aligned to a system page size
greater than 4K (like 64K for ppc64), which at the moment prevents such
MMIO pages from being mapped to userspace for the sake of MSI-X BAR
content protection. If such a page happens to share its system page with
some frequently accessed registers, the entire system page will be
emulated, which can seriously affect performance.

This allows mapping of MSI-X tables to userspace if the hardware provides
MSI-X isolation via interrupt remapping or filtering; in other words, when
allowing direct access to the MSI-X BAR won't do any harm to other devices
or cause spurious interrupts visible to the kernel.

This adds a wrapping helper to check if a capability is supported by
an IOMMU group.

Signed-off-by: Alexey Kardashevskiy 
---
 include/linux/vfio.h |  1 +
 drivers/vfio/pci/vfio_pci.c  | 20 +---
 drivers/vfio/pci/vfio_pci_rdwr.c |  5 -
 drivers/vfio/vfio.c  | 15 +++
 4 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 586809abb273..7110bca2fb60 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -46,6 +46,7 @@ struct vfio_device_ops {
 
 extern struct iommu_group *vfio_iommu_group_get(struct device *dev);
 extern void vfio_iommu_group_put(struct iommu_group *group, struct device *dev);
+extern bool vfio_iommu_group_is_capable(struct device *dev, unsigned long cap);
 
 extern int vfio_add_group_dev(struct device *dev,
  const struct vfio_device_ops *ops,
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index d87a0a3cda14..c4c39ed64b1e 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -561,11 +561,17 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
	struct vfio_region_info_cap_sparse_mmap *sparse;
	size_t end, size;
	int nr_areas = 2, i = 0, ret;
+	bool is_msix_isolated = vfio_iommu_group_is_capable(&vdev->pdev->dev,
+			IOMMU_GROUP_CAP_ISOLATE_MSIX);
 
end = pci_resource_len(vdev->pdev, vdev->msix_bar);
 
-   /* If MSI-X table is aligned to the start or end, only one area */
-   if (((vdev->msix_offset & PAGE_MASK) == 0) ||
+   /*
+* If MSI-X table is allowed to mmap because of the capability
+* of IRQ remapping or aligned to the start or end, only one area
+*/
+   if (is_msix_isolated ||
+   ((vdev->msix_offset & PAGE_MASK) == 0) ||
(PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) >= end))
nr_areas = 1;
 
@@ -577,6 +583,12 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
 
sparse->nr_areas = nr_areas;
 
+   if (is_msix_isolated) {
+   sparse->areas[i].offset = 0;
+   sparse->areas[i].size = end;
+   return 0;
+   }
+
if (vdev->msix_offset & PAGE_MASK) {
sparse->areas[i].offset = 0;
sparse->areas[i].size = vdev->msix_offset & PAGE_MASK;
@@ -1094,6 +1106,8 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
	unsigned int index;
	u64 phys_len, req_len, pgoff, req_start;
	int ret;
+	bool is_msix_isolated = vfio_iommu_group_is_capable(&vdev->pdev->dev,
+			IOMMU_GROUP_CAP_ISOLATE_MSIX);
 
index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
 
@@ -1115,7 +1129,7 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
if (req_start + req_len > phys_len)
return -EINVAL;
 
-   if (index == vdev->msix_bar) {
+   if (index == vdev->msix_bar && !is_msix_isolated) {
/*
 * Disallow mmaps overlapping the MSI-X table; users don't
 * get to touch this directly.  We could find somewhere
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 357243d76f10..7514206a5ea7 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "vfio_pci_private.h"
 
@@ -123,6 +124,8 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
	resource_size_t end;
	void __iomem *io;
	ssize_t done;
+	bool is_msix_isolated = vfio_iommu_group_is_capable(&vdev->pdev->dev,
+			IOMMU_GROUP_CAP_ISOLATE_MSIX);
 
if (pci_resource_start(pdev, bar))
end = pci_resource_len(pdev, bar);
@@ -164,7 +167,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
} else
io = vdev->barmap[bar];
 
-   if (bar == vdev->msix_bar) {
+   if (bar == vdev->msix_bar && !is_msix_isolated) {
x_start = vdev->msix_offset;
x_end = vdev->msix_offset + vdev->msix_size;
}
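[Editor's note] The sparse-mmap computation in msix_sparse_mmap_cap() above can be mirrored in a standalone sketch: given the MSI-X table's placement inside the BAR, either the whole BAR is exposed (isolated case) or up to two page-aligned areas around the table are. The helper below is illustrative, not the kernel code:

```c
/* Compute the mmap-able sparse areas of a BAR around its MSI-X table. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & PAGE_MASK)

struct area {
	uint64_t offset, size;
};

/* Returns how many areas may be mmap'd and fills them in:
 * the whole BAR when MSI-X is isolated by the IOMMU, otherwise the
 * page-aligned regions before and after the table (if any). */
static int sparse_areas(uint64_t msix_offset, uint64_t msix_size,
			uint64_t bar_len, bool isolated, struct area a[2])
{
	uint64_t after = PAGE_ALIGN(msix_offset + msix_size);
	int i = 0;

	if (isolated) {
		a[0] = (struct area){ 0, bar_len };  /* whole BAR */
		return 1;
	}
	if (msix_offset & PAGE_MASK)                 /* room before table */
		a[i++] = (struct area){ 0, msix_offset & PAGE_MASK };
	if (after < bar_len)                         /* room after table */
		a[i++] = (struct area){ after, bar_len - after };
	return i;
}
```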

Re: [PATCH] iommu/arm-smmu: Defer TLB flush in case of unmap op

2017-08-07 Thread Vivek Gautam
Hi Robin,


On Fri, Aug 4, 2017 at 10:34 PM, Robin Murphy  wrote:
> On 03/08/17 06:35, Vivek Gautam wrote:
>> Hi Robin,
>>
>>
>>
>> On 08/02/2017 05:47 PM, Robin Murphy wrote:
>>> On 02/08/17 10:53, Vivek Gautam wrote:
 We don't want to touch the TLB when smmu is suspended.
 Defer it until resume.

 Signed-off-by: Vivek Gautam 
 ---

 Hi all,

 Here's the small patch in response of suggestion to defer tlb operations
 when smmu is in suspend state.
 The patch stores the TLB requests in 'unmap' when the smmu device is
 suspended. On resume, it checks all the pending TLB requests, and
 performs the unmap over those.

 Right now, I have applied the patch on top of the pm runtime series.
 Let me know what you think of the change. It will also be helpful if
 somebody can please test a valid use case with this.
>>> The patch itself doesn't make much sense to me, but more crucially it's
>>> definitely broken in concept. We can't return from arm_smmu_unmap()
>>> without having actually unmapped anything, because that leaves the page
>>> tables out of sync with what the caller expects - they may immediately
>>> reuse that IOVA to map something else for a different device and hit an
>>> unexpected failure from io-pgtable when the PTE turns out to be
>>> non-empty.
>>
>> To understand things a bit more:
>> once we don't *unmap* in arm_smmu_unmap(), and leave the TLBs as is,
>> the next mapping can happen only with the *knowledge* of the smmu, i.e.,
>> the smmu should be active at that time.
>> If that's true, then the _runtime_resume() method will take care of
>> invalidating the TLBs when we call arm_smmu_unmap() from _runtime_resume().
>> Is my understanding correct here?
>
> What I mean is that it's OK for arm_smmu_unmap() to defer the physical
> TLB maintenance for an unmap request if the SMMU is suspended, but it
> *must* still update the pagetable so that the given address is logically
> unmapped before returning. In other words, the place to make decisions
> based on the SMMU PM state would be in the .tlb_add_flush and .tlb_sync
> callbacks, rather than at the top level.

Okay, I understand it better now.
The .tlb_add_flush and .tlb_sync callbacks should be the right place.

>
>>> However, if in general suspend *might* power-gate any part of the SMMU,
>>> then I don't think we have any guarantee of what state any TLBs could be
>>> in upon resume. Therefore any individual invalidations we skip while
>>> suspended are probably moot, since resume would almost certainly have to
>>> invalidate everything to get back to a safe state anyway.
>>
>> Right, in case the suspend power-gates the SMMU, the TLB context is
>> lost anyway, so the resume path can start afresh.
>> This is something that exynos does at present.
>
> Yes, in general I don't think we can assume any SMMU state is preserved,
> so the only safe option would be for .runtime_resume to do the same
> thing as .resume, which does at least make things nice and simple.

Let me try to find out more about the state of the TLBs. As far as the
programmable registers are concerned, qcom platforms have retention
enabled for them, so they don't lose state after SMMU power down.

>
>>> Conversely though, the situation that still concerns me is whether this
>>> can work at all for a distributed SMMU if things *don't* lose state. Say
>>> the GPU and its local TBU are in the same clock domain - if the GPU has
>>> just gone idle and we've clock-gated it, but "the SMMU" (i.e. the TCU)
>>> is still active servicing other devices, we will assume we can happily
>>> unmap GPU buffers and issue TLBIs, but what happens with entries held in
>>> the unclocked TBU's micro-TLB?
>>
>> We know of platforms that have a shared TCU and multiple TBUs.
>> Each TBU is available in its own power domain, not in the master's
>> power domain.
>> In such cases we may want to runtime_get() the TBUs, so that an
>> unmap() call with the master clock gated gets through.
>>
>> Can we have a situation where the TBU and master are in the same power
>> domain, and the unmap is called when the master is not runtime active?
>> How will such a situation be handled?
>
> Having thought about it a bit more, I think the
> unmap-after-master-suspended case is only one facet of the problem - if
> we can power down individual TBUs/micro-TLBs without suspending the rest
> of the SMMU, do we also have any guarantee that such TLBs don't power
> back on full of valid-looking random junk?
>
> I'm starting to think the only way to be generally safe would be to
> globally invalidate all TLBs after any *master* is resumed, and I'm not
> even sure that's feasible :/
>
> Robin.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


regards
Vivek

-- 
Qualcomm 

[RFC PATCH v5 4/5] powerpc/iommu: Set IOMMU_GROUP_CAP_ISOLATE_MSIX

2017-08-07 Thread Alexey Kardashevskiy
This sets IOMMU_GROUP_CAP_ISOLATE_MSIX on a group unconditionally, as
there is no IOMMU-capable hardware without such a feature.

On IBM POWERPC (POWER8) [1], the PCI host bridge (PHB) maintains a
BDFN-to-PE translation (PE stands for "partitionable endpoint"), and
the PE index is used to look up the Interrupt Vector Table (IVT) to
identify the interrupt server. Without these translations in place,
MSI-X messages won't pass the PHB.

[1] 3.2.4. MSI Design
http://openpowerfoundation.org/wp-content/uploads/resources/IODA2Spec/IODA2WGSpec-1.0.0-20160217.pdf

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/kernel/iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 233ca3fe4754..dca0a83f1560 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -936,6 +936,7 @@ void iommu_register_group(struct iommu_table_group *table_group,
if (!name)
return;
iommu_group_set_name(grp, name);
+   iommu_group_set_caps(grp, 0, IOMMU_GROUP_CAP_ISOLATE_MSIX);
kfree(name);
 }
 
-- 
2.11.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v5 2/5] iommu: Set IOMMU_GROUP_CAP_ISOLATE_MSIX if MSI controller enables IRQ remapping

2017-08-07 Thread Alexey Kardashevskiy
This sets IOMMU_GROUP_CAP_ISOLATE_MSIX on a group if MSI remapping is
enabled on its IRQ domain; this is expected to set the capability
on ARM.

Signed-off-by: Alexey Kardashevskiy 
---
 drivers/iommu/iommu.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6b2c34fe2c3d..e720e90fa93c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static struct kset *iommu_group_kset;
@@ -1028,6 +1029,7 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev)
const struct iommu_ops *ops = dev->bus->iommu_ops;
struct iommu_group *group;
int ret;
+   struct irq_domain *d = dev_get_msi_domain(dev);
 
group = iommu_group_get(dev);
if (group)
@@ -1070,6 +1072,11 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev)
return ERR_PTR(ret);
}
 
+   if (d && irq_domain_is_msi(d) &&
+   irq_domain_hierarchical_is_msi_remap(d))
+   iommu_group_set_caps(group, 0,
+   IOMMU_GROUP_CAP_ISOLATE_MSIX);
+
return group;
 }
 
-- 
2.11.0



[RFC PATCH v5 1/5] iommu: Add capabilities to a group

2017-08-07 Thread Alexey Kardashevskiy
This introduces capabilities to IOMMU groups. The first defined
capability is IOMMU_GROUP_CAP_ISOLATE_MSIX, which tells the IOMMU
group users that a particular IOMMU group is capable of MSI-X message
filtering; this is useful when deciding whether or not to allow mapping
of the MSI-X table to userspace. Various architectures will enable it
when they decide that it is safe to do so.

Signed-off-by: Alexey Kardashevskiy 
---
 include/linux/iommu.h | 20 
 drivers/iommu/iommu.c | 28 
 2 files changed, 48 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2cb54adc4a33..6b6f3c2f4904 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -155,6 +155,9 @@ struct iommu_resv_region {
enum iommu_resv_typetype;
 };
 
+/* IOMMU group capabilities */
+#define IOMMU_GROUP_CAP_ISOLATE_MSIX   (1U << 0)
+
 #ifdef CONFIG_IOMMU_API
 
 /**
@@ -312,6 +315,11 @@ extern void *iommu_group_get_iommudata(struct iommu_group *group);
 extern void iommu_group_set_iommudata(struct iommu_group *group,
  void *iommu_data,
  void (*release)(void *iommu_data));
+extern void iommu_group_set_caps(struct iommu_group *group,
+unsigned long clearcaps,
+unsigned long setcaps);
+extern bool iommu_group_is_capable(struct iommu_group *group,
+  unsigned long cap);
 extern int iommu_group_set_name(struct iommu_group *group, const char *name);
 extern int iommu_group_add_device(struct iommu_group *group,
  struct device *dev);
@@ -513,6 +521,18 @@ static inline void iommu_group_set_iommudata(struct iommu_group *group,
 {
 }
 
+static inline void iommu_group_set_caps(struct iommu_group *group,
+   unsigned long clearcaps,
+   unsigned long setcaps)
+{
+}
+
+static inline bool iommu_group_is_capable(struct iommu_group *group,
+ unsigned long cap)
+{
+   return false;
+}
+
 static inline int iommu_group_set_name(struct iommu_group *group,
   const char *name)
 {
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3f6ea160afed..6b2c34fe2c3d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -52,6 +52,7 @@ struct iommu_group {
void (*iommu_data_release)(void *iommu_data);
char *name;
int id;
+   unsigned long caps;
struct iommu_domain *default_domain;
struct iommu_domain *domain;
 };
@@ -447,6 +448,33 @@ void iommu_group_set_iommudata(struct iommu_group *group, void *iommu_data,
 EXPORT_SYMBOL_GPL(iommu_group_set_iommudata);
 
 /**
+ * iommu_group_set_caps - Change the group capabilities
+ * @group: the group
+ * @clearcaps: capabilities mask to remove
+ * @setcaps: capabilities mask to add
+ *
+ * IOMMU groups can be capable of various features which device drivers
+ * can query and use to adjust their behavior.
+ */
+void iommu_group_set_caps(struct iommu_group *group,
+   unsigned long clearcaps, unsigned long setcaps)
+{
+   group->caps &= ~clearcaps;
+   group->caps |= setcaps;
+}
+EXPORT_SYMBOL_GPL(iommu_group_set_caps);
+
+/**
+ * iommu_group_is_capable - Returns if a group capability is present
+ * @group: the group
+ */
+bool iommu_group_is_capable(struct iommu_group *group, unsigned long cap)
+{
+   return !!(group->caps & cap);
+}
+EXPORT_SYMBOL_GPL(iommu_group_is_capable);
+
+/**
  * iommu_group_set_name - set name for a group
  * @group: the group
  * @name: name
-- 
2.11.0



[RFC PATCH v5 0/5] vfio-pci: Add support for mmapping MSI-X table

2017-08-07 Thread Alexey Kardashevskiy

This is a followup for "[PATCH kernel v4 0/6] vfio-pci: Add support for 
mmapping MSI-X table"
http://www.spinics.net/lists/kvm/msg152232.html

This time it is using "caps" in IOMMU groups. The main question is
whether PCI bus flags or IOMMU domains would still be better (and
which one).



Here is some background:

The current vfio-pci implementation disallows mmapping the page
containing the MSI-X table, since otherwise users could write
directly to the MSI-X table and generate incorrect MSIs.

However, this causes a performance issue when there are critical
device registers in the same page as the MSI-X table: we have to
handle the mmio accesses to these registers in QEMU emulation
rather than in the guest.

To solve this issue, this series allows exposing the MSI-X table
to userspace when the hardware enables interrupt remapping, which
ensures that a given PCI device can only shoot the MSIs assigned
to it. We introduced a new bus flag, PCI_BUS_FLAGS_MSI_REMAP, to
test this capability on the PCI side for different archs.


This is based on sha1
26c5cebfdb6c "Merge branch 'parisc-4.13-4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux"

Please comment. Thanks.

Changelog:

v5:
* redid the whole thing via so-called IOMMU group capabilities

v4:
* rebased on recent upstream
* got all 6 patches from v2 (v3 was missing some)




Alexey Kardashevskiy (5):
  iommu: Add capabilities to a group
  iommu: Set IOMMU_GROUP_CAP_ISOLATE_MSIX if MSI controller enables IRQ
remapping
  iommu/intel/amd: Set IOMMU_GROUP_CAP_ISOLATE_MSIX if IRQ remapping is
enabled
  powerpc/iommu: Set IOMMU_GROUP_CAP_ISOLATE_MSIX
  vfio-pci: Allow to expose MSI-X table to userspace when safe

 include/linux/iommu.h| 20 
 include/linux/vfio.h |  1 +
 arch/powerpc/kernel/iommu.c  |  1 +
 drivers/iommu/amd_iommu.c|  3 +++
 drivers/iommu/intel-iommu.c  |  3 +++
 drivers/iommu/iommu.c| 35 +++
 drivers/vfio/pci/vfio_pci.c  | 20 +---
 drivers/vfio/pci/vfio_pci_rdwr.c |  5 -
 drivers/vfio/vfio.c  | 15 +++
 9 files changed, 99 insertions(+), 4 deletions(-)

-- 
2.11.0
