[PATCH v2] MAINTAINERS: update QUALCOMM IOMMU after Arm SMMU drivers move

2020-08-24 Thread Lukas Bulwahn
Commit e86d1aa8b60f ("iommu/arm-smmu: Move Arm SMMU drivers into their own
subdirectory") moved drivers/iommu/qcom_iommu.c to
drivers/iommu/arm/arm-smmu/qcom_iommu.c amongst other moves, adjusted some
sections in MAINTAINERS, but missed adjusting the QUALCOMM IOMMU section.

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains:

  warning: no file matches    F:    drivers/iommu/qcom_iommu.c

Update the file entry in MAINTAINERS to the new location.

Signed-off-by: Lukas Bulwahn 
Acked-by: Will Deacon 
---
v1: https://lore.kernel.org/lkml/20200802065320.7470-1-lukas.bulw...@gmail.com/
v1 -> v2: typo fixed; added Will's Ack.

Joerg, please pick this minor non-urgent patch for your -next branch.

applies cleanly on next-20200731

 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1469cb81261d..e175c0741653 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14358,7 +14358,7 @@ M:  Rob Clark 
 L: iommu@lists.linux-foundation.org
 L: linux-arm-...@vger.kernel.org
 S: Maintained
-F: drivers/iommu/qcom_iommu.c
+F: drivers/iommu/arm/arm-smmu/qcom_iommu.c
 
 QUALCOMM IPCC MAILBOX DRIVER
 M: Manivannan Sadhasivam 
-- 
2.17.1



Re: [patch RFC 26/38] x86/xen: Wrap XEN MSI management into irqdomain

2020-08-24 Thread Jürgen Groß

On 21.08.20 02:24, Thomas Gleixner wrote:

To allow utilizing the irq domain pointer in struct device it is necessary
to make XEN/MSI irq domain compatible.

While the right solution would be to truly convert XEN to irq domains, this
is an exercise which is not possible for mere mortals with limited XENology.

Provide a plain irqdomain wrapper around XEN. While this is a blatant
violation of the irqdomain design, it's the only solution for a XEN-ignorant
person to make progress on the issue which triggered this change.

Signed-off-by: Thomas Gleixner
Cc:linux-...@vger.kernel.org
Cc:xen-de...@lists.xenproject.org


Acked-by: Juergen Gross 

Looking into https://www.kernel.org/doc/Documentation/IRQ-domain.txt (is
this still valid?) I believe Xen should be able to use the "No Map"
approach, as Xen only ever uses software IRQs (at least those are the
only ones visible to any driver). The (virtualized) hardware interrupts
are Xen events after all.

So maybe morphing Xen into supporting irqdomains in a sane way isn't
that complicated. Maybe I'm missing the main complexities, though.
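For illustration, a minimal sketch of that "No Map" style using the
generic irqdomain helpers; the domain size and the empty ops below are
placeholders, not actual Xen code:

	#include <linux/irqdomain.h>

	/*
	 * Hedged sketch only: a "nomap" domain hands out Linux IRQ numbers
	 * directly, with no hwirq translation map -- roughly what purely
	 * software Xen event channels would want. The size (1024) is an
	 * illustrative placeholder.
	 */
	static const struct irq_domain_ops xen_event_domain_ops = {
		/* .map/.unmap are optional; nothing to do for software IRQs */
	};

	static struct irq_domain *xen_event_domain;

	static int __init xen_event_irqdomain_init(void)
	{
		xen_event_domain = irq_domain_add_nomap(NULL, 1024,
							&xen_event_domain_ops,
							NULL);
		if (!xen_event_domain)
			return -ENOMEM;

		/* Each event channel then gets a virq via a direct mapping. */
		return irq_create_direct_mapping(xen_event_domain) ? 0 : -ENOSPC;
	}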


Juergen


---
Note: This is completely untested, but it compiles so it must be perfect.
---
  arch/x86/pci/xen.c |   63 +++++++++++++++++++++++++++++++++++++++++++++++++
  1 file changed, 63 insertions(+)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -406,6 +406,63 @@ static void xen_teardown_msi_irq(unsigne
WARN_ON_ONCE(1);
  }
  
+static int xen_msi_domain_alloc_irqs(struct irq_domain *domain,
+				     struct device *dev, int nvec)
+{
+   int type;
+
+   if (WARN_ON_ONCE(!dev_is_pci(dev)))
+   return -EINVAL;
+
+   if (first_msi_entry(dev)->msi_attrib.is_msix)
+   type = PCI_CAP_ID_MSIX;
+   else
+   type = PCI_CAP_ID_MSI;
+
+   return x86_msi.setup_msi_irqs(to_pci_dev(dev), nvec, type);
+}
+
+static void xen_msi_domain_free_irqs(struct irq_domain *domain,
+struct device *dev)
+{
+   if (WARN_ON_ONCE(!dev_is_pci(dev)))
+   return;
+
+   x86_msi.teardown_msi_irqs(to_pci_dev(dev));
+}
+
+static struct msi_domain_ops xen_pci_msi_domain_ops = {
+   .domain_alloc_irqs  = xen_msi_domain_alloc_irqs,
+   .domain_free_irqs   = xen_msi_domain_free_irqs,
+};
+
+static struct msi_domain_info xen_pci_msi_domain_info = {
+   .ops = &xen_pci_msi_domain_ops,
+};
+
+/*
+ * This irq domain is a blatant violation of the irq domain design, but
+ * disentangling XEN into real irq domains is not a job for mere mortals with
+ * limited XENology. But it's the least dangerous way for a mere mortal to
+ * get rid of the arch_*_msi_irqs() hackery in order to store the irq
+ * domain pointer in struct device. This irq domain wrappery allows doing
+ * that without breaking XEN terminally.
+ */
+static __init struct irq_domain *xen_create_pci_msi_domain(void)
+{
+   struct irq_domain *d = NULL;
+   struct fwnode_handle *fn;
+
+   fn = irq_domain_alloc_named_fwnode("XEN-MSI");
+   if (fn)
+   d = msi_create_irq_domain(fn, &xen_pci_msi_domain_info, NULL);
+
+   /* FIXME: No idea how to survive if this fails */
+   BUG_ON(!d);
+
+   return d;
+}
+
  static __init void xen_setup_pci_msi(void)
  {
if (xen_initial_domain()) {
@@ -426,6 +483,12 @@ static __init void xen_setup_pci_msi(voi
}
  
  	x86_msi.teardown_msi_irq = xen_teardown_msi_irq;

+
+   /*
+* Override the PCI/MSI irq domain init function. No point
+* in allocating the native domain and never using it.
+*/
+   x86_init.irqs.create_pci_msi_domain = xen_create_pci_msi_domain;
  }
  
  #else /* CONFIG_PCI_MSI */






Re: [PATCH v4 07/15] iommu/vt-d: Delegate the dma domain to upper layer

2020-08-24 Thread Chris Wilson
Quoting Lu Baolu (2020-08-24 07:31:23)
> Hi Chris,
> 
> On 2020/8/22 2:33, Chris Wilson wrote:
> > Quoting Lu Baolu (2019-05-25 06:41:28)
> >> This allows the iommu generic layer to allocate a dma domain and
> >> attach it to a device through the iommu api's. With all types of
> >> domains being delegated to upper layer, we can remove an internal
> >> flag which was used to distinguish domains managed internally or
> >> externally.
> > 
> > I'm seeing some really strange behaviour with this patch on a 32b
> > Skylake system (and still present on mainline). Before this patch
> > everything is peaceful and appears to work correctly. Applying this patch,
> > and we fail to initialise the GPU with a few DMAR errors reported, e.g.
> > 
> > [   20.279445] DMAR: DRHD: handling fault status reg 3
> > [   20.279508] DMAR: [DMA Read] Request device [00:02.0] fault addr 8900a000 [fault reason 05] PTE Write access is not set
> > 
> > Setting an identity map for the igfx made the DMAR errors disappear, but
> > the GPU still failed to initialise.
> > 
> > There's no difference in the DMAR configuration dmesg between working and
> > the upset patch. And the really strange part is that switching to a 64b
> > kernel with this patch, it's working.
> > 
> > Any suggestions on what I should look for?
> 
> Can the patch titled "[PATCH] iommu/intel: Handle 36b addressing for
> x86-32" solve this problem?

It does. Not sure why, but that mystery I can leave for others.
-Chris


Re: Where are iommu region setup in dma_alloc_from_dev_coherent() track?

2020-08-24 Thread Robin Murphy

On 2020-08-24 08:01, Сергей Пыптев wrote:

Hi!

I am trying to port our 3D scanner driver to kernel v4.9 and I have run 
into trouble: I cannot find the correct way to allocate DMA memory with 
the correct iommu setup. Could you help me?


HW: Tegra X1 and Tegra X2 (arm64); our PCIe bus master device uses about 
200M of system memory to send data from sensors. The memory is organized 
as a circular buffer (8 frames) plus one info page (only 8 bytes are used 
now).
The old driver works on kernel v3.10, and some memory access rules, such 
as the iommu, are new in v4.9.


I am trying to use a reserved memory region attached to the device in the 
device tree, and dma_alloc_coherent() to allocate the buffer. (Then I am 
going to find the correct flags to turn off coherency and to sync data to 
the CPU manually when a full frame has arrived.)


Unfortunately, the iommu does not set up the area.

I've tried to find out why. I see iommu calls in the general dma memory 
allocation path, where arch-dependent dma_ops are used (and it sets up the 
iommu area; I can see it in /sys/kernel/debug), but I cannot find the 
calls in the per-device allocation path. So I cannot tell whether my code 
fails because of my bugs, because I use the wrong concept, or because of a 
bug in the kernel.


Is there iommu setup in the per-device reserved memory allocator (or maybe 
earlier, at memory reservation)? If not, why? Is it a feature or a bug?


There is not. It's not a bug, just that there's no support for this 
combination because it doesn't really make sense. The expectation is 
that an IOMMU can address all system memory at page granularity, 
therefore there's no need for reserved regions because there are no 
constraints on the *physical* address space. The IOMMU-based DMA ops can 
allocate 200MB worth of pages from all over the place and still present 
them to the device (and driver) as one large contiguous buffer.
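For illustration, a minimal sketch of that usage, assuming the device's
DT node has an iommus property so the IOMMU-backed DMA ops are installed
(the function and constant names are hypothetical):

	#include <linux/dma-mapping.h>

	#define SCANNER_BUF_SIZE	(200UL << 20)	/* ring buffer + info page */

	/*
	 * Hedged sketch: with IOMMU-backed dma_ops no carveout is needed;
	 * the buffer is contiguous in IOVA space, not in physical memory.
	 */
	static int scanner_alloc_ring(struct device *dev, void **cpu_addr,
				      dma_addr_t *dma_addr)
	{
		*cpu_addr = dma_alloc_coherent(dev, SCANNER_BUF_SIZE, dma_addr,
					       GFP_KERNEL);
		return *cpu_addr ? 0 : -ENOMEM;
	}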


Robin.

Re: [PATCH] iommu/vt-d:Add support for probing ACPI device in RMRR

2020-08-24 Thread Dan Carpenter
Hi FelixCuioc,

Thank you for the patch! Perhaps something to improve:

url:
https://github.com/0day-ci/linux/commits/FelixCuioc/iommu-vt-d-Add-support-for-probing-ACPI-device-in-RMRR/20200818-104920
base:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
config: x86_64-randconfig-m001-20200820 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 

New smatch warnings:
drivers/iommu/intel/iommu.c:4850 probe_acpi_namespace_devices() warn: inconsistent returns 'adev->physical_node_lock'.

Old smatch warnings:
drivers/iommu/intel/iommu.c:836 device_to_iommu() error: we previously assumed 'pdev' could be null (see line 809)
drivers/iommu/intel/iommu.c:2272 __domain_mapping() error: we previously assumed 'sg' could be null (see line 2263)
drivers/iommu/intel/iommu.c:4371 intel_iommu_add() warn: should '(1 << sp)' be a 64 bit type?

# https://github.com/0day-ci/linux/commit/3adfd53a32c52833a34033fabaa5c1e65dfca688
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review FelixCuioc/iommu-vt-d-Add-support-for-probing-ACPI-device-in-RMRR/20200818-104920
git checkout 3adfd53a32c52833a34033fabaa5c1e65dfca688
vim +4850 drivers/iommu/intel/iommu.c

fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4797  static int __init probe_acpi_namespace_devices(void)
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4798  {
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4799  	struct dmar_drhd_unit *drhd;
af88ec3962010e3 drivers/iommu/intel-iommu.c Qian Cai   2019-06-03  4800  	/* To avoid a -Wunused-but-set-variable warning. */
af88ec3962010e3 drivers/iommu/intel-iommu.c Qian Cai   2019-06-03  4801  	struct intel_iommu *iommu __maybe_unused;
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4802  	struct device *dev;
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4803  	int i, ret = 0;
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4804  
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4805  	for_each_active_iommu(iommu, drhd) {
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4806  		for_each_active_dev_scope(drhd->devices,
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4807  					  drhd->devices_cnt, i, dev) {
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4808  			struct acpi_device_physical_node *pn;
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4809  			struct iommu_group *group;
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4810  			struct acpi_device *adev;
3adfd53a32c5283 drivers/iommu/intel/iommu.c FelixCuioc 2020-08-17  4811  			struct device *pn_dev = NULL;
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4812  
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4813  			if (dev->bus != &acpi_bus_type)
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4814  				continue;
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4815  
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4816  			adev = to_acpi_device(dev);
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4817  			mutex_lock(&adev->physical_node_lock);
                                                                         			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4818  			list_for_each_entry(pn,
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4819  					    &adev->physical_node_list, node) {
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4820  				group = iommu_group_get(pn->dev);
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4821  				if (group) {
3adfd53a32c5283 drivers/iommu/intel/iommu.c FelixCuioc 2020-08-17  4822  					pn_dev = pn->dev;
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4823  					iommu_group_put(group);
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4824  					continue;
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4825  				}
fa212a97f3a366a drivers/iommu/intel-iommu.c Lu Baolu   2019-05-25  4826  
fa212a97f3a366a

Re: [PATCH v4 07/15] iommu/vt-d: Delegate the dma domain to upper layer

2020-08-24 Thread Lu Baolu

Hi Chris,

On 2020/8/22 2:33, Chris Wilson wrote:

Quoting Lu Baolu (2019-05-25 06:41:28)

This allows the iommu generic layer to allocate a dma domain and
attach it to a device through the iommu api's. With all types of
domains being delegated to upper layer, we can remove an internal
flag which was used to distinguish domains managed internally or
externally.


I'm seeing some really strange behaviour with this patch on a 32b
Skylake system (and still present on mainline). Before this patch
everything is peaceful and appears to work correctly. Applying this patch,
and we fail to initialise the GPU with a few DMAR errors reported, e.g.

[   20.279445] DMAR: DRHD: handling fault status reg 3
[   20.279508] DMAR: [DMA Read] Request device [00:02.0] fault addr 8900a000 [fault reason 05] PTE Write access is not set

Setting an identity map for the igfx made the DMAR errors disappear, but
the GPU still failed to initialise.

There's no difference in the DMAR configuration dmesg between working and
the upset patch. And the really strange part is that switching to a 64b
kernel with this patch, it's working.

Any suggestions on what I should look for?


Can the patch titled "[PATCH] iommu/intel: Handle 36b addressing for
x86-32" solve this problem?

Best regards,
baolu


Where are iommu region setup in dma_alloc_from_dev_coherent() track?

2020-08-24 Thread Сергей Пыптев

Hi!

I am trying to port our 3D scanner driver to kernel v4.9 and I have run 
into trouble: I cannot find the correct way to allocate DMA memory with 
the correct iommu setup. Could you help me?


HW: Tegra X1 and Tegra X2 (arm64); our PCIe bus master device uses about 
200M of system memory to send data from sensors. The memory is organized 
as a circular buffer (8 frames) plus one info page (only 8 bytes are used now).
The old driver works on kernel v3.10, and some memory access rules, such 
as the iommu, are new in v4.9.


I am trying to use a reserved memory region attached to the device in the 
device tree, and dma_alloc_coherent() to allocate the buffer. (Then I am 
going to find the correct flags to turn off coherency and to sync data to 
the CPU manually when a full frame has arrived.)


Unfortunately, the iommu does not set up the area.

I've tried to find out why. I see iommu calls in the general dma memory 
allocation path, where arch-dependent dma_ops are used (and it sets up the 
iommu area; I can see it in /sys/kernel/debug), but I cannot find the 
calls in the per-device allocation path. So I cannot tell whether my code 
fails because of my bugs, because I use the wrong concept, or because of a 
bug in the kernel.


Is there iommu setup in the per-device reserved memory allocator (or maybe 
earlier, at memory reservation)? If not, why? Is it a feature or a bug?


While there are some significant changes in the code from v4.9 to v5.8, 
I have not found any related to the iommu in this area, so I think it may 
be the same problem in the latest kernel version.


Thanks,
Sergei Pyptev



Re: [PATCH v2 1/9] docs: Document IO Address Space ID (IOASID) APIs

2020-08-24 Thread Jean-Philippe Brucker
On Fri, Aug 21, 2020 at 09:35:10PM -0700, Jacob Pan wrote:
> IOASID is used to identify address spaces that can be targeted by device
> DMA. It is a system-wide resource that is essential to its many users.
> This document is an attempt to help developers from all vendors navigate
> the APIs. At this time, ARM SMMU and Intel’s Scalable IO Virtualization
> (SIOV) enabled platforms are the primary users of IOASID. Examples of
> how SIOV components interact with IOASID APIs are provided in that many
> APIs are driven by the requirements from SIOV.
> 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Wu Hao 
> Signed-off-by: Jacob Pan 
> ---
>  Documentation/ioasid.rst | 618 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 618 insertions(+)
>  create mode 100644 Documentation/ioasid.rst
> 
> diff --git a/Documentation/ioasid.rst b/Documentation/ioasid.rst

Thanks for writing this up. Should it go to Documentation/driver-api/, or
Documentation/driver-api/iommu/? I think this also needs to Cc
linux-...@vger.kernel.org and cor...@lwn.net

> new file mode 100644
> index ..b6a8cdc885ff
> --- /dev/null
> +++ b/Documentation/ioasid.rst
> @@ -0,0 +1,618 @@
> +.. ioasid:
> +
> +=
> +IO Address Space ID
> +=
> +
> +IOASID is a generic name for PCIe Process Address ID (PASID) or ARM
> +SMMU sub-stream ID. An IOASID identifies an address space that DMA

"SubstreamID"

> +requests can target.
> +
> +The primary use cases for IOASID are Shared Virtual Address (SVA) and
> +IO Virtual Address (IOVA). However, the requirements for IOASID

IOVA alone isn't a use case, maybe "multiple IOVA spaces per device"?

> +management can vary among hardware architectures.
> +
> +This document covers the generic features supported by IOASID
> +APIs. Vendor-specific use cases are also illustrated with Intel's VT-d
> +based platforms as the first example.
> +
> +.. contents:: :local:
> +
> +Glossary
> +
> +PASID - Process Address Space ID
> +
> +IOASID - IO Address Space ID (generic term for PCIe PASID and
> +sub-stream ID in SMMU)

"SubstreamID"

> +
> +SVA/SVM - Shared Virtual Addressing/Memory
> +
> +ENQCMD - New Intel X86 ISA for efficient workqueue submission [1]

Maybe drop the "New", to keep the documentation perennial. It might be
good to add internal links here to the specifications URLs at the bottom.

> +
> +DSA - Intel Data Streaming Accelerator [2]
> +
> +VDCM - Virtual device composition module [3]
> +
> +SIOV - Intel Scalable IO Virtualization
> +
> +
> +Key Concepts
> +
> +
> +IOASID Set
> +---
> +An IOASID set is a group of IOASIDs allocated from the system-wide
> +IOASID pool. An IOASID set is created and can be identified by a
> +token of u64. Refer to IOASID set APIs for more details.

Identified either by a u64 or an mm_struct, right?  Maybe just drop the
second sentence if it's detailed in the IOASID set section below.

> +
> +IOASID set is particularly useful for guest SVA where each guest could
> +have its own IOASID set for security and efficiency reasons.
> +
> +IOASID Set Private ID (SPID)
> +
> +SPIDs are introduced as IOASIDs within its set. Each SPID maps to a
> +system-wide IOASID but the namespace of SPID is within its IOASID
> +set.

The intro isn't super clear. Perhaps this is simpler:
"Each IOASID set has a private namespace of SPIDs. An SPID maps to a
single system-wide IOASID."

> SPIDs can be used as guest IOASIDs where each guest could do
> +IOASID allocation from its own pool and map them to host physical
> +IOASIDs. SPIDs are particularly useful for supporting live migration
> +where decoupling guest and host physical resources are necessary.
> +
> +For example, two VMs can both allocate guest PASID/SPID #101 but map to
> +different host PASIDs #201 and #202 respectively as shown in the
> +diagram below.
> +::
> +
> + .------------------.    .------------------.
> + |       VM 1       |    |       VM 2       |
> + |                  |    |                  |
> + |------------------|    |------------------|
> + | GPASID/SPID 101  |    | GPASID/SPID 101  |
> + '------------------'    '------------------'        Guest
> + __________|______________________|__________________
> +           |                      |                   Host
> +           v                      v
> + .------------------.    .------------------.
> + | Host IOASID 201  |    | Host IOASID 202  |
> + '------------------'    '------------------'
> + |   IOASID set 1   |    |   IOASID set 2   |
> + '------------------'    '------------------'
> +
> +Guest PASID is treated as IOASID set private ID (SPID) within an
> +IOASID set; mappings between guest and host IOASIDs are stored in the
> +set for inquiry.
> +
> +IOASID APIs
> +===
> +To get the IOASID APIs, users must #include <linux/ioasid.h>. These APIs
> +serve the following functionalities:
> +
> +  - IOASID allocation/Free
> +  - Group 
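For context, a minimal allocation sketch against the 5.9-era
<linux/ioasid.h> interface; the set name and the 1..1023 PASID range are
illustrative only:

	#include <linux/ioasid.h>

	/* Hedged sketch: allocate a system-wide IOASID from a private set. */
	static DECLARE_IOASID_SET(example_set);

	static ioasid_t example_alloc_pasid(void *private)
	{
		/* Returns INVALID_IOASID on failure. */
		return ioasid_alloc(&example_set, 1, 1023, private);
	}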

[PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

2020-08-24 Thread Joerg Roedel
From: Joerg Roedel 

Hi,

Some IOMMUv2 capable devices do not work correctly when SME is
active, because their DMA mask does not include the encryption bit, so
they cannot DMA to encrypted memory directly.

The IOMMU can jump in here, but the AMD IOMMU driver puts IOMMUv2
capable devices into an identity mapped domain. Fix that by not
forcing an identity mapped domain on devices when SME is active and by
forbidding use of their IOMMUv2 functionality.

Please review.

Thanks,

Joerg

Joerg Roedel (2):
  iommu/amd: Do not force direct mapping when SME is active
  iommu/amd: Do not use IOMMUv2 functionality when SME is active

 drivers/iommu/amd/iommu.c| 7 ++-
 drivers/iommu/amd/iommu_v2.c | 7 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

-- 
2.28.0



[PATCH 1/2] iommu/amd: Do not force direct mapping when SME is active

2020-08-24 Thread Joerg Roedel
From: Joerg Roedel 

Do not force devices supporting IOMMUv2 to be direct mapped when memory
encryption is active. Direct mapping might make them unusable because their
DMA mask does not include the encryption bit.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd/iommu.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ba9f3dbc5b94..77e4268e41cf 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2659,7 +2659,12 @@ static int amd_iommu_def_domain_type(struct device *dev)
if (!dev_data)
return 0;
 
-   if (dev_data->iommu_v2)
+   /*
+* Do not identity map IOMMUv2 capable devices when memory encryption is
+* active, because some of those devices (AMD GPUs) don't have the
+* encryption bit in their DMA-mask and require remapping.
+*/
+   if (!mem_encrypt_active() && dev_data->iommu_v2)
return IOMMU_DOMAIN_IDENTITY;
 
return 0;
-- 
2.28.0



[PATCH 2/2] iommu/amd: Do not use IOMMUv2 functionality when SME is active

2020-08-24 Thread Joerg Roedel
From: Joerg Roedel 

When memory encryption is active the device is likely not in a direct
mapped domain. Forbid using IOMMUv2 functionality for now until finer
grained checks for this have been implemented.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd/iommu_v2.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
index c259108ab6dd..0d175aed1d92 100644
--- a/drivers/iommu/amd/iommu_v2.c
+++ b/drivers/iommu/amd/iommu_v2.c
@@ -737,6 +737,13 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
 
might_sleep();
 
+   /*
+* When memory encryption is active the device is likely not in a
+* direct-mapped domain. Forbid using IOMMUv2 functionality for now.
+*/
+   if (mem_encrypt_active())
+   return -ENODEV;
+
if (!amd_iommu_v2_supported())
return -ENODEV;
 
-- 
2.28.0



Re: [PATCH 16/18] staging/media/tegra-vde: Clean up IOMMU workaround

2020-08-24 Thread Robin Murphy

On 2020-08-23 22:34, Dmitry Osipenko wrote:

On 21.08.2020 03:11, Robin Murphy wrote:
...

Hello, Robin! Thank you for your work!

Some drivers, like this Tegra VDE (Video Decoder Engine) driver for
example, do not want to use the implicit IOMMU domain.


That isn't (intentionally) changing here - the only difference should be
that instead of having the ARM-special implicit domain, which you have
to kick out of the way with the ARM-specific API before you're able to
attach your own domain, the implicit domain is now a proper IOMMU API
default domain, which automatically gets bumped by your attach. The
default domains should still only be created in the same cases that the
ARM dma_iommu_mappings were.


The Tegra VDE driver
relies on an explicit IOMMU domain in the case of Tegra SMMU because the VDE
hardware can't access the last page of the AS and because the driver wants to
reserve some fixed addresses [1].

[1]
https://elixir.bootlin.com/linux/v5.9-rc1/source/drivers/staging/media/tegra-vde/iommu.c#L100


The Tegra30 SoC supports up to 4 domains, hence we can't afford to waste
them on unused implicit domains. I think this needs to be addressed
before this patch can be applied.


Yeah, there is one subtle change in behaviour from removing the ARM
layer on top of the core API, in that the IOMMU driver will no longer
see an explicit detach call. Thus it does stand to benefit from being a
bit cleverer about noticing devices being moved from one domain to
another by an attach call, either by releasing the hardware context for
the inactive domain once the device(s) are moved across to the new one,
or by simply reprogramming the hardware context in-place for the new
domain's address space without allocating a new one at all (most of the
drivers that don't have multiple contexts already handle the latter
approach quite well).


Would it be possible for IOMMU drivers to gain support for filtering out
devices in iommu_domain_alloc(dev, type)? Then perhaps the Tegra SMMU driver
could simply return NULL in the case of type=IOMMU_DOMAIN_DMA and
dev=tegra-vde.


If you can implement IOMMU_DOMAIN_IDENTITY by allowing the relevant
devices to bypass translation entirely without needing a hardware
context (or at worst, can spare one context which all identity-mapped
logical domains can share), then you could certainly do that kind of
filtering with the .def_domain_type callback if you really wanted to. As
above, the intent is that that shouldn't be necessary for this
particular case, since only one of a group's default domain and
explicitly attached domain can be live at any given time, so the driver
should be able to take advantage of that.
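For concreteness, a sketch of that kind of .def_domain_type filtering,
assuming the Tegra SMMU driver gained identity-domain support; the
compatible string is illustrative:

	#include <linux/iommu.h>
	#include <linux/of.h>

	/* Hedged sketch only: steer one device to an identity default
	 * domain so no translation context is spent on it. */
	static int tegra_smmu_def_domain_type(struct device *dev)
	{
		if (dev->of_node &&
		    of_device_is_compatible(dev->of_node, "nvidia,tegra30-vde"))
			return IOMMU_DOMAIN_IDENTITY;

		return 0;	/* no preference: the core picks the default */
	}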

If you simply have more active devices (groups) than available contexts
then yes, you probably would want to do some filtering to decide who
deserves a translation domain and who doesn't, but in that case you
should already have had a long-standing problem with the ARM implicit
domains.


Alternatively, the Tegra SMMU could be changed such that devices are
attached to a domain at the time of the first IOMMU mapping
invocation instead of at the time of the attach_dev() callback
invocation.

Or maybe even the IOMMU core could be changed to attach devices at the time
of the first IOMMU mapping invocation? This could be a universal
solution for all drivers.


I suppose technically you could do that within an IOMMU driver already
(similar to how some defer most of setup that logically belongs to
->domain_alloc until the first ->attach_dev). It's a bit grim from the
caller's PoV though, in terms of the failure mode being non-obvious and
having no real way to recover. Again, you'd be better off simply making
decisions up-front at domain_alloc or attach time based on the domain type.


Robin, thank you very much for the clarifications!

In accordance with your comments, this patch can't be applied until the Tegra
SMMU supports IOMMU_DOMAIN_IDENTITY and implements a def_domain_type()
callback that returns IOMMU_DOMAIN_IDENTITY for the VDE device.

Otherwise you're breaking the VDE driver, because
dma_buf_map_attachment() [1] returns the IOMMU SGT of the implicit
domain, which is then mapped into the VDE's explicit domain [2], and this
is nonsense.


It's true that iommu_dma_ops will do some work in the unattached default 
domain, but non-coherent cache maintenance will still be performed 
correctly on the underlying memory, which is really all that you care 
about for this case. As for tegra_vde_iommu_map(), that seems to do the 
right thing in only referencing the physical side of the scatterlist 
(via iommu_map_sg()) and ignoring the DMA side, so things ought to work 
out OK even if it is a little non-obvious.
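A sketch of the pattern being described, loosely modelled on
tegra_vde_iommu_map(); the names and error handling are illustrative:

	/* Map only the physical side of the sgt into the explicitly
	 * managed domain at a driver-chosen IOVA; the DMA side of the
	 * sgt is never referenced. */
	size_t mapped = iommu_map_sg(domain, iova, sgt->sgl, sgt->nents,
				     IOMMU_READ | IOMMU_WRITE);
	if (mapped < size)
		return -ENXIO;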



[1]
https://elixir.bootlin.com/linux/v5.9-rc1/source/drivers/staging/media/tegra-vde/dmabuf-cache.c#L102

[2]
https://elixir.bootlin.com/linux/v5.9-rc1/source/drivers/staging/media/tegra-vde/dmabuf-cache.c#L122

Hence, either the VDE driver should bypass iommu_dma_ops from the start or
it needs a way to kick 

Re: [PATCH 00/18] Convert arch/arm to use iommu-dma

2020-08-24 Thread Marek Szyprowski
Hi Robin,

On 20.08.2020 17:08, Robin Murphy wrote:
> Hi all,
>
> After 5 years or so of intending to get round to this, finally the
> time comes! The changes themselves actually turn out to be relatively
> mechanical; the bigger concern appears to be how to get everything
> merged across about 5 different trees given the dependencies.
>
> I've lightly boot-tested things on Rockchip RK3288 and Exynos 4412
> (Odroid-U3), to the degree that their display drivers should be using
> IOMMU-backed buffers and don't explode (the Odroid doesn't manage to
> send a working HDMI signal to the one monitor I have that it actually
> detects, but that's a pre-existing condition...) Confirmation that the
> Mediatek, OMAP and Tegra changes work will be most welcome.
>
> Patches are based on 5.9-rc1, branch available here:
>
>git://linux-arm.org/linux-rm arm/dma

Well, my first proposal for the ARM and ARM64 DMA-mapping unification 
was posted a long time ago: https://lkml.org/lkml/2016/2/19/79

Thanks for resurrecting it! :)

I've tested this patchset on various ARM 32bit Exynos based boards (not 
only Exynos4412) and most of them work fine after your conversion. 
However, there are issues you cannot learn from the code.

Conversion of the Exynos DRM was straightforward (thanks!), but there 
are other Exynos drivers that depend on the old ARM implementation. The 
S5P-MFC (only for the v5 hardware) and Exynos4 FIMC-ISP drivers depend 
on the first-fit IOVA allocation algorithm in the old ARM DMA-mapping. 
This was the main reason I didn't continue my initial conversion attempt.

Both drivers allocate a buffer for their firmware and then, in the 
hardware registers, address video buffers as an offset from the 
beginning of the firmware. This doesn't work when the underlying 
DMA-mapping allocates IOVA with the last-fit algorithm, which is what 
drivers/iommu/dma-iommu.c does. So far I haven't found a good solution 
for that issue.

I'm open for suggestions. One more limitation for the S5P-MFC driver is 
that the hardware is capable of addressing only 128MiB. It will 
probably need to call the IOMMU API directly, but I would like to keep 
as much of the IOMMU/DMA-mapping code as possible.
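For what it's worth, that direct-IOMMU-API fallback could look roughly
like the sketch below; MFC_FW_IOVA, the prot flags and the error handling
are illustrative only:

	#include <linux/iommu.h>
	#include <linux/platform_device.h>

	/* Hedged sketch: pin the firmware at a fixed IOVA in an unmanaged
	 * domain so buffer addresses can stay offsets from the firmware
	 * base. MFC_FW_IOVA is an illustrative constant. */
	#define MFC_FW_IOVA	0x20000000UL

	static int mfc_map_firmware(struct device *dev, phys_addr_t fw_phys,
				    size_t fw_size)
	{
		struct iommu_domain *domain;
		int ret;

		domain = iommu_domain_alloc(&platform_bus_type);
		if (!domain)
			return -ENOMEM;

		ret = iommu_attach_device(domain, dev);
		if (ret) {
			iommu_domain_free(domain);
			return ret;
		}

		return iommu_map(domain, MFC_FW_IOVA, fw_phys, fw_size,
				 IOMMU_READ | IOMMU_WRITE);
	}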


Anyway, we need to move ARM 32bit forward, so for the ARM DMA-mapping 
and Exynos DRM changes, feel free to add:

Acked-by: Marek Szyprowski 

Tested-by: Marek Szyprowski 

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: Is: virtio_gpu_object_shmem_init issues? Was:Re: upstream boot error: general protection fault in swiotlb_map

2020-08-24 Thread Christoph Hellwig
On Mon, Aug 24, 2020 at 11:06:51AM -0400, Konrad Rzeszutek Wilk wrote:
> So it fails at
> 
> 683 		dev_WARN_ONCE(dev, 1,
> 684 			"swiotlb addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
> 685 			&dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
> 
> 
> which makes no sense to me as `dev` surely exists. I can see in the console log:
> 
> virtio-pci 0000:00:01.0: vgaarb: deactivate vga console
> 
> So what gives?

Well, look at the if around the WARN_ON - dma_capable failed on the
swiotlb buffer. This means the virtio drm thingy has a dma mask
(either the actual one set by the driver, or bus_dma_mask), which isn't
enough to address the swiotlb buffer.
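For reference, a condensed sketch of the 5.9-era logic around that
warning, paraphrased from kernel/dma/swiotlb.c (not an exact quote):

	/* The bounce buffer itself was allocated fine, but its bus address
	 * does not fit the device's DMA mask / bus limit, so the mapping
	 * is undone and rejected. */
	dma_addr = __phys_to_dma(dev, swiotlb_addr);
	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
		swiotlb_tbl_unmap_single(dev, swiotlb_addr, size, size, dir,
					 attrs | DMA_ATTR_SKIP_CPU_SYNC);
		dev_WARN_ONCE(dev, 1,
			      "swiotlb addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
			      &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
		return DMA_MAPPING_ERROR;
	}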


[PATCH 04/20] iommu/arm-smmu: Prepare for the adreno-smmu implementation

2020-08-24 Thread Rob Clark
From: Jordan Crouse 

Do a bit of prep work to add the upcoming adreno-smmu implementation.

Add a hook to allow the implementation to choose which context banks
to allocate.

Move some of the common structs to arm-smmu.h in anticipation of them
being used by the implementations and update some of the existing hooks
to pass more information that the implementation will need.

These modifications will be used by the upcoming Adreno SMMU
implementation to identify the GPU device and properly configure it
for pagetable switching.

Co-developed-by: Rob Clark 
Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c |  2 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c  | 69 ++
 drivers/iommu/arm/arm-smmu/arm-smmu.h  | 51 +++-
 3 files changed, 68 insertions(+), 54 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
index a9861dcd0884..88f17cc33023 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
@@ -69,7 +69,7 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
 }
 
 static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
-   struct io_pgtable_cfg *pgtbl_cfg)
+   struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
 {
struct cavium_smmu *cs = container_of(smmu_domain->smmu,
  struct cavium_smmu, smmu);
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 976d43a7f2ff..e63a480d7f71 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -65,41 +65,10 @@ module_param(disable_bypass, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_bypass,
"Disable bypass streams such that incoming transactions from devices 
that are not attached to an iommu domain will report an abort back to the 
device and will not be allowed to pass through the SMMU.");
 
-struct arm_smmu_s2cr {
-   struct iommu_group  *group;
-   int count;
-   enum arm_smmu_s2cr_type type;
-   enum arm_smmu_s2cr_privcfg  privcfg;
-   u8  cbndx;
-};
-
 #define s2cr_init_val (struct arm_smmu_s2cr){  \
.type = disable_bypass ? S2CR_TYPE_FAULT : S2CR_TYPE_BYPASS,\
 }
 
-struct arm_smmu_smr {
-   u16 mask;
-   u16 id;
-   boolvalid;
-};
-
-struct arm_smmu_cb {
-   u64 ttbr[2];
-   u32 tcr[2];
-   u32 mair[2];
-   struct arm_smmu_cfg *cfg;
-};
-
-struct arm_smmu_master_cfg {
-   struct arm_smmu_device  *smmu;
-   s16 smendx[];
-};
-#define INVALID_SMENDX -1
-#define cfg_smendx(cfg, fw, i) \
-   (i >= fw->num_ids ? INVALID_SMENDX : cfg->smendx[i])
-#define for_each_cfg_sme(cfg, fw, i, idx) \
-   for (i = 0; idx = cfg_smendx(cfg, fw, i), i < fw->num_ids; ++i)
-
 static bool using_legacy_binding, using_generic_binding;
 
 static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
@@ -234,19 +203,6 @@ static int arm_smmu_register_legacy_master(struct device *dev,
 }
 #endif /* CONFIG_ARM_SMMU_LEGACY_DT_BINDINGS */
 
-static int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
-{
-   int idx;
-
-   do {
-   idx = find_next_zero_bit(map, end, start);
-   if (idx == end)
-   return -ENOSPC;
-   } while (test_and_set_bit(idx, map));
-
-   return idx;
-}
-
 static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
 {
clear_bit(idx, map);
@@ -578,7 +534,7 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
}
 }
 
-static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
+void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
 {
u32 reg;
bool stage1;
@@ -665,7 +621,8 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
 }
 
 static int arm_smmu_init_domain_context(struct iommu_domain *domain,
-   struct arm_smmu_device *smmu)
+   struct arm_smmu_device *smmu,
+   struct device *dev)
 {
int irq, start, ret = 0;
unsigned long ias, oas;
@@ -780,10 +737,20 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
ret = -EINVAL;
goto out_unlock;
}
-   ret = __arm_smmu_alloc_bitmap(smmu->context_map, start,
+
+   smmu_domain->smmu = smmu;
+
+   if (smmu->impl && smmu->impl->alloc_context_bank)
+ 

[PATCH 02/20] iommu/arm-smmu: Pass io-pgtable config to implementation specific function

2020-08-24 Thread Rob Clark
From: Jordan Crouse 

Construct the io-pgtable config before calling the implementation specific
init_context function and pass it so the implementation specific function
can get a chance to change it before the io-pgtable is created.

Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c |  3 ++-
 drivers/iommu/arm/arm-smmu/arm-smmu.c  | 11 ++-
 drivers/iommu/arm/arm-smmu/arm-smmu.h  |  3 ++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
index f4ff124a1967..a9861dcd0884 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
@@ -68,7 +68,8 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
return 0;
 }
 
-static int cavium_init_context(struct arm_smmu_domain *smmu_domain)
+static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg)
 {
struct cavium_smmu *cs = container_of(smmu_domain->smmu,
  struct cavium_smmu, smmu);
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 09c42af9f31e..37d8d49299b4 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -795,11 +795,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
cfg->asid = cfg->cbndx;
 
smmu_domain->smmu = smmu;
-   if (smmu->impl && smmu->impl->init_context) {
-   ret = smmu->impl->init_context(smmu_domain);
-   if (ret)
-   goto out_unlock;
-   }
 
pgtbl_cfg = (struct io_pgtable_cfg) {
.pgsize_bitmap  = smmu->pgsize_bitmap,
@@ -810,6 +805,12 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
.iommu_dev  = smmu->dev,
};
 
+   if (smmu->impl && smmu->impl->init_context) {
+   ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg);
+   if (ret)
+   goto out_clear_smmu;
+   }
+
if (smmu_domain->non_strict)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h 
b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index d890a4a968e8..83294516ac08 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -386,7 +386,8 @@ struct arm_smmu_impl {
u64 val);
int (*cfg_probe)(struct arm_smmu_device *smmu);
int (*reset)(struct arm_smmu_device *smmu);
-   int (*init_context)(struct arm_smmu_domain *smmu_domain);
+   int (*init_context)(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *cfg);
void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
 int status);
int (*def_domain_type)(struct device *dev);
-- 
2.26.2



[PATCH 05/20] iommu: add private interface for adreno-smmu

2020-08-24 Thread Rob Clark
From: Rob Clark 

This interface will be used for drm/msm to coordinate with the
qcom_adreno_smmu_impl to enable/disable TTBR0 translation.

Once TTBR0 translation is enabled, the GPU's CP (Command Processor)
will directly switch TTBR0 pgtables (and do the necessary TLB inv)
synchronized to the GPU's operation.  But help from the SMMU driver
is needed to initially bootstrap TTBR0 translation, which cannot be
done from the GPU.

Since this is a very special case, a private interface is used to
avoid adding highly driver specific things to the public iommu
interface.

Signed-off-by: Rob Clark 
Reviewed-by: Jordan Crouse 
---
 include/linux/adreno-smmu-priv.h | 36 
 1 file changed, 36 insertions(+)
 create mode 100644 include/linux/adreno-smmu-priv.h

diff --git a/include/linux/adreno-smmu-priv.h b/include/linux/adreno-smmu-priv.h
new file mode 100644
index ..a889f28afb42
--- /dev/null
+++ b/include/linux/adreno-smmu-priv.h
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Google, Inc
+ */
+
+#ifndef __ADRENO_SMMU_PRIV_H
+#define __ADRENO_SMMU_PRIV_H
+
+#include <linux/io-pgtable.h>
+
+/**
+ * struct adreno_smmu_priv - private interface between adreno-smmu and GPU
+ *
+ * @cookie:    An opaque token provided by adreno-smmu and passed
+ * back into the callbacks
+ * @get_ttbr1_cfg: Get the TTBR1 config for the GPU's context bank
+ * @set_ttbr0_cfg: Set the TTBR0 config for the GPU's context bank.  A
+ * NULL config disables TTBR0 translation, otherwise
+ * TTBR0 translation is enabled with the specified cfg
+ *
+ * The GPU driver (drm/msm) and adreno-smmu work together for controlling
+ * the GPU's SMMU instance.  This is by necessity, as the GPU is directly
+ * updating the SMMU for context switches, while on the other hand we do
+ * not want to duplicate all of the initial setup logic from arm-smmu.
+ *
+ * This private interface is used for the two drivers to coordinate.  The
+ * cookie and callback functions are populated when the GPU driver attaches
+ * its domain.
+ */
+struct adreno_smmu_priv {
+const void *cookie;
+const struct io_pgtable_cfg *(*get_ttbr1_cfg)(const void *cookie);
+int (*set_ttbr0_cfg)(const void *cookie, const struct io_pgtable_cfg *cfg);
+};
+
+#endif /* __ADRENO_SMMU_PRIV_H */
\ No newline at end of file
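For orientation, a hypothetical consumer sketch (not part of this patch)
of how drm/msm might use the interface once adreno-smmu has populated it
at attach time; variable names are illustrative:

	struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(&pdev->dev);
	const struct io_pgtable_cfg *ttbr1_cfg =
		adreno_smmu->get_ttbr1_cfg(adreno_smmu->cookie);
	struct io_pgtable_cfg ttbr0_cfg = *ttbr1_cfg;	/* clone, then tweak */
	int ret;

	/* Enable TTBR0 translation with a per-instance pagetable config... */
	ret = adreno_smmu->set_ttbr0_cfg(adreno_smmu->cookie, &ttbr0_cfg);

	/* ...and later disable it again by passing NULL. */
	ret = adreno_smmu->set_ttbr0_cfg(adreno_smmu->cookie, NULL);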
-- 
2.26.2



[PATCH 09/20] iommu/arm-smmu-qcom: Add implementation for the adreno GPU SMMU

2020-08-24 Thread Rob Clark
From: Jordan Crouse 

Add a special implementation for the SMMU attached to most Adreno GPU
targets, triggered by the qcom,adreno-smmu compatible string.

The new Adreno SMMU implementation will enable split pagetables
(TTBR1) for the domain attached to the GPU device (SID 0) and
hard-code it to context bank 0 so the GPU hardware can implement
per-instance pagetables.

Co-developed-by: Rob Clark 
Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 149 -
 drivers/iommu/arm/arm-smmu/arm-smmu.h  |   1 +
 3 files changed, 151 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
index 88f17cc33023..d199b4bff15d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
@@ -223,6 +223,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
of_device_is_compatible(np, "qcom,sm8250-smmu-500"))
return qcom_smmu_impl_init(smmu);
 
+   if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu"))
+   return qcom_adreno_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "marvell,ap806-smmu-500"))
smmu->impl = _mmu500_impl;
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index be4318044f96..5640d9960610 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2019, The Linux Foundation. All rights reserved.
  */
 
+#include <linux/adreno-smmu-priv.h>
 #include <linux/of_device.h>
 #include <linux/qcom_scm.h>
 
@@ -12,6 +13,132 @@ struct qcom_smmu {
struct arm_smmu_device smmu;
 };
 
+#define QCOM_ADRENO_SMMU_GPU_SID 0
+
+static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
+{
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+   int i;
+
+   /*
+* The GPU will always use SID 0 so that is a handy way to uniquely
+* identify it and configure it for per-instance pagetables
+*/
+   for (i = 0; i < fwspec->num_ids; i++) {
+   u16 sid = FIELD_GET(ARM_SMMU_SMR_ID, fwspec->ids[i]);
+
+   if (sid == QCOM_ADRENO_SMMU_GPU_SID)
+   return true;
+   }
+
+   return false;
+}
+
+static const struct io_pgtable_cfg *qcom_adreno_smmu_get_ttbr1_cfg(
+   const void *cookie)
+{
+   struct arm_smmu_domain *smmu_domain = (void *)cookie;
+   struct io_pgtable *pgtable =
+   io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+   return &pgtable->cfg;
+}
+
+/*
+ * Local implementation to configure TTBR0 with the specified pagetable config.
+ * The GPU driver will call this to enable TTBR0 when per-instance pagetables
+ * are active
+ */
+
+static int qcom_adreno_smmu_set_ttbr0_cfg(const void *cookie,
+   const struct io_pgtable_cfg *pgtbl_cfg)
+{
+   struct arm_smmu_domain *smmu_domain = (void *)cookie;
+   struct io_pgtable *pgtable = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+   struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+   struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx];
+
+   /* The domain must have split pagetables already enabled */
+   if (cb->tcr[0] & ARM_SMMU_TCR_EPD1)
+   return -EINVAL;
+
+   /* If the pagetable config is NULL, disable TTBR0 */
+   if (!pgtbl_cfg) {
+   /* Do nothing if it is already disabled */
+   if ((cb->tcr[0] & ARM_SMMU_TCR_EPD0))
+   return -EINVAL;
+
+   /* Set TCR to the original configuration */
+   cb->tcr[0] = arm_smmu_lpae_tcr(&pgtable->cfg);
+   cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
+   } else {
+   u32 tcr = cb->tcr[0];
+
+   /* Don't call this again if TTBR0 is already enabled */
+   if (!(cb->tcr[0] & ARM_SMMU_TCR_EPD0))
+   return -EINVAL;
+
+   tcr |= arm_smmu_lpae_tcr(pgtbl_cfg);
+   tcr &= ~(ARM_SMMU_TCR_EPD0 | ARM_SMMU_TCR_EPD1);
+
+   cb->tcr[0] = tcr;
+   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
+   }
+
+   arm_smmu_write_context_bank(smmu_domain->smmu, cb->cfg->cbndx);
+
+   return 0;
+}
+
+static int qcom_adreno_smmu_alloc_context_bank(struct arm_smmu_domain *smmu_domain,
+   struct device *dev, int start, int count)
+{
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   /*
+* Assign context bank 0 to the GPU device so the GPU hardware can
+* switch pagetables
+*/
+   if (qcom_adreno_smmu_is_gpu_device(dev)) {
+   start = 0;
+   count = 1;
+   } else {
+   start 

[PATCH 07/20] drm/msm: set adreno_smmu as gpu's drvdata

2020-08-24 Thread Rob Clark
From: Rob Clark 

This will be populated by adreno-smmu, to provide a way for coordinating
enabling/disabling TTBR0 translation.

Signed-off-by: Rob Clark 
Reviewed-by: Jordan Crouse 
---
 drivers/gpu/drm/msm/adreno/adreno_device.c | 2 --
 drivers/gpu/drm/msm/msm_gpu.c  | 2 +-
 drivers/gpu/drm/msm/msm_gpu.h  | 6 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 26664e1b30c0..58e03b20e1c7 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -417,8 +417,6 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
return PTR_ERR(gpu);
}
 
-   dev_set_drvdata(dev, gpu);
-
return 0;
 }
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 6aa9e04e52e7..806eb0957280 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -892,7 +892,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
gpu->gpu_cx = NULL;
 
gpu->pdev = pdev;
-   platform_set_drvdata(pdev, gpu);
+   platform_set_drvdata(pdev, &gpu->adreno_smmu);
 
msm_devfreq_init(gpu);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 8bda7beaed4b..f91b141add75 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -7,6 +7,7 @@
 #ifndef __MSM_GPU_H__
 #define __MSM_GPU_H__
 
+#include <linux/adreno-smmu-priv.h>
 #include <linux/clk.h>
 #include <linux/interconnect.h>
 #include <linux/pm_opp.h>
@@ -73,6 +74,8 @@ struct msm_gpu {
struct platform_device *pdev;
const struct msm_gpu_funcs *funcs;
 
+   struct adreno_smmu_priv adreno_smmu;
+
/* performance counters (hw & sw): */
spinlock_t perf_lock;
bool perfcntr_active;
@@ -143,7 +146,8 @@ struct msm_gpu {
 
 static inline struct msm_gpu *dev_to_gpu(struct device *dev)
 {
-   return dev_get_drvdata(dev);
+   struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(dev);
+   return container_of(adreno_smmu, struct msm_gpu, adreno_smmu);
 }
 
 /* It turns out that all targets use the same ringbuffer size */
-- 
2.26.2



[PATCH 00/20] iommu/arm-smmu + drm/msm: per-process GPU pgtables

2020-08-24 Thread Rob Clark
From: Rob Clark 

This series adds an Adreno SMMU implementation to arm-smmu to allow GPU hardware
pagetable switching.

The Adreno GPU has built-in capabilities to switch the TTBR0 pagetable during
runtime to allow each individual instance or application to have its own
pagetable.  In order to take advantage of the HW capabilities there are certain
requirements needed of the SMMU hardware.

This series adds support for an Adreno specific arm-smmu implementation. The new
implementation 1) ensures that the GPU domain is always assigned context bank 0,
2) enables split pagetable support (TTBR1) so that the instance specific
pagetable can be swapped while the global memory remains in place and 3) shares
the current pagetable configuration with the GPU driver to allow it to create
its own io-pgtable instances.

The series then adds the drm/msm code to enable these features. For targets that
support it, allocate new pagetables using the io-pgtable configuration shared by
the arm-smmu driver and swap them in during runtime.

This version of the series merges the previous patchset(s) [1] and [2]
with the following improvements:

v15: (Respin by Rob)
  - Adjust dt bindings to keep SoC specific compatible (Doug)
  - Add dts workaround for cheza fw limitation
  - Add missing 'select IOMMU_IO_PGTABLE' (Guenter)
v14: (Respin by Rob)
  - Minor update to 16/20 (only force ASID to zero in one place)
  - Addition of sc7180 dtsi patch.
v13: (Respin by Rob)
  - Switch to a private interface between adreno-smmu and GPU driver,
dropping the custom domain attr (Will Deacon)
  - Rework the SCTLR.HUPCF patch to add new fields in smmu_domain->cfg
rather than adding new impl hook (Will Deacon)
  - Drop for_each_cfg_sme() in favor of plain for() loop (Will Deacon)
  - Fix context refcnt'ing issue which was causing problems with GPU
crash recover stress testing.
  - Spiff up $debugfs/gem to show process information associated with
VMAs
v12:
  - Nitpick cleanups in gpu/drm/msm/msm_iommu.c (Rob Clark)
  - Reorg in gpu/drm/msm/msm_gpu.c (Rob Clark)
  - Use the default asid for the context bank so that iommu_tlb_flush_all works
  - Flush the UCHE after a page switch
  - Add the SCTLR.HUPCF patch at the end of the series
v11:
  - Add implementation specific get_attr/set_attr functions (per Rob Clark)
  - Fix context bank allocation (per Bjorn Andersson)
v10:
  - arm-smmu: add implementation hook to allocate context banks
  - arm-smmu: Match the GPU domain by stream ID instead of compatible string
  - arm-smmu: Make DOMAIN_ATTR_PGTABLE_CFG bi-directional. The leaf driver
queries the configuration to create a pagetable and then sends the newly
created configuration back to the smmu-driver to enable TTBR0
  - drm/msm: Add context reference counting for submissions
  - drm/msm: Use dummy functions to skip TLB operations on per-instance
pagetables

[1] https://lists.linuxfoundation.org/pipermail/iommu/2020-June/045653.html
[2] https://lists.linuxfoundation.org/pipermail/iommu/2020-June/045659.html

Jordan Crouse (12):
  iommu/arm-smmu: Pass io-pgtable config to implementation specific
function
  iommu/arm-smmu: Add support for split pagetables
  iommu/arm-smmu: Prepare for the adreno-smmu implementation
  iommu/arm-smmu-qcom: Add implementation for the adreno GPU SMMU
  dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU
  drm/msm: Add a context pointer to the submitqueue
  drm/msm: Drop context arg to gpu->submit()
  drm/msm: Set the global virtual address range from the IOMMU domain
  drm/msm: Add support to create a local pagetable
  drm/msm: Add support for private address space instances
  drm/msm/a6xx: Add support for per-instance pagetables
  arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU

Rob Clark (8):
  drm/msm: remove dangling submitqueue references
  iommu: add private interface for adreno-smmu
  drm/msm/gpu: add dev_to_gpu() helper
  drm/msm: set adreno_smmu as gpu's drvdata
  iommu/arm-smmu: constify some helpers
  arm: dts: qcom: sc7180: Set the compatible string for the GPU SMMU
  iommu/arm-smmu: add a way for implementations to influence SCTLR
  drm/msm: show process names in gem_describe

 .../devicetree/bindings/iommu/arm,smmu.yaml   |   9 +-
 arch/arm64/boot/dts/qcom/sc7180.dtsi  |   2 +-
 arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi|   9 +
 arch/arm64/boot/dts/qcom/sdm845.dtsi  |   2 +-
 drivers/gpu/drm/msm/Kconfig   |   1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c |  12 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  68 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   1 +
 drivers/gpu/drm/msm/adreno/adreno_device.c|  12 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   |  18 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |   3 +-
 drivers/gpu/drm/msm/msm_drv.c |  16 +-
 drivers/gpu/drm/msm/msm_drv.h |  25 +++
 drivers/gpu/drm/msm/msm_gem.c |  25 ++-
 

[PATCH 11/20] drm/msm: Add a context pointer to the submitqueue

2020-08-24 Thread Rob Clark
From: Jordan Crouse 

Each submitqueue is attached to a context. Add a pointer to the
context to the submitqueue at create time and refcount it so
that it stays around through the life of the queue.

Co-developed-by: Rob Clark 
Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_drv.c |  3 ++-
 drivers/gpu/drm/msm/msm_drv.h | 20 
 drivers/gpu/drm/msm/msm_gem.h |  1 +
 drivers/gpu/drm/msm/msm_gem_submit.c  |  6 +++---
 drivers/gpu/drm/msm/msm_gpu.h |  1 +
 drivers/gpu/drm/msm/msm_submitqueue.c |  3 +++
 6 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 79333842f70a..75cd7639f560 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -594,6 +594,7 @@ static int context_init(struct drm_device *dev, struct 
drm_file *file)
if (!ctx)
return -ENOMEM;
 
+   kref_init(&ctx->ref);
msm_submitqueue_init(dev, ctx);
 
ctx->aspace = priv->gpu ? priv->gpu->aspace : NULL;
@@ -615,7 +616,7 @@ static int msm_open(struct drm_device *dev, struct drm_file *file)
 static void context_close(struct msm_file_private *ctx)
 {
msm_submitqueue_close(ctx);
-   kfree(ctx);
+   msm_file_private_put(ctx);
 }
 
 static void msm_postclose(struct drm_device *dev, struct drm_file *file)
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index af259b0573ea..4561bfb5e745 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -57,6 +57,7 @@ struct msm_file_private {
struct list_head submitqueues;
int queueid;
struct msm_gem_address_space *aspace;
+   struct kref ref;
 };
 
 enum msm_mdp_plane_property {
@@ -428,6 +429,25 @@ void msm_submitqueue_close(struct msm_file_private *ctx);
 
 void msm_submitqueue_destroy(struct kref *kref);
 
+static inline void __msm_file_private_destroy(struct kref *kref)
+{
+   struct msm_file_private *ctx = container_of(kref,
+   struct msm_file_private, ref);
+
+   kfree(ctx);
+}
+
+static inline void msm_file_private_put(struct msm_file_private *ctx)
+{
+   kref_put(&ctx->ref, __msm_file_private_destroy);
+}
+
+static inline struct msm_file_private *msm_file_private_get(
+   struct msm_file_private *ctx)
+{
+   kref_get(&ctx->ref);
+   return ctx;
+}
 
 #define DBG(fmt, ...) DRM_DEBUG_DRIVER(fmt"\n", ##__VA_ARGS__)
 #define VERB(fmt, ...) if (0) DRM_DEBUG_DRIVER(fmt"\n", ##__VA_ARGS__)
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 972490b14ba5..9c573c4269cb 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -142,6 +142,7 @@ struct msm_gem_submit {
bool valid; /* true if no cmdstream patching needed */
bool in_rb; /* "sudo" mode, copy cmds into RB */
struct msm_ringbuffer *ring;
+   struct msm_file_private *ctx;
unsigned int nr_cmds;
unsigned int nr_bos;
u32 ident; /* A "identifier" for the submit for logging */
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 8cb9aa15ff90..1464b04d25d3 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -27,7 +27,7 @@
 #define BO_PINNED   0x2000
 
 static struct msm_gem_submit *submit_create(struct drm_device *dev,
-   struct msm_gpu *gpu, struct msm_gem_address_space *aspace,
+   struct msm_gpu *gpu,
struct msm_gpu_submitqueue *queue, uint32_t nr_bos,
uint32_t nr_cmds)
 {
@@ -43,7 +43,7 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
return NULL;
 
submit->dev = dev;
-   submit->aspace = aspace;
+   submit->aspace = queue->ctx->aspace;
submit->gpu = gpu;
submit->fence = NULL;
submit->cmd = (void *)&submit->bos[nr_bos];
@@ -677,7 +677,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
}
}
 
-   submit = submit_create(dev, gpu, ctx->aspace, queue, args->nr_bos,
+   submit = submit_create(dev, gpu, queue, args->nr_bos,
args->nr_cmds);
if (!submit) {
ret = -ENOMEM;
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index f91b141add75..97c527e98391 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -190,6 +190,7 @@ struct msm_gpu_submitqueue {
u32 flags;
u32 prio;
int faults;
+   struct msm_file_private *ctx;
struct list_head node;
struct kref ref;
 };
diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c 
b/drivers/gpu/drm/msm/msm_submitqueue.c
index 90c9d84e6155..c3d206105d28 100644
--- a/drivers/gpu/drm/msm/msm_submitqueue.c
+++ b/drivers/gpu/drm/msm/msm_submitqueue.c
@@ -12,6 +12,8 @@ void 

[PATCH 13/20] drm/msm: Set the global virtual address range from the IOMMU domain

2020-08-24 Thread Rob Clark
From: Jordan Crouse 

Use the aperture settings from the IOMMU domain to set up the virtual
address range for the GPU. This allows us to transparently deal with
IOMMU side features (like split pagetables).

Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
---
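[To make the arithmetic concrete: a sketch with ias = 48 and the TTBR1
layout selected by the split-pagetable patch in this series. The numeric
values are illustrative, not taken from the patch.]

    /* sketch: a TTBR1 domain reports a sign-extended upper aperture */
    u64 start, size;

    start = max_t(u64, SZ_16M, iommu->geometry.aperture_start);
                            /* = 0xffff000000000000 for ias = 48 */
    size = iommu->geometry.aperture_end - start + 1;
                            /* = ~0ULL - start + 1
                               = 0x0001000000000000 (48 bits of VA) */

    /* drm_mm manages the un-extended low 49 bits... */
    start &= GENMASK(48, 0);        /* = 0x0001000000000000 */

    /* ...and bit 48 is sign-extended again before the SMMU sees it */
    u64 iova = start;
    if (iova & BIT_ULL(48))
            iova |= GENMASK_ULL(63, 49);    /* back to 0xffff0... */
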
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 13 +++--
 drivers/gpu/drm/msm/msm_iommu.c |  7 +++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 533a34b4cce2..34e6242c1767 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -192,9 +192,18 @@ adreno_iommu_create_address_space(struct msm_gpu *gpu,
struct iommu_domain *iommu = iommu_domain_alloc(_bus_type);
struct msm_mmu *mmu = msm_iommu_new(>dev, iommu);
struct msm_gem_address_space *aspace;
+   u64 start, size;
 
-   aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
-   0xffffffff - SZ_16M);
+   /*
+* Use the aperture start or SZ_16M, whichever is greater. This will
+* ensure that we align with the allocated pagetable range while still
+* allowing room in the lower 32 bits for GMEM and whatnot
+*/
+   start = max_t(u64, SZ_16M, iommu->geometry.aperture_start);
+   size = iommu->geometry.aperture_end - start + 1;
+
+   aspace = msm_gem_address_space_create(mmu, "gpu",
+   start & GENMASK(48, 0), size);
 
if (IS_ERR(aspace) && !IS_ERR(mmu))
mmu->funcs->destroy(mmu);
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 3a381a9674c9..1b6635504069 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -36,6 +36,10 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
struct msm_iommu *iommu = to_msm_iommu(mmu);
size_t ret;
 
+   /* The arm-smmu driver expects the addresses to be sign extended */
+   if (iova & BIT_ULL(48))
+   iova |= GENMASK_ULL(63, 49);
+
ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
WARN_ON(!ret);
 
@@ -46,6 +50,9 @@ static int msm_iommu_unmap(struct msm_mmu *mmu, uint64_t iova, size_t len)
 {
struct msm_iommu *iommu = to_msm_iommu(mmu);
 
+   if (iova & BIT_ULL(48))
+   iova |= GENMASK_ULL(63, 49);
+
iommu_unmap(iommu->domain, iova, len);
 
return 0;
-- 
2.26.2



[PATCH 14/20] drm/msm: Add support to create a local pagetable

2020-08-24 Thread Rob Clark
From: Jordan Crouse 

Add support to create a io-pgtable for use by targets that support
per-instance pagetables. In order to support per-instance pagetables the
GPU SMMU device needs to have the qcom,adreno-smmu compatible string and
split pagetables enabled.

Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
---
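[The allocation at the core of the new msm_iommu_pagetable_create() boils
down to the io-pgtable call sketched below. Here "ttbr1_cfg" stands in for
the config fetched from the parent SMMU through the adreno_smmu_priv
get_ttbr1_cfg() hook and "cookie" is an opaque caller token, so treat this
as an illustration of the mechanism rather than the literal patch body.]

    /* sketch: allocate a stage-1 LPAE table for one GPU context */
    struct io_pgtable_cfg cfg = *ttbr1_cfg;
    struct io_pgtable_ops *ops;

    /* the per-instance table is installed in TTBR0, not TTBR1 */
    cfg.quirks &= ~IO_PGTABLE_QUIRK_ARM_TTBR1;
    /* (a no-op tlb_ops would go here; flushes go via the parent domain) */

    ops = alloc_io_pgtable_ops(ARM_64_LPAE_S1, &cfg, cookie);
    if (!ops)
            return ERR_PTR(-ENOMEM);

    /* the hardware TTBR0 value handed to CP_SMMU_TABLE_UPDATE later */
    phys_addr_t ttbr = cfg.arm_lpae_s1_cfg.ttbr;
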
 drivers/gpu/drm/msm/Kconfig  |   1 +
 drivers/gpu/drm/msm/msm_gpummu.c |   2 +-
 drivers/gpu/drm/msm/msm_iommu.c  | 199 ++-
 drivers/gpu/drm/msm/msm_mmu.h|  16 ++-
 4 files changed, 215 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig
index 6deaa7d01654..5102a58830b9 100644
--- a/drivers/gpu/drm/msm/Kconfig
+++ b/drivers/gpu/drm/msm/Kconfig
@@ -8,6 +8,7 @@ config DRM_MSM
depends on MMU
depends on INTERCONNECT || !INTERCONNECT
depends on QCOM_OCMEM || QCOM_OCMEM=n
+   select IOMMU_IO_PGTABLE
select QCOM_MDT_LOADER if ARCH_QCOM
select REGULATOR
select DRM_KMS_HELPER
diff --git a/drivers/gpu/drm/msm/msm_gpummu.c b/drivers/gpu/drm/msm/msm_gpummu.c
index 310a31b05faa..aab121f4beb7 100644
--- a/drivers/gpu/drm/msm/msm_gpummu.c
+++ b/drivers/gpu/drm/msm/msm_gpummu.c
@@ -102,7 +102,7 @@ struct msm_mmu *msm_gpummu_new(struct device *dev, struct msm_gpu *gpu)
}
 
gpummu->gpu = gpu;
-   msm_mmu_init(&gpummu->base, dev, &funcs);
+   msm_mmu_init(&gpummu->base, dev, &funcs, MSM_MMU_GPUMMU);
 
return &gpummu->base;
 }
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 1b6635504069..697cc0a059d6 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -4,15 +4,210 @@
  * Author: Rob Clark 
  */
 
+#include <linux/adreno-smmu-priv.h>
+#include <linux/io-pgtable.h>
 #include "msm_drv.h"
 #include "msm_mmu.h"
 
 struct msm_iommu {
struct msm_mmu base;
struct iommu_domain *domain;
+   atomic_t pagetables;
 };
+
 #define to_msm_iommu(x) container_of(x, struct msm_iommu, base)
 
+struct msm_iommu_pagetable {
+   struct msm_mmu base;
+   struct msm_mmu *parent;
+   struct io_pgtable_ops *pgtbl_ops;
+   phys_addr_t ttbr;
+   u32 asid;
+};
+static struct msm_iommu_pagetable *to_pagetable(struct msm_mmu *mmu)
+{
+   return container_of(mmu, struct msm_iommu_pagetable, base);
+}
+
+static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
+   size_t size)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+   struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
+   size_t unmapped = 0;
+
+   /* Unmap the block one page at a time */
+   while (size) {
+   unmapped += ops->unmap(ops, iova, 4096, NULL);
+   iova += 4096;
+   size -= 4096;
+   }
+
+   iommu_flush_tlb_all(to_msm_iommu(pagetable->parent)->domain);
+
+   return (unmapped == size) ? 0 : -EINVAL;
+}
+
+static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
+   struct sg_table *sgt, size_t len, int prot)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+   struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
+   struct scatterlist *sg;
+   size_t mapped = 0;
+   u64 addr = iova;
+   unsigned int i;
+
+   for_each_sg(sgt->sgl, sg, sgt->nents, i) {
+   size_t size = sg->length;
+   phys_addr_t phys = sg_phys(sg);
+
+   /* Map the block one page at a time */
+   while (size) {
+   if (ops->map(ops, addr, phys, 4096, prot, GFP_KERNEL)) {
+   msm_iommu_pagetable_unmap(mmu, iova, mapped);
+   return -EINVAL;
+   }
+
+   phys += 4096;
+   addr += 4096;
+   size -= 4096;
+   mapped += 4096;
+   }
+   }
+
+   return 0;
+}
+
+static void msm_iommu_pagetable_destroy(struct msm_mmu *mmu)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+   struct msm_iommu *iommu = to_msm_iommu(pagetable->parent);
+   struct adreno_smmu_priv *adreno_smmu =
+   dev_get_drvdata(pagetable->parent->dev);
+
+   /*
+* If this is the last attached pagetable for the parent,
+* disable TTBR0 in the arm-smmu driver
+*/
+   if (atomic_dec_return(&iommu->pagetables) == 0)
+   adreno_smmu->set_ttbr0_cfg(adreno_smmu->cookie, NULL);
+
+   free_io_pgtable_ops(pagetable->pgtbl_ops);
+   kfree(pagetable);
+}
+
+int msm_iommu_pagetable_params(struct msm_mmu *mmu,
+   phys_addr_t *ttbr, int *asid)
+{
+   struct msm_iommu_pagetable *pagetable;
+
+   if (mmu->type != MSM_MMU_IOMMU_PAGETABLE)
+   return -EINVAL;
+
+   pagetable = to_pagetable(mmu);
+
+   if (ttbr)
+   *ttbr = pagetable->ttbr;
+
+   if (asid)
+   *asid = pagetable->asid;
+
+   return 0;

[PATCH 19/20] iommu/arm-smmu: add a way for implementations to influence SCTLR

2020-08-24 Thread Rob Clark
From: Rob Clark 

For the Adreno GPU's SMMU, we want SCTLR.HUPCF set to ensure that
pending translations are not terminated on iova fault.  Otherwise
a terminated CP read could hang the GPU by returning invalid
command-stream data.

Signed-off-by: Rob Clark 
---
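[SCTLR.HUPCF ("hit under previous context fault") lets transactions that
still translate successfully proceed while a context fault is outstanding;
with it clear, as the commit message notes, one bad iova can terminate
unrelated CP fetches. A sketch of how the two new fields land in
arm_smmu_write_context_bank(); the base SCTLR value is abbreviated and
illustrative.]

    u32 reg = ARM_SMMU_SCTLR_CFIE | ARM_SMMU_SCTLR_CFRE | ARM_SMMU_SCTLR_M;

    reg |= cfg->sctlr_set;      /* the Adreno impl ORs in ARM_SMMU_SCTLR_HUPCF */
    reg &= ~cfg->sctlr_clr;     /* unused by this series, reserved for impls */
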
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 6 ++
 drivers/iommu/arm/arm-smmu/arm-smmu.c  | 3 +++
 drivers/iommu/arm/arm-smmu/arm-smmu.h  | 3 +++
 3 files changed, 12 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 5640d9960610..2aa6249050ff 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -127,6 +127,12 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
(smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64))
pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
 
+   /*
+* On the GPU device we want to process subsequent transactions after a
+* fault to keep the GPU from hanging
+*/
+   smmu_domain->cfg.sctlr_set |= ARM_SMMU_SCTLR_HUPCF;
+
/*
 * Initialize private interface with GPU:
 */
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index e63a480d7f71..bbec5793faf8 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -617,6 +617,9 @@ void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
reg |= ARM_SMMU_SCTLR_E;
 
+   reg |= cfg->sctlr_set;
+   reg &= ~cfg->sctlr_clr;
+
arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
 }
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index cd75a33967bb..2df3a70a8a41 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -144,6 +144,7 @@ enum arm_smmu_cbar_type {
 #define ARM_SMMU_CB_SCTLR  0x0
 #define ARM_SMMU_SCTLR_S1_ASIDPNE  BIT(12)
 #define ARM_SMMU_SCTLR_CFCFG   BIT(7)
+#define ARM_SMMU_SCTLR_HUPCF   BIT(8)
 #define ARM_SMMU_SCTLR_CFIEBIT(6)
 #define ARM_SMMU_SCTLR_CFREBIT(5)
 #define ARM_SMMU_SCTLR_E   BIT(4)
@@ -341,6 +342,8 @@ struct arm_smmu_cfg {
u16 asid;
u16 vmid;
};
+   u32 sctlr_set;  /* extra bits to set in SCTLR */
+   u32 sctlr_clr;  /* bits to mask in SCTLR */
enum arm_smmu_cbar_type cbar;
enum arm_smmu_context_fmt   fmt;
 };
-- 
2.26.2



[PATCH 12/20] drm/msm: Drop context arg to gpu->submit()

2020-08-24 Thread Rob Clark
From: Jordan Crouse 

Now that we can get the ctx from the submitqueue, the extra arg is
redundant.

Signed-off-by: Jordan Crouse 
[split out of previous patch to reduce churny noise]
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   | 12 +---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  5 ++---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  5 ++---
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  3 +--
 drivers/gpu/drm/msm/msm_gem_submit.c|  2 +-
 drivers/gpu/drm/msm/msm_gpu.c   |  9 -
 drivers/gpu/drm/msm/msm_gpu.h   |  6 ++
 7 files changed, 17 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 9e63a190642c..eff2439ea57b 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -43,8 +43,7 @@ static void a5xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
gpu_write(gpu, REG_A5XX_CP_RB_WPTR, wptr);
 }
 
-static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct msm_gem_submit *submit,
-   struct msm_file_private *ctx)
+static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 {
struct msm_drm_private *priv = gpu->dev->dev_private;
struct msm_ringbuffer *ring = submit->ring;
@@ -57,7 +56,7 @@ static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct msm_gem_submit *submit
case MSM_SUBMIT_CMD_IB_TARGET_BUF:
break;
case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
-   if (priv->lastctx == ctx)
+   if (priv->lastctx == submit->queue->ctx)
break;
/* fall-thru */
case MSM_SUBMIT_CMD_BUF:
@@ -103,8 +102,7 @@ static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct msm_gem_submit *submit
msm_gpu_retire(gpu);
 }
 
-static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
-   struct msm_file_private *ctx)
+static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 {
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
@@ -114,7 +112,7 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 
if (IS_ENABLED(CONFIG_DRM_MSM_GPU_SUDO) && submit->in_rb) {
priv->lastctx = NULL;
-   a5xx_submit_in_rb(gpu, submit, ctx);
+   a5xx_submit_in_rb(gpu, submit);
return;
}
 
@@ -148,7 +146,7 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
case MSM_SUBMIT_CMD_IB_TARGET_BUF:
break;
case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
-   if (priv->lastctx == ctx)
+   if (priv->lastctx == submit->queue->ctx)
break;
/* fall-thru */
case MSM_SUBMIT_CMD_BUF:
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index c5a3e4d4c007..5eabb0109577 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -81,8 +81,7 @@ static void get_stats_counter(struct msm_ringbuffer *ring, u32 counter,
OUT_RING(ring, upper_32_bits(iova));
 }
 
-static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
-   struct msm_file_private *ctx)
+static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 {
unsigned int index = submit->seqno % MSM_GPU_SUBMIT_STATS_COUNT;
struct msm_drm_private *priv = gpu->dev->dev_private;
@@ -115,7 +114,7 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
case MSM_SUBMIT_CMD_IB_TARGET_BUF:
break;
case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
-   if (priv->lastctx == ctx)
+   if (priv->lastctx == submit->queue->ctx)
break;
/* fall-thru */
case MSM_SUBMIT_CMD_BUF:
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index d2dbb6968cba..533a34b4cce2 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -457,8 +457,7 @@ void adreno_recover(struct msm_gpu *gpu)
}
 }
 
-void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
-   struct msm_file_private *ctx)
+void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 {
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
struct msm_drm_private *priv = gpu->dev->dev_private;
@@ -472,7 +471,7 @@ void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
break;
case 

[PATCH 18/20] arm: dts: qcom: sc7180: Set the compatible string for the GPU SMMU

2020-08-24 Thread Rob Clark
From: Rob Clark 

Set the qcom,adreno-smmu compatible string for the GPU SMMU to enable
split pagetables and per-instance pagetables for drm/msm.

Signed-off-by: Rob Clark 
---
 arch/arm64/boot/dts/qcom/sc7180.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/sc7180.dtsi b/arch/arm64/boot/dts/qcom/sc7180.dtsi
index d46b3833e52f..f3bef1cad889 100644
--- a/arch/arm64/boot/dts/qcom/sc7180.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7180.dtsi
@@ -1937,7 +1937,7 @@ opp-18000 {
};
 
adreno_smmu: iommu@5040000 {
-   compatible = "qcom,sc7180-smmu-v2", "qcom,smmu-v2";
+   compatible = "qcom,sc7180-smmu-v2", "qcom,adreno-smmu", "qcom,smmu-v2";
reg = <0 0x05040000 0 0x10000>;
#iommu-cells = <1>;
#global-interrupts = <2>;
-- 
2.26.2



[PATCH 17/20] arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU

2020-08-24 Thread Rob Clark
From: Jordan Crouse 

Set the qcom,adreno-smmu compatible string for the GPU SMMU to enable
split pagetables and per-instance pagetables for drm/msm.

Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
---
 arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi | 9 +
 arch/arm64/boot/dts/qcom/sdm845.dtsi   | 2 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi b/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
index 64fc1bfd66fa..39f23cdcbd02 100644
--- a/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
@@ -633,6 +633,15 @@ &mdss_mdp {
status = "okay";
 };
 
+/*
+ * Cheza fw does not properly program the GPU aperture to allow the
+ * GPU to update the SMMU pagetables for context switches.  Work
+ * around this by dropping the "qcom,adreno-smmu" compat string.
+ */
+&adreno_smmu {
+   compatible = "qcom,sdm845-smmu-v2", "qcom,smmu-v2";
+};
+
 &mss_pil {
	iommus = <&apps_smmu 0x781 0x0>,
		 <&apps_smmu 0x724 0x3>;
diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 2884577dcb77..76a8a34640ae 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -4058,7 +4058,7 @@ opp-25700 {
};
 
adreno_smmu: iommu@5040000 {
-   compatible = "qcom,sdm845-smmu-v2", "qcom,smmu-v2";
+   compatible = "qcom,sdm845-smmu-v2", "qcom,adreno-smmu", "qcom,smmu-v2";
reg = <0 0x05040000 0 0x10000>;
#iommu-cells = <1>;
#global-interrupts = <2>;
-- 
2.26.2



[PATCH 15/20] drm/msm: Add support for private address space instances

2020-08-24 Thread Rob Clark
From: Jordan Crouse 

Add support for allocating private address space instances. Targets that
support per-context pagetables should implement their own function to
allocate private address spaces.

The default will return a pointer to the global address space.

Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
---
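[A sketch of what a target-specific hook looks like; it mirrors the a6xx
implementation added later in this series, with "mygpu_" as a hypothetical
target prefix and msm_iommu_pagetable_create() coming from the earlier
per-instance pagetable patch.]

    static struct msm_gem_address_space *
    mygpu_create_private_address_space(struct msm_gpu *gpu)
    {
            /* clone a pagetable under the GPU's parent MMU */
            struct msm_mmu *mmu = msm_iommu_pagetable_create(gpu->aspace->mmu);

            if (IS_ERR(mmu))
                    return ERR_CAST(mmu);

            /* 4GB of private VA starting at 4GB, one per drm_file */
            return msm_gem_address_space_create(mmu, "gpu",
                            0x100000000ULL, 0x100000000ULL);
    }
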
 drivers/gpu/drm/msm/msm_drv.c | 13 +++--
 drivers/gpu/drm/msm/msm_drv.h |  5 +
 drivers/gpu/drm/msm/msm_gem_vma.c |  9 +
 drivers/gpu/drm/msm/msm_gpu.c | 22 ++
 drivers/gpu/drm/msm/msm_gpu.h |  5 +
 5 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 75cd7639f560..7e963f707852 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -597,7 +597,7 @@ static int context_init(struct drm_device *dev, struct drm_file *file)
kref_init(>ref);
msm_submitqueue_init(dev, ctx);
 
-   ctx->aspace = priv->gpu ? priv->gpu->aspace : NULL;
+   ctx->aspace = msm_gpu_create_private_address_space(priv->gpu);
file->driver_priv = ctx;
 
return 0;
@@ -780,18 +780,19 @@ static int msm_ioctl_gem_cpu_fini(struct drm_device *dev, void *data,
 }
 
 static int msm_ioctl_gem_info_iova(struct drm_device *dev,
-   struct drm_gem_object *obj, uint64_t *iova)
+   struct drm_file *file, struct drm_gem_object *obj,
+   uint64_t *iova)
 {
-   struct msm_drm_private *priv = dev->dev_private;
+   struct msm_file_private *ctx = file->driver_priv;
 
-   if (!priv->gpu)
+   if (!ctx->aspace)
return -EINVAL;
 
/*
 * Don't pin the memory here - just get an address so that userspace can
 * be productive
 */
-   return msm_gem_get_iova(obj, priv->gpu->aspace, iova);
+   return msm_gem_get_iova(obj, ctx->aspace, iova);
 }
 
 static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
@@ -830,7 +831,7 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
args->value = msm_gem_mmap_offset(obj);
break;
case MSM_INFO_GET_IOVA:
-   ret = msm_ioctl_gem_info_iova(dev, obj, &args->value);
+   ret = msm_ioctl_gem_info_iova(dev, file, obj, &args->value);
break;
case MSM_INFO_SET_NAME:
/* length check should leave room for terminating null: */
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 4561bfb5e745..2ca9c3c03845 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -249,6 +249,10 @@ int msm_gem_map_vma(struct msm_gem_address_space *aspace,
 void msm_gem_close_vma(struct msm_gem_address_space *aspace,
struct msm_gem_vma *vma);
 
+
+struct msm_gem_address_space *
+msm_gem_address_space_get(struct msm_gem_address_space *aspace);
+
 void msm_gem_address_space_put(struct msm_gem_address_space *aspace);
 
 struct msm_gem_address_space *
@@ -434,6 +438,7 @@ static inline void __msm_file_private_destroy(struct kref *kref)
struct msm_file_private *ctx = container_of(kref,
struct msm_file_private, ref);
 
+   msm_gem_address_space_put(ctx->aspace);
kfree(ctx);
 }
 
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c b/drivers/gpu/drm/msm/msm_gem_vma.c
index 5f6a11211b64..29cc1305cf37 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -27,6 +27,15 @@ void msm_gem_address_space_put(struct msm_gem_address_space *aspace)
kref_put(&aspace->kref, msm_gem_address_space_destroy);
 }
 
+struct msm_gem_address_space *
+msm_gem_address_space_get(struct msm_gem_address_space *aspace)
+{
+   if (!IS_ERR_OR_NULL(aspace))
+   kref_get(&aspace->kref);
+
+   return aspace;
+}
+
 /* Actually unmap memory for the vma */
 void msm_gem_purge_vma(struct msm_gem_address_space *aspace,
struct msm_gem_vma *vma)
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index e1a3cbe25a0c..951850804d77 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -823,6 +823,28 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
return 0;
 }
 
+/* Return a new address space for a msm_drm_private instance */
+struct msm_gem_address_space *
+msm_gpu_create_private_address_space(struct msm_gpu *gpu)
+{
+   struct msm_gem_address_space *aspace = NULL;
+
+   if (!gpu)
+   return NULL;
+
+   /*
+* If the target doesn't support private address spaces then return
+* the global one
+*/
+   if (gpu->funcs->create_private_address_space)
+   aspace = gpu->funcs->create_private_address_space(gpu);
+
+   if (IS_ERR_OR_NULL(aspace))
+   aspace = msm_gem_address_space_get(gpu->aspace);
+
+   return aspace;
+}
+
 int msm_gpu_init(struct 

[PATCH 20/20] drm/msm: show process names in gem_describe

2020-08-24 Thread Rob Clark
From: Rob Clark 

In $debugfs/gem we already show any vma(s) associated with an object.
Also show process names if the vma's address space is a per-process
address space.

Signed-off-by: Rob Clark 
Reviewed-by: Jordan Crouse 
---
 drivers/gpu/drm/msm/msm_drv.c |  2 +-
 drivers/gpu/drm/msm/msm_gem.c | 25 +
 drivers/gpu/drm/msm/msm_gem.h |  5 +
 drivers/gpu/drm/msm/msm_gem_vma.c |  1 +
 drivers/gpu/drm/msm/msm_gpu.c |  8 +---
 drivers/gpu/drm/msm/msm_gpu.h |  2 +-
 6 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 7e963f707852..7143756b7e83 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -597,7 +597,7 @@ static int context_init(struct drm_device *dev, struct drm_file *file)
kref_init(>ref);
msm_submitqueue_init(dev, ctx);
 
-   ctx->aspace = msm_gpu_create_private_address_space(priv->gpu);
+   ctx->aspace = msm_gpu_create_private_address_space(priv->gpu, current);
file->driver_priv = ctx;
 
return 0;
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 3cb7aeb93fd3..76a6c5271e57 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -842,11 +842,28 @@ void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
 
seq_puts(m, "  vmas:");
 
-   list_for_each_entry(vma, &msm_obj->vmas, list)
-   seq_printf(m, " [%s: %08llx,%s,inuse=%d]",
-   vma->aspace != NULL ? vma->aspace->name : NULL,
-   vma->iova, vma->mapped ? "mapped" : "unmapped",
+   list_for_each_entry(vma, &msm_obj->vmas, list) {
+   const char *name, *comm;
+   if (vma->aspace) {
+   struct msm_gem_address_space *aspace = vma->aspace;
+   struct task_struct *task =
+   get_pid_task(aspace->pid, PIDTYPE_PID);
+   if (task) {
+   comm = kstrdup(task->comm, GFP_KERNEL);
+   } else {
+   comm = NULL;
+   }
+   name = aspace->name;
+   } else {
+   name = comm = NULL;
+   }
+   seq_printf(m, " [%s%s%s: aspace=%p, %08llx,%s,inuse=%d]",
+   name, comm ? ":" : "", comm ? comm : "",
+   vma->aspace, vma->iova,
+   vma->mapped ? "mapped" : "unmapped",
vma->inuse);
+   kfree(comm);
+   }
 
seq_puts(m, "\n");
}
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 9c573c4269cb..7b1c7a5f8eef 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -24,6 +24,11 @@ struct msm_gem_address_space {
spinlock_t lock; /* Protects drm_mm node allocation/removal */
struct msm_mmu *mmu;
struct kref kref;
+
+   /* For address spaces associated with a specific process, this
+* will be non-NULL:
+*/
+   struct pid *pid;
 };
 
 struct msm_gem_vma {
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c b/drivers/gpu/drm/msm/msm_gem_vma.c
index 29cc1305cf37..80a8a266d68f 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -17,6 +17,7 @@ msm_gem_address_space_destroy(struct kref *kref)
drm_mm_takedown(&aspace->mm);
if (aspace->mmu)
aspace->mmu->funcs->destroy(aspace->mmu);
+   put_pid(aspace->pid);
kfree(aspace);
 }
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 951850804d77..ac8961187a73 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -825,10 +825,9 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
 
 /* Return a new address space for a msm_drm_private instance */
 struct msm_gem_address_space *
-msm_gpu_create_private_address_space(struct msm_gpu *gpu)
+msm_gpu_create_private_address_space(struct msm_gpu *gpu, struct task_struct *task)
 {
struct msm_gem_address_space *aspace = NULL;
-
if (!gpu)
return NULL;
 
@@ -836,8 +835,11 @@ msm_gpu_create_private_address_space(struct msm_gpu *gpu)
 * If the target doesn't support private address spaces then return
 * the global one
 */
-   if (gpu->funcs->create_private_address_space)
+   if (gpu->funcs->create_private_address_space) {
aspace = gpu->funcs->create_private_address_space(gpu);
+   if (!IS_ERR(aspace))
+   

[PATCH 10/20] dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU

2020-08-24 Thread Rob Clark
From: Jordan Crouse 

Every Qcom Adreno GPU has an embedded SMMU for its own use. These
devices depend on unique features such as split pagetables,
different stall/halt requirements and other settings. Identify them
with a compatible string so that they can be identified in the
arm-smmu implementation specific code.

Signed-off-by: Jordan Crouse 
Reviewed-by: Rob Herring 
Signed-off-by: Rob Clark 
---
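[Concretely, a GPU SMMU node opting in lists the new compatible between
the SoC-specific and generic entries; a sketch based on the sdm845 update
later in this series, with the properties trimmed for brevity:]

    adreno_smmu: iommu@5040000 {
            compatible = "qcom,sdm845-smmu-v2", "qcom,adreno-smmu",
                         "qcom,smmu-v2";
            reg = <0 0x05040000 0 0x10000>;
            #iommu-cells = <1>;
            #global-interrupts = <2>;
            /* clocks, interrupts, etc. omitted in this sketch */
    };
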
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 503160a7b9a0..3b63f2ae24db 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -28,8 +28,6 @@ properties:
           - enum:
               - qcom,msm8996-smmu-v2
               - qcom,msm8998-smmu-v2
-              - qcom,sc7180-smmu-v2
-              - qcom,sdm845-smmu-v2
           - const: qcom,smmu-v2
 
       - description: Qcom SoCs implementing "arm,mmu-500"
@@ -40,6 +38,13 @@ properties:
               - qcom,sm8150-smmu-500
               - qcom,sm8250-smmu-500
           - const: arm,mmu-500
+      - description: Qcom Adreno GPUs implementing "arm,smmu-v2"
+        items:
+          - enum:
+              - qcom,sc7180-smmu-v2
+              - qcom,sdm845-smmu-v2
+          - const: qcom,adreno-smmu
+          - const: qcom,smmu-v2
       - description: Marvell SoCs implementing "arm,mmu-500"
         items:
           - const: marvell,ap806-smmu-500
-- 
2.26.2



[PATCH 03/20] iommu/arm-smmu: Add support for split pagetables

2020-08-24 Thread Rob Clark
From: Jordan Crouse 

Enable TTBR1 for a context bank if IO_PGTABLE_QUIRK_ARM_TTBR1 is selected
by the io-pgtable configuration.

Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
---
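[The geometry change is easiest to see with numbers; a sketch for ias = 48,
with the values purely illustrative:]

    /* TTBR0 walks (default): the IOVA space grows up from zero */
    domain->geometry.aperture_start = 0;
    domain->geometry.aperture_end   = (1UL << ias) - 1;
                                    /* ias = 48: 0x0000ffffffffffff */

    /* TTBR1 walks: IOVAs occupy the sign-extended top of the space */
    domain->geometry.aperture_start = ~0UL << ias;
                                    /* ias = 48: 0xffff000000000000 */
    domain->geometry.aperture_end   = ~0UL;
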
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 21 -
 drivers/iommu/arm/arm-smmu/arm-smmu.h | 25 +++--
 2 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 37d8d49299b4..976d43a7f2ff 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -552,11 +552,15 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
cb->ttbr[1] = 0;
} else {
-   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
-   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID,
- cfg->asid);
+   cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
+   cfg->asid);
cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
-cfg->asid);
+   cfg->asid);
+
+   if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+   cb->ttbr[1] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   else
+   cb->ttbr[0] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
}
} else {
cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
@@ -822,7 +826,14 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 
/* Update the domain's page sizes to reflect the page table format */
domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
-   domain->geometry.aperture_end = (1UL << ias) - 1;
+
+   if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   domain->geometry.aperture_start = ~0UL << ias;
+   domain->geometry.aperture_end = ~0UL;
+   } else {
+   domain->geometry.aperture_end = (1UL << ias) - 1;
+   }
+
domain->geometry.force_aperture = true;
 
/* Initialise the context bank with our page table cfg */
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 83294516ac08..f3e456893f28 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -169,10 +169,12 @@ enum arm_smmu_cbar_type {
 #define ARM_SMMU_CB_TCR0x30
 #define ARM_SMMU_TCR_EAE   BIT(31)
 #define ARM_SMMU_TCR_EPD1  BIT(23)
+#define ARM_SMMU_TCR_A1BIT(22)
 #define ARM_SMMU_TCR_TG0   GENMASK(15, 14)
 #define ARM_SMMU_TCR_SH0   GENMASK(13, 12)
 #define ARM_SMMU_TCR_ORGN0 GENMASK(11, 10)
 #define ARM_SMMU_TCR_IRGN0 GENMASK(9, 8)
+#define ARM_SMMU_TCR_EPD0  BIT(7)
 #define ARM_SMMU_TCR_T0SZ  GENMASK(5, 0)
 
 #define ARM_SMMU_VTCR_RES1 BIT(31)
@@ -350,12 +352,23 @@ struct arm_smmu_domain {
 
 static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
 {
-   return ARM_SMMU_TCR_EPD1 |
-  FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
-  FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
-  FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
-  FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
-  FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
+   u32 tcr = FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
+   FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
+   FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
+   FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
+   FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
+
+   /*
+   * When TTBR1 is selected shift the TCR fields by 16 bits and disable
+   * translation in TTBR0
+   */
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   tcr = (tcr << 16) & ~ARM_SMMU_TCR_A1;
+   tcr |= ARM_SMMU_TCR_EPD0;
+   } else
+   tcr |= ARM_SMMU_TCR_EPD1;
+
+   return tcr;
 }
 
 static inline u32 arm_smmu_lpae_tcr2(struct io_pgtable_cfg *cfg)
-- 
2.26.2



[PATCH 01/20] drm/msm: remove dangling submitqueue references

2020-08-24 Thread Rob Clark
From: Rob Clark 

Currently it doesn't matter, since we free the ctx immediately.  But
when we start refcnt'ing the ctx, we don't want old dangling list
entries to hang around.

Signed-off-by: Rob Clark 
Reviewed-by: Jordan Crouse 
---
 drivers/gpu/drm/msm/msm_submitqueue.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/msm/msm_submitqueue.c
index a1d94be7883a..90c9d84e6155 100644
--- a/drivers/gpu/drm/msm/msm_submitqueue.c
+++ b/drivers/gpu/drm/msm/msm_submitqueue.c
@@ -49,8 +49,10 @@ void msm_submitqueue_close(struct msm_file_private *ctx)
 * No lock needed in close and there won't
 * be any more user ioctls coming our way
 */
-   list_for_each_entry_safe(entry, tmp, &ctx->submitqueues, node)
+   list_for_each_entry_safe(entry, tmp, &ctx->submitqueues, node) {
+   list_del(&entry->node);
msm_submitqueue_put(entry);
+   }
 }
 
int msm_submitqueue_create(struct drm_device *drm, struct msm_file_private *ctx,
-- 
2.26.2



[PATCH 08/20] iommu/arm-smmu: constify some helpers

2020-08-24 Thread Rob Clark
From: Rob Clark 

Sprinkle a few `const`s where helpers don't need write access.

Signed-off-by: Rob Clark 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 59ff3fc5c6c8..27c8fc50 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -377,7 +377,7 @@ struct arm_smmu_master_cfg {
s16 smendx[];
 };
 
-static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
+static inline u32 arm_smmu_lpae_tcr(const struct io_pgtable_cfg *cfg)
 {
u32 tcr = FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
@@ -398,13 +398,13 @@ static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
return tcr;
 }
 
-static inline u32 arm_smmu_lpae_tcr2(struct io_pgtable_cfg *cfg)
+static inline u32 arm_smmu_lpae_tcr2(const struct io_pgtable_cfg *cfg)
 {
return FIELD_PREP(ARM_SMMU_TCR2_PASIZE, cfg->arm_lpae_s1_cfg.tcr.ips) |
   FIELD_PREP(ARM_SMMU_TCR2_SEP, ARM_SMMU_TCR2_SEP_UPSTREAM);
 }
 
-static inline u32 arm_smmu_lpae_vtcr(struct io_pgtable_cfg *cfg)
+static inline u32 arm_smmu_lpae_vtcr(const struct io_pgtable_cfg *cfg)
 {
return ARM_SMMU_VTCR_RES1 |
   FIELD_PREP(ARM_SMMU_VTCR_PS, cfg->arm_lpae_s2_cfg.vtcr.ps) |
-- 
2.26.2



[PATCH 06/20] drm/msm/gpu: add dev_to_gpu() helper

2020-08-24 Thread Rob Clark
From: Rob Clark 

In a later patch, the drvdata will not directly be 'struct msm_gpu *',
so add a helper to reduce the churn.

Signed-off-by: Rob Clark 
Reviewed-by: Jordan Crouse 
---
 drivers/gpu/drm/msm/adreno/adreno_device.c | 10 --
 drivers/gpu/drm/msm/msm_gpu.c  |  6 +++---
 drivers/gpu/drm/msm/msm_gpu.h  |  5 +
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 9eeb46bf2a5d..26664e1b30c0 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -282,7 +282,7 @@ struct msm_gpu *adreno_load_gpu(struct drm_device *dev)
int ret;
 
if (pdev)
-   gpu = platform_get_drvdata(pdev);
+   gpu = dev_to_gpu(&pdev->dev);
 
if (!gpu) {
dev_err_once(dev->dev, "no GPU device was found\n");
@@ -425,7 +425,7 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
 static void adreno_unbind(struct device *dev, struct device *master,
void *data)
 {
-   struct msm_gpu *gpu = dev_get_drvdata(dev);
+   struct msm_gpu *gpu = dev_to_gpu(dev);
 
pm_runtime_force_suspend(dev);
gpu->funcs->destroy(gpu);
@@ -490,16 +490,14 @@ static const struct of_device_id dt_match[] = {
 #ifdef CONFIG_PM
 static int adreno_resume(struct device *dev)
 {
-   struct platform_device *pdev = to_platform_device(dev);
-   struct msm_gpu *gpu = platform_get_drvdata(pdev);
+   struct msm_gpu *gpu = dev_to_gpu(dev);
 
return gpu->funcs->pm_resume(gpu);
 }
 
 static int adreno_suspend(struct device *dev)
 {
-   struct platform_device *pdev = to_platform_device(dev);
-   struct msm_gpu *gpu = platform_get_drvdata(pdev);
+   struct msm_gpu *gpu = dev_to_gpu(dev);
 
return gpu->funcs->pm_suspend(gpu);
 }
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index d5645472b25d..6aa9e04e52e7 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -24,7 +24,7 @@
 static int msm_devfreq_target(struct device *dev, unsigned long *freq,
u32 flags)
 {
-   struct msm_gpu *gpu = platform_get_drvdata(to_platform_device(dev));
+   struct msm_gpu *gpu = dev_to_gpu(dev);
struct dev_pm_opp *opp;
 
opp = devfreq_recommended_opp(dev, freq, flags);
@@ -45,7 +45,7 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
 static int msm_devfreq_get_dev_status(struct device *dev,
struct devfreq_dev_status *status)
 {
-   struct msm_gpu *gpu = platform_get_drvdata(to_platform_device(dev));
+   struct msm_gpu *gpu = dev_to_gpu(dev);
ktime_t time;
 
if (gpu->funcs->gpu_get_freq)
@@ -64,7 +64,7 @@ static int msm_devfreq_get_dev_status(struct device *dev,
 
 static int msm_devfreq_get_cur_freq(struct device *dev, unsigned long *freq)
 {
-   struct msm_gpu *gpu = platform_get_drvdata(to_platform_device(dev));
+   struct msm_gpu *gpu = dev_to_gpu(dev);
 
if (gpu->funcs->gpu_get_freq)
*freq = gpu->funcs->gpu_get_freq(gpu);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 0db117a7339b..8bda7beaed4b 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -141,6 +141,11 @@ struct msm_gpu {
struct msm_gpu_state *crashstate;
 };
 
+static inline struct msm_gpu *dev_to_gpu(struct device *dev)
+{
+   return dev_get_drvdata(dev);
+}
+
 /* It turns out that all targets use the same ringbuffer size */
 #define MSM_GPU_RINGBUFFER_SZ SZ_32K
 #define MSM_GPU_RINGBUFFER_BLKSIZE 32
-- 
2.26.2



[PATCH 16/20] drm/msm/a6xx: Add support for per-instance pagetables

2020-08-24 Thread Rob Clark
From: Jordan Crouse 

Add support for using per-instance pagetables if all the dependencies are
available.

Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
Reviewed-by: Akhil P Oommen 
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 63 +++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  1 +
 drivers/gpu/drm/msm/msm_ringbuffer.h  |  1 +
 3 files changed, 65 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 5eabb0109577..d7ad6c78d787 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -81,6 +81,49 @@ static void get_stats_counter(struct msm_ringbuffer *ring, u32 counter,
OUT_RING(ring, upper_32_bits(iova));
 }
 
+static void a6xx_set_pagetable(struct a6xx_gpu *a6xx_gpu,
+   struct msm_ringbuffer *ring, struct msm_file_private *ctx)
+{
+   phys_addr_t ttbr;
+   u32 asid;
+   u64 memptr = rbmemptr(ring, ttbr0);
+
+   if (ctx == a6xx_gpu->cur_ctx)
+   return;
+
+   if (msm_iommu_pagetable_params(ctx->aspace->mmu, &ttbr, &asid))
+   return;
+
+   /* Execute the table update */
+   OUT_PKT7(ring, CP_SMMU_TABLE_UPDATE, 4);
+   OUT_RING(ring, CP_SMMU_TABLE_UPDATE_0_TTBR0_LO(lower_32_bits(ttbr)));
+
+   OUT_RING(ring,
+   CP_SMMU_TABLE_UPDATE_1_TTBR0_HI(upper_32_bits(ttbr)) |
+   CP_SMMU_TABLE_UPDATE_1_ASID(asid));
+   OUT_RING(ring, CP_SMMU_TABLE_UPDATE_2_CONTEXTIDR(0));
+   OUT_RING(ring, CP_SMMU_TABLE_UPDATE_3_CONTEXTBANK(0));
+
+   /*
+* Write the new TTBR0 to the memstore. This is good for debugging.
+*/
+   OUT_PKT7(ring, CP_MEM_WRITE, 4);
+   OUT_RING(ring, CP_MEM_WRITE_0_ADDR_LO(lower_32_bits(memptr)));
+   OUT_RING(ring, CP_MEM_WRITE_1_ADDR_HI(upper_32_bits(memptr)));
+   OUT_RING(ring, lower_32_bits(ttbr));
+   OUT_RING(ring, (asid << 16) | upper_32_bits(ttbr));
+
+   /*
+* And finally, trigger a uche flush to be sure there isn't anything
+* lingering in that part of the GPU
+*/
+
+   OUT_PKT7(ring, CP_EVENT_WRITE, 1);
+   OUT_RING(ring, 0x31);
+
+   a6xx_gpu->cur_ctx = ctx;
+}
+
 static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 {
unsigned int index = submit->seqno % MSM_GPU_SUBMIT_STATS_COUNT;
@@ -90,6 +133,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
struct msm_ringbuffer *ring = submit->ring;
unsigned int i;
 
+   a6xx_set_pagetable(a6xx_gpu, ring, submit->queue->ctx);
+
get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP_0_LO,
rbmemptr_stats(ring, index, cpcycles_start));
 
@@ -696,6 +741,8 @@ static int a6xx_hw_init(struct msm_gpu *gpu)
/* Always come up on rb 0 */
a6xx_gpu->cur_ring = gpu->rb[0];
 
+   a6xx_gpu->cur_ctx = NULL;
+
/* Enable the SQE_to start the CP engine */
gpu_write(gpu, REG_A6XX_CP_SQE_CNTL, 1);
 
@@ -1008,6 +1055,21 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
return (unsigned long)busy_time;
 }
 
+static struct msm_gem_address_space *
+a6xx_create_private_address_space(struct msm_gpu *gpu)
+{
+   struct msm_gem_address_space *aspace = NULL;
+   struct msm_mmu *mmu;
+
+   mmu = msm_iommu_pagetable_create(gpu->aspace->mmu);
+
+   if (!IS_ERR(mmu))
+   aspace = msm_gem_address_space_create(mmu,
+   "gpu", 0x100000000ULL, 0x100000000ULL);
+
+   return aspace;
+}
+
 static const struct adreno_gpu_funcs funcs = {
.base = {
.get_param = adreno_get_param,
@@ -1031,6 +1093,7 @@ static const struct adreno_gpu_funcs funcs = {
.gpu_state_put = a6xx_gpu_state_put,
 #endif
.create_address_space = adreno_iommu_create_address_space,
+   .create_private_address_space = a6xx_create_private_address_space,
},
.get_timestamp = a6xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 03ba60d5b07f..da22d7549d9b 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -19,6 +19,7 @@ struct a6xx_gpu {
uint64_t sqe_iova;
 
struct msm_ringbuffer *cur_ring;
+   struct msm_file_private *cur_ctx;
 
struct a6xx_gmu gmu;
 };
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 7764373d0ed2..0987d6bf848c 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -31,6 +31,7 @@ struct msm_rbmemptrs {
volatile uint32_t fence;
 
volatile struct msm_gpu_submit_stats stats[MSM_GPU_SUBMIT_STATS_COUNT];
+   volatile u64 ttbr0;
 };
 
 struct msm_ringbuffer {
-- 
2.26.2


Re: [PATCH v2 2/9] iommu/ioasid: Rename ioasid_set_data()

2020-08-24 Thread Jean-Philippe Brucker
On Fri, Aug 21, 2020 at 09:35:11PM -0700, Jacob Pan wrote:
> Rename ioasid_set_data() to ioasid_attach_data() to avoid confusion with
> struct ioasid_set. ioasid_set is a group of IOASIDs that share a common
> token.
> 
> Signed-off-by: Jacob Pan 

Reviewed-by: Jean-Philippe Brucker 

> ---
>  drivers/iommu/intel/svm.c | 6 +++---
>  drivers/iommu/ioasid.c| 6 +++---
>  include/linux/ioasid.h| 4 ++--
>  3 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index b6972dca2ae0..37a9beabc0ca 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -342,7 +342,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
>   svm->gpasid = data->gpasid;
>   svm->flags |= SVM_FLAG_GUEST_PASID;
>   }
> - ioasid_set_data(data->hpasid, svm);
> + ioasid_attach_data(data->hpasid, svm);
>   INIT_LIST_HEAD_RCU(&svm->devs);
>   mmput(svm->mm);
>   }
> @@ -394,7 +394,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
>   list_add_rcu(&sdev->list, &svm->devs);
>   out:
>   if (!IS_ERR_OR_NULL(svm) && list_empty(&svm->devs)) {
> - ioasid_set_data(data->hpasid, NULL);
> + ioasid_attach_data(data->hpasid, NULL);
>   kfree(svm);
>   }
>  
> @@ -437,7 +437,7 @@ int intel_svm_unbind_gpasid(struct device *dev, int pasid)
>* the unbind, IOMMU driver will get notified
>* and perform cleanup.
>*/
> - ioasid_set_data(pasid, NULL);
> + ioasid_attach_data(pasid, NULL);
>   kfree(svm);
>   }
>   }
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index 0f8dd377aada..5f63af07acd5 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -258,14 +258,14 @@ void ioasid_unregister_allocator(struct ioasid_allocator_ops *ops)
>  EXPORT_SYMBOL_GPL(ioasid_unregister_allocator);
>  
>  /**
> - * ioasid_set_data - Set private data for an allocated ioasid
> + * ioasid_attach_data - Set private data for an allocated ioasid
>   * @ioasid: the ID to set data
>   * @data:   the private data
>   *
>   * For IOASID that is already allocated, private data can be set
>   * via this API. Future lookup can be done via ioasid_find.
>   */
> -int ioasid_set_data(ioasid_t ioasid, void *data)
> +int ioasid_attach_data(ioasid_t ioasid, void *data)
>  {
>   struct ioasid_data *ioasid_data;
>   int ret = 0;
> @@ -287,7 +287,7 @@ int ioasid_set_data(ioasid_t ioasid, void *data)
>  
>   return ret;
>  }
> -EXPORT_SYMBOL_GPL(ioasid_set_data);
> +EXPORT_SYMBOL_GPL(ioasid_attach_data);
>  
>  /**
>   * ioasid_alloc - Allocate an IOASID
> diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> index 6f000d7a0ddc..9c44947a68c8 100644
> --- a/include/linux/ioasid.h
> +++ b/include/linux/ioasid.h
> @@ -39,7 +39,7 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> bool (*getter)(void *));
>  int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
>  void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
> -int ioasid_set_data(ioasid_t ioasid, void *data);
> +int ioasid_attach_data(ioasid_t ioasid, void *data);
>  
>  #else /* !CONFIG_IOASID */
>  static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> @@ -67,7 +67,7 @@ static inline void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator)
>  {
>  }
>  
> -static inline int ioasid_set_data(ioasid_t ioasid, void *data)
> +static inline int ioasid_attach_data(ioasid_t ioasid, void *data)
>  {
>   return -ENOTSUPP;
>  }
> -- 
> 2.7.4
> 


Re: [PATCH v2 3/9] iommu/ioasid: Introduce ioasid_set APIs

2020-08-24 Thread Jean-Philippe Brucker
On Fri, Aug 21, 2020 at 09:35:12PM -0700, Jacob Pan wrote:
> ioasid_set was introduced as an arbitrary token that is shared by a
> group of IOASIDs. For example, if IOASID #1 and #2 are allocated via the
> same ioasid_set*, they are viewed as belonging to the same set.
> 
> For guest SVA usages, system-wide IOASID resources need to be
> partitioned such that each VM can have its own quota and be managed
> separately. ioasid_set is the perfect candidate for meeting such
> requirements. This patch redefines and extends ioasid_set with the
> following new fields:
> - Quota
> - Reference count
> - Storage of its namespace
> - The token is stored in the new ioasid_set but with optional types
> 
> ioasid_set level APIs are introduced that wires up these new data.
> Existing users of IOASID APIs are converted where a host IOASID set is
> allocated for bare-metal usage.
> 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/intel/iommu.c |  27 ++-
>  drivers/iommu/intel/pasid.h |   1 +
>  drivers/iommu/intel/svm.c   |   8 +-
>  drivers/iommu/ioasid.c  | 390 +---
>  include/linux/ioasid.h  |  82 --
>  5 files changed, 465 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index a3a0b5c8921d..5813eeaa5edb 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -42,6 +42,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -103,6 +104,9 @@
>   */
>  #define INTEL_IOMMU_PGSIZES  (~0xFFFUL)
>  
> +/* PASIDs used by host SVM */
> +struct ioasid_set *host_pasid_set;
> +
>  static inline int agaw_to_level(int agaw)
>  {
>   return agaw + 2;
> @@ -3103,8 +3107,8 @@ static void intel_vcmd_ioasid_free(ioasid_t ioasid, void *data)
>* Sanity check the ioasid owner is done at upper layer, e.g. VFIO
>* We can only free the PASID when all the devices are unbound.
>*/
> - if (ioasid_find(NULL, ioasid, NULL)) {
> - pr_alert("Cannot free active IOASID %d\n", ioasid);
> + if (IS_ERR(ioasid_find(host_pasid_set, ioasid, NULL))) {
> + pr_err("Cannot free IOASID %d, not in system set\n", ioasid);
>   return;
>   }
>   vcmd_free_pasid(iommu, ioasid);
> @@ -3288,6 +3292,19 @@ static int __init init_dmars(void)
>   if (ret)
>   goto free_iommu;
>  
> + /* PASID is needed for scalable mode irrespective to SVM */
> + if (intel_iommu_sm) {
> + ioasid_install_capacity(intel_pasid_max_id);
> + /* We should not run out of IOASIDs at boot */
> + host_pasid_set = ioasid_alloc_set(NULL, PID_MAX_DEFAULT,
> + IOASID_SET_TYPE_NULL);
> + if (IS_ERR_OR_NULL(host_pasid_set)) {
> + pr_err("Failed to enable host PASID allocator %lu\n",
> + PTR_ERR(host_pasid_set));
> + intel_iommu_sm = 0;
> + }
> + }
> +
>   /*
>* for each drhd
>*   enable fault log
> @@ -5149,7 +5166,7 @@ static void auxiliary_unlink_device(struct dmar_domain *domain,
>   domain->auxd_refcnt--;
>  
>   if (!domain->auxd_refcnt && domain->default_pasid > 0)
> - ioasid_free(domain->default_pasid);
> + ioasid_free(host_pasid_set, domain->default_pasid);
>  }
>  
>  static int aux_domain_add_dev(struct dmar_domain *domain,
> @@ -5167,7 +5184,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
>   int pasid;
>  
>   /* No private data needed for the default pasid */
> - pasid = ioasid_alloc(NULL, PASID_MIN,
> + pasid = ioasid_alloc(host_pasid_set, PASID_MIN,
>pci_max_pasids(to_pci_dev(dev)) - 1,
>NULL);
>   if (pasid == INVALID_IOASID) {
> @@ -5210,7 +5227,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
>   spin_unlock(&iommu->lock);
>   spin_unlock_irqrestore(&device_domain_lock, flags);
>   if (!domain->auxd_refcnt && domain->default_pasid > 0)
> - ioasid_free(domain->default_pasid);
> + ioasid_free(host_pasid_set, domain->default_pasid);
>  
>   return ret;
>  }
> diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
> index c9850766c3a9..ccdc23446015 100644
> --- a/drivers/iommu/intel/pasid.h
> +++ b/drivers/iommu/intel/pasid.h
> @@ -99,6 +99,7 @@ static inline bool pasid_pte_is_present(struct pasid_entry *pte)
>  }
>  
>  extern u32 intel_pasid_max_id;
> +extern struct ioasid_set *host_pasid_set;
>  int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp);
>  void intel_pasid_free_id(int pasid);
>  void *intel_pasid_lookup_id(int pasid);
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 

Re: [PATCH v2 3/9] iommu/ioasid: Introduce ioasid_set APIs

2020-08-24 Thread Randy Dunlap
On 8/24/20 11:28 AM, Jean-Philippe Brucker wrote:
>> +/**
>> + * struct ioasid_data - Meta data about ioasid
>> + *
>> + * @id: Unique ID
>> + * @users   Number of active users
>> + * @state   Track state of the IOASID
>> + * @set Meta data of the set this IOASID belongs to
>> + * @private Private data associated with the IOASID
>> + * @rcu For free after RCU grace period
> nit: it would be nicer to follow the struct order

and use a ':' after each struct member name, as is done for @id:

-- 
~Randy



Re: [PATCH v2 3/9] iommu/ioasid: Introduce ioasid_set APIs

2020-08-24 Thread Randy Dunlap
On 8/24/20 11:28 AM, Jean-Philippe Brucker wrote:
>> +/**
>> + * struct ioasid_set - Meta data about ioasid_set
>> + * @type:   Token types and other features
> nit: doesn't follow struct order
> 
>> + * @token:  Unique to identify an IOASID set
>> + * @xa: XArray to store ioasid_set private IDs, can be used for
>> + *  guest-host IOASID mapping, or just a private IOASID namespace.
>> + * @quota:  Max number of IOASIDs can be allocated within the set
>> + * @nr_ioasids  Number of IOASIDs currently allocated in the set

 * @nr_ioasids: Number of IOASIDs currently allocated in the set

>> + * @sid:ID of the set
>> + * @ref:Reference count of the users
>> + */
>>  struct ioasid_set {
>> -int dummy;
>> +void *token;
>> +struct xarray xa;
>> +int type;
>> +int quota;
>> +int nr_ioasids;
>> +int sid;
>> +refcount_t ref;
>> +struct rcu_head rcu;
>>  };


-- 
~Randy
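
[Folding both of these comments in, the kernel-doc header would read
roughly as below; a sketch with the member order and ':' separators
fixed, text otherwise as posted:]

    /**
     * struct ioasid_set - Meta data about ioasid_set
     * @token:      Unique to identify an IOASID set
     * @xa:         XArray to store ioasid_set private IDs, can be used for
     *              guest-host IOASID mapping, or just a private IOASID namespace
     * @type:       Token types and other features
     * @quota:      Max number of IOASIDs that can be allocated within the set
     * @nr_ioasids: Number of IOASIDs currently allocated in the set
     * @sid:        ID of the set
     * @ref:        Reference count of the users
     * @rcu:        For free after RCU grace period
     */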



Re: [RFC v2 4/5] dt-bindings: of: Add plumbing for restricted DMA pool

2020-08-24 Thread Tomasz Figa
On Tue, Aug 11, 2020 at 11:15 AM Tomasz Figa  wrote:
>
> On Mon, Aug 3, 2020 at 5:15 PM Tomasz Figa  wrote:
> >
> > Hi Claire and Rob,
> >
> > On Mon, Aug 3, 2020 at 4:26 PM Claire Chang  wrote:
> > >
> > > On Sat, Aug 1, 2020 at 4:58 AM Rob Herring  wrote:
> > > >
> > > > On Tue, Jul 28, 2020 at 01:01:39PM +0800, Claire Chang wrote:
> > > > > Introduce the new compatible string, device-swiotlb-pool, for 
> > > > > restricted
> > > > > DMA. One can specify the address and length of the device swiotlb 
> > > > > memory
> > > > > region by device-swiotlb-pool in the device tree.
> > > > >
> > > > > Signed-off-by: Claire Chang 
> > > > > ---
> > > > >  .../reserved-memory/reserved-memory.txt   | 35 
> > > > > +++
> > > > >  1 file changed, 35 insertions(+)
> > > > >
> > > > > diff --git 
> > > > > a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > > >  
> > > > > b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > > > index 4dd20de6977f..78850896e1d0 100644
> > > > > --- 
> > > > > a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > > > +++ 
> > > > > b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > > > @@ -51,6 +51,24 @@ compatible (optional) - standard definition
> > > > >used as a shared pool of DMA buffers for a set of devices. 
> > > > > It can
> > > > >be used by an operating system to instantiate the 
> > > > > necessary pool
> > > > >management subsystem if necessary.
> > > > > +- device-swiotlb-pool: This indicates a region of memory 
> > > > > meant to be
> > > >
> > > > swiotlb is a Linux thing. The binding should be independent.
> > > Got it. Thanks for pointing this out.
> > >
> > > >
> > > > > +  used as a pool of device swiotlb buffers for a given 
> > > > > device. When
> > > > > +  using this, the no-map and reusable properties must not be 
> > > > > set, so the
> > > > > +  operating system can create a virtual mapping that will be 
> > > > > used for
> > > > > +  synchronization. Also, there must be a restricted-dma 
> > > > > property in the
> > > > > +  device node to specify the indexes of reserved-memory 
> > > > > nodes. One can
> > > > > +  specify two reserved-memory nodes in the device tree. One 
> > > > > with
> > > > > +  shared-dma-pool to handle the coherent DMA buffer 
> > > > > allocation, and
> > > > > +  another one with device-swiotlb-pool for regular DMA 
> > > > > to/from system
> > > > > +  memory, which would be subject to bouncing. The main 
> > > > > purpose for
> > > > > +  restricted DMA is to mitigate the lack of DMA access 
> > > > > control on
> > > > > +  systems without an IOMMU, which could result in the DMA 
> > > > > accessing the
> > > > > +  system memory at unexpected times and/or unexpected 
> > > > > addresses,
> > > > > +  possibly leading to data leakage or corruption. The 
> > > > > feature on its own
> > > > > +  provides a basic level of protection against the DMA 
> > > > > overwriting buffer
> > > > > +  contents at unexpected times. However, to protect against 
> > > > > general data
> > > > > +  leakage and system memory corruption, the system needs to 
> > > > > provide a
> > > > > +  way to restrict the DMA to a predefined memory region.
> > > >
> > > > I'm pretty sure we already support per-device carveouts and I don't
> > > > understand how this is different.
> > > We use this to bounce streaming DMA in and out of a specially allocated 
> > > region.
> > > I'll try to merge this with the existing one (i.e., shared-dma-pool)
> > > to see if that
> > > makes things clearer.
> > >
> >
> > Indeed, from the firmware point of view, this is just a carveout, for
> > which we have the "shared-dma-pool" compatible string defined already.
> >
> > However, depending on the device and firmware setup, the way the
> > carveout is used may change. I can see the following scenarios:
> >
> > 1) coherent DMA (dma_alloc_*) within a reserved pool and no
> > non-coherent DMA (dma_map_*).
> >
> > This is how the "memory-region" property is handled today in Linux for
> > devices which can only DMA from/to the given memory region. However,
> > I'm not sure whether the absence of non-coherent DMA is actually
> > enforced in any way by the DMA subsystem.
> >
> > 2) coherent DMA from a reserved pool and non-coherent DMA from system memory
> >
> > This is the case for the systems which have some dedicated part of
> > memory which is guaranteed to be coherent with the DMA, but still can
> > do non-coherent DMA to any part of the system memory. Linux handles it
> > the same way as 1), which is what made me believe that 1) might not
> > actually be handled correctly.
> >
> > 3) coherent DMA and bounced non-coherent DMA within a reserved pool
> > 4) coherent DMA within one pool and 

Is: virtio_gpu_object_shmem_init issues? Was: Re: upstream boot error: general protection fault in swiotlb_map

2020-08-24 Thread Konrad Rzeszutek Wilk
On Thu, Aug 06, 2020 at 03:46:23AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:47ec5303 Merge git://git.kernel.org/pub/scm/linux/kernel/g..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16fe1dea90
> kernel config:  https://syzkaller.appspot.com/x/.config?x=7c06047f622c5724
> dashboard link: https://syzkaller.appspot.com/bug?extid=3f86afd0b1e4bf1cb64c
> compiler:   gcc (GCC) 10.1.0-syz 20200507
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+3f86afd0b1e4bf1cb...@syzkaller.appspotmail.com
> 
> ceph: loaded (mds proto 32)
> NET: Registered protocol family 38
> async_tx: api initialized (async)
> Key type asymmetric registered
> Asymmetric key parser 'x509' registered
> Asymmetric key parser 'pkcs8' registered
> Key type pkcs7_test registered
> Asymmetric key parser 'tpm_parser' registered
> Block layer SCSI generic (bsg) driver version 0.4 loaded (major 243)
> io scheduler mq-deadline registered
> io scheduler kyber registered
> io scheduler bfq registered
> hgafb: HGA card not detected.
> hgafb: probe of hgafb.0 failed with error -22
> usbcore: registered new interface driver udlfb
> uvesafb: failed to execute /sbin/v86d
> uvesafb: make sure that the v86d helper is installed and executable
> uvesafb: Getting VBE info block failed (eax=0x4f00, err=-2)
> uvesafb: vbe_init() failed with -22
> uvesafb: probe of uvesafb.0 failed with error -22
> vga16fb: mapped to 0x8aac772d
> Console: switching to colour frame buffer device 80x30
> fb0: VGA16 VGA frame buffer device
> input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
> ACPI: Power Button [PWRF]
> ioatdma: Intel(R) QuickData Technology Driver 5.00
> PCI Interrupt Link [GSIF] enabled at IRQ 21
> PCI Interrupt Link [GSIG] enabled at IRQ 22
> PCI Interrupt Link [GSIH] enabled at IRQ 23
> N_HDLC line discipline registered with maxframe=4096
> Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> Cyclades driver 2.6
> Initializing Nozomi driver 2.1d
> RocketPort device driver module, version 2.09, 12-June-2003
> No rocketport ports found; unloading driver
> Non-volatile memory driver v1.3
> Linux agpgart interface v0.103
> [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0
> [drm] Initialized vkms 1.0.0 20180514 for vkms on minor 1
> usbcore: registered new interface driver udl
> [drm] pci: virtio-vga detected at :00:01.0
> fb0: switching to virtiodrmfb from VGA16 VGA
> Console: switching to colour VGA+ 80x25
> virtio-pci :00:01.0: vgaarb: deactivate vga console
> Console: switching to colour dummy device 80x25
> [drm] features: -virgl +edid
> [drm] number of scanouts: 1
> [drm] number of cap sets: 0
> [drm] Initialized virtio_gpu 0.1.0 0 for virtio0 on minor 2
> general protection fault, probably for non-canonical address 
> 0xdc00:  [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x-0x0007]
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.0-syzkaller #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> RIP: 0010:swiotlb_map+0x5ac/0x700 kernel/dma/swiotlb.c:683
> Code: 28 04 00 00 48 c1 ea 03 80 3c 02 00 0f 85 4d 01 00 00 4c 8b a5 18 04 00 
> 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 1e 
> 01 00 00 48 8d 7d 50 4d 8b 24 24 48 b8 00 00
> RSP: :c934f3e0 EFLAGS: 00010246
> RAX: dc00 RBX:  RCX: 8162cc1d
> RDX:  RSI: 8162cc98 RDI: 88802971a470
> RBP: 88802971a048 R08: 0001 R09: 8c5dba77
> R10:  R11:  R12: 
> R13: 7ac0 R14: dc00 R15: 1000
> FS:  () GS:88802ce0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2:  CR3: 09a8d000 CR4: 00350ef0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  dma_direct_map_page include/linux/dma-direct.h:170 [inline]
>  dma_direct_map_sg+0x3bb/0x670 kernel/dma/direct.c:368
>  dma_map_sg_attrs+0xd0/0x160 kernel/dma/mapping.c:183
>  drm_gem_shmem_get_pages_sgt drivers/gpu/drm/drm_gem_shmem_helper.c:700 
> [inline]
>  drm_gem_shmem_get_pages_sgt+0x1fc/0x310 
> drivers/gpu/drm/drm_gem_shmem_helper.c:679
>  virtio_gpu_object_shmem_init drivers/gpu/drm/virtio/virtgpu_object.c:153 
> [inline]
>  virtio_gpu_object_create+0x2fd/0xa70 
> drivers/gpu/drm/virtio/virtgpu_object.c:232
>  virtio_gpu_gem_create drivers/gpu/drm/virtio/virtgpu_gem.c:45 [inline]
>  virtio_gpu_mode_dumb_create+0x298/0x530 
> drivers/gpu/drm/virtio/virtgpu_gem.c:85
>  

Re: Is: virtio_gpu_object_shmem_init issues? Was: Re: upstream boot error: general protection fault in swiotlb_map

2020-08-24 Thread Dmitry Vyukov via iommu
On Mon, Aug 24, 2020 at 5:07 PM Konrad Rzeszutek Wilk
 wrote:
>
> On Thu, Aug 06, 2020 at 03:46:23AM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:47ec5303 Merge git://git.kernel.org/pub/scm/linux/kernel/g..
> > git tree:   upstream
> > dashboard link: https://syzkaller.appspot.com/bug?extid=3f86afd0b1e4bf1cb64c
> > [...]

[PATCH AUTOSEL 5.8 31/63] dma-pool: Only allocate from CMA when in same memory zone

2020-08-24 Thread Sasha Levin
From: Nicolas Saenz Julienne 

[ Upstream commit d7e673ec2c8e0ea39c4c70fc490d67d7fbda869d ]

There is no guarantee of CMA's placement, so allocating a zone-specific
atomic pool from CMA might return memory from a completely different
memory zone. To get around this, double-check CMA's placement before
allocating from it.
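
The arithmetic of the bound check, as a minimal standalone sketch (the
helper name and the example values are made up; only the DMA_BIT_MASK()
definition mirrors the kernel's):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Mirrors the kernel's DMA_BIT_MASK() definition. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* CMA cannot cross zone boundaries, so checking the last byte suffices. */
static bool cma_fits_zone(uint64_t base, uint64_t size, unsigned int zone_bits)
{
	return base + size - 1 <= DMA_BIT_MASK(zone_bits);
}

int main(void)
{
	/* A CMA area at 2 GiB cannot serve a 30-bit ZONE_DMA allocation... */
	printf("%d\n", cma_fits_zone(0x80000000ULL, 64 << 20, 30)); /* 0 */
	/* ...but is fine for a 32-bit ZONE_DMA32 allocation. */
	printf("%d\n", cma_fits_zone(0x80000000ULL, 64 << 20, 32)); /* 1 */
	return 0;
}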

Signed-off-by: Nicolas Saenz Julienne 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Sasha Levin 
---
 kernel/dma/pool.c | 31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 5d071d4a3cbaa..06582b488e317 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -3,7 +3,9 @@
  * Copyright (C) 2012 ARM Ltd.
  * Copyright (C) 2020 Google LLC
  */
+#include <linux/cma.h>
 #include <linux/debugfs.h>
+#include <linux/dma-contiguous.h>
 #include <linux/dma-direct.h>
 #include <linux/dma-noncoherent.h>
 #include <linux/init.h>
@@ -55,6 +57,29 @@ static void dma_atomic_pool_size_add(gfp_t gfp, size_t size)
pool_size_kernel += size;
 }
 
+static bool cma_in_zone(gfp_t gfp)
+{
+   unsigned long size;
+   phys_addr_t end;
+   struct cma *cma;
+
+   cma = dev_get_cma_area(NULL);
+   if (!cma)
+   return false;
+
+   size = cma_get_size(cma);
+   if (!size)
+   return false;
+
+   /* CMA can't cross zone boundaries, see cma_activate_area() */
+   end = cma_get_base(cma) + size - 1;
+   if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
+   return end <= DMA_BIT_MASK(zone_dma_bits);
+   if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
+   return end <= DMA_BIT_MASK(32);
+   return true;
+}
+
 static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
  gfp_t gfp)
 {
@@ -68,7 +93,11 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
 
do {
pool_size = 1 << (PAGE_SHIFT + order);
-   page = alloc_pages(gfp, order);
+   if (cma_in_zone(gfp))
+   page = dma_alloc_from_contiguous(NULL, 1 << order,
+order, false);
+   if (!page)
+   page = alloc_pages(gfp, order);
} while (!page && order-- > 0);
if (!page)
goto out;
-- 
2.25.1



[PATCH AUTOSEL 5.8 30/63] dma-pool: fix coherent pool allocations for IOMMU mappings

2020-08-24 Thread Sasha Levin
From: Christoph Hellwig 

[ Upstream commit 9420139f516d7fbc248ce17f35275cb005ed98ea ]

When allocating coherent pool memory for an IOMMU mapping we don't care
about the DMA mask.  Move the guess for the initial GFP mask into the
dma_direct_alloc_pages and pass dma_coherent_ok as a function pointer
argument so that it doesn't get applied to the IOMMU case.
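
The shape of the change, reduced to a hedged sketch (pool_try_alloc() and
pool_put_back() are invented placeholders; only the predicate's signature
follows the patch):

typedef bool (*phys_ok_fn)(struct device *dev, phys_addr_t phys, size_t size);

static struct page *pool_alloc_sketch(struct device *dev, size_t size,
				      phys_ok_fn phys_addr_ok)
{
	struct page *page = pool_try_alloc(size);	/* hypothetical helper */

	/*
	 * The direct-DMA path passes dma_coherent_ok here; the IOMMU path
	 * passes NULL because any physical page is acceptable for remapping.
	 */
	if (page && phys_addr_ok &&
	    !phys_addr_ok(dev, page_to_phys(page), size)) {
		pool_put_back(page, size);		/* hypothetical helper */
		return NULL;
	}
	return page;
}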

Signed-off-by: Christoph Hellwig 
Tested-by: Amit Pundir 
Signed-off-by: Sasha Levin 
---
 drivers/iommu/dma-iommu.c   |   4 +-
 include/linux/dma-direct.h  |   3 -
 include/linux/dma-mapping.h |   5 +-
 kernel/dma/direct.c |  13 ++--
 kernel/dma/pool.c   | 114 +++-
 5 files changed, 62 insertions(+), 77 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4959f5df21bd0..5141d49a046ba 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1035,8 +1035,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
 
 	if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
 	    !gfpflags_allow_blocking(gfp) && !coherent)
-		cpu_addr = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page,
-					       gfp);
+		page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &cpu_addr,
+					   gfp, NULL);
 	else
 		cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs);
 	if (!cpu_addr)
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index ab2e20cba9514..ba22952c24e24 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -67,9 +67,6 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size,
 }
 
 u64 dma_direct_get_required_mask(struct device *dev);
-gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
- u64 *phys_mask);
-bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size);
 void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs);
 void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index a33ed3954ed46..0dc08701d7b7e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -715,8 +715,9 @@ void *dma_common_pages_remap(struct page **pages, size_t size,
pgprot_t prot, const void *caller);
 void dma_common_free_remap(void *cpu_addr, size_t size);
 
-void *dma_alloc_from_pool(struct device *dev, size_t size,
- struct page **ret_page, gfp_t flags);
+struct page *dma_alloc_from_pool(struct device *dev, size_t size,
+   void **cpu_addr, gfp_t flags,
+   bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t));
 bool dma_free_from_pool(struct device *dev, void *start, size_t size);
 
 int
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 67f060b86a73f..f17aec9d01f0c 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -45,7 +45,7 @@ u64 dma_direct_get_required_mask(struct device *dev)
return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
 }
 
-gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
+static gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
  u64 *phys_limit)
 {
u64 dma_limit = min_not_zero(dma_mask, dev->bus_dma_limit);
@@ -70,7 +70,7 @@ gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
return 0;
 }
 
-bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
+static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
 {
return phys_to_dma_direct(dev, phys) + size - 1 <=
		min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
@@ -163,8 +163,13 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size,
 	size = PAGE_ALIGN(size);
 
 	if (dma_should_alloc_from_pool(dev, gfp, attrs)) {
-		ret = dma_alloc_from_pool(dev, size, &page, gfp);
-		if (!ret)
+		u64 phys_mask;
+
+		gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
+				&phys_mask);
+		page = dma_alloc_from_pool(dev, size, &ret, gfp,
+				dma_coherent_ok);
+		if (!page)
 			return NULL;
 		goto done;
 	}
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 6bc74a2d51273..5d071d4a3cbaa 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -196,93 +196,75 @@ static int __init dma_atomic_pool_init(void)
 }
 postcore_initcall(dma_atomic_pool_init);
 
-static inline struct gen_pool *dma_guess_pool_from_device(struct device *dev)
+static inline struct gen_pool *dma_guess_pool(struct gen_pool *prev, gfp_t gfp)
 {
-   u64 phys_mask;

Re: [PATCH 17/18] media/omap3isp: Clean up IOMMU workaround

2020-08-24 Thread Suman Anna via iommu
On 8/20/20 6:01 PM, Robin Murphy wrote:
> On 2020-08-20 20:55, Sakari Ailus wrote:
>> On Thu, Aug 20, 2020 at 06:25:19PM +0100, Robin Murphy wrote:
>>> On 2020-08-20 17:53, Sakari Ailus wrote:
 Hi Robin,

 On Thu, Aug 20, 2020 at 04:08:36PM +0100, Robin Murphy wrote:
> Now that arch/arm is wired up for default domains and iommu-dma, devices
> behind IOMMUs will get mappings set up automatically as appropriate, so
> there is no need for drivers to do so manually.
>
> Signed-off-by: Robin Murphy 

 Thanks for the patch.
>>>
>>> Many thanks for testing so quickly!
>>>
 I haven't looked at the details but it seems that this causes the buffer
 memory allocation to be physically contiguous, which causes a failure to
 allocate video buffers of entirely normal size. I guess that was not
 intentional?
>>>
>>> Hmm, it looks like the device ends up with the wrong DMA ops, which implies
>>> something didn't go as expected with the earlier IOMMU setup and default
>>> domain creation. Chances are that either I missed some subtlety in the
>>> omap_iommu change, or I've fundamentally misjudged how the ISP probing works
>>> and it never actually goes down the of_iommu_configure() path in the first
>>> place. Do you get any messages from the IOMMU layer earlier on during boot?

Yeah, I don't think we go through the of_iommu_configure() path; the setup is
mostly done using the .probe_device() and .attach_dev() ops. Since the MMUs are
present directly in the respective sub-systems and rely on the sub-system
clocking and power, the MMU itself is turned on and enabled during
.attach_dev().

regards
Suman

>>
>> I do get these:
>>
>> [    2.934936] iommu: Default domain type: Translated
>> [    2.940917] omap-iommu 480bd400.mmu: 480bd400.mmu registered
>> [    2.946899] platform 480bc000.isp: Adding to iommu group 0
>>
> 
> So that much looks OK, if there are no obvious errors. Unfortunately there's 
> no
> easy way to tell exactly what of_iommu_configure() is doing (beyond enabling a
> couple of vague debug messages). The first thing I'll do tomorrow is
> double-check whether it's really working on my boards here, or whether I was
> just getting lucky with CMA... (I assume you don't have CMA enabled if you're
> ending up in remap_allocator_alloc())
> 
> Robin.


Re: [PATCH] dt-bindings: iommu: renesas,ipmmu-vmsa: Sort compatible string in increasing number of the SoC

2020-08-24 Thread Rob Herring
On Sun, 09 Aug 2020 20:35:27 +0100, Lad Prabhakar wrote:
> Sort the items in the compatible string list in increasing order of SoC number.
> 
> Signed-off-by: Lad Prabhakar 
> ---
>  Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Acked-by: Rob Herring 


Re: [PATCH] iommu/intel: Handle 36b addressing for x86-32

2020-08-24 Thread Lu Baolu

Hi Chris,

On 2020/8/23 0:02, Chris Wilson wrote:

Beware that the address size for x86-32 may exceed unsigned long.

[0.368971] UBSAN: shift-out-of-bounds in drivers/iommu/intel/iommu.c:128:14
[0.369055] shift exponent 36 is too large for 32-bit type 'long unsigned 
int'
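
A minimal userspace reproduction of the underlying issue (not from the patch):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	int bits = 36;

	/*
	 * On 32-bit targets unsigned long is 32 bits wide, so shifting it
	 * by 36 is undefined behaviour -- exactly what UBSAN flags above:
	 *
	 *	unsigned long bad = 1UL << bits;
	 *
	 * Forcing a 64-bit type makes the shift well defined:
	 */
	uint64_t good = 1ULL << bits;

	printf("0x%llx\n", (unsigned long long)good);
	return 0;
}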

If we don't handle the wide addresses, the pages are mismapped and the
device read/writes go astray, detected as DMAR faults and leading to
device failure. The behaviour changed (from working to broken) in commit
fa954e683178 ("iommu/vt-d: Delegate the dma domain to upper layer"), but

commit  ("iommu/vt-d: Delegate the dma domain to upper layer")

and adjust the title as "iommu/vt-d: Handle 36bit addressing for x86-32"

With the above two changes,

Acked-by: Lu Baolu 

Best regards,
baolu


the error looks older.

Fixes: fa954e683178 ("iommu/vt-d: Delegate the dma domain to upper layer")
Signed-off-by: Chris Wilson 
Cc: James Sewart 
Cc: Lu Baolu 
Cc: Joerg Roedel 
Cc:  # v5.3+
---
  drivers/iommu/intel/iommu.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 2e9c8c3d0da4..ba78a2e854f9 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -123,29 +123,29 @@ static inline unsigned int level_to_offset_bits(int level)
 	return (level - 1) * LEVEL_STRIDE;
 }
 
-static inline int pfn_level_offset(unsigned long pfn, int level)
+static inline int pfn_level_offset(u64 pfn, int level)
 {
 	return (pfn >> level_to_offset_bits(level)) & LEVEL_MASK;
 }
 
-static inline unsigned long level_mask(int level)
+static inline u64 level_mask(int level)
 {
-	return -1UL << level_to_offset_bits(level);
+	return -1ULL << level_to_offset_bits(level);
 }
 
-static inline unsigned long level_size(int level)
+static inline u64 level_size(int level)
 {
-	return 1UL << level_to_offset_bits(level);
+	return 1ULL << level_to_offset_bits(level);
 }
 
-static inline unsigned long align_to_level(unsigned long pfn, int level)
+static inline u64 align_to_level(u64 pfn, int level)
 {
 	return (pfn + level_size(level) - 1) & level_mask(level);
 }
 
 static inline unsigned long lvl_to_nr_pages(unsigned int lvl)
 {
-	return  1 << min_t(int, (lvl - 1) * LEVEL_STRIDE, MAX_AGAW_PFN_WIDTH);
+	return 1UL << min_t(int, (lvl - 1) * LEVEL_STRIDE, MAX_AGAW_PFN_WIDTH);
 }
 
 /* VT-d pages must always be _smaller_ than MM pages. Otherwise things





Re: [PATCH v4 07/15] iommu/vt-d: Delegate the dma domain to upper layer

2020-08-24 Thread Lu Baolu

Hi Chris,

On 8/24/20 4:35 PM, Chris Wilson wrote:

Quoting Lu Baolu (2020-08-24 07:31:23)

Hi Chris,

On 2020/8/22 2:33, Chris Wilson wrote:

Quoting Lu Baolu (2019-05-25 06:41:28)

This allows the iommu generic layer to allocate a dma domain and
attach it to a device through the iommu api's. With all types of
domains being delegated to upper layer, we can remove an internal
flag which was used to distinguish domains mananged internally or
externally.


I'm seeing some really strange behaviour with this patch on a 32b
Skylake system (and still present on mainline). Before this patch
everything is peaceful and appears to work correctly. After applying this
patch, we fail to initialise the GPU, with a few DMAR errors reported, e.g.

[   20.279445] DMAR: DRHD: handling fault status reg 3
[   20.279508] DMAR: [DMA Read] Request device [00:02.0] fault addr 8900a000 
[fault reason 05] PTE Write access is not set

Setting an identity map for the igfx made the DMAR errors disappear, but
the GPU still failed to initialise.

There's no difference in the DMAR configuration dmesg between working and
the upset patch. And the really strange part is that switching to a 64b
kernel with this patch, it's working.

Any suggestions on what I should look for?


Can the patch titled "[PATCH] iommu/intel: Handle 36b addressing for
x86-32" solve this problem?


It does. Not sure why, but that mystery I can leave for others.


It's caused by left-shifting a 32-bit unsigned long by 36 bits. Your
patch fixes this by converting the integer from unsigned long to u64. It
looks good to me. Thanks!


-Chris



Best regards,
baolu


Re: [patch RFC 24/38] x86/xen: Consolidate XEN-MSI init

2020-08-24 Thread Jürgen Groß

On 24.08.20 23:21, Thomas Gleixner wrote:

On Mon, Aug 24 2020 at 06:59, Jürgen Groß wrote:

On 21.08.20 02:24, Thomas Gleixner wrote:

+static __init void xen_setup_pci_msi(void)
+{
+   if (xen_initial_domain()) {
+   x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs;
+   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
+   x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
+   pci_msi_ignore_mask = 1;


This is wrong, as a PVH initial domain shouldn't do the pv settings.

The "if (xen_initial_domain())" should be inside the pv case, like:

if (xen_pv_domain()) {
if (xen_initial_domain()) {
...
} else {
...
}
} else if (xen_hvm_domain()) {
...


I still think it does the right thing depending on the place it is
called from, but even if so, it's completely unreadable gunk. I'll fix
that properly.


The main issue is that xen_initial_domain() and xen_pv_domain() are
orthogonal to each other. So xen_initial_domain() can either be true
for xen_pv_domain() (the "classic" pv dom0) or for xen_hvm_domain()
(the new PVH dom0).
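
In code, the dispatch would then be shaped roughly like this (a sketch
only; the actual handler assignments are elided):

static __init void xen_setup_pci_msi(void)
{
	if (xen_pv_domain()) {
		if (xen_initial_domain()) {
			/* classic PV dom0: full PV MSI management */
		} else {
			/* PV domU */
		}
	} else if (xen_hvm_domain()) {
		/* covers HVM domUs and the new PVH dom0 alike */
	}
}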


Juergen

[PATCH v11 07/11] device-mapping: Introduce DMA range map, supplanting dma_pfn_offset

2020-08-24 Thread Jim Quinlan via iommu
The new field 'dma_range_map' in struct device is used to facilitate the
use of single or multiple offsets between mapping regions of CPU addresses
and DMA addresses.  It subsumes the role of "dev->dma_pfn_offset", which
was only capable of holding a single uniform offset and had no region
bounds checking.

The function of_dma_get_range() has been modified so that it takes a single
argument -- the device node -- and returns a map, NULL, or an error code.
The map is an array that holds the information regarding the DMA regions.
Each range entry contains the address offset, the cpu_start address, the
dma_start address, and the size of the region.

of_dma_configure() is the typical way to set range offsets, but there are
a number of ad hoc assignments to "dev->dma_pfn_offset" in the kernel
driver code.  These cases now invoke the function
dma_set_offset_range(dev, cpu_addr, dma_addr, size).
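
Conceptually, each map entry ties a CPU physical range to a DMA range at
a fixed offset. A simplified sketch of the lookup (field names follow the
patch; the zero-size sentinel and the walk itself are illustrative):

struct bus_dma_region {
	phys_addr_t	cpu_start;
	dma_addr_t	dma_start;
	u64		size;
	u64		offset;		/* cpu_start - dma_start */
};

/* Walk the map, which ends with a zero-size entry; fall back to a
 * zero offset when no entry covers the address. */
static u64 offset_from_phys_addr(const struct bus_dma_region *map,
				 phys_addr_t paddr)
{
	const struct bus_dma_region *m;

	for (m = map; m->size; m++)
		if (paddr >= m->cpu_start && paddr - m->cpu_start < m->size)
			return m->offset;
	return 0;
}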

Signed-off-by: Jim Quinlan 
---
 arch/arm/include/asm/dma-mapping.h| 10 +--
 arch/arm/mach-keystone/keystone.c | 17 +++--
 arch/sh/drivers/pci/pcie-sh7786.c |  9 +--
 arch/x86/pci/sta2x11-fixup.c  |  7 +-
 drivers/acpi/arm64/iort.c |  5 +-
 drivers/base/core.c   |  2 +
 drivers/gpu/drm/sun4i/sun4i_backend.c |  5 +-
 drivers/iommu/io-pgtable-arm.c|  2 +-
 .../platform/sunxi/sun4i-csi/sun4i_csi.c  |  5 +-
 .../platform/sunxi/sun6i-csi/sun6i_csi.c  |  4 +-
 drivers/of/address.c  | 72 +--
 drivers/of/device.c   | 43 ++-
 drivers/of/of_private.h   | 10 +--
 drivers/of/unittest.c | 34 ++---
 drivers/remoteproc/remoteproc_core.c  |  8 ++-
 .../staging/media/sunxi/cedrus/cedrus_hw.c|  7 +-
 drivers/usb/core/message.c|  9 ++-
 drivers/usb/core/usb.c|  7 +-
 include/linux/device.h|  4 +-
 include/linux/dma-direct.h|  8 +--
 include/linux/dma-mapping.h   | 36 ++
 kernel/dma/coherent.c | 10 +--
 kernel/dma/mapping.c  | 66 +
 23 files changed, 265 insertions(+), 115 deletions(-)

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index bdd80ddbca34..c21893f683b5 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -35,8 +35,11 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 #ifndef __arch_pfn_to_dma
 static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn)
 {
-   if (dev)
-   pfn -= dev->dma_pfn_offset;
+   if (dev) {
+   phys_addr_t paddr = PFN_PHYS(pfn);
+
+   pfn -= PFN_DOWN(dma_offset_from_phys_addr(dev, paddr));
+   }
return (dma_addr_t)__pfn_to_bus(pfn);
 }
 
@@ -45,8 +48,7 @@ static inline unsigned long dma_to_pfn(struct device *dev, dma_addr_t addr)
unsigned long pfn = __bus_to_pfn(addr);
 
if (dev)
-   pfn += dev->dma_pfn_offset;
-
+   pfn += PFN_DOWN(dma_offset_from_dma_addr(dev, addr));
return pfn;
 }
 
diff --git a/arch/arm/mach-keystone/keystone.c b/arch/arm/mach-keystone/keystone.c
index 638808c4e122..78808942ad1c 100644
--- a/arch/arm/mach-keystone/keystone.c
+++ b/arch/arm/mach-keystone/keystone.c
@@ -8,6 +8,7 @@
  */
 #include <linux/io.h>
 #include <linux/of.h>
+#include <linux/dma-mapping.h>
 #include <linux/init.h>
 #include <linux/of_platform.h>
 #include <linux/of_address.h>
@@ -24,8 +25,6 @@
 
 #include "keystone.h"
 
-static unsigned long keystone_dma_pfn_offset __read_mostly;
-
 static int keystone_platform_notifier(struct notifier_block *nb,
  unsigned long event, void *data)
 {
@@ -38,9 +37,12 @@ static int keystone_platform_notifier(struct notifier_block *nb,
return NOTIFY_BAD;
 
if (!dev->of_node) {
-   dev->dma_pfn_offset = keystone_dma_pfn_offset;
-   dev_err(dev, "set dma_pfn_offset%08lx\n",
-   dev->dma_pfn_offset);
+   int ret = dma_set_offset_range(dev, KEYSTONE_HIGH_PHYS_START,
+  KEYSTONE_LOW_PHYS_START,
+  KEYSTONE_HIGH_PHYS_SIZE);
+   dev_err(dev, "set dma_offset%08llx%s\n",
+   KEYSTONE_HIGH_PHYS_START - KEYSTONE_LOW_PHYS_START,
+   ret ? " failed" : "");
}
return NOTIFY_OK;
 }
@@ -51,11 +53,8 @@ static struct notifier_block platform_nb = {
 
 static void __init keystone_init(void)
 {
-   if (PHYS_OFFSET >= KEYSTONE_HIGH_PHYS_START) {
-   keystone_dma_pfn_offset = PFN_DOWN(KEYSTONE_HIGH_PHYS_START -
-  KEYSTONE_LOW_PHYS_START);
+   if (PHYS_OFFSET >= KEYSTONE_HIGH_PHYS_START)

[PATCH v11 00/11] PCI: brcmstb: enable PCIe for STB chips

2020-08-24 Thread Jim Quinlan via iommu


Patchset Summary:
  Enhance a PCIe host controller driver.  Because of its unusual design
  we are foced to change dev->dma_pfn_offset into a more general role
  allowing multiple offsets.  See the 'v1' notes below for more info.

v11:
  Commit: "device-mapping: Introduce DMA range map, supplanting ..."
  -- Rebased to latest torvalds, Aug 20, 2020.
  -- Minor change in of_dma_get_range() to satisfy the kernel's
 robot tester.
  -- Use of PFN_DOWN(), PFN_PHYS() instead of explicit shifts (Andy)
  -- Eliminate extra return in dma_offset_from_xxx_addr() (Andy)
  -- Change dma_set_offset_range() to correctly handle the case
 of pre-existing DMA map and zero offset.

v10: 
  Commit: "device-mapping: Introduce DMA range map, supplanting ..."
  -- change title of commit; "bus core:" => "device-mapping:"
  -- instead of allocating the DMA map with devm, use kcalloc
 and call kfree() during device_release().  (RobH) Also,
 for three cases that want to use the same DMA map, copy
 the dma_range_map using a helper function.
  -- added a missing 'return = 0;' to of_dma_get_range().  (Nicolas)
  -- removed dma_range_overlaps(); instead return error if there
 is an existing DMA map. (Christoph).
  Commit: "PCI: brcmstb: Set additional internal memory DMA ..."
  -- Changed constant 1 to 1ULL. (Nicolas)
  Commit: "ata: ahci_brcm: Fix use of BCM7216 reset controller"
 This commit has been removed from this patchset and will be
 submitted on its own.

v9:
  Commit: "device core: Introduce DMA range map, supplanting ..."
  -- A number of code improvements were implemented as suggested by
 ChristophH.  Unfortunately, some of these changes reversed the
 implemented suggestions of other reviewers; for example, the new
 macros PFN_DMA_ADDR(), DMA_ADDR_PFN() have been pulled.

v8:
  Commit: "device core: Introduce DMA range map, supplanting ..."
  -- To satisfy a specific m68k compile configuration, I moved the 'struct
 bus_dma_region; definition out of #ifdef CONFIG_HAS_DMA and also defined
 three inline functions for !CONFIG_HAS_DMA (kernel test robot).
  -- The sunXi drivers -- suc4i_csi, sun6i_csi, cedrus_hw -- set
 a pfn_offset outside of_dma_configure() but the code offers no 
 insight on the size of the translation window.  V7 had me using
 SIZE_MAX as the size.  I have since contacted the sunXi maintainer and
 he said that using a size of SZ_4G would cover sunXi configurations.

v7:
  Commit: "device core: Introduce DMA range map, supplanting ..."
  -- remove second kcalloc/copy in device.c (AndyS)
  -- use PTR_ERR_OR_ZERO() and PHYS_PFN() (AndyS)
  -- indentation, sizeof(struct ...) => sizeof(*r) (AndyS)
  -- add pfn.h definitions: PFN_DMA_ADDR(), DMA_ADDR_PFN() (AndyS)
  -- Fixed compile error in "sun6i_csi.c" (kernel test robot)
  Commit "ata: ahci_brcm: Fix use of BCM7216 reset controller"
  -- correct name of function in the commit msg (SergeiS)
  
v6:
  Commit "device core: Introduce DMA range map":
  -- of_dma_get_range() now takes a single argument and returns either
 NULL, a valid map, or an ERR_PTR. (Robin)
  -- offsets are no longer a PFN value but an actual address. (Robin)
  -- the bus_dma_region struct stores the range size instead of
 the cpu_end and pci_end values. (Robin)
  -- devices that were setting a single offset with no boundaries
 have been modified to have boundaries; in a few places
     where this information was unavailable, a /* FIXME: ... */
 comment was added. (Robin)
  -- dma_attach_offset_range() can be called when an offset
 map already exists; if it's range is already present
 nothing is done and success is returned. (Robin)
  All commits:
  -- Many name/style/corrections/etc changed (Bjorn)
  -- rebase to Torvalds master

v5:
  Commit "device core: Introduce multiple dma pfn offsets"
  -- in of/address.c: "map_size = 0" => "*map_size = 0"
  -- use kcalloc instead of kzalloc (AndyS)
  -- use PHYS_ADDR_MAX instead of "~(phys_addr_t)0"
  Commit "PCI: brcmstb: Set internal memory viewport sizes"
  -- now gives error on missing dma-ranges property.
  Commit "dt-bindings: PCI: Add bindings for more Brcmstb chips"
  -- removed "Allof:" from brcm,scb-sizes definition (RobH)
  All Commits:
  -- indentation style, use max chars 100 (AndyS)
  -- rebased to torvalds master

v4:
  Commit "device core: Introduce multiple dma pfn offsets"
  -- of_dma_get_range() does not take a dev param but instead
 takes two "out" params: map and map_size.  We do this so
 that the code that parses dma-ranges is separate from
 the code that modifies 'dev'.   (Nicolas)
  -- the separate case of having a single pfn offset has
 been removed and is now processed by going through the
 map array. (Nicolas)
  -- move attach_uniform_dma_pfn_offset() from of/address.c to
 dma/mapping.c so that it does not depend on CONFIG_OF. (Nicolas)
  -- devm_kcalloc => devm_kzalloc (DanC)
  -- add/fix assignment to 

Re: [patch RFC 24/38] x86/xen: Consolidate XEN-MSI init

2020-08-24 Thread Thomas Gleixner
On Mon, Aug 24 2020 at 06:59, Jürgen Groß wrote:
> On 21.08.20 02:24, Thomas Gleixner wrote:
>> +static __init void xen_setup_pci_msi(void)
>> +{
>> +if (xen_initial_domain()) {
>> +x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs;
>> +x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
>> +x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
>> +pci_msi_ignore_mask = 1;
>
> This is wrong, as a PVH initial domain shouldn't do the pv settings.
>
> The "if (xen_initial_domain())" should be inside the pv case, like:
>
> if (xen_pv_domain()) {
>   if (xen_initial_domain()) {
>   ...
>   } else {
>   ...
>   }
> } else if (xen_hvm_domain()) {
>   ...

I still think it does the right thing depending on the place it is
called from, but even if so, it's completely unreadable gunk. I'll fix
that properly.

Thanks,

tglx

Re: [PATCH 11/18] iommu/omap: Add IOMMU_DOMAIN_DMA support

2020-08-24 Thread Suman Anna via iommu
Hi Robin,

On 8/20/20 10:08 AM, Robin Murphy wrote:
> Now that arch/arm is wired up for default domains and iommu-dma,
> implement the corresponding driver-side support for DMA domains.
> 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/omap-iommu.c | 22 +-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
> index 71f29c0927fc..ea25c2fe0418 100644
> --- a/drivers/iommu/omap-iommu.c
> +++ b/drivers/iommu/omap-iommu.c
> @@ -9,6 +9,7 @@
>   *   Paul Mundt and Toshihiro Kobayashi
>   */
>  
> +#include <linux/dma-iommu.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/err.h>
>  #include <linux/slab.h>
> @@ -1574,13 +1575,19 @@ static struct iommu_domain *omap_iommu_domain_alloc(unsigned type)
>  {
>   struct omap_iommu_domain *omap_domain;
>  
> - if (type != IOMMU_DOMAIN_UNMANAGED)
> + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
>   return NULL;
>  
>   omap_domain = kzalloc(sizeof(*omap_domain), GFP_KERNEL);
>   if (!omap_domain)
>   return NULL;
>  
> + if (type == IOMMU_DOMAIN_DMA &&
> +	    iommu_get_dma_cookie(&omap_domain->domain)) {
> + kfree(omap_domain);
> + return NULL;
> + }
> +
>  	spin_lock_init(&omap_domain->lock);
>  
>   omap_domain->domain.geometry.aperture_start = 0;
> @@ -1601,6 +1608,7 @@ static void omap_iommu_domain_free(struct iommu_domain *domain)
>   if (omap_domain->dev)
>   _omap_iommu_detach_dev(omap_domain, omap_domain->dev);
>  
> +	iommu_put_dma_cookie(&omap_domain->domain);
>   kfree(omap_domain);
>  }
>  
> @@ -1736,6 +1744,17 @@ static struct iommu_group *omap_iommu_device_group(struct device *dev)
>   return group;
>  }
>  
> +static int omap_iommu_of_xlate(struct device *dev,
> +struct of_phandle_args *args)
> +{
> + /*
> +  * Logically, some of the housekeeping from _omap_iommu_add_device()
> +  * should probably move here, but the minimum we *need* is simply to
> +  * cooperate with of_iommu at all to let iommu-dma work.
> +  */
> + return 0;
> +}
> +

I have tested this series, and it is breaking the OMAP remoteproc functionality.
We definitely need some more plumbing. I am currently getting MMU faults and
also the DMA allocated addresses are not coming from the device-specific CMA
pools (opposite of what Sakari has reported with OMAP3 ISP). Just removing the
of_xlate gets me back the expected allocations, and no MMU faults, but I don't
see any valid traces.

The MMU devices that the OMAP IOMMU driver deals with are not traditional
bus-level IOMMU devices, but local MMU devices that are present within a remote
processor sub-system or hardware accelerator (e.g., OMAP3 ISP). The usage is also
slightly different between remoteprocs and OMAP3 ISP. The former uses the CMA
pools and iommu_map/unmap API (UNMANAGED iommu domain), as the allocated regions
need to be mapped using specific device addresses adhering to the firmware
linker map, while OMAP3 ISP uses it like a traditional DMA pool.
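
For contrast, a hedged sketch of the UNMANAGED-domain flow described above
(the function and its parameters are illustrative, not taken from the OMAP
or remoteproc code):

#include <linux/iommu.h>

/* Map one firmware segment at the fixed device address (da) demanded by
 * the firmware's linker map; pa/len describe the CMA-backed region. */
static int map_firmware_segment(struct device *dev, phys_addr_t pa,
				unsigned long da, size_t len)
{
	struct iommu_domain *domain;
	int ret;

	domain = iommu_domain_alloc(dev->bus);	/* an UNMANAGED domain */
	if (!domain)
		return -ENOMEM;

	ret = iommu_attach_device(domain, dev);
	if (ret) {
		iommu_domain_free(domain);
		return ret;
	}

	ret = iommu_map(domain, da, pa, len, IOMMU_READ | IOMMU_WRITE);
	if (ret) {
		iommu_detach_device(domain, dev);
		iommu_domain_free(domain);
	}
	return ret;
}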

regards
Suman

>  static const struct iommu_ops omap_iommu_ops = {
>   .domain_alloc   = omap_iommu_domain_alloc,
>   .domain_free= omap_iommu_domain_free,
> @@ -1747,6 +1766,7 @@ static const struct iommu_ops omap_iommu_ops = {
>   .probe_device   = omap_iommu_probe_device,
>   .release_device = omap_iommu_release_device,
>   .device_group   = omap_iommu_device_group,
> + .of_xlate   = omap_iommu_of_xlate,
>   .pgsize_bitmap  = OMAP_IOMMU_PGSIZES,
>  };
>  
> 
