Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-20 Thread Ard Biesheuvel
On Mon, 21 Jan 2019 at 06:54, Vivek Gautam  wrote:
>
> Qualcomm SoCs have an additional level of cache called the
> System cache, a.k.a. Last level cache (LLC). This cache sits right
> before the DDR, and is tightly coupled with the memory controller.
> Clients using this cache request their slices from the
> system cache, activate them, and can then start using them.
> For clients behind an SMMU to start using the system cache for
> buffers and related page tables [1], memory attributes need to be
> set accordingly. This series adds the required support.
>

Does this actually improve performance on reads from a device? The
non-cache coherent DMA routines perform an unconditional D-cache
invalidate by VA to the PoC before reading from the buffers filled by
the device, and I would expect the PoC to be defined as lying beyond
the LLC to still guarantee the architected behavior.



> This change is a realisation of the following changes from downstream msm-4.9:
> iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2]
> iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3]
>
> Changes since v2:
>  - Split the patches into io-pgtable-arm driver and arm-smmu driver.
>  - Converted smmu domain attributes to a bitmap, so multiple attributes
>    can be managed easily.
>  - With the addition of non-coherent page table mapping support [4], this
>    patch series now aligns with the understanding of upgrading the
>    non-coherent devices to use some level of outer cache.
>  - Updated the macros and comments to reflect the use of QCOM_SYS_CACHE.
>  - QCOM_SYS_CACHE can still be used at stage 2, so that doesn't depend on
>    stage-1 mapping.
>  - Added change to disable the attribute from arm_smmu_domain_set_attr()
>    when needed.
>  - Removed the page protection controls for QCOM_SYS_CACHE at the DMA API
>    level.
>
> Goes on top of the non-coherent page tables support patch series [4]
>
> [1] https://patchwork.kernel.org/patch/10302791/
> [2] 
> https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9=bf762276796e79ca90014992f4d9da5593fa7d51
> [3] 
> https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9=d4c72c413ea27c43f60825193d4de9cb8ffd9602
> [4] https://lore.kernel.org/patchwork/cover/1032938/
>
> Vivek Gautam (3):
>   iommu/arm-smmu: Move to bitmap for arm_smmu_domain attributes
>   iommu/io-pgtable-arm: Add support to use system cache
>   iommu/arm-smmu: Add support to use system cache
>
>  drivers/iommu/arm-smmu.c   | 28 
>  drivers/iommu/io-pgtable-arm.c | 15 +--
>  drivers/iommu/io-pgtable.h |  4 
>  include/linux/iommu.h  |  2 ++
>  4 files changed, 43 insertions(+), 6 deletions(-)
>
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation
>
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings

2019-01-20 Thread Vivek Gautam
Hi Will,


On Sun, Jan 20, 2019 at 5:31 AM Will Deacon  wrote:
>
> On Thu, Jan 17, 2019 at 02:57:18PM +0530, Vivek Gautam wrote:
> > Adding a device tree option for arm smmu to enable non-cacheable
> > memory for page tables.
> > We already enable a smmu feature for coherent walk based on
> > whether the smmu device is dma-coherent or not. Have an option
> > to enable non-cacheable page table memory to force set it for
> > particular smmu devices.
>
> Hmm, I must be missing something here. What is the difference between this
> new property, and simply omitting dma-coherent on the SMMU?

So, this is what I understood from the email thread on Last level
cache support:
Robin pointed out that we may need to add support for setting
non-cacheable mappings in the TCR.
Currently, we don't do that for SMMUs that omit dma-coherent.
We rely on the interconnect to handle the configuration set in the TCR,
and let the interconnect ignore the cacheability if it can't support it.

Moreover, Robin suggested that we should take care of SMMUs for which
removing snoop latency on walks, by making the mappings non-cacheable,
outweighs the cost of cache maintenance on PTE updates.

So, this change adds another property to set up these non-cacheable
mappings explicitly. As I pointed out, omitting 'dma-coherent' and the
corresponding 'IO_PGTABLE_QUIRK_NO_DMA' does take care of a few things.

Should we handle the TCR settings too with this quirk?
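
For readers following the TCR question, here is a minimal standalone sketch of how the walk cacheability fields could be chosen from the quirks, modelled on the three-way selection in the 2/3 hunk of this thread. The shift and region values are assumptions believed to match io-pgtable-arm.c, and the DEMO_QUIRK_* bit positions and the function name are hypothetical, made up for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match io-pgtable-arm.c; verify against the tree in use. */
#define DEMO_TCR_RGN_NC       0u  /* walks are non-cacheable */
#define DEMO_TCR_RGN_WBWA     1u  /* write-back, write-allocate */
#define DEMO_TCR_IRGN0_SHIFT  8   /* inner cacheability of walks */
#define DEMO_TCR_ORGN0_SHIFT  10  /* outer cacheability of walks */

/* Hypothetical bit positions, for illustration only. */
#define DEMO_QUIRK_NON_COHERENT    (1u << 0)
#define DEMO_QUIRK_QCOM_SYS_CACHE  (1u << 1)

/* Return only the IRGN0/ORGN0 part of a TCR value for the given quirks. */
static uint32_t demo_tcr_walk_attrs(unsigned int quirks)
{
	uint32_t irgn, orgn;

	/* Unknown quirk bits are a caller bug in this sketch. */
	assert(!(quirks & ~(DEMO_QUIRK_NON_COHERENT |
			    DEMO_QUIRK_QCOM_SYS_CACHE)));

	if (quirks & DEMO_QUIRK_NON_COHERENT) {
		/* Fully non-cacheable walks. */
		irgn = DEMO_TCR_RGN_NC;
		orgn = DEMO_TCR_RGN_NC;
	} else if (quirks & DEMO_QUIRK_QCOM_SYS_CACHE) {
		/* Inner non-cacheable, outer (system cache) write-back. */
		irgn = DEMO_TCR_RGN_NC;
		orgn = DEMO_TCR_RGN_WBWA;
	} else {
		/* Default: cacheable walks at both levels. */
		irgn = DEMO_TCR_RGN_WBWA;
		orgn = DEMO_TCR_RGN_WBWA;
	}

	return (irgn << DEMO_TCR_IRGN0_SHIFT) | (orgn << DEMO_TCR_ORGN0_SHIFT);
}
```

Note that when both quirks are set, the non-coherent branch wins in this sketch, just as the if/else-if ordering in the hunk implies.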

Regards
Vivek
>
> Will



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


[PATCH 2/3] iommu/io-pgtable-arm: Add support to use system cache

2019-01-20 Thread Vivek Gautam
A few Qualcomm platforms, such as sdm845, have an additional outer
cache called the System cache, a.k.a. Last level cache (LLC), that
allows non-coherent devices to upgrade to using caching.

There is a fundamental assumption that non-coherent devices can't
access caches. This change adds an exception where they *can* use
some level of cache despite still being non-coherent overall.
Coherent devices that use cacheable memory, and the CPU, make use of
this system cache by default.

Looking at memory types, we have the following:
a) Normal uncached :- MAIR 0x44, inner non-cacheable,
   outer non-cacheable;
b) Normal cached :-   MAIR 0xff, inner write-back non-transient
   (read/write-allocate),
   outer write-back non-transient (read/write-allocate);
   attribute setting for coherent I/O devices.
and, for non-coherent I/O devices that can allocate in the system cache,
another type gets added:
c) Normal sys-cached :- MAIR 0xf4, inner non-cacheable,
   outer write-back non-transient (read/write-allocate)
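
As an illustration of the encodings listed above, the following standalone sketch packs the four attribute bytes into a MAIR value and splits an attribute byte into its outer and inner nibbles. The attribute values and slot indices are taken from the patch; the assumption that each slot is 8 bits wide (slot n starts at bit 8 * n) and the demo_* helper names are mine, not part of the series.

```c
#include <assert.h>
#include <stdint.h>

/* Attribute bytes and slot indices as they appear in the patch. */
#define DEMO_MAIR_ATTR_DEVICE          0x04u
#define DEMO_MAIR_ATTR_NC              0x44u
#define DEMO_MAIR_ATTR_QCOM_SYS_CACHE  0xf4u
#define DEMO_MAIR_ATTR_WBRWA           0xffu

#define DEMO_MAIR_IDX_NC               0
#define DEMO_MAIR_IDX_CACHE            1
#define DEMO_MAIR_IDX_DEV              2
#define DEMO_MAIR_IDX_QCOM_SYS_CACHE   3

/* Assumption: one byte per slot, so slot idx starts at bit 8 * idx. */
static uint32_t demo_mair_slot(uint32_t attr, unsigned int idx)
{
	assert(idx < 4 && attr <= 0xffu);
	return attr << (idx * 8);
}

/* Compose the low MAIR word with all four attributes programmed. */
static uint32_t demo_mair0_compose(void)
{
	return demo_mair_slot(DEMO_MAIR_ATTR_NC, DEMO_MAIR_IDX_NC) |
	       demo_mair_slot(DEMO_MAIR_ATTR_WBRWA, DEMO_MAIR_IDX_CACHE) |
	       demo_mair_slot(DEMO_MAIR_ATTR_DEVICE, DEMO_MAIR_IDX_DEV) |
	       demo_mair_slot(DEMO_MAIR_ATTR_QCOM_SYS_CACHE,
			      DEMO_MAIR_IDX_QCOM_SYS_CACHE);
}

/* For Normal memory, bits [7:4] describe outer, bits [3:0] inner. */
static unsigned int demo_mair_outer(uint32_t attr)
{
	return (attr >> 4) & 0xfu;
}

static unsigned int demo_mair_inner(uint32_t attr)
{
	return attr & 0xfu;
}
```

Read this way, 0xf4 shares its inner nibble (0x4, non-cacheable) with the uncached type and its outer nibble (0xf, write-back) with the cached type, which is what lets a non-coherent device allocate in the system cache only.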

Coherent I/O devices use system cache by marking the memory as
normal cached.
Non-coherent I/O devices should mark the memory as normal
sys-cached in page tables to use system cache.

Signed-off-by: Vivek Gautam 
---
 drivers/iommu/io-pgtable-arm.c | 15 +--
 drivers/iommu/io-pgtable.h |  4 
 include/linux/iommu.h  |  2 ++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index c76919c30f1a..0e55772702da 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -168,10 +168,12 @@
 #define ARM_LPAE_MAIR_ATTR_MASK            0xff
 #define ARM_LPAE_MAIR_ATTR_DEVICE          0x04
 #define ARM_LPAE_MAIR_ATTR_NC              0x44
+#define ARM_LPAE_MAIR_ATTR_QCOM_SYS_CACHE  0xf4
 #define ARM_LPAE_MAIR_ATTR_WBRWA           0xff
 #define ARM_LPAE_MAIR_ATTR_IDX_NC          0
 #define ARM_LPAE_MAIR_ATTR_IDX_CACHE       1
 #define ARM_LPAE_MAIR_ATTR_IDX_DEV         2
+#define ARM_LPAE_MAIR_ATTR_IDX_QCOM_SYS_CACHE  3
 
 /* IOPTE accessors */
 #define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d))
@@ -443,6 +445,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
else if (prot & IOMMU_CACHE)
pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+   else if (prot & IOMMU_QCOM_SYS_CACHE)
+   pte |= (ARM_LPAE_MAIR_ATTR_IDX_QCOM_SYS_CACHE
+   << ARM_LPAE_PTE_ATTRINDX_SHIFT);
} else {
pte = ARM_LPAE_PTE_HAP_FAULT;
if (prot & IOMMU_READ)
@@ -781,7 +786,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 
if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA |
IO_PGTABLE_QUIRK_NON_STRICT |
-   IO_PGTABLE_QUIRK_NON_COHERENT))
+   IO_PGTABLE_QUIRK_NON_COHERENT |
+   IO_PGTABLE_QUIRK_QCOM_SYS_CACHE))
return NULL;
 
data = arm_lpae_alloc_pgtable(cfg);
@@ -794,6 +800,9 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
if (cfg->quirks & IO_PGTABLE_QUIRK_NON_COHERENT)
reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT |
   ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT;
+   else if (cfg->quirks & IO_PGTABLE_QUIRK_QCOM_SYS_CACHE)
+   reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT |
+ ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT;
else
reg |= ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT |
   ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT;
@@ -848,7 +857,9 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
  (ARM_LPAE_MAIR_ATTR_WBRWA
   << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) |
  (ARM_LPAE_MAIR_ATTR_DEVICE
-  << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV));
+  << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV)) |
+ (ARM_LPAE_MAIR_ATTR_QCOM_SYS_CACHE
+  << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_QCOM_SYS_CACHE));
 
cfg->arm_lpae_s1_cfg.mair[0] = reg;
cfg->arm_lpae_s1_cfg.mair[1] = 0;
diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
index 46604cf7b017..fb237e8aa9f1 100644
--- a/drivers/iommu/io-pgtable.h
+++ b/drivers/iommu/io-pgtable.h
@@ -80,6 +80,9 @@ struct io_pgtable_cfg {
 *  pagetables even on a coherent SMMU for cases where reducing
 *  snoop traffic/latency on walks outweighs the cost of cache
 *  maintenance on PTE updates.
+*
+* IO_PGTABLE_QUIRK_QCOM_SYS_CACHE: Force using outer system cache
+*  for 

[PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-20 Thread Vivek Gautam
Qualcomm SoCs have an additional level of cache called the
System cache, a.k.a. Last level cache (LLC). This cache sits right
before the DDR, and is tightly coupled with the memory controller.
Clients using this cache request their slices from the
system cache, activate them, and can then start using them.
For clients behind an SMMU to start using the system cache for
buffers and related page tables [1], memory attributes need to be
set accordingly. This series adds the required support.

This change is a realisation of the following changes from downstream msm-4.9:
iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2]
iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3]

Changes since v2:
 - Split the patches into io-pgtable-arm driver and arm-smmu driver.
 - Converted smmu domain attributes to a bitmap, so multiple attributes
   can be managed easily.
 - With addition of non-coherent page table mapping support [4], this
   patch series now aligns with the understanding of upgrading the
   non-coherent devices to use some level of outer cache.
 - Updated the macros and comments to reflect the use of QCOM_SYS_CACHE.
 - QCOM_SYS_CACHE can still be used at stage 2, so that doesn't depend on
   stage-1 mapping.
 - Added change to disable the attribute from arm_smmu_domain_set_attr()
   when needed.
 - Removed the page protection controls for QCOM_SYS_CACHE at the DMA API
   level.

Goes on top of the non-coherent page tables support patch series [4]

[1] https://patchwork.kernel.org/patch/10302791/
[2] 
https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9=bf762276796e79ca90014992f4d9da5593fa7d51
[3] 
https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9=d4c72c413ea27c43f60825193d4de9cb8ffd9602
[4] https://lore.kernel.org/patchwork/cover/1032938/

Vivek Gautam (3):
  iommu/arm-smmu: Move to bitmap for arm_smmu_domain attributes
  iommu/io-pgtable-arm: Add support to use system cache
  iommu/arm-smmu: Add support to use system cache

 drivers/iommu/arm-smmu.c   | 28 
 drivers/iommu/io-pgtable-arm.c | 15 +--
 drivers/iommu/io-pgtable.h |  4 
 include/linux/iommu.h  |  2 ++
 4 files changed, 43 insertions(+), 6 deletions(-)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH 1/3] iommu/arm-smmu: Move to bitmap for arm_smmu_domain attributes

2019-01-20 Thread Vivek Gautam
A number of arm_smmu_domain's attributes can be assigned based
on the iommu domain's attributes. These local attributes are better
managed by a bitmap.
So remove the boolean flags and move to a 32-bit bitmap, and enable
each bit separately.
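
To illustrate the bool-to-bitmap move described above, here is a minimal standalone sketch. Unlike the set_attr hunk in this patch, which only ORs the bit in, the sketch uses the caller's value to either set or clear the bit. The demo_* names and the local DEMO_BIT() macro are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_BIT(n)                  (1u << (n))
/* Stands in for ARM_SMMU_DOMAIN_ATTR_NON_STRICT from the patch. */
#define DEMO_DOMAIN_ATTR_NON_STRICT  DEMO_BIT(0)

/* Hypothetical stand-in for the attr field added to arm_smmu_domain. */
struct demo_domain {
	unsigned int attr;
};

/* Set the bit when enable is non-zero, clear it otherwise. */
static void demo_domain_set_attr(struct demo_domain *d, unsigned int bit,
				 int enable)
{
	assert(d);
	if (enable)
		d->attr |= bit;
	else
		d->attr &= ~bit;
}

/* Normalise the masked value to a strict true/false, like the !! in the patch. */
static bool demo_domain_get_attr(const struct demo_domain *d, unsigned int bit)
{
	assert(d);
	return !!(d->attr & bit);
}
```

A further attribute would then take DEMO_BIT(1) and so on, without adding another field to the structure.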

Signed-off-by: Vivek Gautam 
---
 drivers/iommu/arm-smmu.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 7ebbcf1b2eb3..52b300dfc096 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -257,10 +257,11 @@ struct arm_smmu_domain {
        const struct iommu_gather_ops   *tlb_ops;
        struct arm_smmu_cfg             cfg;
        enum arm_smmu_domain_stage      stage;
-       bool                            non_strict;
        struct mutex                    init_mutex; /* Protects smmu pointer */
        spinlock_t                      cb_lock; /* Serialises ATS1* ops and TLB syncs */
        struct iommu_domain             domain;
+#define ARM_SMMU_DOMAIN_ATTR_NON_STRICT        BIT(0)
+       unsigned int                    attr;
 };
 
 struct arm_smmu_option_prop {
@@ -901,7 +902,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA;
 
-   if (smmu_domain->non_strict)
+   if (smmu_domain->attr & ARM_SMMU_DOMAIN_ATTR_NON_STRICT)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
/* Non coherent page table mappings only for Stage-1 */
@@ -1598,7 +1599,8 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
case IOMMU_DOMAIN_DMA:
switch (attr) {
case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = smmu_domain->non_strict;
+   *(int *)data = !!(smmu_domain->attr &
+ ARM_SMMU_DOMAIN_ATTR_NON_STRICT);
return 0;
default:
return -ENODEV;
@@ -1638,7 +1640,7 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
case IOMMU_DOMAIN_DMA:
switch (attr) {
case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   smmu_domain->non_strict = *(int *)data;
+   smmu_domain->attr |= ARM_SMMU_DOMAIN_ATTR_NON_STRICT;
break;
default:
ret = -ENODEV;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
