Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache
On Mon, 21 Jan 2019 at 06:54, Vivek Gautam wrote:
>
> Qualcomm SoCs have an additional level of cache called as
> System cache, aka. Last level cache (LLC). This cache sits right
> before the DDR, and is tightly coupled with the memory controller.
> The clients using this cache request their slices from this
> system cache, make it active, and can then start using it.
> For these clients with smmu, to start using the system cache for
> buffers and, related page tables [1], memory attributes need to be
> set accordingly. This series add the required support.
>

Does this actually improve performance on reads from a device? The
non-cache coherent DMA routines perform an unconditional D-cache
invalidate by VA to the PoC before reading from the buffers filled by
the device, and I would expect the PoC to be defined as lying beyond
the LLC to still guarantee the architected behavior.

> This change is a realisation of following changes from downstream msm-4.9:
> iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2]
> iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3]
>
> Changes since v2:
>  - Split the patches into io-pgtable-arm driver and arm-smmu driver.
>  - Converted smmu domain attributes to a bitmap, so multiple attributes
>    can be managed easily.
>  - With addition of non-coherent page table mapping support [4], this
>    patch series now aligns with the understanding of upgrading the
>    non-coherent devices to use some level of outer cache.
>  - Updated the macros and comments to reflect the use of QCOM_SYS_CACHE.
>  - QCOM_SYS_CACHE can still be used at stage 2, so that doens't depend on
>    stage-1 mapping.
>  - Added change to disable the attribute from arm_smmu_domain_set_attr()
>    when needed.
>  - Removed the page protection controls for QCOM_SYS_CACHE at the DMA API
>    level.
>
> Goes on top of the non-coherent page tables support patch series [4]
>
> [1] https://patchwork.kernel.org/patch/10302791/
> [2] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9=bf762276796e79ca90014992f4d9da5593fa7d51
> [3] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9=d4c72c413ea27c43f60825193d4de9cb8ffd9602
> [4] https://lore.kernel.org/patchwork/cover/1032938/
>
> Vivek Gautam (3):
>   iommu/arm-smmu: Move to bitmap for arm_smmu_domain atrributes
>   iommu/io-pgtable-arm: Add support to use system cache
>   iommu/arm-smmu: Add support to use system cache
>
>  drivers/iommu/arm-smmu.c       | 28
>  drivers/iommu/io-pgtable-arm.c | 15 +--
>  drivers/iommu/io-pgtable.h     |  4
>  include/linux/iommu.h          |  2 ++
>  4 files changed, 43 insertions(+), 6 deletions(-)
>
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
Hi Will,

On Sun, Jan 20, 2019 at 5:31 AM Will Deacon wrote:
>
> On Thu, Jan 17, 2019 at 02:57:18PM +0530, Vivek Gautam wrote:
> > Adding a device tree option for arm smmu to enable non-cacheable
> > memory for page tables.
> > We already enable a smmu feature for coherent walk based on
> > whether the smmu device is dma-coherent or not. Have an option
> > to enable non-cacheable page table memory to force set it for
> > particular smmu devices.
>
> Hmm, I must be missing something here. What is the difference between this
> new property, and simply omitting dma-coherent on the SMMU?

So, this is what I understood from the email thread for Last level
cache support: Robin pointed out that we may need to add support for
setting non-cacheable mappings in the TCR.
Currently, we don't do that for SMMUs that omit dma-coherent.
We rely on the interconnect to handle the configuration set in the TCR,
and let the interconnect ignore the cacheability if it can't support it.
Moreover, Robin suggested that we should take care of SMMUs for which
removing snoop latency on walks, by making the mappings non-cacheable,
outweighs the cost of cache maintenance on PTE updates.

So, this change adds another property to request these non-cacheable
mappings explicitly.
As I pointed out, omitting 'dma-coherent', and the corresponding
IO_PGTABLE_QUIRK_NO_DMA, does take care of a few things.
Should we handle the TCR settings too with this quirk?

Regards
Vivek

>
> Will

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 2/3] iommu/io-pgtable-arm: Add support to use system cache
A few Qualcomm platforms, such as sdm845, have an additional outer
cache called the system cache, a.k.a. last level cache (LLC), that
allows non-coherent devices to upgrade to using caching.
There is a fundamental assumption that non-coherent devices can't
access caches. This change adds an exception where they *can* use
some level of cache despite still being non-coherent overall.
Coherent devices that use cacheable memory, and the CPU, make use of
this system cache by default.

Looking at memory types, we have the following:
a) Normal uncached: MAIR 0x44, inner non-cacheable,
   outer non-cacheable;
b) Normal cached: MAIR 0xff, inner read write-back non-transient,
   outer read write-back non-transient;
   attribute setting for coherent I/O devices.
and, for non-coherent I/O devices that can allocate in the system
cache, another type gets added:
c) Normal sys-cached: MAIR 0xf4, inner non-cacheable,
   outer read write-back non-transient

Coherent I/O devices use the system cache by marking the memory as
normal cached.
Non-coherent I/O devices should mark the memory as normal
sys-cached in page tables to use the system cache.

Signed-off-by: Vivek Gautam
---
 drivers/iommu/io-pgtable-arm.c | 15 +--
 drivers/iommu/io-pgtable.h     |  4
 include/linux/iommu.h          |  2 ++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index c76919c30f1a..0e55772702da 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -168,10 +168,12 @@
 #define ARM_LPAE_MAIR_ATTR_MASK		0xff
 #define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
 #define ARM_LPAE_MAIR_ATTR_NC		0x44
+#define ARM_LPAE_MAIR_ATTR_QCOM_SYS_CACHE	0xf4
 #define ARM_LPAE_MAIR_ATTR_WBRWA	0xff
 #define ARM_LPAE_MAIR_ATTR_IDX_NC	0
 #define ARM_LPAE_MAIR_ATTR_IDX_CACHE	1
 #define ARM_LPAE_MAIR_ATTR_IDX_DEV	2
+#define ARM_LPAE_MAIR_ATTR_IDX_QCOM_SYS_CACHE	3
 
 /* IOPTE accessors */
 #define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d))
@@ -443,6 +445,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 		else if (prot & IOMMU_CACHE)
 			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
 				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+		else if (prot & IOMMU_QCOM_SYS_CACHE)
+			pte |= (ARM_LPAE_MAIR_ATTR_IDX_QCOM_SYS_CACHE
+				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
 	} else {
 		pte = ARM_LPAE_PTE_HAP_FAULT;
 		if (prot & IOMMU_READ)
@@ -781,7 +786,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 
 	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA |
 			    IO_PGTABLE_QUIRK_NON_STRICT |
-			    IO_PGTABLE_QUIRK_NON_COHERENT))
+			    IO_PGTABLE_QUIRK_NON_COHERENT |
+			    IO_PGTABLE_QUIRK_QCOM_SYS_CACHE))
 		return NULL;
 
 	data = arm_lpae_alloc_pgtable(cfg);
@@ -794,6 +800,9 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 	if (cfg->quirks & IO_PGTABLE_QUIRK_NON_COHERENT)
 		reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT |
 		       ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT;
+	else if (cfg->quirks & IO_PGTABLE_QUIRK_QCOM_SYS_CACHE)
+		reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT |
+		       ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT;
 	else
 		reg |= ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT |
 		       ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT;
@@ -848,7 +857,9 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 	      (ARM_LPAE_MAIR_ATTR_WBRWA
 	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) |
 	      (ARM_LPAE_MAIR_ATTR_DEVICE
-	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV));
+	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV)) |
+	      (ARM_LPAE_MAIR_ATTR_QCOM_SYS_CACHE
+	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_QCOM_SYS_CACHE));
 
 	cfg->arm_lpae_s1_cfg.mair[0] = reg;
 	cfg->arm_lpae_s1_cfg.mair[1] = 0;
diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
index 46604cf7b017..fb237e8aa9f1 100644
--- a/drivers/iommu/io-pgtable.h
+++ b/drivers/iommu/io-pgtable.h
@@ -80,6 +80,9 @@ struct io_pgtable_cfg {
 	 *	pagetables even on a coherent SMMU for cases where reducing
 	 *	snoop traffic/latency on walks outweighs the cost of cache
 	 *	maintenance on PTE updates.
+	 *
+	 * IO_PGTABLE_QUIRK_QCOM_SYS_CACHE: Force using outer system cache
+	 *	for
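
For readers following the register arithmetic in this patch, here is a
minimal standalone sketch (not the driver code) of how the four MAIR
attribute bytes pack into one 32-bit value, and how the walk
cacheability fields in the TCR are chosen from the quirks. The MAIR
attribute values and indices are taken from the diff. Everything else is
an assumption made for illustration: the 8-bits-per-index shift, the TCR
field positions and encodings, the QUIRK_* bit values, and the sketch_*
function names are all hypothetical.

```c
#include <stdint.h>

/* MAIR attribute bytes and indices, as listed in the diff. */
#define MAIR_ATTR_DEVICE		0x04u
#define MAIR_ATTR_NC			0x44u
#define MAIR_ATTR_QCOM_SYS_CACHE	0xf4u	/* outer WB (0xf), inner NC (0x4) */
#define MAIR_ATTR_WBRWA			0xffu

#define MAIR_IDX_NC			0
#define MAIR_IDX_CACHE			1
#define MAIR_IDX_DEV			2
#define MAIR_IDX_QCOM_SYS_CACHE		3

/* Assumption: each index selects one 8-bit attribute field. */
#define MAIR_ATTR_SHIFT(n)		((n) << 3)

/* Assumption: TCR region encodings and field positions. */
#define TCR_RGN_NC			0u
#define TCR_RGN_WBWA			1u
#define TCR_IRGN0_SHIFT			8
#define TCR_ORGN0_SHIFT			10

/* Hypothetical quirk bits, used only by this sketch. */
#define QUIRK_NON_COHERENT		(1u << 0)
#define QUIRK_QCOM_SYS_CACHE		(1u << 1)

/* Pack the four attribute bytes into the low MAIR word. */
static uint32_t sketch_mair0(void)
{
	return (MAIR_ATTR_NC << MAIR_ATTR_SHIFT(MAIR_IDX_NC)) |
	       (MAIR_ATTR_WBRWA << MAIR_ATTR_SHIFT(MAIR_IDX_CACHE)) |
	       (MAIR_ATTR_DEVICE << MAIR_ATTR_SHIFT(MAIR_IDX_DEV)) |
	       (MAIR_ATTR_QCOM_SYS_CACHE
		<< MAIR_ATTR_SHIFT(MAIR_IDX_QCOM_SYS_CACHE));
}

/*
 * Pick inner/outer cacheability for page table walks, in the same
 * priority order as the patch: non-coherent first, then system cache,
 * then fully cacheable.
 */
static uint32_t sketch_tcr_walk_attrs(uint32_t quirks)
{
	if (quirks & QUIRK_NON_COHERENT)
		return (TCR_RGN_NC << TCR_IRGN0_SHIFT) |
		       (TCR_RGN_NC << TCR_ORGN0_SHIFT);
	if (quirks & QUIRK_QCOM_SYS_CACHE)
		return (TCR_RGN_NC << TCR_IRGN0_SHIFT) |
		       (TCR_RGN_WBWA << TCR_ORGN0_SHIFT);
	return (TCR_RGN_WBWA << TCR_IRGN0_SHIFT) |
	       (TCR_RGN_WBWA << TCR_ORGN0_SHIFT);
}
```

Under these assumptions the system cache case differs from the default
only in the inner field: walks stay inner non-cacheable while the outer
attribute remains write-back, which mirrors the 0xf4 MAIR encoding used
for buffers.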
[PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache
Qualcomm SoCs have an additional level of cache called the system
cache, a.k.a. last level cache (LLC). This cache sits right before the
DDR, and is tightly coupled with the memory controller.
The clients using this cache request their slices from the system
cache, make them active, and can then start using them.
For clients behind an SMMU to start using the system cache for
buffers and the related page tables [1], memory attributes need to be
set accordingly. This series adds the required support.

This change is a realisation of the following changes from downstream
msm-4.9:
iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2]
iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3]

Changes since v2:
 - Split the patches into io-pgtable-arm driver and arm-smmu driver.
 - Converted smmu domain attributes to a bitmap, so multiple attributes
   can be managed easily.
 - With the addition of non-coherent page table mapping support [4], this
   patch series now aligns with the understanding of upgrading
   non-coherent devices to use some level of outer cache.
 - Updated the macros and comments to reflect the use of QCOM_SYS_CACHE.
 - QCOM_SYS_CACHE can still be used at stage 2, so it doesn't depend on
   the stage-1 mapping.
 - Added a change to disable the attribute from arm_smmu_domain_set_attr()
   when needed.
 - Removed the page protection controls for QCOM_SYS_CACHE at the DMA API
   level.

Goes on top of the non-coherent page tables support patch series [4]

[1] https://patchwork.kernel.org/patch/10302791/
[2] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9=bf762276796e79ca90014992f4d9da5593fa7d51
[3] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9=d4c72c413ea27c43f60825193d4de9cb8ffd9602
[4] https://lore.kernel.org/patchwork/cover/1032938/

Vivek Gautam (3):
  iommu/arm-smmu: Move to bitmap for arm_smmu_domain atrributes
  iommu/io-pgtable-arm: Add support to use system cache
  iommu/arm-smmu: Add support to use system cache

 drivers/iommu/arm-smmu.c       | 28
 drivers/iommu/io-pgtable-arm.c | 15 +--
 drivers/iommu/io-pgtable.h     |  4
 include/linux/iommu.h          |  2 ++
 4 files changed, 43 insertions(+), 6 deletions(-)

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 1/3] iommu/arm-smmu: Move to bitmap for arm_smmu_domain atrributes
A number of arm_smmu_domain's attributes can be assigned based on the
iommu domain's attributes. These local attributes are better managed
with a bitmap.
So remove the boolean flags and move to a 32-bit bitmap, and enable
each bit separately.

Signed-off-by: Vivek Gautam
---
 drivers/iommu/arm-smmu.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 7ebbcf1b2eb3..52b300dfc096 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -257,10 +257,11 @@ struct arm_smmu_domain {
 	const struct iommu_gather_ops	*tlb_ops;
 	struct arm_smmu_cfg		cfg;
 	enum arm_smmu_domain_stage	stage;
-	bool				non_strict;
 	struct mutex			init_mutex; /* Protects smmu pointer */
 	spinlock_t			cb_lock; /* Serialises ATS1* ops and TLB syncs */
 	struct iommu_domain		domain;
+#define ARM_SMMU_DOMAIN_ATTR_NON_STRICT	BIT(0)
+	unsigned int			attr;
 };
 
 struct arm_smmu_option_prop {
@@ -901,7 +902,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
 		pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA;
 
-	if (smmu_domain->non_strict)
+	if (smmu_domain->attr & ARM_SMMU_DOMAIN_ATTR_NON_STRICT)
 		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
 	/* Non coherent page table mappings only for Stage-1 */
@@ -1598,7 +1599,8 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 	case IOMMU_DOMAIN_DMA:
 		switch (attr) {
 		case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-			*(int *)data = smmu_domain->non_strict;
+			*(int *)data = !!(smmu_domain->attr &
+					  ARM_SMMU_DOMAIN_ATTR_NON_STRICT);
 			return 0;
 		default:
 			return -ENODEV;
@@ -1638,7 +1640,7 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
 	case IOMMU_DOMAIN_DMA:
 		switch (attr) {
 		case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-			smmu_domain->non_strict = *(int *)data;
+			smmu_domain->attr |= ARM_SMMU_DOMAIN_ATTR_NON_STRICT;
 			break;
 		default:
 			ret = -ENODEV;
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
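
As an illustration of the bool-to-bitmap conversion described in this
patch, the following is a small standalone sketch, not the driver code.
The set_attr hunk in the posted diff only ORs the flag in, while the
old code stored the caller's value; the helper below shows one possible
way to honour both a non-zero and a zero request on a bitmap. The
struct, the BIT() definition, and the demo_* names are hypothetical and
exist only for this example.

```c
#include <stdbool.h>

/* Local stand-in for the kernel's BIT() helper (assumption). */
#define BIT(n)				(1u << (n))

/* Flag name modeled on the one added by the patch. */
#define DEMO_DOMAIN_ATTR_NON_STRICT	BIT(0)

/* Hypothetical, trimmed-down domain holding only the bitmap. */
struct demo_domain {
	unsigned int attr;
};

/* Set the flag when enable is true, clear it otherwise. */
static void demo_domain_set_attr(struct demo_domain *d, unsigned int flag,
				 bool enable)
{
	if (enable)
		d->attr |= flag;
	else
		d->attr &= ~flag;
}

/* Report the flag as 0 or 1, like the !! in the get_attr hunk. */
static int demo_domain_get_attr(const struct demo_domain *d,
				unsigned int flag)
{
	return !!(d->attr & flag);
}
```

A caller would pass the value it received from the attribute interface
as the enable argument, so further flags can share the same helper
without adding one bool per attribute.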