Re: [PATCH 21/21] dma-mapping: replace custom code with generic implementation
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> +	dma_cache_wback(paddr, size);
> +}
>
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_inv(paddr, size);
>  }
> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
> +	dma_cache_wback_inv(paddr, size);
> +}

These are the only callers of each of the three functions involved.
So I'd rather rename the low-level symbols (and drop the pointless
exports for two of them) than add these wrappers.

The same is probably true for many other architectures.

> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> +	return false;
> +}
>
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> +	return true;
>  }

Is there a way to cut down on this boilerplate code by just having
sane defaults, and Kconfig options to override them if they are not
runtime decisions?

> +#include

I can't really say I like the #include version here despite your
rationale in the commit log. I can probably live with it if you think
it is absolutely worth it, but I'm really not in favor of it.

> +config ARCH_DMA_MARK_DCACHE_CLEAN
> +	def_bool y

What do we need this symbol for? Unless I'm missing something it is
always enabled for arm32, and only used in arm32 code.

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc
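The "sane defaults plus Kconfig overrides" suggestion in the review above could look roughly like the following. This is only a sketch: the two CONFIG_ARCH_DMA_* symbol names are invented for illustration and do not appear in the series, and the defaults are simply the values from the quoted hunk.

```c
#include <stdbool.h>

/*
 * Hypothetical Kconfig symbols (names invented for this sketch).
 * An architecture that needs non-default behavior would select one;
 * every other architecture gets the default without any boilerplate.
 */
#ifdef CONFIG_ARCH_DMA_CLEAN_BEFORE_FROMDEVICE	/* assumed symbol */
#define DMA_CLEAN_BEFORE_FROMDEVICE	1
#else
#define DMA_CLEAN_BEFORE_FROMDEVICE	0	/* default: invalidate */
#endif

#ifdef CONFIG_ARCH_DMA_NO_POST_DMA_FLUSH	/* assumed symbol */
#define DMA_NEEDS_POST_DMA_FLUSH	0
#else
#define DMA_NEEDS_POST_DMA_FLUSH	1	/* default: CPU may prefetch */
#endif

/* Generic helpers: no per-architecture copy is needed any more. */
static inline bool arch_sync_dma_clean_before_fromdevice(void)
{
	return DMA_CLEAN_BEFORE_FROMDEVICE;
}

static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
{
	return DMA_NEEDS_POST_DMA_FLUSH;
}
```

An architecture where the answer is a runtime decision would still have to provide its own helper, which is the case the review explicitly leaves out.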
Re: [PATCH 02/21] xtensa: dma-mapping: use normal cache invalidation rules
On Mon, Mar 27, 2023 at 5:14 AM Arnd Bergmann wrote:
>
> From: Arnd Bergmann
>
> xtensa is one of the platforms that has both write-back and write-through
> caches, and needs to account for both in its DMA mapping operations.
>
> It does this through a set of operations that is different from any
> other architecture. This is not a problem by itself, but it makes it
> rather hard to figure out whether this is correct or not, and to unify
> this implementation with the others.
>
> Change the semantics to the usual ones for non-speculating CPUs:
>
>  - On DMA_TO_DEVICE, call __flush_dcache_range() to perform the
>    writeback even on writethrough caches, where this is a nop.
>
>  - On DMA_FROM_DEVICE, invalidate the mapping before the DMA rather
>    than afterwards.
>
>  - On DMA_BIDIRECTIONAL, combine the pre-writeback with the
>    post-invalidate into a call to __flush_invalidate_dcache_range()
>    that turns into a simple invalidate on writeback caches.
>
> Signed-off-by: Arnd Bergmann
> ---
>  arch/xtensa/Kconfig                  |  1 -
>  arch/xtensa/include/asm/cacheflush.h |  6 +++---
>  arch/xtensa/kernel/pci-dma.c         | 29 +---
>  3 files changed, 8 insertions(+), 28 deletions(-)

Reviewed-by: Max Filippov

--
Thanks.
-- Max
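The three rules in the quoted commit message can be restated as a small decision table. The sketch below is a userspace model, not the xtensa code: the enum names and both function names are made up for illustration, and it only records which cache operation is chosen rather than performing one.

```c
/* Model of the "non-speculating CPU" rules; all names are illustrative. */
enum model_dma_dir { MODEL_BIDIRECTIONAL, MODEL_TO_DEVICE, MODEL_FROM_DEVICE };

enum model_cache_op {
	MODEL_OP_NONE,
	MODEL_OP_WRITEBACK,		/* __flush_dcache_range() */
	MODEL_OP_INVALIDATE,
	MODEL_OP_WRITEBACK_INVALIDATE,	/* __flush_invalidate_dcache_range() */
};

/* Operation performed before the device touches the buffer. */
static enum model_cache_op model_op_before_dma(enum model_dma_dir dir)
{
	switch (dir) {
	case MODEL_TO_DEVICE:
		return MODEL_OP_WRITEBACK;
	case MODEL_FROM_DEVICE:
		return MODEL_OP_INVALIDATE;
	case MODEL_BIDIRECTIONAL:
		return MODEL_OP_WRITEBACK_INVALIDATE;
	}
	return MODEL_OP_NONE;
}

/*
 * Without speculative prefetching nothing can repopulate the cache
 * during the transfer, so no work is left for after the DMA.
 */
static enum model_cache_op model_op_after_dma(enum model_dma_dir dir)
{
	(void)dir;
	return MODEL_OP_NONE;
}
```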
Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
On Mon, Mar 27, 2023 at 02:13:16PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> The arm version of the arch_sync_dma_for_cpu() function annotates pages as
> PG_dcache_clean after a DMA, but no other architecture does this here.

... because this is an arm32 specific feature. Generically, it's
PG_arch_1, which is a page flag free for architecture use. On arm32
we decided to use this to mark whether we can skip dcache writebacks
when establishing a PTE - and thus it was decided to call it
PG_dcache_clean to reflect how arm32 decided to use that bit.

This isn't just a DMA thing, there are other places that we update
the bit, such as flush_dcache_page() and copy_user_highpage().

So thinking that the arm32 PG_dcache_clean is something for DMA is
actually wrong.

Other architectures are free to do their own other optimisations
using that bit, and their implementations may be DMA-centric.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
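To illustrate the point about skipping dcache writebacks when a PTE is established, here is a toy userspace model of a per-page "clean" bit. It is an assumption-laden sketch, not arm32 code: struct model_page, the flag constant and all three functions are hypothetical, and the "writeback" is just a counter.

```c
#include <stdbool.h>

#define MODEL_PG_DCACHE_CLEAN	(1UL << 0)	/* stands in for PG_arch_1 */

struct model_page {
	unsigned long flags;
	unsigned int writebacks;	/* how often the page was written back */
};

/* Mapping the page: write back only if the bit says it may be dirty. */
static void model_establish_pte(struct model_page *page)
{
	if (!(page->flags & MODEL_PG_DCACHE_CLEAN)) {
		page->writebacks++;
		page->flags |= MODEL_PG_DCACHE_CLEAN;
	}
}

/* Any path that may dirty the page (not only DMA) clears the bit. */
static void model_mark_possibly_dirty(struct model_page *page)
{
	page->flags &= ~MODEL_PG_DCACHE_CLEAN;
}

/* Map a page twice, optionally dirtying it in between. */
static unsigned int model_writebacks_after_two_mappings(bool dirtied_between)
{
	struct model_page page = { 0, 0 };

	model_establish_pte(&page);
	if (dirtied_between)
		model_mark_possibly_dirty(&page);
	model_establish_pte(&page);

	return page.writebacks;
}
```

The second mapping costs nothing as long as the bit is still set, which is why the flag matters outside the DMA paths as well.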
Re: [PATCH 10/21] csky: dma-mapping: skip invalidating before DMA from device
On Mon, Mar 27, 2023 at 8:15 PM Arnd Bergmann wrote:
>
> From: Arnd Bergmann
>
> csky is the only architecture that does a full flush for the
> dma_sync_*_for_device(..., DMA_FROM_DEVICE) operation. The requirement
> is only to make sure there are no dirty cache lines for the buffer,
> which can be either done through an invalidate operation (as on most
> architectures including arm32, mips and arc), or a writeback (as on
> arm64 and riscv). The cache also has to be invalidated eventually but
> csky already does that after the transfer.
>
> Use a 'clean' operation here for consistency with arm64 and riscv.
>
> Signed-off-by: Arnd Bergmann
> ---
>  arch/csky/mm/dma-mapping.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> index 82447029feb4..c90f912e2822 100644
> --- a/arch/csky/mm/dma-mapping.c
> +++ b/arch/csky/mm/dma-mapping.c
> @@ -60,11 +60,9 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
>  {
>  	switch (dir) {
>  	case DMA_TO_DEVICE:
> -		cache_op(paddr, size, dma_wb_range);
> -		break;
>  	case DMA_FROM_DEVICE:
>  	case DMA_BIDIRECTIONAL:
> -		cache_op(paddr, size, dma_wbinv_range);
> +		cache_op(paddr, size, dma_wb_range);

Reviewed-by: Guo Ren

>  		break;
>  	default:
>  		BUG();
> --
> 2.39.2
>

--
Best Regards
 Guo Ren
Re: [PATCH 16/21] ARM: dma-mapping: bring back dmac_{clean,inv}_range
On Mon, Mar 27, 2023 at 02:13:12PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> These were removed ages ago in commit 702b94bff3c5 ("ARM: dma-mapping:
> remove dmac_clean_range and dmac_inv_range") in an effort to sanitize
> the dma-mapping API.

Really no, please no. Let's not go back to this, let's keep the buffer
ownership model that came at around that time.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH 06/21] powerpc: dma-mapping: minimize for_cpu flushing
On Mon, Mar 27, 2023, at 14:56, Christophe Leroy wrote:
> On 27/03/2023 at 14:13, Arnd Bergmann wrote:
>> From: Arnd Bergmann
>>
>> The powerpc dma_sync_*_for_cpu() variants do more flushes than on other
>> architectures. Reduce it to what everyone else does:
>>
>>   - No flush is needed after data has been sent to a device
>>
>>   - When data has been received from a device, the cache only needs to
>>     be invalidated to clear out cache lines that were speculatively
>>     prefetched.
>>
>> In particular, the second flushing of partial cache lines of bidirectional
>> buffers is actively harmful -- if a single cache line is written by both
>> the CPU and the device, flushing it again does not maintain coherency
>> but instead overwrites the data that was just received from the device.
>
> Hum. Who is right?
>
> That behaviour was introduced by commit 03d70617b8a7 ("powerpc: Prevent
> memory corruption due to cache invalidation of unaligned DMA buffer")
>
> I think your commit log should explain why that commit was wrong, and
> maybe say that your patch is a revert of that commit?

Ok, I'll try to explain this better. To clarify here: the __dma_sync()
function in commit 03d70617b8a7 is used both before and after a DMA,
but my patch 05/21 splits this in two, and patch 06/21 only changes
the part that gets called after the DMA-from-device but leaves
the part before DMA-from-device unchanged, which Andrew's patch
addressed.

As I mentioned in the cover letter, it is still unclear whether
we want to consider this the expected behavior as the documentation
seems unclear, but my series does not attempt to answer that question.

      Arnd
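The "actively harmful" case from the quoted commit message can be reproduced with a toy model of one cache line shared by the CPU and a non-coherent device. Everything below is a hypothetical userspace sketch (one 8-byte line, invented function names), not powerpc code; it only shows why a writeback after the transfer destroys the device's data while an invalidate keeps it.

```c
#include <stdbool.h>
#include <string.h>

#define MODEL_LINE_BYTES 8

struct line_model {
	unsigned char ram[MODEL_LINE_BYTES];	/* what the device sees */
	unsigned char cache[MODEL_LINE_BYTES];	/* CPU's copy of the line */
	bool valid;
	bool dirty;
};

static void model_fill(struct line_model *m)
{
	if (!m->valid) {
		memcpy(m->cache, m->ram, MODEL_LINE_BYTES);
		m->valid = true;
	}
}

static void model_cpu_write(struct line_model *m, int idx, unsigned char v)
{
	model_fill(m);
	m->cache[idx] = v;
	m->dirty = true;
}

static unsigned char model_cpu_read(struct line_model *m, int idx)
{
	model_fill(m);
	return m->cache[idx];
}

/* Non-coherent DMA: the device writes RAM and bypasses the cache. */
static void model_device_write(struct line_model *m, int idx, unsigned char v)
{
	m->ram[idx] = v;
}

/* "Flush": write the whole dirty line back, then drop it. */
static void model_writeback_invalidate(struct line_model *m)
{
	if (m->valid && m->dirty)
		memcpy(m->ram, m->cache, MODEL_LINE_BYTES);
	m->valid = m->dirty = false;
}

static void model_invalidate(struct line_model *m)
{
	m->valid = m->dirty = false;
}

/* Returns the device-written byte as the CPU sees it after the sync. */
static unsigned char model_device_byte_seen_by_cpu(bool flush_after_dma)
{
	struct line_model m;

	memset(&m, 0, sizeof(m));
	model_cpu_write(&m, 0, 0xC0);		/* CPU dirties byte 0 */
	model_device_write(&m, 4, 0xD0);	/* device writes byte 4 */

	if (flush_after_dma)
		model_writeback_invalidate(&m);	/* stale line clobbers RAM */
	else
		model_invalidate(&m);

	return model_cpu_read(&m, 4);
}
```

Note that the invalidate variant loses the CPU's own write to byte 0 in this model, which is the "before DMA" half of the problem that the reply says is left unchanged by patch 06/21.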
Re: [PATCH 06/21] powerpc: dma-mapping: minimize for_cpu flushing
On 27/03/2023 at 14:13, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> The powerpc dma_sync_*_for_cpu() variants do more flushes than on other
> architectures. Reduce it to what everyone else does:
>
>   - No flush is needed after data has been sent to a device
>
>   - When data has been received from a device, the cache only needs to
>     be invalidated to clear out cache lines that were speculatively
>     prefetched.
>
> In particular, the second flushing of partial cache lines of bidirectional
> buffers is actively harmful -- if a single cache line is written by both
> the CPU and the device, flushing it again does not maintain coherency
> but instead overwrites the data that was just received from the device.

Hum. Who is right?

That behaviour was introduced by commit 03d70617b8a7 ("powerpc: Prevent
memory corruption due to cache invalidation of unaligned DMA buffer")

I think your commit log should explain why that commit was wrong, and
maybe say that your patch is a revert of that commit?

Christophe

>
> Signed-off-by: Arnd Bergmann
> ---
>  arch/powerpc/mm/dma-noncoherent.c | 18 --
>  1 file changed, 4 insertions(+), 14 deletions(-)
>
> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
> index f10869d27de5..e108cacf877f 100644
> --- a/arch/powerpc/mm/dma-noncoherent.c
> +++ b/arch/powerpc/mm/dma-noncoherent.c
> @@ -132,21 +132,11 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
>  	switch (direction) {
>  	case DMA_NONE:
>  		BUG();
> -	case DMA_FROM_DEVICE:
> -		/*
> -		 * invalidate only when cache-line aligned otherwise there is
> -		 * the potential for discarding uncommitted data from the cache
> -		 */
> -		if ((start | end) & (L1_CACHE_BYTES - 1))
> -			__dma_phys_op(start, end, DMA_CACHE_FLUSH);
> -		else
> -			__dma_phys_op(start, end, DMA_CACHE_INVAL);
> -		break;
> -	case DMA_TO_DEVICE:		/* writeback only */
> -		__dma_phys_op(start, end, DMA_CACHE_CLEAN);
> +	case DMA_TO_DEVICE:
>  		break;
> -	case DMA_BIDIRECTIONAL:	/* writeback and invalidate */
> -		__dma_phys_op(start, end, DMA_CACHE_FLUSH);
> +	case DMA_FROM_DEVICE:
> +	case DMA_BIDIRECTIONAL:
> +		__dma_phys_op(start, end, DMA_CACHE_INVAL);
>  		break;
>  	}
>  }
Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
On 2023-03-27 13:13, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> The arm version of the arch_sync_dma_for_cpu() function annotates pages as
> PG_dcache_clean after a DMA, but no other architecture does this here. On
> ia64, the same thing is done in arch_sync_dma_for_cpu(), so it makes sense
> to use the same hook in order to have identical arch_sync_dma_for_cpu()
> semantics as all other architectures.
>
> Splitting this out has multiple effects:
>
>  - for dma-direct, this now gets called after arch_sync_dma_for_cpu()
>    for DMA_FROM_DEVICE mappings, but not for DMA_BIDIRECTIONAL. While
>    it would not be harmful to keep doing it for bidirectional mappings,
>    those are apparently not used in any callers that care about the flag.
>
>  - Since arm has its own dma-iommu abstraction, this now also needs to
>    call the same function, so the calls are added there to mirror the
>    dma-direct version.
>
>  - Like dma-direct, the dma-iommu version now marks the dcache clean
>    for both coherent and noncoherent devices after a DMA, but it only
>    does this for DMA_FROM_DEVICE, not DMA_BIDIRECTIONAL.
>
> [ HELP NEEDED: can anyone confirm that it is a correct assumption
>   on arm that a cache-coherent device writing to a page always results
>   in it being in a PG_dcache_clean state like on ia64, or can a device
>   write directly into the dcache?]

In AMBA at least, if a snooping write hits in a cache then the data is
most likely going to get routed directly into that cache. If it has
write-back write-allocate attributes it could also land in any cache
along its normal path to RAM; it wouldn't have to go all the way.

Hence all the fun we have where treating a coherent device as
non-coherent can still be almost as broken as the other way round :)

Cheers,
Robin.
[PATCH 21/21] dma-mapping: replace custom code with generic implementation
From: Arnd Bergmann

Now that all of these have consistent behavior, replace them with
a single shared implementation of arch_sync_dma_for_device() and
arch_sync_dma_for_cpu() and three parameters to pick how they should
operate:

 - If the CPU has speculative prefetching, then the cache
   has to be invalidated after a transfer from the device.
   On the rarer CPUs without prefetching, this can be skipped,
   with all cache management happening before the transfer.
   This flag can be runtime detected, but is usually fixed
   per architecture.

 - Some architectures currently clean the caches before DMA
   from a device, while others invalidate them. There has not
   been a conclusion regarding whether we should change all
   architectures to use clean instead, so this adds an
   architecture specific flag that we can change later on.

 - On 32-bit Arm, the arch_sync_dma_for_cpu() function keeps
   track of pages that are marked clean in the page cache, to
   avoid flushing them again. The implementation for this is
   generic enough to work on all architectures that use the
   PG_dcache_clean page flag, but a Kconfig symbol is used
   to only enable it on Arm to preserve the existing behavior.

For the function naming, I picked 'wback' over 'clean', and 'wback_inv'
over 'flush', to avoid any ambiguity of what the helper functions are
supposed to do.

Moving the global functions into a header file is usually a bad idea
as it prevents the header from being included more than once, but it
helps keep the behavior as close as possible to the previous state,
including the possibility of inlining most of it into these functions
where that was done before. This also helps keep the global namespace
clean, by hiding the new arch_dma_cache{_wback,_inv,_wback_inv} from
device drivers that might use them incorrectly.

It would be possible to do this one architecture at a time, but
as the change is the same everywhere, the combined patch helps
explain it better once.
Signed-off-by: Arnd Bergmann
---
 arch/arc/mm/dma.c                 |  66 +-
 arch/arm/Kconfig                  |   3 +
 arch/arm/mm/dma-mapping-nommu.c   |  39 ++-
 arch/arm/mm/dma-mapping.c         |  64 +++---
 arch/arm64/mm/dma-mapping.c       |  28 +---
 arch/csky/mm/dma-mapping.c        |  44 ++--
 arch/hexagon/kernel/dma.c         |  44 ++--
 arch/m68k/kernel/dma.c            |  43 +++-
 arch/microblaze/kernel/dma.c      |  48 +++---
 arch/mips/mm/dma-noncoherent.c    |  60 +++--
 arch/nios2/mm/dma-mapping.c       |  57 +++-
 arch/openrisc/kernel/dma.c        |  63 +++---
 arch/parisc/kernel/pci-dma.c      |  46 ++---
 arch/powerpc/mm/dma-noncoherent.c |  34 ++
 arch/riscv/mm/dma-noncoherent.c   |  51 +++---
 arch/sh/kernel/dma-coherent.c     |  43 +++-
 arch/sparc/kernel/ioport.c        |  38 ---
 arch/xtensa/kernel/pci-dma.c      |  40 ++-
 include/linux/dma-sync.h          | 107 ++
 19 files changed, 527 insertions(+), 391 deletions(-)
 create mode 100644 include/linux/dma-sync.h

diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index ddb96786f765..61cd01646222 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -30,63 +30,33 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
 	dma_cache_wback_inv(page_to_phys(page), size);
 }
 
-/*
- * Cache operations depending on function and direction argument, inspired by
- * https://lore.kernel.org/lkml/20180518175004.gf17...@n2100.armlinux.org.uk
- * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
- * dma-mapping: provide a generic dma-noncoherent implementation)"
- *
- *          |   map          ==  for_device     |   unmap     ==  for_cpu
- *          |----------------------------------------------------------------
- * TO_DEV   |   writeback        writeback      |   none          none
- * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
- * BIDIR    |   writeback        writeback      |   invalidate    invalidate
- *
- *     [*] needed for CPU speculative prefetches
- *
- * NOTE: we don't check the validity of direction argument as it is done in
- * upper layer functions (in include/linux/dma-mapping.h)
- */
-
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-			      enum dma_data_direction dir)
+static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
 {
-	switch (dir) {
-	case DMA_TO_DEVICE:
-		dma_cache_wback(paddr, size);
-		break;
-
-	case DMA_FROM_DEVICE:
-		dma_cache_inv(paddr, size);
-		break;
-
-	case DMA_BIDIRECTIONAL:
-		dma_cache_wback(paddr, size);
-		break;
+	dma_cache_wback(paddr, size);
+}
 
-	default:
-		break;
-	}
+static inline void arch_dma_cache_inv(phys_addr_t
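The way the first two parameters from the commit message select a cache operation can be modelled as two pure functions. This is a sketch under stated assumptions, not the contents of include/linux/dma-sync.h: the enum and function names are invented, the flags are passed as arguments instead of being arch hooks, the wback_inv choice for bidirectional buffers on non-prefetching CPUs is inferred from the xtensa description earlier in the series, and the third parameter (marking pages clean) is left out.

```c
#include <stdbool.h>

enum model_dir { MODEL_DMA_BIDIRECTIONAL, MODEL_DMA_TO_DEVICE, MODEL_DMA_FROM_DEVICE };
enum model_op { MODEL_NONE, MODEL_WBACK, MODEL_INV, MODEL_WBACK_INV };

/* What to do before handing the buffer to the device. */
static enum model_op model_sync_for_device(enum model_dir dir,
					   bool needs_post_dma_flush,
					   bool clean_before_fromdevice)
{
	switch (dir) {
	case MODEL_DMA_TO_DEVICE:
		return MODEL_WBACK;
	case MODEL_DMA_FROM_DEVICE:
		/* Either choice removes dirty lines from the buffer. */
		return clean_before_fromdevice ? MODEL_WBACK : MODEL_INV;
	case MODEL_DMA_BIDIRECTIONAL:
		/*
		 * With prefetching the invalidate happens afterwards,
		 * otherwise it is folded into this step (assumption).
		 */
		return needs_post_dma_flush ? MODEL_WBACK : MODEL_WBACK_INV;
	}
	return MODEL_NONE;
}

/* What to do after the device is done with the buffer. */
static enum model_op model_sync_for_cpu(enum model_dir dir,
					bool needs_post_dma_flush)
{
	if (dir == MODEL_DMA_TO_DEVICE || !needs_post_dma_flush)
		return MODEL_NONE;

	/* Drop lines that were speculatively prefetched during the DMA. */
	return MODEL_INV;
}
```

With both flags false this reduces to the non-speculating table; with needs_post_dma_flush set it matches the arc table that the hunk above removes.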
[PATCH 19/21] ARM: dma-mapping: use generic form of arch_sync_dma_* helpers
From: Arnd Bergmann

As the final step of the conversion to generic arch_sync_dma_*
helpers, change the Arm implementation to look the same as the
new generic version, by calling the dmac_{clean,inv,flush}_area
low-level functions instead of the abstracted dmac_{map,unmap}_area
version.

On ARMv6/v7, this invalidates the caches after a DMA transfer from
a device because of speculative prefetching, while on earlier versions
it only needs to do this before the transfer.

This should not change any of the current behavior.

FIXME: address CONFIG_DMA_CACHE_RWFO properly.

Signed-off-by: Arnd Bergmann
---
 arch/arm/mm/dma-mapping-nommu.c | 11 +++
 arch/arm/mm/dma-mapping.c       | 53 +++--
 2 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index cfd9c933d2f0..12b5c6ae93fc 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -16,12 +16,13 @@
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	dmac_map_area(__va(paddr), size, dir);
-
-	if (dir == DMA_FROM_DEVICE)
+	if (dir == DMA_FROM_DEVICE) {
+		dmac_inv_range(__va(paddr), __va(paddr + size));
 		outer_inv_range(paddr, paddr + size);
-	else
+	} else {
+		dmac_clean_range(__va(paddr), __va(paddr + size));
 		outer_clean_range(paddr, paddr + size);
+	}
 }
 
 void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
@@ -29,7 +30,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 {
 	if (dir != DMA_TO_DEVICE) {
 		outer_inv_range(paddr, paddr + size);
-		dmac_unmap_area(__va(paddr), size, dir);
+		dmac_inv_range(__va(paddr), __va(paddr + size));
 	}
 }
 
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index ce4b74f34a58..cc702cb27ae7 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -623,8 +623,7 @@ static void __arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
 }
 
 static void dma_cache_maint(phys_addr_t paddr,
-	size_t size, enum dma_data_direction dir,
-	void (*op)(const void *, size_t, int))
+	size_t size, void (*op)(const void *, const void *))
 {
 	unsigned long pfn = PFN_DOWN(paddr);
 	unsigned long offset = paddr % PAGE_SIZE;
@@ -647,18 +646,18 @@ static void dma_cache_maint(phys_addr_t paddr,
 
 			if (cache_is_vipt_nonaliasing()) {
 				vaddr = kmap_atomic(page);
-				op(vaddr + offset, len, dir);
+				op(vaddr + offset, vaddr + offset + len);
 				kunmap_atomic(vaddr);
 			} else {
 				vaddr = kmap_high_get(page);
 				if (vaddr) {
-					op(vaddr + offset, len, dir);
+					op(vaddr + offset, vaddr + offset + len);
 					kunmap_high(page);
 				}
 			}
 		} else {
 			vaddr = page_address(page) + offset;
-			op(vaddr, len, dir);
+			op(vaddr, vaddr + len);
 		}
 		offset = 0;
 		pfn++;
@@ -666,6 +665,18 @@ static void dma_cache_maint(phys_addr_t paddr,
 	} while (left);
 }
 
+static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
+{
+	if (IS_ENABLED(CONFIG_CPU_V6) ||
+	    IS_ENABLED(CONFIG_CPU_V6K) ||
+	    IS_ENABLED(CONFIG_CPU_V7) ||
+	    IS_ENABLED(CONFIG_CPU_V7M))
+		return true;
+
+	/* FIXME: runtime detection */
+	return false;
+}
+
 /*
  * Make an area consistent for devices.
  * Note: Drivers should NOT use this function directly.
@@ -674,25 +685,35 @@ static void dma_cache_maint(phys_addr_t paddr,
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	dma_cache_maint(paddr, size, dir, dmac_map_area);
-
-	if (dir == DMA_FROM_DEVICE) {
-		outer_inv_range(paddr, paddr + size);
-	} else {
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		dma_cache_maint(paddr, size, dmac_clean_range);
 		outer_clean_range(paddr, paddr + size);
+		break;
+	case DMA_FROM_DEVICE:
+		dma_cache_maint(paddr, size, dmac_inv_range);
+		outer_inv_range(paddr, paddr + size);
+		break;
+	case DMA_BIDIRECTIONAL:
+		if (arch_sync_dma_cpu_needs_post_dma_flush()) {
+			dma_cache_maint(paddr, size, dmac_clean_range);
+			outer_clean_range(paddr, paddr + size);
+		} else {
+
[PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper
From: Arnd Bergmann

The arm version of the arch_sync_dma_for_cpu() function annotates pages as
PG_dcache_clean after a DMA, but no other architecture does this here. On
ia64, the same thing is done in arch_sync_dma_for_cpu(), so it makes sense
to use the same hook in order to have identical arch_sync_dma_for_cpu()
semantics as all other architectures.

Splitting this out has multiple effects:

 - for dma-direct, this now gets called after arch_sync_dma_for_cpu()
   for DMA_FROM_DEVICE mappings, but not for DMA_BIDIRECTIONAL. While
   it would not be harmful to keep doing it for bidirectional mappings,
   those are apparently not used in any callers that care about the flag.

 - Since arm has its own dma-iommu abstraction, this now also needs to
   call the same function, so the calls are added there to mirror the
   dma-direct version.

 - Like dma-direct, the dma-iommu version now marks the dcache clean
   for both coherent and noncoherent devices after a DMA, but it only
   does this for DMA_FROM_DEVICE, not DMA_BIDIRECTIONAL.

[ HELP NEEDED: can anyone confirm that it is a correct assumption
  on arm that a cache-coherent device writing to a page always results
  in it being in a PG_dcache_clean state like on ia64, or can a device
  write directly into the dcache?]
Signed-off-by: Arnd Bergmann
---
 arch/arm/Kconfig          |  1 +
 arch/arm/mm/dma-mapping.c | 71 +++
 2 files changed, 43 insertions(+), 29 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e24a9820e12f..125d58c54ab1 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -7,6 +7,7 @@ config ARM
 	select ARCH_HAS_BINFMT_FLAT
 	select ARCH_HAS_CURRENT_STACK_POINTER
 	select ARCH_HAS_DEBUG_VIRTUAL if MMU
+	select ARCH_HAS_DMA_MARK_CLEAN if MMU
 	select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index cc702cb27ae7..b703cb83d27e 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -665,6 +665,28 @@ static void dma_cache_maint(phys_addr_t paddr,
 	} while (left);
 }
 
+/*
+ * Mark the D-cache clean for these pages to avoid extra flushing.
+ */
+void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
+{
+	unsigned long pfn = PFN_UP(paddr);
+	unsigned long off = paddr & (PAGE_SIZE - 1);
+	size_t left = size;
+
+	if (size < PAGE_SIZE)
+		return;
+
+	if (off)
+		left -= PAGE_SIZE - off;
+
+	while (left >= PAGE_SIZE) {
+		struct page *page = pfn_to_page(pfn++);
+		set_bit(PG_dcache_clean, &page->flags);
+		left -= PAGE_SIZE;
+	}
+}
+
 static bool arch_sync_dma_cpu_needs_post_dma_flush(void)
 {
 	if (IS_ENABLED(CONFIG_CPU_V6) ||
@@ -715,24 +737,6 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		outer_inv_range(paddr, paddr + size);
 		dma_cache_maint(paddr, size, dmac_inv_range);
 	}
-
-	/*
-	 * Mark the D-cache clean for these pages to avoid extra flushing.
-	 */
-	if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) {
-		unsigned long pfn = PFN_UP(paddr);
-		unsigned long off = paddr & (PAGE_SIZE - 1);
-		size_t left = size;
-
-		if (off)
-			left -= PAGE_SIZE - off;
-
-		while (left >= PAGE_SIZE) {
-			struct page *page = pfn_to_page(pfn++);
-			set_bit(PG_dcache_clean, &page->flags);
-			left -= PAGE_SIZE;
-		}
-	}
 }
 
 #ifdef CONFIG_ARM_DMA_USE_IOMMU
@@ -1294,6 +1298,17 @@ static int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg,
 	return -EINVAL;
 }
 
+static void arm_iommu_sync_dma_for_cpu(phys_addr_t phys, size_t len,
+				       enum dma_data_direction dir,
+				       bool dma_coherent)
+{
+	if (!dma_coherent)
+		arch_sync_dma_for_cpu(phys, len, dir);
+
+	if (dir == DMA_FROM_DEVICE)
+		arch_dma_mark_clean(phys, len);
+}
+
 /**
  * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
  * @dev: valid struct device pointer
@@ -1316,8 +1331,9 @@ static void arm_iommu_unmap_sg(struct device *dev,
 		if (sg_dma_len(s))
 			__iommu_remove_mapping(dev, sg_dma_address(s),
 					       sg_dma_len(s));
-		if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-			arch_sync_dma_for_cpu(sg_phys(s), s->length, dir);
+		if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+			arm_iommu_sync_dma_for_cpu(sg_phys(s), s->length, dir,
+						   dev->dma_coherent);
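The page arithmetic in arch_dma_mark_clean() only touches pages that lie entirely inside the buffer. A userspace restatement of that loop, assuming 4 KiB pages and using invented names (struct clean_span, model_fully_covered_pages), makes it easy to check the edge cases by hand:

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

struct clean_span {
	unsigned long first_pfn;	/* first page that could be marked */
	size_t nr_pages;		/* pages fully inside the buffer */
};

static struct clean_span model_fully_covered_pages(uint64_t paddr, size_t size)
{
	struct clean_span span = { 0, 0 };
	unsigned long off = paddr & (MODEL_PAGE_SIZE - 1);
	size_t left = size;

	/* Same rounding as PFN_UP(): skip a partially covered first page. */
	span.first_pfn = (unsigned long)((paddr + MODEL_PAGE_SIZE - 1) /
					 MODEL_PAGE_SIZE);

	if (size < MODEL_PAGE_SIZE)
		return span;

	/* Bytes in the partial first page do not count towards full pages. */
	if (off)
		left -= MODEL_PAGE_SIZE - off;

	while (left >= MODEL_PAGE_SIZE) {
		span.nr_pages++;
		left -= MODEL_PAGE_SIZE;
	}

	return span;
}
```

For example, a buffer of 0x2000 bytes at 0x1800 covers 0x1800 to 0x3800, so only the page at pfn 2 is fully covered; partial pages at either end are never marked, since the CPU may still hold unrelated dirty data there.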
[PATCH 18/21] ARM: drop SMP support for ARM11MPCore
From: Arnd Bergmann

The cache management operations for noncoherent DMA on ARMv6 work
in two different ways:

 * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight
   DMA buffers lead to data corruption when the prefetched data is written
   back on top of data from the device.

 * When CONFIG_DMA_CACHE_RWFO is disabled, a cache flush on one CPU
   is not seen by the other core(s), leading to inconsistent contents
   across the system.

As a consequence, neither configuration is actually safe to use in a
general-purpose kernel that is used on both MPCore systems and ARM1176
with prefetching enabled.

We could add further workarounds to make the behavior more dynamic based
on the system, but realistically, there are close to zero remaining
users on any ARM11MPCore anyway, and nobody seems too interested in it,
compared to the more popular ARM1176 used in BCM2835 and AST2500.

The Oxnas platform has some minimal support in OpenWRT, but most of the
drivers and dts files never made it into the mainline kernel, while the
Arm Versatile/Realview platform mainly serves as a reference system but
is not necessary to be kept working once all other ARM11MPCore are gone.

Take the easy way out here and drop support for multiprocessing on
ARMv6, along with the CONFIG_DMA_CACHE_RWFO option and the cache
management implementation for it. This also helps with other ARMv6
issues, but for the moment leaves the ability to build a kernel that
can run on both ARMv7 SMP and single-processor ARMv6, which we probably
want to stop supporting as well, but not as part of this series.

Cc: Neil Armstrong
Cc: Daniel Golle
Cc: Linus Walleij
Cc: linux-ox...@groups.io
Signed-off-by: Arnd Bergmann
---
I could use some help clarifying the above changelog text to describe
the exact problem, and how the CONFIG_DMA_CACHE_RWFO actually works on
MPCore.
The TRMs for both 1176 and 11MPCore only describe prefetching
into the instruction cache, not the data cache, but this can end up in
the outer cache as a result. The 1176 has some extra control bits to
control prefetching, but I found no reference that explains why an
MPCore does not run into the problem.
---
 arch/arm/mach-oxnas/Kconfig                |  4 -
 arch/arm/mach-oxnas/Makefile               |  1 -
 arch/arm/mach-oxnas/headsmp.S              | 23 --
 arch/arm/mach-oxnas/platsmp.c              | 96 --
 arch/arm/mach-versatile/platsmp-realview.c |  4 -
 arch/arm/mm/Kconfig                        | 19 -
 arch/arm/mm/cache-v6.S                     | 31 ---
 7 files changed, 178 deletions(-)
 delete mode 100644 arch/arm/mach-oxnas/headsmp.S
 delete mode 100644 arch/arm/mach-oxnas/platsmp.c

diff --git a/arch/arm/mach-oxnas/Kconfig b/arch/arm/mach-oxnas/Kconfig
index a9ded7079268..a054235c3d6c 100644
--- a/arch/arm/mach-oxnas/Kconfig
+++ b/arch/arm/mach-oxnas/Kconfig
@@ -28,10 +28,6 @@ config MACH_OX820
 	bool "Support OX820 Based Products"
 	depends on ARCH_MULTI_V6
 	select ARM_GIC
-	select DMA_CACHE_RWFO if SMP
-	select HAVE_SMP
-	select HAVE_ARM_SCU if SMP
-	select HAVE_ARM_TWD if SMP
 	help
 	  Include Support for the Oxford Semiconductor OX820 SoC Based Products.
 
diff --git a/arch/arm/mach-oxnas/Makefile b/arch/arm/mach-oxnas/Makefile
index 0e78ecfe6c49..a4e40e534e6a 100644
--- a/arch/arm/mach-oxnas/Makefile
+++ b/arch/arm/mach-oxnas/Makefile
@@ -1,2 +1 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_SMP)		+= platsmp.o headsmp.o
diff --git a/arch/arm/mach-oxnas/headsmp.S b/arch/arm/mach-oxnas/headsmp.S
deleted file mode 100644
index 9c0f1479f33a..
--- a/arch/arm/mach-oxnas/headsmp.S
+++ /dev/null
@@ -1,23 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2013 Ma Haijun
- * Copyright (c) 2003 ARM Limited
- * All Rights Reserved
- */
-#include
-#include
-
-	__INIT
-
-/*
- * OX820 specific entry point for secondary CPUs.
- */
-ENTRY(ox820_secondary_startup)
-	mov r4, #0
-	/* invalidate both caches and branch target cache */
-	mcr p15, 0, r4, c7, c7, 0
-	/*
-	 * we've been released from the holding pen: secondary_stack
-	 * should now contain the SVC stack for this core
-	 */
-	b	secondary_startup
diff --git a/arch/arm/mach-oxnas/platsmp.c b/arch/arm/mach-oxnas/platsmp.c
deleted file mode 100644
index f0a50b9e61df..
--- a/arch/arm/mach-oxnas/platsmp.c
+++ /dev/null
@@ -1,96 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright (C) 2016 Neil Armstrong
- * Copyright (C) 2013 Ma Haijun
- * Copyright (C) 2002 ARM Ltd.
- * All Rights Reserved
- */
-#include
-#include
-#include
-#include
-
-#include
-#include
-#include
-#include
-
-extern void ox820_secondary_startup(void);
-
-static void __iomem *cpu_ctrl;
-static void __iomem *gic_cpu_ctrl;
-
-#define HOLDINGPEN_CPU_OFFSET		0xc8
-#define HOLDINGPEN_LOCATION_OFFSET
[PATCH 17/21] ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally
From: Arnd Bergmann

The arm specific iommu code in dma-mapping.c uses the page+offset based
__dma_page_cpu_to_dev()/__dma_page_dev_to_cpu() helpers in place of the
phys_addr_t based arch_sync_dma_for_device()/arch_sync_dma_for_cpu()
wrappers around them.

In order to be able to move the latter set of functions into common
code, change the iommu implementation to use them directly and remove
the internal ones as a separate interface.

As page+offset and phys_addr_t are equivalent, but are used in different
parts of the code here, this allows removing some of the conversions but
adds them elsewhere.

Signed-off-by: Arnd Bergmann
---
 arch/arm/mm/dma-mapping.c | 93 ++-
 1 file changed, 33 insertions(+), 60 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8bc01071474a..ce4b74f34a58 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -622,16 +622,14 @@ static void __arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
 	kfree(buf);
 }
 
-static void dma_cache_maint_page(struct page *page, unsigned long offset,
+static void dma_cache_maint(phys_addr_t paddr,
 	size_t size, enum dma_data_direction dir,
 	void (*op)(const void *, size_t, int))
 {
-	unsigned long pfn;
+	unsigned long pfn = PFN_DOWN(paddr);
+	unsigned long offset = paddr % PAGE_SIZE;
 	size_t left = size;
 
-	pfn = page_to_pfn(page) + offset / PAGE_SIZE;
-	offset %= PAGE_SIZE;
-
 	/*
 	 * A single sg entry may refer to multiple physically contiguous
 	 * pages. But we still need to process highmem pages individually.
@@ -641,8 +639,7 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
 	do {
 		size_t len = left;
 		void *vaddr;
-
-		page = pfn_to_page(pfn);
+		struct page *page = pfn_to_page(pfn);
 
 		if (PageHighMem(page)) {
 			if (len + offset > PAGE_SIZE)
@@ -674,14 +671,11 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
  * Note: Drivers should NOT use this function directly.
* Use the driver DMA support - see dma-mapping.h (dma_sync_*) */ -static void __dma_page_cpu_to_dev(struct page *page, unsigned long off, - size_t size, enum dma_data_direction dir) +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, + enum dma_data_direction dir) { - phys_addr_t paddr; + dma_cache_maint(paddr, size, dir, dmac_map_area); - dma_cache_maint_page(page, off, size, dir, dmac_map_area); - - paddr = page_to_phys(page) + off; if (dir == DMA_FROM_DEVICE) { outer_inv_range(paddr, paddr + size); } else { @@ -690,34 +684,30 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off, /* FIXME: non-speculating: flush on bidirectional mappings? */ } -static void __dma_page_dev_to_cpu(struct page *page, unsigned long off, - size_t size, enum dma_data_direction dir) +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, + enum dma_data_direction dir) { - phys_addr_t paddr = page_to_phys(page) + off; - /* FIXME: non-speculating: not required */ /* in any case, don't bother invalidating if DMA to device */ if (dir != DMA_TO_DEVICE) { outer_inv_range(paddr, paddr + size); - dma_cache_maint_page(page, off, size, dir, dmac_unmap_area); + dma_cache_maint(paddr, size, dir, dmac_unmap_area); } /* * Mark the D-cache clean for these pages to avoid extra flushing. 
*/ if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) { - unsigned long pfn; + unsigned long pfn = PFN_UP(paddr); + unsigned long off = paddr & (PAGE_SIZE - 1); size_t left = size; - pfn = page_to_pfn(page) + off / PAGE_SIZE; - off %= PAGE_SIZE; - if (off) { - pfn++; + if (off) left -= PAGE_SIZE - off; - } + while (left >= PAGE_SIZE) { - page = pfn_to_page(pfn++); + struct page *page = pfn_to_page(pfn++); set_bit(PG_dcache_clean, &page->flags); left -= PAGE_SIZE; } @@ -1204,7 +1194,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg, unsigned int len = PAGE_ALIGN(s->offset + s->length); if (!dev->dma_coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) - __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir); + arch_sync_dma_for_device(phys + s->offset, s->length, dir); prot = __dma_info_to_prot(dir, attrs); @@ -1306,8 +1296,7
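For anyone checking the PG_dcache_clean hunk above: the old code derived the first fully covered page from page+offset, the new code derives it from paddr with PFN_UP(), and the two should agree. The following is a small user-space sketch, not kernel code, that compares both calculations. PAGE_SHIFT is assumed to be 12, and struct span, whole_pages_old() and whole_pages_new() are names made up for this illustration. Like the kernel code, it assumes size >= PAGE_SIZE.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12				/* assumption: 4 KiB pages */
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)

/* Result of the walk: first fully covered pfn and number of full pages. */
struct span {
	uint64_t first_pfn;
	uint64_t nr_pages;
};

/* Old scheme: page frame number of 'page' plus an arbitrary byte offset. */
static struct span whole_pages_old(uint64_t base_pfn, uint64_t off, uint64_t size)
{
	uint64_t pfn = base_pfn + off / PAGE_SIZE;
	uint64_t left = size;

	assert(size >= PAGE_SIZE);
	off %= PAGE_SIZE;
	if (off) {
		pfn++;				/* skip the partial first page */
		left -= PAGE_SIZE - off;
	}
	return (struct span){ pfn, left / PAGE_SIZE };
}

/* New scheme: everything is derived from the physical address. */
static struct span whole_pages_new(uint64_t paddr, uint64_t size)
{
	uint64_t pfn = (paddr + PAGE_SIZE - 1) >> PAGE_SHIFT;	/* PFN_UP() */
	uint64_t off = paddr & (PAGE_SIZE - 1);
	uint64_t left = size;

	assert(size >= PAGE_SIZE);
	if (off)
		left -= PAGE_SIZE - off;
	return (struct span){ pfn, left / PAGE_SIZE };
}
```

The point of the comparison is only that rounding the physical address up with PFN_UP() already does the "pfn++" that the old code had to do by hand for an unaligned start.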
[PATCH 16/21] ARM: dma-mapping: bring back dmac_{clean,inv}_range
From: Arnd Bergmann These were remove ages ago in commit 702b94bff3c5 ("ARM: dma-mapping: remove dmac_clean_range and dmac_inv_range") in an effort to sanitize the dma-mapping API. Now this logic is getting moved into the generic dma-mapping implementation in order to give architectures less control over it, which requires reverting that earlier work. Signed-off-by: Arnd Bergmann --- arch/arm/include/asm/cacheflush.h | 21 + arch/arm/include/asm/glue-cache.h | 4 arch/arm/mm/cache-fa.S| 4 ++-- arch/arm/mm/cache-nop.S | 6 ++ arch/arm/mm/cache-v4.S| 5 + arch/arm/mm/cache-v4wb.S | 4 ++-- arch/arm/mm/cache-v4wt.S | 14 +- arch/arm/mm/cache-v6.S| 4 ++-- arch/arm/mm/cache-v7.S| 6 -- arch/arm/mm/cache-v7m.S | 4 ++-- arch/arm/mm/proc-arm1020.S| 4 ++-- arch/arm/mm/proc-arm1020e.S | 4 ++-- arch/arm/mm/proc-arm1022.S| 4 ++-- arch/arm/mm/proc-arm1026.S| 4 ++-- arch/arm/mm/proc-arm920.S | 4 ++-- arch/arm/mm/proc-arm922.S | 4 ++-- arch/arm/mm/proc-arm925.S | 4 ++-- arch/arm/mm/proc-arm926.S | 4 ++-- arch/arm/mm/proc-arm940.S | 4 ++-- arch/arm/mm/proc-arm946.S | 4 ++-- arch/arm/mm/proc-feroceon.S | 8 arch/arm/mm/proc-macros.S | 2 ++ arch/arm/mm/proc-mohawk.S | 4 ++-- arch/arm/mm/proc-xsc3.S | 4 ++-- arch/arm/mm/proc-xscale.S | 6 -- 25 files changed, 95 insertions(+), 41 deletions(-) diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h index a094f964c869..04462bfe9130 100644 --- a/arch/arm/include/asm/cacheflush.h +++ b/arch/arm/include/asm/cacheflush.h @@ -91,6 +91,21 @@ * DMA Cache Coherency * === * + * dma_inv_range(start, end) + * + * Invalidate (discard) the specified virtual address range. + * May not write back any entries. If 'start' or 'end' + * are not cache line aligned, those lines must be written + * back. + * - start - virtual start address + * - end- virtual end address + * + * dma_clean_range(start, end) + * + * Clean (write back) the specified virtual address range. 
+ * - start - virtual start address + * - end- virtual end address + * * dma_flush_range(start, end) * * Clean and invalidate the specified virtual address range. @@ -112,6 +127,8 @@ struct cpu_cache_fns { void (*dma_map_area)(const void *, size_t, int); void (*dma_unmap_area)(const void *, size_t, int); + void (*dma_clean_range)(const void *, const void *); + void (*dma_inv_range)(const void *, const void *); void (*dma_flush_range)(const void *, const void *); } __no_randomize_layout; @@ -137,6 +154,8 @@ extern struct cpu_cache_fns cpu_cache; * is visible to DMA, or data written by DMA to system memory is * visible to the CPU. */ +#define dmac_clean_range cpu_cache.dma_clean_range +#define dmac_inv_range cpu_cache.dma_inv_range #define dmac_flush_range cpu_cache.dma_flush_range #else @@ -156,6 +175,8 @@ extern void __cpuc_flush_dcache_area(void *, size_t); * is visible to DMA, or data written by DMA to system memory is * visible to the CPU. */ +extern void dmac_clean_range(const void *, const void *); +extern void dmac_inv_range(const void *, const void *); extern void dmac_flush_range(const void *, const void *); #endif diff --git a/arch/arm/include/asm/glue-cache.h b/arch/arm/include/asm/glue-cache.h index 724f8dac1e5b..d8c93b483adf 100644 --- a/arch/arm/include/asm/glue-cache.h +++ b/arch/arm/include/asm/glue-cache.h @@ -139,6 +139,8 @@ static inline int nop_coherent_user_range(unsigned long a, unsigned long b) { return 0; } static inline void nop_flush_kern_dcache_area(void *a, size_t s) { } +static inline void nop_dma_clean_range(const void *a, const void *b) { } +static inline void nop_dma_inv_range(const void *a, const void *b) { } static inline void nop_dma_flush_range(const void *a, const void *b) { } static inline void nop_dma_map_area(const void *s, size_t l, int f) { } @@ -155,6 +157,8 @@ static inline void nop_dma_unmap_area(const void *s, size_t l, int f) { } #define __cpuc_coherent_user_range __glue(_CACHE,_coherent_user_range) #define 
__cpuc_flush_dcache_area __glue(_CACHE,_flush_kern_dcache_area) +#define dmac_clean_range __glue(_CACHE,_dma_clean_range) +#define dmac_inv_range __glue(_CACHE,_dma_inv_range) #define dmac_flush_range __glue(_CACHE,_dma_flush_range) #endif diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S index 3a464d1649b4..abc3d58948dd 100644 --- a/arch/arm/mm/cache-fa.S +++
[PATCH 15/21] ARM: dma-mapping: always invalidate WT caches before DMA
From: Arnd Bergmann

Most ARM CPUs can have write-back caches, and those require cache
management to be done in the dma_sync_*_for_device() operation. This is
typically done in both writeback and writethrough mode.

The cache-v4.S (arm720/740/7tdmi/9tdmi) and cache-v4wt.S (arm920t,
arm940t) implementations are the exception here, and only do the cache
management after the DMA is complete, in the dma_sync_*_for_cpu()
operation.

Change this for consistency with the other platforms. This should have
no user visible effect.

Signed-off-by: Arnd Bergmann
---
 arch/arm/mm/cache-v4.S   | 8
 arch/arm/mm/cache-v4wt.S | 8
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S
index 7787057e4990..e2b104876340 100644
--- a/arch/arm/mm/cache-v4.S
+++ b/arch/arm/mm/cache-v4.S
@@ -117,23 +117,23 @@ ENTRY(v4_dma_flush_range)
 	ret	lr
 
 /*
- *	dma_unmap_area(start, size, dir)
+ *	dma_map_area(start, size, dir)
  *	- start	- kernel virtual start address
  *	- size	- size of region
  *	- dir	- DMA direction
  */
-ENTRY(v4_dma_unmap_area)
+ENTRY(v4_dma_map_area)
 	teq	r2, #DMA_TO_DEVICE
 	bne	v4_dma_flush_range
 	/* FALLTHROUGH */
 
 /*
- *	dma_map_area(start, size, dir)
+ *	dma_unmap_area(start, size, dir)
  *	- start	- kernel virtual start address
  *	- size	- size of region
  *	- dir	- DMA direction
  */
-ENTRY(v4_dma_map_area)
+ENTRY(v4_dma_unmap_area)
 	ret	lr
 ENDPROC(v4_dma_unmap_area)
 ENDPROC(v4_dma_map_area)
diff --git a/arch/arm/mm/cache-v4wt.S b/arch/arm/mm/cache-v4wt.S
index 0b290c25a99d..652218752f88 100644
--- a/arch/arm/mm/cache-v4wt.S
+++ b/arch/arm/mm/cache-v4wt.S
@@ -172,24 +172,24 @@ v4wt_dma_inv_range:
 	.equ	v4wt_dma_flush_range, v4wt_dma_inv_range
 
 /*
- *	dma_unmap_area(start, size, dir)
+ *	dma_map_area(start, size, dir)
  *	- start	- kernel virtual start address
  *	- size	- size of region
  *	- dir	- DMA direction
  */
-ENTRY(v4wt_dma_unmap_area)
+ENTRY(v4wt_dma_map_area)
 	add	r1, r1, r0
 	teq	r2, #DMA_TO_DEVICE
 	bne	v4wt_dma_inv_range
 	/* FALLTHROUGH */
 
 /*
- *	dma_map_area(start, size, dir)
+ *	dma_unmap_area(start, size, dir)
  *	- start	- kernel virtual start address
  *	- size	- size of region
  *	- dir	- DMA direction
  */
-ENTRY(v4wt_dma_map_area)
+ENTRY(v4wt_dma_unmap_area)
 	ret	lr
 ENDPROC(v4wt_dma_unmap_area)
 ENDPROC(v4wt_dma_map_area)
-- 
2.39.2
[PATCH 14/21] parisc: dma-mapping: use regular flush/invalidate ops
From: Arnd Bergmann

Non-coherent devices on parisc traditionally use a full flush+invalidate
before and after each DMA, which is more expensive than what we do on
other architectures.

Before transfers to a device, the cache only has to be written back,
but apparently there is no operation for this on parisc. There is no
need to flush it again after the transfer though.

After transfers from a device, the second writeback can be skipped
because the CPU was not allowed to write to the buffer anyway; instead,
a purge (invalidate without flush) can be used.

DMA_FROM_DEVICE is handled differently across architectures: most use
only an invalidate (purge) operation, but some have moved to flush in
order to preserve dirty data when the device does not write to the
buffer, see the link below. As parisc already did the full flush here,
keep that behavior.

Link: https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
Signed-off-by: Arnd Bergmann
---
I'm not really sure I understand the semantics of the 'flush' and
'purge' operations on parisc correctly, please double-check that this
makes sense in the context of this architecture.
--- arch/parisc/include/asm/cacheflush.h | 6 +- arch/parisc/kernel/pci-dma.c | 25 +++-- 2 files changed, 28 insertions(+), 3 deletions(-) diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h index 0bdee6724132..a4c5042f1821 100644 --- a/arch/parisc/include/asm/cacheflush.h +++ b/arch/parisc/include/asm/cacheflush.h @@ -33,8 +33,12 @@ void flush_cache_mm(struct mm_struct *mm); void flush_kernel_dcache_page_addr(const void *addr); +#define clean_kernel_dcache_range(start,size) \ + flush_kernel_dcache_range((start), (size)) #define flush_kernel_dcache_range(start,size) \ - flush_kernel_dcache_range_asm((start), (start)+(size)); + flush_kernel_dcache_range_asm((start), (start)+(size)) +#define purge_kernel_dcache_range(start,size) \ + purge_kernel_dcache_range_asm((start), (start)+(size)) #define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1 void flush_kernel_vmap_range(void *vaddr, int size); diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c index ba87f791323b..6d3d3cffb316 100644 --- a/arch/parisc/kernel/pci-dma.c +++ b/arch/parisc/kernel/pci-dma.c @@ -446,11 +446,32 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr, void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir) { - flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size); + unsigned long virt = (unsigned long)phys_to_virt(paddr); + + switch (dir) { + case DMA_TO_DEVICE: + clean_kernel_dcache_range(virt, size); + break; + case DMA_FROM_DEVICE: + clean_kernel_dcache_range(virt, size); + break; + case DMA_BIDIRECTIONAL: + flush_kernel_dcache_range(virt, size); + break; + } } void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir) { - flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size); + unsigned long virt = (unsigned long)phys_to_virt(paddr); + + switch (dir) { + case DMA_TO_DEVICE: + break; + case DMA_FROM_DEVICE: + case DMA_BIDIRECTIONAL: + 
purge_kernel_dcache_range(virt, size); + break; + } } -- 2.39.2
[PATCH 13/21] arc: dma-mapping: skip invalidating before bidirectional DMA
From: Arnd Bergmann

Some architectures that need to invalidate buffers after bidirectional
DMA because of speculative prefetching only do a simpler writeback
before that DMA, while architectures that don't need to do the second
invalidate tend to have a combined writeback+invalidate before the
DMA.

arc is one of the architectures that does both, which seems unnecessary.

Change it to behave like arm/arm64/xtensa instead, and use just a
writeback before the DMA when we do the invalidate afterwards.

Signed-off-by: Arnd Bergmann
---
 arch/arc/mm/dma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 2a7fbbb83b70..ddb96786f765 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -40,7 +40,7 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
  *          |
  * TO_DEV   |   writeback        writeback      |   none          none
  * FROM_DEV |   invalidate       invalidate     |   invalidate*   invalidate*
- * BIDIR    |   writeback+inv    writeback+inv  |   invalidate    invalidate
+ * BIDIR    |   writeback        writeback      |   invalidate    invalidate
  *
  * [*] needed for CPU speculative prefetches
  *
@@ -61,7 +61,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		break;
 
 	case DMA_BIDIRECTIONAL:
-		dma_cache_wback_inv(paddr, size);
+		dma_cache_wback(paddr, size);
 		break;
 
 	default:
-- 
2.39.2
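As a reading aid for the table in the arc comment, here is a minimal sketch of the resulting policy as two lookup functions. It is plain user-space C, not the arc implementation. The enum names and op_for_device()/op_for_cpu() are hypothetical, and I am assuming that the left half of the table is the for_device side and the right half the for_cpu side.

```c
#include <assert.h>

enum dma_dir { TO_DEV, FROM_DEV, BIDIR };
enum cache_op { OP_NONE, OP_WRITEBACK, OP_INVALIDATE, OP_WRITEBACK_INV };

/* Operation before the DMA starts (left half of the table), after this patch. */
static enum cache_op op_for_device(enum dma_dir dir)
{
	switch (dir) {
	case TO_DEV:
		return OP_WRITEBACK;
	case FROM_DEV:
		return OP_INVALIDATE;
	case BIDIR:
		return OP_WRITEBACK;	/* was OP_WRITEBACK_INV before the patch */
	}
	return OP_NONE;
}

/* Operation after the DMA has completed (right half of the table). */
static enum cache_op op_for_cpu(enum dma_dir dir)
{
	switch (dir) {
	case TO_DEV:
		return OP_NONE;
	case FROM_DEV:
	case BIDIR:
		return OP_INVALIDATE;	/* needed for speculative prefetches */
	}
	return OP_NONE;
}
```

With this split, a bidirectional buffer sees exactly one writeback before the transfer and one invalidate after it, which is the arm/arm64/xtensa behaviour the commit message refers to.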
[PATCH 12/21] mips: dma-mapping: split out cache operation logic
From: Arnd Bergmann The mips arch_sync_dma_for_device()/arch_sync_dma_for_cpu() functions behave the same way as on other architectures, but in order to unify the implementations, the code needs to be rearranged to pick the type of cache operation in the outermost function. Signed-off-by: Arnd Bergmann --- arch/mips/mm/dma-noncoherent.c | 75 ++ 1 file changed, 30 insertions(+), 45 deletions(-) diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c index b4350faf4f1e..b9d68bcc5d53 100644 --- a/arch/mips/mm/dma-noncoherent.c +++ b/arch/mips/mm/dma-noncoherent.c @@ -54,50 +54,13 @@ void *arch_dma_set_uncached(void *addr, size_t size) return (void *)(__pa(addr) + UNCAC_BASE); } -static inline void dma_sync_virt_for_device(void *addr, size_t size, - enum dma_data_direction dir) -{ - switch (dir) { - case DMA_TO_DEVICE: - dma_cache_wback((unsigned long)addr, size); - break; - case DMA_FROM_DEVICE: - dma_cache_inv((unsigned long)addr, size); - break; - case DMA_BIDIRECTIONAL: - if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && - cpu_needs_post_dma_flush()) - dma_cache_wback((unsigned long)addr, size); - else - dma_cache_wback_inv((unsigned long)addr, size); - break; - default: - BUG(); - } -} - -static inline void dma_sync_virt_for_cpu(void *addr, size_t size, - enum dma_data_direction dir) -{ - switch (dir) { - case DMA_TO_DEVICE: - break; - case DMA_FROM_DEVICE: - case DMA_BIDIRECTIONAL: - dma_cache_inv((unsigned long)addr, size); - break; - default: - BUG(); - } -} - /* * A single sg entry may refer to multiple physically contiguous pages. But * we still need to process highmem pages individually. If highmem is not * configured then the bulk of this loop gets optimized out. 
*/ static inline void dma_sync_phys(phys_addr_t paddr, size_t size, - enum dma_data_direction dir, bool for_device) + void(*cache_op)(unsigned long start, unsigned long size)) { struct page *page = pfn_to_page(paddr >> PAGE_SHIFT); unsigned long offset = paddr & ~PAGE_MASK; @@ -113,10 +76,7 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size, } addr = kmap_atomic(page); - if (for_device) - dma_sync_virt_for_device(addr + offset, len, dir); - else - dma_sync_virt_for_cpu(addr + offset, len, dir); + cache_op((unsigned long)addr + offset, len); kunmap_atomic(addr); offset = 0; @@ -128,15 +88,40 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size, void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir) { - dma_sync_phys(paddr, size, dir, true); + switch (dir) { + case DMA_TO_DEVICE: + dma_sync_phys(paddr, size, _dma_cache_wback); + break; + case DMA_FROM_DEVICE: + dma_sync_phys(paddr, size, _dma_cache_inv); + break; + case DMA_BIDIRECTIONAL: + if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && + cpu_needs_post_dma_flush()) + dma_sync_phys(paddr, size, _dma_cache_wback); + else + dma_sync_phys(paddr, size, _dma_cache_wback_inv); + break; + default: + break; + } } #ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir) { - if (cpu_needs_post_dma_flush()) - dma_sync_phys(paddr, size, dir, false); + switch (dir) { + case DMA_TO_DEVICE: + break; + case DMA_FROM_DEVICE: + case DMA_BIDIRECTIONAL: + if (cpu_needs_post_dma_flush()) + dma_sync_phys(paddr, size, _dma_cache_inv); + break; + default: + break; + } } #endif -- 2.39.2 ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
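The restructuring in this patch moves the choice of cache operation out of the page walker and into the caller, which then passes a function pointer down. A rough user-space sketch of that shape follows; it is not the mips code. PAGE_SIZE is assumed to be 4096, record_op() is a hypothetical stand-in for a real cache operation, and unlike the kernel version (which only splits highmem pages) this walker splits every page.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL	/* assumption for the sketch */

typedef void (*cache_op_fn)(unsigned long start, unsigned long size);

/* Stand-in for a real cache operation: just records what it was asked to do. */
static unsigned long op_calls, op_bytes;

static void record_op(unsigned long start, unsigned long size)
{
	(void)start;
	op_calls++;
	op_bytes += size;
}

/*
 * Walk [paddr, paddr + size) one page at a time and apply 'op' to each
 * chunk. The caller has already picked the operation, so the walker no
 * longer needs to know the DMA direction.
 */
static void sync_phys(unsigned long paddr, size_t size, cache_op_fn op)
{
	unsigned long offset = paddr & (PAGE_SIZE - 1);
	unsigned long page = paddr - offset;
	size_t left = size;

	while (left) {
		size_t len = left;

		/* never let a single chunk cross a page boundary */
		if (offset + len > PAGE_SIZE)
			len = PAGE_SIZE - offset;
		op(page + offset, len);
		page += PAGE_SIZE;
		offset = 0;
		left -= len;
	}
}
```

A caller would then do the switch on the direction once and call, for example, sync_phys(paddr, size, record_op), in the same way the patch calls dma_sync_phys() with _dma_cache_wback or _dma_cache_inv.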
[PATCH 11/21] mips: dma-mapping: skip invalidating before bidirectional DMA
From: Arnd Bergmann

Some architectures that need to invalidate buffers after bidirectional
DMA because of speculative prefetching only do a simpler writeback
before that DMA, while architectures that don't need to do the second
invalidate tend to have a combined writeback+invalidate before the
DMA.

The behavior on mips is slightly inconsistent, as it always
does the invalidation before bidirectional DMA and conditionally
does it a second time.

In order to make the behavior the same as the rest, change it
so that there is exactly one invalidation here, either before
or after the DMA.

Signed-off-by: Arnd Bergmann
---
 arch/mips/mm/dma-noncoherent.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index 3c4fc97b9f39..b4350faf4f1e 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -65,7 +65,11 @@ static inline void dma_sync_virt_for_device(void *addr, size_t size,
 		dma_cache_inv((unsigned long)addr, size);
 		break;
 	case DMA_BIDIRECTIONAL:
-		dma_cache_wback_inv((unsigned long)addr, size);
+		if (IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
+		    cpu_needs_post_dma_flush())
+			dma_cache_wback((unsigned long)addr, size);
+		else
+			dma_cache_wback_inv((unsigned long)addr, size);
 		break;
 	default:
 		BUG();
-- 
2.39.2
[PATCH 10/21] csky: dma-mapping: skip invalidating before DMA from device
From: Arnd Bergmann

csky is the only architecture that does a full flush for the
dma_sync_*_for_device(..., DMA_FROM_DEVICE) operation. The requirement
is only to make sure there are no dirty cache lines for the buffer,
which can be done either through an invalidate operation (as on most
architectures including arm32, mips and arc), or a writeback (as on
arm64 and riscv). The cache also has to be invalidated eventually, but
csky already does that after the transfer.

Use a 'clean' operation here for consistency with arm64 and riscv.

Signed-off-by: Arnd Bergmann
---
 arch/csky/mm/dma-mapping.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
index 82447029feb4..c90f912e2822 100644
--- a/arch/csky/mm/dma-mapping.c
+++ b/arch/csky/mm/dma-mapping.c
@@ -60,11 +60,9 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 {
 	switch (dir) {
 	case DMA_TO_DEVICE:
-		cache_op(paddr, size, dma_wb_range);
-		break;
 	case DMA_FROM_DEVICE:
 	case DMA_BIDIRECTIONAL:
-		cache_op(paddr, size, dma_wbinv_range);
+		cache_op(paddr, size, dma_wb_range);
 		break;
 	default:
 		BUG();
-- 
2.39.2
[PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA
From: Arnd Bergmann

For a DMA_BIDIRECTIONAL transfer, the caches have to be cleaned
first to let the device see data written by the CPU, and invalidated
after the transfer to let the CPU see data written by the device.

riscv also invalidates the caches before the transfer, which does
not appear to serve any purpose.

Signed-off-by: Arnd Bergmann
---
 arch/riscv/mm/dma-noncoherent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index 640f4c496d26..69c80b2155a1 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -25,7 +25,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
 		break;
 	case DMA_BIDIRECTIONAL:
-		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
 		break;
 	default:
 		break;
-- 
2.39.2
[PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush
From: Arnd Bergmann

No other architecture intentionally writes back dirty cache lines into
a buffer that a device has just finished writing into. If the cache is
clean, this has no effect at all, but if a cacheline in the buffer has
actually been written by the CPU, there is a driver bug that is likely
made worse by overwriting that buffer.

Signed-off-by: Arnd Bergmann
---
 arch/riscv/mm/dma-noncoherent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index d919efab6eba..640f4c496d26 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -42,7 +42,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		break;
 	case DMA_FROM_DEVICE:
 	case DMA_BIDIRECTIONAL:
-		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
 		break;
 	default:
 		break;
-- 
2.39.2
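To make the argument about stray dirty lines concrete, here is a toy model of a single write-back cache line in front of a small memory array. It is a user-space sketch under simplifying assumptions (one cache line, a device that bypasses the cache entirely), not riscv code, and every name in it is made up for the illustration.

```c
#include <assert.h>
#include <string.h>

#define LINE 8	/* one tiny cache line covering the whole buffer */

static unsigned char mem[LINE];		/* what the device sees */
static unsigned char line[LINE];	/* cached copy, if valid */
static int valid, dirty;

static void fill_line(void)
{
	if (!valid) {
		memcpy(line, mem, LINE);
		valid = 1;
		dirty = 0;
	}
}

static void cpu_write(int i, unsigned char v)
{
	fill_line();
	line[i] = v;
	dirty = 1;
}

static unsigned char cpu_read(int i)
{
	fill_line();
	return line[i];
}

/* invalidate: drop the line without writing it back */
static void cache_inval(void)
{
	valid = 0;
	dirty = 0;
}

/* flush: write back if dirty, then invalidate */
static void cache_flush(void)
{
	if (valid && dirty)
		memcpy(mem, line, LINE);
	cache_inval();
}

/*
 * A buggy driver dirties the buffer while the device owns it, then the
 * device fills the buffer with 0x55. Returns what the CPU reads from
 * byte 1 after the for_cpu step.
 */
static unsigned char read_after_dma(int flush_in_for_cpu)
{
	memset(mem, 0, LINE);
	cache_inval();
	cpu_write(0, 0xAA);		/* stray CPU write: the driver bug */
	memset(mem, 0x55, LINE);	/* device DMA, bypasses the cache */
	if (flush_in_for_cpu)
		cache_flush();		/* behaviour before this patch */
	else
		cache_inval();		/* behaviour after this patch */
	return cpu_read(1);
}
```

In this model the flush variant writes the whole stale line back over the device data, so even byte 1, which the CPU never touched, is lost. The invalidate variant only loses the stray CPU write.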
[PATCH 07/21] powerpc: dma-mapping: always clean cache in _for_device() op
From: Arnd Bergmann The powerpc implementation of arch_sync_dma_for_device() is unique in that it sometimes performs a full flush for the arch_sync_dma_for_device(paddr, size, DMA_FROM_DEVICE) operation when the address is unaligned, but otherwise invalidates the caches. Since the _for_cpu() counterpart has to invalidate the cache already in order to avoid stale data from prefetching, this operation only really needs to ensure that there are no dirty cache lines, which can be done using either invalidation or cleaning the cache, but not necessarily both. Most architectures traditionally go for invalidation here, but as Will Deacon points out, this can leak old data to user space if a DMA is started but the device ends up not actually filling the entire buffer, see the link below. The same argument applies to DMA_BIDIRECTIONAL transfers. Using a cache-clean operation is the safe choice here, followed by invalidating the cache after the DMA to get rid of stale data that was prefetched before the completion of the DMA. 
Link: https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/ Signed-off-by: Arnd Bergmann --- arch/powerpc/mm/dma-noncoherent.c | 21 + 1 file changed, 1 insertion(+), 20 deletions(-) diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c index e108cacf877f..00e59a4faa2b 100644 --- a/arch/powerpc/mm/dma-noncoherent.c +++ b/arch/powerpc/mm/dma-noncoherent.c @@ -104,26 +104,7 @@ static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op) void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir) { - switch (direction) { - case DMA_NONE: - BUG(); - case DMA_FROM_DEVICE: - /* -* invalidate only when cache-line aligned otherwise there is -* the potential for discarding uncommitted data from the cache -*/ - if ((start | end) & (L1_CACHE_BYTES - 1)) - __dma_phys_op(start, end, DMA_CACHE_FLUSH); - else - __dma_phys_op(start, end, DMA_CACHE_INVAL); - break; - case DMA_TO_DEVICE: /* writeback only */ - __dma_phys_op(start, end, DMA_CACHE_CLEAN); - break; - case DMA_BIDIRECTIONAL: /* writeback and invalidate */ - __dma_phys_op(start, end, DMA_CACHE_FLUSH); - break; - } + __dma_phys_op(start, end, DMA_CACHE_CLEAN); } void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, -- 2.39.2 ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
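The scenario from the linked discussion (a device that does not fill the entire buffer) can be modelled in a few lines. This is a user-space sketch with a single write-back cache line, not powerpc code; the names and the byte values 0x5e and 0x11 are arbitrary choices for the illustration.

```c
#include <assert.h>
#include <string.h>

#define BUF 8	/* buffer size, covered by one cache line in this model */

static unsigned char mem[BUF];		/* memory as seen by the device */
static unsigned char line[BUF];		/* cached copy, if valid */
static int valid, dirty;

static void fill_line(void)
{
	if (!valid) {
		memcpy(line, mem, BUF);
		valid = 1;
		dirty = 0;
	}
}

/* invalidate: discard the line, including any dirty data */
static void cache_inval(void)
{
	valid = 0;
	dirty = 0;
}

/* clean: write dirty data back, keep the line */
static void cache_clean(void)
{
	if (valid && dirty) {
		memcpy(mem, line, BUF);
		dirty = 0;
	}
}

/*
 * Memory initially holds stale data (0x5e). The CPU zeroes the buffer
 * through the cache, then a DMA_FROM_DEVICE transfer fills only the
 * first half. Returns the last byte as seen by the CPU afterwards.
 */
static unsigned char tail_after_short_dma(int clean_before_dma)
{
	memset(mem, 0x5e, BUF);		/* stale contents from an earlier user */
	cache_inval();

	fill_line();
	memset(line, 0, BUF);		/* CPU zeroes the buffer, in cache only */
	dirty = 1;

	if (clean_before_dma)
		cache_clean();		/* for_device: zeroes reach memory */
	else
		cache_inval();		/* for_device: zeroes are discarded */

	memset(mem, 0x11, BUF / 2);	/* device writes only half the buffer */
	cache_inval();			/* for_cpu */

	fill_line();
	return line[BUF - 1];
}
```

With the clean, the untouched tail reads back as the zeroes the CPU wrote. With the invalidate, the CPU sees the stale 0x5e bytes, which is the kind of leak the commit message describes.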
[PATCH 06/21] powerpc: dma-mapping: minimize for_cpu flushing
From: Arnd Bergmann

The powerpc dma_sync_*_for_cpu() variants do more flushes than on other
architectures. Reduce it to what everyone else does:

 - No flush is needed after data has been sent to a device

 - When data has been received from a device, the cache only needs to
   be invalidated to clear out cache lines that were speculatively
   prefetched.

In particular, the second flushing of partial cache lines of bidirectional
buffers is actively harmful -- if a single cache line is written by both
the CPU and the device, flushing it again does not maintain coherency
but instead overwrites the data that was just received from the device.

Signed-off-by: Arnd Bergmann
---
 arch/powerpc/mm/dma-noncoherent.c | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index f10869d27de5..e108cacf877f 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -132,21 +132,11 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 	switch (direction) {
 	case DMA_NONE:
 		BUG();
-	case DMA_FROM_DEVICE:
-		/*
-		 * invalidate only when cache-line aligned otherwise there is
-		 * the potential for discarding uncommitted data from the cache
-		 */
-		if ((start | end) & (L1_CACHE_BYTES - 1))
-			__dma_phys_op(start, end, DMA_CACHE_FLUSH);
-		else
-			__dma_phys_op(start, end, DMA_CACHE_INVAL);
-		break;
-	case DMA_TO_DEVICE:		/* writeback only */
-		__dma_phys_op(start, end, DMA_CACHE_CLEAN);
+	case DMA_TO_DEVICE:
 		break;
-	case DMA_BIDIRECTIONAL:	/* writeback and invalidate */
-		__dma_phys_op(start, end, DMA_CACHE_FLUSH);
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		__dma_phys_op(start, end, DMA_CACHE_INVAL);
 		break;
 	}
 }
-- 
2.39.2
[PATCH 05/21] powerpc: dma-mapping: split out cache operation logic
From: Arnd Bergmann

The powerpc arch_sync_dma_for_device()/arch_sync_dma_for_cpu() functions
behave differently from all other architectures, at least for some of
the operations.

As a preparation for making the behavior more consistent, reorder the
logic in which they decide whether to flush, invalidate or clean the
cache.

No change in behavior is intended.

Signed-off-by: Arnd Bergmann
---
 arch/powerpc/mm/dma-noncoherent.c | 91 +--
 1 file changed, 63 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index 30260b5d146d..f10869d27de5 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -16,31 +16,28 @@
 #include
 #include
 
+enum dma_cache_op {
+	DMA_CACHE_CLEAN,
+	DMA_CACHE_INVAL,
+	DMA_CACHE_FLUSH,
+};
+
 /*
  * make an area consistent.
  */
-static void __dma_sync(void *vaddr, size_t size, int direction)
+static void __dma_op(void *vaddr, size_t size, enum dma_cache_op op)
 {
 	unsigned long start = (unsigned long)vaddr;
 	unsigned long end = start + size;
 
-	switch (direction) {
-	case DMA_NONE:
-		BUG();
-	case DMA_FROM_DEVICE:
-		/*
-		 * invalidate only when cache-line aligned otherwise there is
-		 * the potential for discarding uncommitted data from the cache
-		 */
-		if ((start | end) & (L1_CACHE_BYTES - 1))
-			flush_dcache_range(start, end);
-		else
-			invalidate_dcache_range(start, end);
-		break;
-	case DMA_TO_DEVICE:		/* writeback only */
+	switch (op) {
+	case DMA_CACHE_CLEAN:
 		clean_dcache_range(start, end);
 		break;
-	case DMA_BIDIRECTIONAL:	/* writeback and invalidate */
+	case DMA_CACHE_INVAL:
+		invalidate_dcache_range(start, end);
+		break;
+	case DMA_CACHE_FLUSH:
 		flush_dcache_range(start, end);
 		break;
 	}
@@ -48,16 +45,16 @@ static void __dma_sync(void *vaddr, size_t size, int direction)
 
 #ifdef CONFIG_HIGHMEM
 /*
- * __dma_sync_page() implementation for systems using highmem.
+ * __dma_highmem_op() implementation for systems using highmem.
* In this case, each page of a buffer must be kmapped/kunmapped - * in order to have a virtual address for __dma_sync(). This must + * in order to have a virtual address for __dma_op(). This must * not sleep so kmap_atomic()/kunmap_atomic() are used. * * Note: yes, it is possible and correct to have a buffer extend * beyond the first page. */ -static inline void __dma_sync_page_highmem(struct page *page, - unsigned long offset, size_t size, int direction) +static inline void __dma_highmem_op(struct page *page, + unsigned long offset, size_t size, enum dma_cache_op op) { size_t seg_size = min((size_t)(PAGE_SIZE - offset), size); size_t cur_size = seg_size; @@ -71,7 +68,7 @@ static inline void __dma_sync_page_highmem(struct page *page, start = (unsigned long)kmap_atomic(page + seg_nr) + seg_offset; /* Sync this buffer segment */ - __dma_sync((void *)start, seg_size, direction); + __dma_op((void *)start, seg_size, op); kunmap_atomic((void *)start); seg_nr++; @@ -88,32 +85,70 @@ static inline void __dma_sync_page_highmem(struct page *page, #endif /* CONFIG_HIGHMEM */ /* - * __dma_sync_page makes memory consistent. identical to __dma_sync, but - * takes a struct page instead of a virtual address + * __dma_phys_op makes memory consistent. 
identical to __dma_op, but + * takes a phys_addr_t instead of a virtual address */ -static void __dma_sync_page(phys_addr_t paddr, size_t size, int dir) +static void __dma_phys_op(phys_addr_t paddr, size_t size, enum dma_cache_op op) { struct page *page = pfn_to_page(paddr >> PAGE_SHIFT); unsigned offset = paddr & ~PAGE_MASK; #ifdef CONFIG_HIGHMEM - __dma_sync_page_highmem(page, offset, size, dir); + __dma_highmem_op(page, offset, size, op); #else unsigned long start = (unsigned long)page_address(page) + offset; - __dma_sync((void *)start, size, dir); + __dma_op((void *)start, size, op); #endif } void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir) { - __dma_sync_page(paddr, size, dir); + switch (direction) { + case DMA_NONE: + BUG(); + case DMA_FROM_DEVICE: + /* +* invalidate only when cache-line aligned otherwise there is +* the potential for discarding uncommitted data from the cache +*/ + if ((start | end) & (L1_CACHE_BYTES - 1)) +
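For readers following along, the DMA_FROM_DEVICE alignment rule quoted in the hunks above can be sketched in isolation. This is only an illustration: the enum mirrors the one added by the patch, while LINE_BYTES (standing in for L1_CACHE_BYTES, with an assumed value of 32) and the helper name from_device_op() are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; the kernel code uses L1_CACHE_BYTES instead. */
#define LINE_BYTES 32u

/* Same three operations as the enum introduced by the patch. */
enum dma_cache_op {
	DMA_CACHE_CLEAN,	/* write back dirty lines */
	DMA_CACHE_INVAL,	/* discard lines without writing back */
	DMA_CACHE_FLUSH,	/* write back, then discard */
};

/*
 * Hypothetical helper: pick the operation for a DMA_FROM_DEVICE buffer.
 * A buffer whose start or end is not cache-line aligned shares a line
 * with unrelated data, so a plain invalidate could discard uncommitted
 * data; in that case the line is flushed instead.
 */
static enum dma_cache_op from_device_op(uintptr_t start, size_t size)
{
	uintptr_t end = start + size;

	if ((start | end) & (LINE_BYTES - 1))
		return DMA_CACHE_FLUSH;
	return DMA_CACHE_INVAL;
}
```

With these assumptions, an aligned buffer such as (0x1000, 64) is invalidated, while (0x1004, 64) or (0x1000, 60) is flushed.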
[PATCH 04/21] microblaze: dma-mapping: skip extra DMA flushes
From: Arnd Bergmann

The microblaze dma_sync_* implementation uses the same function for both
_for_cpu() and _for_device(), which is inconsistent with other
architectures and slightly more expensive.

Split it up into separate functions and skip the parts that are not
needed:

 - on dma_sync_*_for_cpu(..., DMA_TO_DEVICE), skip the second writeback,
   which does nothing.

 - on dma_sync_*_for_cpu(..., DMA_BIDIRECTIONAL), only invalidate the
   cache to clear out cache lines that got loaded speculatively, but
   skip the extraneous writeback.

Signed-off-by: Arnd Bergmann
---
 arch/microblaze/kernel/dma.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
index 04d091ade417..b4c4e45fd45e 100644
--- a/arch/microblaze/kernel/dma.c
+++ b/arch/microblaze/kernel/dma.c
@@ -14,8 +14,8 @@
 #include
 #include
 
-static void __dma_sync(phys_addr_t paddr, size_t size,
-		       enum dma_data_direction direction)
+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
+			      enum dma_data_direction dir)
 {
 	switch (direction) {
 	case DMA_TO_DEVICE:
@@ -30,14 +30,16 @@ static void __dma_sync(phys_addr_t paddr, size_t size,
 	}
 }
 
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-			      enum dma_data_direction dir)
-{
-	__dma_sync(paddr, size, dir);
-}
-
 void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 			   enum dma_data_direction dir)
 {
-	__dma_sync(paddr, size, dir);
-}
+	switch (direction) {
+	case DMA_TO_DEVICE:
+		break;
+	case DMA_BIDIRECTIONAL:
+	case DMA_FROM_DEVICE:
+		invalidate_dcache_range(paddr, paddr + size);
+		break;
+	default:
+		BUG();
+	}}
-- 
2.39.2
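To illustrate why the _for_cpu() side still invalidates for DMA_FROM_DEVICE and DMA_BIDIRECTIONAL, here is a toy model of a single cache line in front of one word of memory. It is not microblaze code; struct toy and every helper below are hypothetical names invented for the sketch, and the "speculative load" is modelled as an ordinary read that happens before the device has written its data.

```c
#include <stdbool.h>

/* One cached word in front of one word of memory (illustration only). */
struct toy {
	int mem;	/* backing memory, also written by the "device" */
	int line;	/* cached copy of mem */
	bool valid;	/* line holds a copy */
	bool dirty;	/* line is newer than mem */
};

/* CPU load: fill the line from memory on a miss, then return the copy. */
static int cpu_read(struct toy *t)
{
	if (!t->valid) {
		t->line = t->mem;
		t->valid = true;
		t->dirty = false;
	}
	return t->line;
}

/* CPU store: lands in the cache only (write-back behaviour). */
static void cpu_write(struct toy *t, int v)
{
	t->line = v;
	t->valid = true;
	t->dirty = true;
}

/* Writeback ("clean"): push a dirty line out to memory, keep it cached. */
static void cache_writeback(struct toy *t)
{
	if (t->valid && t->dirty) {
		t->mem = t->line;
		t->dirty = false;
	}
}

/* Invalidate: drop the cached copy without writing it back. */
static void cache_invalidate(struct toy *t)
{
	t->valid = false;
	t->dirty = false;
}

/* DMA from the device: bypasses the cache and writes memory directly. */
static void device_write(struct toy *t, int v)
{
	t->mem = v;
}
```

In this model, a cpu_read() issued between the pre-DMA invalidate and device_write() leaves a stale copy in the line, and only a second cache_invalidate() afterwards makes the next cpu_read() return the device's data. A writeback at that point would add nothing, since the line is not dirty.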
[PATCH 03/21] sparc32: flush caches in dma_sync_*for_device
From: Arnd Bergmann

Leon has a very minimalistic cache that has no range operations and
requires being flushed entirely to deal with noncoherent DMA. Most
in-order architectures do their cache management in the
dma_sync_*for_device() operations rather than dma_sync_*for_cpu.

Since the cache is write-through only, both should have the same effect,
so change it for consistency with the other architectures.

Signed-off-by: Arnd Bergmann
---
 arch/sparc/Kconfig         | 2 +-
 arch/sparc/kernel/ioport.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 84437a4c6545..637da50e236c 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -51,7 +51,7 @@ config SPARC
 config SPARC32
 	def_bool !64BIT
 	select ARCH_32BIT_OFF_T
-	select ARCH_HAS_SYNC_DMA_FOR_CPU
+	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
 	select CLZ_TAB
 	select DMA_DIRECT_REMAP
 	select GENERIC_ATOMIC64
diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index 4e4f3d3263e4..4f3d26066ec2 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -306,7 +306,7 @@ arch_initcall(sparc_register_ioport);
  * On LEON systems without cache snooping, the entire D-CACHE must be flushed to
  * make DMA to cacheable memory coherent.
  */
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
 	if (dir != DMA_TO_DEVICE &&
-- 
2.39.2
[PATCH 02/21] xtensa: dma-mapping: use normal cache invalidation rules
From: Arnd Bergmann

xtensa is one of the platforms that has both write-back and write-through
caches, and needs to account for both in its DMA mapping operations.

It does this through a set of operations that is different from any
other architecture. This is not a problem by itself, but it makes it
rather hard to figure out whether this is correct or not, and to unify
this implementation with the others.

Change the semantics to the usual ones for non-speculating CPUs:

 - On DMA_TO_DEVICE, call __flush_dcache_range() to perform the
   writeback even on writethrough caches, where this is a nop.

 - On DMA_FROM_DEVICE, invalidate the mapping before the DMA rather
   than afterwards.

 - On DMA_BIDIRECTIONAL, combine the pre-writeback with the
   post-invalidate into a call to __flush_invalidate_dcache_range()
   that turns into a simple invalidate on writethrough caches.

Signed-off-by: Arnd Bergmann
---
 arch/xtensa/Kconfig                  |  1 -
 arch/xtensa/include/asm/cacheflush.h |  6 +++---
 arch/xtensa/kernel/pci-dma.c         | 29 +---
 3 files changed, 8 insertions(+), 28 deletions(-)

diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index bcb0c5d2abc2..b938bacbb9af 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -8,7 +8,6 @@ config XTENSA
 	select ARCH_HAS_DMA_PREP_COHERENT if MMU
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_KCOV
-	select ARCH_HAS_SYNC_DMA_FOR_CPU if MMU
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE if MMU
 	select ARCH_HAS_DMA_SET_UNCACHED if MMU
 	select ARCH_HAS_STRNCPY_FROM_USER if !KASAN
diff --git a/arch/xtensa/include/asm/cacheflush.h b/arch/xtensa/include/asm/cacheflush.h
index 7b4359312c25..2f645d25565a 100644
--- a/arch/xtensa/include/asm/cacheflush.h
+++ b/arch/xtensa/include/asm/cacheflush.h
@@ -61,9 +61,9 @@ static inline void __flush_dcache_page(unsigned long va)
 static inline void __flush_dcache_range(unsigned long va, unsigned long sz)
 {
 }
-# define __flush_invalidate_dcache_all()	__invalidate_dcache_all()
-# define __flush_invalidate_dcache_page(p)	__invalidate_dcache_page(p)
-# define __flush_invalidate_dcache_range(p,s)	__invalidate_dcache_range(p,s)
+# define __flush_invalidate_dcache_all	__invalidate_dcache_all
+# define __flush_invalidate_dcache_page	__invalidate_dcache_page
+# define __flush_invalidate_dcache_range	__invalidate_dcache_range
 #endif
 
 #if defined(CONFIG_MMU) && (DCACHE_WAY_SIZE > PAGE_SIZE)
diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
index 94955caa4488..ff3bf015eca4 100644
--- a/arch/xtensa/kernel/pci-dma.c
+++ b/arch/xtensa/kernel/pci-dma.c
@@ -43,38 +43,19 @@ static void do_cache_op(phys_addr_t paddr, size_t size,
 	}
 }
 
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
 	switch (dir) {
-	case DMA_BIDIRECTIONAL:
+	case DMA_TO_DEVICE:
+		do_cache_op(paddr, size, __flush_dcache_range);
+		break;
 	case DMA_FROM_DEVICE:
 		do_cache_op(paddr, size, __invalidate_dcache_range);
 		break;
-
-	case DMA_NONE:
-		BUG();
-		break;
-
-	default:
-		break;
-	}
-}
-
-void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
-{
-	switch (dir) {
 	case DMA_BIDIRECTIONAL:
-	case DMA_TO_DEVICE:
-		if (XCHAL_DCACHE_IS_WRITEBACK)
-			do_cache_op(paddr, size, __flush_dcache_range);
+		do_cache_op(paddr, size, __flush_invalidate_dcache_range);
 		break;
-
-	case DMA_NONE:
-		BUG();
-		break;
-
 	default:
 		break;
 	}
-- 
2.39.2
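As a reading aid, the effective behaviour of the new arch_sync_dma_for_device() can be tabulated as a small function. This is a sketch, not the xtensa code: the enums and the helper name for_device_op() are invented here, and the writeback_cache flag stands in for the compile-time XCHAL_DCACHE_IS_WRITEBACK setting. It relies on what the cacheflush.h hunk shows for the write-through configuration: __flush_dcache_range() is an empty stub and __flush_invalidate_dcache_range() is defined as __invalidate_dcache_range().

```c
#include <stdbool.h>

/* Stand-ins for the kernel's enum dma_data_direction (assumption). */
enum dma_dir {
	DIR_BIDIRECTIONAL,
	DIR_TO_DEVICE,
	DIR_FROM_DEVICE,
	DIR_NONE,
};

/* Hypothetical labels for what actually happens to the cache. */
enum cache_op {
	OP_NOP,
	OP_WRITEBACK,
	OP_INVALIDATE,
	OP_WRITEBACK_INVALIDATE,
};

/*
 * Effective operation before the DMA starts.  On a write-through cache
 * there is never dirty data, so the writeback part collapses to nothing.
 */
static enum cache_op for_device_op(enum dma_dir dir, bool writeback_cache)
{
	switch (dir) {
	case DIR_TO_DEVICE:
		return writeback_cache ? OP_WRITEBACK : OP_NOP;
	case DIR_FROM_DEVICE:
		return OP_INVALIDATE;
	case DIR_BIDIRECTIONAL:
		return writeback_cache ? OP_WRITEBACK_INVALIDATE
				       : OP_INVALIDATE;
	default:
		return OP_NOP;
	}
}
```

Since ARCH_HAS_SYNC_DMA_FOR_CPU is dropped from Kconfig in the same patch, nothing is done after the DMA in either configuration.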
[PATCH 01/21] openrisc: dma-mapping: flush bidirectional mappings
From: Arnd Bergmann

The cache management operations on DMA are different from the other
architectures:

 - on DMA_TO_DEVICE, Openrisc currently invalidates the cache after the
   writeback, where a simple writeback without invalidation should be
   sufficient.

 - on DMA_BIDIRECTIONAL, Openrisc does nothing, while most architectures
   either flush before DMA, or writeback before and invalidate after DMA.
   The separate invalidation for DMA_BIDIRECTIONAL/DMA_FROM_DEVICE is
   only required on CPUs that can do speculative prefetches.

Change both to have the normal set of operations.

Signed-off-by: Arnd Bergmann
---
 arch/openrisc/kernel/dma.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index b3edbb33b621..91a00d09ffad 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -103,10 +103,10 @@ void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
 
 	switch (dir) {
 	case DMA_TO_DEVICE:
-		/* Flush the dcache for the requested range */
+		/* Write back the dcache for the requested range */
 		for (cl = addr; cl < addr + size;
 		     cl += cpuinfo->dcache_block_size)
-			mtspr(SPR_DCBFR, cl);
+			mtspr(SPR_DCBWR, cl);
 		break;
 	case DMA_FROM_DEVICE:
 		/* Invalidate the dcache for the requested range */
@@ -114,12 +114,13 @@ void arch_sync_dma_for_device(phys_addr_t addr, size_t size,
 		     cl += cpuinfo->dcache_block_size)
 			mtspr(SPR_DCBIR, cl);
 		break;
+	case DMA_BIDIRECTIONAL:
+		/* Flush the dcache for the requested range */
+		for (cl = addr; cl < addr + size;
+		     cl += cpuinfo->dcache_block_size)
+			mtspr(SPR_DCBFR, cl);
+		break;
 	default:
-		/*
-		 * NOTE: If dir == DMA_BIDIRECTIONAL then there's no need to
-		 * flush nor invalidate the cache here as the area will need
-		 * to be manually synced anyway.
-		 */
 		break;
 	}
 }
-- 
2.39.2
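The per-direction behaviour after this patch can be summarised in a small userspace sketch. It only mirrors the structure of the hunks above: the enums and the helpers op_for_device() and lines_touched() are hypothetical names, the block size is passed in rather than read from cpuinfo, and no special-purpose register is touched.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel's enum dma_data_direction (assumption). */
enum dma_dir {
	DIR_BIDIRECTIONAL,
	DIR_TO_DEVICE,
	DIR_FROM_DEVICE,
	DIR_NONE,
};

/* Hypothetical labels for the three per-line cache operations. */
enum line_op {
	OP_NONE,
	OP_WRITEBACK,	/* SPR_DCBWR in the patch */
	OP_INVALIDATE,	/* SPR_DCBIR in the patch */
	OP_FLUSH,	/* SPR_DCBFR in the patch */
};

/* Operation issued per cache line before the DMA, by direction. */
static enum line_op op_for_device(enum dma_dir dir)
{
	switch (dir) {
	case DIR_TO_DEVICE:
		return OP_WRITEBACK;
	case DIR_FROM_DEVICE:
		return OP_INVALIDATE;
	case DIR_BIDIRECTIONAL:
		return OP_FLUSH;
	default:
		return OP_NONE;
	}
}

/*
 * Number of per-line operations the loop in the patch would issue:
 * it starts at addr and steps by the cache block size until it
 * reaches addr + size.
 */
static size_t lines_touched(uintptr_t addr, size_t size, size_t block)
{
	size_t n = 0;

	for (uintptr_t cl = addr; cl < addr + size; cl += block)
		n++;
	return n;
}
```

For instance, a 64-byte buffer at address 0 with an assumed 16-byte block size results in four per-line operations, and an empty buffer results in none.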
[PATCH 00/21] dma-mapping: unify support for cache flushes
From: Arnd Bergmann

After a long discussion about adding SoC specific semantics for when
to flush caches in drivers/soc/ drivers that we determined to be
fundamentally flawed[1], I volunteered to try to move that logic into
architecture-independent code and make all existing architectures do
the same thing.

As we had determined earlier, the behavior is wildly different across
architectures, but most of the differences come down to either bugs
(when required flushes are missing) or extra flushes that are harmless
but might hurt performance.

I finally found the time to come up with an implementation of this, which
starts by replacing every outlier with one of the three common options:

 1. architectures without speculative prefetching (hexagon, m68k,
    openrisc, sh, sparc, and certain armv4 and xtensa implementations)
    only flush their caches before a DMA, by cleaning write-back caches
    (if any) before a DMA to the device, and by invalidating the caches
    before a DMA from a device

 2. arc, microblaze, mips, nios2, sh and later xtensa now follow the
    normal 32-bit arm model and invalidate their writeback caches
    again after a DMA from the device, to remove stale cache lines
    that got prefetched during the DMA. arc, csky and mips used to
    invalidate buffers also before the bidirectional DMA, but this
    is now skipped whenever we know it gets invalidated again
    after the DMA.

 3. parisc, powerpc and riscv already flushed buffers before
    a DMA_FROM_DEVICE, and these get moved to the arm64 behavior
    that does the writeback before and invalidate after both
    DMA_FROM_DEVICE and DMA_BIDIRECTIONAL in order to avoid the
    problem of accidentally leaking stale data if the DMA does
    not actually happen[2].

The last patch in the series replaces the architecture specific code
with a shared version that implements all three based on architecture
specific parameters that are almost always determined at compile time.

The difference between cases 1. and 2. is hardware specific, while between
2. and 3. we need to decide which semantics we want, but I explicitly
avoid this question in my series and leave it to be decided later.

Another difference that I do not address here is what cache invalidation
does for partial cache lines. On arm32, arm64 and powerpc, a partial
cache line always gets written back before invalidation in order to
ensure that data before or after the buffer is not discarded. On all
other architectures, the assumption is that cache lines are never shared
between DMA buffer and data that is accessed by the CPU. If we end up
always writing back dirty cache lines before a DMA (option 3 above),
then this point becomes moot, otherwise we should probably address this
in a follow-up series to document one behavior or the other and implement
it consistently.

Please review!

      Arnd

[1] https://lore.kernel.org/all/20221212115505.36770-1-prabhakar.mahadev-lad...@bp.renesas.com/
[2] https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/

Arnd Bergmann (21):
  openrisc: dma-mapping: flush bidirectional mappings
  xtensa: dma-mapping: use normal cache invalidation rules
  sparc32: flush caches in dma_sync_*for_device
  microblaze: dma-mapping: skip extra DMA flushes
  powerpc: dma-mapping: split out cache operation logic
  powerpc: dma-mapping: minimize for_cpu flushing
  powerpc: dma-mapping: always clean cache in _for_device() op
  riscv: dma-mapping: only invalidate after DMA, not flush
  riscv: dma-mapping: skip invalidation before bidirectional DMA
  csky: dma-mapping: skip invalidating before DMA from device
  mips: dma-mapping: skip invalidating before bidirectional DMA
  mips: dma-mapping: split out cache operation logic
  arc: dma-mapping: skip invalidating before bidirectional DMA
  parisc: dma-mapping: use regular flush/invalidate ops
  ARM: dma-mapping: always invalidate WT caches before DMA
  ARM: dma-mapping: bring back dmac_{clean,inv}_range
  ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally
  ARM: drop SMP support for ARM11MPCore
  ARM: dma-mapping: use generic form of arch_sync_dma_* helpers
  ARM: dma-mapping: split out arch_dma_mark_clean() helper
  dma-mapping: replace custom code with generic implementation

 arch/arc/mm/dma.c                          | 66 ++--
 arch/arm/Kconfig                           |  4 +
 arch/arm/include/asm/cacheflush.h          | 21 +++
 arch/arm/include/asm/glue-cache.h          |  4 +
 arch/arm/mach-oxnas/Kconfig                |  4 -
 arch/arm/mach-oxnas/Makefile               |  1 -
 arch/arm/mach-oxnas/headsmp.S              | 23 ---
 arch/arm/mach-oxnas/platsmp.c              | 96 ---
 arch/arm/mach-versatile/platsmp-realview.c |  4 -
 arch/arm/mm/Kconfig                        | 19 ---
 arch/arm/mm/cache-fa.S                     |  4 +-
 arch/arm/mm/cache-nop.S                    |  6 +
 arch/arm/mm/cache-v4.S                     | 13 +-