Re: [U-Boot] [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send
Dear Ian,

On Sun, 2014-04-27 at 19:47 +0100, Ian Campbell wrote:
> This is the driver for one particular ARM cache controller and not the
> one used for the SoC. In any case it does proper start/end handling
> only for cache flush operations, not cache invalidate.
>
> Cache invalidate is a potentially destructive operation (throwing away
> data in the caches), having it operate on anything more than the
> precise region requested would be very surprising to almost anyone I
> think.

...

> I think you are missing the important differences between a cache
> flush and a cache invalidate.

IMHO cache invalidation and flush operations are sort of antipodes.

With invalidation you discard all data in the corresponding cache line,
so that the next access reads fresh data from memory. With flush you
write the cache line back to the corresponding memory location,
overwriting the values previously held in memory.

So if you deal with 2 independent data fields which share the same
cache line, it's potentially dangerous to either flush or invalidate
this cache line.

When the MMU is in use we have the luxury of uncached access, so we may
safely access control structures in memory with whatever granularity is
available on the particular CPU. This is AFAIK how drivers deal with
buffer descriptors in the Linux kernel.

In U-Boot, where we prefer to keep things simple, we don't use the MMU,
so there is no generic way to bypass the cache.

Still, some architectures like ARC700 have special instructions for
accessing memory bypassing the cache, but I prefer not to use them and
to keep sources platform-independent.

And in this situation IMHO the only safe solution is proper design of
the data layout. In other words we need to keep independent data blocks
aligned to the cache line.
And as you may see from designware.h the buffer descriptor structure is
aligned:
==
struct dmamacdescr {
	u32 txrx_status;
	u32 dmamac_cntl;
	void *dmamac_addr;
	struct dmamacdescr *dmamac_next;
} __aligned(ARCH_DMA_MINALIGN);
==

Regards,
Alexey
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot
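The layout argument above can be checked at compile time. The following is only a sketch: the 64-byte value for ARCH_DMA_MINALIGN is an assumption (the real value is per-architecture), the GCC/Clang aligned attribute stands in for U-Boot's __aligned(), and descs_share_line() is a hypothetical helper that is not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; the real value is per-architecture. */
#define ARCH_DMA_MINALIGN 64

/* Same fields as the descriptor quoted above, fixed-width for the sketch. */
struct dmamacdescr {
	uint32_t txrx_status;
	uint32_t dmamac_cntl;
	void *dmamac_addr;
	struct dmamacdescr *dmamac_next;
} __attribute__((aligned(ARCH_DMA_MINALIGN)));

/* The attribute pads the struct, so sizeof is a whole number of lines. */
static_assert(sizeof(struct dmamacdescr) % ARCH_DMA_MINALIGN == 0,
	      "descriptor must fill whole cache lines");

/* Hypothetical helper: nonzero if two descriptors touch a common line. */
static int descs_share_line(const struct dmamacdescr *a,
			    const struct dmamacdescr *b)
{
	uintptr_t a_first = (uintptr_t)a / ARCH_DMA_MINALIGN;
	uintptr_t a_last = ((uintptr_t)a + sizeof(*a) - 1) / ARCH_DMA_MINALIGN;
	uintptr_t b_first = (uintptr_t)b / ARCH_DMA_MINALIGN;
	uintptr_t b_last = ((uintptr_t)b + sizeof(*b) - 1) / ARCH_DMA_MINALIGN;

	return a_first <= b_last && b_first <= a_last;
}
```

Because the alignment attribute also rounds the structure size up, neighbouring entries of a descriptor array never share a cache line, which is the property the whole-descriptor invalidate depends on.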
Re: [U-Boot] [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send
On Mon, 2014-04-28 at 12:05 +, Alexey Brodkin wrote:
> And in this situation IMHO the only safe solution could be in proper
> design of data layout. In other words we need to keep independent data
> blocks aligned to cache line.
>
> And as you may see from designware.h buffer descriptor structure is
> aligned:

There's no point in taking all this care if you then go and flush
subfields, as the driver does, since they are not necessarily going to
have the required alignment. That was the entire point of this patch!

I'm going to do the roundup thing you asked for, even though it seems
like a pointless optimisation to me given the context.

> ==
> struct dmamacdescr {
> 	u32 txrx_status;
> 	u32 dmamac_cntl;
> 	void *dmamac_addr;
> 	struct dmamacdescr *dmamac_next;
> } __aligned(ARCH_DMA_MINALIGN);
> ==
>
> Regards,
> Alexey
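Ian's objection can be illustrated by comparing the two ranges the driver might pass to a cache operation. This is a sketch under assumptions: a 64-byte ARCH_DMA_MINALIGN, the GCC/Clang aligned attribute in place of __aligned(), and hypothetical helper names (range_is_line_aligned, status_range_ok, desc_range_ok) that do not exist in U-Boot.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; the real value is per-architecture. */
#define ARCH_DMA_MINALIGN 64

struct dmamacdescr {
	uint32_t txrx_status;
	uint32_t dmamac_cntl;
	void *dmamac_addr;
	struct dmamacdescr *dmamac_next;
} __attribute__((aligned(ARCH_DMA_MINALIGN)));

/* Hypothetical check: does [start, stop) consist of whole cache lines? */
static int range_is_line_aligned(uintptr_t start, uintptr_t stop)
{
	assert(start <= stop);
	return start % ARCH_DMA_MINALIGN == 0 &&
	       stop % ARCH_DMA_MINALIGN == 0;
}

/* Range covering only the 4-byte status word, as the old code passed. */
static int status_range_ok(const struct dmamacdescr *d)
{
	uintptr_t start = (uintptr_t)&d->txrx_status;

	return range_is_line_aligned(start, start + sizeof(d->txrx_status));
}

/* Range covering the whole descriptor, as the patch passes. */
static int desc_range_ok(const struct dmamacdescr *d)
{
	uintptr_t start = (uintptr_t)d;

	return range_is_line_aligned(start, start + sizeof(*d));
}
```

The status word starts on a line boundary but ends 4 bytes later, so its range is not line-aligned even though the structure is; only the whole-descriptor range is aligned at both ends.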
Re: [U-Boot] [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send
CCing the ARM custodian. Albert, what do you think of Alexey's comments
below?

Actually, having read it properly myself I think Alexey is confusing
cache flushing with cache invalidation. I've left the CC in place
though in case you have any thoughts on the matter.

On Fri, 2014-04-25 at 08:48 +, Alexey Brodkin wrote:
> I thought a bit more about this situation and now I'm not so sure that
> we need to align the addresses we pass to cache invalidate/flush
> functions, because IMHO drivers shouldn't care about the specifics of
> a particular platform or architecture. Otherwise we'll need to patch
> each and every driver just for the cache invalidate/flush functions.
>
> I looked at how these functions are used in other drivers and see that
> in most cases no additional alignment precautions were implemented.
> People just pass start and end addresses.
>
> In turn, the cache invalidate/flush functions provided by the platform
> and architecture implement their functionality depending on hardware
> specifics. For example, on architectures that may only flush/invalidate
> with a granularity of 1 cache line, the cache invalidate/flush
> functions make sure to start processing from the start of the cache
> line in which the start address falls and to stop once the cache line
> in which the end address falls has been processed.
>
> I may assume that there are architectures that automatically
> understand from which cache line to start and at which line to stop
> processing.
>
> But if your architecture requires cache line aligned addresses to be
> used for start/end addresses you may look for examples in ARC
> (http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/arc/cpu/arc700/cache.c),
> MIPS
> (http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/mips/cpu/mips32/cpu.c),
> SH
> (http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/sh/cpu/sh4/cache.c),
> and, what's interesting, even the implementation you use has
> semi-proper start/end address handling:
> http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arm/lib/cache-pl310.c

This is the driver for one particular ARM cache controller and not the
one used for the SoC. In any case it does proper start/end handling
only for cache flush operations, not cache invalidate.

Cache invalidate is a potentially destructive operation (throwing away
data in the caches), having it operate on anything more than the
precise region requested would be very surprising to almost anyone I
think.

> Here's your invalidation procedure:
>
> /* invalidate memory from start to stop-1 */
> void v7_outer_cache_inval_range(u32 start, u32 stop)
> {
> 	/* PL310 currently supports only 32 bytes cache line */
> 	u32 pa, line_size = 32;
>
> 	/*
> 	 * If start address is not aligned to cache-line do not
> 	 * invalidate the first cache-line
> 	 */
> 	if (start & (line_size - 1)) {
> 		printf("ERROR: %s - start address is not aligned - 0x%08x\n",
> 		       __func__, start);
> 		/* move to next cache line */
> 		start = (start + line_size - 1) & ~(line_size - 1);
> 	}
>
> 	/*
> 	 * If stop address is not aligned to cache-line do not
> 	 * invalidate the last cache-line
> 	 */
> 	if (stop & (line_size - 1)) {
> 		printf("ERROR: %s - stop address is not aligned - 0x%08x\n",
> 		       __func__, stop);
> 		/* align to the beginning of this cache line */
> 		stop &= ~(line_size - 1);
> 	}
>
> 	for (pa = start; pa < stop; pa = pa + line_size)
> 		writel(pa, &pl310->pl310_inv_line_pa);
>
> 	pl310_cache_sync();
> }
>
> 1. I don't understand why it starts from the next cache line if the
> start address is not aligned to a cache line boundary. I'd say that
> you want to invalidate the cache line that contains the unaligned
> start address. Otherwise the first bytes won't be invalidated, right?
>
> 2. Why do we throw an _error_ message? I would understand emitting a
> _warning_ message in a debug build (with DEBUG defined). Well, in the
> current implementation (see 1) it could be an error because the
> behavior is really dangerous. But if you start from the correct cache
> line, only a warning in debug mode makes sense (IMHO).
>
> 3. The stop/end address in contrast might need to be extended
> depending on the HW implementation (see above comment).
>
> And here's your flush procedure:
> ===
> void v7_outer_cache_flush_range(u32 start, u32 stop)
> {
> 	/* PL310 currently supports only 32 bytes cache line */
> 	u32 pa, line_size = 32;
>
> 	/*
> 	 * Align to the beginning of cache-line - this ensures that
> 	 * the first 5 bits are 0 as required by PL310 TRM
> 	 */
> 	start &= ~(line_size - 1);
>
> 	for (pa = start; pa < stop; pa = pa + line_size)
> 		writel(pa, &pl310->pl310_clean_inv_line_pa);
>
> 	pl310_cache_sync();
> }
> ===
>
> Which looks very correct to me.
>
> I'm wondering if there was a reason to have such different
> implementations of functions that do very similar things. I think
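The difference between the two operations that Ian stresses can be made concrete with a toy model. This is a deliberately simplified sketch, not how any real cache controller is implemented: a single 32-byte write-back line, no dirty tracking, and every name (toy_cache, cpu_write, dma_write and so on) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32 /* assumed, matching the PL310 comment quoted above */

/* Toy write-back cache holding one line over one block of memory. */
struct toy_cache {
	uint8_t mem[LINE_SIZE];  /* backing memory (what the device sees) */
	uint8_t line[LINE_SIZE]; /* cached copy (what the CPU sees) */
	int valid;
};

/* Load the line from memory on first use after an invalidate. */
static void toy_fill(struct toy_cache *c)
{
	if (!c->valid) {
		memcpy(c->line, c->mem, LINE_SIZE);
		c->valid = 1;
	}
}

/* CPU store: lands in the cache only, memory is not updated yet. */
static void cpu_write(struct toy_cache *c, unsigned int off, uint8_t v)
{
	assert(off < LINE_SIZE);
	toy_fill(c);
	c->line[off] = v;
}

/* CPU load: served from the cache, which may be stale. */
static uint8_t cpu_read(struct toy_cache *c, unsigned int off)
{
	assert(off < LINE_SIZE);
	toy_fill(c);
	return c->line[off];
}

/* Device store: goes straight to memory, behind the cache's back. */
static void dma_write(struct toy_cache *c, unsigned int off, uint8_t v)
{
	assert(off < LINE_SIZE);
	c->mem[off] = v;
}

/* Flush: write the whole cached line out, overwriting memory. */
static void toy_flush(struct toy_cache *c)
{
	if (c->valid)
		memcpy(c->mem, c->line, LINE_SIZE);
}

/* Invalidate: drop the cached line; unflushed CPU stores are lost. */
static void toy_invalidate(struct toy_cache *c)
{
	c->valid = 0;
}
```

With a CPU store at offset 4 and a device store at offset 0 in the same line, invalidating loses the CPU's byte and flushing overwrites the device's byte. That is why a routine that silently widens an invalidate is more dangerous than one that widens a flush only when the neighbouring data belongs to the same owner.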
Re: [U-Boot] [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send
Hi Ian,

On Thu, 2014-04-24 at 20:14 +0100, Ian Campbell wrote:
> On Thu, 2014-04-24 at 17:41 +, Alexey Brodkin wrote:
> > 1. Don't invalidate sizeof(struct dmamacdescr) but only
> > roundup(sizeof(desc_p->txrx_status), ARCH_DMA_MINALIGN).
>
> OK.
>
> (Although given the realities of the real world values of
> ARCH_DMA_MINALIGN on every arch and the sizes of the structs fields
> involved this isn't actually buying you anything at all)

Well, this particular structure is of size sizeof(uint32_t) * 4 = 16
bytes. And I may suppose that cache lines could be shorter than 16
bytes, even though that would be a pretty rare situation. So definitely
not a big deal. But since we're dealing with macros and constants here,
all the mentioned calculations will be done at compile time and
execution performance won't be affected.

> > 2. In the following lines implements rounding as well:
>
> Will fix as well.
>
> > 3. Check carefully if there're other instances of probably unaligned
> > cache operations.

I thought a bit more about this situation and now I'm not so sure that
we need to align the addresses we pass to cache invalidate/flush
functions, because IMHO drivers shouldn't care about the specifics of a
particular platform or architecture. Otherwise we'll need to patch each
and every driver just for the cache invalidate/flush functions.

I looked at how these functions are used in other drivers and see that
in most cases no additional alignment precautions were implemented.
People just pass start and end addresses.

In turn, the cache invalidate/flush functions provided by the platform
and architecture implement their functionality depending on hardware
specifics. For example, on architectures that may only flush/invalidate
with a granularity of 1 cache line, the cache invalidate/flush
functions make sure to start processing from the start of the cache
line in which the start address falls and to stop once the cache line
in which the end address falls has been processed.

I may assume that there are architectures that automatically understand
from which cache line to start and at which line to stop processing.

But if your architecture requires cache line aligned addresses to be
used for start/end addresses you may look for examples in ARC
(http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/arc/cpu/arc700/cache.c),
MIPS
(http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/mips/cpu/mips32/cpu.c),
SH
(http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/sh/cpu/sh4/cache.c),
and, what's interesting, even the implementation you use has
semi-proper start/end address handling:
http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arm/lib/cache-pl310.c

Here's your invalidation procedure:

/* invalidate memory from start to stop-1 */
void v7_outer_cache_inval_range(u32 start, u32 stop)
{
	/* PL310 currently supports only 32 bytes cache line */
	u32 pa, line_size = 32;

	/*
	 * If start address is not aligned to cache-line do not
	 * invalidate the first cache-line
	 */
	if (start & (line_size - 1)) {
		printf("ERROR: %s - start address is not aligned - 0x%08x\n",
		       __func__, start);
		/* move to next cache line */
		start = (start + line_size - 1) & ~(line_size - 1);
	}

	/*
	 * If stop address is not aligned to cache-line do not
	 * invalidate the last cache-line
	 */
	if (stop & (line_size - 1)) {
		printf("ERROR: %s - stop address is not aligned - 0x%08x\n",
		       __func__, stop);
		/* align to the beginning of this cache line */
		stop &= ~(line_size - 1);
	}

	for (pa = start; pa < stop; pa = pa + line_size)
		writel(pa, &pl310->pl310_inv_line_pa);

	pl310_cache_sync();
}

1. I don't understand why it starts from the next cache line if the
start address is not aligned to a cache line boundary. I'd say that you
want to invalidate the cache line that contains the unaligned start
address. Otherwise the first bytes won't be invalidated, right?

2. Why do we throw an _error_ message? I would understand emitting a
_warning_ message in a debug build (with DEBUG defined). Well, in the
current implementation (see 1) it could be an error because the
behavior is really dangerous. But if you start from the correct cache
line, only a warning in debug mode makes sense (IMHO).

3. The stop/end address in contrast might need to be extended depending
on the HW implementation (see above comment).

And here's your flush procedure:
===
void v7_outer_cache_flush_range(u32 start, u32 stop)
{
	/* PL310 currently supports only 32 bytes cache line */
	u32 pa, line_size = 32;

	/*
	 * Align to the beginning of cache-line - this ensures that
	 * the first 5 bits are 0 as required by PL310 TRM
	 */
	start &= ~(line_size - 1);

	for (pa = start; pa < stop; pa = pa + line_size)
		writel(pa, &pl310->pl310_clean_inv_line_pa);
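The two rounding policies in the quoted PL310 routines can be compared side by side. This is only a sketch: line_range, range_shrink and range_expand are hypothetical names, the 32-byte line size is taken from the comment in the quoted code, and address wrap-around near the top of the 32-bit space is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_SIZE 32u /* assumed PL310 line size, per the quoted comment */

/* Cache lines operated on: addresses in [start, stop), line-aligned. */
struct line_range {
	uint32_t start;
	uint32_t stop;
};

/* Shrink inward, as the quoted invalidate routine does. */
static struct line_range range_shrink(uint32_t start, uint32_t stop)
{
	struct line_range r;

	assert(start <= stop);
	r.start = (start + LINE_SIZE - 1) & ~(LINE_SIZE - 1);
	r.stop = stop & ~(LINE_SIZE - 1);
	if (r.stop < r.start)
		r.stop = r.start; /* range lies inside one line: nothing left */
	return r;
}

/* Expand outward: every line touched by the byte range is covered. */
static struct line_range range_expand(uint32_t start, uint32_t stop)
{
	struct line_range r;

	assert(start <= stop);
	r.start = start & ~(LINE_SIZE - 1);
	r.stop = (stop + LINE_SIZE - 1) & ~(LINE_SIZE - 1);
	return r;
}
```

For a 4-byte field at 0x1004 the shrinking policy yields an empty range, so nothing is invalidated at all, while the expanding policy covers the full line 0x1000..0x1020. The quoted flush routine only rounds start down, but its loop condition still reaches the last partially covered line, so its effective coverage matches range_expand.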
Re: [U-Boot] [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send
Dear Ian,

On Sat, 2014-04-19 at 14:52 +0100, Ian Campbell wrote:
> -	/* Invalidate only status field for the following check */
> -	invalidate_dcache_range((unsigned long)&desc_p->txrx_status,
> -				(unsigned long)&desc_p->txrx_status +
> -				sizeof(desc_p->txrx_status));
> +	/* Strictly we only need to invalidate the status field for
> +	 * the following check, but on some platforms we cannot
> +	 * invalidate only 4 bytes, so invalidate the whole thing
> +	 * which is known to be DMA aligned. */
> +	invalidate_dcache_range((unsigned long)desc_p,
> +				(unsigned long)desc_p +
> +				sizeof(struct dmamacdescr));
>
>  	/* Check if the descriptor is owned by CPU */
>  	if (desc_p->txrx_status & DESC_TXSTS_OWNBYDMA) {

Unfortunately I cannot recall exactly why I wanted to invalidate only
the status field.

Now looking at this code I may assume that I wanted to save some CPU
cycles. Because:
1. We don't care about any field other than status. GMAC only changes
the status field when it resets the OWNED_BY_DMA flag; all other fields
are written but not read by the CPU while sending packets.
2. We may save quite a few CPU cycles by invalidating only the minimum
number of bytes (remember each read from external memory may cost 100s
of cycles).

So I would advise:
1. Don't invalidate sizeof(struct dmamacdescr) but only
roundup(sizeof(desc_p->txrx_status), ARCH_DMA_MINALIGN).

2. Implement rounding in the following lines as well:

	/* Flush data to be sent */
	flush_dcache_range((unsigned long)desc_p->dmamac_addr,
			   (unsigned long)desc_p->dmamac_addr + length);

We may be sure desc_p->dmamac_addr is properly aligned, but length may
not be aligned. So I'd replace length with
roundup(length, ARCH_DMA_MINALIGN) as you did in the 3rd patch.

3. Check carefully whether there are other instances of possibly
unaligned cache operations. I erroneously didn't care about alignment
on cache invalidation/flushing because my implementation of those cache
operations deals with non-aligned start/end internally within the
invalidate/flush functions, which might not be that good even if it's
convenient for me.

4. Why don't you squash all 3 patches into 1 and name it something like
"fix alignment issues with caches on some platforms"? Basically with all
3 patches you fix one and the same issue, and applying any one of those
3 patches alone doesn't solve your problem, right?

Regards,
Alexey
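The rounding suggested in points 1 and 2 amounts to the following arithmetic. This is a sketch under assumptions: a 64-byte ARCH_DMA_MINALIGN, and roundup_to() and flush_end() are hypothetical stand-ins written as functions so they can be checked (U-Boot's own roundup() is a macro).

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines; the real value is per-architecture. */
#define ARCH_DMA_MINALIGN 64

/* Round x up to the next multiple of align (align must be nonzero). */
static size_t roundup_to(size_t x, size_t align)
{
	assert(align != 0);
	return ((x + align - 1) / align) * align;
}

/*
 * Hypothetical helper: end address for flushing a packet buffer that
 * starts at the (already aligned) address buf and holds length bytes.
 */
static unsigned long flush_end(unsigned long buf, size_t length)
{
	return buf + roundup_to(length, ARCH_DMA_MINALIGN);
}
```

A 1514-byte frame rounds up to 1536 bytes, and the 4-byte status field rounds up to one full 64-byte line. Rounding the length up is only safe when the bytes past the end of the packet belong to the same buffer, which is the concern raised later in the thread about operating on more than the caller owns.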
Re: [U-Boot] [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send
On Thu, 2014-04-24 at 17:41 +, Alexey Brodkin wrote:
> 1. Don't invalidate sizeof(struct dmamacdescr) but only
> roundup(sizeof(desc_p->txrx_status), ARCH_DMA_MINALIGN).

OK.

(Although given the realities of the real world values of
ARCH_DMA_MINALIGN on every arch and the sizes of the structs fields
involved this isn't actually buying you anything at all)

> 2. In the following lines implements rounding as well:

Will fix as well.

> 3. Check carefully if there're other instances of probably unaligned
> cache operations.

I'm not seeing any others, in practice or by eye-balling the code.

> 4. Why don't you squeeze all 3 patches in 1 and name it like fix
> alignment issues with caches on some platforms? Basically with all 3
> patches you fix one and only issue and application of any one of those
> 3 patches doesn't solve your problem, right?

These are the issues as I discovered them one by one. I can fold them
if you like but doing them separately will aid bisection if one of them
turns out to be wrong in some way. As you prefer.

Ian.
Re: [U-Boot] [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send
> 1. Don't invalidate sizeof(struct dmamacdescr) but only
> roundup(sizeof(desc_p->txrx_status), ARCH_DMA_MINALIGN).

I'm not sure I like this: if ARCH_DMA_MINALIGN is too large and ends up
invalidating more than the struct, it could be an error, so it's safer
to ask it to invalidate the struct (which we know can be safely
invalidated).

If invalidate_dcache_range is used often, then I'd suggest changing its
API so it receives 2 bounds: the one that has to be invalidated and the
surrounding one that can safely be invalidated.

Stefan
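Stefan's two-bounds idea could look roughly like the sketch below. Nothing here is an existing U-Boot interface: pick_inval_range(), its parameters, its return convention and the 64-byte LINE_SIZE are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_SIZE 64u /* assumed cache line size */

/*
 * Hypothetical API shape: [need_start, need_stop) must be invalidated,
 * [safe_start, safe_stop) is the surrounding region that may be.
 * On success the line-aligned range to operate on is stored in
 * *out_start / *out_stop and 0 is returned. Returns -1 when covering
 * the needed bytes with whole lines would leave the safe region.
 */
static int pick_inval_range(uintptr_t need_start, uintptr_t need_stop,
			    uintptr_t safe_start, uintptr_t safe_stop,
			    uintptr_t *out_start, uintptr_t *out_stop)
{
	const uintptr_t mask = ~(uintptr_t)(LINE_SIZE - 1);
	uintptr_t start = need_start & mask;
	uintptr_t stop = (need_stop + LINE_SIZE - 1) & mask;

	assert(need_start <= need_stop);
	assert(safe_start <= safe_stop);

	if (start < safe_start || stop > safe_stop)
		return -1; /* would discard data the caller does not own */

	*out_start = start;
	*out_stop = stop;
	return 0;
}
```

With the status field as the needed range and the whole aligned descriptor as the safe range, the call succeeds and selects the descriptor's line. With both bounds set to the bare 4-byte field it fails, which turns the silent misbehaviour discussed in this thread into an explicit error.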
[U-Boot] [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send
Some platforms cannot invalidate the cache at four byte intervals, so
invalidate the entire descriptor.

Signed-off-by: Ian Campbell i...@hellion.org.uk
---
 drivers/net/designware.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/designware.c b/drivers/net/designware.c
index 1120f70..7d14cec 100644
--- a/drivers/net/designware.c
+++ b/drivers/net/designware.c
@@ -280,10 +280,13 @@ static int dw_eth_send(struct eth_device *dev, void *packet, int length)
 	u32 desc_num = priv->tx_currdescnum;
 	struct dmamacdescr *desc_p = &priv->tx_mac_descrtable[desc_num];
 
-	/* Invalidate only status field for the following check */
-	invalidate_dcache_range((unsigned long)&desc_p->txrx_status,
-				(unsigned long)&desc_p->txrx_status +
-				sizeof(desc_p->txrx_status));
+	/* Strictly we only need to invalidate the status field for
+	 * the following check, but on some platforms we cannot
+	 * invalidate only 4 bytes, so invalidate the whole thing
+	 * which is known to be DMA aligned. */
+	invalidate_dcache_range((unsigned long)desc_p,
+				(unsigned long)desc_p +
+				sizeof(struct dmamacdescr));
 
 	/* Check if the descriptor is owned by CPU */
 	if (desc_p->txrx_status & DESC_TXSTS_OWNBYDMA) {
-- 
1.9.0