Re: [U-Boot] [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send

2014-04-28 Alexey Brodkin
Dear Ian,

On Sun, 2014-04-27 at 19:47 +0100, Ian Campbell wrote:

 This is the driver for one particular ARM cache controller and not the
 one used for the SoC. In any case it does proper start/end handling
 only for cache flush operations, not cache invalidate.
 
 Cache invalidate is a potentially destructive operation (throwing away
 data in the caches), having it operate on anything more than the precise
 region requested would be very surprising to almost anyone I think.
...
 I think you are missing the important differences between a cache 
 flush and a cache invalidate.

IMHO cache invalidation and flush operations are sort of antipodes.

With invalidation you discard the data in the corresponding cache line,
so the next access re-reads fresh data from memory.

With a flush you write the cache line back to the corresponding memory
location, overwriting the values previously stored in memory.

So if you deal with 2 independent data fields which share the same
cache line, it's potentially dangerous to either flush or invalidate
that cache line.
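A toy software model may make the danger concrete. The sketch below is not real cache hardware: it assumes a single 16-byte write-back line whose bytes 0..3 belong to the CPU and whose bytes 4..7 are written by a DMA engine, and all names (`struct toy_line`, `cpu_write_u32` and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model (hypothetical): one 16-byte write-back cache line. */
#define TOY_LINE_SIZE 16

struct toy_line {
	uint8_t mem[TOY_LINE_SIZE];   /* backing memory */
	uint8_t cache[TOY_LINE_SIZE]; /* cached copy seen by the CPU */
	bool valid;
	bool dirty;
};

/* CPU store: fills the line on a miss, then writes only to the cache. */
static void cpu_write_u32(struct toy_line *l, int off, uint32_t v)
{
	if (!l->valid) {
		memcpy(l->cache, l->mem, TOY_LINE_SIZE);
		l->valid = true;
	}
	memcpy(&l->cache[off], &v, sizeof(v));
	l->dirty = true;
}

/* Device store: goes straight to memory, bypassing the cache. */
static void dma_write_u32(struct toy_line *l, int off, uint32_t v)
{
	memcpy(&l->mem[off], &v, sizeof(v));
}

/* Invalidate: drop the cached copy, even if it is dirty. */
static void line_invalidate(struct toy_line *l)
{
	l->valid = false;
	l->dirty = false;
}

/* Flush: write the whole cached line back to memory. */
static void line_flush(struct toy_line *l)
{
	if (l->valid && l->dirty)
		memcpy(l->mem, l->cache, TOY_LINE_SIZE);
	l->dirty = false;
}

/* CPU load: refills the line from memory on a miss. */
static uint32_t cpu_read_u32(struct toy_line *l, int off)
{
	uint32_t v;

	if (!l->valid) {
		memcpy(l->cache, l->mem, TOY_LINE_SIZE);
		l->valid = true;
	}
	memcpy(&v, &l->cache[off], sizeof(v));
	return v;
}
```

In this model, invalidating the line after the device's write throws away the CPU's unflushed store, while flushing it overwrites the device's store with the stale cached copy. Either way one agent loses data, because the operation can only act on the whole line.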

When the MMU is used we have the luxury of uncached access, so we may
safely access control structures in memory with whatever granularity
this particular CPU supports. AFAIK this is how drivers deal with
buffer descriptors in the Linux kernel.

In the case of U-Boot, where we prefer to keep things simple, we don't
use the MMU. So there is no generic way to bypass the cache. Still, some
architectures like ARC700 have special instructions for accessing memory
while bypassing the cache, but I prefer not to use them and to keep the
sources platform-independent.

And in this situation, IMHO, the only safe solution lies in proper
design of the data layout. In other words, we need to keep independent
data blocks aligned to cache lines.

And as you may see from designware.h, the buffer descriptor structure is
aligned:
==
struct dmamacdescr {
	u32 txrx_status;
	u32 dmamac_cntl;
	void *dmamac_addr;
	struct dmamacdescr *dmamac_next;
} __aligned(ARCH_DMA_MINALIGN);
==
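A sketch of what that alignment buys, assuming a hypothetical ARCH_DMA_MINALIGN of 64 (the real value is per-architecture) and defining `__aligned` as the GCC attribute only if the build environment does not already provide it. The compile-time checks show that the attribute both aligns the structure and pads its size to a whole number of lines, so consecutive descriptors in a table never share a line.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration only. */
#define ARCH_DMA_MINALIGN 64

#ifndef __aligned
#define __aligned(x) __attribute__((aligned(x)))
#endif

typedef uint32_t u32;

struct dmamacdescr {
	u32 txrx_status;
	u32 dmamac_cntl;
	void *dmamac_addr;
	struct dmamacdescr *dmamac_next;
} __aligned(ARCH_DMA_MINALIGN);

/* The attribute raises the alignment and pads sizeof() to a multiple
 * of it, so every array element starts on its own line. */
_Static_assert(_Alignof(struct dmamacdescr) == ARCH_DMA_MINALIGN,
	       "descriptor is line aligned");
_Static_assert(sizeof(struct dmamacdescr) % ARCH_DMA_MINALIGN == 0,
	       "descriptor occupies whole lines");

static struct dmamacdescr tx_table[4];

/* Nonzero if element i starts on an ARCH_DMA_MINALIGN boundary. */
static int descr_is_line_aligned(int i)
{
	return ((uintptr_t)&tx_table[i] % ARCH_DMA_MINALIGN) == 0;
}
```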

Regards,
Alexey


___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send

2014-04-28 Ian Campbell
On Mon, 2014-04-28 at 12:05 +, Alexey Brodkin wrote:

 And in this situation, IMHO, the only safe solution lies in proper
 design of the data layout. In other words, we need to keep independent
 data blocks aligned to cache lines.
 
 And as you may see from designware.h, the buffer descriptor structure is
 aligned:

There's no point in taking all this care if you then go and flush
subfields, as the driver does, since they are not necessarily going to
have the required alignment. That was the entire point of this patch!
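To illustrate this point, the sketch below copies the field layout of `struct dmamacdescr` and computes the address range an operation on a single 4-byte field would cover. The helpers `field_range` and `range_is_line_aligned` are hypothetical, and the descriptor base is assumed to be line aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Field layout copied from designware.h; the alignment attribute is
 * left out because it does not change the field offsets. */
struct dmamacdescr {
	u32 txrx_status;
	u32 dmamac_cntl;
	void *dmamac_addr;
	struct dmamacdescr *dmamac_next;
};

/* Hypothetical helper: byte range [*start, *end) touched by an
 * operation on one field of a descriptor located at base. */
static void field_range(uintptr_t base, size_t off, size_t len,
			uintptr_t *start, uintptr_t *end)
{
	*start = base + off;
	*end = base + off + len;
}

/* Hypothetical helper: nonzero if both ends fall on a line boundary. */
static int range_is_line_aligned(uintptr_t start, uintptr_t end,
				 uintptr_t line)
{
	return (start % line) == 0 && (end % line) == 0;
}
```

For a descriptor at 0x1000 and 32-byte lines, the `txrx_status` range is [0x1000, 0x1004) and the `dmamac_cntl` range is [0x1004, 0x1008); neither ends on a line boundary, even though the structure itself is aligned.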

I'm going to do the roundup thing you asked for, even though it seems
like a pointless optimisation to me given the context.

 ==
 struct dmamacdescr {
   u32 txrx_status;
   u32 dmamac_cntl;
   void *dmamac_addr;
   struct dmamacdescr *dmamac_next;
 } __aligned(ARCH_DMA_MINALIGN);
 ==
 
 Regards,
 Alexey
 
 
 




Re: [U-Boot] [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send

2014-04-27 Ian Campbell
CCing the ARM custodian. Albert, what do you think of Alexey's comments
below? Actually, having read it properly myself, I think Alexey is
confusing cache flushing with cache invalidation. I've left the CC in
place, though, in case you have any thoughts on the matter.

On Fri, 2014-04-25 at 08:48 +, Alexey Brodkin wrote:
 I thought a bit more about this situation and now I'm not so sure that
 we need to align the addresses we pass to cache invalidate/flush functions.
 
 Because IMHO drivers shouldn't care about the specifics of a particular
 platform or architecture. Otherwise we'd need to patch each and every
 driver just for cache invalidate/flush functions. I looked at how these
 functions are used in other drivers and saw that in most cases no
 additional alignment precautions were implemented. People just pass
 start and end addresses.
 
 In turn, the cache invalidate/flush functions provided by each platform
 and architecture implement this functionality according to the hardware
 specifics.
 
 For example, on architectures that can only flush/invalidate with a
 granularity of one cache line, the cache invalidate/flush functions make
 sure to start processing from the beginning of the cache line into which
 the start address falls, and to stop once the cache line containing the
 end address has been processed.
 
 I assume there may be architectures that automatically work out from
 which cache line to start and at which line to stop processing.
 
 But if your architecture requires cache-line-aligned start/end addresses,
 you may look for examples in ARC
 (http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/arc/cpu/arc700/cache.c),
 MIPS
 (http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/mips/cpu/mips32/cpu.c)
 and SH
 (http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/sh/cpu/sh4/cache.c).
 
 Interestingly, even the implementation you use has semi-proper
 start/end address handling:
 http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arm/lib/cache-pl310.c

This is the driver for one particular ARM cache controller and not the
one used for the SoC. In any case it does proper start/end handling
only for cache flush operations, not cache invalidate.

Cache invalidate is a potentially destructive operation (throwing away
data in the caches), having it operate on anything more than the precise
region requested would be very surprising to almost anyone I think.
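One way to meet that expectation is for the invalidate entry point to refuse unaligned requests rather than silently widening or shrinking them. This is a minimal sketch, not an existing U-Boot API: `strict_invalidate_range`, the `inval_line_fn` callback type and the `count_line` demo callback are hypothetical, and the 32-byte line size is taken from the PL310 code quoted below.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define LINE_SIZE 32u /* PL310 line size, per the quoted code */

/* Hypothetical per-line primitive supplied by the platform. */
typedef void (*inval_line_fn)(uint32_t pa);

/* Invalidate exactly [start, stop). Unaligned requests are rejected,
 * so no byte outside the requested region is ever discarded. */
static int strict_invalidate_range(uint32_t start, uint32_t stop,
				   inval_line_fn do_inval_line)
{
	uint32_t pa;

	if ((start | stop) & (LINE_SIZE - 1))
		return -EINVAL;
	for (pa = start; pa < stop; pa += LINE_SIZE)
		do_inval_line(pa);
	return 0;
}

/* Demo callback: counts lines instead of touching hardware. */
static unsigned int lines_invalidated;

static void count_line(uint32_t pa)
{
	(void)pa;
	lines_invalidated++;
}
```

The cost of this choice is that every caller must size and align its buffers itself; the benefit is that a mistake shows up as an error instead of silently lost data.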

 
 Here's your invalidation procedure:
 
 /* invalidate memory from start to stop-1 */
 void v7_outer_cache_inval_range(u32 start, u32 stop)
 {
 	/* PL310 currently supports only 32 bytes cache line */
 	u32 pa, line_size = 32;
 
 	/*
 	 * If start address is not aligned to cache-line do not
 	 * invalidate the first cache-line
 	 */
 	if (start & (line_size - 1)) {
 		printf("ERROR: %s - start address is not aligned - 0x%08x\n",
 		       __func__, start);
 		/* move to next cache line */
 		start = (start + line_size - 1) & ~(line_size - 1);
 	}
 
 	/*
 	 * If stop address is not aligned to cache-line do not
 	 * invalidate the last cache-line
 	 */
 	if (stop & (line_size - 1)) {
 		printf("ERROR: %s - stop address is not aligned - 0x%08x\n",
 		       __func__, stop);
 		/* align to the beginning of this cache line */
 		stop &= ~(line_size - 1);
 	}
 
 	for (pa = start; pa < stop; pa = pa + line_size)
 		writel(pa, &pl310->pl310_inv_line_pa);
 
 	pl310_cache_sync();
 }
 
 
 1. I don't understand why you start from the next cache line if the start
 address is not aligned to a cache line boundary. I'd say that you want to
 invalidate the cache line that contains the unaligned start address.
 Otherwise the first bytes won't be invalidated, right?
 
 2. Why do we throw an _error_ message? I could understand emitting a
 _warning_ message in a debug build (with DEBUG defined). Well, in the
 current implementation (see 1) it could be an error because the behavior
 is really dangerous. But if you start from the correct cache line, only a
 warning in debug mode makes sense (IMHO).
 
 3. The stop/end address, in contrast, might need to be extended depending
 on the HW implementation (see the comment above).
 
 And here's your flush procedure:
 ===
 void v7_outer_cache_flush_range(u32 start, u32 stop)
 {
 	/* PL310 currently supports only 32 bytes cache line */
 	u32 pa, line_size = 32;
 
 	/*
 	 * Align to the beginning of cache-line - this ensures that
 	 * the first 5 bits are 0 as required by PL310 TRM
 	 */
 	start &= ~(line_size - 1);
 
 	for (pa = start; pa < stop; pa = pa + line_size)
 		writel(pa, &pl310->pl310_clean_inv_line_pa);
 
 	pl310_cache_sync();
 }
 ===
 
 Which looks correct to me. I'm wondering if there was a reason to
 have such different implementations of functions that do very similar
 things.

I think 

Re: [U-Boot] [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send

2014-04-25 Alexey Brodkin
Hi Ian,

On Thu, 2014-04-24 at 20:14 +0100, Ian Campbell wrote:
 On Thu, 2014-04-24 at 17:41 +, Alexey Brodkin wrote:
 
  1. Don't invalidate sizeof(struct dmamacdescr) but only
  roundup(sizeof(desc_p->txrx_status), ARCH_DMA_MINALIGN).
 
 OK. (Although given the realities of the real world values of
 ARCH_DMA_MINALIGN on every arch and the sizes of the structs & fields
 involved this isn't actually buying you anything at all)

Well, this particular structure is sizeof(uint32_t) * 4 = 16 bytes in
size (on a 32-bit target). And I suppose cache lines could be shorter
than 16 bytes, even though that would be a pretty rare situation. So
definitely not a big deal.

But since we're dealing with macros and sizeof() here, all of these
calculations are done at compile time and execution performance won't
be affected.
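The constant folding can be checked with `_Static_assert`, which only accepts integer constant expressions. This sketch redefines a kernel-style `roundup()` so it is self-contained and assumes a hypothetical ARCH_DMA_MINALIGN of 64; under those assumptions it also shows Ian's observation quoted above, that rounding the 4-byte status field up gives the same length as the whole aligned descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* Kernel-style rounding helper, redefined here for self-containment. */
#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/* Assumed value for illustration only. */
#define ARCH_DMA_MINALIGN 64

typedef uint32_t u32;

struct dmamacdescr {
	u32 txrx_status;
	u32 dmamac_cntl;
	void *dmamac_addr;
	struct dmamacdescr *dmamac_next;
} __attribute__((aligned(ARCH_DMA_MINALIGN)));

/* Both checks compile only because the expressions are constants. */
_Static_assert(roundup(sizeof(u32), ARCH_DMA_MINALIGN) == ARCH_DMA_MINALIGN,
	       "the status field rounds up to one full line");
_Static_assert(roundup(sizeof(u32), ARCH_DMA_MINALIGN) ==
	       sizeof(struct dmamacdescr),
	       "which is the size of the whole aligned descriptor");
```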

  2. Implement rounding in the following lines as well:
 
 Will fix as well.
 
  3. Check carefully whether there are other instances of possibly
  unaligned cache operations.

I thought a bit more about this situation and now I'm not so sure that
we need to align the addresses we pass to cache invalidate/flush functions.

Because IMHO drivers shouldn't care about the specifics of a particular
platform or architecture. Otherwise we'd need to patch each and every
driver just for cache invalidate/flush functions. I looked at how these
functions are used in other drivers and saw that in most cases no
additional alignment precautions were implemented. People just pass
start and end addresses.

In turn, the cache invalidate/flush functions provided by each platform
and architecture implement this functionality according to the hardware
specifics.

For example, on architectures that can only flush/invalidate with a
granularity of one cache line, the cache invalidate/flush functions make
sure to start processing from the beginning of the cache line into which
the start address falls, and to stop once the cache line containing the
end address has been processed.
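A minimal sketch of that widening, assuming a power-of-two line size; `widen_to_lines` is a hypothetical helper, not a function taken from the ARC, MIPS or SH code linked below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen [*start, *end) so it covers every cache
 * line the byte range touches. line must be a power of two. */
static void widen_to_lines(uintptr_t *start, uintptr_t *end, uintptr_t line)
{
	*start &= ~(line - 1);                  /* round start down */
	*end = (*end + line - 1) & ~(line - 1); /* round end up */
}
```

For invalidation, the widened range can include bytes outside the caller's buffer, which is the behaviour Ian objects to in his reply.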

I assume there may be architectures that automatically work out from
which cache line to start and at which line to stop processing.

But if your architecture requires cache-line-aligned start/end addresses,
you may look for examples in ARC
(http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/arc/cpu/arc700/cache.c),
MIPS
(http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/mips/cpu/mips32/cpu.c)
and SH
(http://git.denx.de/?p=u-boot/u-boot-arc.git;a=blob;f=arch/sh/cpu/sh4/cache.c).

Interestingly, even the implementation you use has semi-proper
start/end address handling:
http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arm/lib/cache-pl310.c

Here's your invalidation procedure:

/* invalidate memory from start to stop-1 */
void v7_outer_cache_inval_range(u32 start, u32 stop)
{
	/* PL310 currently supports only 32 bytes cache line */
	u32 pa, line_size = 32;

	/*
	 * If start address is not aligned to cache-line do not
	 * invalidate the first cache-line
	 */
	if (start & (line_size - 1)) {
		printf("ERROR: %s - start address is not aligned - 0x%08x\n",
		       __func__, start);
		/* move to next cache line */
		start = (start + line_size - 1) & ~(line_size - 1);
	}

	/*
	 * If stop address is not aligned to cache-line do not
	 * invalidate the last cache-line
	 */
	if (stop & (line_size - 1)) {
		printf("ERROR: %s - stop address is not aligned - 0x%08x\n",
		       __func__, stop);
		/* align to the beginning of this cache line */
		stop &= ~(line_size - 1);
	}

	for (pa = start; pa < stop; pa = pa + line_size)
		writel(pa, &pl310->pl310_inv_line_pa);

	pl310_cache_sync();
}


1. I don't understand why you start from the next cache line if the start
address is not aligned to a cache line boundary. I'd say that you want to
invalidate the cache line that contains the unaligned start address.
Otherwise the first bytes won't be invalidated, right?

2. Why do we throw an _error_ message? I could understand emitting a
_warning_ message in a debug build (with DEBUG defined). Well, in the
current implementation (see 1) it could be an error because the behavior
is really dangerous. But if you start from the correct cache line, only a
warning in debug mode makes sense (IMHO).

3. The stop/end address, in contrast, might need to be extended depending
on the HW implementation (see the comment above).
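The difference between the two policies can be counted directly. This sketch reimplements only the address arithmetic of the quoted PL310 invalidate (start rounded up, stop rounded down) next to a widening variant; both function names are hypothetical and neither touches real hardware.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_SIZE 32u /* PL310 line size, per the quoted code */

/* Lines the quoted PL310 loop would invalidate: start is rounded UP and
 * stop rounded DOWN, so partial lines at either end are skipped. */
static uint32_t pl310_inval_lines(uint32_t start, uint32_t stop)
{
	if (start & (LINE_SIZE - 1))
		start = (start + LINE_SIZE - 1) & ~(LINE_SIZE - 1);
	if (stop & (LINE_SIZE - 1))
		stop &= ~(LINE_SIZE - 1);
	return stop > start ? (stop - start) / LINE_SIZE : 0;
}

/* Lines a widening implementation would invalidate: start rounded DOWN,
 * stop rounded UP, so every requested byte is covered. */
static uint32_t widened_inval_lines(uint32_t start, uint32_t stop)
{
	start &= ~(LINE_SIZE - 1);
	stop = (stop + LINE_SIZE - 1) & ~(LINE_SIZE - 1);
	return stop > start ? (stop - start) / LINE_SIZE : 0;
}
```

For the range [0x1004, 0x1024), which touches two 32-byte lines, the PL310-style arithmetic invalidates no lines at all, while the widening variant invalidates both.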

And here's your flush procedure:
===
void v7_outer_cache_flush_range(u32 start, u32 stop)
{
	/* PL310 currently supports only 32 bytes cache line */
	u32 pa, line_size = 32;

	/*
	 * Align to the beginning of cache-line - this ensures that
	 * the first 5 bits are 0 as required by PL310 TRM
	 */
	start &= ~(line_size - 1);

	for (pa = start; pa < stop; pa = pa + line_size)
		writel(pa, &pl310->pl310_clean_inv_line_pa);

	pl310_cache_sync();
}
===

Re: [U-Boot] [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send

2014-04-24 Alexey Brodkin
Dear Ian,

On Sat, 2014-04-19 at 14:52 +0100, Ian Campbell wrote:
 -	/* Invalidate only status field for the following check */
 -	invalidate_dcache_range((unsigned long)&desc_p->txrx_status,
 -				(unsigned long)&desc_p->txrx_status +
 -				sizeof(desc_p->txrx_status));
 +	/* Strictly we only need to invalidate the status field for
 +	 * the following check, but on some platforms we cannot
 +	 * invalidate only 4 bytes, so invalidate the whole thing
 +	 * which is known to be DMA aligned. */
 +	invalidate_dcache_range((unsigned long)desc_p,
 +				(unsigned long)desc_p +
 +				sizeof(struct dmamacdescr));
  
  	/* Check if the descriptor is owned by CPU */
  	if (desc_p->txrx_status & DESC_TXSTS_OWNBYDMA) {

Unfortunately I cannot recall exactly why I wanted to invalidate only
the status field.

Looking at this code now, I assume I wanted to save some CPU cycles,
because:

1. We don't care about any fields except the status. The GMAC only
changes the status field, when it resets the OWNED_BY_DMA flag; all other
fields are written but never read by the CPU while sending packets.

2. We may save quite a few CPU cycles by invalidating only the minimum
number of bytes (remember, each read from external memory may cost 100s
of cycles).
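The check that follows the invalidate only needs one bit of the status word. A sketch, assuming DESC_TXSTS_OWNBYDMA is bit 31 (verify against designware.h); `tx_desc_owned_by_cpu` is a hypothetical helper and the preceding invalidate is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Assumed to match designware.h; verify the bit before relying on it. */
#define DESC_TXSTS_OWNBYDMA (1u << 31)

/* Hypothetical helper: nonzero when the GMAC has handed the TX
 * descriptor back to the CPU, i.e. the ownership bit is clear. The
 * caller must have invalidated the cache line holding *txrx_status
 * first, so this load observes what the GMAC wrote to memory. */
static int tx_desc_owned_by_cpu(const volatile u32 *txrx_status)
{
	return (*txrx_status & DESC_TXSTS_OWNBYDMA) == 0;
}
```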

So I would advise:

1. Don't invalidate sizeof(struct dmamacdescr) but only
roundup(sizeof(desc_p->txrx_status), ARCH_DMA_MINALIGN).

2. Implement rounding in the following lines as well:

	/* Flush data to be sent */
	flush_dcache_range((unsigned long)desc_p->dmamac_addr,
			   (unsigned long)desc_p->dmamac_addr + length);


We can be sure desc_p->dmamac_addr is properly aligned, but length may
not be. So I'd replace length with roundup(length, ARCH_DMA_MINALIGN),
as you did in the 3rd patch.

3. Check carefully whether there are other instances of possibly
unaligned cache operations. I mistakenly didn't care about alignment in
cache invalidation/flushing because my implementation of those cache
operations handles non-aligned start/end addresses internally, within
the invalidate/flush functions, which might not be that good even if
it's convenient for me.

4. Why don't you squash all 3 patches into 1 and name it something like
"fix alignment issues with caches on some platforms"? Basically, with all
3 patches you fix one and the same issue, and applying any single one of
those 3 patches doesn't solve your problem, right?
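Item 2 above can be sketched as follows. `flush_dcache_range_stub` is a hypothetical stand-in that only records its arguments so the arithmetic can be checked, ARCH_DMA_MINALIGN is assumed to be 64, and the buffer address is assumed to be already aligned, as stated in that item.

```c
#include <assert.h>
#include <stdint.h>

#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
#define ARCH_DMA_MINALIGN 64 /* assumed value */

/* Hypothetical stand-in for flush_dcache_range(): records the range. */
static unsigned long last_flush_start, last_flush_end;

static void flush_dcache_range_stub(unsigned long start, unsigned long stop)
{
	last_flush_start = start;
	last_flush_end = stop;
}

/* Flush the packet buffer before handing it to the GMAC. addr is
 * assumed to be ARCH_DMA_MINALIGN aligned; only length is rounded. */
static void flush_tx_buffer(unsigned long addr, int length)
{
	flush_dcache_range_stub(addr,
				addr + roundup(length, ARCH_DMA_MINALIGN));
}
```

For example, a 1514-byte frame at 0x2000 is flushed up to 0x2600, the next 64-byte boundary.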

Regards,
Alexey




Re: [U-Boot] [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send

2014-04-24 Ian Campbell
On Thu, 2014-04-24 at 17:41 +, Alexey Brodkin wrote:

 1. Don't invalidate sizeof(struct dmamacdescr) but only
 roundup(sizeof(desc_p->txrx_status), ARCH_DMA_MINALIGN).

OK. (Although given the realities of the real world values of
ARCH_DMA_MINALIGN on every arch and the sizes of the structs & fields
involved this isn't actually buying you anything at all)

 2. Implement rounding in the following lines as well:

Will fix as well.

 3. Check carefully whether there are other instances of possibly
 unaligned cache operations.

I'm not seeing any others, in practice or by eye-balling the code.

 4. Why don't you squash all 3 patches into 1 and name it something like
 "fix alignment issues with caches on some platforms"? Basically, with all
 3 patches you fix one and the same issue, and applying any single one of
 those 3 patches doesn't solve your problem, right?

These are the issues as I discovered them one by one. I can fold them if
you like but doing them separately will aid bisection if one of them
turns out to be wrong in some way. As you prefer.

Ian.



Re: [U-Boot] [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send

2014-04-24 Stefan Monnier
 1. Don't invalidate sizeof(struct dmamacdescr) but only
 roundup(sizeof(desc_p->txrx_status), ARCH_DMA_MINALIGN).

I'm not sure I like this: if ARCH_DMA_MINALIGN is too large and ends
up invalidating more than the struct, it could be an error, so it's
safer to ask it to invalidate the struct (which we know can be safely
invalidated).

If invalidate_dcache_range is used often, then I'd suggest changing
its API so it receives 2 bounds: the one that has to be invalidated
and the surrounding one that can safely be invalidated.
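Stefan's two-bounds idea might look roughly like this. `invalidate_range_within` is a hypothetical function, not an existing U-Boot API; it widens the required range to whole lines and refuses the operation if the result would leave the region the caller declared safe to discard.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-bounds API: [need_start, need_end) must be
 * invalidated; [safe_start, safe_end) may be discarded. The needed
 * range is widened to whole lines (line must be a power of two) and
 * the call fails if the widened range escapes the safe region. */
static int invalidate_range_within(uintptr_t need_start, uintptr_t need_end,
				   uintptr_t safe_start, uintptr_t safe_end,
				   uintptr_t line,
				   uintptr_t *out_start, uintptr_t *out_end)
{
	uintptr_t s = need_start & ~(line - 1);
	uintptr_t e = (need_end + line - 1) & ~(line - 1);

	if (s < safe_start || e > safe_end)
		return -1; /* would discard data the caller did not allow */
	*out_start = s;
	*out_end = e;
	/* a real implementation would invalidate [s, e) here */
	return 0;
}
```

With a 64-byte descriptor at 0x1000, invalidating the status word succeeds with 32-byte lines (range [0x1000, 0x1020)) but is refused with 128-byte lines, which is the situation Stefan is worried about.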


Stefan



[U-Boot] [PATCH 2/3] net/designware: invalidate entire descriptor in dw_eth_send

2014-04-19 Ian Campbell
Some platforms cannot invalidate the cache at four byte intervals, so
invalidate the entire descriptor.

Signed-off-by: Ian Campbell <i...@hellion.org.uk>
---
 drivers/net/designware.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/designware.c b/drivers/net/designware.c
index 1120f70..7d14cec 100644
--- a/drivers/net/designware.c
+++ b/drivers/net/designware.c
@@ -280,10 +280,13 @@ static int dw_eth_send(struct eth_device *dev, void *packet, int length)
 	u32 desc_num = priv->tx_currdescnum;
 	struct dmamacdescr *desc_p = &priv->tx_mac_descrtable[desc_num];
 
-	/* Invalidate only status field for the following check */
-	invalidate_dcache_range((unsigned long)&desc_p->txrx_status,
-				(unsigned long)&desc_p->txrx_status +
-				sizeof(desc_p->txrx_status));
+	/* Strictly we only need to invalidate the status field for
+	 * the following check, but on some platforms we cannot
+	 * invalidate only 4 bytes, so invalidate the whole thing
+	 * which is known to be DMA aligned. */
+	invalidate_dcache_range((unsigned long)desc_p,
+				(unsigned long)desc_p +
+				sizeof(struct dmamacdescr));
 
 	/* Check if the descriptor is owned by CPU */
 	if (desc_p->txrx_status & DESC_TXSTS_OWNBYDMA) {
-- 
1.9.0
