Re: [PATCH 1/3] powerpc: move the testing of CPU_FTR_COHERENT_ICACHE into __flush_icache_range

2013-08-07 Thread Kevin Hao
On Tue, Aug 06, 2013 at 08:35:10PM +1000, Benjamin Herrenschmidt wrote:
 On Tue, 2013-08-06 at 18:23 +0800, Kevin Hao wrote:
  In function flush_icache_range(), we use cpu_has_feature() to test
  the feature bit of CPU_FTR_COHERENT_ICACHE. But this seems not optimal
  for two reasons:
   a) For ppc32, the function __flush_icache_range() already do this
  check with the macro END_FTR_SECTION_IFSET.
   b) Compare with the cpu_has_feature(), the method of using macro
  END_FTR_SECTION_IFSET will not introduce any runtime overhead.
 
 Nak.
 
 It adds the overhead of calling into a function :-)
 
 What about modifying cpu_has_feature to use jump labels ? 

That's a great idea. I would like to gave it a try later.

It might solve
 the problem of no runtime overhead ... however it might also be hard to
 keep the ability to remove the whole statement at compile time if the
 bit doesn't fit in the POSSIBLE mask... unless you find the right macro
 magic.
 
 In any case, I suspect the function call introduces more overhead than
 the bit test + conditional branch which will generally predict very
 well, so the patch as-is is probably a regression.

I don't think so. For the 64bit CPU which has a non-coherent icache,
there is no any effect introduced by this patch since 
(!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
always yield true at compile time. But for the 64bit CPU which does have
the coherent icache, the following is the asm code before applying this patch:

  c0023b4c: e9 22 86 68 ld  r9,-31128(r2)
  c0023b50: e9 29 00 00 ld  r9,0(r9)
  c0023b54: e9 29 00 10 ld  r9,16(r9)
  c0023b58: 79 2a 07 e1 clrldi. r10,r9,63
  c0023b5c: 41 82 00 94 beq c0023bf0 
.handle_rt_signal64+0x670
  ... 
  c0023bf0: 7f 23 cb 78 mr  r3,r25
  c0023bf4: 7f 64 db 78 mr  r4,r27
  c0023bf8: 48 7a 65 99 bl  c07ca190 
.__flush_icache_range

After applying this patch, the following code is generated:
  c0023b48: 7f 23 cb 78 mr  r3,r25
  c0023b4c: 7f 64 db 78 mr  r4,r27
  c0023b50: 48 7a 65 81 bl  c07ca0d0 
.flush_icache_range


The run path for these two cases are:

before  after
  ld r9,-31128(r2)mr r3,r25
  ld r9,0(r9) mr r4,r27
  ld r9,16(r9)bl flush_icache_range
  clrldi. r10,r9,63   blr
  beq   

So I don't think this will introduce more overhead than before.
On the contrary, I believe it yield less overhead than before.
Correct me if I am wrong.

 
 Did you measure ?

No. I don't have a access to 64bit CPU which has a coherent icache.
For a non-coherent icache 64bit CPU this patch will not cause any effect.

Thanks,
Kevin

 
 Ben.
 
 
  Signed-off-by: Kevin Hao haoke...@gmail.com
  ---
   arch/powerpc/include/asm/cacheflush.h | 3 +--
   arch/powerpc/kernel/misc_64.S | 4 +++-
   2 files changed, 4 insertions(+), 3 deletions(-)
  
  diff --git a/arch/powerpc/include/asm/cacheflush.h 
  b/arch/powerpc/include/asm/cacheflush.h
  index b843e35..60b620d 100644
  --- a/arch/powerpc/include/asm/cacheflush.h
  +++ b/arch/powerpc/include/asm/cacheflush.h
  @@ -35,8 +35,7 @@ extern void __flush_disable_L1(void);
   extern void __flush_icache_range(unsigned long, unsigned long);
   static inline void flush_icache_range(unsigned long start, unsigned long 
  stop)
   {
  -   if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
  -   __flush_icache_range(start, stop);
  +   __flush_icache_range(start, stop);
   }
   
   extern void flush_icache_user_range(struct vm_area_struct *vma,
  diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
  index 6820e45..74d87f1 100644
  --- a/arch/powerpc/kernel/misc_64.S
  +++ b/arch/powerpc/kernel/misc_64.S
  @@ -68,7 +68,9 @@ PPC64_CACHES:
*/
   
   _KPROBE(__flush_icache_range)
  -
  +BEGIN_FTR_SECTION
  +   blr
  +END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
   /*
* Flush the data cache to memory 
* 
 
 


pgpnxVg_fWBzi.pgp
Description: PGP signature
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/3] powerpc: move the testing of CPU_FTR_COHERENT_ICACHE into __flush_icache_range

2013-08-06 Thread Kevin Hao
In function flush_icache_range(), we use cpu_has_feature() to test
the feature bit of CPU_FTR_COHERENT_ICACHE. But this seems not optimal
for two reasons:
 a) For ppc32, the function __flush_icache_range() already do this
check with the macro END_FTR_SECTION_IFSET.
 b) Compare with the cpu_has_feature(), the method of using macro
END_FTR_SECTION_IFSET will not introduce any runtime overhead.

Signed-off-by: Kevin Hao haoke...@gmail.com
---
 arch/powerpc/include/asm/cacheflush.h | 3 +--
 arch/powerpc/kernel/misc_64.S | 4 +++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index b843e35..60b620d 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -35,8 +35,7 @@ extern void __flush_disable_L1(void);
 extern void __flush_icache_range(unsigned long, unsigned long);
 static inline void flush_icache_range(unsigned long start, unsigned long stop)
 {
-   if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
-   __flush_icache_range(start, stop);
+   __flush_icache_range(start, stop);
 }
 
 extern void flush_icache_user_range(struct vm_area_struct *vma,
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 6820e45..74d87f1 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -68,7 +68,9 @@ PPC64_CACHES:
  */
 
 _KPROBE(__flush_icache_range)
-
+BEGIN_FTR_SECTION
+   blr
+END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
 /*
  * Flush the data cache to memory 
  * 
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 1/3] powerpc: move the testing of CPU_FTR_COHERENT_ICACHE into __flush_icache_range

2013-08-06 Thread Benjamin Herrenschmidt
On Tue, 2013-08-06 at 18:23 +0800, Kevin Hao wrote:
 In function flush_icache_range(), we use cpu_has_feature() to test
 the feature bit of CPU_FTR_COHERENT_ICACHE. But this seems not optimal
 for two reasons:
  a) For ppc32, the function __flush_icache_range() already do this
 check with the macro END_FTR_SECTION_IFSET.
  b) Compare with the cpu_has_feature(), the method of using macro
 END_FTR_SECTION_IFSET will not introduce any runtime overhead.

Nak.

It adds the overhead of calling into a function :-)

What about modifying cpu_has_feature to use jump labels ? It might solve
the problem of no runtime overhead ... however it might also be hard to
keep the ability to remove the whole statement at compile time if the
bit doesn't fit in the POSSIBLE mask... unless you find the right macro
magic.

In any case, I suspect the function call introduces more overhead than
the bit test + conditional branch which will generally predict very
well, so the patch as-is is probably a regression.

Did you measure ?

Ben.


 Signed-off-by: Kevin Hao haoke...@gmail.com
 ---
  arch/powerpc/include/asm/cacheflush.h | 3 +--
  arch/powerpc/kernel/misc_64.S | 4 +++-
  2 files changed, 4 insertions(+), 3 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/cacheflush.h 
 b/arch/powerpc/include/asm/cacheflush.h
 index b843e35..60b620d 100644
 --- a/arch/powerpc/include/asm/cacheflush.h
 +++ b/arch/powerpc/include/asm/cacheflush.h
 @@ -35,8 +35,7 @@ extern void __flush_disable_L1(void);
  extern void __flush_icache_range(unsigned long, unsigned long);
  static inline void flush_icache_range(unsigned long start, unsigned long 
 stop)
  {
 - if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
 - __flush_icache_range(start, stop);
 + __flush_icache_range(start, stop);
  }
  
  extern void flush_icache_user_range(struct vm_area_struct *vma,
 diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
 index 6820e45..74d87f1 100644
 --- a/arch/powerpc/kernel/misc_64.S
 +++ b/arch/powerpc/kernel/misc_64.S
 @@ -68,7 +68,9 @@ PPC64_CACHES:
   */
  
  _KPROBE(__flush_icache_range)
 -
 +BEGIN_FTR_SECTION
 + blr
 +END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
  /*
   * Flush the data cache to memory 
   * 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev