Re: [PATCH 1/3] powerpc: move the testing of CPU_FTR_COHERENT_ICACHE into __flush_icache_range
On Tue, Aug 06, 2013 at 08:35:10PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2013-08-06 at 18:23 +0800, Kevin Hao wrote:
> > In function flush_icache_range(), we use cpu_has_feature() to test the
> > feature bit of CPU_FTR_COHERENT_ICACHE. But this seems suboptimal for
> > two reasons:
> >   a) For ppc32, the function __flush_icache_range() already does this
> >      check with the macro END_FTR_SECTION_IFSET.
> >   b) Compared with cpu_has_feature(), the END_FTR_SECTION_IFSET macro
> >      does not introduce any runtime overhead.
>
> Nak. It adds the overhead of calling into a function :-)
>
> What about modifying cpu_has_feature to use jump labels ?

That's a great idea. I would like to give it a try later.

> It might solve the problem of no runtime overhead ... however it might
> also be hard to keep the ability to remove the whole statement at
> compile time if the bit doesn't fit in the POSSIBLE mask ... unless you
> find the right macro magic.
>
> In any case, I suspect the function call introduces more overhead than
> the bit test + conditional branch, which will generally predict very
> well, so the patch as-is is probably a regression.

I don't think so. For a 64-bit CPU with a non-coherent icache, this patch
has no effect at all, since (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
always yields true at compile time. But for a 64-bit CPU which does have
a coherent icache, the following is the asm code before applying this
patch:

c0023b4c:	e9 22 86 68 	ld      r9,-31128(r2)
c0023b50:	e9 29 00 00 	ld      r9,0(r9)
c0023b54:	e9 29 00 10 	ld      r9,16(r9)
c0023b58:	79 2a 07 e1 	clrldi. r10,r9,63
c0023b5c:	41 82 00 94 	beq     c0023bf0 <.handle_rt_signal64+0x670>
	...
c0023bf0:	7f 23 cb 78 	mr      r3,r25
c0023bf4:	7f 64 db 78 	mr      r4,r27
c0023bf8:	48 7a 65 99 	bl      c07ca190 <.__flush_icache_range>

After applying this patch, the following code is generated:

c0023b48:	7f 23 cb 78 	mr      r3,r25
c0023b4c:	7f 64 db 78 	mr      r4,r27
c0023b50:	48 7a 65 81 	bl      c07ca0d0 <.flush_icache_range>

The run paths for these two cases are:

	before			after
	ld      r9,-31128(r2)	mr   r3,r25
	ld      r9,0(r9)	mr   r4,r27
	ld      r9,16(r9)	bl   flush_icache_range
	clrldi. r10,r9,63	blr
	beq

So I don't think this will introduce more overhead than before. On the
contrary, I believe it yields less overhead than before. Correct me if I
am wrong.

> Did you measure ?

No. I don't have access to a 64-bit CPU which has a coherent icache. On a
64-bit CPU with a non-coherent icache this patch has no effect.

Thanks,
Kevin

> Ben.

Signed-off-by: Kevin Hao <haoke...@gmail.com>
---
 arch/powerpc/include/asm/cacheflush.h | 3 +--
 arch/powerpc/kernel/misc_64.S         | 4 +++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index b843e35..60b620d 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -35,8 +35,7 @@ extern void __flush_disable_L1(void);
 extern void __flush_icache_range(unsigned long, unsigned long);
 static inline void flush_icache_range(unsigned long start, unsigned long stop)
 {
-	if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
-		__flush_icache_range(start, stop);
+	__flush_icache_range(start, stop);
 }
 
 extern void flush_icache_user_range(struct vm_area_struct *vma,
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 6820e45..74d87f1 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -68,7 +68,9 @@ PPC64_CACHES:
  */
 
 _KPROBE(__flush_icache_range)
-
+BEGIN_FTR_SECTION
+	blr
+END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
 
 /*
  * Flush the data cache to memory
  *

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/3] powerpc: move the testing of CPU_FTR_COHERENT_ICACHE into __flush_icache_range
In function flush_icache_range(), we use cpu_has_feature() to test the
feature bit of CPU_FTR_COHERENT_ICACHE. But this seems suboptimal for two
reasons:
  a) For ppc32, the function __flush_icache_range() already does this
     check with the macro END_FTR_SECTION_IFSET.
  b) Compared with cpu_has_feature(), the END_FTR_SECTION_IFSET macro
     does not introduce any runtime overhead.

Signed-off-by: Kevin Hao <haoke...@gmail.com>
---
 arch/powerpc/include/asm/cacheflush.h | 3 +--
 arch/powerpc/kernel/misc_64.S         | 4 +++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index b843e35..60b620d 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -35,8 +35,7 @@ extern void __flush_disable_L1(void);
 extern void __flush_icache_range(unsigned long, unsigned long);
 static inline void flush_icache_range(unsigned long start, unsigned long stop)
 {
-	if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
-		__flush_icache_range(start, stop);
+	__flush_icache_range(start, stop);
 }
 
 extern void flush_icache_user_range(struct vm_area_struct *vma,
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 6820e45..74d87f1 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -68,7 +68,9 @@ PPC64_CACHES:
  */
 
 _KPROBE(__flush_icache_range)
-
+BEGIN_FTR_SECTION
+	blr
+END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
 
 /*
  * Flush the data cache to memory
  *
-- 
1.8.3.1
Re: [PATCH 1/3] powerpc: move the testing of CPU_FTR_COHERENT_ICACHE into __flush_icache_range
On Tue, 2013-08-06 at 18:23 +0800, Kevin Hao wrote:
> In function flush_icache_range(), we use cpu_has_feature() to test the
> feature bit of CPU_FTR_COHERENT_ICACHE. But this seems suboptimal for
> two reasons:
>   a) For ppc32, the function __flush_icache_range() already does this
>      check with the macro END_FTR_SECTION_IFSET.
>   b) Compared with cpu_has_feature(), the END_FTR_SECTION_IFSET macro
>      does not introduce any runtime overhead.

Nak. It adds the overhead of calling into a function :-)

What about modifying cpu_has_feature to use jump labels ? It might solve
the problem of no runtime overhead ... however it might also be hard to
keep the ability to remove the whole statement at compile time if the bit
doesn't fit in the POSSIBLE mask ... unless you find the right macro
magic.

In any case, I suspect the function call introduces more overhead than
the bit test + conditional branch, which will generally predict very
well, so the patch as-is is probably a regression.

Did you measure ?

Ben.
Signed-off-by: Kevin Hao <haoke...@gmail.com>
---
 arch/powerpc/include/asm/cacheflush.h | 3 +--
 arch/powerpc/kernel/misc_64.S         | 4 +++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index b843e35..60b620d 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -35,8 +35,7 @@ extern void __flush_disable_L1(void);
 extern void __flush_icache_range(unsigned long, unsigned long);
 static inline void flush_icache_range(unsigned long start, unsigned long stop)
 {
-	if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
-		__flush_icache_range(start, stop);
+	__flush_icache_range(start, stop);
 }
 
 extern void flush_icache_user_range(struct vm_area_struct *vma,
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 6820e45..74d87f1 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -68,7 +68,9 @@ PPC64_CACHES:
  */
 
 _KPROBE(__flush_icache_range)
-
+BEGIN_FTR_SECTION
+	blr
+END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
 
 /*
  * Flush the data cache to memory
  *