Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
On Thu, Aug 11, 2016 at 05:11:19PM -0500, Segher Boessenkool wrote: > On Thu, Aug 11, 2016 at 11:34:37PM +0200, Gabriel Paubert wrote: > > On the other hand gcc did at the time a very poor job (quite an > > understatement) at bswapdi when compiling for 64 bit processors > > (see the example). > > > > But what do modern compilers generate for bswapdi these days? Do they > > still call the library or not? > > Nope. Great, could then these functions be removed from misc_32.S, or are compilers that use libcalls still supported for kernel builds? > > > After all, bswapdi on 32 bit processors only takes 6 instructions if the > > input and output registers don't overlap. > > For this testcase: > === > typedef unsigned long long u64; > u64 bs(u64 x) { return __builtin_bswap64(x); } > === > > we get with -m32: > === > bs: > mr 9,3 > rotlwi 3,4,24 > rlwimi 3,4,8,8,15 > rlwimi 3,4,8,24,31 > rotlwi 4,9,24 > rlwimi 4,9,8,8,15 > rlwimi 4,9,8,24,31 > blr In this case the compiler is constrained by the fact that the input and ouput registers are the same. When inlined with other things it can probably perform better scheduling and interleaving of operations. > === > > and with -m64: > === > .L.bs: > srdi 10,3,32 > mr 9,3 > rotlwi 3,3,24 > rotlwi 8,10,24 > rlwimi 3,9,8,8,15 > rlwimi 8,10,8,8,15 > rlwimi 3,9,8,24,31 > rlwimi 8,10,8,24,31 > sldi 3,3,32 > or 3,3,8 > blr > === > As demonstrated here where the two halves of the 64 bit quantity are byte swapped in an interleaved fashion. Not perfect (I think that with proper ordering the last 2 instructions could be replaced by a rldimi), but reasonable. > Neither as tight as possible, but neither horrible either. > Indeed. Gabriel
Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
On Thu, Aug 11, 2016 at 11:34:37PM +0200, Gabriel Paubert wrote: > On the other hand gcc did at the time a very poor job (quite an > understatement) at bswapdi when compiling for 64 bit processors > (see the example). > > But what do modern compilers generate for bswapdi these days? Do they > still call the library or not? Nope. > After all, bswapdi on 32 bit processors only takes 6 instructions if the > input and output registers don't overlap. For this testcase: === typedef unsigned long long u64; u64 bs(u64 x) { return __builtin_bswap64(x); } === we get with -m32: === bs: mr 9,3 rotlwi 3,4,24 rlwimi 3,4,8,8,15 rlwimi 3,4,8,24,31 rotlwi 4,9,24 rlwimi 4,9,8,8,15 rlwimi 4,9,8,24,31 blr === and with -m64: === .L.bs: srdi 10,3,32 mr 9,3 rotlwi 3,3,24 rotlwi 8,10,24 rlwimi 3,9,8,8,15 rlwimi 8,10,8,8,15 rlwimi 3,9,8,24,31 rlwimi 8,10,8,24,31 sldi 3,3,32 or 3,3,8 blr === Neither as tight as possible, but neither horrible either. Segher
Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
On Wed, Aug 10, 2016 at 12:18:15PM +0200, Christophe Leroy wrote: > > > Le 10/08/2016 à 10:56, Gabriel Paubert a écrit : > >On Fri, Aug 05, 2016 at 01:28:02PM +0200, Christophe Leroy wrote: > >>Signed-off-by: Christophe Leroy > >>--- > >> arch/powerpc/kernel/misc_32.S | 3 +-- > >> 1 file changed, 1 insertion(+), 2 deletions(-) > >> > >>diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S > >>index e025230..e18055c 100644 > >>--- a/arch/powerpc/kernel/misc_32.S > >>+++ b/arch/powerpc/kernel/misc_32.S > >>@@ -578,9 +578,8 @@ _GLOBAL(__bswapdi2) > >>rlwimi r9,r4,24,0,7 > >>rlwimi r10,r3,24,0,7 > >>rlwimi r9,r4,24,16,23 > >>- rlwimi r10,r3,24,16,23 > >>+ rlwimi r4,r3,24,16,23 > >>mr r3,r9 > >>- mr r4,r10 > >>blr > >> > > > >Hmmm, are you sure that it works? rlwimi is a bit special since the > >first operand is both an input and an output of the instruction. > > > > > > Oops, you are right ... I just found this: http://hardwarebug.org/2010/01/14/beware-the-builtins/ the bswapdi2 suggested sequence only needs a single mr instruction, the other one is absorbed in a rotlwi. The scheduling looks poor, but it seems impossible to interleave the operations between the two halves without adding another instructions, and the routine is 8 instructions long, which happens to be exactly a cache line on most 32 bit processors. On the other hand gcc did at the time a very poor job (quite an understatement) at bswapdi when compiling for 64 bit processors (see the example). But what do modern compilers generate for bswapdi these days? Do they still call the library or not? After all, bswapdi on 32 bit processors only takes 6 instructions if the input and output registers don't overlap. Gabriel
Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
Le 10/08/2016 à 10:56, Gabriel Paubert a écrit : On Fri, Aug 05, 2016 at 01:28:02PM +0200, Christophe Leroy wrote: Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/misc_32.S | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S index e025230..e18055c 100644 --- a/arch/powerpc/kernel/misc_32.S +++ b/arch/powerpc/kernel/misc_32.S @@ -578,9 +578,8 @@ _GLOBAL(__bswapdi2) rlwimi r9,r4,24,0,7 rlwimi r10,r3,24,0,7 rlwimi r9,r4,24,16,23 - rlwimi r10,r3,24,16,23 + rlwimi r4,r3,24,16,23 mr r3,r9 - mr r4,r10 blr Hmmm, are you sure that it works? rlwimi is a bit special since the first operand is both an input and an output of the instruction. Oops, you are right ...
Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
On Fri, Aug 05, 2016 at 01:28:02PM +0200, Christophe Leroy wrote: > Signed-off-by: Christophe Leroy > --- > arch/powerpc/kernel/misc_32.S | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S > index e025230..e18055c 100644 > --- a/arch/powerpc/kernel/misc_32.S > +++ b/arch/powerpc/kernel/misc_32.S > @@ -578,9 +578,8 @@ _GLOBAL(__bswapdi2) > rlwimi r9,r4,24,0,7 > rlwimi r10,r3,24,0,7 > rlwimi r9,r4,24,16,23 > - rlwimi r10,r3,24,16,23 > + rlwimi r4,r3,24,16,23 > mr r3,r9 > - mr r4,r10 > blr > Hmmm, are you sure that it works? rlwimi is a bit special since the first operand is both an input and an output of the instruction. In other words, did you test the code? Gabriel
[PATCH] powerpc/32: Remove one insn in __bswapdi2
Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/misc_32.S | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S index e025230..e18055c 100644 --- a/arch/powerpc/kernel/misc_32.S +++ b/arch/powerpc/kernel/misc_32.S @@ -578,9 +578,8 @@ _GLOBAL(__bswapdi2) rlwimi r9,r4,24,0,7 rlwimi r10,r3,24,0,7 rlwimi r9,r4,24,16,23 - rlwimi r10,r3,24,16,23 + rlwimi r4,r3,24,16,23 mr r3,r9 - mr r4,r10 blr #ifdef CONFIG_SMP -- 2.1.0