Re: [PATCH] add strncmp to PowerPC

2008-03-04 Thread Paul Mackerras
Benjamin Herrenschmidt writes:

 Do we have any indication that it performs better than the C one ?

I would expect it to, given that the assembler one has two branches in
the per-byte loop compared to 3 in the C version.

Paul.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] add strncmp to PowerPC

2008-03-04 Thread Segher Boessenkool
 Do we have any indication that it performs better than the C one ?

 I would expect it to, given that the assembler one has two branches in
 the per-byte loop compared to 3 in the C version.

But really, does it matter for strncmp() in the kernel?

Anyway, this asm code has bugs, as do both the current C version in the
kernel, and the code I posted.  We need to do better :-)


Segher

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] add strncmp to PowerPC

2008-03-04 Thread Segher Boessenkool
 Anyway, this asm code has bugs, as do both the current C version in 
 the
 kernel, and the code I posted.  We need to do better :-)

 The only bug I know of in the asm code is the behaviour when the count
 is zero.  Do you know of any other?

No, that's the bug I meant.  Sorry for using such inexact language.


Segher

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] add strncmp to PowerPC

2008-03-04 Thread Paul Mackerras
Segher Boessenkool writes:

 Anyway, this asm code has bugs, as do both the current C version in the
 kernel, and the code I posted.  We need to do better :-)

The only bug I know of in the asm code is the behaviour when the count
is zero.  Do you know of any other?

Paul.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] add strncmp to PowerPC

2008-03-03 Thread Andreas Schwab
Gabriel Paubert [EMAIL PROTECTED] writes:

 Now that I think a bit more about it, I believe that the C version is
 incorrect: the clrldi/extsb dance takes a value between -255 and +255
 and collapses it into the -128 to 127 range, meaning that the return
 value may be wrong if we rely on the sign of the result. So unless I
 miss something, the problem is much more serious than just stupid code
 (I had just a look at the libc version in C and characters are cast to
 unsigned char before the comparison).

The latter is explicitly required by the C standard.  Ie. even if your
characters are signed they are always compared as unsigned by
strcmp/strncmp/memcmp.

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] add strncmp to PowerPC

2008-03-03 Thread Segher Boessenkool
 Even if it was logically faster (which I still doubt) it's a hell of 
 a lot
 of cache lines to waste.

Yeah, 1 on 64-bit and 3 on 32-bit, that's a terrible lot./sarcasm

 Indeed, but there are some corner cases that the C code handles. Like
 a length of 0 which may lead to infinite loop in the asm code.

 OTOH, I'm a bit surprised by the extsb instructions in the compiler 
 generated
 code. We don't compile with -fsigned-char, do we? The clrldi
 instructions are also extremely stupid.

Those are both necessary to be equivalent to the C code, which uses
signed char explicitly.  It is generally considered a Good Thing(tm)
for the compiler to generate assembler code equivalent to the C code,
even if the C code is wrong.

 Now that I think a bit more about it, I believe that the C version is
 incorrect

It is.  It's a great entry for the IOCCC as well.

I just tested the following (can't guarantee it's correct, just a PoC):

int strncmp(const char *s1, const char *s2, unsigned long /*size_t*/ 
len)
{
 while (len--) {
 unsigned char c1, c2;
 c1 = *s1++;
 c2 = *s2++;
 int cmp = c1 - c2;
 if (cmp)
 return cmp;
 if (c1 == 0 || c2 == 0)
 break;
 }
 return 0;
}

which generates (with GCC-4.2.3)

strncmp:
 addi 5,5,1
 mtctr 5
.L2:
 bdz .L11
 lbz 0,0(3)
 addi 3,3,1
 lbz 9,0(4)
 addi 4,4,1
 cmpwi 7,0,0
 subf. 0,9,0
 cmpwi 6,9,0
 bne- 0,.L4
 beq- 7,.L4
 bne+ 6,.L2
.L4:
 mr 3,0
 blr
.L11:
 li 0,0
 mr 3,0
 blr

which isn't horrid, although it does some weirdish things obviously.

Current GCC-4.4.0 generates

strncmp:
 addi 5,5,1
 mr 10,3
 mtctr 5
 li 11,0
 bdz .L7
 .p2align 4,,15
.L4:
 lbzx 0,10,11
 lbzx 9,4,11
 addi 11,11,1
 subf. 3,9,0
 cmpwi 6,9,0
 cmpwi 7,0,0
 bnelr 0
 beqlr 7
 beqlr 6
 bdnz .L4
.L7:
 li 3,0
 blr

which is about as good as it can get (well, it didn't realise you
only need to test one of c1, c2 for zero.  Did I say this was just
proof-of-concept code?)


Segher

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] add strncmp to PowerPC

2008-02-29 Thread Benjamin Herrenschmidt

On Fri, 2008-02-29 at 11:04 -0500, Steven Rostedt wrote:
 strncmp is defined in assembly for bootup, but it is not defined in the
 normal running kernel. This patch takes the strncmp code from the bootup
 and copies it to the kernel proper.
 
 Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
 ---

Do we have any indication that it performs better than the C one ?

Ben.

  arch/powerpc/kernel/ppc_ksyms.c |1 +
  arch/powerpc/lib/string.S   |   12 
  include/asm-powerpc/string.h|2 ++
  3 files changed, 15 insertions(+)
 
 Index: linux-sched-devel.git/arch/powerpc/kernel/ppc_ksyms.c
 ===
 --- linux-sched-devel.git.orig/arch/powerpc/kernel/ppc_ksyms.c
 2008-02-27 14:01:38.0 -0800
 +++ linux-sched-devel.git/arch/powerpc/kernel/ppc_ksyms.c 2008-02-29 
 07:24:22.0 -0800
 @@ -78,6 +78,7 @@ EXPORT_SYMBOL(strncpy);
  EXPORT_SYMBOL(strcat);
  EXPORT_SYMBOL(strlen);
  EXPORT_SYMBOL(strcmp);
 +EXPORT_SYMBOL(strncmp);
  
  EXPORT_SYMBOL(csum_partial);
  EXPORT_SYMBOL(csum_partial_copy_generic);
 Index: linux-sched-devel.git/arch/powerpc/lib/string.S
 ===
 --- linux-sched-devel.git.orig/arch/powerpc/lib/string.S  2008-02-27 
 14:01:38.0 -0800
 +++ linux-sched-devel.git/arch/powerpc/lib/string.S   2008-02-29 
 07:24:22.0 -0800
 @@ -75,6 +75,18 @@ _GLOBAL(strcmp)
   beq 1b
   blr
  
 +_GLOBAL(strncmp)
 + mtctr   r5
 + addir5,r3,-1
 + addir4,r4,-1
 +1:   lbzur3,1(r5)
 + cmpwi   1,r3,0
 + lbzur0,1(r4)
 + subf.   r3,r0,r3
 + beqlr   1
 + bdnzt   eq,1b
 + blr
 +
  _GLOBAL(strlen)
   addir4,r3,-1
  1:   lbzur0,1(r4)
 Index: linux-sched-devel.git/include/asm-powerpc/string.h
 ===
 --- linux-sched-devel.git.orig/include/asm-powerpc/string.h   2008-02-27 
 14:01:58.0 -0800
 +++ linux-sched-devel.git/include/asm-powerpc/string.h2008-02-29 
 07:24:22.0 -0800
 @@ -7,6 +7,7 @@
  #define __HAVE_ARCH_STRNCPY
  #define __HAVE_ARCH_STRLEN
  #define __HAVE_ARCH_STRCMP
 +#define __HAVE_ARCH_STRNCMP
  #define __HAVE_ARCH_STRCAT
  #define __HAVE_ARCH_MEMSET
  #define __HAVE_ARCH_MEMCPY
 @@ -18,6 +19,7 @@ extern char * strcpy(char *,const char *
  extern char * strncpy(char *,const char *, __kernel_size_t);
  extern __kernel_size_t strlen(const char *);
  extern int strcmp(const char *,const char *);
 +extern int strncmp(const char *,const char *,__kernel_size_t);
  extern char * strcat(char *, const char *);
  extern void * memset(void *,int,__kernel_size_t);
  extern void * memcpy(void *,const void *,__kernel_size_t);
 
 
 ___
 Linuxppc-dev mailing list
 Linuxppc-dev@ozlabs.org
 https://ozlabs.org/mailman/listinfo/linuxppc-dev

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] add strncmp to PowerPC

2008-02-29 Thread Steven Rostedt

On Sat, 1 Mar 2008, Benjamin Herrenschmidt wrote:

 Do we have any indication that it performs better than the C one ?

See below.


 Ben.


 
  +_GLOBAL(strncmp)
  +   mtctr   r5
  +   addir5,r3,-1
  +   addir4,r4,-1
  +1: lbzur3,1(r5)
  +   cmpwi   1,r3,0
  +   lbzur0,1(r4)
  +   subf.   r3,r0,r3
  +   beqlr   1
  +   bdnzt   eq,1b
  +   blr
  +


And here's the objdump of the C version:

0080 .strncmp:
  80:   fb e1 ff f0 std r31,-16(r1)
  84:   f8 21 ff c1 stdur1,-64(r1)
  88:   7c 69 1b 78 mr  r9,r3
  8c:   7c a0 2b 79 mr. r0,r5
  90:   38 60 00 00 li  r3,0
  94:   7c 09 03 a6 mtctr   r0
  98:   7c 3f 0b 78 mr  r31,r1
  9c:   41 82 00 68 beq-104 .strncmp+0x84
  a0:   89 69 00 00 lbz r11,0(r9)
  a4:   88 04 00 00 lbz r0,0(r4)
  a8:   7c 00 58 50 subfr0,r0,r11
  ac:   78 00 06 20 clrldi  r0,r0,56
  b0:   2f a0 00 00 cmpdi   cr7,r0,0
  b4:   7c 00 07 74 extsb   r0,r0
  b8:   7c 03 03 78 mr  r3,r0
  bc:   40 9e 00 48 bne-cr7,104 .strncmp+0x84
  c0:   2f ab 00 00 cmpdi   cr7,r11,0
  c4:   41 9e 00 40 beq-cr7,104 .strncmp+0x84
  c8:   38 84 00 01 addir4,r4,1
  cc:   38 69 00 01 addir3,r9,1
  d0:   42 40 00 30 bdz-100 .strncmp+0x80
  d4:   88 03 00 00 lbz r0,0(r3)
  d8:   89 24 00 00 lbz r9,0(r4)
  dc:   38 63 00 01 addir3,r3,1
  e0:   38 84 00 01 addir4,r4,1
  e4:   2f 20 00 00 cmpdi   cr6,r0,0
  e8:   7c 09 00 50 subfr0,r9,r0
  ec:   78 00 06 20 clrldi  r0,r0,56
  f0:   2f a0 00 00 cmpdi   cr7,r0,0
  f4:   7c 00 07 74 extsb   r0,r0
  f8:   40 9e 00 08 bne-cr7,100 .strncmp+0x80
  fc:   40 9a ff d4 bne+cr6,d0 .strncmp+0x50
 100:   7c 03 03 78 mr  r3,r0
 104:   e8 21 00 00 ld  r1,0(r1)
 108:   eb e1 ff f0 ld  r31,-16(r1)
 10c:   4e 80 00 20 blr


I'll let you decide ;-)

Even if it was logically faster (which I still doubt) it's a hell of a lot
of cache lines to waste.

-- Steve

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev