Benjamin Herrenschmidt writes:
Do we have any indication that it performs better than the C one ?
I would expect it to, given that the assembler one has two branches in
the per-byte loop compared to 3 in the C version.
Paul.
___
Linuxppc-dev mailing
Do we have any indication that it performs better than the C one ?
I would expect it to, given that the assembler one has two branches in
the per-byte loop compared to 3 in the C version.
But really, does it matter for strncmp() in the kernel?
Anyway, this asm code has bugs, as do both the current C version in the
kernel, and the code I posted. We need to do better :-)
The only bug I know of in the asm code is the behaviour when the count
is zero. Do you know of any other?
No, that's the bug I meant. Sorry for using such
Segher Boessenkool writes:
Anyway, this asm code has bugs, as do both the current C version in the
kernel, and the code I posted. We need to do better :-)
The only bug I know of in the asm code is the behaviour when the count
is zero. Do you know of any other?
Paul.
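
For reference, the zero-count case raised above can be sketched in C. This is only an illustration, not the kernel's code or the version posted in this thread; the name sketch_strncmp is hypothetical. It assumes the usual contract: compare at most n bytes as unsigned char, stop at a NUL, and return 0 without reading anything when n is 0.

```c
#include <stddef.h>

/* Hypothetical name; a sketch, not the kernel's implementation. */
int sketch_strncmp(const char *s1, const char *s2, size_t n)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    /* The count is tested before the first load, so n == 0 falls
     * straight through and returns 0 without touching memory. */
    while (n-- > 0) {
        unsigned char ca = *a++;
        unsigned char cb = *b++;

        /* Return only -1 or 1 so the sign never depends on how a
         * byte difference gets truncated or extended. */
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == '\0')
            break;
    }
    return 0;
}
```

With this shape, sketch_strncmp("abc", "abd", 0) returns 0, whereas a loop that loads its counter first and only tests it after decrementing would run far past the intended length.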
Gabriel Paubert [EMAIL PROTECTED] writes:
Now that I think a bit more about it, I believe that the C version is
incorrect: the clrldi/extsb dance takes a value between -255 and +255
and collapses it into the -128 to 127 range, meaning that the return
value may be wrong if we rely on the sign
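
The range problem described above can be shown with a small C sketch. It assumes the truncation amounts to keeping the low 8 bits of the byte difference and sign-extending them, which is what extsb does; the function names are hypothetical and the code is not taken from the kernel.

```c
#include <stdint.h>

/* Full-width difference of two bytes: range -255..255, sign is reliable. */
int diff_full(unsigned char a, unsigned char b)
{
    return (int)a - (int)b;
}

/* Keep only the low 8 bits of the difference and sign-extend them,
 * mimicking an extsb on the result: range -128..127. */
int diff_truncated(unsigned char a, unsigned char b)
{
    uint8_t low = (uint8_t)((int)a - (int)b);

    return low >= 128 ? (int)low - 256 : (int)low;
}
```

For a = 0x80 and b = 0x00 the full difference is 128 but the truncated one is -128, so a caller that looks only at the sign would order the two strings the wrong way round.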
Even if it was logically faster (which I still doubt) it's a hell of a lot
of cache lines to waste.
Yeah, 1 on 64-bit and 3 on 32-bit, that's a terrible lot. </sarcasm>
Indeed, but there are some corner cases that the C code handles. Like
a length of 0 which may lead to infinite loop in the
On Fri, 2008-02-29 at 11:04 -0500, Steven Rostedt wrote:
strncmp is defined in assembly for bootup, but it is not defined in the
normal running kernel. This patch takes the strncmp code from the bootup
and copies it to the kernel proper.
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
On Sat, 1 Mar 2008, Benjamin Herrenschmidt wrote:
Do we have any indication that it performs better than the C one ?
See below.
Ben.
+_GLOBAL(strncmp)
+ mtctr r5
+ addi r5,r3,-1
+ addi r4,r4,-1
+1: lbzu r3,1(r5)
+ cmpwi 1,r3,0
+ lbzu r0,1(r4)
+