On Wed, Oct 03, 2012 at 08:38:04PM +0200, Gabriel Gonzalez wrote:
> Hi Rich,
>
> You replied before I was able to run the test but, yeah, you are
> right, my algorithm currently does the test word-based after it hits
> an aligned address, using a 3 instruction check to look for the null
> character.
>
> As requested I attach a plot of muslstrlen vs mystrlen, as you
> stated my ASM version outperforms the C version.

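For anyone following the thread, the word-based approach described in
the quoted message can be sketched in C roughly as below. This is only
an illustration, assuming the common subtract / and-not / mask test for
a zero byte; Gabriel's actual asm is not shown in this message, so the
three operations here may not match his three instructions. The names
sketch_strlen, word, ONES, HIGHS and HASZERO are chosen for the example,
and the may_alias attribute is a GCC/Clang extension.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper names, chosen for this sketch only. */
#define ONES  ((size_t)-1 / UCHAR_MAX)        /* 0x0101...01 */
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1))    /* 0x8080...80 */

/* Nonzero iff some byte of x is zero: a subtract, an and-not and a mask. */
#define HASZERO(x) (((x) - ONES) & ~(x) & HIGHS)

/* GCC/Clang extension: lets us read char data through a word pointer. */
typedef size_t __attribute__((__may_alias__)) word;

size_t sketch_strlen(const char *s)
{
    const char *start = s;
    const word *w;

    /* Byte-at-a-time until s reaches a word-aligned address. */
    while ((uintptr_t)s % sizeof(size_t)) {
        if (!*s)
            return (size_t)(s - start);
        s++;
    }

    /* Word-at-a-time: an aligned word load never crosses a page
     * boundary, so reading a few bytes past the terminator is safe in
     * practice, although strictly outside what ISO C guarantees. */
    w = (const void *)s;
    while (!HASZERO(*w))
        w++;

    /* Locate the exact zero byte inside the word that contains it. */
    s = (const void *)w;
    while (*s)
        s++;
    return (size_t)(s - start);
}
```

The inner word loop is the part under discussion: its body is a load,
the three-operation test and a branch, so any extra register moves the
compiler adds are a large fraction of the work per iteration.
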
Thanks. It's a shame GCC can't get things like this right, because we
really shouldn't have to be writing per-arch asm when the desired asm
is _identical_ on each arch except for the mnemonics and register
names. I wish GCC's optimizer had the ability to detect loops with
fewer than N arithmetic operations and basically brute-force search
for a way to do the same thing with less register shuffling and
nonsensical overhead inside the loop.

On all but the tightest loops, it doesn't seem to matter; even with
the inefficient register shuffling, GCC's output gets 95% or better of
the performance of hand-written asm. But on loops with really short
bodies like this, the performance really suffers.

Rich
