Re: [osol-code] ffs vs. lowbit

Nils Goroll Thu, 24 Sep 2009 11:02:57 -0700

today I noticed this inefficient implementation for the "find first setbit" function:


This is interesting:


I've tried this tiny benchmark with a[] being filled with the powers of two:

        start = gethrtime();
        for (j = 0; j < iters; j++)
                for (i = 0; i <= 33; i++) {
                        x[i] = ffs(a[i]);
                }
        end = gethrtime();

        printf("Avg ffs time = %lld nsec\n", (end - start) / iters);

        start = gethrtime();
        for (j = 0; j < iters; j++)
                for (i = 0; i <= 33; i++) {
                        y[i] = lowbit(a[i]);
                }
        end = gethrtime();

        printf("Avg lowbit time = %lld nsec\n", (end - start) / iters);

If I just compile with gcc, I get

Avg ffs time = 794 nsec
Avg lowbit time = 111 nsec

but it I compile with gcc -m64 -c -funroll-loops -O2 lowbit_ffs.c, I will notonly get unrolled loops, but gcc will also replace the ffs calls with bsfl


    main+0xc4:              48 89 04 d5 00 21  movq   %rax,0x412100(,%rdx,8)    
<x>
                            41 00
    main+0xcc:              49 63 d4           movslq %r12d,%rdx
    main+0xcf:              0f bc 04 d5 60 1e  bsfl   0x411e60(,%rdx,8),%eax    
<a>
                            41 00

so it's not surprising that calling lowbit will be slower, even if it's assembly- just because of the function call (yes, there's -finline-functions, butthat's not realistic for the kernel).


Avg ffs time = 62 nsec
Avg lowbit time = 82 nsec

The interesting part is really that I see those calls to (inefficient) ffs withdtrace. To get an idea of the amount of function calls, I provided a comparisonwith swtch() calls:

r...@haggis:~# dtrace -n 'fbt::ffs:entry { @ffs=count(); } fbt::swtch:entry {@swtch=count(); }'

dtrace: description 'fbt::ffs:entry ' matched 2 probes
^C

            14430
            32989

Note: I suspect this is Xen specific...

r...@haggis:~# uname -a
SunOS haggis 5.11 snv_111 i86pc i386 i86xpv

I am really just playing with this stuff, and I really don't want to judge onhow much of a difference this would make for the system as a whole, but it seemslike a sensible thing to


 - either replace ffs() with an optimized version or
 - let the compiler optimize ffs()

Would anyone be interested in spending the 30 minutes it takes to integratethis? ;-) [I know it probably needs more effort, yes]


Thanks, Nils
_______________________________________________
opensolaris-code mailing list
opensolaris-code@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code

Re: [osol-code] ffs vs. lowbit

Reply via email to