Hi,

Not sure if this can be of any interest, but I noticed the ffs function is 
implemented using a loop.

A faster implementation exists using the "pcount" assembler instruction (where 
available and fast - this does not seem to be the case on all processors), or 
through another algorithm using only a few instructions.

Check out: http://supertech.csail.mit.edu/papers/debruijn.pdf

For a 32-bit word, the optimized function could be written this way:

int
ffs(int field)
{
        static const int index[] = { 1, 2, 29, 3, 30, 15, 25, 4, 31, 23, 21, 
16, 26, 18, 5, 9, 32, 28, 14, 24, 22, 20, 17, 8, 27, 13, 19, 7, 12, 6, 11, 10 };
        unsigned int w = field;
        if (w == 0)
                return (0);
        w &= -w;
        w *= 125613361U;
        w >>= 27;
        return index[w];
}

Regards,
Olivier.
This message posted from opensolaris.org
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org

Reply via email to