Hi Tim,
In fact, I did try the OR alternative. However, it is only a win on older
AMD Opterons (16 cycles vs. 20) and cannot beat the __builtin_clz alternative
on Intel.
Best regards,
Rainer
On Wednesday 12 October 2011 11:26:52 Tim Mattox wrote:
All,
If you want to speed up these routines on processors without __builtin_clz,
there are several ways to implement clz efficiently in plain C.
See Hacker's Delight nlz (number of leading zeros):
http://www.hackersdelight.org/HDcode/nlz.c.txt
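
For illustration, here is a minimal sketch of two portable 32-bit variants. The
first is a binary search in the style of the Hacker's Delight nlz routines. The
second smears the highest set bit to the right with ORs and then counts the set
bits; whether this matches the "OR alternative" measured in this thread is an
assumption. The names nlz_binsearch and nlz_or are hypothetical. Both return 32
for an input of 0, whereas __builtin_clz is undefined for 0.

```c
#include <stdint.h>

/* Binary search for the highest set bit: shift the value left in
 * halving steps and accumulate the number of leading zeros skipped. */
static int nlz_binsearch(uint32_t x)
{
    int n = 0;
    if (x == 0) return 32;
    if (x <= 0x0000FFFFu) { n += 16; x <<= 16; }
    if (x <= 0x00FFFFFFu) { n += 8;  x <<= 8;  }
    if (x <= 0x0FFFFFFFu) { n += 4;  x <<= 4;  }
    if (x <= 0x3FFFFFFFu) { n += 2;  x <<= 2;  }
    if (x <= 0x7FFFFFFFu) { n += 1; }
    return n;
}

/* Branch-free variant: OR the value with shifted copies of itself so
 * that every bit below the highest set bit becomes 1, then count the
 * set bits. The leading-zero count is 32 minus that population count. */
static int nlz_or(uint32_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;

    /* Parallel population count of the smeared value. */
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    x = (x * 0x01010101u) >> 24;

    return 32 - (int)x;
}
```

Which one is faster depends on the target, so both would need to be timed
against __builtin_clz on the machines of interest.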
Or from my Ph.D. advisor's magic algorithm's pa