Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25270

2011-10-14 Thread Rainer Keller
Hi Tim,

in fact I was trying the OR-alternative -- however, it's only a win on older AMD Opterons (16 cycles vs. 20), but cannot beat the __builtin_clz alternative on Intel.

Best regards,
Rainer

On Wednesday 12 October 2011 11:26:52 Tim Mattox wrote:
> All,
> If you wanted to speedup these ro
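For readers unfamiliar with the "OR-alternative", the following is a minimal sketch of one common form of it, assuming it refers to the usual trick of smearing the highest set bit to the right with shifts and ORs and then counting the set bits. The names clz32_or and ones32 are hypothetical and are not taken from the Open MPI source or from r25270.

```c
#include <stdint.h>

/* Bit-parallel population count (number of set bits) of a 32-bit word. */
static unsigned ones32(uint32_t x)
{
    x -= (x >> 1) & 0x55555555u;                       /* 2-bit sums  */
    x = ((x >> 2) & 0x33333333u) + (x & 0x33333333u);  /* 4-bit sums  */
    x = ((x >> 4) + x) & 0x0f0f0f0fu;                  /* 8-bit sums  */
    x += x >> 8;
    x += x >> 16;
    return x & 0x3fu;                                  /* result 0..32 */
}

/* Count leading zeros without __builtin_clz (hypothetical helper).
 * The shift-and-OR steps copy the highest set bit into every lower
 * position, so the popcount equals the index of that bit plus one. */
static unsigned clz32_or(uint32_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return 32u - ones32(x);   /* returns 32 for x == 0 */
}
```

This version is branch-free and well defined for an input of zero, whereas __builtin_clz has an undefined result for zero. Which variant is faster depends on the processor, as the cycle counts above suggest.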

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25270

2011-10-12 Thread Tim Mattox
All,

If you wanted to speed up these routines for processors without __builtin_clz, there are several ways to implement clz efficiently in C. See Hacker's Delight nlz (number of leading zeros): http://www.hackersdelight.org/HDcode/nlz.c.txt Or from my Ph.D. advisor's magic algorithm's pa
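As an illustration of the kind of fallback being suggested, here is a minimal sketch of a binary-search leading-zero count in the style of the Hacker's Delight nlz routines, with the compiler builtin used where it is available. The function names nlz32_portable and nlz32 and the preprocessor test are assumptions made for this example; they are not the code at the URL above and not what Open MPI uses.

```c
#include <stdint.h>

/* Portable number-of-leading-zeros by binary search (hypothetical helper).
 * Each step checks whether the top half of the remaining window is empty
 * and, if so, shifts the value up and adds the skipped width to the count. */
static unsigned nlz32_portable(uint32_t x)
{
    unsigned n = 0;

    if (x == 0) return 32;
    if (x <= 0x0000FFFFu) { n += 16; x <<= 16; }
    if (x <= 0x00FFFFFFu) { n += 8;  x <<= 8;  }
    if (x <= 0x0FFFFFFFu) { n += 4;  x <<= 4;  }
    if (x <= 0x3FFFFFFFu) { n += 2;  x <<= 2;  }
    if (x <= 0x7FFFFFFFu) { n += 1; }
    return n;
}

/* Dispatch: use the builtin on GCC-compatible compilers, else the fallback.
 * __builtin_clz is undefined for 0, so that case is handled explicitly. */
static unsigned nlz32(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return x ? (unsigned)__builtin_clz(x) : 32u;
#else
    return nlz32_portable(x);
#endif
}
```

A real build would more likely decide between the two paths with a configure-time check than with a compiler macro.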