Re: [PATCH] count-leading-zeros: new module
Hi Eric, I needed gcc's clz to determine the most significant bit of a number This new module is redundant w.r.t. the 'integer_length', 'integer_length_l', 'integer_length_ll' modules that are in gnulib already since last year: For x != 0: count_leading_zeros (x) = sizeof x * CHAR_BIT - integer_length (x). Personally I prefer integer_length over count_leading_zeros, because the value of integer_length does not depend on the type size: integer_length_ll ((unsigned long long) x) == integer_length (x) whereas count_leading_zeros_ll ((unsigned long long) x) == 32 + count_leading_zeros (x), count_leading_zeros is mostly a helper routine for people who implement multiprecision division or square root algorithms. Ondřej Bílka wrote: Currently fastest way is convert to float and extract exponent. This is always log2(n) or log2(n)+1 which can be easily repaired. Yes, and lib/integer_length.c already implements this trick. Eric Blake wrote: Unfortunately, your implementation is not portable. Gnulib can't afford to make assumptions about the layout of a floating point number, even if it holds true for most platforms with IEEE doubles. The code in lib/integer_length.c uses the layout information of floating- point numbers, after having verified that these assumptions hold (see m4/exponentd.m4). +#define TEST_COUNT_LEADING_ZEROS(FUNC, TYPE, BITS, MAX, ONE)\ + ASSERT (FUNC (0) == BITS);\ + for (i = 0; i BITS; i++)\ +{ \ + ASSERT (FUNC (ONE i) == BITS - i - 1); \ + for (j = 0; j i; j++) \ +ASSERT (FUNC ((ONE i) | (ONE j)) == BITS - i - 1);\ +} \ + for (i = 0; i 1000; i++)\ +{ \ + TYPE value = rand () ^ (rand () 31 1); \ + int count = 0;\ + for (j = 0; j BITS; j++)\ +if (value (ONE j)) \ + count = BITS - j - 1; \ + ASSERT (count == FUNC (value)); \ +} \ + ASSERT (FUNC (MAX) == 0); I try to avoid such multi-line macros, because they are hard to single-step under gdb. Better move this code to a separate .h file that can be #included 3 times. Bruno
Re: [PATCH] count-leading-zeros: new module
Eric Blake wrote: I needed gcc's clz to determine the most significant bit of a number (useful for things like truncating to a power of 2), and was surprised it is not a standardized function (the opposite direction of finding the least significant bit is ... +/* Expand the code which computes the number of leading zeros of the local + variable 'x' of type TYPE (an unsigned integer type) and returns it + from the current function. */ +#if 0 __GNUC__ 3 || (__GNUC__ == 3 __GNUC_MINOR__ = 4) +# define COUNT_LEADING_ZEROS(BUILTIN, TYPE) \ + return x ? BUILTIN (x) : CHAR_BIT * sizeof x; +#else +# define COUNT_LEADING_ZEROS(BUILTIN, TYPE) \ + /* This condition is written so as to avoid shifting by more than \ + 31 bits at once, and also avoids a random HP-UX cc bug. */\ + verify (((TYPE) -1 31 31 2) == 0); /* TYPE has at most 64 bits */ \ + int count = 0;\ + if (1 (TYPE) -1 31) { /* TYPE has more than 32 bits? */ \ +count = count_leading_zeros_32 (x 31 1); \ +if (count 32) \ + return count; \ + } \ + return count + count_leading_zeros_32 (x); + +/* Compute and return the number of leading zeros in the least + significant 32 bits of X. */ +static inline int +count_leading_zeros_32 (unsigned int x) +{ + int count = 0; + if (x 0xU) +x = 16; + else +count += 16; + if (x 0xff00) +x = 8; + else +count += 8; + if (x 0xf0) +x = 4; + else +count += 4; + if (x 0xc) +x = 2; + else +count += 2; + if (x 2) +x = 1; + else +count++; + if (!(x 1)) +count++; + return count; +} +#endif Hi Eric, Did you consider using a variant of the following? It looks like it would be more efficient. [From http://graphics.stanford.edu/~seander/bithacks.html] Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup uint32_t v; // find the log base 2 of 32-bit v int r; // result goes here static const int MultiplyDeBruijnBitPosition[32] = { 0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, 8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31 }; v |= v 1; // first round down[sic*] to one less than a power of 2 v |= v 2; v |= v 4; v |= v 8; v |= v 16; r = MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) 27]; The code above computes the log base 2 of a 32-bit integer with a small table lookup and multiply. It requires only 13 operations, compared to (up to) 20 for the previous method. The purely table-based method requires the fewest operations, but this offers a reasonable compromise between table size and speed. [*] I've just reported the error in that comment: s/down/up/
Re: [PATCH] count-leading-zeros: new module
On Sat, Aug 11, 2012 at 09:45:16AM +0200, Jim Meyering wrote: Eric Blake wrote: I needed gcc's clz to determine the most significant bit of a number (useful for things like truncating to a power of 2), and was surprised it is not a standardized function (the opposite direction of finding the least significant bit is ... +/* Expand the code which computes the number of leading zeros of the local + variable 'x' of type TYPE (an unsigned integer type) and returns it + from the current function. */ +#if 0 __GNUC__ 3 || (__GNUC__ == 3 __GNUC_MINOR__ = 4) +# define COUNT_LEADING_ZEROS(BUILTIN, TYPE) \ + return x ? BUILTIN (x) : CHAR_BIT * sizeof x; +#else +# define COUNT_LEADING_ZEROS(BUILTIN, TYPE) \ + /* This condition is written so as to avoid shifting by more than \ + 31 bits at once, and also avoids a random HP-UX cc bug. */\ + verify (((TYPE) -1 31 31 2) == 0); /* TYPE has at most 64 bits */ \ + int count = 0;\ + if (1 (TYPE) -1 31) { /* TYPE has more than 32 bits? */ \ +count = count_leading_zeros_32 (x 31 1); \ +if (count 32) \ + return count; \ + } \ + return count + count_leading_zeros_32 (x); + +/* Compute and return the number of leading zeros in the least + significant 32 bits of X. */ +static inline int +count_leading_zeros_32 (unsigned int x) +{ + int count = 0; + if (x 0xU) +x = 16; + else +count += 16; + if (x 0xff00) +x = 8; + else +count += 8; + if (x 0xf0) +x = 4; + else +count += 4; + if (x 0xc) +x = 2; + else +count += 2; + if (x 2) +x = 1; + else +count++; + if (!(x 1)) +count++; + return count; +} +#endif Hi Eric, Did you consider using a variant of the following? It looks like it would be more efficient. [From http://graphics.stanford.edu/~seander/bithacks.html] Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup uint32_t v; // find the log base 2 of 32-bit v int r; // result goes here static const int MultiplyDeBruijnBitPosition[32] = { 0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, 8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31 }; v |= v 1; // first round down[sic*] to one less than a power of 2 v |= v 2; v |= v 4; v |= v 8; v |= v 16; r = MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) 27]; The code above computes the log base 2 of a 32-bit integer with a small table lookup and multiply. It requires only 13 operations, compared to (up to) 20 for the previous method. The purely table-based method requires the fewest operations, but this offers a reasonable compromise between table size and speed. [*] I've just reported the error in that comment: s/down/up/ Hi, Currently fastest way is convert to float and extract exponent. This is always log2(n) or log2(n)+1 which can be easily repaired. Here is implementation: #include stdint.h inline int clz64(int64_t x){ if (x0) return 0; union convert {uint64_t i;double f;} un; un.f=(double) x; int64_t ret= (un.i (64-12)) - 1023; return 63-(ret - 1 + (xret)); } inline int clz32(int32_t x){ if (x0) return 0; union convert {uint32_t i;float f;} un; un.f=(float) x; int ret=(un.i (32-9)) - 127; return 31-(ret - 1 + (xret)); } inline int clz32_alt(uint32_t x){ union convert {uint64_t i;double f;} un; un.f=(double) x; int ret =(un.i (64-12)) - 1023; return 31 - ret; } gcc messes this code a bit, it generates following code: cvtsi2sdq %rdi, %xmm0 movsd %xmm0, -8(%rsp) movq -8(%rsp), %rcx -- the printer thinks its a router.
Re: [PATCH] count-leading-zeros: new module
On 08/11/2012 04:23 AM, Ondřej Bílka wrote: Did you consider using a variant of the following? It looks like it would be more efficient. The fallback code is only for non-gcc compilation in the first place, but yes, the de Bruijn table lookup does seem nicer; I'll switch over to using it. Currently fastest way is convert to float and extract exponent. This is always log2(n) or log2(n)+1 which can be easily repaired. Unfortunately, your implementation is not portable. Gnulib can't afford to make assumptions about the layout of a floating point number, even if it holds true for most platforms with IEEE doubles. -- Eric Blake ebl...@redhat.com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [PATCH] count-leading-zeros: new module
On 08/11/2012 01:45 AM, Jim Meyering wrote: Eric Blake wrote: I needed gcc's clz to determine the most significant bit of a number (useful for things like truncating to a power of 2), and was surprised it is not a standardized function Did you consider using a variant of the following? It looks like it would be more efficient. [From http://graphics.stanford.edu/~seander/bithacks.html] Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup Pushed as follows: From d22f151db0e5217913ec1667a3b4e354dd88a973 Mon Sep 17 00:00:00 2001 From: Eric Blake ebl...@redhat.com Date: Sat, 11 Aug 2012 07:34:00 -0600 Subject: [PATCH] count-leading-zeros: use a lookup table on non-gcc compilers While this only affects non-gcc compilers, we can assume that lookups are faster than conditionals, even if it results in a slightly larger executable size. * lib/count-leading-zeros.h (count_leading_zeros_32): Use an alternate implementation, suggested by Jim Meyering. Signed-off-by: Eric Blake ebl...@redhat.com --- ChangeLog | 6 ++ lib/count-leading-zeros.h | 41 - 2 files changed, 22 insertions(+), 25 deletions(-) diff --git a/ChangeLog b/ChangeLog index 80d1b61..2b7090b 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,9 @@ +2012-08-11 Eric Blake ebl...@redhat.com + + count-leading-zeros: use a lookup table on non-gcc compilers + * lib/count-leading-zeros.h (count_leading_zeros_32): Use an + alternate implementation, suggested by Jim Meyering. + 2012-08-10 Eric Blake ebl...@redhat.com count-leading-zeros: new module diff --git a/lib/count-leading-zeros.h b/lib/count-leading-zeros.h index 4d7c291..cbb3264 100644 --- a/lib/count-leading-zeros.h +++ b/lib/count-leading-zeros.h @@ -26,7 +26,7 @@ /* Expand the code which computes the number of leading zeros of the local variable 'x' of type TYPE (an unsigned integer type) and returns it from the current function. */ -#if 0 __GNUC__ 3 || (__GNUC__ == 3 __GNUC_MINOR__ = 4) +#if __GNUC__ 3 || (__GNUC__ == 3 __GNUC_MINOR__ = 4) # define COUNT_LEADING_ZEROS(BUILTIN, TYPE) \ return x ? BUILTIN (x) : CHAR_BIT * sizeof x; #else @@ -47,30 +47,21 @@ static inline int count_leading_zeros_32 (unsigned int x) { - int count = 0; - if (x 0xU) -x = 16; - else -count += 16; - if (x 0xff00) -x = 8; - else -count += 8; - if (x 0xf0) -x = 4; - else -count += 4; - if (x 0xc) -x = 2; - else -count += 2; - if (x 2) -x = 1; - else -count++; - if (!(x 1)) -count++; - return count; + /* http://graphics.stanford.edu/~seander/bithacks.html */ + static const char deBruijnLookup[32] = { +0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, +8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31 + }; + + x = 0xU; + if (!x) +return 32; + x |= x 1; + x |= x 2; + x |= x 4; + x |= x 8; + x |= x 16; + return 31 - deBruijnLookup[(x * 0x07c4acddU) 27]; } #endif -- 1.7.11.2 -- Eric Blake ebl...@redhat.com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [PATCH] count-leading-zeros: new module
Eric Blake ebl...@redhat.com writes: +/* Expand the code which computes the number of leading zeros of the local + variable 'x' of type TYPE (an unsigned integer type) and returns it + from the current function. */ +#if 0 __GNUC__ 3 || (__GNUC__ == 3 __GNUC_MINOR__ = 4) Do you really want 0 there?