Hi David,

On Tue, Jan 14, 2020 at 9:36 AM David Fetter <da...@fetter.org> wrote:
>
> Folks,
>
> The recent patch for distinct windowing aggregates contained a partial
> fix of the FIXME that didn't seem entirely right, so I extracted that
> part, changed it to use compiler intrinsics, and submit it here.
The changes in hash AM and SIMPLEHASH do look like a net positive
improvement. My biggest cringe might be in pg_bitutils:

> diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
> index 498e532308..cc9338da25 100644
> --- a/src/include/port/pg_bitutils.h
> +++ b/src/include/port/pg_bitutils.h
> @@ -145,4 +145,32 @@ pg_rotate_right32(uint32 word, int n)
>  	return (word >> n) | (word << (sizeof(word) * BITS_PER_BYTE - n));
>  }
>
> +/* ceil(lg2(num)) */
> +static inline uint32
> +ceil_log2_32(uint32 num)
> +{
> +	return pg_leftmost_one_pos32(num-1) + 1;
> +}
> +
> +static inline uint64
> +ceil_log2_64(uint64 num)
> +{
> +	return pg_leftmost_one_pos64(num-1) + 1;
> +}
> +
> +/* calculate first power of 2 >= num
> + * per https://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2
> + * using BSR where available */
> +static inline uint32
> +next_power_of_2_32(uint32 num)
> +{
> +	return ((uint32) 1) << (pg_leftmost_one_pos32(num-1) + 1);
> +}
> +
> +static inline uint64
> +next_power_of_2_64(uint64 num)
> +{
> +	return ((uint64) 1) << (pg_leftmost_one_pos64(num-1) + 1);
> +}
> +
>  #endif							/* PG_BITUTILS_H */

1. Is ceil_log2_64 dead code?

2. The new utilities added here (ceil_log2_32 and company,
next_power_of_2_32 and company) all require num > 1, but don't clearly
Assert (or at the very least document) that precondition: with num == 1
they hand pg_leftmost_one_pos32/64 a zero, for which those functions
are undefined.

3. A couple of the callers can actively pass in an argument of 1,
e.g. _hash_spareindex in hashutil.c, while some other callers are iffy
at best (simplehash.h, maybe?).

4. It seems like you *really* would like an operation like LZCNT on
x86 (first appearing in Haswell) that is well defined on zero input.
ISTM the alternatives are:

a) Special-case 1. That seems straightforward, but branching on a
seemingly unlikely condition on every call looks like a lot of
performance to give up.

b) Use an architecture-specific intrinsic (possibly with CPUID
shenanigans), like __builtin_ia32_lzcnt_u64 on x86, and use the CLZ
intrinsic elsewhere. The GCC CLZ intrinsic seems to map to
instructions that are well defined on zero in most ISAs other than
x86, so maybe we can get away with special-casing x86? (Rough sketch
in the PS below.)

Cheers,
Jesse
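
PS: To make (b) concrete, here is a rough, untested sketch of the kind
of zero-safe wrapper I have in mind. pg_clz64_safe is a name I just
made up, and the compile-time __LZCNT__ check is only a stand-in for
the runtime CPUID dispatch we'd realistically need:

/*
 * Sketch only (untested): count leading zeros, defined on zero input.
 */
static inline int
pg_clz64_safe(uint64 num)
{
#if defined(__LZCNT__)
	/* LZCNT is architecturally defined to return 64 for a zero input */
	return (int) __builtin_ia32_lzcnt_u64(num);
#else
	/*
	 * __builtin_clzll is formally undefined on zero, so special-case it;
	 * on most non-x86 ISAs the underlying CLZ instruction returns the
	 * operand width anyway, and the compiler can often fold the branch.
	 */
	return num == 0 ? 64 : __builtin_clzll(num);
#endif
}

/* ceil(log2(num)); with the wrapper above, valid for all num >= 1 */
static inline uint64
ceil_log2_64(uint64 num)
{
	Assert(num > 0);
	return 64 - pg_clz64_safe(num - 1);
}

With that, ceil_log2_64(1) comes out as 0 (64 - pg_clz64_safe(0)
= 64 - 64), and next_power_of_2_64 could shift by the same quantity
without any branch in the common path.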