On Sun, Jan 22, 2023, at 11:06, Dean Rasheed wrote: > Seems like a reasonable idea, with some pretty decent gains. > > Note, however, that for a divisor having fewer than 5 or 6 digits, > it's now significantly slower because it's forced to go through > div_var_int64() instead of div_var_int() for all small divisors. So > the var2ndigits <= 2 case needs to come first.
Can you give a measurable example of when the patch the way it's written is significantly slower for a divisor having fewer than 5 or 6 digits, on some platform? I can't detect any difference at all at my MacBook Pro M1 Max for this example: EXPLAIN ANALYZE SELECT count(numeric_div_volatile(1,3333)) FROM generate_series(1,1e8); I did write the code like you suggest first, but changed it, since I realised the extra "else if" needed could be eliminated, and thought div_var_int64() wouldn't be slower than div_var_int() since I thought 64-bit instructions in general are as fast as 32-bit instructions, on 64-bit platforms. I'm not suggesting your claim is incorrect, I'm just trying to understand and verify it experimentally. > The implementation of div_var_int64() should be in an #ifdef HAVE_INT128 > block. > > In div_var_int64(), s/ULONG_MAX/PG_UINT64_MAX/ OK, thanks, I'll fix, but I'll await your feedback first on the above. /Joel