On Wed, Aug 13, 2025 at 9:42 AM Greg Burd <g...@burd.me> wrote:
> Amazing, thank you. I'll try to replicate your tests tomorrow to see if
> my optimized division and modulo functions do in fact help or not. I
> realize that both you and Anders are (rightly) concerned that the
> performance impact of IDIV on some CPUs can be excessive.

At the risk of posting untested crackpot theories on the internet, I
wonder if there is a way to use a simple boundary condition and
subtraction for this. If you correct overshoot compared to an
advancing-in-strides base value, then I wonder how often you'd finish
up having to actually do that under concurrency. Obviously in general,
implementing modulo with subtraction is a terrible idea, but can you
make it so that the actual cost works out as mostly 0, rarely 1 and
exceedingly rarely more than 1 iterations of the subtraction loop? If
that's true, do the branches somehow kill you?

Assume for now that we're OK with keeping % and / for the infrequent
calls to StrategySyncStart(), or we can redefine the bgwriter's logic
so that it doesn't even need those (perhaps what it really wants to
know is its total distance behind the allocator, so perhaps we can
define that problem away? haven't thought about that yet...). What
I'm wondering out loud is whether the hot ClockSweepTick() code might
be able to use something nearly as dumb as this...

    /* untested pseudocode */
    ticks_base = pg_atomic_read_u64(&StrategyControl->ticks_base);
    ticks = pg_atomic_fetch_add_u64(&StrategyControl->ticks, 1);
    hand = ticks - ticks_base;

    /*
     * Compensate for overshoot.  Expected number of loops: none most of the
     * time, one when we overshoot, and maybe more if the system gets
     * around the whole clock before we see the base value advance.
     */
    while (hand >= NBuffers)
    {
        /* Base value advanced by backend that overshoots by one tick. */
        if (hand == NBuffers)
            pg_atomic_fetch_add_u64(&StrategyControl->ticks_base, NBuffers);
        hand -= NBuffers;
    }