Hi,

On 2025-01-08 16:01:19 -0600, Nathan Bossart wrote:
> On Wed, Jan 08, 2025 at 03:07:24PM -0500, Andres Freund wrote:
> > Out of curiosity, have you measured whether this has a positive effect
> > without pg_stat_statements? I think it'll e.g. also affect lwlocks, as
> > they also use perform_spin_delay().
>
> AFAICT TAS_SPIN() is only used for s_lock(), which doesn't appear to be
> used by LWLocks.

Brainfart on my part, sorry. I was thinking of SPIN_DELAY() for a moment...

> But I did retry my test from upthread without pg_stat_statements and was
> surprised to find a reproducible 4-6% regression.

Uh, huh. I assume this was readonly pgbench with 256 clients just as you had
tested upthread? I don't think there's any hot spinlock meaningfully involved
in that workload? A r/w workload is a different story, but upthread you
mentioned select-only.

Do you see any spinlock in profiles?

> I'm not seeing any obvious differences in perf, but I do see that the thread
> for adding TAS_SPIN() for PPC mentions a regression at lower contention
> levels [0]. Perhaps the non-locked test is failing often enough to hurt
> performance in this case... Whatever it is, it'll be mighty frustrating to
> miss out on a >7x gain because of a 4% regression.

I don't think the explanation can be that simple - even with TAS_SPIN defined,
we do try to acquire the lock once without using TAS_SPIN:

#if !defined(S_LOCK)
#define S_LOCK(lock) \
	(TAS(lock) ? s_lock((lock), __FILE__, __LINE__, __func__) : 0)
#endif	 /* S_LOCK */

Only s_lock() then uses TAS_SPIN(lock).

I wonder if you're hitting an extreme case of binary-layout related effects?
I've never seen them at this magnitude though. I'd suggest using either lld
or mold as linker and comparing the numbers for a few
-Wl,--shuffle-sections=$seed seed values.

Greetings,

Andres Freund
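
PS: To make the fast-path/slow-path split concrete, here is a rough
standalone sketch using C11 atomics. It is not the code in s_lock.h; every
demo_* name is made up for illustration, and the real s_lock() also does
backoff and stuck-lock detection, which is left out here. The only point is
that the first attempt is always the locked exchange, and the non-locked
pre-test happens only once we are already spinning.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for slock_t; 0 = free, 1 = held. */
typedef atomic_int demo_slock_t;

/* Locked test-and-set: returns nonzero if the lock was already held. */
static inline int
demo_tas(demo_slock_t *lock)
{
	return atomic_exchange_explicit(lock, 1, memory_order_acquire);
}

/*
 * Spin variant: do a plain (non-locked) read first and only attempt the
 * atomic exchange if the lock looks free. Returns nonzero on failure.
 */
static inline int
demo_tas_spin(demo_slock_t *lock)
{
	return atomic_load_explicit(lock, memory_order_relaxed) != 0 ||
		demo_tas(lock);
}

/* Slow path: retry with the spin variant; returns the number of retries. */
static inline int
demo_s_lock(demo_slock_t *lock)
{
	int			spins = 0;

	while (demo_tas_spin(lock))
		spins++;
	return spins;
}

/* Fast path: one plain demo_tas(), slow path only if that fails. */
static inline int
demo_lock(demo_slock_t *lock)
{
	return demo_tas(lock) ? demo_s_lock(lock) : 0;
}

static inline void
demo_unlock(demo_slock_t *lock)
{
	/* Releasing a lock that is not held would be a caller bug. */
	assert(atomic_load_explicit(lock, memory_order_relaxed) != 0);
	atomic_store_explicit(lock, 0, memory_order_release);
}
```

In the uncontended case demo_lock() never reaches demo_tas_spin(), so under
this model the extra non-locked read costs nothing unless the first exchange
has already failed.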