On Wed, Jan 08, 2025 at 05:25:24PM -0500, Andres Freund wrote:
> On 2025-01-08 16:01:19 -0600, Nathan Bossart wrote:
>> But I did retry my test from upthread without pg_stat_statements and was
>> surprised to find a reproducible 4-6% regression.
>
> Uh, huh. I assume this was readonly pgbench with 256 clients just as you had
> tested upthread? I don't think there's any hot spinlock meaningfully involved
> in that workload? A r/w workload is a different story, but upthread you
> mentioned select-only.
>
> Do you see any spinlock in profiles?

Yes, this was using 256 clients. Looking closer, I don't see anything
spinlock related anywhere near the top of perf.

>> I'm not seeing any obvious differences in perf, but I do see that the thread
>> for adding TAS_SPIN() for PPC mentions a regression at lower contention
>> levels [0]. Perhaps the non-locked test is failing often enough to hurt
>> performance in this case... Whatever it is, it'll be mighty frustrating to
>> miss out on a >7x gain because of a 4% regression.
>
> I don't think the explanation can be that simple - even with TAS_SPIN defined,
> we do try to acquire the lock once without using TAS_SPIN:
>
> #if !defined(S_LOCK)
> #define S_LOCK(lock) \
>     (TAS(lock) ? s_lock((lock), __FILE__, __LINE__, __func__) : 0)
> #endif /* S_LOCK */
>
> Only s_lock() then uses TAS_SPIN(lock).

Ah, right. FWIW I tried setting a cap on the number of times we do a
non-locked test, and the results still showed the regression, which seems
to match your intuition here.

> I wonder if you're hitting an extreme case of binary-layout related effects?
> I've never seen them at this magnitude though. I'd suggest using either lld
> or mold as linker and comparing the numbers for a few
> -Wl,--shuffle-sections=$seed seed values.

Will do.

-- 
nathan