On Mon, Sep 16, 2024 at 4:19 PM Tomas Vondra <to...@vondra.me> wrote:
>
> On 9/16/24 15:11, Jakub Wartak wrote:
> > On Fri, Sep 13, 2024 at 1:45 AM Tomas Vondra <to...@vondra.me> wrote:
> >
> >> [..]
> >
> >> Anyway, at this point I'm quite happy with this improvement. I didn't
> >> have any clear plan when to commit this, but I'm considering doing so
> >> sometime next week, unless someone objects or asks for some additional
> >> benchmarks etc.
> >
> > Thank you very much for working on this :)
> >
> > The only fact that comes to my mind is that we could blow up L2
> > caches. Fun fact, so if we are growing PGPROC by 6.3x, that's going to
> > be like one or two 2MB huge pages more @ common max_connections=1000
> > x86_64 (830kB -> ~5.1MB), and indeed: [..]
> > then maybe(?) one could observe further degradation of dTLB misses in
> > the perf-stat counter under some microbenchmark, but measuring that
> > requires isolated and physical hardware. Maybe that would be actually
> > noise due to overhead of context-switches itself. Just trying to think
> > out loud, what big PGPROC could cause here. But this is already an
> > unhealthy and non-steady state of the system, so IMHO we are good,
> > unless someone comes up with a better (more evil) idea.
> >
>
> I've been thinking about such cases too, but I don't think it can really
> happen in practice, because:
>
> - How likely is it that the sessions will need a lot of OIDs, but not
> the same ones? Also, why would it matter that the OIDs are not the same,
> I don't think it matters unless one of the sessions needs an exclusive
> lock, at which point the optimization doesn't really matter.
>
> - If having more fast-path slots means it doesn't fit into L2 cache,
> would we fit into L2 without it? I don't think so - if there really are
> that many locks, we'd have to add those into the shared lock table, and
> there's a lot of extra stuff to keep in memory (relcaches, ...).
>
> This is pretty much one of the cases I focused on in my benchmarking,
> and I'm yet to see any regression.

Sorry for answering this so late. Just for context: I was imagining a
scenario with high max_connections and e.g. schema-based multi-tenancy
without partitioning (so everything would be fine without this
$thread/commit, as fewer than 16 (fast-path) locks would be taken per
backend). The OIDs need to be different to avoid contention, so that
futex() stays in its user-space part and does not actually end up in a
syscall. My theory was that a much smaller PGPROC should cause far fewer
(data) cache-line fetches than with the patch: the hash() % prime lookup
hits various parts of a larger array, so without the patch it should be
quicker, as it wouldn't be randomly hitting some larger array[]. But it
might be just noise, as you state. It was a theoretical attempt at
crafting the worst possible conditions for the patch, so feel free to
disregard it, as it already assumes an anti-pattern (big and all-active
max_connections).
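To make the cache-line point a bit more concrete, here is a tiny
stand-alone sketch of what I had in mind. It is not the actual PostgreSQL
fast-path code: the group count (64), the 16 slots per group and the
49157 multiplier are just assumptions for the illustration.

    /*
     * Illustration only, not the real fast-path code: assume each backend
     * has "ngroups" groups of 16 fast-path slots, and a relation OID is
     * mapped to a group by a hash-like multiplication.  The multiplier
     * (49157) and the group count (64) are made up for this sketch.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define SLOTS_PER_GROUP 16

    static unsigned int
    fast_path_group(unsigned int relid, unsigned int ngroups)
    {
        /* "hash() % prime"-style mapping onto one of the groups */
        return (unsigned int) (((uint64_t) relid * 49157) % ngroups);
    }

    int
    main(void)
    {
        unsigned int ngroups = 64;  /* assumed group count, for illustration */

        /*
         * A few consecutive relation OIDs land in quite different groups,
         * i.e. in quite different cache lines of the per-backend slot array.
         */
        for (unsigned int relid = 16384; relid < 16390; relid++)
        {
            unsigned int group = fast_path_group(relid, ngroups);

            printf("relid %u -> group %2u (slots %4u..%4u)\n",
                   relid, group,
                   group * SLOTS_PER_GROUP,
                   group * SLOTS_PER_GROUP + SLOTS_PER_GROUP - 1);
        }

        return 0;
    }

With a single 16-slot array (the old scheme) everything a backend touches
here fits in a cache line or two; with something like 64 groups the same
handful of locks can land in slots spread over a few KB of PGPROC, which
was the (purely theoretical) extra-cache-line-fetches worry above.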
> > Well the only thing I could think of was to add to the
> > doc/src/sgml/config.sgml / "max_locks_per_transaction" GUC, that "it
> > is also used as advisory for the number of groups used in
> > lockmanager's fast-path implementation" (that is, without going into
> > further discussion, as even pg_locks discussion
> > doc/src/sgml/system-views.sgml simply uses that term).
> >
>
> Thanks, I'll consider mentioning this in max_locks_per_transaction.
> Also, I think there's a place calculating the amount of per-connection
> memory, so maybe that needs to be updated too.
>

I couldn't find that place in the current versions, but maybe this is
helpful/reaffirming:

- Up to 9.2 the docs used exact formulas, see
  "(1800 + 270 * max_locks_per_transaction) * max_connections" [1];
  that's long gone now.
- If anything, Andres might want to improve his blog entry [2] a little
  (my take is that it seems to be the most accurate and authoritative
  technical information that we have online).

-J.

[1] - https://www.postgresql.org/docs/9.2/kernel-resources.html
[2] - https://blog.anarazel.de/2020/10/07/measuring-the-memory-overhead-of-a-postgres-connection/
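PS: As a quick sanity check of that old 9.2 formula, with e.g.
max_locks_per_transaction = 64 and max_connections = 100 (the defaults
back then, IIRC) it works out to (1800 + 270 * 64) * 100 = 19080 * 100
bytes, i.e. roughly 1.9 MB of shared memory for the lock tables.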