With that fix applied to REL_17_5, things are working well. Limiting the
search sounds like an improvement too.

As an experiment, I added a log message for when semget in
InternalIpcSemaphoreCreate returns -1. When I ran `pg_ctl init` for this
local build concurrently with `pg_ctl init` from PostgreSQL 15 (or another
version prior to 17), I saw ~8 logged failures when there was contention.
As I increased the concurrency, the maximum number of logged failures came
out at roughly 8 times the concurrency. On my machine, then, it would take a
concurrency of about 125 `pg_ctl init` runs (8 × 125 = 1000) to even begin
exceeding the maximum of 1000 retries, and that is the worst case. That
sounds high enough.

Then I thought: I'm only seeing the log from one of those instances, yet
they're all going through the same search for free semaphore sets. That's a
few system calls going to waste. Maybe not important in the big picture,
but it gave me the idea of left-shifting nextSemaKey in PGReserveSemaphores,
i.e. `nextSemaKey = statbuf.st_ino << 4`, to give each pg_ctl process a few
guaranteed uncontested keys (a block of 16 keys per inode, uncontested at
least among those processes themselves). In a small test this eliminated
the contention for semaphore sets caused by concurrency. It is more of an
optimisation than a bug fix, though.
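To make the effect of the shift concrete, here is a small, self-contained C
simulation of the key search. It is a sketch under stated assumptions, not
PostgreSQL code: try_create is a hypothetical stand-in for semget with
IPC_CREAT | IPC_EXCL over an in-memory table; simulate, MAX_KEYS and
MAX_PROCS are made up for illustration; each process is assumed to need the
same fixed number of sets; processes are assumed to take turns creating one
set at a time; and retries are unbounded rather than capped at 1000.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_KEYS  4096 /* size of the simulated key space (hypothetical) */
#define MAX_PROCS 8    /* max simulated concurrent processes (hypothetical) */

/* Simulated IPC table: key_taken[k] is true once a set exists for key k. */
static bool key_taken[MAX_KEYS];

/*
 * Hypothetical stand-in for semget(key, n, IPC_CREAT | IPC_EXCL):
 * succeeds only if no set exists for the key yet, and claims it.
 */
static bool try_create(long key)
{
    assert(key >= 0 && key < MAX_KEYS); /* caller must keep keys in range */
    if (key_taken[key])
        return false;
    key_taken[key] = true;
    return true;
}

/*
 * Simulate nprocs processes that each need nsets semaphore sets and start
 * searching at inode[i] << shift. In each round every process creates one
 * set, so their searches interleave. Returns the total number of failed
 * create attempts across all processes.
 */
static int simulate(const long *inode, int nprocs, int nsets, int shift)
{
    long next_key[MAX_PROCS];
    int failures = 0;

    assert(nprocs > 0 && nprocs <= MAX_PROCS);
    memset(key_taken, 0, sizeof(key_taken)); /* start with an empty table */
    for (int i = 0; i < nprocs; i++)
        next_key[i] = inode[i] << shift;

    for (int round = 0; round < nsets; round++) {
        for (int i = 0; i < nprocs; i++) {
            /* On a collision, move on to the next key and retry. */
            while (!try_create(next_key[i]++))
                failures++;
        }
    }
    return failures;
}
```

With inodes 100, 101 and 102 and 8 sets each, simulate(inodes, 3, 8, 0)
counts 42 failed attempts, while simulate(inodes, 3, 8, 4) counts none. The
shift only helps while each instance needs at most 16 sets; beyond that, an
instance walks into the next inode's block and the collisions come back.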

Gavin
