Scott Frazer wrote:
> Hi, we have a Postgres 9.6 setup using replication that has recently started
> seeing a lot of processes stuck in
> "SubtransControlLock" as a wait_event on the read-replicas. Like this, only
> usually about 300-800 of them:
> 179706 | LWLockNamed | SubtransControlLock
> 186602 | LWLockNamed | SubtransControlLock
> 186606 | LWLockNamed | SubtransControlLock
> 180947 | LWLockNamed | SubtransControlLock
> 186621 | LWLockNamed | SubtransControlLock
> The server then begins to crawl, with some queries just never finishing until
> I finally shut the server down.
> Searching for that particular combo of wait_event_type and wait_event only
> seems to turn up the page
> about statistics collection, but no helpful information on troubleshooting
> this lock.
> Restarting the replica server clears the locks and allows us to start working
> again, but it's happened
> twice now in 12 hours and I'm worried it will happen again.
> Does anyone have any advice on where to start looking?
I don't think there is any connection to statistics collection.
This lock is used when subtransactions (SAVEPOINTs in SQL or EXCEPTION blocks
in PL/pgSQL) are created, read or removed.
This sounds like a PostgreSQL bug.
What is the exact PostgreSQL version you are running?
It would be cool if you could get a stack trace from the backend that is
holding the lock.