This morning with our production database I began receiving reports
of the database being "down".
I checked the log and was surprised to see extremely long durations for a
LISTEN that happens after each connection is made by our database library.
This coincided with many(approx 600) new connections happening in a short time
window due to render nodes automatically being turned on when the first job of
the morning was submitted(idle nodes are turned off to save power). My
initial hunch was that there was some code in postgres that resulted
exponential execution time if enough listens on new connections happened at
the same time. As I was trying to gather more information the listen times
began to decrease and after about 20 minutes things were back to normal.
A few hours later the symptoms returned but this time the listen was taking
upwards of 15 minutes. I did some more reading and checked the pg_notify
directory and found that there were 49 files. Having never checked that
directory before I wasn't sure if that was normal. A short time later I
noticed there was a process sitting idle with an aborted transaction. After
killing the process things quickly returned to normal.
After doing a search for "idle in transaction (aborted)" I came upon
While this is a potential solution for dealing with the problem it seems that
the postgresql developers have decided to let connections stay in the "idle in
transaction (aborted)" state for a reason, most likely under the assumption
that it's relatively safe and only eats up the resources of a single
connection. However it's easy to demonstrate that doing:
select bad_column from non_existant_table;
...will eventually cause a denial of service situation if the DBA hasn't setup
guards against connection sitting idle in an aborted transaction.
Looking at the code in src/backend/commands/async.c I think there are a couple
ways to eliminate this problem.
1. When a connection issues it's first LISTEN command, in Exec_ListenPreCommit
QUEUE_BACKEND_POS(MyBackendId) = QUEUE_TAIL;
this causes the connection to iterate through every notify queued in the slru,
even though at that point I believe the connection can safely ignore any
notifications from transactions that are already committed, and if I
understand correctly notifications aren't put into the slru until precommit,
so wouldn't it be safe to do:
QUEUE_BACKEND_POS(MyBackendId) = QUEUE_HEAD;
inside Async_Listen? If that's not safe, then could a new member be added to
AsyncQueueControl that points to the first uncommitted QueuePosition (wouldn't
have to be kept completely up to date).
This would solve the problem of slow initial LISTEN in the face of a growing
2. Would it be possible when a backend is signaled to check if it is idle
inside an aborted transaction, and if so process the notifications and put any
that match what the backend is listening on into a local list. This would
allow the slru to be cleaned up. In practice I think most notifications would
either be disregarded or combined with a duplicate, so the list would most
likely end up staying very small. If the backend local list does grow too
large then the connection could be killed or handled in some other appropriate
I am happy to attempt coming up with a patch if the ideas are deemed
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: