Today I had an emergency production outage on a server. This
particular server was running 9.5.2. The symptoms were interesting
so I thought I'd report. Here is what I saw:
*) User CPU was pegged at 100%
*) Queries reading data would block and not respond to cancel or terminate
*) pg_stat_activity reported no waiting queries (but worked fine otherwise).
Adding all this up, it smells like processes were getting stuck on a spinlock.
Connections quickly got eaten up and the situation was desperately urgent,
so I punted and did an immediate restart, and things came back
normally. I had a console to the database and did manage to grab
contents of pg_stat_activity and noticed several trivial queries were
running normally (according to pg_stat_activity) but were otherwise
stuck. Attempting to run one of them myself, I noted the query got stuck
and did not cancel. I was in a terrible rush, but am casting around
for things to capture in case that happens again -- 'perf top' would
be a natural choice, I guess.
Three autovacuum processes were running. I'm obviously going to apply the
bugfix upgrade, but was wondering if anybody has seen anything like this.
This particular server was upgraded to 9.5 somewhat recently but ran
on 9.2 for years with no issues.
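
As a rough sketch of what could be captured next time before restarting,
the Python helper below just prints a stack-trace command and a profiling
command for each stuck backend. None of this comes from the incident itself:
the function name diagnostic_commands, the 10-second sampling window, and
the perf-<pid>.data file names are made up for illustration. It assumes gdb
and perf are installed on the host and that the PIDs are taken from the pid
column of pg_stat_activity.

```python
import shlex


def diagnostic_commands(pids, seconds=10):
    """Build commands to collect evidence from stuck backends.

    Hypothetical helper: nothing is executed here, the commands are
    only assembled so they can be reviewed and pasted into a shell.
    """
    commands = []
    for pid in pids:
        # One-shot stack trace of the backend; shows where it is spinning.
        commands.append(["gdb", "-p", str(pid), "-batch", "-ex", "bt"])
        # Sampled CPU profile with call graphs, limited to `seconds`.
        commands.append([
            "perf", "record", "-g", "-p", str(pid),
            "-o", f"perf-{pid}.data", "--", "sleep", str(seconds),
        ])
    return commands


if __name__ == "__main__":
    # 12345 is a placeholder PID, not one from the outage.
    for cmd in diagnostic_commands([12345]):
        print(shlex.join(cmd))
```

Printing rather than running the commands is deliberate: during an outage
it is probably safer to eyeball each command before attaching a debugger to
a production backend.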
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)