Yet another story about AIX. For some reason, AIX is very slow at cleaning
up zombie processes.
If we launch pgbench with the -C parameter, the limit on the maximum
number of connections is exhausted very quickly.
If the maximum number of connections is set to 1000, then after ten seconds
of pgbench activity we get about 900 zombie processes, and it takes about
100 seconds (!) before all of them are terminated.
proctree shows a lot of defunct processes:
[14:44:41]root@postgres:~ # proctree 26084446
26084446 /opt/postgresql/xlc/9.6/bin/postgres -D /postg_fs/postgresql/xlc
13893826 postgres: wal writer process
But ps shows that the status of the process is <exiting>:
[14:46:02]root@postgres:~ # ps -elk | grep 25691556
* A - 25691556 - - - - - <exiting>
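
To watch how quickly the backlog drains, the number of such processes can be sampled over time. The following is a minimal sketch, not something we actually ran: the helper name count_marked is hypothetical, and it assumes the state marker (<exiting>, or <defunct> for ordinary zombies) is the last field of each ps line, as in the output above.

```python
def count_marked(ps_output, markers=("<exiting>", "<defunct>")):
    """Count lines of ps output whose last field is one of the markers.

    Assumption: the marker is the final whitespace-separated field,
    as in the `ps -elk` line quoted above.
    """
    count = 0
    for line in ps_output.splitlines():
        fields = line.split()
        if fields and fields[-1] in markers:
            count += 1
    return count
```

Feeding it the text produced by `ps -elk` once a second would give a rough curve of how the roughly 900 processes drain over the 100 seconds.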
A breakpoint set in the reaper() function in the postmaster shows that each
invocation of this function (called by the SIGCHLD handler) processes 5-10
PIDs.
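
Seeing several PIDs per invocation is expected by itself: pending SIGCHLD signals are coalesced, so a handler normally loops over waitpid() until no exited child is left. The sketch below illustrates that general pattern in Python; it is not the postmaster code, and reap_demo and its parameters are hypothetical names used only for this illustration.

```python
import os
import signal
import time


def reap_demo(n_children, timeout=10.0):
    """Fork n_children short-lived children and reap them from a SIGCHLD handler.

    Returns a list with the number of children reaped by each handler
    invocation. Must be called from the main thread.
    """
    batches = []

    def on_sigchld(signum, frame):
        reaped = 0
        while True:
            try:
                pid, _status = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:
                break  # no children left at all
            if pid == 0:
                break  # children exist, but none has exited yet
            reaped += 1
        batches.append(reaped)

    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        for _ in range(n_children):
            if os.fork() == 0:
                os._exit(0)  # child: exit immediately
        deadline = time.monotonic() + timeout
        while sum(batches) < n_children and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        # Restore the earlier handler; fall back to the default if it
        # was not installed from Python.
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)
    return batches
```

The batch sizes returned by reap_demo(1000) show how many exits each signal delivery covered, which could be compared between AIX and another platform.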
So there are two hypotheses: either AIX is very slow to deliver
SIGCHLD to the parent, or process exit itself takes too much time.
The fact that the backends are in the exiting state makes the second
hypothesis more plausible.
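
One possible way to probe process exit without involving SIGCHLD delivery at all is to time a blocking waitpid(). This is only a sketch under assumptions: exit_latency is a hypothetical helper, it uses the wall clock (time.time()) because that is shared between parent and child, and a trivial child will not reproduce whatever a real backend has to release on exit (shared memory, sockets and so on).

```python
import os
import struct
import time


def exit_latency():
    """Return seconds between a child's last action and waitpid() returning."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report a timestamp just before exiting.
        os.close(read_fd)
        os.write(write_fd, struct.pack("d", time.time()))
        os._exit(0)
    os.close(write_fd)
    # Blocks until the child has exited and been reaped;
    # no SIGCHLD handler is involved here.
    os.waitpid(pid, 0)
    reaped_at = time.time()
    data = os.read(read_fd, 8)
    os.close(read_fd)
    (exiting_at,) = struct.unpack("d", data)
    return reaped_at - exiting_at
```

If this number is already large on AIX for a trivial child, the delay is in process exit rather than in signal delivery; if it is small, the test says little about real backends and a closer reproduction would be needed.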
We have tried different Postgres configurations, with local and TCP
sockets, with different amounts of shared buffers, and built with both gcc
and xlc.
In all cases the behavior is similar: zombies do not want to die.
Since it is not possible to attach a debugger to a defunct process, it
is not clear how to find out what is going on.
I wonder if somebody has encountered similar problems on AIX and
can suggest a solution.
Thanks in advance
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company