[HACKERS] Active zombies at AIX

Konstantin Knizhnik Tue, 24 Jan 2017 07:16:15 -0800

Hi hackers,

Yet another story about AIX. For some reason AIX is very slow at cleaning up zombie processes. If we launch pgbench with the -C parameter, the limit on the maximum number of connections is exhausted very soon. If the maximum number of connections is set to 1000, then after ten seconds of pgbench activity we get about 900 zombie processes, and it takes about 100 seconds (!) before all of them are terminated.


proctree shows a lot of defunct processes:

[14:44:41]root@postgres:~ # proctree 26084446
26084446 /opt/postgresql/xlc/9.6/bin/postgres -D /postg_fs/postgresql/xlc
4784362 <defunct>
4980786 <defunct>
11403448 <defunct>
11468930 <defunct>
11993176 <defunct>
12189710 <defunct>
12517390 <defunct>
13238374 <defunct>
13565974 <defunct>
13893826 postgres: wal writer process
14024716 <defunct>
15401000 <defunct>
...
25691556 <defunct>

But ps shows that the status of the process is <exiting>:

[14:46:02]root@postgres:~ # ps -elk | grep 25691556

 * A - 25691556 - - - - - <exiting>

A breakpoint set in the reaper() function in postmaster shows that each invocation of this function (called by the SIGCHLD handler) processes 5-10 PIDs. So there are two hypotheses: either AIX is very slow at delivering SIGCHLD to the parent, or process exit takes too much time.
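The reaping pattern described above can be sketched as a standalone C fragment. This is only a minimal sketch, not PostgreSQL's actual reaper(): the names reap_exited_children and spawn_and_reap are hypothetical, the children are trivial processes rather than backends, and the parent polls instead of installing a SIGCHLD handler to keep the example short. It assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Collect every child that has already exited, without blocking.
 * Returns the number of PIDs reaped in this pass. */
int reap_exited_children(void)
{
    int reaped = 0;
    int status;

    /* WNOHANG makes waitpid() return 0 when no child is ready to be reaped,
     * and -1 when there are no children left at all. */
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    return reaped;
}

/* Hypothetical driver: fork nchildren short-lived children, then poll the
 * reaping loop until all of them are collected. Returns the total reaped. */
int spawn_and_reap(int nchildren)
{
    int total = 0;
    struct timespec pause_1ms = {0, 1000000};

    assert(nchildren > 0);
    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);           /* child: exit immediately */
        if (pid < 0) {
            nchildren = i;      /* fork failed: only wait for those started */
            break;
        }
    }
    while (total < nchildren) {
        int batch = reap_exited_children();
        if (batch > 0) {
            printf("reaped %d PIDs in this pass\n", batch);
            total += batch;
        } else {
            nanosleep(&pause_1ms, NULL);   /* nothing ready yet: retry later */
        }
    }
    return total;
}
```

Calling spawn_and_reap(1000) from a small main() prints one line per pass, so the batch sizes can be compared with the 5-10 PIDs per invocation seen in the debugger.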

The fact that the backends are in the exiting state makes the second hypothesis more plausible. We have tried different Postgres configurations: with local and TCP sockets, with different amounts of shared buffers, and built with both gcc and xlc.

In all cases the behavior is similar: zombies do not want to die.

Since it is not possible to attach a debugger to a defunct process, it is not clear how to find out what's going on.
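One way to narrow this down without a debugger might be to time the same fork/exit/wait cycle outside of Postgres. The following is a sketch under stated assumptions: time_to_reap is a hypothetical helper, the children exit immediately and hold no shared memory or sockets, and the parent simply blocks in waitpid(). If this baseline were fast on the same machine, that would hint that the delay depends on what a backend holds at exit rather than on signal delivery, but that is only a guess.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Fork nchildren processes that exit at once, then block in waitpid() until
 * every one of them is reaped. Returns the elapsed wall-clock time in
 * seconds, or -1.0 if no child could be forked. */
double time_to_reap(int nchildren)
{
    struct timespec start, end;
    int forked = 0;
    int reaped = 0;

    assert(nchildren > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);           /* child: exit immediately */
        if (pid < 0)
            break;              /* fork failed: wait only for those started */
        forked++;
    }
    if (forked == 0)
        return -1.0;

    while (reaped < forked) {
        if (waitpid(-1, NULL, 0) > 0)
            reaped++;           /* one more zombie collected */
        else if (errno != EINTR)
            break;              /* e.g. ECHILD: nothing left to wait for */
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double) (end.tv_sec - start.tv_sec)
         + (double) (end.tv_nsec - start.tv_nsec) / 1e9;
}
```

For example, printing the result of time_to_reap(1000) from main() gives a number to compare with the roughly 100 seconds observed for the backends.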

I wonder if somebody has encountered similar problems on AIX and can maybe suggest a solution.

Thanks in advance

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
