Hi all.

I have a task to build a cross-datacenter HA PostgreSQL cluster. I created a
test bed on VirtualBox where I simulate different failures and disasters in a
loop, and I ran into a funny bug in PostgreSQL (11.4). One of the clusters
has two PostgreSQL instances, on ports 5433 and 5434. After simulating an
unexpected power-off of the node, the PostgreSQL lock files are still present in
the /tmp directory. On recovery, the first instance (port 5433) starts without
problems, but the second one sometimes does not. The reason is that the PID in
/tmp/.s.PGSQL.5434.lock sometimes points to one of the processes of the
PostgreSQL instance on port 5433, for instance the logger.

 1408 ?        S      0:00 /usr/pgsql-11/bin/postgres -D /var/lib/pgsql/krogan0a
 1415 ?        Ss     0:00  \_ postgres: logger
 1416 ?        Ss     0:02  \_ postgres: startup   recovering 000000030000000000000007
 1417 ?        Ss     0:00  \_ postgres: checkpointer
 1418 ?        Ss     0:00  \_ postgres: background writer
 1419 ?        Ss     0:00  \_ postgres: stats collector
 1420 ?        Ss     0:10  \_ postgres: walreceiver   streaming 0/7269290

# cat /tmp/.s.PGSQL.5434.lock
1415
/var/lib/pgsql/krogan0b
1565355860
5434
/tmp

# sudo -u postgres /usr/pgsql-11/bin/pg_ctl -D /var/lib/pgsql/krogan0b start
waiting for server to start....2019-08-09 16:52:25.685 MSK [6673] LOG:  listening on IPv4 address "0.0.0.0", port 5434
2019-08-09 16:52:25.685 MSK [6673] LOG:  listening on IPv6 address "::", port 5434
2019-08-09 16:52:25.686 MSK [6673] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5434"
2019-08-09 16:52:25.686 MSK [6673] FATAL:  lock file "/tmp/.s.PGSQL.5434.lock" already exists
2019-08-09 16:52:25.686 MSK [6673] HINT:  Is another postmaster (PID 1415) using socket file "/tmp/.s.PGSQL.5434"?
2019-08-09 16:52:25.686 MSK [6673] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.
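Judging by the HINT, the check seems to ask only whether PID 1415 is alive. Below is a rough Python model of that behaviour. It assumes the probe is a kill(pid, 0)-style existence test, which is my reading of the symptoms and not a quote of the PostgreSQL source. The names SocketLock, parse_socket_lock and looks_in_use are made up for illustration, and the field order follows the lock file shown above.

```python
import os
from dataclasses import dataclass


@dataclass
class SocketLock:
    """The five lines of a socket lock file such as /tmp/.s.PGSQL.5434.lock."""
    pid: int
    data_dir: str
    start_time: int
    port: int
    socket_dir: str


def parse_socket_lock(text: str) -> SocketLock:
    # Line order as seen in the lock file above: PID, data directory,
    # start time (epoch seconds), port, socket directory.
    lines = text.splitlines()
    return SocketLock(
        pid=int(lines[0]),
        data_dir=lines[1],
        start_time=int(lines[2]),
        port=int(lines[3]),
        socket_dir=lines[4],
    )


def looks_in_use(lock: SocketLock) -> bool:
    """Assumed PID-only probe: signal 0 delivers nothing, it only tests existence."""
    try:
        os.kill(lock.pid, 0)
    except ProcessLookupError:  # ESRCH: no such process, so the lock is stale
        return False
    except PermissionError:  # EPERM: owned by another user, cannot be our postmaster
        return False
    # Any live process of the same user with this PID is taken as the owner,
    # whether or not it is a postmaster for this port.
    return True
```

With the lock file above, PID 1415 is the logger of the 5433 instance and runs as the same postgres user, so a probe like this succeeds and the 5434 instance refuses to start, even though 1415 is not a postmaster at all.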

So it would be good if postgres checked not only that the PID is alive, but
also that it belongs to a postmaster, and whether that postmaster serves the
same port or a different one. Is there some workaround?
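One possible workaround, sketched in Python under stated assumptions: before starting an instance, read the PID from its socket lock file and look at /proc/<pid>/cmdline (Linux only). In the ps output above the postmaster runs as "postgres -D <datadir>", while children such as the logger rename themselves ("postgres: logger"). So a PID whose command line does not carry -D with the data directory from line 2 of the lock file is probably stale. The script name check_pg_lock.py and the -D heuristic are my own assumptions: a postmaster started via PGDATA without -D would be misreported as stale, so the script only reports and deletes nothing.

```python
import os
import sys
from typing import List, Optional


def read_cmdline(pid: int) -> Optional[List[str]]:
    """Return the argv of a process from /proc (Linux only), or None if it is gone."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            raw = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Arguments are NUL-separated; PostgreSQL children overwrite this area
    # with their title (e.g. "postgres: logger"), padded with NULs.
    return [a.decode(errors="replace") for a in raw.split(b"\0") if a]


def is_stale(lock_text: str, cmdline: Optional[List[str]]) -> bool:
    """True if the lock's PID does not look like a postmaster for its data directory."""
    data_dir = os.path.normpath(lock_text.splitlines()[1])
    if cmdline is None:
        return True  # no such process any more
    # Heuristic (assumption): the postmaster was started as "postgres -D <data_dir>".
    for i, arg in enumerate(cmdline[:-1]):
        if arg == "-D" and os.path.normpath(cmdline[i + 1]) == data_dir:
            return False
    return True


def main(lock_path: str) -> int:
    with open(lock_path) as f:
        text = f.read()
    pid = int(text.splitlines()[0])
    if is_stale(text, read_cmdline(pid)):
        print(f"{lock_path}: PID {pid} is not a postmaster for this data directory, lock looks stale")
        return 1
    print(f"{lock_path}: PID {pid} looks like a live postmaster")
    return 0


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: check_pg_lock.py /tmp/.s.PGSQL.<port>.lock")
    else:
        sys.exit(main(sys.argv[1]))
```

For example, running "python3 check_pg_lock.py /tmp/.s.PGSQL.5434.lock" in the situation above should report PID 1415 as stale, because its command line is just the logger title. Removing the file is then a manual decision, taken only while the 5434 instance is known to be down.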
