Tom Lane wrote:
> Stefan Kaltenbrunner <ste...@kaltenbrunner.cc> writes:
>> I'm currently testing SR/HS in 9.0beta1 and I noticed that it seems quite easy to end up in a situation where you have a standby that seems to be stuck in:
>>
>> $ psql -p 5433
>> psql: FATAL:  the database system is shutting down
>>
>> but never actually shutting down. I have run into that a few times now (mostly because I'm trying to chase a recovery issue I hit during earlier testing) by simply having the master alternate between a pgbench run and "idle" while doing pg_ctl restart in a loop on the standby. I do vaguely recall some discussion of this, but I thought the issue had been settled somehow?
>
> Hm, I haven't pushed this hard but "pg_ctl stop" seems to stop the
> standby for me.  Which subprocesses of the slave postmaster are still
> around?  Could you attach to them with gdb and get stack traces?

It does not always fail to shut down, only sometimes. I have not yet pinpointed exactly what is causing this, but the standby is in a weird state now:

* the master is currently idle
* the standby has no connections at all

logs from the standby:

FATAL:  the database system is shutting down
FATAL:  the database system is shutting down
FATAL:  replication terminated by primary server
LOG:  restored log file "000000010000001900000054" from archive
cp: cannot stat `/mnt/space/wal-archive/000000010000001900000055': No such file or directory
LOG:  record with zero length at 19/55000078
cp: cannot stat `/mnt/space/wal-archive/000000010000001900000055': No such file or directory
FATAL:  could not connect to the primary server: could not connect to server: Connection refused
                Is the server running on host "localhost" and accepting
                TCP/IP connections on port 5432?
        could not connect to server: Connection refused
                Is the server running on host "localhost" and accepting
                TCP/IP connections on port 5432?

cp: cannot stat `/mnt/space/wal-archive/000000010000001900000055': No such file or directory
cp: cannot stat `/mnt/space/wal-archive/000000010000001900000055': No such file or directory
LOG:  streaming replication successfully connected to primary
FATAL:  the database system is shutting down


The first two "FATAL:  the database system is shutting down" lines are from me trying to connect with psql after I noticed that pg_ctl had failed to shut down the slave. The next thing I tried was restarting the master, which led to the subsequent log entries: the standby noticed the restart and reconnected, but you still cannot actually connect to it...

process tree for the standby is:

29523 pts/2  S   0:00 /home/postgres9/pginst/bin/postgres -D /mnt/space/pgdata_standby
29524 ?      Ss  0:06  \_ postgres: startup process waiting for 000000010000001900000055
29529 ?      Ss  0:00  \_ postgres: writer process
29835 ?      Ss  0:00  \_ postgres: wal receiver process streaming 19/55000078



Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
