> On 22.12.2009 23:05, Tomasz Chmielewski wrote:
> > I followed the http://linuxsilo.net/articles/postgresql-pgpool.html to set
> > up pgpool-ii replication.
> >
> > When I detach and recover a node with these commands:
> >
> > # pcp_detach_node -d 240 127.0.0.1 9898 user pass 1
> > # pcp_recovery_node -d 240 127.0.0.1 9898 user pass 1
> >
> > I can observe the following on node 1 in postgres logs:
> >
> > 2009-12-23 06:03:15 SGT LOG: database system was interrupted; last known
> > up at 2009-12-23 06:03:12 SGT
> > 2009-12-23 06:03:15 SGT LOG: starting archive recovery
> > 2009-12-23 06:03:15 SGT LOG: restore_command = '/usr/bin/scp
> > db10:/var/lib/postgresql/8.3/main/pg_xlog_archive/%f %p'
> > scp: /var/lib/postgresql/8.3/main/pg_xlog_archive/00000002.history: No such
> > file or directory
> >
> > Because of these errors, recovery sometimes fails.
> >
> > How does postgres on the node which is recovered determine the %f files it
> > needs to copy?
>
> OK, I see it's normal that it asks for files which are not present:
>
> http://developer.postgresql.org/pgdocs/postgres/continuous-archiving.html
>
>   It is important for the command to return a zero exit status if and
>   only if it succeeds. The command will be asked for file names that
>   are not present in the archive; it must return nonzero when so
>   asked.
>
> However, postgres on recovered node fails to start if it finds no files to
> copy, i.e.:
>
> 2009-12-23 06:21:40 SGT LOG: database system was shut down at 2009-12-23
> 06:21:36 SGT
> 2009-12-23 06:21:40 SGT LOG: starting archive recovery
> 2009-12-23 06:21:40 SGT LOG: restore_command = '/usr/bin/scp
> db10:/var/lib/postgresql/8.3/main/pg_xlog_archive/%f %p'
> scp: /var/lib/postgresql/8.3/main/pg_xlog_archive/00000003.history: No such
> file or directory
> scp: /var/lib/postgresql/8.3/main/pg_xlog_archive/000000030000000000000063:
> No such file or directory
> 2009-12-23 06:21:40 SGT LOG: could not open file
> "pg_xlog/000000030000000000000063" (log file 0, segment 99): No such file or
> directory
> 2009-12-23 06:21:40 SGT LOG: invalid primary checkpoint record
> 2009-12-23 06:21:40 SGT LOG: incomplete startup packet
> scp: /var/lib/postgresql/8.3/main/pg_xlog_archive/000000030000000000000063:
> No such file or directory
> 2009-12-23 06:21:40 SGT LOG: could not open file
> "pg_xlog/000000030000000000000063" (log file 0, segment 99): No such file or
> directory
> 2009-12-23 06:21:40 SGT LOG: invalid secondary checkpoint record
> 2009-12-23 06:21:40 SGT PANIC: could not locate a valid checkpoint record
> 2009-12-23 06:21:40 SGT LOG: startup process (PID 24196) was terminated by
> signal 6: Aborted
> 2009-12-23 06:21:40 SGT LOG: aborting startup due to startup process failure
>
> To reproduce:
>
> 1) on a failed node, do:
>
> tail -f /var/log/postgresql/postgresql-8.3-main.log
>
> 2) start pcp_recovery_node, pcp_detach_node and then pcp_recovery_node again:
>
> pcp_recovery_node -d 240 127.0.0.1 9898 user pass 1
> pcp_detach_node -d 240 127.0.0.1 9898 user pass 1
> pcp_recovery_node -d 240 127.0.0.1 9898 user pass 1
>
> The log on node 1 will show postgres startup failure; pcp_recovery_node will
> "hang" until it times out.
>
> Is it expected?

I don't understand Spanish, so I'm not sure I read the following URL
correctly, but...

http://linuxsilo.net/articles/postgresql-pgpool.html

I noticed that the "base-backup" script in the article does this:

$LOGGER "Rsyncing directory pg_xlog"
$RSYNC $SRC_DATA/pg_xlog/ $DST_HOST:$DST_DATA/pg_xlog/

I think this is not necessary and probably not good. Instead, you would
want to clear $DST_HOST:$DST_DATA/pg_xlog/*.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
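
For illustration only, the suggestion to clear the destination pg_xlog rather than rsync it could be sketched as below. This is not taken from the linuxsilo article or from pgpool-II. The function name clear_xlog_dir is hypothetical, and the sketch assumes it runs on the node being recovered while postgres there is stopped.

```shell
#!/bin/sh
# Hypothetical helper (not part of pgpool-II or the article's script):
# empty a pg_xlog directory without removing the directory itself.
# Assumption: postgres on this node is stopped while this runs.
clear_xlog_dir() {
    dir=$1
    # Refuse an empty or missing path so find is never run on the wrong tree.
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "clear_xlog_dir: not a directory: '$dir'" >&2
        return 1
    fi
    # Delete regular files only (WAL segments, .history files and the
    # markers under archive_status/); the subdirectories themselves are kept.
    find "$dir" -type f -exec rm -f {} +
}
```

If the base-backup script runs on the master, as the rsync line in the article suggests, the same find command would have to be executed on $DST_HOST instead, for example over ssh. That ssh access exists is an assumption, not something the article's excerpt confirms.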
