On 22.12.2009 23:05, Tomasz Chmielewski wrote:
> I followed the http://linuxsilo.net/articles/postgresql-pgpool.html to set up
> pgpool-ii replication.
>
> When I detach and recover a node with these commands:
>
> # pcp_detach_node -d 240 127.0.0.1 9898 user pass 1
> # pcp_recovery_node -d 240 127.0.0.1 9898 user pass 1
>
> I can observe the following on node 1 in postgres logs:
>
> 2009-12-23 06:03:15 SGT LOG: database system was interrupted; last known up
> at 2009-12-23 06:03:12 SGT
> 2009-12-23 06:03:15 SGT LOG: starting archive recovery
> 2009-12-23 06:03:15 SGT LOG: restore_command = '/usr/bin/scp
> db10:/var/lib/postgresql/8.3/main/pg_xlog_archive/%f %p'
> scp: /var/lib/postgresql/8.3/main/pg_xlog_archive/00000002.history: No such
> file or directory
> Because of these errors, recovery sometimes fails.
>
> How does postgres on the node which is recovered determine the %f files it
> needs to copy?

OK, I see it's normal that it asks for files which are not present:

http://developer.postgresql.org/pgdocs/postgres/continuous-archiving.html

It is important for the command to return a zero exit status if and only if
it succeeds. The command will be asked for file names that are not present
in the archive; it must return nonzero when so asked.

However, postgres on the recovered node fails to start if it finds no files
to copy, i.e.:

2009-12-23 06:21:40 SGT LOG: database system was shut down at 2009-12-23 06:21:36 SGT
2009-12-23 06:21:40 SGT LOG: starting archive recovery
2009-12-23 06:21:40 SGT LOG: restore_command = '/usr/bin/scp db10:/var/lib/postgresql/8.3/main/pg_xlog_archive/%f %p'
scp: /var/lib/postgresql/8.3/main/pg_xlog_archive/00000003.history: No such file or directory
scp: /var/lib/postgresql/8.3/main/pg_xlog_archive/000000030000000000000063: No such file or directory
2009-12-23 06:21:40 SGT LOG: could not open file "pg_xlog/000000030000000000000063" (log file 0, segment 99): No such file or directory
2009-12-23 06:21:40 SGT LOG: invalid primary checkpoint record
2009-12-23 06:21:40 SGT LOG: incomplete startup packet
scp: /var/lib/postgresql/8.3/main/pg_xlog_archive/000000030000000000000063: No such file or directory
2009-12-23 06:21:40 SGT LOG: could not open file "pg_xlog/000000030000000000000063" (log file 0, segment 99): No such file or directory
2009-12-23 06:21:40 SGT LOG: invalid secondary checkpoint record
2009-12-23 06:21:40 SGT PANIC: could not locate a valid checkpoint record
2009-12-23 06:21:40 SGT LOG: startup process (PID 24196) was terminated by signal 6: Aborted
2009-12-23 06:21:40 SGT LOG: aborting startup due to startup process failure

To reproduce:

1) on a failed node, do:

tail -f /var/log/postgresql/postgresql-8.3-main.log

2) start pcp_recovery_node, pcp_detach_node and then
pcp_recovery_node again:

pcp_recovery_node -d 240 127.0.0.1 9898 user pass 1
pcp_detach_node -d 240 127.0.0.1 9898 user pass 1
pcp_recovery_node -d 240 127.0.0.1 9898 user pass 1

The log on node 1 will show the postgres startup failure; pcp_recovery_node
will "hang" until it times out.

Is this expected?

--
Tomasz Chmielewski
http://wpkg.org
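
P.S. To make the exit-status rule from the PostgreSQL docs concrete, here is a rough sketch of a wrapper that could stand in for the bare scp call in restore_command. It is only an illustration under assumptions: the function name restore_wal and the ARCHIVE_SRC and COPY_CMD variables are made up for this example, and the defaults simply mirror the scp source seen in the logs. It returns zero only when the requested file was actually copied; it would not by itself make a missing segment such as 000000030000000000000063 appear.

```sh
#!/bin/sh
# Hypothetical restore_command wrapper (illustrative names, not from pgpool
# or the linuxsilo article).

# Where archived WAL files live; default mirrors the scp source in the logs.
ARCHIVE_SRC="${ARCHIVE_SRC:-db10:/var/lib/postgresql/8.3/main/pg_xlog_archive}"
# Program used to copy a single file (scp for a remote archive, cp for local).
COPY_CMD="${COPY_CMD:-/usr/bin/scp}"

restore_wal() {
    wal_file="$1"    # %f: file name requested by postgres
    dest_path="$2"   # %p: path postgres wants the file copied to

    if [ -z "$wal_file" ] || [ -z "$dest_path" ]; then
        echo "usage: restore_wal <file> <dest>" >&2
        return 2
    fi

    # Copy to a temporary name first, so a partial transfer never shows up
    # under the final name. The copy program's own error output is hidden.
    tmp_path="${dest_path}.tmp.$$"
    if "$COPY_CMD" "${ARCHIVE_SRC}/${wal_file}" "$tmp_path" 2>/dev/null; then
        mv "$tmp_path" "$dest_path"
        return 0
    fi

    # File not in the archive (or transfer failed): clean up and return
    # nonzero, as the docs require for files that are not present.
    rm -f "$tmp_path"
    echo "restore_wal: ${wal_file} not available in ${ARCHIVE_SRC}" >&2
    return 1
}

# When run as a script with arguments, act on them directly.
if [ "$#" -gt 0 ]; then
    restore_wal "$@"
fi
```

Saved as, say, /usr/local/bin/restore_wal.sh (a made-up path), it would be referenced as restore_command = '/usr/local/bin/restore_wal.sh %f %p'. Requests for absent files like 00000003.history then give one short message and a nonzero status, which postgres treats as "not in the archive".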
