Sometimes recovery fails, with the second node logging, for example:

Mar 12 05:38:28 db20 postgres[32140]: [5-1] 2010-03-12 05:38:28 SGT LOG: invalid primary checkpoint record
Mar 12 05:38:28 db20 postgres[32140]: [6-1] 2010-03-12 05:38:28 SGT LOG: could not open file "pg_xlog/000000030000002D00000024" (log file 45, segment 36): No such file or
Mar 12 05:38:28 db20 postgres[32140]: [6-2] directory
Mar 12 05:38:28 db20 postgres[32140]: [7-1] 2010-03-12 05:38:28 SGT LOG: invalid secondary checkpoint record
Mar 12 05:38:28 db20 postgres[32140]: [8-1] 2010-03-12 05:38:28 SGT PANIC: could not locate a valid checkpoint record
Mar 12 05:38:28 db20 postgres[32139]: [1-1] 2010-03-12 05:38:28 SGT LOG: startup process (PID 32140) was terminated by signal 6: Aborted
Mar 12 05:38:28 db20 postgres[32139]: [2-1] 2010-03-12 05:38:28 SGT LOG: aborting startup due to startup process failure
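
For reference, the missing file name in the log above can be decoded by hand. The snippet below is only a minimal sketch, assuming the usual 24-hex-digit WAL segment naming (8 digits each for timeline, log file and segment); the helper name parse_wal_name is made up for illustration and is not part of PostgreSQL or pgpool.

```python
def parse_wal_name(name):
    """Split a 24-hex-digit WAL segment file name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    # Three consecutive 8-digit hexadecimal fields.
    return tuple(int(name[i:i + 8], 16) for i in (0, 8, 16))


# File name taken from the "could not open file" line in the log.
timeline, log_id, segment = parse_wal_name("000000030000002D00000024")
print(timeline, log_id, segment)  # prints: 3 45 36
```

The last two values agree with the "log file 45, segment 36" reported by PostgreSQL, so the node is looking for a segment on timeline 3 that is not present in pg_xlog.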

When this happens, the following command never exits (it should return within 240 seconds at most):

# pcp_recovery_node -d 240 127.0.0.1 9898 user password 1
DEBUG: send: tos="R", len=45
DEBUG: recv: tos="r", len=21, data=AuthenticationOK
DEBUG: send: tos="D", len=6

The only way to stop pgpool in this state is kill -9.

Is this a known issue? I am using pgpool-II 2.3.2.2.

-- 
Tomasz Chmielewski
http://wpkg.org
