Sometimes recovery fails, with the second node logging something like this:

Mar 12 05:38:28 db20 postgres[32140]: [5-1] 2010-03-12 05:38:28 SGT LOG:  invalid primary checkpoint record
Mar 12 05:38:28 db20 postgres[32140]: [6-1] 2010-03-12 05:38:28 SGT LOG:  could not open file "pg_xlog/000000030000002D00000024" (log file 45, segment 36): No such file or
Mar 12 05:38:28 db20 postgres[32140]: [6-2]  directory
Mar 12 05:38:28 db20 postgres[32140]: [7-1] 2010-03-12 05:38:28 SGT LOG:  invalid secondary checkpoint record
Mar 12 05:38:28 db20 postgres[32140]: [8-1] 2010-03-12 05:38:28 SGT PANIC:  could not locate a valid checkpoint record
Mar 12 05:38:28 db20 postgres[32139]: [1-1] 2010-03-12 05:38:28 SGT LOG:  startup process (PID 32140) was terminated by signal 6: Aborted
Mar 12 05:38:28 db20 postgres[32139]: [2-1] 2010-03-12 05:38:28 SGT LOG:  aborting startup due to startup process failure


When this happens, a command such as the following never exits (it should give up after at most 240 seconds):

# pcp_recovery_node -d 240 127.0.0.1 9898 user password 1
DEBUG: send: tos="R", len=45
DEBUG: recv: tos="r", len=21, data=AuthenticationOK
DEBUG: send: tos="D", len=6

The only way to kill pgpool in this state is to use kill -9.


Is this a known issue?

I use pgpool-II 2.3.2.2.

-- 
Tomasz Chmielewski
http://wpkg.org
_______________________________________________
Pgpool-general mailing list
[email protected]
http://pgfoundry.org/mailman/listinfo/pgpool-general
