Hello, everybody I have one problem and I need some help.
My environment: one master and one slave (PostgreSQL 9.2.2). My cluster has about 160GB and pg_basebackup to syncronize them (master and slave). The syntax is: pg_basebackup -h productionaddress -p productionport -U productionuser -D datadirectory -P -v My recovery.conf: standby_mode = 'on' primary_conninfo = 'host=productionaddress port=productionport user=productionuser' archive_cleanup_command = 'pg_archivecleanup /slave/transactionlogs %r' My postgresql.conf: (master) wal_level = hot_standby checkpoint_segments = 10 archive_mode = on archive_command = 'rsync -Crap %p postgres@slaveaddress :/slave/transactionlogs/%f' max_wal_senders = 1 wal_keep_segments = 50 My postgresql.conf: (slave) checkpoint_segments = 10 hot_standby = on In my slave (logs) I have the following erros: My first attempt 2013-03-07 15:58:21 BRT [11817]: [1-1] user=,db= LOG: database system was interrupted; last known up at 2013-03-07 15:55:43 BRT 2013-03-07 15:58:21 BRT [11817]: [2-1] user=,db= LOG: entering standby mode 2013-03-07 15:58:21 BRT [11818]: [1-1] user=,db= LOG: streaming replication successfully connected to primary 2013-03-07 15:58:25 BRT [11817]: [3-1] user=,db= LOG: consistent recovery state reached at 141/8FBB5F0 2013-03-07 15:58:25 BRT [11817]: [4-1] user=,db= LOG: redo starts at 141/2251F90 2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL: could not access status of transaction 30622931 2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL: Could not read from file "pg_clog/001D" at offset 49152: Success. 2013-03-07 15:58:25 BRT [11817]: [7-1] user=,db= CONTEXT: xlog redo commit: 2013-03-07 15:55:40.673623-03 2013-03-07 15:58:25 BRT [11767]: [1-1] user=,db= LOG: startup process (PID 11817) exited with exit code 1 2013-03-07 15:58:25 BRT [11767]: [2-1] user=,db= LOG: terminating any other active server processes In my slave (logs) I have the following erros: My second attempt 2013-03-11 12:07:49 BRT [5862]: [1-1] user=,db= LOG: database system was interrupted; last known up at 2013-03-11 12:06:31 BRT 2013-03-11 12:07:49 BRT [5862]: [2-1] user=,db= LOG: entering standby mode 2013-03-11 12:07:49 BRT [5864]: [1-1] user=,db= LOG: streaming replication successfully connected to primary 2013-03-11 12:07:53 BRT [5862]: [3-1] user=,db= LOG: consistent recovery state reached at 168/816AE10 2013-03-11 12:07:53 BRT [5862]: [4-1] user=,db= LOG: redo starts at 167/FEC3D828 2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL: could not access status of transaction 36529670 2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL: Could not read from file "pg_clog/0022" at offset 212992: Success. 2013-03-11 12:07:53 BRT [5862]: [7-1] user=,db= CONTEXT: xlog redo commit: 2013-03-11 12:05:35.069759-03 2013-03-11 12:07:53 BRT [5762]: [1-1] user=,db= LOG: startup process (PID 5862) exited with exit code 1 2013-03-11 12:07:53 BRT [5762]: [2-1] user=,db= LOG: terminating any other active server processes I had the same problem, but in different files (pg_clog): First attempt: 2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL: could not access status of transaction 30622931 2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL: Could not read from file "pg_clog/001D" at offset 49152: Success. Second attempt: 2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL: could not access status of transaction 36529670 2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL: Could not read from file "pg_clog/0022" at offset 212992: Success. When I started the synchronism at the first time (no data), it was OK. Now, my cluster has about 160GB and when the replication finishes I have these problems. I noticed that pg_clog/current_file is the problem. Any idea? Suggestions? Thank you. Best Regards João Paulo -- JotaComm http://jotacomm.wordpress.com