Hi,

I have a cluster of 3 nodes: Standby A streams from the Primary, and Standby B streams from Standby A.

I failed over the cluster as follows:
1) stopped the Primary
2) promoted Standby A
Now I see from syslog on Standby B that it is complaining about a timeline mismatch.

Replication Status from Primary
=============================================
|Parameters       | Value                   |
=============================================
|backend_start    | 2013-01-16 23:05:48     |
|pid              | 17851                   |
|usesysid         | 10                      |
|usename          | postgres                |
|application_name | StandbyA                |
|client_addr      | 10.89.94.31             |
|client_hostname  |                         |
|client_port      | 43558                   |
|state            | streaming               |
|sent_location    | 0/1EAC3E68              |
|write_location   | 0/1EAC3E68              |
|flush_location   | 0/1EAC3E68              |
|replay_location  | 0/1EAC3E68              |
|sync_priority    | 0                       |
|sync_state       | async                   |
=============================================

Replication Status from Standby A
=============================================
|Parameters       | Value                   |
=============================================
|backend_start    | 2013-01-16 23:06:56     |
|pid              | 12320                   |
|usesysid         | 10                      |
|usename          | postgres                |
|application_name | StandByB                |
|client_addr      | 10.89.94.29             |
|client_hostname  |                         |
|client_port      | 48214                   |
|state            | streaming               |
|sent_location    | 0/1EAC3E68              |
|write_location   | 0/1EAC3E68              |
|flush_location   | 0/1EAC3E68              |
|replay_location  | 0/1EAC3E68              |
|sync_priority    | 0                       |
|sync_state       | async                   |
=============================================

Now I fail over the Primary.

On Standby A syslog:
Jan 16 23:08:12 se032c-94-31 postgres[12316]: [3-1] 12316FATAL: replication terminated by primary server
Jan 16 23:08:12 se032c-94-31 postgres[12312]: [5-1] 12312LOG: redo starts at 0/1EAC3E68

On Standby B syslog:
Jan 16 23:09:48 localhost postgres[3932]: [5-1] LOG: redo starts at 0/1EAC3E68

As soon as I promoted Standby A, replication between A and B broke. Standby B syslog shows the following:
Jan 16 23:11:28 localhost postgres[3945]: [2-1] FATAL: timeline 15 of the primary does not match recovery target timeline 14

Now my question is: since A and B were in sync, why does promoting A break the replication between them?
To resolve the problem, I need to stop the engine on B, rsync from A, and start the engine on B again:

rsync -a --progress --exclude postgresql.conf --exclude recovery.done --exclude pg_hba.conf root@10.89.94.31:/opt/postgres/9.2/data/* /opt/postgres/9.2/data

Do I need to sync the whole data directory from A? I have a small DB now (2 tables with only a few rows), but this may take a long time with a much larger DB. Is there any shortcut? And why do I need to do the rsync at all when A and B were originally in sync?

Thanks~
Ning