[GENERAL] Streaming Replication Failover

ning chan Wed, 16 Jan 2013 21:19:15 -0800

Hi,
I have a cluster of 3 nodes: StandbyA streams from the Primary, and
StandbyB streams from StandbyA (cascading streaming replication).
I failed over the cluster:
1) stopped the Primary
2) promoted StandbyA
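
For reference, the two steps amount to roughly the following. This is a
sketch rather than the exact commands I typed: it assumes pg_ctl is on the
PATH and that the data directory is /opt/postgres/9.2/data on each host.
The function names and the DRY_RUN switch are made up for illustration.

```shell
#!/bin/sh
# Sketch of the two failover steps.
# Assumption: the data directory path is the same on every host; adjust as needed.
PGDATA="${PGDATA:-/opt/postgres/9.2/data}"

# Hypothetical helper: print the command when DRY_RUN is set, run it otherwise.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1, on the Primary: fast shutdown.
stop_primary() {
    run pg_ctl -D "$PGDATA" stop -m fast
}

# Step 2, on StandbyA: leave recovery and start accepting writes.
promote_standby() {
    run pg_ctl -D "$PGDATA" promote
}
```

Setting DRY_RUN=1 before calling stop_primary or promote_standby only
prints the command, which is handy for checking the paths first.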


Now I see from the syslog on StandbyB that it is complaining about a
timeline mismatch.

Replication Status from Primary
=============================================
|Parameters           |        Value        |
=============================================
|backend_start        | 2013-01-16 23:05:48 |
|pid                  |        17851        |
|usesysid             |          10         |
|usename              |       postgres      |
|application_name     |       StandbyA      |
|client_addr          |     10.89.94.31     |
|client_hostname      |                     |
|client_port          |        43558        |
|state                |      streaming      |
|sent_location        |      0/1EAC3E68     |
|write_location       |      0/1EAC3E68     |
|flush_location       |      0/1EAC3E68     |
|replay_location      |      0/1EAC3E68     |
|sync_priority        |          0          |
|sync_state           |        async        |
=============================================

Replication Status from Standby A
=============================================
|Parameters           |        Value        |
=============================================
|backend_start        | 2013-01-16 23:06:56 |
|pid                  |        12320        |
|usesysid             |          10         |
|usename              |       postgres      |
|application_name     |       StandByB      |
|client_addr          |     10.89.94.29     |
|client_hostname      |                     |
|client_port          |        48214        |
|state                |      streaming      |
|sent_location        |      0/1EAC3E68     |
|write_location       |      0/1EAC3E68     |
|flush_location       |      0/1EAC3E68     |
|replay_location      |      0/1EAC3E68     |
|sync_priority        |          0          |
|sync_state           |        async        |
=============================================
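
Both tables show the same value for sent_location and replay_location,
which is what I mean by "in sync". A quick way to check this numerically is
sketched below. The helper name lsn_to_int is made up, and it relies on the
location being two hexadecimal numbers (high 32 bits, then low 32 bits)
separated by a slash.

```shell
#!/bin/sh
# Hypothetical helper: convert an xlog location such as 0/1EAC3E68
# into a single integer byte position (high 32 bits / low 32 bits, both hex).
lsn_to_int() {
    _hi="${1%%/*}"
    _lo="${1##*/}"
    echo $(( (0x$_hi << 32) + 0x$_lo ))
}

# Values copied from the StandbyA table above.
sent="0/1EAC3E68"
replay="0/1EAC3E68"

# Replay lag in bytes; 0 means the standby has replayed everything sent to it.
lag=$(( $(lsn_to_int "$sent") - $(lsn_to_int "$replay") ))
echo "replay lag: $lag bytes"
```

With the values above this prints a lag of 0 bytes for StandbyB.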

Now I fail over the Primary.
On StandbyA syslog:
Jan 16 23:08:12 se032c-94-31 postgres[12316]: [3-1] 12316FATAL:
replication terminated by primary server
Jan 16 23:08:12 se032c-94-31 postgres[12312]: [5-1] 12312LOG:  redo starts
at 0/1EAC3E68

On StandByB syslog
Jan 16 23:09:48 localhost postgres[3932]: [5-1] LOG:  redo starts at
0/1EAC3E68

As soon as I promoted StandbyA, replication between A and B broke.
The StandbyB syslog shows the following:
Jan 16 23:11:28 localhost postgres[3945]: [2-1] FATAL:  timeline 15 of the
primary does not match recovery target timeline 14
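
To spot this condition quickly in a log, the two timeline numbers can be
pulled out of the message. A small sketch, assuming a POSIX shell with sed;
the function name is hypothetical and the pattern only matches the exact
wording of the message above.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print
# "<upstream timeline> <recovery target timeline>" for each mismatch message.
timeline_mismatch() {
    sed -n 's/.*timeline \([0-9][0-9]*\) of the primary does not match recovery target timeline \([0-9][0-9]*\).*/\1 \2/p'
}

# The message from the StandbyB syslog, joined onto one line.
echo 'FATAL:  timeline 15 of the primary does not match recovery target timeline 14' | timeline_mismatch
```

Lines that do not contain the message produce no output.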

Now my question is: while A and B are in sync, why does promoting A break
the replication?

To resolve the problem, I need to stop the engine on B, rsync from A,
and start the engine on B again:
rsync -a --progress --exclude postgresql.conf --exclude recovery.done \
      --exclude pg_hba.conf root@10.89.94.31:/opt/postgres/9.2/data/* \
      /opt/postgres/9.2/data

Do I need to sync the whole data directory from A? I have a small DB now (2
tables with only a few rows), but this may take a long time with a much
larger DB. Is there any shortcut? Why do I need to do the rsync at all when
A and B were originally in sync?

Thanks~
Ning
