Fujii Masao <masao.fu...@gmail.com> writes:
> But, even though we will have done that, it should be noted that WAL in
> A might be ahead of that in B. For example, A might crash right after
> writing WAL to the disk and before sending it to B. So when we restart
> the old master A as the standby after failover, we should need to delete
> some WAL files (in A) which are inconsistent with the WAL sequence in B.

The idea of sending the current last applied LSN from the master to the
slave has been discussed already. It would allow the master to send WAL
content in parallel with its local fsync(); the standby would refrain from
applying any WAL segment until it knows the master is past that point.

Now, given such a behavior, when A joins again as a standby, it would have
to ask B for the current last applied LSN too, and would notice the
timeline change. Maybe by adding a facility to request the last LSN of the
previous timeline, and with the behavior above applied there (skipping
now-known-future WAL in recovery), that would work automatically?

There's still the problem of WAL that has been applied before recovery; I
don't know that we can do anything there. But maybe we could also tweak the
CHECKPOINT mechanism so that it does not advance the restart point until we
know the standbys have already replayed everything up to the restart point?

-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
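
To make the two ideas above concrete, here is a rough sketch: the standby buffers WAL it receives and replays it only once the master reports it as flushed, and a rejoining old master splits its local WAL at the LSN where the new timeline forked. This is a toy model only. LSNs are plain integers, and Standby, on_master_flush and usable_wal_on_rejoin are made-up names for illustration, not anything in the PostgreSQL source.

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    """Toy standby: buffers WAL and replays only what the master has flushed."""
    applied_lsn: int = 0
    # Buffered (end_lsn, payload) pairs, assumed to arrive in LSN order.
    pending: list = field(default_factory=list)

    def receive(self, end_lsn, payload):
        # WAL may arrive before the master has fsync()ed it, so only buffer it.
        self.pending.append((end_lsn, payload))

    def on_master_flush(self, master_flush_lsn):
        # Replay every buffered record the master is known to have made durable.
        applied = []
        while self.pending and self.pending[0][0] <= master_flush_lsn:
            end_lsn, payload = self.pending.pop(0)
            self.applied_lsn = end_lsn
            applied.append(payload)
        return applied


def usable_wal_on_rejoin(local_records, fork_lsn):
    """Split the old master's local WAL at the LSN where the new timeline forked.

    Records ending at or before fork_lsn are shared with the new master;
    anything after it exists only on the old master and has to be skipped.
    """
    keep = [r for r in local_records if r[0] <= fork_lsn]
    skip = [r for r in local_records if r[0] > fork_lsn]
    return keep, skip
```

In this model the old master A would call usable_wal_on_rejoin with the fork LSN obtained from B and replay only the "keep" part. It does nothing for WAL that A had already applied to its data files before crashing, which is the open problem mentioned above.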