On Tue, Jul 17, 2012 at 9:16 PM, Bruce Momjian <br...@momjian.us> wrote:
> WAL is not guaranteed to be the same between PG major versions, so doing
> anything with WAL is pretty much a no-go.

I understand that the WAL format changes, sometimes dramatically, between versions. What I'm suggesting is that the first WAL record emitted by the binary upgrade process could be entitled "WAL-stream upgrade to 9.4". Old versions would either fail to understand it, or possibly understand it to mean "stop replay, you won't even understand what's about to be said." At that point, start up the new version in the same cluster and have it continue replay from that position forward, which should all be in the new format that it can understand.

The new version need not understand the old format in that case. The tricky part is this single record, which tells a replayer of the old version to stop, while a replayer of the new version somehow has to know it is the right place to start.

One mechanism could be a WAL file segment boundary: the standby could be told to exit when it finishes recovery of the segment 0000000100001234000055CD, and the new version told to begin recovery at 0000000100001234000055CE (one higher), which would be the first WAL emitted by pg_upgrade. In principle the same is possible using the fine-grained record position, such as XXXXX/NN, but that may be more complex for not much gain.

This also means the database would be stuck in an inconsistent state when it starts, not unlike when recovering from an on-line base backup. And that's totally reasonable: the new version has to start up presuming that the database cluster does not yet make enough sense to enter hot standby.

Yet another mechanism is to not have the Postgres recovery process apply the WAL at all, but rather some special-purpose program that knows how to count through and apply specially-formatted WAL segments, and then set the resultant cluster to start recovering from the WAL past this span of specially-formatted WAL.

The crux is to get some continuity in this stream, and there are many ways to slice it.
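
To make the segment-boundary handoff a bit more concrete, here is a rough Python sketch of the "one higher" arithmetic. It is only an illustration: next_segment_name is a made-up helper, not anything in PostgreSQL or pg_upgrade, and it treats a segment file name as an 8-hex-digit timeline followed by a 16-hex-digit counter, the way the example name above reads. Real segment names are more constrained than that (how the trailing fields roll over depends on the server version and segment size), so this is not something to point at a live archive.

```python
import re

# Hypothetical helper for illustration only; not part of PostgreSQL or pg_upgrade.
# Assumption: a segment name is 24 uppercase hex digits, read here as
# an 8-digit timeline followed by a 16-digit counter.
_SEGMENT_NAME = re.compile(r"[0-9A-F]{24}")


def next_segment_name(name: str) -> str:
    """Return the segment name one higher than `name`, keeping the timeline."""
    if not _SEGMENT_NAME.fullmatch(name):
        raise ValueError(f"not a 24-hex-digit segment name: {name!r}")
    timeline, counter = name[:8], name[8:]
    following = int(counter, 16) + 1
    if following >= 1 << 64:
        raise OverflowError("segment counter would overflow 16 hex digits")
    # Zero-pad back to 16 uppercase hex digits so the name stays 24 characters.
    return f"{timeline}{following:016X}"
```

The old-version standby would stop after replaying the name passed in, and the new version would begin recovery at the name returned, e.g. next_segment_name("0000000100001234000055CD").
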

Without that continuity, the continuous archives will have a gap while a new base backup is taken of data that is mostly unchanged.

--
fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)