On 15.01.2013 20:22, Fujii Masao wrote:
On Tue, Jan 15, 2013 at 11:05 PM, Heikki Linnakangas
<hlinnakan...@vmware.com>  wrote:
Now that a standby server can follow timeline switches through streaming
replication, we should teach pg_receivexlog to do the same. Patch
attached.

I made one change to the way START_STREAMING command works, to better
support this. When a standby server reaches the end of the timeline it's
streaming from the master, it stops streaming, fetches any missing timeline
history files,
and parses the history file of the latest timeline to figure out where to
continue. However, I don't want to parse timeline history files in
pg_receivexlog. Better to keep it simple. So instead, I modified the
server-side code for START_STREAMING to return the next timeline's ID at the
end, and used that in pg_receivexlog. I also modified BASE_BACKUP to return
not only the start XLogRecPtr, but also the corresponding timeline ID.
Otherwise we might try to start streaming from the wrong timeline if you
issue a
BASE_BACKUP at the same moment the server switches to a new timeline.
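
The client-side control flow this enables might look roughly like the
following. It is only a sketch in Python, not the patch itself:
stream_timeline is a hypothetical callable standing in for one
START_STREAMING exchange. It is assumed to return the next timeline's ID when
the server reports the end of the timeline, or None when there is nothing to
follow.

```python
def follow_timelines(start_tli, stream_timeline):
    """Stream one timeline after another, without parsing history files.

    stream_timeline(tli) is a hypothetical stand-in for a START_STREAMING
    exchange: it streams timeline `tli` and returns the ID of the next
    timeline, or None if the server did not report one.
    """
    followed = []
    tli = start_tli
    while tli is not None:
        followed.append(tli)
        next_tli = stream_timeline(tli)
        # Assumption of this sketch: a later timeline always has a larger ID,
        # so anything else indicates a confused server or client.
        if next_tli is not None and next_tli <= tli:
            raise ValueError("next timeline %d is not after %d" % (next_tli, tli))
        tli = next_tli
    return followed
```

With a fake server described by the mapping {1: 2, 2: None}, calling
follow_timelines(1, mapping.get) visits timeline 1 and then timeline 2.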

When pg_receivexlog switches timeline, what to do with the partial file on
the old timeline? When the timeline changes in the middle of a WAL segment,
the segment on the old timeline is only half-filled. For example, when the
timeline changes from 1 to 2, you'll have this in pg_xlog:

000000010000000000000006
000000010000000000000007
000000010000000000000008
000000020000000000000008
00000002.history

The segment 000000010000000000000008 is only half-filled, as the timeline
changed in the middle of that segment. The beginning portion of that file is
duplicated in 000000020000000000000008, with the timeline-changing
checkpoint record right after the duplicated portion.
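
For readers less familiar with the naming scheme: a WAL segment file name is
24 hex digits, 8 for the timeline ID followed by 8 for the log ID and 8 for
the segment number. A minimal Python sketch that splits the names from the
listing above (the function name parse_wal_name is made up for this
illustration):

```python
import re

# 8 hex digits each for timeline, log ID and segment number, optionally
# followed by the ".partial" suffix that pg_receivexlog uses.
WAL_NAME = re.compile(r"([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})(\.partial)?")

def parse_wal_name(name):
    """Return (timeline, log, seg, is_partial), or None if not a WAL segment."""
    m = WAL_NAME.fullmatch(name)
    if m is None:
        return None  # e.g. a timeline history file
    tli, log, seg = (int(digits, 16) for digits in m.group(1, 2, 3))
    return tli, log, seg, m.group(4) is not None

for name in ["000000010000000000000008",
             "000000020000000000000008",
             "00000002.history"]:
    print(name, parse_wal_name(name))
```

The two segment names differ only in the timeline field, which is what makes
the duplicated beginning of segment 8 easy to overlook.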

When we stream that with pg_receivexlog, and hit the timeline switch, we'll
have this situation in the client:

000000010000000000000006
000000010000000000000007
000000010000000000000008.partial

What to do with the partial file? One option is to rename it to
000000010000000000000008. However, if you then kill pg_receivexlog before it
has finished streaming a full segment from the new timeline, on restart it
will try to begin streaming WAL segment 000000010000000000000009, because it
sees that segment 000000010000000000000008 is already completed. That'd be
wrong.
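
To make the failure mode concrete, here is a simplified Python sketch of a
restart scan. It is an assumption-laden illustration, not the actual
pg_receivexlog code: it decides purely by file name, treats the last 16 hex
digits as a single counter (ignoring rollover of the segment field), and the
function name resume_point is made up.

```python
import re

WAL_NAME = re.compile(r"([0-9A-F]{8})([0-9A-F]{16})(\.partial)?")

def resume_point(filenames):
    """Return the name of the segment to resume streaming at, or None.

    A completed segment means "continue with the next one"; a .partial
    segment means "start over at this one".
    """
    best = None  # (position, timeline) of the furthest resume candidate
    for name in filenames:
        m = WAL_NAME.fullmatch(name)
        if m is None:
            continue  # history files and anything else are ignored
        tli = int(m.group(1), 16)
        pos = int(m.group(2), 16)
        start = pos if m.group(3) else pos + 1
        if best is None or start > best[0]:
            best = (start, tli)
    if best is None:
        return None
    return "%08X%016X" % (best[1], best[0])

kept = ["000000010000000000000006", "000000010000000000000007",
        "000000010000000000000008.partial"]
renamed = kept[:2] + ["000000010000000000000008"]
print(resume_point(kept))     # resumes at segment 8 on timeline 1
print(resume_point(renamed))  # skips ahead to segment 9 on timeline 1
```

Under these assumptions, renaming the partial file makes the scan skip ahead
to segment 9 on timeline 1, while keeping the suffix makes it resume at
segment 8.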

Can't we rename the .partial file safely after we receive a full segment of
the WAL file with the new timeline and the same logid/segmentid?

I'd prefer to leave the .partial suffix in place, as the segment really isn't complete. It doesn't make a difference when you recover to the latest timeline, but if you have a more complicated scenario with multiple timelines that are still "alive", i.e. there's a server still actively generating WAL on that timeline, you'll easily get confused.

As an example, imagine that you have a master server, and one standby. You maintain a WAL archive for backup purposes with pg_receivexlog, connected to the standby. Now, for some reason, you get a split-brain situation and the standby server is promoted with new timeline 2, while the real master is still running. The DBA notices the problem, and kills the standby and pg_receivexlog. He deletes the XLOG files belonging to timeline 2 in pg_receivexlog's target directory, and re-points pg_receivexlog to the master while he re-builds the standby server from backup. If the partial segment on timeline 1 had been renamed to look complete, pg_receivexlog will at that point start streaming from the end of the zero-padded segment, not knowing that it was partial, and you have a hole in the archived WAL stream. Oops.

The DBA could avoid that by also removing the last WAL segment on timeline 1, the one that was partial. But it's really not obvious that there's anything wrong with that segment. Keeping the .partial suffix makes it clear.
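
As a rough illustration of why the suffix helps in that cleanup, here is a
small Python sketch. It is hypothetical tooling, not part of the patch: the
function name plan_cleanup and the idea of scripting the cleanup at all are
assumptions made for the example.

```python
import re

WAL_NAME = re.compile(r"([0-9A-F]{8})[0-9A-F]{16}(\.partial)?")

def plan_cleanup(filenames, dead_tli):
    """Split an archive listing into segments to delete and to re-stream.

    Segments on the abandoned timeline `dead_tli` are deleted. Segments on
    other timelines still carrying .partial are known to be incomplete and
    must be streamed again. Without the suffix they could not be told apart
    from complete segments by name.
    """
    delete, restream = [], []
    for name in sorted(filenames):
        m = WAL_NAME.fullmatch(name)
        if m is None:
            continue  # not a WAL segment; left alone in this sketch
        if int(m.group(1), 16) == dead_tli:
            delete.append(name)
        elif m.group(2):
            restream.append(name)
    return delete, restream

archive = ["000000010000000000000007",
           "000000010000000000000008.partial",
           "000000020000000000000008.partial",
           "00000002.history"]
print(plan_cleanup(archive, dead_tli=2))
```

Had 000000010000000000000008.partial been renamed to
000000010000000000000008, the second list would come back empty and nothing
would flag that segment as suspect.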

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
